ModErn Text Analysis
META Enumerates Textual Applications
An LDA topic model base class.
#include <lda_model.h>
Public Member Functions
lda_model (std::shared_ptr< index::forward_index > idx, uint64_t num_topics)
Constructs an lda_model over the given set of documents and with a fixed number of topics.
virtual | ~lda_model ()=default
Destructor.
virtual void | run (uint64_t num_iters, double convergence)=0
Runs the model for a given number of iterations, or until a convergence criterion is met.
void | save_doc_topic_distributions (const std::string &filename) const
Saves the topic proportions \(\theta_d\) for each document to the given file.
void | save_topic_term_distributions (const std::string &filename) const
Saves the term distributions \(\phi_j\) for each topic to the given file.
void | save (const std::string &prefix) const
Saves the current model to a set of files beginning with prefix: prefix.phi, prefix.theta, and prefix.terms.
Protected Member Functions
lda_model & | operator= (const lda_model &)=delete
lda_models cannot be copy assigned.
lda_model (const lda_model &)=delete
lda_models cannot be copy constructed.
virtual double | compute_term_topic_probability (term_id term, topic_id topic) const =0
virtual double | compute_doc_topic_probability (doc_id doc, topic_id topic) const =0
Protected Attributes
std::shared_ptr< index::forward_index > | idx_
The index containing the documents for the model.
size_t | num_topics_
The number of topics.
size_t | num_words_
The total number of unique words.
An LDA topic model base class.
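The member list marks the copy constructor and copy assignment operator as deleted, which also makes classes derived from the base non-copyable. The following is a minimal sketch of that effect using stand-in classes; `non_copyable_base` and `derived_model` are hypothetical names chosen for illustration and are not part of MeTA.

```cpp
#include <type_traits>

// Hypothetical stand-in for a base class with deleted copy operations.
// It mirrors the declarations listed above but is NOT the MeTA class.
class non_copyable_base
{
  public:
    non_copyable_base() = default;
    virtual ~non_copyable_base() = default;

  protected:
    non_copyable_base& operator=(const non_copyable_base&) = delete;
    non_copyable_base(const non_copyable_base&) = delete;
};

// A derived class that declares no copy operations of its own.
// Its implicit copy operations are deleted because the base's are.
class derived_model : public non_copyable_base
{
};

static_assert(std::is_default_constructible<derived_model>::value,
              "derived objects can still be created");
static_assert(!std::is_copy_constructible<derived_model>::value,
              "derived objects cannot be copy constructed");
static_assert(!std::is_copy_assignable<derived_model>::value,
              "derived objects cannot be copy assigned");
```

One consequence is that a model is typically passed by reference or held through a smart pointer rather than passed by value.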
meta::topics::lda_model::lda_model (std::shared_ptr< index::forward_index > idx, uint64_t num_topics)
Constructs an lda_model over the given set of documents and with a fixed number of topics.
idx | The index containing the documents to use for the model |
num_topics | The number of topics to find |
virtual meta::topics::lda_model::~lda_model ()=default [virtual, default]
Destructor.
Made virtual to allow for deletion through pointer to base.
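To illustrate why the destructor is virtual, here is a small sketch with stand-in types. `model_base` and `toy_model` are hypothetical and only mimic the shape of a polymorphic model hierarchy. The derived destructor runs even though the object is owned through a pointer to the base.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a polymorphic base; NOT the MeTA class.
struct model_base
{
    virtual ~model_base() = default; // virtual: safe deletion via base pointer
    virtual void run() = 0;
};

// Hypothetical derived model that records when its destructor runs.
struct toy_model : model_base
{
    explicit toy_model(bool& destroyed) : destroyed_(destroyed)
    {
    }

    ~toy_model() override
    {
        destroyed_ = true;
    }

    void run() override
    {
    }

    bool& destroyed_;
};

// Owns a toy_model through a base pointer, lets it go out of scope, and
// reports whether the derived destructor was invoked.
inline bool destroyed_through_base()
{
    bool destroyed = false;
    {
        std::unique_ptr<model_base> m{new toy_model(destroyed)};
        m->run();
        assert(!destroyed); // still alive inside the scope
    }
    return destroyed;
}
```

Without the `virtual` keyword on the base destructor, deleting a derived object through a base pointer would be undefined behavior.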
virtual void meta::topics::lda_model::run (uint64_t num_iters, double convergence)=0 [pure virtual]
Runs the model for a given number of iterations, or until a convergence criterion is met.
num_iters | The maximum allowed number of iterations |
convergence | The convergence criterion (its meaning differs between subclass models) |
Implemented in meta::topics::lda_cvb, meta::topics::lda_gibbs, and meta::topics::lda_scvb.
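As a rough sketch of how the two parameters of `run` can interact, the stand-in below stops either when `num_iters` iterations have elapsed or when a per-iteration change drops below `convergence`. Both `model_interface` and `halving_model` are hypothetical, and treating `convergence` as a threshold on the change between iterations is an assumption made for this example only; the documented subclasses define their own meaning.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in mirroring the documented run() signature.
// This is NOT the MeTA class.
class model_interface
{
  public:
    virtual ~model_interface() = default;
    virtual void run(std::uint64_t num_iters, double convergence) = 0;
};

// Toy "model": halves a value each iteration. The change between
// iterations plays the role of the convergence measure (an assumption).
class halving_model : public model_interface
{
  public:
    void run(std::uint64_t num_iters, double convergence) override
    {
        for (std::uint64_t i = 0; i < num_iters; ++i)
        {
            const double next = value_ / 2.0;
            const double change = std::fabs(next - value_);
            value_ = next;
            ++iters_run_;
            if (change < convergence)
                break; // criterion met before the iteration cap
        }
    }

    std::uint64_t iters_run() const
    {
        return iters_run_;
    }

  private:
    double value_ = 1.0;
    std::uint64_t iters_run_ = 0;
};

// Runs a fresh toy model and returns how many iterations it used.
inline std::uint64_t iterations_used(std::uint64_t num_iters,
                                     double convergence)
{
    halving_model m;
    m.run(num_iters, convergence);
    return m.iters_run();
}

inline void run_example()
{
    assert(iterations_used(100, 0.1) == 4); // stops on the threshold
    assert(iterations_used(2, 0.1) == 2);   // stops on the iteration cap
}
```

In this toy, the changes are 0.5, 0.25, 0.125, and 0.0625, so a threshold of 0.1 is first met on the fourth iteration.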
void meta::topics::lda_model::save_doc_topic_distributions (const std::string &filename) const
Saves the topic proportions \(\theta_d\) for each document to the given file.
Saves the distributions in a simple "human readable" plain-text format.
filename | The file to save \(\theta\) to |
void meta::topics::lda_model::save_topic_term_distributions (const std::string &filename) const
Saves the term distributions \(\phi_j\) for each topic to the given file.
Saves the distributions in a simple "human readable" plain-text format.
filename | The file to save \(\phi\) to |
void meta::topics::lda_model::save (const std::string &prefix) const
Saves the current model to a set of files beginning with prefix: prefix.phi, prefix.theta, and prefix.terms.
prefix | The prefix for all files generated for this model |
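The naming scheme described for `save` can be sketched as a small helper that derives the three file names from a prefix. `model_file_names` is a hypothetical function written for illustration; it does not exist in MeTA and it writes no files.

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical helper (not part of MeTA): returns the three file names
// that the documentation says save() produces for a given prefix,
// in the order phi, theta, terms.
inline std::array<std::string, 3> model_file_names(const std::string& prefix)
{
    return {{prefix + ".phi", prefix + ".theta", prefix + ".terms"}};
}

inline void file_names_example()
{
    const auto names = model_file_names("lda-out");
    assert(names[0] == "lda-out.phi");
    assert(names[1] == "lda-out.theta");
    assert(names[2] == "lda-out.terms");
}
```

Because the prefix is used verbatim, a prefix containing a directory component such as `results/lda-out` would place all three files in that directory under this sketch.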
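virtual double meta::topics::lda_model::compute_term_topic_probability (term_id term, topic_id topic) const =0 [protected, pure virtual]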
term | The term we are concerned with |
topic | The topic we are concerned with |
Implemented in meta::topics::lda_gibbs, meta::topics::lda_cvb, and meta::topics::lda_scvb.
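virtual double meta::topics::lda_model::compute_doc_topic_probability (doc_id doc, topic_id topic) const =0 [protected, pure virtual]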
doc | The document we are concerned with |
topic | The topic we are concerned with |
Implemented in meta::topics::lda_gibbs, meta::topics::lda_cvb, and meta::topics::lda_scvb.
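The page does not say how subclasses compute these two probabilities. As one possible illustration, the sketch below uses smoothed count ratios of the kind commonly associated with collapsed Gibbs sampling. Everything here is an assumption: the `toy_counts` layout, the symmetric priors `alpha` and `beta`, and the use of `std::size_t` in place of `term_id`, `topic_id`, and `doc_id`. The actual MeTA subclasses may compute these values differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical count tables (not a MeTA type).
struct toy_counts
{
    std::vector<std::vector<double>> topic_term; // [topic][term] counts
    std::vector<std::vector<double>> doc_topic;  // [doc][topic] counts
    double alpha = 0.1; // assumed symmetric prior on doc-topic proportions
    double beta = 0.1;  // assumed symmetric prior on topic-term distributions
};

inline double row_sum(const std::vector<double>& row)
{
    return std::accumulate(row.begin(), row.end(), 0.0);
}

// Smoothed estimate of P(term | topic): (n_{topic,term} + beta) / (n_topic + V * beta)
inline double term_topic_probability(const toy_counts& c, std::size_t term,
                                     std::size_t topic)
{
    const auto& row = c.topic_term.at(topic);
    return (row.at(term) + c.beta) / (row_sum(row) + row.size() * c.beta);
}

// Smoothed estimate of P(topic | doc): (n_{doc,topic} + alpha) / (n_doc + K * alpha)
inline double doc_topic_probability(const toy_counts& c, std::size_t doc,
                                    std::size_t topic)
{
    const auto& row = c.doc_topic.at(doc);
    return (row.at(topic) + c.alpha) / (row_sum(row) + row.size() * c.alpha);
}

// Two topics over a two-term vocabulary, and a single document.
inline toy_counts example_counts()
{
    toy_counts c;
    c.topic_term.push_back(std::vector<double>{3.0, 1.0}); // topic 0
    c.topic_term.push_back(std::vector<double>{0.0, 4.0}); // topic 1
    c.doc_topic.push_back(std::vector<double>{2.0, 2.0});  // document 0
    c.alpha = 1.0;
    c.beta = 1.0;
    return c;
}

inline void probability_example()
{
    const toy_counts c = example_counts();
    // (3 + 1) / (4 + 2 * 1) = 2/3
    assert(std::fabs(term_topic_probability(c, 0, 0) - 2.0 / 3.0) < 1e-12);
    // (2 + 1) / (4 + 2 * 1) = 1/2
    assert(std::fabs(doc_topic_probability(c, 0, 0) - 0.5) < 1e-12);
}
```

With these estimates, the probabilities over all terms of a topic, and over all topics of a document, each sum to one.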