lda_cvb: An implementation of LDA that uses collapsed variational Bayes for inference.
#include <lda_cvb.h>
|
| lda_cvb (std::shared_ptr< index::forward_index > idx, uint64_t num_topics, double alpha, double beta) |
| Constructs the LDA model over the given documents, with the given number of topics, and hyperparameters \(\alpha\) and \(\beta\) for the priors on \(\theta\) (topic proportions) and \(\phi\) (topic distributions), respectively.
|
|
virtual | ~lda_cvb ()=default |
| Destructor: virtual for potential subclassing.
|
|
void | run (uint64_t num_iters, double convergence=1e-3) |
| Runs the variational inference algorithm for a maximum number of iterations, or until the given convergence criterion is met.
|
|
| lda_model (std::shared_ptr< index::forward_index > idx, uint64_t num_topics) |
| Constructs an lda_model over the given set of documents and with a fixed number of topics.
|
|
virtual | ~lda_model ()=default |
| Destructor.
|
|
void | save_doc_topic_distributions (const std::string &filename) const |
| Saves the topic proportions \(\theta_d\) for each document to the given file.
|
|
void | save_topic_term_distributions (const std::string &filename) const |
| Saves the term distributions \(\phi_j\) for each topic to the given file.
|
|
void | save (const std::string &prefix) const |
| Saves the current model to a set of files beginning with prefix: prefix.phi, prefix.theta, and prefix.terms.
|
|
|
std::vector< std::vector< stats::multinomial< topic_id > > > | gamma_ |
| Variational distributions \(\gamma_{di}\), which represent the soft topic assignments for each word occurrence \(i\) in document \(d\).
|
|
std::vector< stats::multinomial< term_id > > | phi_ |
| The word distributions for each topic, \(\phi_t\).
|
|
std::vector< stats::multinomial< topic_id > > | theta_ |
| The topic distributions for each document, \(\theta_d\).
|
|
std::shared_ptr< index::forward_index > | idx_ |
| The index containing the documents for the model.
|
|
size_t | num_topics_ |
| The number of topics.
|
|
size_t | num_words_ |
| The total number of unique words.
|
|
lda_cvb: An implementation of LDA that uses collapsed variational Bayes for inference.
Specifically, it uses the CVB0 algorithm detailed in Asuncion et al.
- See also
- http://www.ics.uci.edu/~asuncion/pubs/UAI_09.pdf
meta::topics::lda_cvb::lda_cvb (std::shared_ptr< index::forward_index > idx, uint64_t num_topics, double alpha, double beta)
Constructs the LDA model over the given documents, with the given number of topics, and hyperparameters \(\alpha\) and \(\beta\) for the priors on \(\theta\) (topic proportions) and \(\phi\) (topic distributions), respectively.
- Parameters
-
idx | The index containing the documents to model |
num_topics | The number of topics to infer |
alpha | The hyperparameter for the Dirichlet prior over \(\theta\) |
beta | The hyperparameter for the Dirichlet prior over \(\phi\) |
void meta::topics::lda_cvb::run (uint64_t num_iters, double convergence = 1e-3)
virtual
Runs the variational inference algorithm for a maximum number of iterations, or until the given convergence criterion is met.
Convergence is measured as the maximum change in any of the variational parameters \(\gamma_{dij}\) during a given iteration.
- Parameters
-
num_iters | The maximum number of iterations of inference to run |
convergence | The threshold on the maximum change in any \(\gamma_{dij}\); inference is considered to have converged once the change falls below it |
Implements meta::topics::lda_model.
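The stopping rule described for run can be sketched as a small driver loop. This is an illustrative sketch only, not the library's implementation: run_until_converged and the step callback are hypothetical names, and the sketch assumes inference stops as soon as the reported maximum change drops below convergence.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical driver (not part of the library): calls `step` once per
// iteration and stops early when the maximum change it reports drops
// below `convergence`. Returns the number of iterations performed.
inline uint64_t run_until_converged(uint64_t num_iters, double convergence,
                                    const std::function<double(uint64_t)>& step)
{
    assert(convergence >= 0.0);
    uint64_t performed = 0;
    for (uint64_t iter = 0; iter < num_iters; ++iter)
    {
        // `step` plays the role of perform_iteration(iter)
        const double max_change = step(iter);
        ++performed;
        if (max_change < convergence)
            break; // converged before exhausting the iteration budget
    }
    return performed;
}
```

A caller would pass a callback that performs one sweep over all word occurrences and returns the largest change it observed.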
double meta::topics::lda_cvb::perform_iteration (uint64_t iter)
protected
Performs one iteration of the inference algorithm.
- Parameters
-
iter | The current iteration number |
- Returns
- the maximum change in any of the \(\gamma_{dij}\)s
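As a rough illustration of what one iteration does per word occurrence, the sketch below applies the CVB0 update from Asuncion et al. to a single token and reports the largest change in its \(\gamma\) values. It is a minimal sketch under stated assumptions, not the library's code: cvb0_update_token and its parameters are hypothetical, the expected counts are assumed to include the token's current contribution, and \(\alpha\) is assumed to smooth the document-topic counts while \(\beta\) smooths the topic-term counts.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the library's API). Recomputes, in place, the
// topic distribution `gamma` of one word occurrence and returns the
// largest absolute change in any entry.
//   doc_topic[k]   expected count of topic k in the token's document
//   topic_term[k]  expected count of the token's term under topic k
//   topic_total[k] expected total count of topic k
// All three are assumed to include this token's current gamma[k]; the
// caller is responsible for updating them afterwards.
inline double cvb0_update_token(std::vector<double>& gamma,
                                const std::vector<double>& doc_topic,
                                const std::vector<double>& topic_term,
                                const std::vector<double>& topic_total,
                                double alpha, double beta,
                                std::size_t num_words)
{
    const std::size_t num_topics = gamma.size();
    assert(num_topics > 0 && doc_topic.size() == num_topics
           && topic_term.size() == num_topics
           && topic_total.size() == num_topics);

    std::vector<double> updated(num_topics);
    double norm = 0.0;
    for (std::size_t k = 0; k < num_topics; ++k)
    {
        // exclude the token's own contribution from each expected count
        const double n_dk = doc_topic[k] - gamma[k];
        const double n_wk = topic_term[k] - gamma[k];
        const double n_k = topic_total[k] - gamma[k];
        updated[k] = (n_wk + beta)
                     / (n_k + static_cast<double>(num_words) * beta)
                     * (n_dk + alpha);
        norm += updated[k];
    }

    double max_change = 0.0;
    for (std::size_t k = 0; k < num_topics; ++k)
    {
        updated[k] /= norm; // normalize so the entries sum to one
        max_change = std::max(max_change, std::abs(updated[k] - gamma[k]));
        gamma[k] = updated[k];
    }
    return max_change;
}
```

Taking the maximum of this return value over every token in every document gives a quantity of the kind perform_iteration is documented to return.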
double meta::topics::lda_cvb::compute_term_topic_probability (term_id term, topic_id topic) const
override, protected, virtual
- Returns
- the probability that the given term appears in the given topic
- Parameters
-
term | The term we are concerned with |
topic | The topic we are concerned with |
Implements meta::topics::lda_model.
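One common way to estimate such a term-topic probability is a Dirichlet-smoothed ratio of expected counts. The sketch below shows that form only as an assumption about the general technique; smoothed_term_topic_probability and its parameters are hypothetical names, not the library's internals, and \(\beta\) is assumed to be the smoothing hyperparameter for the topic-term distributions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not the library's API): smoothed estimate of the
// probability of a term under a topic.
//   term_count  expected count of the term under the topic
//   topic_total expected count of all terms under the topic
//   num_words   vocabulary size
inline double smoothed_term_topic_probability(double term_count,
                                              double topic_total,
                                              double beta,
                                              std::size_t num_words)
{
    assert(beta > 0.0 && num_words > 0 && term_count <= topic_total);
    return (term_count + beta)
           / (topic_total + static_cast<double>(num_words) * beta);
}
```

Because each of the num_words terms receives the same pseudo-count, the estimates for one topic sum to one over the vocabulary.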
double meta::topics::lda_cvb::compute_doc_topic_probability (doc_id doc, topic_id topic) const
override, protected, virtual
- Returns
- the probability that the given topic is picked for the given document
- Parameters
-
doc | The document we are concerned with |
topic | The topic we are concerned with |
Implements meta::topics::lda_model.
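The document-side estimate can be sketched in the same spirit, here computing the whole vector of topic proportions for one document at once. This is a hedged sketch: smoothed_doc_topic_proportions is a hypothetical name, and the use of \(\alpha\) as a symmetric pseudo-count per topic is an assumption rather than a statement about the library's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the library's API): converts the expected
// per-topic counts of one document into smoothed topic proportions.
inline std::vector<double>
smoothed_doc_topic_proportions(const std::vector<double>& topic_counts,
                               double alpha)
{
    assert(!topic_counts.empty() && alpha > 0.0);

    // the expected counts sum to the document's length
    double doc_size = 0.0;
    for (double count : topic_counts)
        doc_size += count;

    const double denom
        = doc_size + static_cast<double>(topic_counts.size()) * alpha;

    std::vector<double> theta;
    theta.reserve(topic_counts.size());
    for (double count : topic_counts)
        theta.push_back((count + alpha) / denom);
    return theta;
}
```

Entry k of the result corresponds to the probability that topic k is picked for the document.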
Variational distributions \(\gamma_{di}\), which represent the soft topic assignments for each word occurrence \(i\) in document \(d\).
Indexed as gamma_[d][i]; the probability of topic \(j\) under that distribution is \(\gamma_{dij}\).
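The nested layout can be pictured with plain vectors. The sketch below is an assumption-laden stand-in, not the library's types: topic_dist replaces stats::multinomial< topic_id > with a bare vector of probabilities, make_uniform_gamma and expected_topic_counts are hypothetical helpers, and uniform initialization is chosen only for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a per-token multinomial: one probability per topic.
using topic_dist = std::vector<double>;

// Hypothetical helper: builds gamma[d][i] as a uniform distribution over
// topics, where doc_lengths[d] is the number of word occurrences in
// document d.
inline std::vector<std::vector<topic_dist>>
make_uniform_gamma(const std::vector<std::size_t>& doc_lengths,
                   std::size_t num_topics)
{
    assert(num_topics > 0);
    std::vector<std::vector<topic_dist>> gamma;
    gamma.reserve(doc_lengths.size());
    for (std::size_t len : doc_lengths)
        gamma.emplace_back(
            len, topic_dist(num_topics, 1.0 / static_cast<double>(num_topics)));
    return gamma;
}

// Hypothetical helper: expected topic counts for one document, obtained
// by summing gamma[d][i][j] over all word occurrences i.
inline std::vector<double>
expected_topic_counts(const std::vector<topic_dist>& doc_gamma,
                      std::size_t num_topics)
{
    std::vector<double> counts(num_topics, 0.0);
    for (const auto& dist : doc_gamma)
    {
        assert(dist.size() == num_topics);
        for (std::size_t j = 0; j < num_topics; ++j)
            counts[j] += dist[j];
    }
    return counts;
}
```

With this layout, gamma[d].size() is the length of document d and gamma[d][i][j] is the soft assignment of occurrence i to topic j.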