lda_scvb: An implementation of LDA that uses stochastic collapsed variational Bayes for inference.
#include <lda_scvb.h>
|
| lda_scvb (std::shared_ptr< index::forward_index > idx, uint64_t num_topics, double alpha, double beta, uint64_t minibatch_size=100) |
| Constructs the LDA model over the given documents, with the given number of topics, and hyperparameters \(\alpha\) and \(\beta\) for the priors on \(\theta\) (topic proportions) and \(\phi\) (topic distributions), respectively.
|
|
virtual | ~lda_scvb ()=default |
| Destructor: virtual for potential subclassing.
|
|
virtual void | run (uint64_t num_iters, double convergence=0) override |
| Runs the variational inference algorithm for a maximum number of iterations.
|
|
| lda_model (std::shared_ptr< index::forward_index > idx, uint64_t num_topics) |
| Constructs an lda_model over the given set of documents and with a fixed number of topics.
|
|
virtual | ~lda_model ()=default |
| Destructor.
|
|
void | save_doc_topic_distributions (const std::string &filename) const |
| Saves the topic proportions \(\theta_d\) for each document to the given file.
|
|
void | save_topic_term_distributions (const std::string &filename) const |
| Saves the term distributions \(\phi_j\) for each topic to the given file.
|
|
void | save (const std::string &prefix) const |
| Saves the current model to a set of files beginning with prefix: prefix.phi, prefix.theta, and prefix.terms.
|
|
|
void | initialize (std::mt19937 &gen) |
| Initialize the model with random parameters.
|
|
void | perform_iteration (uint64_t iter, const std::vector< doc_id > &docs) |
| Performs one iteration (i.e., one minibatch) of the inference algorithm.
|
|
|
std::vector< std::vector< double > > | topic_term_count_ |
| Contains the expected counts for each word being assigned a given topic.
|
|
std::vector< std::vector< double > > | doc_topic_count_ |
| Contains the expected counts for each topic being assigned in a given document.
|
|
std::vector< double > | topic_count_ |
| Contains the expected number of times the given topic has been assigned to a word.
|
|
const double | alpha_ |
| The hyperparameter on \(\theta\), the topic proportions.
|
|
const double | beta_ |
| The hyperparameter on \(\phi\), the topic distributions.
|
|
const uint64_t | minibatch_size_ |
| The size of the minibatches.
|
|
lda_scvb: An implementation of LDA that uses stochastic collapsed variational Bayes for inference.
Specifically, it uses the SCVB0 algorithm detailed in Foulds et al.
- See also
- http://dl.acm.org/citation.cfm?id=2487575.2487697
meta::topics::lda_scvb::lda_scvb |
( |
std::shared_ptr< index::forward_index > |
idx, |
|
|
uint64_t |
num_topics, |
|
|
double |
alpha, |
|
|
double |
beta, |
|
|
uint64_t |
minibatch_size = 100 |
|
) |
| |
Constructs the LDA model over the given documents, with the given number of topics, and hyperparameters \(\alpha\) and \(\beta\) for the priors on \(\theta\) (topic proportions) and \(\phi\) (topic distributions), respectively.
Adheres to a step-size schedule of \(\frac{s}{(\tau + t)^\kappa}\).
- Parameters
-
idx | The index containing the documents to model |
num_topics | The number of topics to infer |
alpha | The hyperparameter for the Dirichlet prior over \(\theta\) |
beta | The hyperparameter for the Dirichlet prior over \(\phi\) |
minibatch_size | The number of documents to consider in a minibatch |
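The step-size schedule \(\frac{s}{(\tau + t)^\kappa}\) mentioned above can be illustrated with a small sketch. The function name `step_size` and its parameters are hypothetical and not part of the documented interface; this page does not state which values of \(s\), \(\tau\), or \(\kappa\) the class uses, so they are passed in explicitly here.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper (not part of lda_scvb's documented interface).
// Returns the step size s / (tau + t)^kappa for iteration t.
double step_size(std::uint64_t t, double s, double tau, double kappa)
{
    return s / std::pow(tau + static_cast<double>(t), kappa);
}
```

For positive \(\kappa\) the step size shrinks as \(t\) grows, so later minibatches perturb the running estimates less than earlier ones.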
void meta::topics::lda_scvb::run |
( |
uint64_t |
num_iters, |
|
|
double |
convergence = 0 |
|
) |
| |
|
override virtual
Runs the variational inference algorithm for a maximum number of iterations.
TODO: Is there a convenient convergence criterion for SCVB0?
- Parameters
-
num_iters | The maximum number of iterations (in terms of minibatches) to run the inference algorithm for |
convergence | Unused |
Implements meta::topics::lda_model.
double meta::topics::lda_scvb::compute_term_topic_probability |
( |
term_id |
term, |
|
|
topic_id |
topic |
|
) |
| const |
|
override protected virtual
- Returns
- the probability that the given term appears in the given topic
- Parameters
-
term | The term we are concerned with |
topic | The topic we are concerned with |
Implements meta::topics::lda_model.
double meta::topics::lda_scvb::compute_doc_topic_probability |
( |
doc_id |
doc, |
|
|
topic_id |
topic |
|
) |
| const |
|
override protected virtual
- Returns
- the probability that the given topic is picked for the given document
- Parameters
-
doc | The document we are concerned with |
topic | The topic we are concerned with |
Implements meta::topics::lda_model.
void meta::topics::lda_scvb::initialize |
( |
std::mt19937 & |
gen | ) |
|
|
private |
Initialize the model with random parameters.
- Parameters
-
gen | The random number generator to use. |
void meta::topics::lda_scvb::perform_iteration |
( |
uint64_t |
iter, |
|
|
const std::vector< doc_id > & |
docs |
|
) |
| |
|
private |
Performs one iteration (i.e., one minibatch) of the inference algorithm.
- Parameters
-
iter | The iteration number |
docs | Contains the minibatch in indexes [0, minibatch_size_) |
std::vector<std::vector<double> > meta::topics::lda_scvb::topic_term_count_ |
|
private |
Contains the expected counts for each word being assigned a given topic.
Indexed as topic_term_count_[k][w], where k is a topic_id and w is a term_id.
std::vector<std::vector<double> > meta::topics::lda_scvb::doc_topic_count_ |
|
private |
Contains the expected counts for each topic being assigned in a given document.
Indexed as doc_topic_count_[d][k], where d is a doc_id and k is a topic_id.
std::vector<double> meta::topics::lda_scvb::topic_count_ |
|
private |
Contains the expected number of times the given topic has been assigned to a word.
Can be inferred from the above maps, but is included here for performance reasons.
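As an illustration of how the per-topic totals could be recovered from the other counts, the sketch below sums each row of a topic-by-term table. The function `topic_totals` is hypothetical, and summing topic_term_count_ rows is only one way the value might be inferred; caching the totals avoids repeating this pass on every probability lookup.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper: recomputes per-topic totals by summing over all terms.
std::vector<double> topic_totals(
    const std::vector<std::vector<double>>& topic_term_count)
{
    std::vector<double> totals;
    totals.reserve(topic_term_count.size());
    for (const auto& row : topic_term_count)
        totals.push_back(std::accumulate(row.begin(), row.end(), 0.0));
    return totals;
}
```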