ModErn Text Analysis
META Enumerates Textual Applications
meta::topics::lda_scvb Class Reference

lda_scvb: An implementation of LDA that uses stochastic collapsed variational Bayes for inference.

#include <lda_scvb.h>

Inheritance diagram for meta::topics::lda_scvb: meta::topics::lda_scvb inherits from meta::topics::lda_model.

Public Member Functions

 lda_scvb (std::shared_ptr< index::forward_index > idx, uint64_t num_topics, double alpha, double beta, uint64_t minibatch_size=100)
 Constructs the LDA model over the given documents, with the given number of topics, and hyperparameters \(\alpha\) and \(\beta\) for the priors on \(\theta\) (topic proportions) and \(\phi\) (topic distributions), respectively.
 
virtual ~lda_scvb ()=default
 Destructor: virtual for potential subclassing.
 
virtual void run (uint64_t num_iters, double convergence=0) override
 Runs the variational inference algorithm for a maximum number of iterations.
 
- Public Member Functions inherited from meta::topics::lda_model
 lda_model (std::shared_ptr< index::forward_index > idx, uint64_t num_topics)
 Constructs an lda_model over the given set of documents and with a fixed number of topics.
 
virtual ~lda_model ()=default
 Destructor.
 
void save_doc_topic_distributions (const std::string &filename) const
 Saves the topic proportions \(\theta_d\) for each document to the given file.
 
void save_topic_term_distributions (const std::string &filename) const
 Saves the term distributions \(\phi_j\) for each topic to the given file.
 
void save (const std::string &prefix) const
 Saves the current model to a set of files beginning with prefix: prefix.phi, prefix.theta, and prefix.terms.
 

Protected Member Functions

virtual double compute_term_topic_probability (term_id term, topic_id topic) const override
 
virtual double compute_doc_topic_probability (doc_id doc, topic_id topic) const override
 
- Protected Member Functions inherited from meta::topics::lda_model
lda_model & operator= (const lda_model &)=delete
 lda_models cannot be copy assigned.
 
 lda_model (const lda_model &)=delete
 lda_models cannot be copy constructed.
 

Private Member Functions

void initialize (std::mt19937 &gen)
 Initialize the model with random parameters.
 
void perform_iteration (uint64_t iter, const std::vector< doc_id > &docs)
 Performs one iteration (i.e., one minibatch) of the inference algorithm.
 

Private Attributes

std::vector< std::vector< double > > topic_term_count_
 Contains the expected counts for each word being assigned a given topic.
 
std::vector< std::vector< double > > doc_topic_count_
 Contains the expected counts for each topic being assigned in a given document.
 
std::vector< double > topic_count_
 Contains the expected number of times the given topic has been assigned to a word.
 
const double alpha_
 The hyperparameter on \(\theta\), the topic proportions.
 
const double beta_
 The hyperparameter on \(\phi\), the topic distributions.
 
const uint64_t minibatch_size_
 The size of the minibatches.
 

Additional Inherited Members

- Protected Attributes inherited from meta::topics::lda_model
std::shared_ptr< index::forward_index > idx_
 The index containing the documents for the model.
 
size_t num_topics_
 The number of topics.
 
size_t num_words_
 The total number of unique words.
 

Detailed Description

lda_scvb: An implementation of LDA that uses stochastic collapsed variational Bayes for inference.

Specifically, it uses the SCVB0 algorithm detailed in Foulds et al.

See also
http://dl.acm.org/citation.cfm?id=2487575.2487697

Constructor & Destructor Documentation

meta::topics::lda_scvb::lda_scvb ( std::shared_ptr< index::forward_index >  idx,
uint64_t  num_topics,
double  alpha,
double  beta,
uint64_t  minibatch_size = 100 
)

Constructs the LDA model over the given documents, with the given number of topics, and hyperparameters \(\alpha\) and \(\beta\) for the priors on \(\theta\) (topic proportions) and \(\phi\) (topic distributions), respectively.

Adheres to a step-size schedule of \(\frac{s}{(\tau + t)^\kappa}\).

Parameters
idx  The index containing the documents to model
num_topics  The number of topics to infer
alpha  The hyperparameter for the Dirichlet prior over \(\theta\)
beta  The hyperparameter for the Dirichlet prior over \(\phi\)
minibatch_size  The number of documents to consider in a minibatch
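
The step-size schedule \(\frac{s}{(\tau + t)^\kappa}\) mentioned above can be evaluated as in the following minimal sketch. The helper name step_size and the example constants are hypothetical; this page does not state which values of \(s\), \(\tau\), and \(\kappa\) lda_scvb actually uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper, not part of the lda_scvb interface: evaluates the
// step size s / (tau + t)^kappa for iteration number t.
double step_size(double s, double tau, double kappa, std::uint64_t t)
{
    const double base = tau + static_cast<double>(t);
    assert(base > 0.0); // guard against dividing by zero at t = 0, tau = 0
    return s / std::pow(base, kappa);
}
```

For any positive \(\kappa\) the step size shrinks as \(t\) grows, so later minibatches change the estimates less than earlier ones. For example, step_size(1.0, 0.0, 1.0, 4) evaluates to 0.25.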

Member Function Documentation

void meta::topics::lda_scvb::run ( uint64_t  num_iters,
double  convergence = 0 
)
override virtual

Runs the variational inference algorithm for a maximum number of iterations.

TODO: Is there a convenient convergence criterion for SCVB0?

Parameters
num_iters  The maximum number of iterations (in terms of minibatches) to run the inference algorithm for
convergence  Unused

Implements meta::topics::lda_model.

double meta::topics::lda_scvb::compute_term_topic_probability ( term_id  term,
topic_id  topic 
) const
override protected virtual
Returns
the probability that the given term appears in the given topic
Parameters
term  The term we are concerned with
topic  The topic we are concerned with

Implements meta::topics::lda_model.
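
As an illustration of how such a probability could be derived from expected counts, here is a minimal sketch of the usual Dirichlet-smoothed estimate. It is an assumption that lda_scvb computes the value this way; the free function term_topic_probability and its parameters are hypothetical stand-ins for the private members documented on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch only (assumed formula, not confirmed by this page):
// P(term | topic) = (N[topic][term] + beta) / (N[topic] + W * beta),
// where W is the number of unique words.
double term_topic_probability(
    const std::vector<std::vector<double>>& topic_term_count,
    const std::vector<double>& topic_count,
    double beta, std::size_t topic, std::size_t term)
{
    assert(topic < topic_term_count.size());
    assert(term < topic_term_count[topic].size());
    const auto num_words
        = static_cast<double>(topic_term_count[topic].size());
    return (topic_term_count[topic][term] + beta)
           / (topic_count[topic] + num_words * beta);
}
```

With this form, the probabilities for one topic sum to 1 over all terms whenever topic_count[topic] equals the sum of that topic's row.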

double meta::topics::lda_scvb::compute_doc_topic_probability ( doc_id  doc,
topic_id  topic 
) const
override protected virtual
Returns
the probability that the given topic is picked for the given document
Parameters
doc  The document we are concerned with
topic  The topic we are concerned with

Implements meta::topics::lda_model.
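
A comparable sketch for the document side, again assuming the usual Dirichlet-smoothed estimate rather than quoting the library's code. The function doc_topic_probability is hypothetical, and it uses the sum of the document's expected topic counts as the normalizer.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Sketch only (assumed formula, not confirmed by this page):
// P(topic | doc) = (N[doc][topic] + alpha) / (N[doc] + K * alpha),
// where K is the number of topics and N[doc] is the row total.
double doc_topic_probability(
    const std::vector<std::vector<double>>& doc_topic_count,
    double alpha, std::size_t doc, std::size_t topic)
{
    assert(doc < doc_topic_count.size());
    const auto& row = doc_topic_count[doc];
    assert(topic < row.size());
    const double total = std::accumulate(row.begin(), row.end(), 0.0);
    const auto num_topics = static_cast<double>(row.size());
    return (row[topic] + alpha) / (total + num_topics * alpha);
}
```

A larger alpha pulls every document's topic proportions toward uniform; a small alpha lets the expected counts dominate.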

void meta::topics::lda_scvb::initialize ( std::mt19937 &  gen)
private

Initialize the model with random parameters.

Parameters
gen  The random number generator to use.

void meta::topics::lda_scvb::perform_iteration ( uint64_t  iter,
const std::vector< doc_id > &  docs 
)
private

Performs one iteration (i.e., one minibatch) of the inference algorithm.

Parameters
iter  The iteration number
docs  Contains the minibatch in indexes [0, minibatch_size_)
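
One way a caller could arrange for the minibatch to occupy the leading entries of docs is to shuffle the full list of document ids before each iteration. The sketch below assumes that approach, which this page does not confirm; draw_minibatch is a hypothetical helper and std::uint64_t stands in for doc_id.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: shuffles the ids in place and returns how many
// leading entries of docs form the minibatch.
std::size_t draw_minibatch(std::vector<std::uint64_t>& docs,
                           std::size_t minibatch_size, std::mt19937& gen)
{
    assert(minibatch_size > 0);
    std::shuffle(docs.begin(), docs.end(), gen);
    // Never report more documents than the collection holds.
    return std::min(minibatch_size, docs.size());
}
```

After the call, entries 0 through n - 1 of docs (where n is the returned count) are a uniformly random subset of the collection, and no ids are lost or duplicated.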

Member Data Documentation

std::vector<std::vector<double> > meta::topics::lda_scvb::topic_term_count_
private

Contains the expected counts for each word being assigned a given topic.

Indexed as topic_term_count_[k][w] where k is a topic_id and w is a term_id.

std::vector<std::vector<double> > meta::topics::lda_scvb::doc_topic_count_
private

Contains the expected counts for each topic being assigned in a given document.

Indexed as doc_topic_count_[d][k] where d is a doc_id and k is a topic_id.

std::vector<double> meta::topics::lda_scvb::topic_count_
private

Contains the expected number of times the given topic has been assigned to a word.

Can be derived from the count tables above, but is stored separately for performance reasons.
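
To make the redundancy concrete, the following minimal sketch recomputes per-topic totals by summing each row of a table laid out like topic_term_count_[k][w]. The function topic_totals is hypothetical and only illustrates the relationship; caching the totals avoids repeating this sum over all words on every probability lookup.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: totals[k] is the sum over all words w of
// topic_term_count[k][w].
std::vector<double> topic_totals(
    const std::vector<std::vector<double>>& topic_term_count)
{
    std::vector<double> totals;
    totals.reserve(topic_term_count.size());
    for (const auto& row : topic_term_count)
        totals.push_back(std::accumulate(row.begin(), row.end(), 0.0));
    assert(totals.size() == topic_term_count.size());
    return totals;
}
```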


The documentation for this class was generated from the following files: