ModErn Text Analysis
META Enumerates Textual Applications
meta::topics::lda_cvb Class Reference

lda_cvb: An implementation of LDA that uses collapsed variational bayes for inference.

#include <lda_cvb.h>

Inheritance diagram for meta::topics::lda_cvb:
meta::topics::lda_model

Public Member Functions

 lda_cvb (std::shared_ptr< index::forward_index > idx, uint64_t num_topics, double alpha, double beta)
 Constructs the lda model over the given documents, with the given number of topics, and hyperparameters \(\alpha\) and \(\beta\) for the priors on \(\phi\) (topic distributions) and \(\theta\) (topic proportions), respectively.
 
virtual ~lda_cvb ()=default
 Destructor: virtual for potential subclassing.
 
void run (uint64_t num_iters, double convergence=1e-3)
 Runs the variational inference algorithm for a maximum number of iterations, or until the given convergence criterion is met.
 
- Public Member Functions inherited from meta::topics::lda_model
 lda_model (std::shared_ptr< index::forward_index > idx, uint64_t num_topics)
 Constructs an lda_model over the given set of documents and with a fixed number of topics.
 
virtual ~lda_model ()=default
 Destructor.
 
void save_doc_topic_distributions (const std::string &filename) const
 Saves the topic proportions \(\theta_d\) for each document to the given file.
 
void save_topic_term_distributions (const std::string &filename) const
 Saves the term distributions \(\phi_j\) for each topic to the given file.
 
void save (const std::string &prefix) const
 Saves the current model to a set of files beginning with prefix: prefix.phi, prefix.theta, and prefix.terms.
 

Protected Member Functions

void initialize ()
 Initializes the parameters randomly.
 
double perform_iteration (uint64_t iter)
 Performs one iteration of the inference algorithm.
 
virtual double compute_term_topic_probability (term_id term, topic_id topic) const override
 
virtual double compute_doc_topic_probability (doc_id doc, topic_id topic) const override
 
- Protected Member Functions inherited from meta::topics::lda_model
lda_model & operator= (const lda_model &)=delete
 lda_models cannot be copy assigned.
 
 lda_model (const lda_model &)=delete
 lda_models cannot be copy constructed.
 

Protected Attributes

std::vector< std::vector< stats::multinomial< topic_id > > > gamma_
 Variational distributions \(\gamma_{di}\), which represent the soft topic assignments for each word occurrence \(i\) in document \(d\).
 
std::vector< stats::multinomial< term_id > > phi_
 The word distributions for each topic, \(\phi_t\).
 
std::vector< stats::multinomial< topic_id > > theta_
 The topic distributions for each document, \(\theta_d\).
 
- Protected Attributes inherited from meta::topics::lda_model
std::shared_ptr< index::forward_index > idx_
 The index containing the documents for the model.
 
size_t num_topics_
 The number of topics.
 
size_t num_words_
 The number of total unique words.
 

Detailed Description

lda_cvb: An implementation of LDA that uses collapsed variational bayes for inference.

Specifically, it uses the CVB0 algorithm detailed in Asuncion et al.

See also
http://www.ics.uci.edu/~asuncion/pubs/UAI_09.pdf
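
As a rough illustration of what a CVB0 step looks like, the sketch below computes the soft topic assignment for a single word occurrence from expected counts, following the usual form of the CVB0 update. It is not taken from the MeTA sources: the function name cvb0_update and all parameter names are hypothetical, and the two hyperparameters are named by role (doc_prior, term_prior) rather than as alpha and beta. The counts passed in are assumed to already exclude the current contribution of the word occurrence being updated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of MeTA): CVB0-style update for one word
// occurrence. All counts are expected (fractional) counts that exclude the
// occurrence being updated.
//
// doc_topic[k]   expected tokens in this document assigned to topic k
// term_topic[k]  expected occurrences of this term assigned to topic k
// topic_total[k] expected tokens of any term assigned to topic k
// doc_prior      symmetric Dirichlet prior on per-document topic proportions
// term_prior     symmetric Dirichlet prior on per-topic term distributions
// vocab_size     number of unique terms
std::vector<double> cvb0_update(const std::vector<double>& doc_topic,
                                const std::vector<double>& term_topic,
                                const std::vector<double>& topic_total,
                                double doc_prior, double term_prior,
                                std::size_t vocab_size)
{
    const std::size_t num_topics = doc_topic.size();
    assert(num_topics > 0);
    assert(term_topic.size() == num_topics);
    assert(topic_total.size() == num_topics);

    std::vector<double> gamma(num_topics);
    double norm = 0.0;
    for (std::size_t k = 0; k < num_topics; ++k)
    {
        // smoothed probability of the term under topic k ...
        const double term_part
            = (term_topic[k] + term_prior)
              / (topic_total[k] + static_cast<double>(vocab_size) * term_prior);
        // ... times the smoothed weight of topic k in the document
        gamma[k] = term_part * (doc_topic[k] + doc_prior);
        norm += gamma[k];
    }

    // normalize so the soft assignment sums to one over topics
    for (auto& g : gamma)
        g /= norm;
    return gamma;
}
```

One full iteration would apply an update of this kind to every word occurrence in every document, adjusting the expected counts as each soft assignment changes.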

Constructor & Destructor Documentation

meta::topics::lda_cvb::lda_cvb ( std::shared_ptr< index::forward_index >  idx,
uint64_t  num_topics,
double  alpha,
double  beta 
)

Constructs the lda model over the given documents, with the given number of topics, and hyperparameters \(\alpha\) and \(\beta\) for the priors on \(\phi\) (topic distributions) and \(\theta\) (topic proportions), respectively.

Parameters
idx: The index containing the documents to model
num_topics: The number of topics to infer
alpha: The hyperparameter for the Dirichlet prior over \(\phi\)
beta: The hyperparameter for the Dirichlet prior over \(\theta\)

Member Function Documentation

void meta::topics::lda_cvb::run ( uint64_t  num_iters,
double  convergence = 1e-3 
)
virtual

Runs the variational inference algorithm for a maximum number of iterations, or until the given convergence criterion is met.

Convergence is measured by the maximum change in any of the variational parameters \(\gamma_{dij}\) during a given iteration.

Parameters
num_iters: The maximum number of inference iterations to run
convergence: The threshold on the maximum change in any \(\gamma_{dij}\); once an iteration's maximum change falls to this level, the inference is considered to have converged

Implements meta::topics::lda_model.
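
The stopping rule described above can be sketched as a small loop. This is only an illustration under stated assumptions: run_until_converged and its step callback are hypothetical names, with step standing in for perform_iteration and returning the largest change in any \(\gamma_{dij}\). Whether the real comparison is strict or non-strict is not stated in this documentation; the sketch assumes non-strict.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for the loop inside run(): calls `step` once per
// iteration and stops early when the reported maximum change is at or below
// the convergence threshold. Returns the number of iterations performed.
std::uint64_t
run_until_converged(std::uint64_t num_iters, double convergence,
                    const std::function<double(std::uint64_t)>& step)
{
    assert(convergence >= 0.0);

    std::uint64_t iter = 0;
    while (iter < num_iters)
    {
        // `step` plays the role of perform_iteration(iter)
        const double max_change = step(iter);
        ++iter;
        if (max_change <= convergence)
            break; // assumed comparison; the source does not specify it
    }
    return iter;
}
```

With the default threshold of 1e-3, a loop like this ends either after num_iters iterations or as soon as no variational parameter moves by more than 0.001 in a single pass, whichever comes first.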

double meta::topics::lda_cvb::perform_iteration ( uint64_t  iter)
protected

Performs one iteration of the inference algorithm.

Parameters
iter: The current iteration number
Returns
the maximum change in any of the \(\gamma_{dij}\) values

double meta::topics::lda_cvb::compute_term_topic_probability ( term_id  term,
topic_id  topic 
) const
override protected virtual
Returns
the probability that the given term appears in the given topic
Parameters
term: The term we are concerned with
topic: The topic we are concerned with

Implements meta::topics::lda_model.

double meta::topics::lda_cvb::compute_doc_topic_probability ( doc_id  doc,
topic_id  topic 
) const
override protected virtual
Returns
the probability that the given topic is picked for the given document
Parameters
doc: The document we are concerned with
topic: The topic we are concerned with

Implements meta::topics::lda_model.

Member Data Documentation

std::vector<std::vector<stats::multinomial<topic_id> > > meta::topics::lda_cvb::gamma_
protected

Variational distributions \(\gamma_{di}\), which represent the soft topic assignments for each word occurrence \(i\) in document \(d\).

Indexed as gamma_[d][i]
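
To make the indexing concrete, the sketch below models the same layout with plain nested vectors in place of stats::multinomial< topic_id >, so that gamma[d][i][k] is the weight of topic k for word occurrence i in document d. It then shows one common way to turn such soft assignments into a smoothed per-document topic proportion. The alias soft_assignments and the helper doc_topic_estimate are hypothetical, and this is not claimed to be how compute_doc_topic_probability is implemented; it also assumes each word occurrence's weights sum to one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain stand-in for the nested multinomial storage:
// gamma[d][i][k] = soft assignment of word occurrence i in document d
// to topic k.
using soft_assignments = std::vector<std::vector<std::vector<double>>>;

// Hypothetical helper: smoothed estimate of the proportion of topic k in
// document d, using a symmetric Dirichlet prior on topic proportions.
double doc_topic_estimate(const soft_assignments& gamma, std::size_t d,
                          std::size_t k, double prior)
{
    assert(d < gamma.size());
    const auto& doc = gamma[d];
    assert(!doc.empty());
    const std::size_t num_topics = doc.front().size();
    assert(k < num_topics);

    // expected number of word occurrences in d assigned to topic k
    double expected = 0.0;
    for (const auto& occurrence : doc)
        expected += occurrence[k];

    // each occurrence contributes total weight 1, so the document's
    // total expected count is simply its number of occurrences
    return (expected + prior)
           / (static_cast<double>(doc.size())
              + static_cast<double>(num_topics) * prior);
}
```

For a document with two word occurrences whose weights are {0.5, 0.5} and {1.0, 0.0}, and a prior of 1, the estimates for the two topics come out to 0.625 and 0.375.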

