ModErn Text Analysis
META Enumerates Textual Applications
lda_cvb.h
1 
10 #ifndef META_TOPICS_LDA_CVB_H_
11 #define META_TOPICS_LDA_CVB_H_
12 
13 #include "stats/multinomial.h"
14 #include "topics/lda_model.h"
15 
16 namespace meta
17 {
18 namespace topics
19 {
20 
28 class lda_cvb : public lda_model
29 {
30  public:
44  lda_cvb(std::shared_ptr<index::forward_index> idx, uint64_t num_topics,
45  double alpha, double beta);
46 
50  virtual ~lda_cvb() = default;
51 
65  void run(uint64_t num_iters, double convergence = 1e-3);
66 
67  protected:
71  void initialize();
72 
79  double perform_iteration(uint64_t iter);
80 
81  virtual double
82  compute_term_topic_probability(term_id term,
83  topic_id topic) const override;
84 
85  virtual double compute_doc_topic_probability(doc_id doc,
86  topic_id topic) const override;
87 
95  std::vector<std::vector<stats::multinomial<topic_id>>> gamma_;
96 
100  std::vector<stats::multinomial<term_id>> phi_;
101 
105  std::vector<stats::multinomial<topic_id>> theta_;
106 };
107 }
108 }
109 #endif
virtual double compute_term_topic_probability(term_id term, topic_id topic) const override
Definition: lda_cvb.cpp:152
An LDA topic model base class.
Definition: lda_model.h:25
virtual double compute_doc_topic_probability(doc_id doc, topic_id topic) const override
Definition: lda_cvb.cpp:158
void run(uint64_t num_iters, double convergence=1e-3)
Runs the variational inference algorithm for a maximum number of iterations, or until the given conve...
Definition: lda_cvb.cpp:39
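As an illustration of the stopping rule described for run(), the sketch below loops for at most a fixed number of iterations and exits early once a per-iteration change measure falls to the threshold. It is not the MeTA implementation: the helper run_until_converged and its step callback are hypothetical, and it assumes that the value returned by each step (analogous to the double returned by perform_iteration) measures how much the parameters changed.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical helper, not part of MeTA. Calls step(iter) at most num_iters
// times and stops early once the returned change is <= convergence.
// Returns the number of iterations actually performed.
inline uint64_t run_until_converged(uint64_t num_iters, double convergence,
                                    const std::function<double(uint64_t)>& step)
{
    uint64_t performed = 0;
    for (uint64_t iter = 0; iter < num_iters; ++iter)
    {
        // Assumption: step returns a non-negative measure of parameter change.
        double change = step(iter);
        ++performed;
        if (change <= convergence)
            break;
    }
    return performed;
}
```

With a step whose change halves on every call, a threshold of 1e-3 stops the loop well before a cap of 100 iterations; with a step that never shrinks, the loop runs for the full num_iters.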
void initialize()
Initializes the parameters randomly.
Definition: lda_cvb.cpp:61
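One common way to initialize a distribution randomly is to draw positive weights and normalize them. The sketch below shows that idea only; the helper random_distribution is hypothetical, it uses a plain std::vector<double> rather than stats::multinomial, and the source does not state which sampling scheme initialize() actually uses.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper, not part of MeTA. Draws one positive weight per topic
// and rescales the weights so that they sum to 1.
inline std::vector<double> random_distribution(std::size_t num_topics,
                                               std::mt19937& rng)
{
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    std::vector<double> probs(num_topics);
    double total = 0.0;
    for (double& p : probs)
    {
        p = unif(rng) + 1e-9; // small offset keeps every weight strictly positive
        total += p;
    }
    for (double& p : probs)
        p /= total;
    return probs;
}
```

Passing the generator by reference lets a caller seed it once and reuse it for every document and word occurrence.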
std::vector< std::vector< stats::multinomial< topic_id > > > gamma_
Variational distributions γ, which represent the soft topic assignments for each word occurrence in d...
Definition: lda_cvb.h:95
double perform_iteration(uint64_t iter)
Performs one iteration of the inference algorithm.
Definition: lda_cvb.cpp:97
lda_cvb: An implementation of LDA that uses collapsed variational Bayes for inference.
Definition: lda_cvb.h:28
The ModErn Text Analysis toolkit is a suite of natural language processing, classification, information retrieval, data mining, and other applications of text processing.
Definition: analyzer.h:24
std::vector< stats::multinomial< topic_id > > theta_
The topic distributions for each document, θ.
Definition: lda_cvb.h:105
std::vector< stats::multinomial< term_id > > phi_
The word distributions for each topic, φ.
Definition: lda_cvb.h:100
lda_cvb(std::shared_ptr< index::forward_index > idx, uint64_t num_topics, double alpha, double beta)
Constructs the LDA model over the given documents, with the given number of topics, and hyperparameters α and β for the priors on φ (topic distributions) and θ (topic proportions), respectively.
Definition: lda_cvb.cpp:16
virtual ~lda_cvb()=default
Destructor: virtual for potential subclassing.