ModErn Text Analysis
META Enumerates Textual Applications
parallel_lda_gibbs.h
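#ifndef META_PARALLEL_LDA_GIBBS_H_
#define META_PARALLEL_LDA_GIBBS_H_

#include <thread>
#include <unordered_map>
#include <vector>

#include "parallel/thread_pool.h"
#include "topics/lda_gibbs.h"

namespace meta
{
namespace topics
{

// An LDA topic model implemented using the Approximate Distributed LDA
// (AD-LDA) algorithm.
class parallel_lda_gibbs : public lda_gibbs
{
  public:
    // Inherits the constructors of lda_gibbs.
    using lda_gibbs::lda_gibbs;

    // Destructor: virtual for potential subclassing.
    virtual ~parallel_lda_gibbs() = default;

  protected:
    // Initializes the first set of topic assignments for inference.
    virtual void initialize() override;

    // Performs a sampling iteration of the AD-LDA algorithm.
    virtual void perform_iteration(uint64_t iter, bool init = false) override;

    // Decreases all counts associated with the given topic, term, and
    // document by one.
    virtual void decrease_counts(topic_id topic, term_id term,
                                 doc_id doc) override;

    // Increases all counts associated with the given topic, term, and
    // document by one.
    virtual void increase_counts(topic_id topic, term_id term,
                                 doc_id doc) override;

    // Computes the weight used when sampling a topic for the given term
    // and document.
    virtual double compute_sampling_weight(term_id term, doc_id doc,
                                           topic_id topic) const override;

    // The thread pool used for parallelization.
    parallel::thread_pool pool_;

    // Stores the difference in topic_term counts on a per-thread basis for
    // use in the reduction step.
    std::unordered_map<std::thread::id,
                       std::vector<stats::multinomial<term_id>>> phi_diffs_;
};
}
}

#endif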
virtual void increase_counts(topic_id topic, term_id term, doc_id doc) override
Increases all counts associated with the given topic, term, and document by one.
Definition: parallel_lda_gibbs.cpp:91
lda_gibbs
An LDA topic model implemented using a collapsed Gibbs sampler.
Definition: lda_gibbs.h:29
parallel::thread_pool pool_
The thread pool used for parallelization.
Definition: parallel_lda_gibbs.h:69
std::unordered_map< std::thread::id, std::vector< stats::multinomial< term_id > > > phi_diffs_
Stores the difference in topic_term counts on a per-thread basis for use in the reduction step...
Definition: parallel_lda_gibbs.h:78
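The per-thread bookkeeping described for phi_diffs_ can be illustrated with a small standalone sketch. It assumes a flattened count table of signed integers and a stand-in update loop instead of real sampling; the names parallel_update and diff_table are hypothetical and are not part of MeTA. Each worker records its changes in a table keyed by its own std::thread::id, and the shared counts are only modified in the reduction step after all workers have joined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <vector>

// Signed per-thread changes to a flattened count table (hypothetical type).
using diff_table = std::vector<std::int64_t>;

// Runs num_threads workers. Each worker writes only to its own diff table,
// so the shared counts are read-only until the reduction step at the end.
void parallel_update(std::vector<std::int64_t>& counts,
                     std::size_t num_threads, std::size_t updates_per_thread)
{
    assert(!counts.empty());
    std::unordered_map<std::thread::id, diff_table> diffs;
    diffs.reserve(num_threads); // avoid rehashing while workers register
    std::mutex diffs_mutex;     // guards insertion into the map only

    auto worker = [&](std::size_t worker_index) {
        diff_table* local = nullptr;
        {
            std::lock_guard<std::mutex> lock{diffs_mutex};
            local = &diffs[std::this_thread::get_id()];
        }
        local->assign(counts.size(), 0);
        // Stand-in for sampling: move one count from a slot to the next.
        for (std::size_t i = 0; i < updates_per_thread; ++i)
        {
            std::size_t from = (worker_index + i) % counts.size();
            std::size_t to = (from + 1) % counts.size();
            --(*local)[from];
            ++(*local)[to];
        }
    };

    std::vector<std::thread> threads;
    for (std::size_t w = 0; w < num_threads; ++w)
        threads.emplace_back(worker, w);
    for (auto& t : threads)
        t.join();

    // Reduction step: fold every thread's diff into the shared counts.
    for (const auto& entry : diffs)
        for (std::size_t i = 0; i < counts.size(); ++i)
            counts[i] += entry.second[i];
}
```

Keeping the diffs separate means the workers never write to the same memory, at the cost of each worker seeing slightly stale shared counts until the reduction runs.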
lda_gibbs(std::shared_ptr< index::forward_index > idx, uint64_t num_topics, double alpha, double beta)
Constructs the LDA model over the given documents, with the given number of topics and hyperparameters $\alpha$ and $\beta$ for the priors on $\phi$ (topic distributions) and $\theta$ (topic proportions), respectively.
Definition: lda_gibbs.cpp:18
virtual void decrease_counts(topic_id topic, term_id term, doc_id doc) override
Decreases all counts associated with the given topic, term, and document by one.
Definition: parallel_lda_gibbs.cpp:83
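The paired increase_counts and decrease_counts operations can be pictured with a minimal sketch. It assumes dense, row-major count tables in a hypothetical count_tables struct; the names and the layout are illustrative only and do not reflect how MeTA stores its counts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical count tables for a collapsed Gibbs sampler.
struct count_tables
{
    std::size_t num_topics;
    std::size_t num_terms;
    std::vector<std::uint64_t> topic_term;  // num_topics x num_terms
    std::vector<std::uint64_t> doc_topic;   // num_docs x num_topics
    std::vector<std::uint64_t> topic_total; // tokens assigned to each topic

    count_tables(std::size_t topics, std::size_t terms, std::size_t docs)
        : num_topics{topics},
          num_terms{terms},
          topic_term(topics * terms, 0),
          doc_topic(docs * topics, 0),
          topic_total(topics, 0)
    {
    }

    // Records that one token of `term` in `doc` is now assigned to `topic`.
    void increase_counts(std::size_t topic, std::size_t term, std::size_t doc)
    {
        ++topic_term[topic * num_terms + term];
        ++doc_topic[doc * num_topics + topic];
        ++topic_total[topic];
    }

    // Removes one such assignment, typically just before resampling it.
    void decrease_counts(std::size_t topic, std::size_t term, std::size_t doc)
    {
        assert(topic_term[topic * num_terms + term] > 0);
        assert(doc_topic[doc * num_topics + topic] > 0);
        assert(topic_total[topic] > 0);
        --topic_term[topic * num_terms + term];
        --doc_topic[doc * num_topics + topic];
        --topic_total[topic];
    }
};
```

In a Gibbs sweep the two calls bracket each resampling step: the token's current assignment is removed, a new topic is drawn, and the new assignment is added back.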
meta
The ModErn Text Analysis toolkit is a suite of natural language processing, classification, information retrieval, data mining, and other applications of text processing.
Definition: analyzer.h:24
virtual void initialize() override
Initializes the first set of topic assignments for inference.
Definition: parallel_lda_gibbs.cpp:16
parallel::thread_pool
Represents a collection of a fixed number of threads to which tasks can be added.
Definition: thread_pool.h:33
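The idea of a fixed set of threads that accept tasks can be sketched in standard C++ as follows. This is a generic illustration under the assumption of a mutex-protected task queue; the class name simple_pool and its submit member are hypothetical and are not the interface of parallel::thread_pool.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// A minimal fixed-size pool (hypothetical, for illustration only).
class simple_pool
{
  public:
    explicit simple_pool(std::size_t num_threads)
    {
        assert(num_threads > 0);
        for (std::size_t i = 0; i < num_threads; ++i)
            workers_.emplace_back([this] { run(); });
    }

    // Lets queued tasks finish, then joins every worker.
    ~simple_pool()
    {
        {
            std::lock_guard<std::mutex> lock{mutex_};
            stopping_ = true;
        }
        cond_.notify_all();
        for (auto& worker : workers_)
            worker.join();
    }

    // Queues a task and returns a future for its result.
    template <class Function>
    auto submit(Function func) -> std::future<decltype(func())>
    {
        using result_type = decltype(func());
        auto task = std::make_shared<std::packaged_task<result_type()>>(
            std::move(func));
        auto result = task->get_future();
        {
            std::lock_guard<std::mutex> lock{mutex_};
            tasks_.emplace([task] { (*task)(); });
        }
        cond_.notify_one();
        return result;
    }

  private:
    // Worker loop: wait for a task, run it, repeat until stopped.
    void run()
    {
        for (;;)
        {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock{mutex_};
                cond_.wait(lock,
                           [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty())
                    return; // stopping and nothing left to run
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();
        }
    }

    std::mutex mutex_;
    std::condition_variable cond_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

Reusing the same threads across sampling iterations avoids paying thread start-up cost on every pass over the corpus.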
virtual void perform_iteration(uint64_t iter, bool init=false) override
Performs a sampling iteration of the AD-LDA algorithm.
Definition: parallel_lda_gibbs.cpp:23
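AD-LDA divides the documents among workers, each of which samples its own share before the counts are merged. One simple way to form such shares is contiguous blocks of document ids, sketched below. The function block_ranges is hypothetical, and the way MeTA actually schedules documents on its thread pool may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Splits document ids [0, num_docs) into num_workers contiguous half-open
// ranges whose sizes differ by at most one.
std::vector<std::pair<std::size_t, std::size_t>>
block_ranges(std::size_t num_docs, std::size_t num_workers)
{
    assert(num_workers > 0);
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    ranges.reserve(num_workers);

    std::size_t base = num_docs / num_workers;
    std::size_t extra = num_docs % num_workers; // first `extra` blocks get +1
    std::size_t begin = 0;
    for (std::size_t w = 0; w < num_workers; ++w)
    {
        std::size_t size = base + (w < extra ? 1 : 0);
        ranges.emplace_back(begin, begin + size);
        begin += size;
    }
    return ranges;
}
```

Because each document belongs to exactly one block, document-level counts are never shared between workers; only the topic-term counts need a merge step.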
parallel_lda_gibbs
An LDA topic model implemented using the Approximate Distributed LDA algorithm.
Definition: parallel_lda_gibbs.h:29
virtual double compute_sampling_weight(term_id term, doc_id doc, topic_id topic) const override
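Computes a weight proportional to $P(z_i = j \mid \boldsymbol{w}, \boldsymbol{z}_{-i})$, the probability of assigning topic $j$ to the $i$-th token given the words and all other topic assignments.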
Definition: parallel_lda_gibbs.cpp:99
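In a collapsed Gibbs sampler for LDA, a sampling weight of this kind is commonly the product of a smoothed term-given-topic ratio and a smoothed topic-given-document ratio. The sketch below shows that textbook form. The function name, the argument list, and the prior names term_prior and topic_prior are assumptions chosen for illustration; this is not MeTA's implementation, and the counts are expected to exclude the token currently being resampled.

```cpp
#include <cassert>
#include <cstdint>

// Unnormalized weight for assigning a topic to one token.
//   topic_term_count: tokens of this term currently assigned to the topic
//   topic_total:      all tokens currently assigned to the topic
//   doc_topic_count:  tokens in this document assigned to the topic
//   doc_total:        all tokens in this document
double sampling_weight(std::uint64_t topic_term_count,
                       std::uint64_t topic_total,
                       std::uint64_t doc_topic_count, std::uint64_t doc_total,
                       std::uint64_t num_terms, std::uint64_t num_topics,
                       double term_prior, double topic_prior)
{
    assert(num_terms > 0 && num_topics > 0);
    assert(term_prior > 0.0 && topic_prior > 0.0);

    // Smoothed estimate of how likely the topic is to emit this term.
    double term_given_topic = (topic_term_count + term_prior)
                              / (topic_total + num_terms * term_prior);
    // Smoothed estimate of how prevalent the topic is in this document.
    double topic_given_doc = (doc_topic_count + topic_prior)
                             / (doc_total + num_topics * topic_prior);
    return term_given_topic * topic_given_doc;
}
```

The weights for all topics are then normalized (or sampled from directly as an unnormalized discrete distribution) to draw the token's new topic.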
virtual ~parallel_lda_gibbs()=default
Destructor: virtual for potential subclassing.