ModErn Text Analysis
META Enumerates Textual Applications
naive_bayes.h
1 
9 #ifndef META_NAIVE_BAYES_H_
10 #define META_NAIVE_BAYES_H_
11 
12 #include <unordered_map>
13 #include "index/forward_index.h"
16 #include "meta.h"
17 #include "stats/multinomial.h"
18 #include "util/sparse_vector.h"
19 
20 namespace meta
21 {
22 namespace classify
23 {
24 
29 class naive_bayes : public classifier
30 {
31  public:
33  const static constexpr double default_alpha = 0.1;
35  const static constexpr double default_beta = 0.1;
36 
44  naive_bayes(std::shared_ptr<index::forward_index> idx,
45  double alpha = default_alpha, double beta = default_beta);
46 
53  void train(const std::vector<doc_id>& docs) override;
54 
61  class_label classify(doc_id d_id) override;
62 
66  void reset() override;
67 
71  const static std::string id;
72 
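73  private:
77  util::sparse_vector<class_label, stats::multinomial<term_id>> term_probs_;
78 
82  stats::multinomial<class_label> class_probs_;
83 };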
84 
89 template <>
90 std::unique_ptr<classifier>
91  make_classifier<naive_bayes>(const cpptoml::table& config,
92  std::shared_ptr<index::forward_index> idx);
93 
94 }
95 }
96 #endif
Contains top-level namespace documentation for the META toolkit.
naive_bayes(std::shared_ptr< index::forward_index > idx, double alpha=default_alpha, double beta=default_beta)
Constructor: learns class models based on a collection of training documents.
Definition: naive_bayes.cpp:19
static const std::string id
The identifier for this classifier.
Definition: naive_bayes.h:71
std::unique_ptr< classifier > make_classifier< naive_bayes >(const cpptoml::table &config, std::shared_ptr< index::forward_index > idx)
Specialization of the factory method used for creating naive bayes classifiers.
Definition: naive_bayes.cpp:87
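The listing does not show how the factory turns a configuration into constructor arguments. The sketch below illustrates one plausible pattern: look up optional `alpha` and `beta` values and fall back to the class defaults when they are absent. Everything here is an assumption made for illustration: `config_sketch` is a flat map standing in for `cpptoml::table`, the key names `"alpha"` and `"beta"` are guesses, and `nb_params`, `value_or`, and `read_params` are hypothetical helpers, not part of META.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for cpptoml::table: a flat key/value map.
using config_sketch = std::unordered_map<std::string, double>;

// Hypothetical bundle of the two constructor arguments.
struct nb_params
{
    double alpha;
    double beta;
};

// These mirror naive_bayes::default_alpha and naive_bayes::default_beta.
constexpr double default_alpha = 0.1;
constexpr double default_beta = 0.1;

// Returns the configured value for `key`, or `fallback` when the key is absent.
double value_or(const config_sketch& config, const std::string& key,
                double fallback)
{
    auto it = config.find(key);
    return it == config.end() ? fallback : it->second;
}

// Resolves both smoothing parameters, applying defaults where needed.
nb_params read_params(const config_sketch& config)
{
    nb_params params{value_or(config, "alpha", default_alpha),
                     value_or(config, "beta", default_beta)};
    assert(params.alpha > 0.0 && params.beta > 0.0);
    return params;
}
```

A real specialization would pass the resolved values, together with the forward index, to the `naive_bayes` constructor and return the result as a `std::unique_ptr<classifier>`.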
static const constexpr double default_alpha
The default alpha parameter.
Definition: naive_bayes.h:33
meta::util::sparse_vector
Represents a sparse vector, indexed by type Index and storing values of type Value.
Definition: sparse_vector.h:28
util::sparse_vector< class_label, stats::multinomial< term_id > > term_probs_
Contains P(term|class) for each class.
Definition: naive_bayes.h:77
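To make the idea of a smoothed P(term|class) concrete, here is a minimal sketch of additive smoothing over the term counts of a single class. It is not META's `stats::multinomial`: the `smoothed_multinomial` type, its member names, and the choice to pass the vocabulary size explicitly are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical per-class term distribution with additive smoothing.
class smoothed_multinomial
{
  public:
    // alpha: pseudo-count added to every term.
    // vocab_size: number of distinct terms the distribution ranges over.
    smoothed_multinomial(double alpha, std::size_t vocab_size)
        : alpha_{alpha}, vocab_size_{vocab_size}
    {
        assert(alpha > 0.0 && vocab_size > 0);
    }

    // Records `count` additional occurrences of `term`.
    void increment(std::uint64_t term, double count)
    {
        counts_[term] += count;
        total_ += count;
    }

    // P(term) = (count(term) + alpha) / (total + alpha * vocab_size)
    double probability(std::uint64_t term) const
    {
        auto it = counts_.find(term);
        double observed = (it == counts_.end()) ? 0.0 : it->second;
        return (observed + alpha_)
               / (total_ + alpha_ * static_cast<double>(vocab_size_));
    }

  private:
    double alpha_;
    std::size_t vocab_size_;
    double total_ = 0.0;
    std::unordered_map<std::uint64_t, double> counts_;
};
```

With `alpha = 0.5` and a four-term vocabulary, a term seen 3 times out of 4 total occurrences gets (3 + 0.5) / (4 + 2), and a term never seen in the class still gets the non-zero probability 0.5 / 6. This is what keeps one unseen term from zeroing out a whole document's score.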
static const constexpr double default_beta
The default beta parameter.
Definition: naive_bayes.h:35
void train(const std::vector< doc_id > &docs) override
Creates a classification model based on training documents.
Definition: naive_bayes.cpp:39
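As a rough picture of what building such a model involves, the sketch below accumulates per-class document counts and per-class term counts from labeled documents, then reads out a smoothed class prior. It is a simplified illustration under stated assumptions, not the code in naive_bayes.cpp: `labeled_doc`, `nb_counts`, `train_sketch`, and `class_prior` are hypothetical names, and using `beta` as an additive pseudo-count on class frequencies is an assumption about its role.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical labeled document: a class label plus (term id, count) pairs.
struct labeled_doc
{
    std::string label;
    std::vector<std::pair<int, double>> terms;
};

// Raw counts gathered during training; smoothing is applied on read-out.
struct nb_counts
{
    std::unordered_map<std::string, double> docs_per_class;
    std::unordered_map<std::string, std::unordered_map<int, double>> term_counts;
};

// One pass over the training set, accumulating both kinds of counts.
nb_counts train_sketch(const std::vector<labeled_doc>& docs)
{
    nb_counts counts;
    for (const auto& doc : docs)
    {
        counts.docs_per_class[doc.label] += 1.0;
        auto& per_term = counts.term_counts[doc.label];
        for (const auto& tc : doc.terms)
            per_term[tc.first] += tc.second;
    }
    return counts;
}

// Smoothed prior: (docs in class + beta) / (total docs + beta * num classes).
double class_prior(const nb_counts& counts, const std::string& label,
                   double beta)
{
    assert(beta > 0.0 && !counts.docs_per_class.empty());
    double total = 0.0;
    for (const auto& kv : counts.docs_per_class)
        total += kv.second;
    auto it = counts.docs_per_class.find(label);
    double seen = (it == counts.docs_per_class.end()) ? 0.0 : it->second;
    return (seen + beta)
           / (total + beta * static_cast<double>(counts.docs_per_class.size()));
}
```

For three training documents split 2 to 1 between two classes and `beta = 1.0`, the larger class gets a prior of (2 + 1) / (3 + 2).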
The ModErn Text Analysis toolkit is a suite of natural language processing, classification, information retrieval, data mining, and other applications of text processing.
Definition: analyzer.h:24
meta::classify::classifier
A classifier uses a document's feature space to identify which group it belongs to.
Definition: classifier.h:24
meta::classify::naive_bayes
Implements the Naive Bayes classifier, a simplistic probabilistic classifier that uses Bayes' theorem...
Definition: naive_bayes.h:29
class_label classify(doc_id d_id) override
Classifies a document into a specific group, as determined by training data.
Definition: naive_bayes.cpp:54
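The usual Naive Bayes decision rule picks the class that maximizes log P(class) plus the sum, over the document's terms, of count times log P(term|class). The sketch below shows that rule in isolation. It is an illustration under assumptions, not the body of `naive_bayes::classify`: `class_model` and `classify_sketch` are hypothetical, the model is flattened into one struct per class, and probabilities are assumed to be already smoothed.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical flattened model: one prior and one term distribution per class.
struct class_model
{
    std::string label;
    double prior;                               // P(class)
    std::unordered_map<int, double> term_prob;  // P(term | class), smoothed
    double unseen_prob;                         // smoothed P for unseen terms
};

// doc holds (term id, count) pairs. Returns the label with the highest
// log score; working in log space avoids underflow on long documents.
std::string classify_sketch(const std::vector<class_model>& models,
                            const std::vector<std::pair<int, double>>& doc)
{
    assert(!models.empty());
    std::string best_label;
    double best_score = -std::numeric_limits<double>::infinity();
    for (const auto& m : models)
    {
        double score = std::log(m.prior);
        for (const auto& tc : doc)
        {
            auto it = m.term_prob.find(tc.first);
            double p = (it == m.term_prob.end()) ? m.unseen_prob : it->second;
            score += tc.second * std::log(p);
        }
        if (score > best_score)
        {
            best_score = score;
            best_label = m.label;
        }
    }
    return best_label;
}
```

Because every class starts from its log prior, a document with no terms is assigned to the class with the largest prior.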
stats::multinomial< class_label > class_probs_
Contains the number of documents in each class.
Definition: naive_bayes.h:82
void reset() override
Resets any learning information associated with this classifier.
Definition: naive_bayes.cpp:32