ModErn Text Analysis
META Enumerates Textual Applications
Implements the Naive Bayes classifier, a simple probabilistic classifier that uses Bayes' theorem with strong feature independence assumptions.
#include <naive_bayes.h>
Public Member Functions | |
naive_bayes (std::shared_ptr< index::forward_index > idx, double alpha=default_alpha, double beta=default_beta) | |
Constructor: learns class models based on a collection of training documents. | |
void | train (const std::vector< doc_id > &docs) override |
Creates a classification model based on training documents. | |
class_label | classify (doc_id d_id) override |
Classifies a document into a specific group, as determined by training data. | |
void | reset () override |
Resets any learning information associated with this classifier. | |
Public Member Functions inherited from meta::classify::classifier | |
classifier (std::shared_ptr< index::forward_index > idx) | |
virtual confusion_matrix | test (const std::vector< doc_id > &docs) |
Classifies a collection of documents into specific groups, as determined by training data; this function makes repeated calls to classify(). | |
virtual confusion_matrix | cross_validate (const std::vector< doc_id > &input_docs, size_t k, bool even_split=false, int seed=1) |
Performs k-fold cross-validation on a set of documents. | |
Static Public Attributes | |
static const constexpr double | default_alpha = 0.1 |
The default \(\alpha\) parameter. | |
static const constexpr double | default_beta = 0.1 |
The default \(\beta\) parameter. | |
static const std::string | id = "naive-bayes" |
The identifier for this classifier. | |
Private Attributes | |
util::sparse_vector< class_label, stats::multinomial< term_id > > | term_probs_ |
Contains P(term|class) for each class. | |
stats::multinomial< class_label > | class_probs_ |
Contains P(class) for each class, estimated from the number of documents in each class. | |
Additional Inherited Members | |
Protected Attributes inherited from meta::classify::classifier | |
std::shared_ptr< index::forward_index > | idx_ |
The index that the classifier is run on. | |
meta::classify::naive_bayes::naive_bayes ( std::shared_ptr< index::forward_index > idx, double alpha = default_alpha, double beta = default_beta )
Constructor: learns class models based on a collection of training documents.
idx | The index to run the classifier on |
alpha | Optional smoothing parameter for term frequencies |
beta | Optional smoothing parameter for class frequencies |
void meta::classify::naive_bayes::train ( const std::vector< doc_id > & docs ) override virtual
Creates a classification model based on training documents.
Calculates \(P(term|class)\) and \(P(class)\) for all the training documents.
docs | The training documents |
Implements meta::classify::classifier.
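The estimation step can be illustrated with a minimal, self-contained sketch. It assumes additive smoothing, with alpha added to each term count and beta added to each class count; this page only states that alpha and beta are smoothing parameters, so the exact formula used by the library may differ. The names `labeled_doc`, `nb_model`, and `train_sketch` are hypothetical and are not part of the MeTA API.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not MeTA types.
using label = std::string;
using term = std::size_t;

// A training document: its class label plus (term, count) pairs.
struct labeled_doc {
    label cls;
    std::vector<std::pair<term, double>> counts;
};

struct nb_model {
    std::map<label, double> class_prob;                  // smoothed P(class)
    std::map<label, std::map<term, double>> term_count;  // raw term counts per class
    std::map<label, double> total_terms;                 // total term count per class
    double alpha = 0.1;                                  // term smoothing (assumed additive)
    std::size_t vocab_size = 0;                          // number of distinct terms

    // Smoothed P(term | class); terms unseen in a class still get nonzero mass.
    double term_prob(const label& c, term t) const {
        double count = 0.0;
        double total = 0.0;
        auto cls_it = term_count.find(c);
        if (cls_it != term_count.end()) {
            auto t_it = cls_it->second.find(t);
            if (t_it != cls_it->second.end())
                count = t_it->second;
            total = total_terms.at(c);
        }
        return (count + alpha) / (total + alpha * vocab_size);
    }
};

// Accumulates per-class counts, then converts class counts to smoothed priors.
nb_model train_sketch(const std::vector<labeled_doc>& docs,
                      std::size_t vocab_size,
                      double alpha = 0.1, double beta = 0.1)
{
    nb_model model;
    model.alpha = alpha;
    model.vocab_size = vocab_size;

    std::map<label, double> docs_per_class;
    for (const auto& doc : docs) {
        docs_per_class[doc.cls] += 1.0;
        auto& counts = model.term_count[doc.cls];
        double& total = model.total_terms[doc.cls];
        for (const auto& tc : doc.counts) {
            counts[tc.first] += tc.second;
            total += tc.second;
        }
    }

    // P(class) = (docs in class + beta) / (all docs + beta * number of classes)
    const double denom = docs.size() + beta * docs_per_class.size();
    for (const auto& kv : docs_per_class)
        model.class_prob[kv.first] = (kv.second + beta) / denom;
    return model;
}
```

With this formulation, the smoothed term probabilities for a class sum to one over the vocabulary, and larger values of alpha or beta pull the estimates toward a uniform distribution.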
class_label meta::classify::naive_bayes::classify ( doc_id d_id ) override virtual
Classifies a document into a specific group, as determined by training data.
d_id | The document to classify |
Implements meta::classify::classifier.
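As a rough illustration of the decision rule a Naive Bayes classifier applies, the sketch below picks the class that maximizes log P(class) plus the count-weighted sum of log P(term|class). It assumes the probabilities are already smoothed so that none is zero, and it works in log space to avoid underflow on long documents. `class_model` and `classify_sketch` are hypothetical names, not MeTA functions, and the library's own implementation may differ in detail.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not MeTA types.
using label = std::string;
using term = std::size_t;

struct class_model {
    double prior;                   // P(class), assumed > 0
    std::vector<double> term_prob;  // smoothed P(term | class), indexed by term id
};

// Returns the label with the highest log-posterior score for the document,
// where the document is given as (term, count) pairs.
label classify_sketch(const std::map<label, class_model>& models,
                      const std::vector<std::pair<term, double>>& doc)
{
    label best;
    double best_score = -std::numeric_limits<double>::infinity();
    for (const auto& kv : models) {
        const class_model& model = kv.second;
        double score = std::log(model.prior);
        for (const auto& tc : doc)
            score += tc.second * std::log(model.term_prob.at(tc.first));
        if (score > best_score) {
            best_score = score;
            best = kv.first;
        }
    }
    return best;
}
```

The independence assumption is what lets the per-term log probabilities simply be added: each term contributes to the score without regard to which other terms appear in the document.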