ModErn Text Analysis
META Enumerates Textual Applications
A classifier uses a document's feature space to identify which group it belongs to.
#include <classifier.h>
Public Member Functions
classifier (std::shared_ptr< index::forward_index > idx)
virtual class_label classify (doc_id d_id)=0
Classifies a document into a specific group, as determined by training data.
virtual void train (const std::vector< doc_id > &docs)=0
Creates a classification model based on training documents.
virtual confusion_matrix test (const std::vector< doc_id > &docs)
Classifies a collection of documents into specific groups, as determined by training data; this function makes repeated calls to classify().
virtual confusion_matrix cross_validate (const std::vector< doc_id > &input_docs, size_t k, bool even_split=false, int seed=1)
Performs k-fold cross-validation on a set of documents.
virtual void reset ()=0
Clears any learning data associated with this classifier.
Protected Attributes
std::shared_ptr< index::forward_index > idx_
The index that the classifier is run on.
Private Member Functions
void create_even_split (std::vector< doc_id > &docs, int seed=2) const
Modifies input_docs to be a vector of size <= the original vector size with an even distribution of class labels per document.
meta::classify::classifier::classifier (std::shared_ptr< index::forward_index > idx)
idx | The index to run the classifier on
virtual class_label meta::classify::classifier::classify (doc_id d_id) [pure virtual]
Classifies a document into a specific group, as determined by training data.
d_id | The document to classify
Implemented in meta::classify::dual_perceptron, meta::classify::logistic_regression, meta::classify::winnow, meta::classify::svm_wrapper, meta::classify::naive_bayes, meta::classify::knn, meta::classify::one_vs_one, meta::classify::nearest_centroid, meta::classify::one_vs_all, and meta::classify::binary_classifier.
virtual void meta::classify::classifier::train (const std::vector< doc_id > &docs) [pure virtual]
Creates a classification model based on training documents.
docs | The training documents
Implemented in meta::classify::dual_perceptron, meta::classify::logistic_regression, meta::classify::svm_wrapper, meta::classify::sgd, meta::classify::winnow, meta::classify::one_vs_one, meta::classify::naive_bayes, meta::classify::knn, meta::classify::nearest_centroid, and meta::classify::one_vs_all.
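As an illustration of the interface described above, the following is a minimal, self-contained sketch of how a concrete classifier might implement the three pure virtual members classify(), train(), and reset(). It does not use the library's headers: doc_id, class_label, label_map, classifier_sketch, and majority_classifier are stand-ins assumed for this example only, and label_map is a hypothetical replacement for index::forward_index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-ins for the library's types; assumptions for this sketch only.
using doc_id = std::uint64_t;
using class_label = std::string;
// Hypothetical replacement for index::forward_index: document -> true label.
using label_map = std::unordered_map<doc_id, class_label>;

// Mirrors the three pure virtual members listed in this reference.
class classifier_sketch
{
  public:
    virtual ~classifier_sketch() = default;
    virtual class_label classify(doc_id d_id) = 0;
    virtual void train(const std::vector<doc_id>& docs) = 0;
    virtual void reset() = 0;
};

// Toy model: always predicts the most frequent label seen during training.
class majority_classifier : public classifier_sketch
{
  public:
    explicit majority_classifier(label_map labels)
        : labels_(std::move(labels))
    {
    }

    void train(const std::vector<doc_id>& docs) override
    {
        assert(!docs.empty());
        // Count how often each label occurs among the training documents.
        std::unordered_map<class_label, std::size_t> counts;
        for (const auto& d : docs)
            ++counts[labels_.at(d)];
        // Remember the label with the highest count.
        std::size_t best = 0;
        for (const auto& p : counts)
        {
            if (p.second > best)
            {
                best = p.second;
                majority_ = p.first;
            }
        }
    }

    // The document is ignored; the learned majority label is returned.
    class_label classify(doc_id) override { return majority_; }

    // Discards the learned label so the model can be retrained.
    void reset() override { majority_.clear(); }

  private:
    label_map labels_;
    class_label majority_;
};
```

The toy model ignores document features entirely; it is only meant to show where learned state is created (train), consulted (classify), and discarded (reset).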
virtual confusion_matrix meta::classify::classifier::test (const std::vector< doc_id > &docs) [virtual]
Classifies a collection of documents into specific groups, as determined by training data; this function makes repeated calls to classify().
docs | The documents to classify
Reimplemented in meta::classify::svm_wrapper.
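The "repeated calls to classify()" behavior can be sketched as a simple loop that tallies predicted labels against actual labels. This is a sketch under stated assumptions, not the library's implementation: test_sketch and confusion_counts are hypothetical names, and the confusion matrix is modeled as a plain map from (predicted, actual) pairs to counts instead of the library's confusion_matrix type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-ins for the library's types; assumptions for this sketch only.
using doc_id = std::uint64_t;
using class_label = std::string;
// Hypothetical stand-in for confusion_matrix: (predicted, actual) -> count.
using confusion_counts
    = std::map<std::pair<class_label, class_label>, std::size_t>;

// Calls classify once per document and records each outcome.
confusion_counts
test_sketch(const std::vector<doc_id>& docs,
            const std::function<class_label(doc_id)>& classify,
            const std::function<class_label(doc_id)>& actual_label)
{
    assert(classify && actual_label);
    confusion_counts matrix;
    for (const auto& d : docs)
        ++matrix[std::make_pair(classify(d), actual_label(d))];
    return matrix;
}
```

Entries whose predicted and actual labels match are correct classifications; all other entries are errors.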
virtual confusion_matrix meta::classify::classifier::cross_validate (const std::vector< doc_id > &input_docs, size_t k, bool even_split=false, int seed=1) [virtual]
Performs k-fold cross-validation on a set of documents.
When using this function, it is not necessary to call train() or test() first.
input_docs | Testing documents
k | The number of folds
even_split | Whether to evenly split the data by class for a fair baseline
seed | The seed for the RNG used to shuffle the documents
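For readers unfamiliar with k-fold cross-validation, the sketch below shows one common way to partition documents: shuffle them with a seeded RNG, then let each of the k folds serve once as the test set while the remaining documents form the training set. The names fold and make_folds, the choice of std::mt19937, and the round-robin assignment are assumptions made for this example; the source does not state how the library builds its folds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Stand-in for the library's document identifier; an assumption.
using doc_id = std::uint64_t;

// One train/test partition of the input documents.
struct fold
{
    std::vector<doc_id> train;
    std::vector<doc_id> test;
};

// Shuffles docs with the given seed and deals them round-robin into k folds.
std::vector<fold> make_folds(std::vector<doc_id> docs, std::size_t k,
                             int seed = 1)
{
    assert(k > 1 && k <= docs.size());
    std::mt19937 gen(static_cast<std::mt19937::result_type>(seed));
    std::shuffle(docs.begin(), docs.end(), gen);

    std::vector<fold> folds(k);
    for (std::size_t i = 0; i < docs.size(); ++i)
    {
        for (std::size_t f = 0; f < k; ++f)
        {
            // Document i is tested in fold (i % k) and trained on elsewhere.
            if (i % k == f)
                folds[f].test.push_back(docs[i]);
            else
                folds[f].train.push_back(docs[i]);
        }
    }
    return folds;
}
```

A caller would then, for each fold, reset the classifier, train on fold.train, test on fold.test, and combine the per-fold results. Every document is tested exactly once across the k folds.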
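void meta::classify::classifier::create_even_split (std::vector< doc_id > &docs, int seed=2) const [private]
Modifies input_docs to be a vector of size <= the original vector size with an even distribution of class labels per document.
input_docs | The documents to modify
seed | The seed for the RNG used to shuffle the documents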
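One plausible way to obtain an even distribution of class labels is to downsample every class to the size of the smallest one. The sketch below illustrates that idea only; even_split_sketch and label_map are hypothetical names, the labels are supplied through a plain map instead of the forward index, and the source does not confirm that the library uses this exact strategy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <random>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-ins for the library's types; assumptions for this sketch only.
using doc_id = std::uint64_t;
using class_label = std::string;
// Hypothetical replacement for index::forward_index: document -> true label.
using label_map = std::unordered_map<doc_id, class_label>;

// Shrinks docs in place so that every label occurs equally often.
void even_split_sketch(std::vector<doc_id>& docs, const label_map& labels,
                       int seed = 2)
{
    assert(!docs.empty());
    // Shuffle first so the documents kept per class are a random sample.
    std::mt19937 gen(static_cast<std::mt19937::result_type>(seed));
    std::shuffle(docs.begin(), docs.end(), gen);

    // Group documents by label.
    std::map<class_label, std::vector<doc_id>> by_label;
    for (const auto& d : docs)
        by_label[labels.at(d)].push_back(d);

    // The smallest class determines how many documents each class keeps.
    std::size_t smallest = docs.size();
    for (const auto& p : by_label)
        smallest = std::min(smallest, p.second.size());

    // Rebuild docs with exactly `smallest` documents per label.
    docs.clear();
    for (const auto& p : by_label)
        docs.insert(docs.end(), p.second.begin(),
                    p.second.begin() + static_cast<std::ptrdiff_t>(smallest));
}
```

The result is never larger than the input, which matches the "size <= the original vector size" guarantee stated above.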