ModErn Text Analysis
META Enumerates Textual Applications
meta::classify::classifier Class Reference (abstract)

A classifier uses a document's feature space to identify which group it belongs to.

#include <classifier.h>

Inherited by meta::classify::binary_classifier, meta::classify::dual_perceptron, meta::classify::knn, meta::classify::logistic_regression, meta::classify::naive_bayes, meta::classify::nearest_centroid, meta::classify::one_vs_all, meta::classify::one_vs_one, meta::classify::svm_wrapper, and meta::classify::winnow.

Public Member Functions

 classifier (std::shared_ptr< index::forward_index > idx)
 
virtual class_label classify (doc_id d_id)=0
 Classifies a document into a specific group, as determined by training data.
 
virtual void train (const std::vector< doc_id > &docs)=0
 Creates a classification model based on training documents.
 
virtual confusion_matrix test (const std::vector< doc_id > &docs)
 Classifies a collection of documents into specific groups, as determined by training data; this function makes repeated calls to classify().
 
virtual confusion_matrix cross_validate (const std::vector< doc_id > &input_docs, size_t k, bool even_split=false, int seed=1)
 Performs k-fold cross-validation on a set of documents.
 
virtual void reset ()=0
 Clears any learning data associated with this classifier.
 

Protected Attributes

std::shared_ptr< index::forward_index > idx_
 The index that the classifier is run on.
 

Private Member Functions

void create_even_split (std::vector< doc_id > &docs, int seed=2) const
 Modifies docs to be a vector no larger than the original, with an even distribution of class labels across the documents.
 

Detailed Description

A classifier uses a document's feature space to identify which group it belongs to.
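
To make the shape of this interface concrete, here is a minimal sketch of a subclass. It does not use MeTA itself: doc_id and class_label are stand-in aliases, toy_classifier is a hypothetical, cut-down mirror of the interface (no forward_index, no test() or cross_validate()), and majority_classifier is an invented example rather than one of the toolkit's classifiers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-ins for MeTA's types (assumption: the real ones differ in detail).
using doc_id = std::uint64_t;
using class_label = std::string;

// Hypothetical, simplified mirror of the abstract interface.
class toy_classifier
{
  public:
    virtual ~toy_classifier() = default;
    virtual class_label classify(doc_id d_id) = 0;
    virtual void train(const std::vector<doc_id>& docs) = 0;
    virtual void reset() = 0;
};

// Hypothetical subclass: always predicts the most frequent training label.
class majority_classifier : public toy_classifier
{
  public:
    explicit majority_classifier(std::map<doc_id, class_label> labels)
        : labels_(std::move(labels))
    {
    }

    void train(const std::vector<doc_id>& docs) override
    {
        assert(!docs.empty()); // an empty training set is a caller error here
        std::map<class_label, std::size_t> counts;
        for (const auto& d : docs)
            ++counts[labels_.at(d)];
        std::size_t best = 0;
        for (const auto& p : counts)
        {
            if (p.second > best)
            {
                best = p.second;
                majority_ = p.first;
            }
        }
    }

    class_label classify(doc_id) override
    {
        return majority_; // the document itself is ignored
    }

    void reset() override
    {
        majority_.clear(); // forget the learned label
    }

  private:
    std::map<doc_id, class_label> labels_; // gold labels keyed by document
    class_label majority_;                 // the entire "model"
};
```

In this sketch the gold labels are passed in as a map; the real class instead reads documents through the forward_index given to its constructor.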

Constructor & Destructor Documentation

meta::classify::classifier::classifier ( std::shared_ptr< index::forward_index >  idx)
Parameters
idx  The index to run the classifier on

Member Function Documentation

virtual class_label meta::classify::classifier::classify ( doc_id  d_id)
pure virtual

Classifies a document into a specific group, as determined by training data.

Parameters
d_id  The document to classify
Returns
the class it belongs to

Implemented in meta::classify::dual_perceptron, meta::classify::logistic_regression, meta::classify::winnow, meta::classify::svm_wrapper, meta::classify::naive_bayes, meta::classify::knn, meta::classify::one_vs_one, meta::classify::nearest_centroid, meta::classify::one_vs_all, and meta::classify::binary_classifier.

virtual void meta::classify::classifier::train ( const std::vector< doc_id > &  docs)
pure virtual

Creates a classification model based on training documents.

Parameters
docs  The training documents

confusion_matrix meta::classify::classifier::test ( const std::vector< doc_id > &  docs)
virtual

Classifies a collection of documents into specific groups, as determined by training data; this function makes repeated calls to classify().

Parameters
docs  The documents to classify
Returns
a confusion_matrix detailing the performance of the classifier

Reimplemented in meta::classify::svm_wrapper.
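
The "repeated calls to classify()" behaviour can be sketched as a loop that tallies predicted against actual labels. This is an illustration only: toy_confusion, run_test, and accuracy are hypothetical names, the two callbacks stand in for classify() and for a label lookup on the index, and the real confusion_matrix class is not modelled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using doc_id = std::uint64_t;   // stand-in alias
using class_label = std::string; // stand-in alias

// Hypothetical stand-in for confusion_matrix:
// maps (predicted, actual) to the number of documents in that cell.
using toy_confusion
    = std::map<std::pair<class_label, class_label>, std::size_t>;

// Classifies every document once and records the outcome.
toy_confusion run_test(const std::vector<doc_id>& docs,
                       const std::function<class_label(doc_id)>& classify,
                       const std::function<class_label(doc_id)>& actual_label)
{
    toy_confusion matrix;
    for (const auto& d : docs)
        ++matrix[std::make_pair(classify(d), actual_label(d))];
    return matrix;
}

// Fraction of documents whose predicted label equals the actual label.
double accuracy(const toy_confusion& matrix)
{
    std::size_t correct = 0;
    std::size_t total = 0;
    for (const auto& cell : matrix)
    {
        total += cell.second;
        if (cell.first.first == cell.first.second)
            correct += cell.second;
    }
    assert(total > 0); // accuracy is undefined for an empty matrix
    return static_cast<double>(correct) / static_cast<double>(total);
}
```

Because the loop only depends on classify(), a subclass gets this behaviour without extra work; svm_wrapper is listed as reimplementing test(), presumably because it evaluates documents differently.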

confusion_matrix meta::classify::classifier::cross_validate ( const std::vector< doc_id > &  input_docs,
size_t  k,
bool  even_split = false,
int  seed = 1 
)
virtual

Performs k-fold cross-validation on a set of documents.

When using this function, it is not necessary to call train() or test() first.

Parameters
input_docs  Testing documents
k  The number of folds
even_split  Whether to evenly split the data by class for a fair baseline
seed  The seed for the RNG used to shuffle the documents
Returns
a confusion_matrix containing the results over all the folds
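
As an illustration of k-fold splitting in general, the sketch below shuffles the documents with a seeded RNG and deals them into k folds, each with a held-out test set and a training set made of the rest. The names fold and make_folds are hypothetical, and the partitioning scheme is an assumption; the toolkit's own cross_validate() may divide the documents differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

using doc_id = std::uint64_t; // stand-in alias

// One fold: documents to train on and held-out documents to test on.
struct fold
{
    std::vector<doc_id> train_docs;
    std::vector<doc_id> test_docs;
};

// Shuffles a copy of docs, then assigns document i to the test set of
// fold (i % k) and to the training set of every other fold.
std::vector<fold> make_folds(std::vector<doc_id> docs, std::size_t k,
                             int seed = 1)
{
    assert(k > 0 && k <= docs.size());
    std::mt19937 gen(static_cast<std::mt19937::result_type>(seed));
    std::shuffle(docs.begin(), docs.end(), gen);

    std::vector<fold> folds(k);
    for (std::size_t i = 0; i < docs.size(); ++i)
    {
        for (std::size_t f = 0; f < k; ++f)
        {
            auto& bucket
                = (i % k == f) ? folds[f].test_docs : folds[f].train_docs;
            bucket.push_back(docs[i]);
        }
    }
    return folds;
}
```

A caller would then, for each fold, reset the classifier, train on train_docs, test on test_docs, and combine the per-fold results into one matrix. Every document is held out exactly once, which is why no separate train() or test() call is needed beforehand.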

void meta::classify::classifier::create_even_split ( std::vector< doc_id > &  docs,
int  seed = 2 
) const
private

Modifies docs to be a vector no larger than the original, with an even distribution of class labels across the documents.

Parameters
docs  The documents to rebalance; modified in place
seed  The seed for the RNG used to shuffle the documents
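
One plausible way to produce such an even split is to shuffle the documents, find the size of the rarest class, and keep at most that many documents per class. The sketch below follows that idea; even_split and the label_of callback are hypothetical, and the actual private implementation may work differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <random>
#include <string>
#include <utility>
#include <vector>

using doc_id = std::uint64_t;    // stand-in alias
using class_label = std::string; // stand-in alias

// Shrinks docs in place so that every class label occurs equally often.
// label_of is a hypothetical lookup from document to its gold label.
void even_split(std::vector<doc_id>& docs,
                const std::function<class_label(doc_id)>& label_of,
                int seed = 2)
{
    assert(label_of && "a label lookup is required");
    std::mt19937 gen(static_cast<std::mt19937::result_type>(seed));
    std::shuffle(docs.begin(), docs.end(), gen);

    // The rarest class sets the quota for every class.
    std::map<class_label, std::size_t> counts;
    for (const auto& d : docs)
        ++counts[label_of(d)];
    std::size_t quota = docs.size();
    for (const auto& p : counts)
        quota = std::min(quota, p.second);

    // Keep documents until each class has reached the quota.
    std::map<class_label, std::size_t> kept;
    std::vector<doc_id> balanced;
    for (const auto& d : docs)
    {
        if (kept[label_of(d)]++ < quota)
            balanced.push_back(d);
    }
    docs = std::move(balanced);
}
```

With five documents of one class and two of another, this keeps two of each, so the result is never larger than the input. A majority-class guesser then scores no better than chance on it, which matches the "fair baseline" purpose given for the even_split parameter.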
