ModErn Text Analysis
META Enumerates Textual Applications
classifier.h
1 
9 #ifndef META_CLASSIFIER_H_
10 #define META_CLASSIFIER_H_
11 
12 #include <vector>
14 
15 namespace meta
16 {
17 namespace classify
18 {
19 
24 class classifier
25 {
26  public:
30  classifier(std::shared_ptr<index::forward_index> idx);
31 
38  virtual class_label classify(doc_id d_id) = 0;
39 
44  virtual void train(const std::vector<doc_id>& docs) = 0;
45 
54  virtual confusion_matrix test(const std::vector<doc_id>& docs);
55 
66  virtual confusion_matrix
67  cross_validate(const std::vector<doc_id>& input_docs, size_t k,
68  bool even_split = false, int seed = 1);
69 
73  virtual void reset() = 0;
74 
75  protected:
77  std::shared_ptr<index::forward_index> idx_;
78 
79  private:
86  void create_even_split(std::vector<doc_id>& docs, int seed = 2) const;
87 };
88 }
89 }
90 #endif
virtual void train(const std::vector< doc_id > &docs)=0
Creates a classification model based on training documents.
virtual void reset()=0
Clears any learning data associated with this classifier.
void create_even_split(std::vector< doc_id > &docs, int seed=2) const
Modifies input_docs to be a vector of size <= the original vector size with an even distribution of c...
Definition: classifier.cpp:59
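The description above is truncated, so the following is only a sketch of one plausible reading: trimming a document list so that every class label is represented equally. It is not the code in classifier.cpp. `doc_id` and `class_label` are plain stand-in aliases, and `even_split` and its `label_of` lookup table are hypothetical names introduced for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <random>
#include <string>
#include <vector>

// Stand-ins for the toolkit's types (assumption).
using doc_id = std::uint64_t;
using class_label = std::string;

// Shrinks docs so each label keeps as many documents as the rarest label.
// label_of is a hypothetical lookup from document id to its true label.
void even_split(std::vector<doc_id>& docs,
                const std::map<doc_id, class_label>& label_of, int seed = 2)
{
    const std::size_t original_size = docs.size();

    // Shuffle first so the documents kept for each label are a random sample.
    std::mt19937 gen(static_cast<std::mt19937::result_type>(seed));
    std::shuffle(docs.begin(), docs.end(), gen);

    // Bucket the documents by label.
    std::map<class_label, std::vector<doc_id>> buckets;
    for (doc_id d : docs)
        buckets[label_of.at(d)].push_back(d);

    // Find the size of the smallest bucket.
    std::size_t smallest = docs.size();
    for (const auto& bucket : buckets)
        smallest = std::min(smallest, bucket.second.size());

    // Rebuild docs with exactly `smallest` documents per label.
    docs.clear();
    for (const auto& bucket : buckets)
        docs.insert(docs.end(), bucket.second.begin(),
                    bucket.second.begin()
                        + static_cast<std::ptrdiff_t>(smallest));

    assert(docs.size() <= original_size);
}
```

With three documents labeled "a", two labeled "b", and one labeled "c", this sketch leaves one document per label.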
Allows interpretation of classification errors.
Definition: confusion_matrix.h:25
virtual class_label classify(doc_id d_id)=0
Classifies a document into a specific group, as determined by training data.
std::shared_ptr< index::forward_index > idx_
The index that the classifier is run on.
Definition: classifier.h:77
The ModErn Text Analysis toolkit is a suite of natural language processing, classification, information retrieval, data mining, and other applications of text processing.
Definition: analyzer.h:24
A classifier uses a document's feature space to identify which group it belongs to.
Definition: classifier.h:24
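To illustrate how a concrete classifier might fill in the three pure virtual members (`classify`, `train`, `reset`), here is a minimal sketch. It deliberately does not use the real base class: `basic_classifier` is a simplified mirror without the forward index or the evaluation helpers, `doc_id` and `class_label` are stand-in aliases, and `majority_classifier` is a hypothetical baseline that ignores features entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-ins for the toolkit's types (assumption).
using doc_id = std::uint64_t;
using class_label = std::string;

// Simplified mirror of the pure virtual interface in the listing (assumption).
class basic_classifier
{
  public:
    virtual ~basic_classifier() = default;
    virtual class_label classify(doc_id d_id) = 0;
    virtual void train(const std::vector<doc_id>& docs) = 0;
    virtual void reset() = 0;
};

// Hypothetical baseline: always predicts the most frequent training label.
class majority_classifier : public basic_classifier
{
  public:
    // labels maps each document id to its true label (stand-in for an index).
    explicit majority_classifier(std::map<doc_id, class_label> labels)
        : labels_(std::move(labels))
    {
    }

    class_label classify(doc_id) override
    {
        assert(!majority_.empty()); // train() must have been called first
        return majority_;
    }

    void train(const std::vector<doc_id>& docs) override
    {
        std::map<class_label, std::size_t> counts;
        std::size_t best = 0;
        for (doc_id d : docs)
        {
            const class_label& label = labels_.at(d);
            if (++counts[label] > best)
            {
                best = counts[label];
                majority_ = label;
            }
        }
    }

    // Forgets the learned label so the object can be trained again.
    void reset() override
    {
        majority_.clear();
    }

  private:
    std::map<doc_id, class_label> labels_;
    class_label majority_;
};
```

A real subclass would read features from the forward index held in `idx_` rather than from a label map passed to the constructor.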
virtual confusion_matrix test(const std::vector< doc_id > &docs)
Classifies a collection of documents into specific groups, as determined by training data; this function ...
Definition: classifier.cpp:22
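The general idea of testing a classifier and summarizing the results can be sketched as follows: classify each document, then tally the (predicted, actual) label pairs. This is an illustration under stated assumptions, not the body of `test()`. `simple_confusion` is a hypothetical stand-in for `confusion_matrix`, and `evaluate` takes the classifier and the true-label lookup as callbacks so the example stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-ins for the toolkit's types (assumption).
using doc_id = std::uint64_t;
using class_label = std::string;

// Hypothetical minimal confusion matrix: counts of (predicted, actual) pairs.
struct simple_confusion
{
    std::map<std::pair<class_label, class_label>, std::size_t> counts;
    std::size_t total = 0;

    void add(const class_label& predicted, const class_label& actual)
    {
        ++counts[std::make_pair(predicted, actual)];
        ++total;
    }

    // Fraction of documents whose predicted label matched the actual one.
    double accuracy() const
    {
        assert(total > 0);
        std::size_t correct = 0;
        for (const auto& entry : counts)
            if (entry.first.first == entry.first.second)
                correct += entry.second;
        return static_cast<double>(correct) / static_cast<double>(total);
    }
};

// Classifies every document and records the prediction next to the truth.
simple_confusion evaluate(const std::vector<doc_id>& docs,
                          const std::function<class_label(doc_id)>& classify,
                          const std::function<class_label(doc_id)>& actual_label)
{
    simple_confusion matrix;
    for (doc_id d : docs)
        matrix.add(classify(d), actual_label(d));
    return matrix;
}
```

The off-diagonal entries, where the predicted and actual labels differ, are what allow classification errors to be interpreted class by class.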
classifier(std::shared_ptr< index::forward_index > idx)
Definition: classifier.cpp:16
virtual confusion_matrix cross_validate(const std::vector< doc_id > &input_docs, size_t k, bool even_split=false, int seed=1)
Performs k-fold cross-validation on a set of documents.
Definition: classifier.cpp:32
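As a rough illustration of what k-fold cross-validation involves, the sketch below shuffles a list of document ids with a seed and splits it into k train/test partitions. It is not the implementation in classifier.cpp: `doc_id` is a plain integer alias, and `fold` and `make_folds` are hypothetical names introduced for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Stand-in for the toolkit's document identifier type (assumption).
using doc_id = std::uint64_t;

// One train/test partition (hypothetical helper type).
struct fold
{
    std::vector<doc_id> train;
    std::vector<doc_id> test;
};

// Shuffles docs with the given seed, then builds k folds. Fold i uses the
// i-th contiguous slice as its test set and all other documents for training.
std::vector<fold> make_folds(std::vector<doc_id> docs, std::size_t k,
                             int seed = 1)
{
    assert(k > 0 && k <= docs.size());

    std::mt19937 gen(static_cast<std::mt19937::result_type>(seed));
    std::shuffle(docs.begin(), docs.end(), gen);

    std::vector<fold> folds(k);
    for (std::size_t i = 0; i < k; ++i)
    {
        // Integer arithmetic spreads any remainder across the slices.
        const std::size_t begin = i * docs.size() / k;
        const std::size_t end = (i + 1) * docs.size() / k;
        for (std::size_t j = 0; j < docs.size(); ++j)
        {
            if (j >= begin && j < end)
                folds[i].test.push_back(docs[j]);
            else
                folds[i].train.push_back(docs[j]);
        }
    }
    return folds;
}
```

Given the interface in the listing, a caller would presumably call `reset()`, `train()` on the training part, and `test()` on the held-out part for each fold, then combine the resulting confusion matrices.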