ModErn Text Analysis
META Enumerates Textual Applications
Implements the nearest centroid classification algorithm.
#include <nearest_centroid.h>
Classes | |
| class | nearest_centroid_exception |
| Basic exception for nearest_centroid interactions. | |
Public Member Functions | |
| nearest_centroid (std::shared_ptr< index::inverted_index > idx, std::shared_ptr< index::forward_index > f_idx) | |
| void | train (const std::vector< doc_id > &docs) override |
| Creates a classification model based on training documents. | |
| class_label | classify (doc_id d_id) override |
| Classifies a document into a specific group, as determined by training data. | |
| void | reset () override |
| Resets any learning information associated with this classifier. | |
Public Member Functions inherited from meta::classify::classifier | |
| classifier (std::shared_ptr< index::forward_index > idx) | |
| virtual confusion_matrix | test (const std::vector< doc_id > &docs) |
| Classifies a collection of documents into specific groups, as determined by training data; this function makes repeated calls to classify(). | |
| virtual confusion_matrix | cross_validate (const std::vector< doc_id > &input_docs, size_t k, bool even_split=false, int seed=1) |
| Performs k-fold cross-validation on a set of documents. | |
Static Public Attributes | |
| static const std::string | id = "nearest-centroid" |
| Identifier for this classifier. | |
Private Member Functions | |
| double | cosine_sim (const std::vector< std::pair< term_id, double >> &doc, const std::unordered_map< term_id, double > &centroid) |
Private Attributes | |
| std::shared_ptr< index::inverted_index > | inv_idx_ |
| Inverted index used for ranking. | |
| std::unordered_map< class_label, std::unordered_map< term_id, double > > | centroids_ |
| The document centroids for this learner. | |
Additional Inherited Members | |
Protected Attributes inherited from meta::classify::classifier | |
| std::shared_ptr< index::forward_index > | idx_ |
| The index that the classifier is run on. | |
Implements the nearest centroid classification algorithm.
nearest_centroid creates a prototype document for each distinct class by averaging all documents in that class; this prototype is called the centroid. A query (testing document) is then compared against each centroid, and the class label of the centroid the query is closest to is returned.
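The description above can be illustrated with a small standalone sketch. It is not the library's implementation: the `sparse_doc` and `centroid_map` aliases, the `cosine`, `build_centroids`, and `classify_query` helpers, and the use of plain term weights are all assumptions made for illustration, and the choice of cosine similarity as the closeness measure is inferred from the private `cosine_sim` member. C++17 is assumed for structured bindings.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the library's term_id and class_label types.
using term_id = std::uint64_t;
using class_label = std::string;
using sparse_doc = std::vector<std::pair<term_id, double>>;
using centroid_map = std::unordered_map<term_id, double>;

// Cosine similarity between a sparse document and a centroid.
// Returns 0 when either vector has zero length.
double cosine(const sparse_doc& doc, const centroid_map& centroid)
{
    double dot = 0.0, doc_norm = 0.0, cen_norm = 0.0;
    for (const auto& [term, weight] : doc)
    {
        doc_norm += weight * weight;
        auto it = centroid.find(term);
        if (it != centroid.end())
            dot += weight * it->second;
    }
    for (const auto& [term, weight] : centroid)
        cen_norm += weight * weight;
    if (doc_norm == 0.0 || cen_norm == 0.0)
        return 0.0;
    return dot / (std::sqrt(doc_norm) * std::sqrt(cen_norm));
}

// Builds one centroid per label by averaging the term weights of all
// training documents that carry that label.
std::unordered_map<class_label, centroid_map>
build_centroids(const std::vector<std::pair<class_label, sparse_doc>>& training)
{
    std::unordered_map<class_label, centroid_map> centroids;
    std::unordered_map<class_label, double> counts;
    for (const auto& [label, doc] : training)
    {
        counts[label] += 1.0;
        for (const auto& [term, weight] : doc)
            centroids[label][term] += weight; // accumulate sums first
    }
    for (auto& [label, centroid] : centroids)
        for (auto& [term, weight] : centroid)
            weight /= counts[label]; // turn sums into averages
    return centroids;
}

// Returns the label of the centroid most similar to the query.
class_label classify_query(
    const sparse_doc& query,
    const std::unordered_map<class_label, centroid_map>& centroids)
{
    assert(!centroids.empty());
    class_label best;
    double best_score = -2.0; // below the minimum cosine value of -1
    for (const auto& [label, centroid] : centroids)
    {
        double score = cosine(query, centroid);
        if (score > best_score)
        {
            best_score = score;
            best = label;
        }
    }
    return best;
}
```

In this sketch, training corresponds to calling `build_centroids` once on the labeled documents, and classification corresponds to calling `classify_query` for each test document against the stored centroids.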
| meta::classify::nearest_centroid::nearest_centroid | ( | std::shared_ptr< index::inverted_index > | idx, |
| std::shared_ptr< index::forward_index > | f_idx | ||
| ) |
| idx | The index to run the classifier on |
| void meta::classify::nearest_centroid::train | ( | const std::vector< doc_id > & | docs | ) | override virtual |
Creates a classification model based on training documents.
| docs | The training documents |
Implements meta::classify::classifier.
| class_label meta::classify::nearest_centroid::classify | ( | doc_id | d_id | ) | override virtual |
Classifies a document into a specific group, as determined by training data.
| d_id | The document to classify |
Implements meta::classify::classifier.
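| double meta::classify::nearest_centroid::cosine_sim | ( | const std::vector< std::pair< term_id, double >> & | doc, |
| const std::unordered_map< term_id, double > & | centroid | ||
| ) | private |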
| d_id | |
| centroid |
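The page does not spell out the formula behind cosine_sim. Assuming it follows the standard definition of cosine similarity, with d the sparse document vector, c the centroid, and d_t, c_t their weights for term t (symbols introduced here for illustration only), the computation would be:

```latex
\[
\operatorname{cosine\_sim}(d, c)
  = \frac{\sum_{t \in d} d_t \, c_t}
         {\sqrt{\sum_{t \in d} d_t^{2}} \; \sqrt{\sum_{t \in c} c_t^{2}}}
\]
```

The numerator only needs to visit terms present in the document, since any term missing from either vector contributes zero to the dot product.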