ModErn Text Analysis
META Enumerates Textual Applications
Implements the nearest centroid classification algorithm.
#include <nearest_centroid.h>
Classes | |
class | nearest_centroid_exception |
Basic exception for nearest_centroid interactions. | |
Public Member Functions | |
nearest_centroid (std::shared_ptr< index::inverted_index > idx, std::shared_ptr< index::forward_index > f_idx) | |
void | train (const std::vector< doc_id > &docs) override |
Creates a classification model based on training documents. | |
class_label | classify (doc_id d_id) override |
Classifies a document into a specific group, as determined by training data. | |
void | reset () override |
Resets any learning information associated with this classifier. | |
Public Member Functions inherited from meta::classify::classifier | |
classifier (std::shared_ptr< index::forward_index > idx) | |
virtual confusion_matrix | test (const std::vector< doc_id > &docs) |
Classifies a collection of documents into specific groups, as determined by training data; this function makes repeated calls to classify(). | |
virtual confusion_matrix | cross_validate (const std::vector< doc_id > &input_docs, size_t k, bool even_split=false, int seed=1) |
Performs k-fold cross-validation on a set of documents. | |
Static Public Attributes | |
static const std::string | id = "nearest-centroid" |
Identifier for this classifier. | |
Private Member Functions | |
double | cosine_sim (const std::vector< std::pair< term_id, double >> &doc, const std::unordered_map< term_id, double > &centroid) |
Private Attributes | |
std::shared_ptr< index::inverted_index > | inv_idx_ |
Inverted index used for ranking. | |
std::unordered_map< class_label, std::unordered_map< term_id, double > > | centroids_ |
The document centroids for this learner. | |
Additional Inherited Members | |
Protected Attributes inherited from meta::classify::classifier | |
std::shared_ptr< index::forward_index > | idx_ |
The index that the classifier is run on. | |
Implements the nearest centroid classification algorithm.
nearest_centroid creates a prototype document for each distinct class as the average of all documents in that class; this prototype is called the centroid. A query (testing document) is then compared against each centroid, and the class label of the centroid closest to the query is returned.
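The description above can be illustrated with a minimal, self-contained sketch. It does not use META's indexes or reproduce the library's implementation: the type aliases and the functions `build_centroids` and `classify_query` are hypothetical names introduced for illustration, documents are assumed to be sparse vectors of unique (term, weight) pairs, and closeness is assumed to be measured by cosine similarity, as the private `cosine_sim` member suggests.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for META's term_id and class_label types.
using term_id = std::uint64_t;
using class_label = std::string;
using sparse_doc = std::vector<std::pair<term_id, double>>;
using centroid_map = std::unordered_map<term_id, double>;
using centroid_table = std::unordered_map<class_label, centroid_map>;

// Cosine similarity between a sparse document and a centroid.
// Assumes each term_id appears at most once in doc.
double cosine_sim(const sparse_doc& doc, const centroid_map& centroid)
{
    double dot = 0.0;
    double doc_norm = 0.0;
    double cent_norm = 0.0;
    for (const auto& entry : doc)
    {
        doc_norm += entry.second * entry.second;
        auto it = centroid.find(entry.first);
        if (it != centroid.end())
            dot += entry.second * it->second;
    }
    for (const auto& entry : centroid)
        cent_norm += entry.second * entry.second;
    if (doc_norm == 0.0 || cent_norm == 0.0)
        return 0.0; // an all-zero vector has no direction
    return dot / (std::sqrt(doc_norm) * std::sqrt(cent_norm));
}

// Builds one centroid per label by averaging the term weights of all
// training documents that carry that label.
centroid_table
build_centroids(const std::vector<std::pair<class_label, sparse_doc>>& training)
{
    centroid_table centroids;
    std::unordered_map<class_label, double> counts;
    for (const auto& example : training)
    {
        counts[example.first] += 1.0;
        centroid_map& centroid = centroids[example.first];
        for (const auto& entry : example.second)
            centroid[entry.first] += entry.second;
    }
    for (auto& labeled : centroids)
        for (auto& entry : labeled.second)
            entry.second /= counts[labeled.first];
    return centroids;
}

// Returns the label of the centroid most similar to the query.
class_label classify_query(const sparse_doc& query, const centroid_table& centroids)
{
    assert(!centroids.empty()); // at least one class must have been trained
    class_label best_label;
    double best_score = -2.0; // below the minimum possible cosine of -1
    for (const auto& labeled : centroids)
    {
        const double score = cosine_sim(query, labeled.second);
        if (score > best_score)
        {
            best_score = score;
            best_label = labeled.first;
        }
    }
    return best_label;
}
```

In this sketch, training corresponds to a single call to `build_centroids`, and each classification costs one similarity computation per class rather than one per training document.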
meta::classify::nearest_centroid::nearest_centroid | ( | std::shared_ptr< index::inverted_index > | idx, |
std::shared_ptr< index::forward_index > | f_idx | ||
) |
idx | The index to run the classifier on |
void meta::classify::nearest_centroid::train ( const std::vector< doc_id > & docs ) | override virtual |
Creates a classification model based on training documents.
docs | The training documents |
Implements meta::classify::classifier.
class_label meta::classify::nearest_centroid::classify ( doc_id d_id ) | override virtual |
Classifies a document into a specific group, as determined by training data.
d_id | The document to classify |
Implements meta::classify::classifier.
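double meta::classify::nearest_centroid::cosine_sim ( const std::vector< std::pair< term_id, double >> & doc, const std::unordered_map< term_id, double > & centroid ) | private |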
d_id | |
centroid |
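The page does not describe how cosine_sim computes its result. Assuming it implements the standard cosine similarity that its name suggests (an assumption, not something the source confirms), the score between a document vector $d$ and a centroid $c$, both indexed by term $t$, would be:

```latex
\[
\mathrm{sim}(d, c) =
  \frac{\sum_{t} d_t \, c_t}
       {\sqrt{\sum_{t} d_t^{2}} \, \sqrt{\sum_{t} c_t^{2}}}
\]
```

Here $d_t$ and $c_t$ denote the weight of term $t$ in the document and in the centroid, taken as zero when the term is absent. With non-negative weights the score lies between 0 and 1, and a larger value means the document is closer to the centroid.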