ModErn Text Analysis
META Enumerates Textual Applications
meta::classify::nearest_centroid Class Reference

Implements the nearest centroid classification algorithm.

#include <nearest_centroid.h>

Inheritance diagram for meta::classify::nearest_centroid:
meta::classify::classifier

Classes

class  nearest_centroid_exception
 Basic exception for nearest_centroid interactions.
 

Public Member Functions

 nearest_centroid (std::shared_ptr< index::inverted_index > idx, std::shared_ptr< index::forward_index > f_idx)
 
void train (const std::vector< doc_id > &docs) override
 Creates a classification model based on training documents.
 
class_label classify (doc_id d_id) override
 Classifies a document into a specific group, as determined by training data.
 
void reset () override
 Resets any learning information associated with this classifier.
 
- Public Member Functions inherited from meta::classify::classifier
 classifier (std::shared_ptr< index::forward_index > idx)
 
virtual confusion_matrix test (const std::vector< doc_id > &docs)
 Classifies a collection of documents into specific groups, as determined by training data; this function makes repeated calls to classify().
 
virtual confusion_matrix cross_validate (const std::vector< doc_id > &input_docs, size_t k, bool even_split=false, int seed=1)
 Performs k-fold cross-validation on a set of documents.
 

Static Public Attributes

static const std::string id = "nearest-centroid"
 Identifier for this classifier.
 

Private Member Functions

double cosine_sim (const std::vector< std::pair< term_id, double >> &doc, const std::unordered_map< term_id, double > &centroid)
 

Private Attributes

std::shared_ptr< index::inverted_index > inv_idx_
 Inverted index used for ranking.
 
std::unordered_map< class_label, std::unordered_map< term_id, double > > centroids_
 The document centroids for this learner.
 

Additional Inherited Members

- Protected Attributes inherited from meta::classify::classifier
std::shared_ptr< index::forward_index > idx_
 The index that the classifier is run on.
 

Detailed Description

Implements the nearest centroid classification algorithm.

nearest_centroid creates a prototype document for each distinct class by averaging all of the documents in that class. This prototype is called the centroid. A query (testing document) is then compared against each centroid, and the class label of the centroid that the query is closest to is returned.
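
The two steps described above can be sketched in standalone C++. This is a minimal illustration under stated assumptions, not the library's implementation: the type aliases, `build_centroids`, and `nearest` are hypothetical names, documents are passed in directly as sparse term-weight vectors rather than read from an index, and the similarity measure is supplied by the caller (the class itself uses cosine similarity, see cosine_sim).

```cpp
#include <cstdint>
#include <limits>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the library's own typedefs.
using term_id = std::uint64_t;
using class_label = std::string;
using sparse_doc = std::vector<std::pair<term_id, double>>;
using centroid_t = std::unordered_map<term_id, double>;
using centroid_map = std::unordered_map<class_label, centroid_t>;

// Training step: sum the term weights of every document in a class,
// then divide by the number of documents in that class.
centroid_map build_centroids(
    const std::vector<std::pair<class_label, sparse_doc>>& training)
{
    centroid_map centroids;
    std::unordered_map<class_label, double> counts;
    for (const auto& example : training)
    {
        auto& centroid = centroids[example.first];
        counts[example.first] += 1.0;
        for (const auto& term : example.second)
            centroid[term.first] += term.second;
    }
    for (auto& entry : centroids)
        for (auto& term : entry.second)
            term.second /= counts[entry.first];
    return centroids;
}

// Classification step: return the label of the centroid with the highest
// similarity to the query. Returns an empty label if there are no centroids.
template <class Similarity>
class_label nearest(const sparse_doc& query, const centroid_map& centroids,
                    Similarity sim)
{
    class_label best;
    double best_score = -std::numeric_limits<double>::infinity();
    for (const auto& entry : centroids)
    {
        const double score = sim(query, entry.second);
        if (score > best_score)
        {
            best_score = score;
            best = entry.first;
        }
    }
    return best;
}
```

Passing the similarity as a callable keeps the sketch independent of any one measure; any function taking a `sparse_doc` and a `centroid_t` and returning a `double` can be plugged in.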

See also
Centroid-Based Document Classification: Analysis and Experimental Results, Eui-Hong Han and George Karypis, 2000

Constructor & Destructor Documentation

meta::classify::nearest_centroid::nearest_centroid ( std::shared_ptr< index::inverted_index > idx,
std::shared_ptr< index::forward_index > f_idx
)
Parameters
idx The index to run the classifier on

Member Function Documentation

void meta::classify::nearest_centroid::train ( const std::vector< doc_id > &  docs)
override virtual

Creates a classification model based on training documents.

Parameters
docs The training documents

Implements meta::classify::classifier.

class_label meta::classify::nearest_centroid::classify ( doc_id  d_id)
override virtual

Classifies a document into a specific group, as determined by training data.

Parameters
d_id The document to classify
Returns
the class it belongs to

Implements meta::classify::classifier.

double meta::classify::nearest_centroid::cosine_sim ( const std::vector< std::pair< term_id, double >> &  doc,
const std::unordered_map< term_id, double > &  centroid 
)
private
Parameters
doc The query document, as (term_id, weight) pairs
centroid The centroid to compare the query against
Returns
the cosine similarity between the query and a centroid
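
For illustration, a cosine similarity over these two container types might look like the sketch below. It is a free function written under assumptions rather than the private member itself: `term_id` is aliased to a plain integer type, each term is assumed to appear at most once in `doc`, and returning 0 for a zero-length vector is a choice made here, not something the reference states.

```cpp
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the library's term_id typedef.
using term_id = std::uint64_t;

// cos(doc, centroid) = (doc . centroid) / (|doc| * |centroid|)
double cosine_sim(const std::vector<std::pair<term_id, double>>& doc,
                  const std::unordered_map<term_id, double>& centroid)
{
    double dot = 0.0;
    double doc_sq = 0.0;
    for (const auto& term : doc)
    {
        doc_sq += term.second * term.second;
        // Only terms present in both vectors contribute to the dot product.
        auto it = centroid.find(term.first);
        if (it != centroid.end())
            dot += term.second * it->second;
    }

    double centroid_sq = 0.0;
    for (const auto& term : centroid)
        centroid_sq += term.second * term.second;

    // Assumption: treat a zero-length vector as having no similarity.
    if (doc_sq == 0.0 || centroid_sq == 0.0)
        return 0.0;

    return dot / (std::sqrt(doc_sq) * std::sqrt(centroid_sq));
}
```

Iterating over the sparse document and probing the centroid's hash map keeps the dot product linear in the number of terms in the query; the centroid norm is recomputed on each call here, and could be cached per centroid if that cost mattered.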

The documentation for this class was generated from the following files: