ModErn Text Analysis
META Enumerates Textual Applications
Implements the Winnow classifier, a simplistic linear classifier for linearly-separable data.
#include <winnow.h>
Public Member Functions | |
winnow (std::shared_ptr< index::forward_index > idx, double m=default_m, double gamma=default_gamma, size_t max_iter=default_max_iter) | |
Constructs a winnow classifier with the given multiplier, error threshold, and maximum iterations. | |
void | train (const std::vector< doc_id > &docs) override |
Trains the winnow on the given training documents. | |
class_label | classify (doc_id d_id) override |
Classifies the given document. | |
void | reset () override |
Resets all learned information for this winnow so it may be re-learned. | |
Public Member Functions inherited from meta::classify::classifier | |
classifier (std::shared_ptr< index::forward_index > idx) | |
virtual confusion_matrix | test (const std::vector< doc_id > &docs) |
Classifies a collection of documents into specific groups, as determined by training data; this function makes repeated calls to classify(). | |
virtual confusion_matrix | cross_validate (const std::vector< doc_id > &input_docs, size_t k, bool even_split=false, int seed=1) |
Performs k-fold cross-validation on a set of documents. | |
Static Public Attributes | |
static const constexpr double | default_m = 1.5 |
The default \(m\) parameter. | |
static const constexpr double | default_gamma = 0.05 |
The default \(\gamma\) parameter. | |
static const constexpr size_t | default_max_iter = 100 |
The default number of allowed iterations. | |
static const std::string | id = "winnow" |
The identifier for this classifier. | |
Private Member Functions | |
double | get_weight (const class_label &label, const term_id &term) const |
void | zero_weights (const std::vector< doc_id > &docs) |
Initializes the weight vectors to zero for every class label. | |
Private Attributes | |
std::unordered_map< class_label, std::unordered_map< term_id, double > > | weights_ |
The weight vectors for each class label. | |
const double | m_ |
\(m\), the multiplicative learning rate. | |
const double | gamma_ |
\(\gamma\), the error threshold. | |
const size_t | max_iter_ |
The maximum number of iterations for training. | |
Additional Inherited Members | |
Protected Attributes inherited from meta::classify::classifier | |
std::shared_ptr< index::forward_index > | idx_ |
The index that the classifier is run on | |
Implements the Winnow classifier, a simplistic linear classifier for linearly-separable data.
As opposed to perceptron (which uses an additive update rule), winnow uses a multiplicative update rule.
meta::classify::winnow::winnow ( std::shared_ptr< index::forward_index > idx, double m = default_m, double gamma = default_gamma, size_t max_iter = default_max_iter )
Constructs a winnow classifier with the given multiplier, error threshold, and maximum iterations.
idx | The index to run the classifier on |
m | \(m\), the multiplicative learning rate |
gamma | \(\gamma\), the error threshold |
max_iter | The maximum number of iterations for training |
void meta::classify::winnow::train ( const std::vector< doc_id > & docs ) | override virtual |
Trains the winnow on the given training documents.
Maintains a set of weight vectors \(w_1,\ldots,w_K\) where \(K\) is the number of classes and updates them for each training document seen in each iteration. This continues until the error threshold is met or the maximum number of iterations is completed.
docs | The training set |
Implements meta::classify::classifier.
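The training procedure described above can be sketched in standalone C++. This is a minimal illustration under stated assumptions, not the library's implementation: the types `label_t`, `term_t`, `features_t`, `weights_t`, and `example`, and the functions `weight_of`, `predict`, and `train_winnow`, are hypothetical names introduced here. The sketch also assumes that unseen weights read as 1.0, that a mistake demotes the guessed class and promotes the true class by the factor \(m\), and that training stops once the per-pass error rate falls below \(\gamma\).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the library's types; not the actual definitions.
using label_t = std::string;
using term_t = std::size_t;
using features_t = std::vector<std::pair<term_t, double>>; // (term, count)
using weights_t
    = std::unordered_map<label_t, std::unordered_map<term_t, double>>;

struct example
{
    features_t features;
    label_t label;
};

// Missing weights are treated as 1.0 (an assumption of this sketch).
double weight_of(const weights_t& w, const label_t& label, term_t term)
{
    auto lit = w.find(label);
    if (lit == w.end())
        return 1.0;
    auto tit = lit->second.find(term);
    return tit == lit->second.end() ? 1.0 : tit->second;
}

// Returns the label whose weight vector scores the example highest.
// Ties go to the label that appears first in `labels`.
label_t predict(const weights_t& w, const std::vector<label_t>& labels,
                const features_t& x)
{
    assert(!labels.empty());
    label_t best = labels.front();
    double best_score = 0.0;
    bool first = true;
    for (const auto& label : labels)
    {
        double score = 0.0;
        for (const auto& f : x)
            score += weight_of(w, label, f.first) * f.second;
        if (first || score > best_score)
        {
            best = label;
            best_score = score;
            first = false;
        }
    }
    return best;
}

// Runs at most max_iter passes over the data; returns the passes made.
std::size_t train_winnow(weights_t& w, const std::vector<label_t>& labels,
                         const std::vector<example>& data, double m,
                         double gamma, std::size_t max_iter)
{
    std::size_t iter = 0;
    while (iter < max_iter)
    {
        ++iter;
        std::size_t errors = 0;
        for (const auto& ex : data)
        {
            label_t guess = predict(w, labels, ex.features);
            if (guess == ex.label)
                continue;
            ++errors;
            for (const auto& f : ex.features)
            {
                // Demote the wrongly chosen class, promote the true one.
                w[guess][f.first] = weight_of(w, guess, f.first) / m;
                w[ex.label][f.first] = weight_of(w, ex.label, f.first) * m;
            }
        }
        // Stop early once the error rate for this pass is below gamma.
        if (static_cast<double>(errors) / data.size() < gamma)
            break;
    }
    return iter;
}
```

Because the update multiplies or divides by \(m\), weights stay positive and change geometrically, which is the practical difference from an additive rule.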
class_label meta::classify::winnow::classify ( doc_id d_id ) | override virtual |
Classifies the given document.
The class label returned is \(\operatorname{argmax}_k(w_k^\intercal x_n + b)\); in other words, the class whose associated weight vector gives the highest score.
d_id | The document to be classified |
Implements meta::classify::classifier.
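The decision rule above can be illustrated with a small standalone function. This is only a sketch: it uses dense `std::vector<double>` weight vectors and an integer class index for brevity, whereas the class itself stores sparse per-label maps, and the name `argmax_class` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the index k that maximizes w[k] . x + b.
// Ties go to the lowest index. Assumes w is non-empty and that every
// weight vector has the same length as x.
std::size_t argmax_class(const std::vector<std::vector<double>>& w,
                         const std::vector<double>& x, double b)
{
    assert(!w.empty());
    std::size_t best = 0;
    double best_score = 0.0;
    for (std::size_t k = 0; k < w.size(); ++k)
    {
        assert(w[k].size() == x.size());
        // inner_product starts from b, so the result is b + sum_i w[k][i]*x[i].
        double score
            = std::inner_product(w[k].begin(), w[k].end(), x.begin(), b);
        if (k == 0 || score > best_score)
        {
            best = k;
            best_score = score;
        }
    }
    return best;
}
```

Since the same \(b\) is added to every class score in this formulation, it shifts all scores equally and does not change which class wins.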
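double meta::classify::winnow::get_weight ( const class_label & label, const term_id & term ) const | private |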
label | The class label for the weight vector we want |
term | The term whose weight should be returned |
void meta::classify::winnow::zero_weights ( const std::vector< doc_id > & docs ) | private |
Initializes the weight vectors to zero for every class label.
docs | The set of documents to collect class labels from. |
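One plausible way to combine per-label initialization with a sparse weight lookup is sketched below. The names `init_weights` and `lookup` and the `fallback` parameter are hypothetical, and the sketch takes the documents' labels directly rather than reading them from an index. A multiplicative update can never move a weight away from exactly zero, so this sketch resets each label to an empty map and lets the caller choose what an absent entry reads as; whether the library behaves this way is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for the library's types.
using label_t = std::string;
using term_t = std::size_t;
using weights_t
    = std::unordered_map<label_t, std::unordered_map<term_t, double>>;

// Discards any learned weights and gives every label seen in the
// training set an empty sparse weight vector.
void init_weights(weights_t& w, const std::vector<label_t>& doc_labels)
{
    w.clear();
    for (const auto& label : doc_labels)
        w[label]; // operator[] default-constructs an empty inner map
}

// Reads one weight without inserting; absent entries yield `fallback`.
double lookup(const weights_t& w, const label_t& label, term_t term,
              double fallback)
{
    auto lit = w.find(label);
    if (lit == w.end())
        return fallback;
    auto tit = lit->second.find(term);
    return tit == lit->second.end() ? fallback : tit->second;
}
```

Keeping the lookup `const` and non-inserting means that reading a weight during classification never grows the map.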