ModErn Text Analysis
META Enumerates Textual Applications
meta::classify::dual_perceptron Class Reference

Implements a perceptron classifier, but using the dual formulation of the problem.

#include <dual_perceptron.h>

Inheritance diagram for meta::classify::dual_perceptron: derives from meta::classify::classifier

Public Member Functions

template<class Kernel >
 dual_perceptron (std::shared_ptr< index::forward_index > idx, Kernel &&kernel_fn=kernel::polynomial{}, double alpha=default_alpha, double gamma=default_gamma, double bias=default_bias, uint64_t max_iter=default_max_iter)
 Constructs a dual_perceptron classifier over the given index and with the given parameters.
 
void train (const std::vector< doc_id > &docs) override
 Trains the perceptron on the given training documents.
 
class_label classify (doc_id d_id) override
 Classifies the given document.
 
void reset () override
 Resets all learned information for this perceptron so it may be re-learned.
 
- Public Member Functions inherited from meta::classify::classifier
 classifier (std::shared_ptr< index::forward_index > idx)
 
virtual confusion_matrix test (const std::vector< doc_id > &docs)
 Classifies a collection of documents into specific groups, as determined by training data; this function makes repeated calls to classify().
 
virtual confusion_matrix cross_validate (const std::vector< doc_id > &input_docs, size_t k, bool even_split=false, int seed=1)
 Performs k-fold cross-validation on a set of documents.
 

Static Public Attributes

static const constexpr double default_alpha = 0.1
 The default \(\alpha\) parameter.
 
static const constexpr double default_gamma = 0.05
 The default \(\gamma\) parameter.
 
static const constexpr double default_bias = 0
 The default \(b\) parameter.
 
static const constexpr uint64_t default_max_iter = 100
 The default number of allowed iterations.
 
static const std::string id = "dual-perceptron"
 The identifier for this classifier.
 

Private Types

using pdata = decltype(idx_->search_primary(doc_id{}))
 Convenience typedef for the postings data type.
 

Private Member Functions

void decrease_weight (const class_label &label, const doc_id &id)
 Decreases the "weight" (mistake count) for a given class label and document.
 

Private Attributes

std::unordered_map< class_label, std::unordered_map< doc_id, uint64_t > > weights_
 The "weight" (mistake count) vectors for each class label.
 
std::function< double(pdata, pdata)> kernel_
 The kernel function to be used in lieu of a dot product.
 
const double alpha_
 \(\alpha\), the learning rate
 
const double gamma_
 \(\gamma\), the error threshold (in terms of percentage of mistakes on the training data in one iteration of training).
 
const double bias_
 \(b\), the bias factor.
 
const uint64_t max_iter_
 The maximum number of iterations for training.
 

Additional Inherited Members

- Protected Attributes inherited from meta::classify::classifier
std::shared_ptr< index::forward_index > idx_
 The index that the classifier is run on.
 

Detailed Description

Implements a perceptron classifier, but using the dual formulation of the problem.

This allows the perceptron to be used for data that is not necessarily linearly separable via the use of a kernel function.
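
To make the role of the kernel concrete, here is a minimal sketch of a polynomial kernel over dense vectors. It is an illustration under stated assumptions, not the library's `kernel::polynomial`: the names `feature_vector`, `dot`, and `polynomial_kernel`, and the defaults `c = 1` and `p = 2`, are hypothetical, and the real class operates on postings data rather than dense vectors.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical dense feature vector; a stand-in for the postings data
// that the real classifier works with.
using feature_vector = std::vector<double>;

// Plain dot product of two equally sized vectors.
double dot(const feature_vector& a, const feature_vector& b)
{
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        sum += a[i] * b[i];
    return sum;
}

// Polynomial kernel K(a, b) = (a . b + c)^p.
// The defaults for c and p are illustrative assumptions.
double polynomial_kernel(const feature_vector& a, const feature_vector& b,
                         double c = 1.0, int p = 2)
{
    return std::pow(dot(a, b) + c, p);
}
```

Replacing the dot product with such a function lets the classifier separate classes with a boundary that is nonlinear in the original feature space.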

Constructor & Destructor Documentation

template<class Kernel >
meta::classify::dual_perceptron::dual_perceptron ( std::shared_ptr< index::forward_index >  idx,
Kernel &&  kernel_fn = kernel::polynomial{},
double  alpha = default_alpha,
double  gamma = default_gamma,
double  bias = default_bias,
uint64_t  max_iter = default_max_iter 
)
inline

Constructs a dual_perceptron classifier over the given index and with the given parameters.

Parameters
idx  The index to run the classifier on
kernel_fn  The kernel function to be used
alpha  \(\alpha\), the learning rate
gamma  \(\gamma\), the error threshold (in terms of percentage of mistakes on one training run)
bias  \(b\), the bias
max_iter  The maximum allowed iterations for training.

Member Function Documentation

void meta::classify::dual_perceptron::train ( const std::vector< doc_id > &  docs)
override virtual

Trains the perceptron on the given training documents.

Maintains a set of weight vectors \(w_1,\ldots,w_K\) where \(K\) is the number of classes and updates them for each training document seen in each iteration. This continues until the error threshold is met or the maximum number of iterations is completed.

Unlike the regular perceptron, the dual formulation stores "mistake vectors" rather than feature weights: each entry records how often a given training instance was misclassified.

Parameters
docs  The training set

Implements meta::classify::classifier.
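
The training procedure described above can be sketched on toy data as follows. This is a simplified illustration and not the library's implementation: `toy_dual_perceptron`, `feature_vector`, `label_t`, and `kernel_t` are hypothetical names, dense vectors stand in for postings data, and the learning rate \(\alpha\) is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the library's postings data and class labels.
using feature_vector = std::vector<double>;
using label_t = std::string;
using kernel_t
    = std::function<double(const feature_vector&, const feature_vector&)>;

struct toy_dual_perceptron
{
    kernel_t kernel;              // used in place of a dot product
    double gamma = 0.05;          // stop once a pass's error rate is below this
    double bias = 0.0;            // b in the scoring formula
    std::uint64_t max_iter = 100; // upper bound on training passes

    std::vector<feature_vector> docs;
    std::vector<label_t> labels;
    // weights[label][i] = mistake count of training document i for that label
    std::map<label_t, std::map<std::size_t, std::uint64_t>> weights;

    // Scores each label with sum_d w_k^d * (K(d, x) + b) and returns the best.
    label_t classify(const feature_vector& x) const
    {
        label_t best = labels.empty() ? label_t{} : labels.front();
        double best_score = std::numeric_limits<double>::lowest();
        for (const auto& entry : weights)
        {
            double score = 0.0;
            for (const auto& wc : entry.second)
                score += static_cast<double>(wc.second)
                         * (kernel(docs[wc.first], x) + bias);
            if (score > best_score)
            {
                best_score = score;
                best = entry.first;
            }
        }
        return best;
    }

    void train(std::vector<feature_vector> xs, std::vector<label_t> ys)
    {
        assert(xs.size() == ys.size() && !xs.empty());
        docs = std::move(xs);
        labels = std::move(ys);
        weights.clear();
        for (const auto& y : labels)
            weights[y]; // every label gets a (possibly empty) mistake vector

        for (std::uint64_t iter = 0; iter < max_iter; ++iter)
        {
            std::uint64_t errors = 0;
            for (std::size_t i = 0; i < docs.size(); ++i)
            {
                const label_t guess = classify(docs[i]);
                if (guess == labels[i])
                    continue;
                ++errors;
                // Raise the true label's mistake count for this document.
                ++weights[labels[i]][i];
                // Lower the guessed label's count for it, if it has one.
                auto& wrong = weights[guess];
                auto it = wrong.find(i);
                if (it != wrong.end() && --it->second == 0)
                    wrong.erase(it);
            }
            // Error threshold: fraction of mistakes in this pass.
            if (static_cast<double>(errors) / docs.size() < gamma)
                break;
        }
    }
};
```

In this sketch, ties between labels are broken by the ordering of the label map; that is a choice made for the example only.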

class_label meta::classify::dual_perceptron::classify ( doc_id  d_id)
override virtual

Classifies the given document.

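The class label returned is \(\arg\max_k \sum_d w_k^d \, (K(d,x) + b)\), where \(x\) is the document being classified, \(d\) ranges over the training documents, \(w_k^d\) is the mistake count of \(d\) for class \(k\), and \(K\) is the kernel function. In other words, it is the class whose associated weight vector gives the highest result.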

Parameters
d_id  The document to be classified
Returns
the class label determined for the document

Implements meta::classify::classifier.
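
As a worked illustration of the scoring rule, consider hypothetical values that are not taken from the library: two classes \(A\) and \(B\), two training documents \(d_1, d_2\) with mistake counts \(w_A^{d_1} = 2\) and \(w_B^{d_2} = 1\) (all other counts zero), kernel values \(K(d_1, x) = 4\) and \(K(d_2, x) = 9\), and \(b = 0\):

\[
\begin{aligned}
\text{score}_A &= w_A^{d_1}\,(K(d_1,x) + b) = 2 \cdot (4 + 0) = 8,\\
\text{score}_B &= w_B^{d_2}\,(K(d_2,x) + b) = 1 \cdot (9 + 0) = 9.
\end{aligned}
\]

Since \(9 > 8\), the document would be labeled \(B\) under these assumed values.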

void meta::classify::dual_perceptron::decrease_weight ( const class_label &  label,
const doc_id &  id 
)
private

Decreases the "weight" (mistake count) for a given class label and document.

Parameters
label  The class label
id  The document
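
One plausible way to decrease a mistake count stored in a nested map is sketched below. The free function, the `weight_map` alias, and the choice to erase entries that reach zero are assumptions made for illustration; `class_label` and `doc_id` are simple stand-ins here, not the library's types.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for the library's label and document id types.
using class_label = std::string;
using doc_id = std::uint64_t;
using weight_map
    = std::unordered_map<class_label,
                         std::unordered_map<doc_id, std::uint64_t>>;

// Decrements the mistake count for (label, id) if one is stored.
// Entries that reach zero are erased so the map stays sparse (an assumption).
void decrease_weight(weight_map& weights, const class_label& label,
                     const doc_id& id)
{
    auto outer = weights.find(label);
    if (outer == weights.end())
        return; // no counts stored for this label
    auto inner = outer->second.find(id);
    if (inner == outer->second.end())
        return; // this document has no count for the label
    assert(inner->second > 0);
    if (--inner->second == 0)
        outer->second.erase(inner);
}
```

Because the counts are unsigned, skipping missing entries avoids underflow when a document has no recorded mistakes for the label.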

The documentation for this class was generated from the following files: