ModErn Text Analysis
META Enumerates Textual Applications
meta::classify::logistic_regression Class Reference

Multinomial logistic regression.

#include <logistic_regression.h>

Inherits meta::classify::classifier.

Public Member Functions

 logistic_regression (const std::string &prefix, std::shared_ptr< index::forward_index > idx, double alpha=sgd::default_alpha, double gamma=sgd::default_gamma, double bias=sgd::default_bias, double lambda=sgd::default_lambda, uint64_t max_iter=sgd::default_max_iter)
 
std::unordered_map< class_label, double > predict (doc_id d_id)
 Obtains the probability that the given document belongs to each class.
 
virtual class_label classify (doc_id d_id) override
 Classifies a document into a specific group, as determined by training data.
 
virtual void train (const std::vector< doc_id > &docs) override
 Creates a classification model based on training documents.
 
virtual void reset () override
 Clears any learning data associated with this classifier.
 
Public Member Functions inherited from meta::classify::classifier
 classifier (std::shared_ptr< index::forward_index > idx)
 
virtual confusion_matrix test (const std::vector< doc_id > &docs)
 Classifies a collection of documents into specific groups, as determined by training data; this function makes repeated calls to classify().
 
virtual confusion_matrix cross_validate (const std::vector< doc_id > &input_docs, size_t k, bool even_split=false, int seed=1)
 Performs k-fold cross-validation on a set of documents.
 

Static Public Attributes

static const std::string id = "logistic-regression"
 the identifier for this classifier
 

Private Attributes

std::unordered_map< class_label, sgd > classifiers_
 the set of \(K-1\) independent classifiers
 
class_label pivot_
 the class chosen to be the pivot element
 

Additional Inherited Members

Protected Attributes inherited from meta::classify::classifier
std::shared_ptr< index::forward_index > idx_
 the index that the classifier is run on
 

Detailed Description

Multinomial logistic regression.

If there are \(K\) classes, this uses SGD to perform \(K-1\) independent logistic regressions by picking class \(K\) as a pivot (that is, each of the \(K-1\) independent regressions is done against the \(K\)-th class).
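One way to picture the pivot scheme is as a split of the labeled documents into \(K-1\) binary problems, each pairing one class with the pivot class. The sketch below is only an illustration under stated assumptions: `class_label` and `doc_id` are plain stand-in types, `binary_task` and `one_vs_pivot` are hypothetical names, and whether the library restricts each regression to exactly these two groups of documents is not confirmed by this page.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the library's types.
using class_label = std::string;
using doc_id = unsigned long;

// One binary problem: documents of a single class versus the pivot documents.
struct binary_task
{
    std::vector<doc_id> positives; // documents labeled with the non-pivot class
    std::vector<doc_id> negatives; // documents labeled with the pivot class
};

// Builds one binary task per non-pivot class (K-1 tasks for K classes).
std::map<class_label, binary_task>
one_vs_pivot(const std::vector<std::pair<doc_id, class_label>>& labeled,
             const class_label& pivot)
{
    std::vector<doc_id> pivot_docs;
    std::map<class_label, binary_task> tasks;
    for (const auto& doc : labeled)
    {
        if (doc.second == pivot)
            pivot_docs.push_back(doc.first);
        else
            tasks[doc.second].positives.push_back(doc.first);
    }
    // Every task shares the same negatives: the pivot class documents.
    for (auto& kv : tasks)
        kv.second.negatives = pivot_docs;
    return tasks;
}
```

With three classes and the third chosen as the pivot, this yields two tasks, which matches the \(K-1\) count in the description above.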

The probability of each class is then:

\begin{align*} P(y_i = 1) &= \frac{\exp(predict_1(x_i))}{1+\sum_{k=1}^{K-1} \exp(predict_k(x_i))}\\ P(y_i = 2) &= \frac{\exp(predict_2(x_i))}{1+\sum_{k=1}^{K-1} \exp(predict_k(x_i))}\\ &\vdots\\ P(y_i = K-1) &= \frac{\exp(predict_{K-1}(x_i))}{1+\sum_{k=1}^{K-1} \exp(predict_k(x_i))}\\ P(y_i = K) &= \frac{1}{1+\sum_{k=1}^{K-1} \exp(predict_k(x_i))} \end{align*}

where \(predict_k(x_i)\) is the result of running the predict function on the \(k\)-th classifier with the \(i\)-th example. The output of classifier::classify(), then, is the class with the highest probability based on the above formulas.
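As a minimal sketch of the probability computation, assuming the \(K-1\) raw scores \(predict_k(x_i)\) are already available in a map keyed by class: every class shares the denominator \(1 + \sum \exp(predict_k(x_i))\), with the sum running over the \(K-1\) non-pivot classifiers, and the pivot class takes the numerator 1. The function name `pivot_probabilities` and the string-based `class_label` are assumptions made for this example, not part of the library.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the library's class_label type.
using class_label = std::string;

// Converts the K-1 raw scores into K class probabilities.
// `scores` has one entry per non-pivot class; `pivot` names the K-th class.
std::unordered_map<class_label, double>
pivot_probabilities(const std::unordered_map<class_label, double>& scores,
                    const class_label& pivot)
{
    // The pivot has no classifier of its own, so it must not appear in scores.
    assert(scores.count(pivot) == 0);

    // Shared denominator: 1 plus the sum over the K-1 non-pivot classes.
    double denom = 1.0;
    for (const auto& kv : scores)
        denom += std::exp(kv.second);

    std::unordered_map<class_label, double> probs;
    for (const auto& kv : scores)
        probs[kv.first] = std::exp(kv.second) / denom;
    probs[pivot] = 1.0 / denom;
    return probs;
}
```

When every score is zero, each of the \(K\) classes receives probability \(1/K\), and the probabilities always sum to 1.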

The individual class probabilities may be recovered by using the predict function: this returns an unordered_map of class_label to probability.
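Given a map of the kind predict returns, the most probable class can be recovered with a linear scan. The sketch below assumes only the map's shape (class label to probability); `most_likely` is a hypothetical helper and `class_label` is a plain string stand-in, so this is not the library's own classify() implementation.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the library's class_label type.
using class_label = std::string;

// Returns the label with the highest probability in a non-empty map.
class_label most_likely(const std::unordered_map<class_label, double>& probs)
{
    assert(!probs.empty());
    auto best = probs.begin();
    for (auto it = probs.begin(); it != probs.end(); ++it)
    {
        if (it->second > best->second)
            best = it;
    }
    return best->first;
}
```

Keeping the full map around is useful when a caller wants to apply a confidence threshold before accepting the top label.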

Constructor & Destructor Documentation

meta::classify::logistic_regression::logistic_regression ( const std::string &  prefix,
std::shared_ptr< index::forward_index >  idx,
double  alpha = sgd::default_alpha,
double  gamma = sgd::default_gamma,
double  bias = sgd::default_bias,
double  lambda = sgd::default_lambda,
uint64_t  max_iter = sgd::default_max_iter 
)
Parameters
prefix    The prefix for the model files
idx       The index to run the classifier on
alpha     \(\alpha\), the learning rate for each of the independent regressions
gamma     \(\gamma\), the error threshold for each of the independent regressions
bias      \(b\), the bias term for each of the independent regressions
lambda    \(\lambda\), the regularization constant for each of the independent regressions
max_iter  The maximum number of iterations for training each independent regression
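To make the roles of these parameters concrete, here is a toy binary logistic regression trained with SGD. It is a sketch under loud assumptions rather than the library's sgd class: `toy_sgd` is a hypothetical type, features are dense vectors, `gamma` is interpreted as a stopping threshold on the mean absolute error per pass, `bias` as the constant value of an always-on feature, and `lambda` as an L2 penalty on the feature weights.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model for illustration; this is not MeTA's sgd class.
struct toy_sgd
{
    double alpha;           // learning rate: step size of each update
    double gamma;           // error threshold: stop once mean |error| is below it
    double bias;            // constant value of the always-on bias feature
    double lambda;          // L2 regularization constant
    std::uint64_t max_iter; // upper bound on passes over the training data

    std::vector<double> weights; // one weight per feature
    double bias_weight;          // weight attached to the bias feature

    // Sigmoid of the linear score for a dense feature vector x.
    double predict(const std::vector<double>& x) const
    {
        assert(x.size() == weights.size());
        double z = bias_weight * bias;
        for (std::size_t j = 0; j < x.size(); ++j)
            z += weights[j] * x[j];
        return 1.0 / (1.0 + std::exp(-z));
    }

    // Trains on examples xs with 0/1 labels ys; returns the number of passes.
    std::uint64_t train(const std::vector<std::vector<double>>& xs,
                        const std::vector<int>& ys)
    {
        assert(!xs.empty() && xs.size() == ys.size());
        weights.assign(xs[0].size(), 0.0);
        bias_weight = 0.0;
        for (std::uint64_t iter = 1; iter <= max_iter; ++iter)
        {
            double total_error = 0.0;
            for (std::size_t i = 0; i < xs.size(); ++i)
            {
                double err = ys[i] - predict(xs[i]);
                total_error += std::fabs(err);
                // Gradient step with L2 shrinkage on the feature weights.
                for (std::size_t j = 0; j < weights.size(); ++j)
                    weights[j] += alpha * (err * xs[i][j] - lambda * weights[j]);
                bias_weight += alpha * err * bias;
            }
            if (total_error / xs.size() < gamma)
                return iter; // converged under the assumed threshold rule
        }
        return max_iter;
    }
};
```

A model could be set up as `toy_sgd model{0.5, 0.05, 1.0, 0.0, 1000};`, which fills the five parameters in the same order as the constructor arguments above (after `prefix` and `idx`).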

Member Function Documentation

std::unordered_map< class_label, double > meta::classify::logistic_regression::predict ( doc_id  d_id)

Obtains the probability that the given document belongs to each class.

Parameters
d_id  The document to obtain class-membership probabilities for
Returns
a map from class label to probability of membership

class_label meta::classify::logistic_regression::classify ( doc_id  d_id)
override virtual

Classifies a document into a specific group, as determined by training data.

Parameters
d_id  The document to classify
Returns
the class it belongs to

Implements meta::classify::classifier.

void meta::classify::logistic_regression::train ( const std::vector< doc_id > &  docs)
override virtual

Creates a classification model based on training documents.

Parameters
docs  The training documents

Implements meta::classify::classifier.


The documentation for this class was generated from the following files: