ModErn Text Analysis
META Enumerates Textual Applications
Multinomial logistic regression.
#include <logistic_regression.h>
Public Member Functions | |
logistic_regression (const std::string &prefix, std::shared_ptr< index::forward_index > idx, double alpha=sgd::default_alpha, double gamma=sgd::default_gamma, double bias=sgd::default_bias, double lambda=sgd::default_lambda, uint64_t max_iter=sgd::default_max_iter) | |
std::unordered_map< class_label, double > | predict (doc_id d_id) |
Obtains the probability that the given document belongs to each class. | |
virtual class_label | classify (doc_id d_id) override |
Classifies a document into a specific group, as determined by training data. | |
virtual void | train (const std::vector< doc_id > &docs) override |
Creates a classification model based on training documents. | |
virtual void | reset () override |
Clears any learning data associated with this classifier. | |
Public Member Functions inherited from meta::classify::classifier | |
classifier (std::shared_ptr< index::forward_index > idx) | |
virtual confusion_matrix | test (const std::vector< doc_id > &docs) |
Classifies a collection of documents into specific groups, as determined by training data; this function makes repeated calls to classify(). | |
virtual confusion_matrix | cross_validate (const std::vector< doc_id > &input_docs, size_t k, bool even_split=false, int seed=1) |
Performs k-fold cross-validation on a set of documents. | |
Static Public Attributes | |
static const std::string | id = "logistic-regression" |
the identifier for this classifier | |
Private Attributes | |
std::unordered_map< class_label, sgd > | classifiers_ |
the set of \(K-1\) independent classifiers | |
class_label | pivot_ |
the class chosen to be the pivot element | |
Additional Inherited Members | |
Protected Attributes inherited from meta::classify::classifier | |
std::shared_ptr< index::forward_index > | idx_ |
the index that the classifier is run on | |
Multinomial logistic regression.
If there are \(K\) classes, this uses SGD to perform \(K-1\) independent logistic regressions by picking class \(K\) as a pivot (that is, each of the \(K-1\) independent regressions is done against the \(K\)-th class).
The probability of each class is then:
\begin{align*} P(y_i = 1) &= \frac{\exp(predict_1(x_i))}{1+\sum_{k=1}^{K-1} \exp(predict_k(x_i))}\\ P(y_i = 2) &= \frac{\exp(predict_2(x_i))}{1+\sum_{k=1}^{K-1} \exp(predict_k(x_i))}\\ &\vdots\\ P(y_i = K-1) &= \frac{\exp(predict_{K-1}(x_i))}{1+\sum_{k=1}^{K-1} \exp(predict_k(x_i))}\\ P(y_i = K) &= \frac{1}{1+\sum_{k=1}^{K-1} \exp(predict_k(x_i))} \end{align*}
where \(predict_k(x_i)\) is the result of running the predict function on the \(k\)-th classifier with the \(i\)-th example. The output of classifier::classify(), then, is the class with the highest probability based on the above formulas.
The individual class probabilities may be recovered by using the predict function: this returns an unordered_map of class_label to probability.
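The pivot construction described above can be illustrated with a small standalone sketch. It is not the library's implementation: `class_label` is aliased to `std::string` as a stand-in, `pivot_probabilities` is a hypothetical helper, and the scores are assumed to be the raw outputs of the \(K-1\) non-pivot regressions for a single example. The denominator sums over those \(K-1\) scores only, since the pivot class has no regression of its own.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the library's class_label type.
using class_label = std::string;

// scores: predict_k(x_i) for each of the K-1 non-pivot classes.
// pivot:  label of the K-th class, which all regressions are run against.
// Returns a probability for every class, including the pivot.
std::unordered_map<class_label, double>
pivot_probabilities(const std::unordered_map<class_label, double>& scores,
                    const class_label& pivot)
{
    // The pivot must not have a score of its own.
    assert(scores.find(pivot) == scores.end());

    // Shared denominator: 1 + sum over the K-1 non-pivot classes.
    double denom = 1.0;
    for (const auto& entry : scores)
        denom += std::exp(entry.second);

    std::unordered_map<class_label, double> probs;
    for (const auto& entry : scores)
        probs[entry.first] = std::exp(entry.second) / denom;
    probs[pivot] = 1.0 / denom; // the pivot takes the remaining mass
    return probs;
}
```

With every score equal to zero, each of the \(K\) classes receives probability \(1/K\). Very large scores would overflow `std::exp`; this sketch does not guard against that.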
meta::classify::logistic_regression::logistic_regression (const std::string &prefix, std::shared_ptr< index::forward_index > idx, double alpha = sgd::default_alpha, double gamma = sgd::default_gamma, double bias = sgd::default_bias, double lambda = sgd::default_lambda, uint64_t max_iter = sgd::default_max_iter)
prefix | The prefix for the model files |
idx | The index to run the classifier on |
alpha | \(\alpha\), the learning rate for each of the independent regressions |
gamma | \(\gamma\), the error threshold for each of the independent regressions |
bias | \(b\), the bias term for each of the independent regressions |
lambda | \(\lambda\), the regularization constant for each of the independent regressions |
max_iter | The maximum number of iterations for training each independent regression |
std::unordered_map< class_label, double > meta::classify::logistic_regression::predict (doc_id d_id)
Obtains the probability that the given document belongs to each class.
d_id | The document to obtain class-membership probabilities for |
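As a rough illustration of consuming the map returned by predict, the sketch below picks the most probable class and ranks all classes by probability. It works on a plain `std::unordered_map` so that it compiles without the library; `class_label` is aliased to `std::string` as a stand-in, and `most_probable` and `rank_classes` are hypothetical helpers, not part of the documented API.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the library's class_label type.
using class_label = std::string;
using prob_map = std::unordered_map<class_label, double>;

// Returns the label with the highest probability in a non-empty map.
class_label most_probable(const prob_map& probs)
{
    assert(!probs.empty());
    auto best = std::max_element(
        probs.begin(), probs.end(),
        [](const prob_map::value_type& a, const prob_map::value_type& b) {
            return a.second < b.second;
        });
    return best->first;
}

// Returns (label, probability) pairs sorted from most to least probable.
std::vector<std::pair<class_label, double>> rank_classes(const prob_map& probs)
{
    std::vector<std::pair<class_label, double>> ranked(probs.begin(),
                                                       probs.end());
    std::sort(ranked.begin(), ranked.end(),
              [](const std::pair<class_label, double>& a,
                 const std::pair<class_label, double>& b) {
                  return a.second > b.second;
              });
    return ranked;
}
```

The label returned by `most_probable` should agree with the class chosen by classify(), since classify() is described as selecting the class with the highest probability.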
virtual class_label meta::classify::logistic_regression::classify (doc_id d_id) override
Classifies a document into a specific group, as determined by training data.
d_id | The document to classify |
Implements meta::classify::classifier.
virtual void meta::classify::logistic_regression::train (const std::vector< doc_id > &docs) override
Creates a classification model based on training documents.
docs | The training documents |
Implements meta::classify::classifier.
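The members documented on this page suggest a call order of train() first, then classify() or predict() on individual documents. The sketch below shows that sequence. It is written as a template over any type exposing those three members so that it compiles without the library headers; `train_then_query` is a hypothetical helper, and constructing the forward index and the classifier itself is deliberately left out.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Classifier: any type with train(docs), classify(d_id) and predict(d_id),
//             shaped like the members documented on this page.
// DocId:      the document identifier type (doc_id in the library).
// Returns the predicted label together with the per-class probabilities.
template <class Classifier, class DocId>
auto train_then_query(Classifier& clf,
                      const std::vector<DocId>& training_docs,
                      DocId query)
    -> decltype(std::make_pair(clf.classify(query), clf.predict(query)))
{
    // Training on an empty set would leave nothing to classify against.
    assert(!training_docs.empty());

    clf.train(training_docs);          // build the model
    auto label = clf.classify(query);  // single most probable class
    auto probs = clf.predict(query);   // probability for every class
    return std::make_pair(label, probs);
}
```

Calling reset() on the classifier before a second call would clear the earlier model, according to the member list above.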