ModErn Text Analysis
META Enumerates Textual Applications
Multinomial logistic regression.
#include <logistic_regression.h>
Public Member Functions | |
| logistic_regression (const std::string &prefix, std::shared_ptr< index::forward_index > idx, double alpha=sgd::default_alpha, double gamma=sgd::default_gamma, double bias=sgd::default_bias, double lambda=sgd::default_lambda, uint64_t max_iter=sgd::default_max_iter) | |
| std::unordered_map< class_label, double > | predict (doc_id d_id) |
| Obtains the probability that the given document belongs to each class. | |
| virtual class_label | classify (doc_id d_id) override |
| Classifies a document into a specific group, as determined by training data. | |
| virtual void | train (const std::vector< doc_id > &docs) override |
| Creates a classification model based on training documents. | |
| virtual void | reset () override |
| Clears any learning data associated with this classifier. | |
Public Member Functions inherited from meta::classify::classifier | |
| classifier (std::shared_ptr< index::forward_index > idx) | |
| virtual confusion_matrix | test (const std::vector< doc_id > &docs) |
| Classifies a collection of documents into specific groups, as determined by training data; this function makes repeated calls to classify(). | |
| virtual confusion_matrix | cross_validate (const std::vector< doc_id > &input_docs, size_t k, bool even_split=false, int seed=1) |
| Performs k-fold cross-validation on a set of documents. | |
Static Public Attributes | |
| static const std::string | id = "logistic-regression" |
| the identifier for this classifier | |
Private Attributes | |
| std::unordered_map< class_label, sgd > | classifiers_ |
| the set of \(K-1\) independent classifiers | |
| class_label | pivot_ |
| the class chosen to be the pivot element | |
Additional Inherited Members | |
Protected Attributes inherited from meta::classify::classifier | |
| std::shared_ptr< index::forward_index > | idx_ |
| the index that the classifier is run on | |
Multinomial logistic regression.
If there are \(K\) classes, this uses SGD to perform \(K-1\) independent logistic regressions by picking class \(K\) as a pivot (that is, each of the \(K-1\) independent regressions is done against the \(K\)-th class).
The probability of each class is then:
\begin{align*}
P(y_i = 1) &= \frac{\exp(predict_1(x_i))}{1+\sum_{k=1}^{K-1} \exp(predict_k(x_i))}\\
P(y_i = 2) &= \frac{\exp(predict_2(x_i))}{1+\sum_{k=1}^{K-1} \exp(predict_k(x_i))}\\
&\vdots\\
P(y_i = K-1) &= \frac{\exp(predict_{K-1}(x_i))}{1+\sum_{k=1}^{K-1} \exp(predict_k(x_i))}\\
P(y_i = K) &= \frac{1}{1+\sum_{k=1}^{K-1} \exp(predict_k(x_i))}
\end{align*}
where \(predict_k(x_i)\) is the result of running the predict function on the \(k\)-th classifier with the \(i\)-th example. The output of classifier::classify(), then, is the class with the highest probability based on the above formulas.
The individual class probabilities may be recovered by using the predict function: this returns an unordered_map of class_label to probability.
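The pivot scheme described above can be illustrated with a small, self-contained sketch. It is not the library's implementation: the `label` alias and the `pivot_probabilities` function are hypothetical names introduced here, and the raw per-class scores are assumed to be already available as plain doubles. The pivot class is treated as having an implicit score of 0, and the largest score is subtracted before exponentiating to avoid overflow.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for class_label; a plain string for illustration.
using label = std::string;

// Converts K-1 raw scores (one per non-pivot class) into K probabilities.
// The pivot class has an implicit score of 0, so it contributes exp(0) = 1
// to the denominator.
std::unordered_map<label, double>
pivot_probabilities(const std::unordered_map<label, double>& scores,
                    const label& pivot)
{
    // Assumption: the pivot has no regression of its own.
    assert(scores.find(pivot) == scores.end());

    // Subtract the largest score (or 0, the pivot's score) before
    // exponentiating so that exp() cannot overflow.
    double max_score = 0.0;
    for (const auto& kv : scores)
        max_score = std::max(max_score, kv.second);

    double denom = std::exp(-max_score); // the pivot's term
    for (const auto& kv : scores)
        denom += std::exp(kv.second - max_score);

    std::unordered_map<label, double> probs;
    for (const auto& kv : scores)
        probs[kv.first] = std::exp(kv.second - max_score) / denom;
    probs[pivot] = std::exp(-max_score) / denom;
    return probs;
}
```

Shifting every exponent by the same constant leaves the ratios unchanged, so the result matches the unshifted formulas while staying finite for large scores.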
| meta::classify::logistic_regression::logistic_regression | ( | const std::string & | prefix, |
| | | std::shared_ptr< index::forward_index > | idx, |
| | | double | alpha = sgd::default_alpha, |
| | | double | gamma = sgd::default_gamma, |
| | | double | bias = sgd::default_bias, |
| | | double | lambda = sgd::default_lambda, |
| | | uint64_t | max_iter = sgd::default_max_iter |
| | ) | | |
| prefix | The prefix for the model files |
| idx | The index to run the classifier on |
| alpha | \(\alpha\), the learning rate for each of the independent regressions |
| gamma | \(\gamma\), the error threshold for each of the independent regressions |
| bias | \(b\), the bias term for each of the independent regressions |
| lambda | \(\lambda\), the regularization constant for each of the independent regressions |
| max_iter | The maximum number of iterations for training each independent regression |
| std::unordered_map< class_label, double > meta::classify::logistic_regression::predict | ( | doc_id | d_id | ) |
Obtains the probability that the given document belongs to each class.
| d_id | The document to obtain class-membership probabilities for |
| virtual class_label meta::classify::logistic_regression::classify | ( | doc_id | d_id | ) | override |
Classifies a document into a specific group, as determined by training data.
| d_id | The document to classify |
Implements meta::classify::classifier.
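As the class description notes, the output of classify() is the class with the highest probability. A minimal sketch of that selection step over a label-to-probability map, such as the one predict() is documented to return, might look like the following. The `label` alias and the `most_probable` helper are hypothetical names used only for illustration and are not part of the library.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for class_label; a plain string for illustration.
using label = std::string;

// Returns the label whose probability is largest.
// Assumption: the map is non-empty; ties are broken arbitrarily because
// unordered_map iteration order is unspecified.
label most_probable(const std::unordered_map<label, double>& probs)
{
    assert(!probs.empty());
    auto best = std::max_element(
        probs.begin(), probs.end(),
        [](const std::pair<const label, double>& a,
           const std::pair<const label, double>& b) {
            return a.second < b.second;
        });
    return best->first;
}
```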
| virtual void meta::classify::logistic_regression::train | ( | const std::vector< doc_id > & | docs | ) | override |
Creates a classification model based on training documents.
| docs | The training documents |
Implements meta::classify::classifier.