Multinomial logistic regression.

#include <logistic_regression.h>

Public Member Functions
	logistic_regression (const std::string &prefix, std::shared_ptr< index::forward_index > idx, double alpha=sgd::default_alpha, double gamma=sgd::default_gamma, double bias=sgd::default_bias, double lambda=sgd::default_lambda, uint64_t max_iter=sgd::default_max_iter)

std::unordered_map< class_label, double >	predict (doc_id d_id)
	Obtains the probability that the given document belongs to each class.

virtual class_label	classify (doc_id d_id) override
	Classifies a document into a specific group, as determined by training data.

virtual void	train (const std::vector< doc_id > &docs) override
	Creates a classification model based on training documents.

virtual void	reset () override
	Clears any learning data associated with this classifier.

Public Member Functions inherited from meta::classify::classifier
	classifier (std::shared_ptr< index::forward_index > idx)

virtual confusion_matrix	test (const std::vector< doc_id > &docs)
	Classifies a collection of documents into specific groups, as determined by training data; this function makes repeated calls to classify().

virtual confusion_matrix	cross_validate (const std::vector< doc_id > &input_docs, size_t k, bool even_split=false, int seed=1)
	Performs k-fold cross-validation on a set of documents.

Static Public Attributes
static const std::string	id = "logistic-regression"
	the identifier for this classifier

Private Attributes
std::unordered_map< class_label, sgd >	classifiers_
	the set of \(K-1\) independent classifiers

class_label	pivot_
	the class chosen to be the pivot element

Additional Inherited Members
Protected Attributes inherited from meta::classify::classifier
std::shared_ptr< index::forward_index >	idx_
	the index that the classifier is run on

Detailed Description

Multinomial logistic regression.

If there are \(K\) classes, this uses SGD to perform \(K-1\) independent logistic regressions by picking class \(K\) as a pivot (that is, each of the \(K-1\) independent regressions is done against the \(K\)-th class).
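A sketch of the reasoning, under the assumption (not stated explicitly in the source) that the \(k\)-th regression models the log-odds of class \(k\) against the pivot class \(K\):

\begin{align*} \log\frac{P(y_i = k)}{P(y_i = K)} &= predict_k(x_i), \qquad k = 1, \dots, K-1\\ P(y_i = k) &= P(y_i = K)\,\exp(predict_k(x_i))\\ 1 = \sum_{k=1}^{K} P(y_i = k) &= P(y_i = K)\left(1+\sum_{k=1}^{K-1} \exp(predict_k(x_i))\right) \end{align*}

Solving the last line for \(P(y_i = K)\) and substituting it into the second line yields one probability per class.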

The probability of each class is then:

\begin{align*} P(y_i = 1) &= \frac{\exp(predict_1(x_i))}{1+\sum_{k=1}^{K-1} \exp(predict_k(x_i))}\\ P(y_i = 2) &= \frac{\exp(predict_2(x_i))}{1+\sum_{k=1}^{K-1} \exp(predict_k(x_i))}\\ &\vdots\\ P(y_i = K-1) &= \frac{\exp(predict_{K-1}(x_i))}{1+\sum_{k=1}^{K-1} \exp(predict_k(x_i))}\\ P(y_i = K) &= \frac{1}{1+\sum_{k=1}^{K-1} \exp(predict_k(x_i))} \end{align*}

where \(predict_k(x_i)\) is the result of running the predict function on the \(k\)-th classifier with the \(i\)-th example. The output of classifier::classify(), then, is the class with the highest probability based on the above formulas.

The individual class probabilities may be recovered by using the predict function: this returns an unordered_map of class_label to probability.
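The pivot-based normalization can be illustrated with a small standalone sketch. This is not the library's implementation: `pivot_probabilities` is a hypothetical helper, and it assumes the \(K-1\) regression outputs have already been collected into a vector, with the pivot class reported last.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of MeTA): turns the K-1 regression outputs
// into K class probabilities. scores[k] plays the role of predict_{k+1}(x_i);
// the pivot class has no regression of its own and is appended last.
std::vector<double> pivot_probabilities(const std::vector<double>& scores)
{
    // The leading 1 is the pivot's contribution, exp(0).
    double denom = 1.0;
    for (double s : scores)
        denom += std::exp(s);

    std::vector<double> probs;
    probs.reserve(scores.size() + 1);
    for (double s : scores)
        probs.push_back(std::exp(s) / denom); // non-pivot classes
    probs.push_back(1.0 / denom);             // pivot class
    return probs;
}
```

With all scores equal to zero, every class (including the pivot) receives the same probability. For large scores `std::exp` can overflow, so a production version would likely rescale the scores first; that refinement is left out of this sketch.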

Constructor & Destructor Documentation

meta::classify::logistic_regression::logistic_regression	(	const std::string &	prefix,
		std::shared_ptr< index::forward_index >	idx,
		double	alpha = `sgd::default_alpha`,
		double	gamma = `sgd::default_gamma`,
		double	bias = `sgd::default_bias`,
		double	lambda = `sgd::default_lambda`,
		uint64_t	max_iter = `sgd::default_max_iter`
	)

Parameters

prefix	The prefix for the model files
idx	The index to run the classifier on
alpha	\(\alpha\), the learning rate for each of the independent regressions
gamma	\(\gamma\), the error threshold for each of the independent regressions
bias	\(b\), the bias term for each of the independent regressions
lambda	\(\lambda\), the regularization constant for each of the independent regressions
max_iter	The maximum number of iterations for training each independent regression
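To make the roles of these parameters concrete, the following is a generic SGD loop for a single binary logistic regression. It is a sketch only and does not reproduce MeTA's `sgd` class: the function names are hypothetical, and the exact interpretations used here (L2 regularization for `lambda`, a constant input feature for `bias`, and stopping once the mean log loss drops below `gamma`) are assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not MeTA's API: score = w . x + w_bias * bias.
// The last entry of w is the weight attached to the constant bias feature.
double sketch_score(const std::vector<double>& w,
                    const std::vector<double>& x, double bias)
{
    double s = w.back() * bias;
    for (std::size_t j = 0; j < x.size(); ++j)
        s += w[j] * x[j];
    return s;
}

// Hypothetical trainer, not MeTA's API. Labels in ys are 0 or 1, and every
// row of xs is assumed to have the same number of features.
std::vector<double> sketch_train(const std::vector<std::vector<double>>& xs,
                                 const std::vector<int>& ys, double alpha,
                                 double gamma, double bias, double lambda,
                                 std::uint64_t max_iter)
{
    if (xs.empty())
        return {0.0};

    std::vector<double> w(xs[0].size() + 1, 0.0);
    for (std::uint64_t iter = 0; iter < max_iter; ++iter)
    {
        double total_loss = 0.0;
        for (std::size_t n = 0; n < xs.size(); ++n)
        {
            double p = 1.0 / (1.0 + std::exp(-sketch_score(w, xs[n], bias)));
            total_loss += ys[n] == 1 ? -std::log(p) : -std::log(1.0 - p);

            // Gradient step: alpha scales the update, lambda shrinks weights.
            double err = p - ys[n];
            for (std::size_t j = 0; j < xs[n].size(); ++j)
                w[j] -= alpha * (err * xs[n][j] + lambda * w[j]);
            w.back() -= alpha * err * bias;
        }
        // Assumed stopping rule: mean log loss below the error threshold.
        if (total_loss / xs.size() < gamma)
            break;
    }
    return w;
}
```

In this sketch a larger `alpha` takes bigger steps, a larger `lambda` pulls the weights toward zero, and `max_iter` bounds the number of passes when the `gamma` threshold is never reached.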

Member Function Documentation

std::unordered_map< class_label, double > meta::classify::logistic_regression::predict ( doc_id d_id )

Obtains the probability that the given document belongs to each class.

Parameters

d_id	The document to obtain class-membership probabilities for

Returns: a map from class label to probability of membership

class_label meta::classify::logistic_regression::classify ( doc_id d_id )

override virtual

Classifies a document into a specific group, as determined by training data.

Parameters

d_id	The document to classify

Returns: the class it belongs to

Implements meta::classify::classifier.
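Since the detailed description states that the classification result is the class with the highest probability, the selection step can be sketched as an argmax over the map returned by predict(). This is an illustration rather than the library's code: `most_probable` is a hypothetical helper, and `class_label_sketch` is a plain `std::string` stand-in for MeTA's `class_label` type.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <utility>

// Stand-in for meta's class_label type (assumption for this sketch).
using class_label_sketch = std::string;

// Hypothetical helper: returns the label with the largest probability,
// or an empty label if the map is empty.
class_label_sketch most_probable(
    const std::unordered_map<class_label_sketch, double>& probs)
{
    using entry = std::pair<const class_label_sketch, double>;
    auto best = std::max_element(
        probs.begin(), probs.end(),
        [](const entry& a, const entry& b) { return a.second < b.second; });
    return best == probs.end() ? class_label_sketch{} : best->first;
}
```

When two classes tie for the highest probability, which one is returned depends on the map's iteration order, so callers that need deterministic tie-breaking would have to add their own rule.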

void meta::classify::logistic_regression::train ( const std::vector< doc_id > & docs )

override virtual

Creates a classification model based on training documents.

Parameters

docs	The training documents

Implements meta::classify::classifier.

The documentation for this class was generated from the following files:

/home/chase/projects/meta/include/classify/classifier/logistic_regression.h
/home/chase/projects/meta/src/classify/classifier/logistic_regression.cpp
