Contains various ways to segment text and deal with preprocessed files (POS tags, parse trees, etc). More...

Namespaces
	filters
	Contains filters that mutate existing token streams in a filter chain.

	tokenizers
	Contains tokenizers that start off a filter chain.

Classes
class	analyzer
	A class that provides a framework to produce token counts from documents. More...

class	analyzer_factory
	Factory that is responsible for creating analyzers from configuration files. More...

class	branch_featurizer
	Tokenizes parse trees by extracting branching factor features. More...

class	depth_featurizer
	Tokenizes parse trees by extracting depth features. More...

class	featurizer_factory
	Factory that is responsible for creating tree featurizers from configuration files. More...

class	filter_factory
	Factory that is responsible for creating filters during analyzer construction. More...

class	libsvm_analyzer
	libsvm_analyzer tokenizes documents that have been created from a line_corpus, where each line is in libsvm input format and stored in the document's content field. More...

class	multi_analyzer
	The multi_analyzer class contains more than one analyzer. More...

class	ngram_analyzer
	Analyzes documents based on an ngram word model, where the value for n is supplied by the user. More...

class	ngram_pos_analyzer
	Analyzes documents based on part-of-speech tags instead of words. More...

class	ngram_word_analyzer
	Analyzes documents using their tokenized words. More...

class	semi_skeleton_featurizer
	Tokenizes parse trees by keeping track of only a single node label and the underlying tree structure. More...

class	skeleton_featurizer
	Tokenizes parse trees by only tokenizing the tree structure itself. More...

class	subtree_featurizer
	Tokenizes parse trees by counting occurrences of subtrees in a document's parse tree. More...

class	tag_featurizer
	Tokenizes parse trees by looking at labels of leaf and interior nodes. More...

class	token_stream
	Base class that represents a stream of tokens that have been extracted from a document. More...

class	tree_analyzer
	Base class for analyzers that tokenize using parse tree features. More...

class	tree_featurizer
	Base class for featurizers that convert trees into features in a document. More...

Functions
template<class Analyzer >
std::unique_ptr< analyzer >	make_analyzer (const cpptoml::table &, const cpptoml::table &)
	Factory method for creating an analyzer. More...

template<class Analyzer >
void	register_analyzer ()
	Registration method for analyzers. More...

template<class Tokenizer >
void	register_tokenizer ()
	Registration method for tokenizers. More...

template<class Filter >
void	register_filter ()
	Registration method for filters. More...

template<>
std::unique_ptr< analyzer >	make_analyzer< ngram_word_analyzer > (const cpptoml::table &, const cpptoml::table &)
	Specialization of the factory method for creating ngram_word_analyzers.

template<class Featurizer >
std::unique_ptr< tree_featurizer >	make_featurizer ()
	Factory method for creating a featurizer.

template<class Featurizer >
void	register_featurizer ()
	Registration method for featurizers. More...

template<>
std::unique_ptr< analyzer >	make_analyzer< tree_analyzer > (const cpptoml::table &, const cpptoml::table &)
	Specialization of the factory method for creating tree analyzers.

template<>
std::unique_ptr< analyzer >	make_analyzer< ngram_pos_analyzer > (const cpptoml::table &, const cpptoml::table &)
	Specialization of the factory method for creating ngram_pos_analyzers.

Detailed Description

Contains various ways to segment text and deal with preprocessed files (POS tags, parse trees, etc).
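
As a rough illustration of the filter chain idea named in the namespace summaries (a tokenizer starts the chain, filters mutate the stream it produces), here is a minimal, self-contained sketch. Every type in it (this simplified token_stream, whitespace_tokenizer, lowercase_filter) and the helper tokenize_lowercase are hypothetical stand-ins written for this example; they are not the library's actual classes or signatures.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified token stream interface (an assumption of this
// sketch, not the library's real token_stream).
class token_stream
{
  public:
    virtual ~token_stream() = default;
    virtual void set_content(const std::string& content) = 0;
    virtual bool has_next() const = 0;
    virtual std::string next() = 0;
};

// Starts the chain: splits the content on whitespace.
class whitespace_tokenizer : public token_stream
{
  public:
    void set_content(const std::string& content) override
    {
        tokens_.clear();
        idx_ = 0;
        std::istringstream in{content};
        std::string tok;
        while (in >> tok)
            tokens_.push_back(tok);
    }

    bool has_next() const override { return idx_ < tokens_.size(); }

    std::string next() override { return tokens_[idx_++]; }

  private:
    std::vector<std::string> tokens_;
    std::size_t idx_ = 0;
};

// A filter: wraps another stream and lowercases each token it yields.
class lowercase_filter : public token_stream
{
  public:
    explicit lowercase_filter(std::unique_ptr<token_stream> source)
        : source_{std::move(source)}
    {
    }

    void set_content(const std::string& content) override
    {
        source_->set_content(content);
    }

    bool has_next() const override { return source_->has_next(); }

    std::string next() override
    {
        std::string tok = source_->next();
        std::transform(tok.begin(), tok.end(), tok.begin(),
                       [](unsigned char c) {
                           return static_cast<char>(std::tolower(c));
                       });
        return tok;
    }

  private:
    std::unique_ptr<token_stream> source_;
};

// Builds the chain tokenizer -> filter and drains it into a vector.
std::vector<std::string> tokenize_lowercase(const std::string& text)
{
    std::unique_ptr<token_stream> chain = std::make_unique<lowercase_filter>(
        std::make_unique<whitespace_tokenizer>());
    chain->set_content(text);

    std::vector<std::string> out;
    while (chain->has_next())
        out.push_back(chain->next());
    return out;
}
```

Because each filter owns the stream it wraps, further filters can be stacked by wrapping the chain again, and the caller only ever talks to the outermost stream.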

Function Documentation

template<class Analyzer >

std::unique_ptr<analyzer> meta::analyzers::make_analyzer (const cpptoml::table &, const cpptoml::table &)

Factory method for creating an analyzer.

Specialize this if your analyzer requires special construction behavior.
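
The specialization pattern described here might look like the following sketch. It is written under stated assumptions: config is a hypothetical std::map stand-in for cpptoml::table, the analyzer base class is reduced to one method, simple_analyzer and ngram_analyzer_sketch are invented for the example, and the reading of the two parameters as "global configuration" and "this analyzer's own configuration" is a guess that the signature above does not confirm.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for cpptoml::table, used only in this sketch.
using config = std::map<std::string, std::string>;

// Reduced analyzer base class (an assumption, not the library's interface).
class analyzer
{
  public:
    virtual ~analyzer() = default;
    virtual std::string describe() const = 0;
};

// An analyzer that needs no construction arguments.
class simple_analyzer : public analyzer
{
  public:
    std::string describe() const override { return "simple"; }
};

// An analyzer that cannot be default-constructed: it needs a value for n.
class ngram_analyzer_sketch : public analyzer
{
  public:
    explicit ngram_analyzer_sketch(int n) : n_{n} {}

    std::string describe() const override
    {
        return "ngram(n=" + std::to_string(n_) + ")";
    }

  private:
    int n_;
};

// Generic factory method: default-constructs the requested analyzer and
// ignores both configuration arguments.
template <class Analyzer>
std::unique_ptr<analyzer> make_analyzer(const config& /* global */,
                                        const config& /* local */)
{
    return std::make_unique<Analyzer>();
}

// Specialization for an analyzer with special construction behavior: it
// reads "ngram" from the (assumed) analyzer-local configuration and falls
// back to 1 when the key is absent.
template <>
std::unique_ptr<analyzer>
make_analyzer<ngram_analyzer_sketch>(const config& /* global */,
                                     const config& local)
{
    auto it = local.find("ngram");
    int n = (it != local.end()) ? std::stoi(it->second) : 1;
    return std::make_unique<ngram_analyzer_sketch>(n);
}
```

Callers use the same expression, make_analyzer<T>(global, local), for every analyzer type; only types with unusual constructors need a specialization, which must be declared before its first use.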

template<class Analyzer >

void meta::analyzers::register_analyzer ( )

Registration method for analyzers.

Clients should use this method to register any new analyzers they write.

template<class Tokenizer >

void meta::analyzers::register_tokenizer ( )

Registration method for tokenizers.

Clients should use this method to register any new tokenizers they write.

template<class Filter >

void meta::analyzers::register_filter ( )

Registration method for filters.

Clients should use this method to register any new filters they write.

template<class Featurizer >

void meta::analyzers::register_featurizer ( )

Registration method for featurizers.

Clients should use this method to register any new featurizers they write.
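
The registration functions above all take the type to register as a template argument and no runtime arguments. One common way to build such a function is a singleton factory keyed by a string identifier, sketched below for filters. This is a generic illustration, not the library's implementation: filter_registry, the reduced filter interface, reverse_filter, and the requirement that each registered class expose a static id member are all assumptions made for the example.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Reduced filter interface (hypothetical, for this sketch only).
class filter
{
  public:
    virtual ~filter() = default;
    virtual std::string apply(const std::string& token) const = 0;
};

// Hypothetical singleton factory that maps string ids to creation functions.
class filter_registry
{
  public:
    using creator = std::function<std::unique_ptr<filter>()>;

    static filter_registry& get()
    {
        static filter_registry instance;
        return instance;
    }

    // Stores a creator under the id; rejects duplicate ids.
    void add(const std::string& id, creator fn)
    {
        if (!creators_.emplace(id, std::move(fn)).second)
            throw std::invalid_argument{"duplicate id: " + id};
    }

    // Creates a new filter for a previously registered id.
    std::unique_ptr<filter> create(const std::string& id) const
    {
        auto it = creators_.find(id);
        if (it == creators_.end())
            throw std::out_of_range{"unregistered id: " + id};
        return it->second();
    }

  private:
    std::map<std::string, creator> creators_;
};

// Registration function: assumes the class has a static `id` member and a
// default constructor.
template <class Filter>
void register_filter()
{
    filter_registry::get().add(Filter::id, []() -> std::unique_ptr<filter> {
        return std::make_unique<Filter>();
    });
}

// A client-written filter that reverses each token.
class reverse_filter : public filter
{
  public:
    static const std::string id;

    std::string apply(const std::string& token) const override
    {
        return std::string(token.rbegin(), token.rend());
    }
};

const std::string reverse_filter::id = "reverse";
```

In this sketch a client calls register_filter<reverse_filter>() once at startup, after which filter_registry::get().create("reverse") can build the filter from an identifier read out of a configuration file. Registering the same id twice throws, so the call belongs in one initialization path.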
