ModErn Text Analysis
META Enumerates Textual Applications
meta::analyzers Namespace Reference

Contains various ways to segment text and deal with preprocessed files (POS tags, parse trees, etc).

Namespaces

 filters
 Contains filters that mutate existing token streams in a filter chain.
 
 tokenizers
 Contains tokenizers that start off a filter chain.
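
The split between tokenizers (which start a chain) and filters (which wrap an existing stream) can be illustrated with a minimal sketch. Every name below (toy_stream, whitespace_tokenizer, lowercase_filter, collect) is a hypothetical stand-in invented for illustration; none of them is taken from the meta::analyzers API, whose actual token_stream interface may differ.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a token stream interface (not the library's API).
class toy_stream
{
  public:
    virtual ~toy_stream() = default;
    virtual bool has_next() const = 0;
    virtual std::string next() = 0;
};

// A "tokenizer" starts the chain: it turns raw text into tokens.
class whitespace_tokenizer : public toy_stream
{
  public:
    explicit whitespace_tokenizer(const std::string& text)
    {
        std::istringstream in{text};
        for (std::string tok; in >> tok;)
            tokens_.push_back(tok);
    }

    bool has_next() const override { return pos_ < tokens_.size(); }

    std::string next() override
    {
        assert(has_next()); // callers must check has_next() first
        return tokens_[pos_++];
    }

  private:
    std::vector<std::string> tokens_;
    std::size_t pos_ = 0;
};

// A "filter" wraps another stream and transforms the tokens it produces.
class lowercase_filter : public toy_stream
{
  public:
    explicit lowercase_filter(std::unique_ptr<toy_stream> source)
        : source_{std::move(source)}
    {
    }

    bool has_next() const override { return source_->has_next(); }

    std::string next() override
    {
        std::string tok = source_->next();
        for (char& c : tok)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        return tok;
    }

  private:
    std::unique_ptr<toy_stream> source_;
};

// Drains a chain into a vector of tokens.
std::vector<std::string> collect(toy_stream& stream)
{
    std::vector<std::string> out;
    while (stream.has_next())
        out.push_back(stream.next());
    return out;
}
```

In this sketch a chain is built from the inside out: the tokenizer is constructed first and each filter takes ownership of the stream it wraps, so further filters could be stacked in the same way.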
 

Classes

class analyzer
 A class that provides a framework to produce token counts from documents.

class analyzer_factory
 Factory that is responsible for creating analyzers from configuration files.

class branch_featurizer
 Tokenizes parse trees by extracting branching factor features.

class depth_featurizer
 Tokenizes parse trees by extracting depth features.

class featurizer_factory
 Factory that is responsible for creating tree featurizers from configuration files.

class filter_factory
 Factory that is responsible for creating filters during analyzer construction.

class libsvm_analyzer
 Tokenizes documents that have been created from a line_corpus, where each line is in libsvm input format and stored in the document's content field.

class multi_analyzer
 Contains more than one analyzer.

class ngram_analyzer
 Analyzes documents based on an n-gram word model, where the value for n is supplied by the user.

class ngram_pos_analyzer
 Analyzes documents based on part-of-speech tags instead of words.

class ngram_word_analyzer
 Analyzes documents using their tokenized words.

class semi_skeleton_featurizer
 Tokenizes parse trees by keeping track of only a single node label and the underlying tree structure.

class skeleton_featurizer
 Tokenizes parse trees by only tokenizing the tree structure itself.

class subtree_featurizer
 Tokenizes parse trees by counting occurrences of subtrees in a document's parse tree.

class tag_featurizer
 Tokenizes parse trees by looking at labels of leaf and interior nodes.

class token_stream
 Base class that represents a stream of tokens that have been extracted from a document.

class tree_analyzer
 Base class for analyzers that tokenize using parse tree features.

class tree_featurizer
 Base class for featurizers that convert trees into features in a document.
 

Functions

template<class Analyzer >
std::unique_ptr< analyzer > make_analyzer (const cpptoml::table &, const cpptoml::table &)
 Factory method for creating an analyzer.

template<class Analyzer >
void register_analyzer ()
 Registration method for analyzers.

template<class Tokenizer >
void register_tokenizer ()
 Registration method for tokenizers.

template<class Filter >
void register_filter ()
 Registration method for filters.

template<>
std::unique_ptr< analyzer > make_analyzer< ngram_word_analyzer > (const cpptoml::table &, const cpptoml::table &)
 Specialization of the factory method for creating ngram_word_analyzers.

template<class Featurizer >
std::unique_ptr< tree_featurizer > make_featurizer ()
 Factory method for creating a featurizer.

template<class Featurizer >
void register_featurizer ()
 Registration method for featurizers.

template<>
std::unique_ptr< analyzer > make_analyzer< tree_analyzer > (const cpptoml::table &, const cpptoml::table &)
 Specialization of the factory method for creating tree analyzers.

template<>
std::unique_ptr< analyzer > make_analyzer< ngram_pos_analyzer > (const cpptoml::table &, const cpptoml::table &)
 Specialization of the factory method for creating ngram_pos_analyzers.
 

Variables

 stream_ {other.stream_->clone()}
 
 crf_ {other.crf_}
 

Detailed Description

Contains various ways to segment text and deal with preprocessed files (POS tags, parse trees, etc).

Function Documentation

template<class Analyzer >
std::unique_ptr<analyzer> meta::analyzers::make_analyzer (const cpptoml::table &, const cpptoml::table &)

Factory method for creating an analyzer.

This should be specialized if your analyzer requires special construction behavior.
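
The idea of specializing a factory function template for types that need special construction can be sketched as follows. This is an illustration only: toy_config (a plain string map standing in for cpptoml::table), toy_analyzer, simple_analyzer, toy_ngram_analyzer, make_toy_analyzer, and the "ngram" key are all hypothetical names, and the library's real factory may behave differently.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a configuration table (not cpptoml::table).
using toy_config = std::unordered_map<std::string, std::string>;

// Hypothetical analyzer base class.
class toy_analyzer
{
  public:
    virtual ~toy_analyzer() = default;
    virtual std::string describe() const = 0;
};

// An analyzer that needs no configuration to be constructed.
class simple_analyzer : public toy_analyzer
{
  public:
    std::string describe() const override { return "simple"; }
};

// An analyzer that cannot be default-constructed: it needs a value for n.
class toy_ngram_analyzer : public toy_analyzer
{
  public:
    explicit toy_ngram_analyzer(int n) : n_{n} {}
    std::string describe() const override
    {
        return "ngram(n=" + std::to_string(n_) + ")";
    }

  private:
    int n_;
};

// Generic factory: ignores both tables and default-constructs the analyzer.
template <class Analyzer>
std::unique_ptr<toy_analyzer> make_toy_analyzer(const toy_config&,
                                                const toy_config&)
{
    return std::make_unique<Analyzer>();
}

// Specialization: reads the (assumed) "ngram" key from the local table.
template <>
std::unique_ptr<toy_analyzer>
make_toy_analyzer<toy_ngram_analyzer>(const toy_config&,
                                      const toy_config& local)
{
    auto it = local.find("ngram");
    if (it == local.end())
        throw std::runtime_error{"ngram size missing from configuration"};
    return std::make_unique<toy_ngram_analyzer>(std::stoi(it->second));
}
```

Callers use the same expression, make_toy_analyzer<T>(global, local), for every analyzer type; only types with unusual construction requirements need their own specialization.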

template<class Analyzer >
void meta::analyzers::register_analyzer ( )

Registration method for analyzers.

Clients should use this method to register any new analyzers they write.

template<class Tokenizer >
void meta::analyzers::register_tokenizer ( )

Registration method for tokenizers.

Clients should use this method to register any new tokenizers they write.

template<class Filter >
void meta::analyzers::register_filter ( )

Registration method for filters.

Clients should use this method to register any new filters they write.

template<class Featurizer >
void meta::analyzers::register_featurizer ( )

Registration method for featurizers.

Clients should use this method to register any new featurizers they write.
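
The registration functions in this namespace share one pattern: a function template associates a client-written class with a factory so it can later be created by name. A minimal sketch of that pattern is given below. The names toy_filter, toy_registry, register_toy_filter, and shout_filter are hypothetical, and the convention of a static id member on each registered class is an assumption made for this example, not a statement about the library's requirements.

```cpp
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical base class for registrable components.
class toy_filter
{
  public:
    virtual ~toy_filter() = default;
    virtual std::string apply(const std::string& token) const = 0;
};

// Hypothetical registry mapping string identifiers to creation functions.
class toy_registry
{
  public:
    using creator = std::function<std::unique_ptr<toy_filter>()>;

    static toy_registry& get()
    {
        static toy_registry instance;
        return instance;
    }

    // Adds a creator; rejects identifiers that are already registered.
    void add(const std::string& id, creator fn)
    {
        if (!creators_.emplace(id, std::move(fn)).second)
            throw std::runtime_error{"duplicate id: " + id};
    }

    // Creates a new instance of the component registered under id.
    std::unique_ptr<toy_filter> create(const std::string& id) const
    {
        auto it = creators_.find(id);
        if (it == creators_.end())
            throw std::out_of_range{"unknown id: " + id};
        return it->second();
    }

  private:
    std::unordered_map<std::string, creator> creators_;
};

// Registration function template: assumes Filter exposes a static id member.
template <class Filter>
void register_toy_filter()
{
    toy_registry::get().add(
        Filter::id, [] { return std::unique_ptr<toy_filter>{new Filter}; });
}

// A client-written filter that appends "!" to every token.
class shout_filter : public toy_filter
{
  public:
    static const std::string id;
    std::string apply(const std::string& token) const override
    {
        return token + "!";
    }
};

const std::string shout_filter::id = "shout";
```

After a single call to register_toy_filter<shout_filter>(), the filter can be created by its identifier with toy_registry::get().create("shout"), which is what lets configuration-driven code construct classes it does not know about at compile time.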