ModErn Text Analysis
META Enumerates Textual Applications
Contains various ways to segment text and deal with preprocessed files (POS tags, parse trees, etc.).
Namespaces
filters
Contains filters that mutate existing token streams in a filter chain.
tokenizers
Contains tokenizers that start off a filter chain.
Classes
class analyzer
A class that provides a framework to produce token counts from documents.
class analyzer_factory
Factory that is responsible for creating analyzers from configuration files.
class branch_featurizer
Tokenizes parse trees by extracting branching factor features.
class depth_featurizer
Tokenizes parse trees by extracting depth features.
class featurizer_factory
Factory that is responsible for creating tree featurizers from configuration files.
class filter_factory
Factory that is responsible for creating filters during analyzer construction.
class libsvm_analyzer
libsvm_analyzer tokenizes documents that have been created from a line_corpus, where each line is in libsvm input format and stored in the document's content field.
class multi_analyzer
The multi_analyzer class contains more than one analyzer.
class ngram_analyzer
Analyzes documents based on an ngram word model, where the value for n is supplied by the user.
class ngram_pos_analyzer
Analyzes documents based on part-of-speech tags instead of words.
class ngram_word_analyzer
Analyzes documents using their tokenized words.
class semi_skeleton_featurizer
Tokenizes parse trees by keeping track of only a single node label and the underlying tree structure.
class skeleton_featurizer
Tokenizes parse trees by only tokenizing the tree structure itself.
class subtree_featurizer
Tokenizes parse trees by counting occurrences of subtrees in a document's parse tree.
class tag_featurizer
Tokenizes parse trees by looking at labels of leaf and interior nodes.
class token_stream
Base class that represents a stream of tokens that have been extracted from a document.
class tree_analyzer
Base class for tokenizing using parse tree features.
class tree_featurizer
Base class for featurizers that convert trees into features in a document.
Functions
template<class Analyzer>
std::unique_ptr<analyzer> make_analyzer(const cpptoml::table &, const cpptoml::table &)
Factory method for creating an analyzer.
template<class Analyzer>
void register_analyzer()
Registration method for analyzers.
template<class Tokenizer>
void register_tokenizer()
Registration method for tokenizers.
template<class Filter>
void register_filter()
Registration method for filters.
template<>
std::unique_ptr<analyzer> make_analyzer<ngram_word_analyzer>(const cpptoml::table &, const cpptoml::table &)
Specialization of the factory method for creating ngram_word_analyzers.
template<class Featurizer>
std::unique_ptr<tree_featurizer> make_featurizer()
Factory method for creating a featurizer.
template<class Featurizer>
void register_featurizer()
Registration method for featurizers.
template<>
std::unique_ptr<analyzer> make_analyzer<tree_analyzer>(const cpptoml::table &, const cpptoml::table &)
Specialization of the factory method for creating tree analyzers.
template<>
std::unique_ptr<analyzer> make_analyzer<ngram_pos_analyzer>(const cpptoml::table &, const cpptoml::table &)
Specialization of the factory method for creating ngram_pos_analyzers.
template<class Analyzer>
std::unique_ptr<analyzer> meta::analyzers::make_analyzer(const cpptoml::table &, const cpptoml::table &)
Factory method for creating an analyzer.
This should be specialized if your given analyzer requires special construction behavior.
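The specialization mechanism described above can be sketched without the library itself. The following is a minimal, self-contained illustration and not MeTA's implementation: `config` is a hypothetical stand-in for `cpptoml::table`, the `analyzer`, `simple_analyzer`, and `ngram_analyzer` types here are toy classes, and treating the first table as global settings and the second as analyzer-specific settings is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical stand-in for cpptoml::table: a minimal config holder.
struct config
{
    std::size_t ngram = 1;
};

// Toy stand-in for the analyzer base class.
struct analyzer
{
    virtual ~analyzer() = default;
    virtual std::string describe() const = 0;
};

// An analyzer that can be default-constructed.
struct simple_analyzer : analyzer
{
    std::string describe() const override { return "simple"; }
};

// An analyzer whose constructor needs a value read from the config.
struct ngram_analyzer : analyzer
{
    explicit ngram_analyzer(std::size_t n) : n_{n}
    {
        assert(n > 0); // an n-gram model needs at least n = 1
    }

    std::string describe() const override
    {
        return "ngram(" + std::to_string(n_) + ")";
    }

    std::size_t n_;
};

// Generic factory: sufficient for any default-constructible Analyzer.
template <class Analyzer>
std::unique_ptr<analyzer> make_analyzer(const config&, const config&)
{
    return std::make_unique<Analyzer>();
}

// Specialization: ngram_analyzer pulls its n from the second (assumed
// analyzer-specific) config before construction.
template <>
std::unique_ptr<analyzer> make_analyzer<ngram_analyzer>(const config&,
                                                        const config& local)
{
    return std::make_unique<ngram_analyzer>(local.ngram);
}
```

Callers use the same `make_analyzer<T>(global, local)` spelling in both cases; only types that need extra construction arguments require a specialization.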
template<class Analyzer>
void meta::analyzers::register_analyzer()
Registration method for analyzers.
Clients should use this method to register any new analyzers they write.
template<class Tokenizer>
void meta::analyzers::register_tokenizer()
Registration method for tokenizers.
Clients should use this method to register any new tokenizers they write.
template<class Filter>
void meta::analyzers::register_filter()
Registration method for filters.
Clients should use this method to register any new filters they write.
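To give a rough idea of what a client-written filter in a filter chain might look like, here is a self-contained sketch. It deliberately does not use MeTA's `token_stream` interface: `toy_stream`, `has_next()`, `whitespace_tokenizer`, `lowercase_filter`, and `drain()` are all hypothetical names chosen for illustration. The tokenizer starts the chain, and the filter wraps another stream and transforms the tokens it yields.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stream interface (not MeTA's token_stream).
class toy_stream
{
  public:
    virtual ~toy_stream() = default;
    virtual bool has_next() const = 0;
    virtual std::string next() = 0;
};

// Tokenizer: starts the chain by splitting text on whitespace.
class whitespace_tokenizer : public toy_stream
{
  public:
    explicit whitespace_tokenizer(const std::string& text)
    {
        std::istringstream in{text};
        std::string tok;
        while (in >> tok)
            tokens_.push_back(tok);
    }

    bool has_next() const override { return pos_ < tokens_.size(); }

    std::string next() override
    {
        assert(has_next()); // callers must check has_next() first
        return tokens_[pos_++];
    }

  private:
    std::vector<std::string> tokens_;
    std::size_t pos_ = 0;
};

// Filter: wraps another stream and lowercases each token it yields.
class lowercase_filter : public toy_stream
{
  public:
    explicit lowercase_filter(std::unique_ptr<toy_stream> source)
        : source_{std::move(source)}
    {
    }

    bool has_next() const override { return source_->has_next(); }

    std::string next() override
    {
        std::string tok = source_->next();
        std::transform(tok.begin(), tok.end(), tok.begin(),
                       [](unsigned char c) {
                           return static_cast<char>(std::tolower(c));
                       });
        return tok;
    }

  private:
    std::unique_ptr<toy_stream> source_;
};

// Drains a chain into a vector so the result is easy to inspect.
std::vector<std::string> drain(toy_stream& stream)
{
    std::vector<std::string> out;
    while (stream.has_next())
        out.push_back(stream.next());
    return out;
}
```

Because each filter owns the stream it wraps, chains of arbitrary length can be built by nesting constructors, with the tokenizer at the innermost position.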
template<class Featurizer>
void meta::analyzers::register_featurizer()
Registration method for featurizers.
Clients should use this method to register any new featurizers they write.
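The registration functions above all follow the same general idea: a client type is associated with an identifier so that a factory can later construct it by name. The sketch below shows one common way to build such a registry. It is an illustration under stated assumptions, not the library's code: `toy_factory`, its `add()` and `create()` members, and the static `id` member on the registered type are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Toy stand-in for the analyzer base class.
struct analyzer
{
    virtual ~analyzer() = default;
    virtual std::string name() const = 0;
};

// Hypothetical registry mapping string ids to creation functions.
class toy_factory
{
  public:
    using creator = std::function<std::unique_ptr<analyzer>()>;

    // Single shared instance, created on first use.
    static toy_factory& get()
    {
        static toy_factory instance;
        return instance;
    }

    // Associates an id with a creator; rejects duplicate ids.
    void add(const std::string& id, creator fn)
    {
        assert(fn && "creator must be callable");
        if (!creators_.emplace(id, std::move(fn)).second)
            throw std::invalid_argument{"duplicate id: " + id};
    }

    // Constructs the type registered under id, or throws if unknown.
    std::unique_ptr<analyzer> create(const std::string& id) const
    {
        auto it = creators_.find(id);
        if (it == creators_.end())
            throw std::out_of_range{"unknown id: " + id};
        return it->second();
    }

  private:
    std::unordered_map<std::string, creator> creators_;
};

// Registration helper: assumes Analyzer exposes a static `id` member
// and is default-constructible.
template <class Analyzer>
void register_analyzer()
{
    toy_factory::get().add(Analyzer::id,
                           [] { return std::make_unique<Analyzer>(); });
}

// A client-written analyzer that is registered under "my-analyzer".
struct my_analyzer : analyzer
{
    static const std::string id;
    std::string name() const override { return id; }
};

const std::string my_analyzer::id = "my-analyzer";
```

In this sketch a client calls `register_analyzer<my_analyzer>()` once at startup, after which `toy_factory::get().create("my-analyzer")` returns a new instance. The same shape would apply to tokenizers, filters, and featurizers with their own registries.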