ModErn Text Analysis
META Enumerates Textual Applications
Contains various ways to segment text and deal with preprocessed files (POS tags, parse trees, etc.).
Namespaces | |
| filters | |
| Contains filters that mutate existing token streams in a filter chain. | |
| tokenizers | |
| Contains tokenizers that start off a filter chain. | |
Classes | |
| class | analyzer |
| A class that provides a framework to produce token counts from documents. | |
| class | analyzer_factory |
| Factory that is responsible for creating analyzers from configuration files. | |
| class | branch_featurizer |
| Tokenizes parse trees by extracting branching factor features. | |
| class | depth_featurizer |
| Tokenizes parse trees by extracting depth features. | |
| class | featurizer_factory |
| Factory that is responsible for creating tree featurizers from configuration files. | |
| class | filter_factory |
| Factory that is responsible for creating filters during analyzer construction. | |
| class | libsvm_analyzer |
| Tokenizes documents that have been created from a line_corpus, where each line is in libsvm input format and stored in the document's content field. | |
| class | multi_analyzer |
| Contains more than one analyzer. | |
| class | ngram_analyzer |
| Analyzes documents based on an ngram word model, where the value for n is supplied by the user. | |
| class | ngram_pos_analyzer |
| Analyzes documents based on part-of-speech tags instead of words. | |
| class | ngram_word_analyzer |
| Analyzes documents using their tokenized words. | |
| class | semi_skeleton_featurizer |
| Tokenizes parse trees by keeping track of only a single node label and the underlying tree structure. | |
| class | skeleton_featurizer |
| Tokenizes parse trees by only tokenizing the tree structure itself. | |
| class | subtree_featurizer |
| Tokenizes parse trees by counting occurrences of subtrees in a document's parse tree. | |
| class | tag_featurizer |
| Tokenizes parse trees by looking at labels of leaf and interior nodes. | |
| class | token_stream |
| Base class that represents a stream of tokens that have been extracted from a document. | |
| class | tree_analyzer |
| Base class for tokenizing using parse tree features. | |
| class | tree_featurizer |
| Base class for featurizers that convert trees into features in a document. | |
Functions | |
| template<class Analyzer> | |
| std::unique_ptr<analyzer> | make_analyzer (const cpptoml::table &, const cpptoml::table &) |
| Factory method for creating an analyzer. | |
| template<class Analyzer> | |
| void | register_analyzer () |
| Registration method for analyzers. | |
| template<class Tokenizer> | |
| void | register_tokenizer () |
| Registration method for tokenizers. | |
| template<class Filter> | |
| void | register_filter () |
| Registration method for filters. | |
| template<> | |
| std::unique_ptr<analyzer> | make_analyzer<ngram_word_analyzer> (const cpptoml::table &, const cpptoml::table &) |
| Specialization of the factory method for creating ngram_word_analyzers. | |
| template<class Featurizer> | |
| std::unique_ptr<tree_featurizer> | make_featurizer () |
| Factory method for creating a featurizer. | |
| template<class Featurizer> | |
| void | register_featurizer () |
| Registration method for featurizers. | |
| template<> | |
| std::unique_ptr<analyzer> | make_analyzer<tree_analyzer> (const cpptoml::table &, const cpptoml::table &) |
| Specialization of the factory method for creating tree analyzers. | |
| template<> | |
| std::unique_ptr<analyzer> | make_analyzer<ngram_pos_analyzer> (const cpptoml::table &, const cpptoml::table &) |
| Specialization of the factory method for creating ngram_pos_analyzers. | |
template<class Analyzer>
std::unique_ptr<analyzer> meta::analyzers::make_analyzer (const cpptoml::table &, const cpptoml::table &)
Factory method for creating an analyzer.
Specialize this method if your analyzer requires special construction behavior.
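A minimal, self-contained sketch of the specialization pattern described above. It does not use the library's headers: `config_table` is a hypothetical stand-in for `cpptoml::table`, the parameter names `global` and `local` are assumptions (the signature above leaves them unnamed), and `plain_analyzer` and `toy_ngram_analyzer` are illustrative types only.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

namespace sketch
{
// Hypothetical stand-in for cpptoml::table: a flat key/value map.
using config_table = std::map<std::string, std::string>;

struct analyzer
{
    virtual ~analyzer() = default;
    virtual std::string describe() const = 0;
};

// Primary template: default-constructs the analyzer and ignores both tables.
template <class Analyzer>
std::unique_ptr<analyzer> make_analyzer(const config_table& /* global */,
                                        const config_table& /* local */)
{
    return std::make_unique<Analyzer>();
}

// Illustrative analyzer that needs no configuration.
struct plain_analyzer : analyzer
{
    std::string describe() const override { return "plain"; }
};

// Illustrative analyzer that cannot be default-constructed.
struct toy_ngram_analyzer : analyzer
{
    explicit toy_ngram_analyzer(int n) : n_{n} { assert(n > 0); }
    std::string describe() const override
    {
        return "ngram:" + std::to_string(n_);
    }
    int n_;
};

// Specialization: reads the (assumed) "ngram" key from the local table,
// falling back to n = 1 when the key is absent.
template <>
std::unique_ptr<analyzer>
make_analyzer<toy_ngram_analyzer>(const config_table& /* global */,
                                  const config_table& local)
{
    auto it = local.find("ngram");
    const int n = (it == local.end()) ? 1 : std::stoi(it->second);
    return std::make_unique<toy_ngram_analyzer>(n);
}
} // namespace sketch
```

Calling `sketch::make_analyzer<sketch::toy_ngram_analyzer>(global, local)` selects the specialization, while any other analyzer type falls through to the primary template.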
template<class Analyzer>
void meta::analyzers::register_analyzer ()
Registration method for analyzers.
Clients should use this method to register any new analyzers they write.
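One way such a registration method can work is sketched below, assuming a singleton registry that maps a string identifier to a factory function. The names `analyzer_registry`, `create`, and the static `id` member are hypothetical and chosen for illustration; they are not taken from the library.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

namespace sketch
{
struct analyzer
{
    virtual ~analyzer() = default;
    virtual std::string name() const = 0;
};

// Hypothetical registry: maps an identifier to a factory function.
class analyzer_registry
{
  public:
    using factory_fn = std::function<std::unique_ptr<analyzer>()>;

    static analyzer_registry& get()
    {
        static analyzer_registry instance;
        return instance;
    }

    // Adds a factory under the given id; duplicate ids are rejected.
    void add(const std::string& id, factory_fn fn)
    {
        assert(fn != nullptr);
        if (!methods_.emplace(id, std::move(fn)).second)
            throw std::runtime_error{"duplicate analyzer id: " + id};
    }

    // Looks up the id and invokes the stored factory.
    std::unique_ptr<analyzer> create(const std::string& id) const
    {
        auto it = methods_.find(id);
        if (it == methods_.end())
            throw std::runtime_error{"unknown analyzer id: " + id};
        return it->second();
    }

  private:
    std::map<std::string, factory_fn> methods_;
};

// Registration method: ties Analyzer::id to a factory for Analyzer.
template <class Analyzer>
void register_analyzer()
{
    analyzer_registry::get().add(
        Analyzer::id, [] { return std::unique_ptr<analyzer>(new Analyzer()); });
}

// Illustrative client-written analyzer.
struct char_count_analyzer : analyzer
{
    static const std::string id;
    std::string name() const override { return id; }
};
const std::string char_count_analyzer::id = "char-count";
} // namespace sketch
```

After a single call to `sketch::register_analyzer<sketch::char_count_analyzer>()`, the analyzer can be created by name via `sketch::analyzer_registry::get().create("char-count")`.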
template<class Tokenizer>
void meta::analyzers::register_tokenizer ()
Registration method for tokenizers.
Clients should use this method to register any new tokenizers they write.
template<class Filter>
void meta::analyzers::register_filter ()
Registration method for filters.
Clients should use this method to register any new filters they write.
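The relationship between tokenizers and filters (a tokenizer starts a filter chain, and each filter mutates an existing token stream) can be sketched as follows. This is a simplified model, not the library's `token_stream` interface: the member names `next` and `has_more`, and the two concrete classes, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

namespace sketch
{
// Simplified token stream interface (assumed, not the library's).
class token_stream
{
  public:
    virtual ~token_stream() = default;
    virtual std::string next() = 0;
    virtual bool has_more() const = 0;
};

// Tokenizer: the head of the chain, splits raw text on whitespace.
class whitespace_tokenizer : public token_stream
{
  public:
    explicit whitespace_tokenizer(const std::string& content)
    {
        std::istringstream in{content};
        for (std::string tok; in >> tok;)
            tokens_.push_back(tok);
    }

    std::string next() override
    {
        assert(has_more());
        return tokens_[pos_++];
    }

    bool has_more() const override { return pos_ < tokens_.size(); }

  private:
    std::vector<std::string> tokens_;
    std::size_t pos_ = 0;
};

// Filter: wraps another stream and lowercases each token it yields.
class lowercase_filter : public token_stream
{
  public:
    explicit lowercase_filter(std::unique_ptr<token_stream> source)
        : source_{std::move(source)}
    {
    }

    std::string next() override
    {
        auto tok = source_->next();
        for (auto& c : tok)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        return tok;
    }

    bool has_more() const override { return source_->has_more(); }

  private:
    std::unique_ptr<token_stream> source_;
};

// Pulls every remaining token out of a chain.
inline std::vector<std::string> drain(token_stream& stream)
{
    std::vector<std::string> out;
    while (stream.has_more())
        out.push_back(stream.next());
    return out;
}
} // namespace sketch
```

Because every filter is itself a token stream, filters can be stacked to any depth: each one owns the stream beneath it and transforms tokens as they are pulled through.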
template<class Featurizer>
void meta::analyzers::register_featurizer ()
Registration method for featurizers.
Clients should use this method to register any new featurizers they write.
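As an illustration of what a client-written featurizer might look like, the sketch below models a depth-based tree featurizer together with an argument-free `make_featurizer`, mirroring the factory signature listed above. The `node` type, the `tree_tokenize` member, the `feature_counts` map, and the `"depth-N"` feature naming are all assumptions for this example rather than the library's definitions.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

namespace sketch
{
// Hypothetical parse tree node: a label plus child subtrees.
struct node
{
    std::string label;
    std::vector<node> children;
};

// Height of a tree, counting a single leaf as height 1.
inline int height(const node& n)
{
    int tallest_child = 0;
    for (const auto& child : n.children)
        tallest_child = std::max(tallest_child, height(child));
    return tallest_child + 1;
}

// Feature name -> count, standing in for a document's feature vector.
using feature_counts = std::map<std::string, int>;

// Simplified featurizer interface (assumed, not the library's).
struct tree_featurizer
{
    virtual ~tree_featurizer() = default;
    virtual void tree_tokenize(const node& tree,
                               feature_counts& counts) const = 0;
};

// Emits one feature per tree, named after the tree's height.
struct toy_depth_featurizer : tree_featurizer
{
    void tree_tokenize(const node& tree,
                       feature_counts& counts) const override
    {
        const int h = height(tree);
        assert(h >= 1); // a tree always contains at least its root
        counts["depth-" + std::to_string(h)] += 1;
    }
};

// Factory method: featurizers here need no configuration.
template <class Featurizer>
std::unique_ptr<tree_featurizer> make_featurizer()
{
    return std::make_unique<Featurizer>();
}
} // namespace sketch
```

Applying the featurizer to several parse trees from one document accumulates a count per observed depth, which is the kind of token count an analyzer can then consume.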