ModErn Text Analysis
META Enumerates Textual Applications
Contains tokenizers that start off a filter chain.
Classes
class character_tokenizer
    Converts documents into streams of characters.
class icu_tokenizer
    Converts documents into streams of tokens by following the Unicode standards for sentence and word segmentation.
class whitespace_tokenizer
    Converts documents into streams of whitespace-delimited tokens.
Functions
template<class Tokenizer>
std::unique_ptr<token_stream> make_tokenizer(const cpptoml::table &)
    Factory method for creating a tokenizer.
template<class Tokenizer>
std::unique_ptr<token_stream> meta::analyzers::tokenizers::make_tokenizer(const cpptoml::table &)
Factory method for creating a tokenizer.
Specialize this template if your tokenizer requires special construction behavior.
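The specialization pattern can be sketched in isolation. The following is a minimal, self-contained illustration, not the library's implementation. `config_table` is a hypothetical stand-in for `cpptoml::table`, the `token_stream` base class here is a simplified mock, and `simple_tokenizer`, `delimiter_tokenizer`, and the `"delimiter"` key are invented for the example. It also assumes that the generic factory simply default-constructs the tokenizer, which this page does not state.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for cpptoml::table; the real class has a richer API.
using config_table = std::map<std::string, std::string>;

// Simplified mock of the token_stream base class.
class token_stream {
  public:
    virtual ~token_stream() = default;
    virtual std::string describe() const = 0;
};

// Hypothetical tokenizer that is default-constructible, so the generic
// factory is sufficient for it.
class simple_tokenizer : public token_stream {
  public:
    std::string describe() const override { return "simple"; }
};

// Hypothetical tokenizer that needs a constructor argument, so it requires
// "special construction behavior".
class delimiter_tokenizer : public token_stream {
  public:
    explicit delimiter_tokenizer(char delim) : delim_{delim} {}
    std::string describe() const override {
        return std::string{"delimiter:"} + delim_;
    }

  private:
    char delim_;
};

// Generic factory. Assumption: it ignores the configuration and
// default-constructs the tokenizer.
template <class Tokenizer>
std::unique_ptr<token_stream> make_tokenizer(const config_table&) {
    return std::make_unique<Tokenizer>();
}

// Explicit specialization: reads the (hypothetical) "delimiter" key and
// falls back to a space when the key is missing or empty.
template <>
std::unique_ptr<token_stream>
make_tokenizer<delimiter_tokenizer>(const config_table& config) {
    auto it = config.find("delimiter");
    char delim = (it != config.end() && !it->second.empty())
                     ? it->second.front()
                     : ' ';
    return std::make_unique<delimiter_tokenizer>(delim);
}

// Small self-check exercising both the generic and the specialized factory.
inline void check_example() {
    config_table config{{"delimiter", ","}};
    assert(make_tokenizer<simple_tokenizer>(config)->describe() == "simple");
    assert(make_tokenizer<delimiter_tokenizer>(config)->describe()
           == "delimiter:,");
}
```

With this shape, calling code can always write `make_tokenizer<T>(config)` and receive a `std::unique_ptr<token_stream>`, whether or not `T` needs configuration. Only tokenizers that cannot be default-constructed need a specialization.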