ModErn Text Analysis
META Enumerates Textual Applications
Analyzes documents based on an ngram word model, where the value for n is supplied by the user.
#include <ngram_analyzer.h>
Public Member Functions
  ngram_analyzer (uint16_t n)
    Constructor.
  virtual uint16_t n_value () const
Public Member Functions inherited from meta::analyzers::analyzer
  virtual ~analyzer ()=default
    A default virtual destructor.
  virtual void tokenize (corpus::document &doc)=0
    Tokenizes a document.
  virtual std::unique_ptr< analyzer > clone () const =0
    Clones this analyzer.
Protected Member Functions
  virtual std::string wordify (const std::deque< std::string > &words) const
    Turns a list of words into an ngram string.
Private Attributes
  uint16_t n_val_
    The value of n for this ngram analyzer.
Additional Inherited Members
Static Public Member Functions inherited from meta::analyzers::analyzer
  static std::unique_ptr< analyzer > load (const cpptoml::table &config)
  static std::unique_ptr< token_stream > default_filter_chain (const cpptoml::table &config)
  static std::unique_ptr< token_stream > load_filters (const cpptoml::table &global, const cpptoml::table &config)
  static std::unique_ptr< token_stream > load_filter (std::unique_ptr< token_stream > src, const cpptoml::table &config)
  static io::parser create_parser (const corpus::document &doc, const std::string &extension, const std::string &delims)
  static std::string get_content (const corpus::document &doc)
Analyzes documents based on an ngram word model, where the value for n is supplied by the user.
This class is abstract, as it only provides the framework for ngram tokenization.
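As a rough illustration of what "abstract framework" means here, the sketch below uses a standalone stand-in class rather than MeTA's own headers. The names `ngram_base` and `whitespace_ngrams`, the string-based `tokenize` signature, and the whitespace splitting are all assumptions made for the example; only the stored n, the `n_value()` accessor, and the pure virtual `tokenize` mirror what this page documents.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for an abstract n-gram framework. This is NOT
// MeTA's ngram_analyzer; it only mirrors the shape documented on this page.
class ngram_base
{
  public:
    using window = std::deque<std::string>;

    explicit ngram_base(std::uint16_t n) : n_val_{n}
    {
        assert(n > 0); // assumption: an n of 0 is treated as a caller error
    }

    virtual ~ngram_base() = default;

    // Returns the n supplied at construction.
    virtual std::uint16_t n_value() const { return n_val_; }

    // Pure virtual, so ngram_base cannot be instantiated directly.
    // Assumption: takes a plain string and returns the word windows, unlike
    // the documented tokenize(corpus::document&), which returns void.
    virtual std::vector<window> tokenize(const std::string& text) const = 0;

  private:
    std::uint16_t n_val_;
};

// Hypothetical concrete subclass: splits on whitespace and emits every
// run of n consecutive words.
class whitespace_ngrams final : public ngram_base
{
  public:
    using ngram_base::ngram_base;

    std::vector<window> tokenize(const std::string& text) const override
    {
        const std::size_t n = n_value();
        std::vector<window> result;
        window current;
        std::istringstream stream{text};
        std::string word;
        while (stream >> word)
        {
            current.push_back(word);
            if (current.size() > n)
                current.pop_front(); // slide the window forward by one word
            if (current.size() == n)
                result.push_back(current);
        }
        return result;
    }
};
```

With n = 2, calling `tokenize("the quick brown fox")` on a `whitespace_ngrams` object yields three windows; an input with fewer than n words yields none.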
ngram_analyzer::ngram_analyzer (uint16_t n)
Constructor.
n: The value of n in ngram.
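virtual uint16_t ngram_analyzer::n_value () const [virtual]
virtual std::string ngram_analyzer::wordify (const std::deque< std::string > &words) const [protected, virtual]
Turns a list of words into an ngram string.
words: The deque representing a list of words.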
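This page does not say how the words are combined, so the following is only a minimal sketch of one plausible approach: joining them with a single separator character. The free function `wordify_sketch` and the `'_'` default separator are assumptions for illustration, not the documented behaviour of `wordify`.

```cpp
#include <deque>
#include <string>

// Hypothetical helper: joins the words of one n-gram window into a single
// string. The '_' separator is an assumption, not taken from this page.
std::string wordify_sketch(const std::deque<std::string>& words, char sep = '_')
{
    std::string result;
    bool first = true;
    for (const auto& word : words)
    {
        if (!first)
            result += sep; // separator goes between words, never at the ends
        result += word;
        first = false;
    }
    return result;
}
```

Under these assumptions, the deque `{"new", "york", "city"}` becomes `"new_york_city"`, and an empty deque becomes an empty string. Because the documented `wordify` is protected and virtual, a subclass could presumably substitute a different joining rule.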