ModErn Text Analysis
META Enumerates Textual Applications
ngram_analyzer Class Reference
Analyzes documents based on an ngram word model, where the value for n is supplied by the user.
#include <ngram_analyzer.h>
Public Member Functions
| Return type | Member | Description |
|---|---|---|
|  | ngram_analyzer (uint16_t n) | Constructor. |
| virtual uint16_t | n_value () const |  |
Public Member Functions inherited from meta::analyzers::analyzer
| Return type | Member | Description |
|---|---|---|
| virtual | ~analyzer ()=default | A default virtual destructor. |
| virtual void | tokenize (corpus::document &doc)=0 | Tokenizes a document. |
| virtual std::unique_ptr< analyzer > | clone () const =0 | Clones this analyzer. |
Protected Member Functions
| Return type | Member | Description |
|---|---|---|
| virtual std::string | wordify (const std::deque< std::string > &words) const | Turns a list of words into an ngram string. |
Private Attributes
| Type | Attribute | Description |
|---|---|---|
| uint16_t | n_val_ | The value of n for this ngram analyzer. |
Additional Inherited Members
Static Public Member Functions inherited from meta::analyzers::analyzer
| Return type | Member |
|---|---|
| static std::unique_ptr< analyzer > | load (const cpptoml::table &config) |
| static std::unique_ptr< token_stream > | default_filter_chain (const cpptoml::table &config) |
| static std::unique_ptr< token_stream > | load_filters (const cpptoml::table &global, const cpptoml::table &config) |
| static std::unique_ptr< token_stream > | load_filter (std::unique_ptr< token_stream > src, const cpptoml::table &config) |
| static io::parser | create_parser (const corpus::document &doc, const std::string &extension, const std::string &delims) |
| static std::string | get_content (const corpus::document &doc) |
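Detailed Description
Analyzes documents based on an ngram word model, where the value for n is supplied by the user.
This class is abstract: it only provides the framework for ngram tokenization. It does not implement the pure virtual tokenize() and clone() it inherits from meta::analyzers::analyzer, so a concrete subclass must supply them.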
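The page does not show how a subclass fills in this framework. The following is a minimal, self-contained sketch of the idea, not MeTA's code: document_sketch, analyzer_sketch, ngram_analyzer_sketch and word_ngram_sketch are hypothetical stand-ins, plain whitespace splitting replaces MeTA's filter chains, and the '_' separator used by wordify is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical stand-in for corpus::document: raw text plus ngram counts.
struct document_sketch {
    std::string content;
    std::map<std::string, double> counts;
};

// Mirrors the documented analyzer interface: tokenize() and clone() are pure.
class analyzer_sketch {
  public:
    virtual ~analyzer_sketch() = default;
    virtual void tokenize(document_sketch& doc) = 0;
    virtual std::unique_ptr<analyzer_sketch> clone() const = 0;
};

// Mirrors ngram_analyzer: stores n and turns words into an ngram string,
// but stays abstract because it implements neither tokenize() nor clone().
class ngram_analyzer_sketch : public analyzer_sketch {
  public:
    explicit ngram_analyzer_sketch(uint16_t n) : n_val_{n} {}
    virtual uint16_t n_value() const { return n_val_; }

  protected:
    // Assumed format: words joined with '_' (not confirmed by this page).
    virtual std::string wordify(const std::deque<std::string>& words) const {
        std::string result;
        for (std::size_t i = 0; i < words.size(); ++i) {
            if (i > 0) result += '_';
            result += words[i];
        }
        return result;
    }

  private:
    uint16_t n_val_;  // the value of n
};

// Hypothetical concrete subclass: splits on whitespace and slides a window
// of n words across the text, counting each ngram it sees.
class word_ngram_sketch : public ngram_analyzer_sketch {
  public:
    using ngram_analyzer_sketch::ngram_analyzer_sketch;

    void tokenize(document_sketch& doc) override {
        const std::size_t n = n_value();
        if (n == 0) return;  // no meaningful ngrams for n == 0
        std::istringstream in{doc.content};
        std::deque<std::string> window;
        std::string word;
        while (in >> word) {
            window.push_back(word);
            if (window.size() > n) window.pop_front();  // keep the last n words
            if (window.size() == n) doc.counts[wordify(window)] += 1;
        }
    }

    std::unique_ptr<analyzer_sketch> clone() const override {
        return std::make_unique<word_ngram_sketch>(*this);
    }
};
```

With n = 2, tokenizing "the quick brown fox" records the_quick, quick_brown and brown_fox once each. A std::deque suits the documented wordify parameter type here: each new word is pushed on the back and the oldest is popped off the front in constant time.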
Constructor & Destructor Documentation
ngram_analyzer::ngram_analyzer (uint16_t n)
Constructor.
Parameters
  n  The value of n in the ngram model.
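Member Function Documentation
uint16_t ngram_analyzer::n_value () const   [virtual]
Returns the value of n for this ngram analyzer.
std::string ngram_analyzer::wordify (const std::deque< std::string > & words) const   [protected, virtual]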
Turns a list of words into an ngram string.
Parameters
  words  The deque representing a list of words.
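Because wordify is both protected and virtual, a subclass can change how an ngram is spelled without touching its tokenization loop. A minimal sketch of that customization point, again with hypothetical names: ngram_base_sketch and spaced_ngram_sketch are stand-ins, ngram_string is a public helper added only so the result can be observed, and the '_' default separator is an assumption not confirmed by this page.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>

// Hypothetical, trimmed stand-in for ngram_analyzer showing only the hook.
class ngram_base_sketch {
  public:
    explicit ngram_base_sketch(uint16_t n) : n_val_{n} {}
    virtual ~ngram_base_sketch() = default;
    uint16_t n_value() const { return n_val_; }

    // Hypothetical public helper so callers can observe what wordify() returns.
    std::string ngram_string(const std::deque<std::string>& words) const {
        return wordify(words);  // dispatches to the most-derived override
    }

  protected:
    // Default format (assumed): words joined with '_'.
    virtual std::string wordify(const std::deque<std::string>& words) const {
        return join(words, '_');
    }

    // Joins words with a single separator character.
    static std::string join(const std::deque<std::string>& words, char sep) {
        std::string result;
        for (std::size_t i = 0; i < words.size(); ++i) {
            if (i > 0) result += sep;
            result += words[i];
        }
        return result;
    }

  private:
    uint16_t n_val_;
};

// Overrides the protected hook so ngrams read as space-separated phrases.
class spaced_ngram_sketch : public ngram_base_sketch {
  public:
    using ngram_base_sketch::ngram_base_sketch;

  protected:
    std::string wordify(const std::deque<std::string>& words) const override {
        return join(words, ' ');
    }
};
```

Keeping the hook protected means callers never build ngram strings directly; only subclasses decide the format, while any shared tokenization logic in the base keeps calling wordify unchanged.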