ModErn Text Analysis
META Enumerates Textual Applications
Analyzes documents using their tokenized words.
#include <ngram_word_analyzer.h>
Public Member Functions
| | ngram_word_analyzer (uint16_t n, std::unique_ptr< token_stream > stream) |
| | Constructor. |
| | ngram_word_analyzer (const ngram_word_analyzer &other) |
| | Copy constructor. |
| virtual void | tokenize (corpus::document &doc) override |
| | Tokenizes a file into a document. |
Public Member Functions inherited from meta::util::multilevel_clonable< analyzer, ngram_analyzer, ngram_word_analyzer >
| virtual std::unique_ptr< analyzer > | clone () const |
| | Clones the given object. |
Static Public Attributes
| static const std::string | id = "ngram-word" |
| | Identifier for this analyzer. |
Private Types
| using | base = util::multilevel_clonable< analyzer, ngram_analyzer, ngram_word_analyzer > |
Private Attributes
| std::unique_ptr< token_stream > | stream_ |
| | The token stream to be used for extracting tokens. |
meta::analyzers::ngram_word_analyzer::ngram_word_analyzer (uint16_t n, std::unique_ptr< token_stream > stream)
Constructor.
| n | The value of n to use for the ngrams. |
| stream | The stream to read tokens from. |
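The constructor takes the stream by `std::unique_ptr`, so the analyzer becomes its sole owner. The sketch below illustrates that ownership transfer with stand-in types. Everything in the `sketch` namespace (`token_stream`, `whitespace_stream`, `word_analyzer`, `make_bigram_analyzer`, and their member names) is hypothetical and written for illustration only; none of it is MeTA's actual interface.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <sstream>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical stand-in for a token stream; not MeTA's real interface.
class token_stream {
public:
    virtual ~token_stream() = default;
    virtual void set_content(const std::string& content) = 0;
    virtual bool has_more() const = 0;
    virtual std::string next() = 0;
};

// Assumed tokenization rule for this sketch: split on whitespace.
class whitespace_stream : public token_stream {
public:
    void set_content(const std::string& content) override {
        in_.clear();
        in_.str(content);
        advance();
    }
    bool has_more() const override { return has_next_; }
    std::string next() override {
        std::string out = next_;
        advance();
        return out;
    }

private:
    // Reads one token ahead so has_more() can answer without consuming.
    void advance() { has_next_ = static_cast<bool>(in_ >> next_); }

    std::istringstream in_;
    std::string next_;
    bool has_next_ = false;
};

// Mirrors the documented constructor shape: (n, unique_ptr<token_stream>).
class word_analyzer {
public:
    word_analyzer(std::uint16_t n, std::unique_ptr<token_stream> stream)
        : n_{n}, stream_{std::move(stream)} {
        // Assumption of this sketch: n must be positive and a stream is required.
        assert(n_ > 0 && stream_ != nullptr);
    }

    std::uint16_t n_value() const { return n_; }
    token_stream& stream() { return *stream_; }

private:
    std::uint16_t n_;
    std::unique_ptr<token_stream> stream_;
};

// Builds a bigram analyzer; the temporary unique_ptr is moved into it.
inline word_analyzer make_bigram_analyzer() {
    return word_analyzer{2, std::make_unique<whitespace_stream>()};
}

} // namespace sketch
```

Passing the stream by value as a `std::unique_ptr` makes the transfer explicit at the call site: a caller holding a named pointer has to write `std::move`, and cannot keep using the stream afterwards.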
meta::analyzers::ngram_word_analyzer::ngram_word_analyzer (const ngram_word_analyzer & other)
Copy constructor.
| other | The other ngram_word_analyzer to copy from. |
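Because the stream is held in a `std::unique_ptr`, which cannot be copied, a copy constructor has to duplicate the pointed-to stream in some other way. The sketch below shows one common approach: a polymorphic `clone()` on the stream, plus a CRTP helper whose `clone()` calls the derived class's copy constructor. All names in the `sketch` namespace are hypothetical, and the two-parameter `clonable` helper is a simplified assumption, not the definition of `util::multilevel_clonable`.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical stand-in: a stream that can duplicate itself polymorphically.
class token_stream {
public:
    virtual ~token_stream() = default;
    virtual std::unique_ptr<token_stream> clone() const = 0;
    virtual std::string name() const = 0;
};

class dummy_stream : public token_stream {
public:
    std::unique_ptr<token_stream> clone() const override {
        return std::make_unique<dummy_stream>(*this);
    }
    std::string name() const override { return "dummy"; }
};

class analyzer {
public:
    virtual ~analyzer() = default;
    virtual std::unique_ptr<analyzer> clone() const = 0;
};

// Simplified CRTP helper: implements clone() via Derived's copy constructor.
template <class Base, class Derived>
class clonable : public Base {
public:
    std::unique_ptr<Base> clone() const override {
        return std::make_unique<Derived>(static_cast<const Derived&>(*this));
    }
};

class word_analyzer : public clonable<analyzer, word_analyzer> {
    using base = clonable<analyzer, word_analyzer>;

public:
    word_analyzer(std::uint16_t n, std::unique_ptr<token_stream> stream)
        : n_{n}, stream_{std::move(stream)} {
        assert(stream_ != nullptr);
    }

    // Deep copy: the new analyzer gets its own clone of the stream.
    word_analyzer(const word_analyzer& other)
        : base(other), n_{other.n_}, stream_{other.stream_->clone()} {}

    std::uint16_t n_value() const { return n_; }
    const token_stream* stream() const { return stream_.get(); }

private:
    std::uint16_t n_;
    std::unique_ptr<token_stream> stream_;
};

} // namespace sketch
```

With this arrangement, calling `clone()` through an `analyzer` pointer yields an independent `word_analyzer` whose stream is a separate object from the original's.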
virtual void meta::analyzers::ngram_word_analyzer::tokenize (corpus::document & doc) override
Tokenizes a file into a document.
| doc | The document to store the tokenized information in. |
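As a rough illustration of word n-gram tokenization, the sketch below slides a window of n tokens over a document's text and counts each joined n-gram. It is not MeTA's implementation: the `document` struct, the free function `tokenize`, whitespace splitting, and the underscore used to join words are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>
#include <sstream>
#include <string>

namespace sketch {

// Hypothetical stand-in for a document: raw text plus n-gram counts.
struct document {
    std::string content;
    std::map<std::string, std::uint64_t> counts;
};

// Counts every run of n consecutive whitespace-separated words in doc.content.
inline void tokenize(document& doc, std::uint16_t n) {
    assert(n > 0);
    std::istringstream in{doc.content};
    std::deque<std::string> window; // holds at most n most recent tokens
    std::string token;
    while (in >> token) {
        window.push_back(token);
        if (window.size() < static_cast<std::size_t>(n)) {
            continue; // not enough tokens yet to form an n-gram
        }
        // Join the window with '_' (an assumed separator) to form the key.
        std::string gram = window.front();
        for (std::size_t i = 1; i < window.size(); ++i) {
            gram += "_" + window[i];
        }
        ++doc.counts[gram];
        window.pop_front(); // slide the window forward by one token
    }
}

} // namespace sketch
```

A document with fewer than n tokens produces no n-grams in this sketch, so its `counts` map stays empty.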