ModErn Text Analysis
META Enumerates Textual Applications
Analyzes documents using their tokenized words.
#include <ngram_word_analyzer.h>
Public Member Functions
ngram_word_analyzer (uint16_t n, std::unique_ptr< token_stream > stream)
    Constructor.
ngram_word_analyzer (const ngram_word_analyzer &other)
    Copy constructor.
virtual void tokenize (corpus::document &doc) override
    Tokenizes a file into a document.
Public Member Functions inherited from meta::util::multilevel_clonable< analyzer, ngram_analyzer, ngram_word_analyzer >
virtual std::unique_ptr< analyzer > clone () const
    Clones the given object.
Static Public Attributes
static const std::string id = "ngram-word"
    Identifier for this analyzer.
Private Types
using base = util::multilevel_clonable< analyzer, ngram_analyzer, ngram_word_analyzer >
Private Attributes
std::unique_ptr< token_stream > stream_
    The token stream to be used for extracting tokens.
Analyzes documents using their tokenized words.
meta::analyzers::ngram_word_analyzer::ngram_word_analyzer (uint16_t n, std::unique_ptr< token_stream > stream)
Constructor.
n: The value of n to use for the n-grams.
stream: The stream to read tokens from.
meta::analyzers::ngram_word_analyzer::ngram_word_analyzer (const ngram_word_analyzer & other)
Copy constructor.
other: The other ngram_word_analyzer to copy from.
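Because stream_ is a std::unique_ptr, the analyzer cannot be copied member by member. The following is a minimal sketch (C++14) of one plausible way the two constructors could handle ownership: the first takes the stream by value and moves it in, and the copy constructor asks the other analyzer's stream for a fresh copy. The names token_stream_sketch, whitespace_stream_sketch, word_analyzer_sketch, and the clone()/name() members are hypothetical stand-ins for illustration, not MeTA's actual classes, and whether the real copy constructor clones its stream this way is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal stream interface; MeTA's real token_stream differs.
class token_stream_sketch
{
  public:
    virtual ~token_stream_sketch() = default;
    // Returns an independent copy of this stream.
    virtual std::unique_ptr<token_stream_sketch> clone() const = 0;
    virtual std::string name() const = 0;
};

// Hypothetical concrete stream used only to exercise the sketch.
class whitespace_stream_sketch : public token_stream_sketch
{
  public:
    std::unique_ptr<token_stream_sketch> clone() const override
    {
        return std::make_unique<whitespace_stream_sketch>(*this);
    }
    std::string name() const override
    {
        return "whitespace";
    }
};

// Hypothetical analyzer mirroring the documented constructor signatures.
class word_analyzer_sketch
{
  public:
    // Takes ownership of the stream; the caller's pointer is moved from.
    word_analyzer_sketch(uint16_t n,
                         std::unique_ptr<token_stream_sketch> stream)
        : n_{n}, stream_{std::move(stream)}
    {
        assert(stream_ && "a token stream is required");
    }

    // unique_ptr is not copyable, so the copy owns a clone of the stream
    // (an assumption about how such a copy could be implemented).
    word_analyzer_sketch(const word_analyzer_sketch& other)
        : n_{other.n_}, stream_{other.stream_->clone()}
    {
    }

    uint16_t n() const { return n_; }
    const token_stream_sketch* stream() const { return stream_.get(); }

  private:
    uint16_t n_;
    std::unique_ptr<token_stream_sketch> stream_;
};
```

With this layout, copying an analyzer yields a second object with the same n and its own stream, so the two can tokenize independently without sharing state.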
virtual void meta::analyzers::ngram_word_analyzer::tokenize (corpus::document & doc) override
Tokenizes a file into a document.
doc: The document to store the tokenized information in.
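To illustrate what word n-gram tokenization produces, here is a small self-contained sketch that counts every run of n consecutive tokens. It does not use MeTA's corpus::document or token_stream: the ngram_counts alias and the count_word_ngrams function are hypothetical, and joining the words of an n-gram with an underscore is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the counts a document would hold after
// tokenization; this is not MeTA's corpus::document.
using ngram_counts = std::unordered_map<std::string, uint64_t>;

// Counts every run of n consecutive tokens. The words of each n-gram are
// joined with '_' (a separator chosen for this sketch).
inline ngram_counts count_word_ngrams(const std::vector<std::string>& tokens,
                                      uint16_t n)
{
    assert(n >= 1 && "n-gram size must be positive");

    ngram_counts counts;
    if (tokens.size() < n)
        return counts; // too few tokens to form a single n-gram

    for (std::size_t i = 0; i + n <= tokens.size(); ++i)
    {
        // Build the n-gram that starts at token i.
        std::string gram = tokens[i];
        for (std::size_t j = 1; j < n; ++j)
            gram += "_" + tokens[i + j];
        ++counts[gram];
    }
    return counts;
}
```

For the tokens "the cat sat the cat" with n = 2, this sketch records the bigram the_cat twice and cat_sat and sat_the once each. A sequence shorter than n yields no n-grams at all.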