ModErn Text Analysis
META Enumerates Textual Applications
ngram_word_analyzer.h
#ifndef META_NGRAM_WORD_ANALYZER_H_
#define META_NGRAM_WORD_ANALYZER_H_

#include "util/clonable.h"

namespace meta
{
namespace analyzers
{

/// Analyzes documents using their tokenized words.
class ngram_word_analyzer
    : public util::multilevel_clonable<analyzer, ngram_analyzer,
                                       ngram_word_analyzer>
{

  public:
    /// Constructor.
    ngram_word_analyzer(uint16_t n, std::unique_ptr<token_stream> stream);

    /// Copy constructor.
    ngram_word_analyzer(const ngram_word_analyzer& other);

    /// Tokenizes a file into a document.
    virtual void tokenize(corpus::document& doc) override;

    /// Identifier for this analyzer.
    const static std::string id;

  private:
    /// The token stream to be used for extracting tokens.
    std::unique_ptr<token_stream> stream_;
};

/// Specialization of the factory method for creating ngram_word_analyzers.
template <>
std::unique_ptr<analyzer>
    make_analyzer<ngram_word_analyzer>(const cpptoml::table&,
                                       const cpptoml::table&);
}
}
#endif
ngram_word_analyzer
Analyzes documents using their tokenized words.
Definition: ngram_word_analyzer.h:24
std::unique_ptr< analyzer > make_analyzer< ngram_word_analyzer >(const cpptoml::table &, const cpptoml::table &)
Specialization of the factory method for creating ngram_word_analyzers.
Definition: ngram_word_analyzer.cpp:55
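The factory specialization above follows a common pattern: a primary function template handles the general case, and an explicit specialization takes over for analyzers that need constructor arguments read from configuration. The following is a minimal sketch of that pattern only, not MeTA's implementation. Every name ending in `_sketch`, the `config` alias (a stand-in for `cpptoml::table`), and the `"ngram"` key are assumptions made for illustration.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Assumption: a plain string map stands in for cpptoml::table.
using config = std::map<std::string, std::string>;

// Hypothetical analyzer base class, for illustration only.
struct analyzer_sketch
{
    virtual ~analyzer_sketch() = default;
    virtual std::string name() const = 0;
};

// Hypothetical analyzer that cannot be default-constructed: it needs n.
struct word_ngram_sketch : analyzer_sketch
{
    explicit word_ngram_sketch(unsigned n) : n_{n} {}
    std::string name() const override
    {
        return "word-" + std::to_string(n_) + "gram";
    }
    unsigned n_;
};

// Primary template: covers analyzers that are default-constructible.
template <class Analyzer>
std::unique_ptr<analyzer_sketch> make_analyzer_sketch(const config&,
                                                      const config&)
{
    return std::make_unique<Analyzer>();
}

// Explicit specialization: reads the n-gram size from the local config
// and forwards it to the constructor.
template <>
std::unique_ptr<analyzer_sketch>
    make_analyzer_sketch<word_ngram_sketch>(const config&,
                                            const config& local)
{
    auto it = local.find("ngram");
    if (it == local.end())
        throw std::invalid_argument{"missing ngram size in configuration"};
    auto n = static_cast<unsigned>(std::stoul(it->second));
    return std::make_unique<word_ngram_sketch>(n);
}
```

Because the specialization is declared before any use, the primary template is never instantiated for `word_ngram_sketch`, so the missing default constructor causes no error.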
static const std::string id
Identifier for this analyzer.
Definition: ngram_word_analyzer.h:52
virtual void tokenize(corpus::document &doc) override
Tokenizes a file into a document.
Definition: ngram_word_analyzer.cpp:34
std::unique_ptr< token_stream > stream_
The token stream to be used for extracting tokens.
Definition: ngram_word_analyzer.h:56
util::multilevel_clonable
Template class to facilitate polymorphic cloning.
Definition: clonable.h:28
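Polymorphic cloning of the kind `util::multilevel_clonable<analyzer, ngram_analyzer, ngram_word_analyzer>` facilitates is usually built on the curiously recurring template pattern. The sketch below shows one way such a helper could work, assuming C++14 for `std::make_unique`. It is not the contents of clonable.h: `multilevel_clonable_sketch`, `shape`, `polygon`, and `square` are hypothetical names.

```cpp
#include <memory>

// Hypothetical root of the hierarchy; clone() returns a pointer to the root.
struct shape
{
    virtual ~shape() = default;
    virtual std::unique_ptr<shape> clone() const = 0;
    virtual int sides() const = 0;
};

// Hypothetical intermediate class; it stays abstract because clone() is
// still pure virtual here.
struct polygon : shape
{
    explicit polygon(int n) : n_{n} {}
    int sides() const override { return n_; }

  private:
    int n_;
};

// CRTP helper: derives from Base and implements clone() for Derived by
// calling Derived's copy constructor. The copy is returned as a Root.
template <class Root, class Base, class Derived>
struct multilevel_clonable_sketch : Base
{
    using Base::Base; // inherit Base's constructors

    std::unique_ptr<Root> clone() const override
    {
        return std::make_unique<Derived>(static_cast<const Derived&>(*this));
    }
};

// The most-derived class names itself as the last template argument and
// gets a correct clone() without writing one.
struct square : multilevel_clonable_sketch<shape, polygon, square>
{
    square() : multilevel_clonable_sketch<shape, polygon, square>(4) {}
};
```

Calling `clone()` through a `shape` pointer that refers to a `square` yields a new `square`, which is why the derived class must be copy-constructible.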
corpus::document
Represents an indexable document.
Definition: document.h:31
meta
The ModErn Text Analysis toolkit is a suite of natural language processing, classification, information retrieval, data mining, and other applications of text processing.
Definition: analyzer.h:24
analyzer
A class that provides a framework to produce token counts from documents.
Definition: analyzer.h:41
ngram_word_analyzer(uint16_t n, std::unique_ptr< token_stream > stream)
Constructor.
Definition: ngram_word_analyzer.cpp:21
ngram_analyzer
Analyzes documents based on an ngram word model, where the value for n is supplied by the user...
Definition: ngram_analyzer.h:27
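To make the n-gram word model concrete, the sketch below counts word n-grams over a token sequence that has already been extracted. It is a standalone illustration, not MeTA code: `count_word_ngrams` is a hypothetical helper, and joining tokens with `_` is an assumed convention.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper, not part of MeTA: joins each run of n consecutive
// tokens with '_' and counts how often each resulting n-gram occurs.
std::unordered_map<std::string, std::size_t>
    count_word_ngrams(const std::vector<std::string>& tokens, std::size_t n)
{
    std::unordered_map<std::string, std::size_t> counts;
    if (n == 0 || tokens.size() < n)
        return counts; // no complete n-gram can be formed

    // Slide a window of width n across the token sequence.
    for (std::size_t i = 0; i + n <= tokens.size(); ++i)
    {
        std::string gram = tokens[i];
        for (std::size_t j = 1; j < n; ++j)
            gram += "_" + tokens[i + j];
        ++counts[gram];
    }
    return counts;
}
```

With n = 2, the tokens `the cat sat the cat` yield three distinct bigrams, of which `the_cat` occurs twice. A sequence of k tokens produces k - n + 1 n-grams, so documents shorter than n contribute none.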