ModErn Text Analysis
META Enumerates Textual Applications
whitespace_tokenizer.h
1 
9 #ifndef META_WHITESPACE_TOKENIZER_H_
10 #define META_WHITESPACE_TOKENIZER_H_
11 
12 #include "analyzers/token_stream.h"
13 #include "util/clonable.h"
14 
15 namespace meta
16 {
17 namespace corpus
18 {
19 class document;
20 }
21 }
22 
23 namespace meta
24 {
25 namespace analyzers
26 {
27 namespace tokenizers
28 {
29 
35 class whitespace_tokenizer : public util::clonable<token_stream,
36  whitespace_tokenizer>
37 {
38  public:
43  whitespace_tokenizer();
48  void set_content(const std::string& content) override;
49 
55  std::string next() override;
56 
60  operator bool() const override;
61 
63  const static std::string id;
64 
65  private:
67  std::string content_;
68 
70  uint64_t idx_;
71 };
72 }
73 }
74 }
75 #endif
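The class above derives from util::clonable<token_stream, whitespace_tokenizer>, described on this page as a template class to facilitate polymorphic cloning. The contents of clonable.h are not shown here, so the following is only a minimal sketch of how such a helper is commonly written with the curiously recurring template pattern. All names in it (stream_base, clonable_sketch, ws_stream) are hypothetical stand-ins and are not part of MeTA.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical base interface standing in for token_stream.
struct stream_base
{
    virtual ~stream_base() = default;
    // Returns a heap-allocated copy with the same dynamic type.
    virtual std::unique_ptr<stream_base> clone() const = 0;
    virtual std::string name() const = 0;
};

// Hypothetical CRTP helper (an assumption, not the MeTA implementation):
// it implements clone() once, using Derived's copy constructor.
template <class Base, class Derived>
struct clonable_sketch : Base
{
    std::unique_ptr<Base> clone() const override
    {
        return std::unique_ptr<Base>(
            new Derived(static_cast<const Derived&>(*this)));
    }
};

// A concrete stream only has to name itself as the Derived parameter.
struct ws_stream : clonable_sketch<stream_base, ws_stream>
{
    std::string name() const override
    {
        return "ws";
    }
};
```

With this arrangement, calling clone() through a stream_base pointer or reference produces a separate ws_stream object, and each concrete class avoids writing its own clone() override.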
static const std::string id
Identifier for this tokenizer.
Definition: whitespace_tokenizer.h:63
uint64_t idx_
Character index into the current buffer.
Definition: whitespace_tokenizer.h:70
void set_content(const std::string &content) override
Sets the content for the tokenizer to parse.
Definition: whitespace_tokenizer.cpp:26
std::string content_
Buffered string content for this tokenizer.
Definition: whitespace_tokenizer.h:67
meta::util::clonable
Template class to facilitate polymorphic cloning.
Definition: clonable.h:28
std::string next() override
Definition: whitespace_tokenizer.cpp:32
meta
The ModErn Text Analysis toolkit is a suite of natural language processing, classification, information retrieval, data mining, and other applications of text processing.
Definition: analyzer.h:24
meta::analyzers::tokenizers::whitespace_tokenizer
Converts documents into streams of whitespace delimited tokens.
Definition: whitespace_tokenizer.h:35
whitespace_tokenizer()
Creates a whitespace_tokenizer.
Definition: whitespace_tokenizer.cpp:22
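The members listed above (set_content, next, operator bool, content_, idx_) suggest a cursor-style stream over a buffered string. The definitions in whitespace_tokenizer.cpp are not shown on this page, so the following is a standalone sketch of one way such an interface could behave, not the MeTA implementation. The names ws_tokenizer_sketch and tokenize_all are hypothetical, and the choice to drop whitespace instead of emitting it as tokens is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in that mirrors the member names on this page.
class ws_tokenizer_sketch
{
  public:
    // Buffers the content and moves the cursor to the first token.
    void set_content(const std::string& content)
    {
        content_ = content;
        idx_ = 0;
        skip_whitespace();
    }

    // Returns the next run of non-whitespace characters, then advances
    // the cursor past any whitespace that follows it.
    std::string next()
    {
        const std::uint64_t start = idx_;
        while (idx_ < content_.size() && !is_space(content_[idx_]))
            ++idx_;
        std::string token = content_.substr(start, idx_ - start);
        skip_whitespace();
        return token;
    }

    // True while at least one more token remains in the buffer.
    explicit operator bool() const
    {
        return idx_ < content_.size();
    }

  private:
    static bool is_space(char c)
    {
        // The cast avoids undefined behavior for negative char values.
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }

    void skip_whitespace()
    {
        while (idx_ < content_.size() && is_space(content_[idx_]))
            ++idx_;
    }

    std::string content_;   // buffered string content
    std::uint64_t idx_ = 0; // character index into content_
};

// Hypothetical helper: drains the tokenizer into a vector.
inline std::vector<std::string> tokenize_all(const std::string& text)
{
    ws_tokenizer_sketch tok;
    tok.set_content(text);
    std::vector<std::string> tokens;
    while (tok)
        tokens.push_back(tok.next());
    return tokens;
}
```

The sketch marks operator bool as explicit, which still permits while (tok) but blocks accidental conversions to integers. The declaration in the listing above is not explicit.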