ModErn Text Analysis
META Enumerates Textual Applications
icu_tokenizer.h
1 
9 #ifndef META_ICU_TOKENIZER_H_
10 #define META_ICU_TOKENIZER_H_
11 
12 #include "analyzers/token_stream.h"
13 #include "util/clonable.h"
14 #include "util/pimpl.h"
15 
16 namespace meta
17 {
18 namespace corpus
19 {
20 class document;
21 }
22 }
23 
24 namespace meta
25 {
26 namespace analyzers
27 {
28 namespace tokenizers
29 {
30 
35 class icu_tokenizer : public util::clonable<token_stream, icu_tokenizer>
36 {
37  public:
41  icu_tokenizer();
42 
47  icu_tokenizer(const icu_tokenizer& other);
48 
54 
59 
67  void set_content(const std::string& content) override;
68 
75  std::string next() override;
76 
80  operator bool() const override;
81 
83  const static std::string id;
84 
85  private:
87  class impl;
88 
91 };
92 }
93 }
94 }
95 #endif
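The three overrides declared in the listing (set_content, next, operator bool) suggest a pull-style protocol: load a string, then request tokens while the stream converts to true. The following is a minimal sketch of that protocol, not MeTA's implementation. The names toy_token_stream, whitespace_tokenizer, and collect_tokens are hypothetical, and plain ASCII whitespace splitting stands in for the ICU-based segmentation that the real class performs.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the token_stream interface. Only the three
// members visible in the listing are modeled here.
class toy_token_stream
{
  public:
    virtual ~toy_token_stream() = default;
    virtual void set_content(const std::string& content) = 0;
    virtual std::string next() = 0;
    virtual operator bool() const = 0;
};

// Toy tokenizer that splits on ASCII whitespace. This is an assumption made
// for illustration; the real icu_tokenizer relies on ICU for segmentation.
class whitespace_tokenizer : public toy_token_stream
{
  public:
    // Replaces the buffered text and rewinds to the first token.
    void set_content(const std::string& content) override
    {
        content_ = content;
        pos_ = 0;
        skip_spaces();
    }

    // Returns the token at the cursor, then advances to the next one.
    std::string next() override
    {
        std::size_t start = pos_;
        while (pos_ < content_.size() && !is_space(content_[pos_]))
            ++pos_;
        std::string token = content_.substr(start, pos_ - start);
        skip_spaces();
        return token;
    }

    // True while at least one more token remains.
    operator bool() const override
    {
        return pos_ < content_.size();
    }

  private:
    static bool is_space(char c)
    {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }

    void skip_spaces()
    {
        while (pos_ < content_.size() && is_space(content_[pos_]))
            ++pos_;
    }

    std::string content_;
    std::size_t pos_ = 0;
};

// Drains any stream that follows the set_content / operator bool / next
// protocol and returns the tokens in order.
std::vector<std::string> collect_tokens(toy_token_stream& stream,
                                        const std::string& content)
{
    stream.set_content(content);
    std::vector<std::string> tokens;
    while (stream)
        tokens.push_back(stream.next());
    return tokens;
}
```

Calling collect_tokens on a whitespace_tokenizer with the text "  one two\tthree " yields the three tokens one, two, and three, after which the stream converts to false. The loop itself depends only on the interface, so the same pattern should carry over to any class that provides these three members.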
util::pimpl< impl > impl_
The implementation for this tokenizer.
Definition: icu_tokenizer.h:87
void set_content(const std::string &content) override
Sets the content for the tokenizer to parse.
Definition: icu_tokenizer.cpp:104
icu_tokenizer()
Creates an icu_tokenizer.
static const std::string id
Identifier for this tokenizer.
Definition: icu_tokenizer.h:83
Class to assist in simple pointer-to-implementation classes.
Definition: pimpl.h:26
~icu_tokenizer()
Destroys an icu_tokenizer.
Template class to facilitate polymorphic cloning.
Definition: clonable.h:28
The ModErn Text Analysis toolkit is a suite of natural language processing, classification, information retrieval, data mining, and other applications of text processing.
Definition: analyzer.h:24
std::string next() override
Definition: icu_tokenizer.cpp:109
Implementation class for the icu_tokenizer.
Definition: icu_tokenizer.cpp:28
Converts documents into streams of tokens by following the unicode standards for sentence and word se...
Definition: icu_tokenizer.h:35
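The entries above mention two helpers: a template for polymorphic cloning and a pointer-to-implementation wrapper. The sketch below shows how the two idioms can combine in general, and why a class holding a pointer to a forward-declared impl usually declares its own copy constructor and destructor. It does not reproduce util::clonable or util::pimpl. The names stream_base, clonable_sketch, and counter_stream are hypothetical, and std::unique_ptr is an assumed substitute for the pimpl wrapper.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical polymorphic base, standing in for token_stream.
class stream_base
{
  public:
    virtual ~stream_base() = default;
    virtual std::unique_ptr<stream_base> clone() const = 0;
    virtual std::string describe() const = 0;
};

// CRTP helper: implements clone() once by calling Derived's copy constructor.
template <class Base, class Derived>
class clonable_sketch : public Base
{
  public:
    std::unique_ptr<Base> clone() const override
    {
        return std::unique_ptr<Base>(
            new Derived(static_cast<const Derived&>(*this)));
    }
};

// "Header" part: only a forward-declared impl and a pointer to it are
// visible, so the special members must be defined where impl is complete.
class counter_stream : public clonable_sketch<stream_base, counter_stream>
{
  public:
    counter_stream();
    counter_stream(const counter_stream& other);
    ~counter_stream();

    void bump();
    std::string describe() const override;

  private:
    class impl;
    std::unique_ptr<impl> impl_;
};

// "Source file" part: the hidden state lives here.
class counter_stream::impl
{
  public:
    int count = 0;
};

counter_stream::counter_stream() : impl_(new impl())
{
}

// Deep copy: the new object gets its own impl instead of sharing one.
counter_stream::counter_stream(const counter_stream& other)
    : clonable_sketch<stream_base, counter_stream>(other),
      impl_(new impl(*other.impl_))
{
}

counter_stream::~counter_stream() = default;

void counter_stream::bump()
{
    ++impl_->count;
}

std::string counter_stream::describe() const
{
    return "count=" + std::to_string(impl_->count);
}
```

In this sketch, cloning a counter_stream after one bump() and then bumping the original again leaves the clone reporting count=1 while the original reports count=2, since the copy constructor duplicates the hidden state rather than sharing it.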