ModErn Text Analysis
META Enumerates Textual Applications
Converts documents into streams of characters.
#include <character_tokenizer.h>
Public Member Functions
| | character_tokenizer () | Creates a character_tokenizer. |
| void | set_content (const std::string &content) override | Sets the content for the tokenizer. |
| std::string | next () override | |
| | operator bool () const override | Determines if there are more tokens in the document. |
Public Member Functions inherited from meta::util::multilevel_clonable< Root, Base, Derived >
| virtual std::unique_ptr< Root > | clone () const | Clones the given object. |
Static Public Attributes
| static const std::string | id = "character-tokenizer" | Identifier for this tokenizer. |
Private Attributes
| std::string | content_ | The buffered string content for this tokenizer. |
| uint64_t | idx_ | Character index into the current buffer. |
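Judging by its template parameters, the inherited clone() appears to come from a CRTP-style mixin: Root fixes the return type, Base is the class the mixin sits on top of, and Derived is the concrete class that gets copied. The following is a minimal sketch of how such a mixin can work, not MeTA's actual source. It assumes Base already declares a virtual clone() returning std::unique_ptr<Root>; token_source, buffered_source and char_source are hypothetical classes used only for illustration.

```cpp
#include <memory>
#include <string>

// Sketch of a multi-level clonable mixin (hypothetical, not MeTA's code).
// It inherits from Base and overrides clone() by copy-constructing the
// most-derived type, while still returning a pointer to the hierarchy root.
template <class Root, class Base, class Derived>
class multilevel_clonable : public Base
{
  public:
    std::unique_ptr<Root> clone() const override
    {
        // Downcast to the concrete type so its copy constructor is used.
        return std::make_unique<Derived>(static_cast<const Derived&>(*this));
    }
};

// Hypothetical root of a hierarchy: declares the clone interface.
struct token_source
{
    virtual ~token_source() = default;
    virtual std::unique_ptr<token_source> clone() const = 0;
    virtual std::string name() const = 0;
};

// Hypothetical intermediate level that adds state but stays abstract.
struct buffered_source : token_source
{
    std::string buffer;
};

// Hypothetical concrete class: gets clone() for free from the mixin.
class char_source
    : public multilevel_clonable<token_source, buffered_source, char_source>
{
  public:
    std::string name() const override { return "char"; }
};
```

The design choice is that each concrete class writes no clone() of its own; listing itself as Derived is enough, and the copy includes state from every intermediate level (here, buffer).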
Detailed Description
Converts documents into streams of characters. This is the simplest tokenizer.
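To make the streaming model concrete, here is a minimal standalone sketch that mirrors the members listed above (content_, idx_, set_content, next, operator bool, id). It is not MeTA's implementation: the names character_tokenizer_sketch and tokenize are hypothetical, the class does not derive from MeTA's tokenizer base class, and it assumes each token is a single byte (multi-byte UTF-8 characters are not grouped) and that next() throws std::out_of_range once the stream is exhausted.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for character_tokenizer; it mirrors the documented
// members but is not the library's class.
class character_tokenizer_sketch
{
  public:
    // Identifier, matching the documented static id.
    static const std::string id;

    character_tokenizer_sketch() : idx_{0} {}

    // Buffers a copy of the document and rewinds to its first character.
    void set_content(const std::string& content)
    {
        content_ = content;
        idx_ = 0;
    }

    // Returns the next character as a one-character string.
    // Assumption: one byte per token; UTF-8 sequences are not grouped.
    std::string next()
    {
        if (!*this)
            throw std::out_of_range{"next() called with no tokens left"};
        return std::string(1, content_[idx_++]);
    }

    // True while unread characters remain in the buffer.
    explicit operator bool() const { return idx_ < content_.size(); }

  private:
    std::string content_; // buffered document content
    std::uint64_t idx_;   // index of the next character to return
};

const std::string character_tokenizer_sketch::id = "character-tokenizer";

// Hypothetical helper: loads a document and drains it with the usual
// "while (tok) tok.next()" loop, collecting every token.
std::vector<std::string> tokenize(character_tokenizer_sketch& tok,
                                  const std::string& doc)
{
    tok.set_content(doc);
    std::vector<std::string> tokens;
    while (tok)
        tokens.push_back(tok.next());
    return tokens;
}
```

In this sketch, set_content() resets idx_ to zero, so one tokenizer object can be reused across documents: tokenize(tok, "ab c") yields "a", "b", " ", "c", and a later call starts over on the new content. Whitespace is not skipped; every character becomes a token.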
Member Function Documentation
void set_content (const std::string &content) override
Sets the content for the tokenizer.
Parameters
| content | The string content to set |