ModErn Text Analysis
META Enumerates Textual Applications
Converts documents into streams of characters.
#include <character_tokenizer.h>
Public Member Functions
character_tokenizer ()
Creates a character_tokenizer.
void | set_content (const std::string &content) override
Sets the content for the tokenizer.
std::string | next () override
operator bool () const override
Determines if there are more tokens in the document.
Public Member Functions inherited from meta::util::multilevel_clonable< Root, Base, Derived >
virtual std::unique_ptr< Root > | clone () const
Clones the given object.
Static Public Attributes
static const std::string | id = "character-tokenizer"
Identifier for this tokenizer.
Private Attributes
std::string | content_
The buffered string content for this tokenizer.
uint64_t | idx_
Character index into the current buffer.
Converts documents into streams of characters.
This is the simplest tokenizer.
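To make the character-stream behaviour concrete, here is a minimal sketch of a tokenizer with the same member names as listed above. It is a hypothetical stand-in, not the library's implementation: the class name `sketch_character_tokenizer`, the helper `drain`, the reset of `idx_` in `set_content`, and the exception thrown by `next()` on an exhausted buffer are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in, not the library class: it mirrors only the members
// listed in this reference (content_, idx_, set_content, next, operator bool).
class sketch_character_tokenizer
{
  public:
    // Assumption: setting new content restarts iteration from the beginning.
    void set_content(const std::string& content)
    {
        content_ = content;
        idx_ = 0;
    }

    // Returns the next character as a one-character string.
    // Assumption: calling next() with nothing left throws.
    std::string next()
    {
        if (!*this)
            throw std::out_of_range{"next() called with no tokens left"};
        return std::string(1, content_[idx_++]);
    }

    // True while idx_ has not reached the end of the buffer.
    // Marked explicit in this sketch to avoid accidental integer conversions.
    explicit operator bool() const
    {
        return idx_ < content_.size();
    }

  private:
    std::string content_;   // buffered string content
    std::uint64_t idx_ = 0; // character index into the buffer
};

// Hypothetical helper: drains a tokenizer into a vector of tokens.
inline std::vector<std::string> drain(sketch_character_tokenizer& tok)
{
    std::vector<std::string> tokens;
    while (tok)
        tokens.push_back(tok.next());
    return tokens;
}
```

Under these assumptions, calling `set_content("a b")` and then draining the tokenizer yields three tokens, with the space emitted as a token of its own, since every character is a token.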
void set_content (const std::string &content) override
Sets the content for the tokenizer.
Parameters
content | The string content to set
|
override |