ModErn Text Analysis
META Enumerates Textual Applications
|
Implementation class for the icu_tokenizer. More...
Public Member Functions | |
void | set_content (std::string content) |
std::string | next () |
operator bool () const | |
True if tokens is not empty. | |
Private Attributes | |
utf::segmenter | segmenter_ |
UTF segmenter to use for this tokenizer. | |
std::deque< std::string > | tokens_ |
Buffered tokens. | |
Implementation class for the icu_tokenizer.
|
inline |
content | The string content to set TODO: can we make this be a streaming API instead of buffering all of the tokens? |
|
inline |