ModErn Text Analysis
META Enumerates Textual Applications
meta::analyzers::tokenizers::whitespace_tokenizer Class Reference

Converts documents into streams of whitespace-delimited tokens.

#include <whitespace_tokenizer.h>

Inherits meta::util::multilevel_clonable< Root, Base, Derived >.

Public Member Functions

 whitespace_tokenizer ()
 Creates a whitespace_tokenizer.
 
void set_content (const std::string &content) override
 Sets the content for the tokenizer to parse.
 
std::string next () override
 Returns the next token in the document.
 
 operator bool () const override
 Determines if there are more tokens in the document.
 
Public Member Functions inherited from meta::util::multilevel_clonable< Root, Base, Derived >
virtual std::unique_ptr< Root > clone () const
 Clones the given object.
 

Static Public Attributes

static const std::string id = "whitespace-tokenizer"
 Identifier for this tokenizer.
 

Private Attributes

std::string content_
 Buffered string content for this tokenizer.
 
uint64_t idx_
 Character index into the current buffer.
 

Detailed Description

Converts documents into streams of whitespace-delimited tokens.

This tokenizer preserves whitespace: each whitespace character is emitted as its own token, while adjacent non-whitespace characters are combined into a single token.
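
The behavior described above can be sketched as a standalone function. This is a minimal illustration, not MeTA's implementation: split_keep_whitespace and join_tokens are hypothetical helpers, and "whitespace" is assumed to mean whatever std::isspace reports in the current C locale.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of MeTA: splits `text` so that each
// whitespace character becomes its own token and each maximal run of
// non-whitespace characters becomes one token.
std::vector<std::string> split_keep_whitespace(const std::string& text)
{
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < text.size())
    {
        if (std::isspace(static_cast<unsigned char>(text[i])))
        {
            // Whitespace is kept, one character per token.
            tokens.push_back(std::string(1, text[i]));
            ++i;
            continue;
        }
        // Gather the maximal run of non-whitespace characters.
        const std::size_t start = i;
        while (i < text.size()
               && !std::isspace(static_cast<unsigned char>(text[i])))
            ++i;
        assert(i > start); // a run reached here always has at least one character
        tokens.push_back(text.substr(start, i - start));
    }
    return tokens;
}

// Hypothetical helper: concatenates the tokens. Because no character is
// dropped during splitting, this reproduces the original input.
std::string join_tokens(const std::vector<std::string>& tokens)
{
    std::string out;
    for (const auto& t : tokens)
        out += t;
    return out;
}
```

Under these assumptions, split_keep_whitespace("to be\n or") yields the six tokens "to", " ", "be", "\n", " ", "or", and joining them returns the original string.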

Member Function Documentation

void meta::analyzers::tokenizers::whitespace_tokenizer::set_content (const std::string &content) override

Sets the content for the tokenizer to parse.

Parameters
content    The string content to set

std::string meta::analyzers::tokenizers::whitespace_tokenizer::next () override

Returns
The next token in the document. This is either a single whitespace character or a token consisting of a sequence of non-whitespace characters.
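
One way the documented interface could fit together with the content_ and idx_ members is sketched below. The class sketch_whitespace_tokenizer is a hypothetical stand-in that does not derive from any MeTA base class. Resetting the index in set_content, using std::isspace as the whitespace test, and treating a call to next() on an exhausted stream as a precondition violation are all assumptions rather than documented behavior.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>

// Illustrative stand-in, not MeTA's class: mirrors the documented
// member names without inheriting from any MeTA base.
class sketch_whitespace_tokenizer
{
  public:
    // Replaces the buffer; rewinding the cursor is an assumption.
    void set_content(const std::string& content)
    {
        content_ = content;
        idx_ = 0;
    }

    // Returns one whitespace character, or a run of non-whitespace ones.
    // Assumed precondition: the stream still has tokens.
    std::string next()
    {
        assert(static_cast<bool>(*this));
        if (is_space(content_[idx_]))
            return std::string(1, content_[idx_++]);

        std::string token;
        while (*this && !is_space(content_[idx_]))
            token.push_back(content_[idx_++]);
        return token;
    }

    // True while the cursor has not reached the end of the buffer.
    // Marked explicit here; the documented signature does not show that.
    explicit operator bool() const
    {
        return idx_ < content_.size();
    }

  private:
    // std::isspace requires a value representable as unsigned char.
    static bool is_space(char c)
    {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }

    std::string content_;   // buffered string content
    std::uint64_t idx_ = 0; // character index into content_
};
```

With a class shaped like this, a caller would drain the stream with a loop of the form `while (tok) { std::string t = tok.next(); /* use t */ }` after calling set_content.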

The documentation for this class was generated from the following files: