Contains tokenizers that start off a filter chain.

Classes
class	character_tokenizer
	Converts documents into streams of characters.

class	icu_tokenizer
	Converts documents into streams of tokens, following the Unicode standards for sentence and word segmentation (Unicode Standard Annex #29, Text Segmentation).

class	whitespace_tokenizer
	Converts documents into streams of whitespace-delimited tokens.
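As a rough illustration of how character and whitespace tokenization differ, the sketch below splits a std::string directly. The functions character_tokens and whitespace_tokens are hypothetical and are not MeTA's classes: they return a std::vector rather than a token_stream, and the whitespace version assumes the whitespace itself is discarded, which may not match the library class's behavior. The ICU tokenizer is not sketched because it depends on the ICU library.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not MeTA's API): every character of the
// document becomes its own one-character token.
std::vector<std::string> character_tokens(const std::string& doc) {
    std::vector<std::string> tokens;
    tokens.reserve(doc.size());
    for (char c : doc)
        tokens.emplace_back(1, c);  // std::string(count = 1, ch = c)
    return tokens;
}

// Hypothetical helper (not MeTA's API): split on runs of whitespace.
// Assumption: the whitespace characters themselves are dropped.
std::vector<std::string> whitespace_tokens(const std::string& doc) {
    std::vector<std::string> tokens;
    std::string current;
    for (char c : doc) {
        if (std::isspace(static_cast<unsigned char>(c))) {
            // A whitespace character ends the current token, if any.
            if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
        } else {
            current += c;
        }
    }
    if (!current.empty())
        tokens.push_back(current);  // flush the final token
    return tokens;
}
```

For example, whitespace_tokens("the quick\tbrown fox") yields four tokens, while character_tokens on the same input yields one token per character, including the spaces and the tab.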

Functions
template <class Tokenizer>
std::unique_ptr<token_stream>	make_tokenizer(const cpptoml::table &)
	Factory function for creating a tokenizer.

Detailed Description

Contains tokenizers that start off a filter chain.

Function Documentation

template <class Tokenizer>
std::unique_ptr<token_stream> meta::analyzers::tokenizers::make_tokenizer(const cpptoml::table &)

Factory function for creating a tokenizer.

Specialize this template for your tokenizer if it requires special construction behavior, for example constructor arguments read from the cpptoml::table configuration.
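A minimal sketch of the specialization pattern follows. It assumes the primary template simply default-constructs the tokenizer and ignores the configuration. Everything else is a hypothetical stand-in rather than MeTA's real code: token_stream is reduced to a one-method interface, config_table is a std::map used in place of cpptoml::table, and plain_tokenizer, delimited_tokenizer, and the "delimiters" key are invented for illustration.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Stand-in (assumption) for MeTA's token_stream interface.
struct token_stream {
    virtual ~token_stream() = default;
    virtual std::string name() const = 0;
};

// Stand-in (assumption) for cpptoml::table: a flat key/value map.
using config_table = std::map<std::string, std::string>;

// Hypothetical tokenizer that is default-constructible.
struct plain_tokenizer : token_stream {
    std::string name() const override { return "plain"; }
};

// Hypothetical tokenizer that needs a constructor argument.
struct delimited_tokenizer : token_stream {
    explicit delimited_tokenizer(std::string delims)
        : delims_{std::move(delims)} {}
    std::string name() const override { return "delimited(" + delims_ + ")"; }

  private:
    std::string delims_;
};

// Primary template (assumed behavior): default-construct, ignore config.
template <class Tokenizer>
std::unique_ptr<token_stream> make_tokenizer(const config_table&) {
    return std::make_unique<Tokenizer>();
}

// Explicit specialization: read the hypothetical "delimiters" key,
// falling back to a single space. Marked inline so it can live in a header.
template <>
inline std::unique_ptr<token_stream>
make_tokenizer<delimited_tokenizer>(const config_table& config) {
    auto it = config.find("delimiters");
    std::string delims = (it != config.end()) ? it->second : " ";
    return std::make_unique<delimited_tokenizer>(delims);
}
```

Because delimited_tokenizer has no default constructor, instantiating the primary template for it would not compile. The explicit specialization replaces that instantiation entirely, so make_tokenizer<delimited_tokenizer>(config) compiles and takes its argument from the table, while make_tokenizer<plain_tokenizer>(config) still uses the generic version.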