ModErn Text Analysis
META Enumerates Textual Applications
icu_tokenizer.h File Reference
#include "analyzers/token_stream.h"
#include "util/clonable.h"
#include "util/pimpl.h"



Classes

class  meta::analyzers::tokenizers::icu_tokenizer
 Converts documents into streams of tokens by following the Unicode standards for sentence and word segmentation.
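To illustrate what "converting a document into a stream of tokens" can look like, here is a minimal, self-contained sketch. It is not the icu_tokenizer implementation and does not use ICU: the class name toy_tokenizer, its member functions, the `<s>`/`</s>` sentence markers, and the ASCII-only splitting rules are all assumptions made for illustration. The real class follows the Unicode segmentation rules, which handle many cases this sketch ignores (abbreviations, non-ASCII scripts, contractions, and so on).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a token stream. NOT the real icu_tokenizer API.
// ASCII only; naive rules replace proper Unicode sentence/word segmentation.
class toy_tokenizer
{
  public:
    // Replaces the buffered text and tokenizes it eagerly.
    void set_content(const std::string& text)
    {
        tokens_.clear();
        pos_ = 0;
        std::string word;
        bool in_sentence = false;

        auto flush_word = [&]() {
            if (!word.empty())
            {
                tokens_.push_back(word);
                word.clear();
            }
        };

        for (unsigned char c : text)
        {
            if (std::isspace(c))
            {
                flush_word(); // whitespace ends the current word
                continue;
            }
            if (!in_sentence)
            {
                tokens_.push_back("<s>"); // assumed sentence-start marker
                in_sentence = true;
            }
            if (std::isalnum(c))
            {
                word.push_back(static_cast<char>(c));
                continue;
            }
            // Any other character is emitted as its own token.
            flush_word();
            tokens_.push_back(std::string(1, static_cast<char>(c)));
            if (c == '.' || c == '!' || c == '?')
            {
                tokens_.push_back("</s>"); // assumed sentence-end marker
                in_sentence = false;
            }
        }
        flush_word();
        if (in_sentence)
            tokens_.push_back("</s>");
    }

    // Returns the next token; call only while the stream converts to true.
    std::string next() { return tokens_[pos_++]; }

    // True while tokens remain in the stream.
    explicit operator bool() const { return pos_ < tokens_.size(); }

  private:
    std::vector<std::string> tokens_;
    std::size_t pos_ = 0;
};

// Convenience helper: drains a toy_tokenizer into a vector.
std::vector<std::string> tokenize(const std::string& text)
{
    toy_tokenizer stream;
    stream.set_content(text);
    std::vector<std::string> out;
    while (stream)
        out.push_back(stream.next());
    return out;
}
```

Under these assumptions, calling tokenize("Hello, world! Bye") yields the tokens `<s>`, Hello, `,`, world, `!`, `</s>`, `<s>`, Bye, `</s>`. The pull-style interface (set the content, then request tokens until the stream is exhausted) is what allows later filters to wrap a tokenizer and transform its output one token at a time.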


Namespaces

 The ModErn Text Analysis toolkit is a suite of natural language processing, classification, information retrieval, data mining, and other applications of text processing.
 Various ways to convert corpus formats into META-readable documents.
 Contains various ways to segment text and deal with preprocessed files (POS tags, parse trees, etc).
 Contains tokenizers that start off a filter chain.

Detailed Description

Chase Geigle

All files in META are released under the MIT license. For more details, consult the file LICENSE in the root of the project.