ModErn Text Analysis
META Enumerates Textual Applications
meta::analyzers::analyzer Class Reference (abstract)

A class that provides a framework to produce token counts from documents.

#include <analyzer.h>

Inherited by meta::analyzers::ngram_analyzer.

Classes

class analyzer_exception
 Basic exception for analyzer interactions.
 

Public Member Functions

virtual ~analyzer ()=default
 A default virtual destructor.
 
virtual void tokenize (corpus::document &doc)=0
 Tokenizes a document.
 
virtual std::unique_ptr< analyzer > clone () const =0
 Clones this analyzer.
 

Static Public Member Functions

static std::unique_ptr< analyzer > load (const cpptoml::table &config)
 
static std::unique_ptr< token_stream > default_filter_chain (const cpptoml::table &config)
 
static std::unique_ptr< token_stream > load_filters (const cpptoml::table &global, const cpptoml::table &config)
 
static std::unique_ptr< token_stream > load_filter (std::unique_ptr< token_stream > src, const cpptoml::table &config)
 
static io::parser create_parser (const corpus::document &doc, const std::string &extension, const std::string &delims)
 
static std::string get_content (const corpus::document &doc)
 

Detailed Description

A class that provides a framework to produce token counts from documents.

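All analyzers inherit from this class. Because tokenize() and clone() are pure virtual, every concrete analyzer must implement them; an abstract subclass may leave them to its own subclasses.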
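
As a minimal sketch of that contract, the block below mirrors the interface with simplified stand-ins. The document struct (raw text plus a term-count map) and whitespace_unigram_analyzer are hypothetical and are not MeTA's actual corpus::document or analyzer classes; the point is only the shape of a concrete subclass: override tokenize() to record counts on the document, and override clone() to return a heap copy of the most-derived type.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical stand-in for corpus::document: raw text plus term counts.
struct document {
    std::string content;
    std::map<std::string, double> counts;
};

// Simplified mirror of the abstract interface documented above.
class analyzer {
  public:
    virtual ~analyzer() = default;
    virtual void tokenize(document& doc) = 0;
    virtual std::unique_ptr<analyzer> clone() const = 0;
};

// Hypothetical concrete analyzer: counts whitespace-separated words.
class whitespace_unigram_analyzer : public analyzer {
  public:
    void tokenize(document& doc) override {
        std::istringstream in{doc.content};
        for (std::string word; in >> word;)
            doc.counts[word] += 1;  // results are stored on the document
    }

    std::unique_ptr<analyzer> clone() const override {
        // Copy-construct the most-derived type so no state is sliced off.
        return std::make_unique<whitespace_unigram_analyzer>(*this);
    }
};
```

Returning std::unique_ptr<analyzer> from clone() lets a caller duplicate an analyzer through a base-class pointer without knowing its concrete type.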

Member Function Documentation

virtual void meta::analyzers::analyzer::tokenize (corpus::document & doc)
pure virtual

Tokenizes a document.

Parameters
doc: The document to store the tokenized information in
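
Because tokenize() returns void and receives the document by non-const reference, its results have to be recorded on the document itself. The sketch below assumes a hypothetical document stand-in with a counts map and shows word-bigram counting, the kind of feature an n-gram analyzer might produce; the function name tokenize_bigrams and the "_" joiner are illustrative choices, not MeTA's.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for corpus::document.
struct document {
    std::string content;
    std::map<std::string, double> counts;
};

// Hypothetical bigram tokenizer. Like tokenize(), it returns nothing and
// records its output on the document passed by reference.
void tokenize_bigrams(document& doc) {
    std::istringstream in{doc.content};
    std::vector<std::string> words;
    for (std::string w; in >> w;)
        words.push_back(w);
    // Each adjacent pair becomes one feature; "_" is an arbitrary joiner.
    for (std::size_t i = 1; i < words.size(); ++i)
        doc.counts[words[i - 1] + "_" + words[i]] += 1;
}
```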
std::unique_ptr< analyzer > meta::analyzers::analyzer::load (const cpptoml::table & config)
static
Parameters
config: The config group used to create the analyzer from
Returns
an analyzer as specified by a config object
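
load() builds an analyzer from configuration. One common way to implement such a factory is a registry that maps a configuration value to a constructor function. The sketch below assumes a flat string map in place of cpptoml::table and a hypothetical "method" key; neither the key name nor the registered analyzers should be read as MeTA's actual configuration schema. Missing or unknown methods raise a stand-in for analyzer_exception.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for cpptoml::table: a flat key/value map.
using config_table = std::map<std::string, std::string>;

// Stand-in for analyzer::analyzer_exception.
struct analyzer_exception : std::runtime_error {
    using std::runtime_error::runtime_error;
};

struct analyzer {
    virtual ~analyzer() = default;
    virtual std::string name() const = 0;  // illustration only
};

struct unigram_analyzer : analyzer {
    std::string name() const override { return "unigram"; }
};

struct bigram_analyzer : analyzer {
    std::string name() const override { return "bigram"; }
};

// Factory sketch: look up the hypothetical "method" key in a registry.
std::unique_ptr<analyzer> load(const config_table& config) {
    using factory = std::function<std::unique_ptr<analyzer>()>;
    static const std::map<std::string, factory> registry{
        {"unigram", [] { return std::make_unique<unigram_analyzer>(); }},
        {"bigram", [] { return std::make_unique<bigram_analyzer>(); }},
    };

    auto method = config.find("method");
    if (method == config.end())
        throw analyzer_exception{"analyzer config has no \"method\" key"};

    auto entry = registry.find(method->second);
    if (entry == registry.end())
        throw analyzer_exception{"unknown analyzer method: " + method->second};
    return entry->second();
}
```

Keeping construction behind a registry means adding a new analyzer only requires registering one more entry, while callers keep working with std::unique_ptr<analyzer>.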
std::unique_ptr< token_stream > meta::analyzers::analyzer::default_filter_chain (const cpptoml::table & config)
static
Parameters
config: The config group used to create the analyzer from
Returns
the default filter chain for this version of MeTA, based on a config object
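
Since default_filter_chain() returns a single token_stream, the filters are presumably composed as decorators: each stage owns the stream that feeds it. The sketch below assumes a simplified token_stream interface (has_more() and next()) and a hypothetical three-stage chain: whitespace tokenizer, lowercasing, and dropping one-character tokens. Unlike the documented function it takes the text directly, and its stages and order are illustrative, not MeTA's real default chain.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Simplified token_stream interface (an assumption for this sketch).
struct token_stream {
    virtual ~token_stream() = default;
    virtual bool has_more() const = 0;
    virtual std::string next() = 0;
};

// Source stage: splits raw text on whitespace.
class whitespace_tokenizer : public token_stream {
  public:
    explicit whitespace_tokenizer(const std::string& text) : in_(text) { advance(); }
    bool has_more() const override { return has_token_; }
    std::string next() override {
        std::string tok = tok_;
        advance();
        return tok;
    }

  private:
    void advance() { has_token_ = static_cast<bool>(in_ >> tok_); }
    std::istringstream in_;
    std::string tok_;
    bool has_token_ = false;
};

// Filter stage: lowercases every token from its source.
class lowercase_filter : public token_stream {
  public:
    explicit lowercase_filter(std::unique_ptr<token_stream> src) : src_(std::move(src)) {}
    bool has_more() const override { return src_->has_more(); }
    std::string next() override {
        std::string tok = src_->next();
        for (char& c : tok)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        return tok;
    }

  private:
    std::unique_ptr<token_stream> src_;
};

// Filter stage: drops tokens shorter than min_len, using one token of lookahead.
class length_filter : public token_stream {
  public:
    length_filter(std::unique_ptr<token_stream> src, std::size_t min_len)
        : src_(std::move(src)), min_len_(min_len) { advance(); }
    bool has_more() const override { return has_token_; }
    std::string next() override {
        std::string tok = tok_;
        advance();
        return tok;
    }

  private:
    void advance() {
        has_token_ = false;
        while (src_->has_more()) {
            tok_ = src_->next();
            if (tok_.size() >= min_len_) {
                has_token_ = true;
                return;
            }
        }
    }
    std::unique_ptr<token_stream> src_;
    std::size_t min_len_;
    std::string tok_;
    bool has_token_ = false;
};

// Hypothetical fixed chain: tokenizer -> lowercase -> drop 1-character tokens.
std::unique_ptr<token_stream> default_filter_chain(const std::string& text) {
    std::unique_ptr<token_stream> stream = std::make_unique<whitespace_tokenizer>(text);
    stream = std::make_unique<lowercase_filter>(std::move(stream));
    stream = std::make_unique<length_filter>(std::move(stream), 2);
    return stream;
}
```

Because each filter owns its source through a unique_ptr, destroying the outermost stream tears down the whole chain.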
std::unique_ptr< token_stream > meta::analyzers::analyzer::load_filters (const cpptoml::table & global, const cpptoml::table & config)
static
Parameters
global: The original config object with all parameters
config: The config group used to create the filters from
Returns
a filter chain as specified by a config object
std::unique_ptr< token_stream > meta::analyzers::analyzer::load_filter (std::unique_ptr< token_stream > src, const cpptoml::table & config)
static
Parameters
src: The token stream that will feed into this filter
config: The config group used to create the filter from
Returns
a single filter specified by a config object
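
load_filter() takes the stream that feeds the new filter and returns the wrapped result, so a configured chain can be built by repeatedly wrapping the previous stage, which is presumably what load_filters() does over its config groups. The sketch below assumes each filter's config is a flat string map with a hypothetical "type" key (plus a "max" key for an illustrative truncation filter). Its load_filters() takes a source stream and a list of configs, which differs from the documented (global, config) signature.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins (assumptions, not MeTA's actual types).
using config_table = std::map<std::string, std::string>;

struct token_stream {
    virtual ~token_stream() = default;
    virtual bool has_more() const = 0;
    virtual std::string next() = 0;
};

// Source stage that replays a fixed token list.
class list_source : public token_stream {
  public:
    explicit list_source(std::vector<std::string> toks) : toks_(std::move(toks)) {}
    bool has_more() const override { return pos_ < toks_.size(); }
    std::string next() override { return toks_[pos_++]; }

  private:
    std::vector<std::string> toks_;
    std::size_t pos_ = 0;
};

class lowercase_filter : public token_stream {
  public:
    explicit lowercase_filter(std::unique_ptr<token_stream> src) : src_(std::move(src)) {}
    bool has_more() const override { return src_->has_more(); }
    std::string next() override {
        std::string tok = src_->next();
        for (char& c : tok)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        return tok;
    }

  private:
    std::unique_ptr<token_stream> src_;
};

// Hypothetical filter that keeps at most max_len characters of each token.
class truncate_filter : public token_stream {
  public:
    truncate_filter(std::unique_ptr<token_stream> src, std::size_t max_len)
        : src_(std::move(src)), max_len_(max_len) {}
    bool has_more() const override { return src_->has_more(); }
    std::string next() override { return src_->next().substr(0, max_len_); }

  private:
    std::unique_ptr<token_stream> src_;
    std::size_t max_len_;
};

// Wrap src in the single filter named by the hypothetical "type" key.
std::unique_ptr<token_stream> load_filter(std::unique_ptr<token_stream> src,
                                          const config_table& config) {
    const std::string& type = config.at("type");
    if (type == "lowercase")
        return std::make_unique<lowercase_filter>(std::move(src));
    if (type == "truncate")
        return std::make_unique<truncate_filter>(std::move(src),
                                                 std::stoul(config.at("max")));
    throw std::invalid_argument{"unknown filter type: " + type};
}

// Build a chain by wrapping each new filter around the previous stage.
std::unique_ptr<token_stream> load_filters(std::unique_ptr<token_stream> src,
                                           const std::vector<config_table>& configs) {
    for (const config_table& config : configs)
        src = load_filter(std::move(src), config);
    return src;
}
```

Each call to load_filter() moves ownership of the current chain into the new filter, so the last filter in the config list ends up outermost and is the stage callers read from.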
io::parser meta::analyzers::analyzer::create_parser (const corpus::document & doc, const std::string & extension, const std::string & delims)
static
Parameters
doc: The document to parse
extension: The possible file extension for this document, if it is represented by a file on disk
delims: Possible character delimiters to use when parsing the file
Returns
a parser suited to read the data that this document represents
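
From its parameters, create_parser() appears to choose between reading a file on disk and reading the document's in-memory content, splitting on the supplied delimiter characters. The sketch below is hypothetical throughout: simple_parser stands in for io::parser, the document struct stands in for corpus::document, and appending extension to the document's path is a guessed convention.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for corpus::document.
struct document {
    std::string path;     // empty when the document is not backed by a file
    std::string content;  // in-memory text, used when path is empty
};

// Hypothetical stand-in for io::parser: yields the pieces of text that lie
// between runs of delimiter characters.
class simple_parser {
  public:
    simple_parser(std::string text, std::string delims)
        : text_(std::move(text)), delims_(std::move(delims)) {
        skip_delims();
    }

    bool has_next() const { return pos_ < text_.size(); }

    std::string next() {
        std::size_t end = text_.find_first_of(delims_, pos_);
        if (end == std::string::npos)
            end = text_.size();
        std::string tok = text_.substr(pos_, end - pos_);
        pos_ = end;
        skip_delims();
        return tok;
    }

  private:
    void skip_delims() {
        pos_ = text_.find_first_not_of(delims_, pos_);
        if (pos_ == std::string::npos)
            pos_ = text_.size();
    }

    std::string text_;
    std::string delims_;
    std::size_t pos_ = 0;
};

// Sketch: parse the backing file if there is one, else the in-memory text.
// Appending extension to path is an assumed convention.
simple_parser create_parser(const document& doc, const std::string& extension,
                            const std::string& delims) {
    if (doc.path.empty())
        return simple_parser{doc.content, delims};
    std::ifstream file{doc.path + extension};
    if (!file)
        throw std::runtime_error{"cannot open " + doc.path + extension};
    std::ostringstream buffer;
    buffer << file.rdbuf();
    return simple_parser{buffer.str(), delims};
}
```

Consecutive delimiters are treated as one separator here, so "a,b;;c" with delimiters ",;" yields three tokens rather than an empty one.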
std::string meta::analyzers::analyzer::get_content (const corpus::document & doc)
static
Parameters
doc: The document to get content for
Returns
the contents of the document, as a string
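
get_content() returns a document's text as a single string. A minimal sketch, assuming a hypothetical document stand-in that holds either in-memory content or a path to a backing file, reads the whole file in one pass with std::istreambuf_iterator; whether MeTA reads files this way is not stated here.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for corpus::document.
struct document {
    std::string path;     // backing file, if any
    std::string content;  // in-memory text, used when path is empty
};

// Sketch: return in-memory content directly, otherwise read the whole file.
std::string get_content(const document& doc) {
    if (doc.path.empty())
        return doc.content;
    std::ifstream file{doc.path, std::ios::binary};
    if (!file)
        throw std::runtime_error{"cannot open " + doc.path};
    return std::string(std::istreambuf_iterator<char>{file},
                       std::istreambuf_iterator<char>{});
}
```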
