A class that provides a framework to produce token counts from documents.
#include <analyzer.h>
All analyzers inherit from this class and (possibly) implement tokenize().
Tokenizes a document.
- Parameters
  - doc: The document to store the tokenized information in
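The contract described above (subclasses implement tokenize(), which records token counts in the document) can be illustrated with a self-contained C++ sketch. The types document, analyzer, and whitespace_unigram_analyzer here are hypothetical stand-ins written with the standard library only. They are not MeTA's corpus::document or its real analyzer classes, and the real tokenize() signature may differ.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for corpus::document: raw text plus the token
// counts an analyzer stores in it.
struct document {
    std::string content;
    std::unordered_map<std::string, double> counts;

    // Adds `amount` occurrences of `term` to this document's counts.
    void increment(const std::string& term, double amount) {
        assert(amount > 0.0);
        counts[term] += amount;
    }
};

// Hypothetical abstract base mirroring the documented contract:
// subclasses implement tokenize() and record counts inside the document.
class analyzer {
  public:
    virtual ~analyzer() = default;
    virtual void tokenize(document& doc) = 0;
};

// Hypothetical concrete analyzer: one count per whitespace-separated word.
class whitespace_unigram_analyzer : public analyzer {
  public:
    void tokenize(document& doc) override {
        std::istringstream in{doc.content};
        for (std::string word; in >> word;)
            doc.increment(word, 1.0);
    }
};
```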
static std::unique_ptr<analyzer> meta::analyzers::analyzer::load(const cpptoml::table& config)
- Parameters
  - config: The config group used to create the analyzer from
- Returns
  - an analyzer as specified by a config object
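A factory such as load() can be pictured as reading one key from the config group and dispatching to a matching constructor. The sketch below assumes a hypothetical "method" key, uses std::map<std::string, std::string> in place of cpptoml::table, and registers two made-up analyzers. MeTA's actual key names and registration mechanism are not described in this reference.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a cpptoml::table config group.
using config_group = std::map<std::string, std::string>;

// Hypothetical analyzer hierarchy; name() exists only so results can be checked.
struct analyzer {
    virtual ~analyzer() = default;
    virtual std::string name() const = 0;
};
struct word_analyzer : analyzer {
    std::string name() const override { return "word"; }
};
struct char_analyzer : analyzer {
    std::string name() const override { return "char"; }
};

using analyzer_maker = std::function<std::unique_ptr<analyzer>(const config_group&)>;

// Hypothetical registry mapping a "method" value to a constructor.
inline const std::unordered_map<std::string, analyzer_maker>& registry() {
    static const std::unordered_map<std::string, analyzer_maker> makers{
        {"word", [](const config_group&) { return std::make_unique<word_analyzer>(); }},
        {"char", [](const config_group&) { return std::make_unique<char_analyzer>(); }},
    };
    return makers;
}

// Sketch of a load()-style factory: look up "method" and dispatch.
inline std::unique_ptr<analyzer> load(const config_group& config) {
    auto method = config.find("method");
    if (method == config.end())
        throw std::runtime_error{"analyzer group is missing a method"};
    auto maker = registry().find(method->second);
    if (maker == registry().end())
        throw std::runtime_error{"unknown analyzer method: " + method->second};
    return maker->second(config);
}
```

Passing the whole group to each constructor lets an individual analyzer read its own extra settings without the factory knowing about them.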
static std::unique_ptr<token_stream> meta::analyzers::analyzer::default_filter_chain(const cpptoml::table& config)
- Parameters
  - config: The config group used to create the analyzer from
- Returns
  - the default filter chain for this version of MeTA, based on a config object
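A filter chain can be thought of as a source token stream wrapped by successive filters, each pulling tokens from the stage before it. The following sketch hard-codes a two-stage chain (a whitespace tokenizer followed by a lowercasing filter) to show that shape. The token_stream interface, both stages, and the choice of stages are assumptions for illustration, not the chain MeTA actually builds.

```cpp
#include <cctype>
#include <map>
#include <memory>
#include <optional>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for a cpptoml::table config group.
using config_group = std::map<std::string, std::string>;

// Hypothetical minimal token stream: feed text, then pull tokens.
struct token_stream {
    virtual ~token_stream() = default;
    virtual void set_content(const std::string& text) = 0;
    virtual std::optional<std::string> next() = 0;  // nullopt when exhausted
};

// Hypothetical source stage: splits the content on whitespace.
struct whitespace_tokenizer : token_stream {
    void set_content(const std::string& text) override {
        in.clear();
        in.str(text);
    }
    std::optional<std::string> next() override {
        std::string t;
        if (in >> t)
            return t;
        return std::nullopt;
    }
    std::istringstream in;
};

// Hypothetical filter stage: lowercases each token pulled from upstream.
struct lowercase_filter : token_stream {
    explicit lowercase_filter(std::unique_ptr<token_stream> s) : src{std::move(s)} {}
    void set_content(const std::string& text) override { src->set_content(text); }
    std::optional<std::string> next() override {
        auto t = src->next();
        if (t) {
            for (char& c : *t)
                c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        }
        return t;
    }
    std::unique_ptr<token_stream> src;
};

// Sketch of a default_filter_chain()-style builder: a fixed chain chosen in
// code. `config` is accepted only to mirror the documented signature.
inline std::unique_ptr<token_stream> default_filter_chain(const config_group& /*config*/) {
    std::unique_ptr<token_stream> stream = std::make_unique<whitespace_tokenizer>();
    stream = std::make_unique<lowercase_filter>(std::move(stream));
    return stream;
}
```

Holding the upstream stage in a std::unique_ptr gives each filter sole ownership of everything before it, so the whole chain is released when the outermost stage is.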
static std::unique_ptr<token_stream> meta::analyzers::analyzer::load_filters(const cpptoml::table& global, const cpptoml::table& config)
- Parameters
  - global: The original config object with all parameters
  - config: The config group used to create the filters from
- Returns
  - a filter chain as specified by a config object
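Unlike a fixed default, load_filters() builds the chain from configuration. In this sketch, a hypothetical "filters" key holds an ordered, space-separated list of stage names: the first must name a source, and each later one wraps the stream built so far. The config representation, key name, and stage names are assumptions. The global parameter is accepted only to mirror the documented signature and is unused here.

```cpp
#include <cctype>
#include <map>
#include <memory>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a cpptoml::table config group.
using config_group = std::map<std::string, std::string>;

// Hypothetical minimal token stream: feed text, then pull tokens.
struct token_stream {
    virtual ~token_stream() = default;
    virtual void set_content(const std::string& text) = 0;
    virtual std::optional<std::string> next() = 0;  // nullopt when exhausted
};

// Hypothetical source stage: splits the content on whitespace.
struct whitespace_tokenizer : token_stream {
    void set_content(const std::string& text) override {
        in.clear();
        in.str(text);
    }
    std::optional<std::string> next() override {
        std::string t;
        if (in >> t)
            return t;
        return std::nullopt;
    }
    std::istringstream in;
};

// Hypothetical filter stage: lowercases each token pulled from upstream.
struct lowercase_filter : token_stream {
    explicit lowercase_filter(std::unique_ptr<token_stream> s) : src{std::move(s)} {}
    void set_content(const std::string& text) override { src->set_content(text); }
    std::optional<std::string> next() override {
        auto t = src->next();
        if (t) {
            for (char& c : *t)
                c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        }
        return t;
    }
    std::unique_ptr<token_stream> src;
};

// Sketch of load_filters(): the first listed name picks the source; every
// later name wraps the stream built so far, preserving the listed order.
inline std::unique_ptr<token_stream> load_filters(const config_group& /*global*/,
                                                  const config_group& config) {
    auto it = config.find("filters");
    if (it == config.end())
        throw std::runtime_error{"filter group is missing a filters key"};
    std::istringstream names{it->second};
    std::unique_ptr<token_stream> stream;
    for (std::string name; names >> name;) {
        if (!stream && name == "whitespace")
            stream = std::make_unique<whitespace_tokenizer>();
        else if (stream && name == "lowercase")
            stream = std::make_unique<lowercase_filter>(std::move(stream));
        else
            throw std::runtime_error{"unexpected stage: " + name};
    }
    if (!stream)
        throw std::runtime_error{"empty filter chain"};
    return stream;
}
```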
static std::unique_ptr<token_stream> meta::analyzers::analyzer::load_filter(std::unique_ptr<token_stream> src, const cpptoml::table& config)
- Parameters
  - src: The token stream that will feed into this filter
  - config: The config group used to create the filter from
- Returns
  - a single filter specified by a config object
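The load_filter() entry adds exactly one stage on top of an existing stream, taking ownership of src. The sketch reads a hypothetical "type" key to pick the filter and a hypothetical "min" key for that filter's one setting. The token_stream interface, length_filter, both key names, and the default minimum of 1 are assumptions for illustration; other filter types would add further branches.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a cpptoml::table config group.
using config_group = std::map<std::string, std::string>;

// Hypothetical minimal token stream: feed text, then pull tokens.
struct token_stream {
    virtual ~token_stream() = default;
    virtual void set_content(const std::string& text) = 0;
    virtual std::optional<std::string> next() = 0;  // nullopt when exhausted
};

// Hypothetical source stage: splits the content on whitespace.
struct whitespace_tokenizer : token_stream {
    void set_content(const std::string& text) override {
        in.clear();
        in.str(text);
    }
    std::optional<std::string> next() override {
        std::string t;
        if (in >> t)
            return t;
        return std::nullopt;
    }
    std::istringstream in;
};

// Hypothetical filter stage: drops tokens shorter than min_len characters.
struct length_filter : token_stream {
    length_filter(std::unique_ptr<token_stream> s, std::size_t min)
        : src{std::move(s)}, min_len{min} {}
    void set_content(const std::string& text) override { src->set_content(text); }
    std::optional<std::string> next() override {
        while (auto t = src->next())
            if (t->size() >= min_len)
                return t;
        return std::nullopt;
    }
    std::unique_ptr<token_stream> src;
    std::size_t min_len;
};

// Sketch of load_filter(): wrap `src` in the single filter named by "type",
// reading that filter's own settings from the same group.
inline std::unique_ptr<token_stream> load_filter(std::unique_ptr<token_stream> src,
                                                 const config_group& config) {
    auto type = config.find("type");
    if (type == config.end())
        throw std::runtime_error{"filter group has no type"};
    if (type->second == "length") {
        auto min = config.find("min");
        std::size_t min_len = (min == config.end()) ? 1 : std::stoul(min->second);
        return std::make_unique<length_filter>(std::move(src), min_len);
    }
    throw std::runtime_error{"unknown filter type: " + type->second};
}
```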
static io::parser meta::analyzers::analyzer::create_parser(const corpus::document& doc, const std::string& extension, const std::string& delims)
- Parameters
  - doc: The document to parse
  - extension: The possible file extension for this document if it is represented by a file on disk
  - delims: Possible character delimiters to use when parsing the file
- Returns
  - a parser suited to read data that this document represents
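One way a create_parser()-style helper could choose its input is sketched below: parse the document's in-memory content when it has any, and otherwise read the file formed by appending extension to the document's path. The document struct, delim_parser, and the path-plus-extension rule are hypothetical; they are not MeTA's io::parser or corpus::document.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for corpus::document: either holds its text in
// memory or names a file on disk.
struct document {
    std::optional<std::string> content;
    std::string path;
};

// Hypothetical parser: yields the non-empty pieces of `text` that lie
// between characters from `delims`.
class delim_parser {
  public:
    delim_parser(std::string text, std::string delims)
        : text_{std::move(text)}, delims_{std::move(delims)} {
        skip_delims();
    }
    bool has_next() const { return pos_ < text_.size(); }
    std::string next() {
        std::size_t end = text_.find_first_of(delims_, pos_);
        if (end == std::string::npos)
            end = text_.size();
        std::string piece = text_.substr(pos_, end - pos_);
        pos_ = end;
        skip_delims();
        return piece;
    }

  private:
    // Advances past any run of delimiter characters.
    void skip_delims() {
        pos_ = text_.find_first_not_of(delims_, pos_);
        if (pos_ == std::string::npos)
            pos_ = text_.size();
    }
    std::string text_;
    std::string delims_;
    std::size_t pos_ = 0;
};

// Sketch of create_parser(): prefer in-memory content, else read path + extension.
inline delim_parser create_parser(const document& doc, const std::string& extension,
                                  const std::string& delims) {
    if (doc.content)
        return delim_parser{*doc.content, delims};
    std::ifstream file{doc.path + extension};
    if (!file)
        throw std::runtime_error{"cannot open " + doc.path + extension};
    std::string text(std::istreambuf_iterator<char>{file}, std::istreambuf_iterator<char>{});
    return delim_parser{std::move(text), delims};
}
```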
static std::string meta::analyzers::analyzer::get_content(const corpus::document& doc)
- Parameters
  - doc: The document to get content for
- Returns
  - the contents of the document, as a string
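A minimal sketch of get_content(), assuming a document either carries its text in memory or points at a file: return the in-memory content if present, else read the whole file named by the document's path. The document struct is a hypothetical stand-in, and any encoding handling the real function might perform is not modeled.

```cpp
#include <fstream>
#include <iterator>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for corpus::document.
struct document {
    std::optional<std::string> content;  // set when the text is held in memory
    std::string path;                    // used when the text lives on disk
};

// Sketch of get_content(): prefer in-memory content, else read the file.
inline std::string get_content(const document& doc) {
    if (doc.content)
        return *doc.content;
    std::ifstream file{doc.path, std::ios::binary};
    if (!file)
        throw std::runtime_error{"cannot open " + doc.path};
    return std::string(std::istreambuf_iterator<char>{file}, std::istreambuf_iterator<char>{});
}
```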