ModErn Text Analysis
META Enumerates Textual Applications
meta::analyzers::ngram_analyzer Class Reference

Analyzes documents based on an ngram word model, where the value for n is supplied by the user.

#include <ngram_analyzer.h>

Inheritance: meta::analyzers::ngram_analyzer derives from meta::analyzers::analyzer.

Public Member Functions

 ngram_analyzer (uint16_t n)
 Constructor.
 
virtual uint16_t n_value () const
 Returns the value of n used for the ngrams.
- Public Member Functions inherited from meta::analyzers::analyzer
virtual ~analyzer ()=default
 A default virtual destructor.
 
virtual void tokenize (corpus::document &doc)=0
 Tokenizes a document.
 
virtual std::unique_ptr< analyzer > clone () const =0
 Clones this analyzer.
 

Protected Member Functions

virtual std::string wordify (const std::deque< std::string > &words) const
 Turns a list of words into an ngram string.
 

Private Attributes

uint16_t n_val_
 The value of n for this ngram analyzer.
 

Additional Inherited Members

- Static Public Member Functions inherited from meta::analyzers::analyzer
static std::unique_ptr< analyzer > load (const cpptoml::table &config)
 
static std::unique_ptr< token_stream > default_filter_chain (const cpptoml::table &config)
 
static std::unique_ptr< token_stream > load_filters (const cpptoml::table &global, const cpptoml::table &config)
 
static std::unique_ptr< token_stream > load_filter (std::unique_ptr< token_stream > src, const cpptoml::table &config)
 
static io::parser create_parser (const corpus::document &doc, const std::string &extension, const std::string &delims)
 
static std::string get_content (const corpus::document &doc)
 

Detailed Description

Analyzes documents based on an ngram word model, where the value for n is supplied by the user.

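This class is abstract: it provides only the framework for ngram tokenization (storing n and turning a list of words into an ngram string) and does not override the pure virtual tokenize() and clone() inherited from analyzer, which are left to its subclasses.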
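The division of labor between this abstract base and a concrete subclass can be sketched without the MeTA headers. The example below is a minimal, self-contained stand-in, not MeTA's implementation: the class names `ngram_analyzer_sketch` and `word_ngram_sketch` are hypothetical, `tokenize` takes a plain `std::vector<std::string>` and returns counts instead of filling a `corpus::document`, and the underscore separator used by `wordify` is an assumption. The base keeps n and `wordify`; the subclass slides an n-word window over the tokens.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for ngram_analyzer (not MeTA's code). It stores n and
// turns a window of words into one ngram string, but leaves tokenizing to
// subclasses, so it is abstract.
class ngram_analyzer_sketch
{
  public:
    explicit ngram_analyzer_sketch(std::uint16_t n) : n_val_{n}
    {
    }

    virtual ~ngram_analyzer_sketch() = default;

    virtual std::uint16_t n_value() const
    {
        return n_val_;
    }

    // Assumption: takes plain tokens and returns ngram counts, instead of
    // MeTA's tokenize(corpus::document&).
    virtual std::map<std::string, int>
        tokenize(const std::vector<std::string>& words) const = 0;

  protected:
    // Assumption: the words of one ngram are joined with '_'.
    virtual std::string wordify(const std::deque<std::string>& words) const
    {
        std::string result;
        for (std::size_t i = 0; i < words.size(); ++i)
        {
            if (i > 0)
                result += '_';
            result += words[i];
        }
        return result;
    }

  private:
    std::uint16_t n_val_; // the value of n for this analyzer
};

// Hypothetical concrete subclass: slides an n-word window over the tokens
// and counts each ngram it sees.
class word_ngram_sketch : public ngram_analyzer_sketch
{
  public:
    using ngram_analyzer_sketch::ngram_analyzer_sketch;

    std::map<std::string, int>
        tokenize(const std::vector<std::string>& words) const override
    {
        std::map<std::string, int> counts;
        const std::size_t n = n_value();
        if (n == 0)
            return counts; // no ngrams for n = 0
        std::deque<std::string> window;
        for (const auto& word : words)
        {
            window.push_back(word);
            if (window.size() > n)
                window.pop_front(); // drop the oldest word
            if (window.size() == n)
                ++counts[wordify(window)];
        }
        return counts;
    }
};
```

For example, a `word_ngram_sketch` constructed with n = 2 and given the tokens "the cat saw the cat" counts `the_cat` twice and `cat_saw` and `saw_the` once each; with n = 3 and only two tokens, it produces no ngrams.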

Constructor & Destructor Documentation

ngram_analyzer::ngram_analyzer (uint16_t n)

Constructor.

Parameters
n: The value of n in the ngrams (for example, 2 for bigrams).

Member Function Documentation

uint16_t ngram_analyzer::n_value () const [virtual]
Returns
the value of n used for the ngrams

std::string ngram_analyzer::wordify (const std::deque< std::string > &words) const [protected, virtual]

Turns a list of words into an ngram string.

Parameters
words: The deque holding the list of words.
Returns
the ngrams in string format
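As a rough illustration of what wordify does, the free function below joins the words held in a deque into a single ngram string. This is a sketch under assumptions, not MeTA's code: the name `wordify_sketch` is hypothetical, and joining with an underscore is an assumed convention that the library may not follow.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical sketch of wordify. Assumption: the words of one ngram are
// joined with an underscore; MeTA's actual separator may differ.
std::string wordify_sketch(const std::deque<std::string>& words)
{
    std::string result;
    for (std::size_t i = 0; i < words.size(); ++i)
    {
        if (i > 0)
            result += '_'; // separator between consecutive words
        result += words[i];
    }
    return result;
}
```

Under this assumption, a deque holding "new", "york", "city" becomes `new_york_city`, and an empty deque yields an empty string. A deque is a natural parameter type for this step: an analyzer sliding a window over a document can append the next word with `push_back` and discard the oldest with `pop_front`, both in constant time.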
