Analyzes documents based on part-of-speech tags instead of words.

#include <ngram_pos_analyzer.h>

Public Member Functions
	ngram_pos_analyzer (uint16_t n, std::unique_ptr< token_stream > stream, const std::string &crf_prefix)
	Constructor.

	ngram_pos_analyzer (const ngram_pos_analyzer &other)
	Copy constructor.

virtual void	tokenize (corpus::document &doc) override
	Tokenizes a file into a document.

Public Member Functions inherited from meta::util::multilevel_clonable< analyzer, ngram_analyzer, ngram_pos_analyzer >
virtual std::unique_ptr< analyzer >	clone () const
	Clones the given object.

Static Public Attributes
static const std::string	id = "ngram-pos"
	Identifier for this analyzer.

Private Types
using	base = util::multilevel_clonable< analyzer, ngram_analyzer, ngram_pos_analyzer >

Private Attributes
std::unique_ptr< token_stream >	stream_
	The token stream to be used for extracting tokens.

std::shared_ptr< sequence::crf >	crf_
	The CRF used to tag the sentences.

const sequence::sequence_analyzer	seq_analyzer_
	Generates features for the CRF; const indicates testing mode.

Detailed Description

Analyzes documents based on part-of-speech tags instead of words.

The recommended tokenizer for this analyzer is icu-tokenizer with no other filters added, so that capitalization and similar surface cues remain available as features. For the same reason, function words and stop words should not be removed, and words should not be stemmed.
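
To make the idea of n-grams over part-of-speech tags concrete, here is a minimal, self-contained sketch. It is not MeTA's implementation: count_pos_ngrams is a hypothetical helper, the tag sequence is assumed to have been produced already by some tagger, and joining tags with '_' is an arbitrary choice made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not part of MeTA): counts n-grams over a sequence of
// part-of-speech tags that some tagger has already produced.
std::map<std::string, std::size_t>
count_pos_ngrams(const std::vector<std::string>& tags, std::size_t n)
{
    assert(n >= 1); // an n-gram needs at least one tag

    std::map<std::string, std::size_t> counts;
    for (std::size_t i = 0; i + n <= tags.size(); ++i)
    {
        // Join n consecutive tags into one feature name. The '_' separator
        // is an assumption of this sketch.
        std::string gram = tags[i];
        for (std::size_t j = 1; j < n; ++j)
            gram += "_" + tags[i + j];
        ++counts[gram];
    }
    return counts;
}
```

For the tag sequence DT NN VBD DT NN with n = 2, this yields three distinct features, with DT_NN counted twice. The words themselves never appear in the output, which is the sense in which the analysis is based on tags instead of words.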

Constructor & Destructor Documentation

meta::analyzers::ngram_pos_analyzer::ngram_pos_analyzer	(	uint16_t	n,
		std::unique_ptr< token_stream >	stream,
		const std::string &	crf_prefix
	)

Constructor.

Parameters

n	The value of n to use for the ngrams.
stream	The stream to read tokens from.
crf_prefix

meta::analyzers::ngram_pos_analyzer::ngram_pos_analyzer ( const ngram_pos_analyzer & other )

Copy constructor.

Parameters

other The other ngram_pos_analyzer to copy from
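
The private attributes suggest why an explicit copy constructor is declared: stream_ is a std::unique_ptr, which cannot be copied, while crf_ is a std::shared_ptr, which can. The sketch below shows one plausible way such a copy could work, assuming the stream exposes a clone() member and the model is shared rather than duplicated. All type names here (token_stream_sketch, model_sketch, pos_analyzer_sketch) are stand-ins invented for this example, not MeTA's real classes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Stand-in for a token stream; assumed to be cloneable.
struct token_stream_sketch
{
    std::string text;
    std::size_t position = 0;

    std::unique_ptr<token_stream_sketch> clone() const
    {
        return std::make_unique<token_stream_sketch>(*this);
    }
};

// Stand-in for a trained, read-only tagging model.
struct model_sketch
{
    std::string prefix;
};

class pos_analyzer_sketch
{
  public:
    pos_analyzer_sketch(std::unique_ptr<token_stream_sketch> stream,
                        std::shared_ptr<model_sketch> model)
        : stream_{std::move(stream)}, model_{std::move(model)}
    {
        assert(stream_ && model_);
    }

    // unique_ptr is not copyable, so the stream is cloned; the model is
    // shared through the shared_ptr instead of being duplicated.
    pos_analyzer_sketch(const pos_analyzer_sketch& other)
        : stream_{other.stream_->clone()}, model_{other.model_}
    {
    }

    // A clone() built on the copy constructor.
    std::unique_ptr<pos_analyzer_sketch> clone() const
    {
        return std::make_unique<pos_analyzer_sketch>(*this);
    }

    const token_stream_sketch* stream() const { return stream_.get(); }
    const model_sketch* model() const { return model_.get(); }

  private:
    std::unique_ptr<token_stream_sketch> stream_;
    std::shared_ptr<model_sketch> model_;
};
```

Under these assumptions, each copy owns an independent stream, so copies can advance through their input separately, while all copies point at the same model object.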

Member Function Documentation

virtual void meta::analyzers::ngram_pos_analyzer::tokenize ( corpus::document & doc )

override, virtual

Tokenizes a file into a document.

Parameters

doc	The document to store the tokenized information in

The documentation for this class was generated from the following files:

/home/chase/projects/meta/include/sequence/analyzers/ngram_pos_analyzer.h
/home/chase/projects/meta/src/sequence/analyzers/ngram_pos_analyzer.cpp