ModErn Text Analysis
META Enumerates Textual Applications
ngram_pos_analyzer.h
 9  #ifndef META_NGRAM_POS_ANALYZER_H_
10  #define META_NGRAM_POS_ANALYZER_H_
11
12  #include <string>
13  #include "analyzers/analyzer_factory.h"
14  #include "sequence/sequence_analyzer.h"
15  #include "analyzers/ngram/ngram_analyzer.h"
16  #include "sequence/crf/crf.h"
17  #include "util/clonable.h"
18
19  namespace meta
20  {
21  namespace analyzers
22  {
23
31  class ngram_pos_analyzer
32      : public util::multilevel_clonable<analyzer, ngram_analyzer,
33                                         ngram_pos_analyzer>
34  {
35      using base = util::multilevel_clonable<analyzer, ngram_analyzer,
36                                             ngram_pos_analyzer>;
37
38    public:
45      ngram_pos_analyzer(uint16_t n, std::unique_ptr<token_stream> stream,
46                         const std::string& crf_prefix);
47
52      ngram_pos_analyzer(const ngram_pos_analyzer& other);
53
58      virtual void tokenize(corpus::document& doc) override;
59
61      const static std::string id;
62
63    private:
65      std::unique_ptr<token_stream> stream_;
66
68      std::shared_ptr<sequence::crf> crf_;
69
71      const sequence::sequence_analyzer seq_analyzer_;
72  };
73
77  template <>
78  std::unique_ptr<analyzer>
79      make_analyzer<ngram_pos_analyzer>(const cpptoml::table&,
80                                        const cpptoml::table&);
81  }
82
83  namespace sequence
84  {
88  void register_analyzers();
89  }
90  }
91
92  #endif
meta::analyzers::ngram_pos_analyzer::tokenize
virtual void tokenize(corpus::document &doc) override
Tokenizes a file into a document.
meta::analyzers::ngram_pos_analyzer::id
static const std::string id
Identifier for this analyzer.
Definition: ngram_pos_analyzer.h:61
meta::sequence::register_analyzers
void register_analyzers()
Registers analyzers provided by the meta-sequence-analyzers library.
Definition: ngram_pos_analyzer.cpp:112
meta::analyzers::make_analyzer< ngram_pos_analyzer >
std::unique_ptr< analyzer > make_analyzer< ngram_pos_analyzer >(const cpptoml::table &, const cpptoml::table &)
Specialization of the factory method for creating ngram_pos_analyzers.
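The factory specialization above can be pictured with a small, self-contained sketch. Everything in it is hypothetical and is not MeTA's code: `config_table` stands in for `cpptoml::table`, the `toy_analyzer` types stand in for the real analyzers, the key names `ngram` and `crf-prefix` are assumed, and so is the reading of the two tables as a global and an analyzer-local configuration. It only illustrates why an analyzer whose constructor needs arguments would get its own specialization.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Stand-in for a parsed configuration table (assumption for illustration).
using config_table = std::map<std::string, std::string>;

struct toy_analyzer
{
    virtual ~toy_analyzer() = default;
    virtual std::string describe() const = 0;
};

// Primary template: default-constructs analyzers that need no settings.
template <class Analyzer>
std::unique_ptr<toy_analyzer> make_toy_analyzer(const config_table&,
                                                const config_table&)
{
    return std::unique_ptr<toy_analyzer>(new Analyzer());
}

struct simple_analyzer : public toy_analyzer
{
    std::string describe() const override { return "simple"; }
};

// No default constructor, so the primary template cannot build it.
struct tagged_ngram_analyzer : public toy_analyzer
{
    tagged_ngram_analyzer(unsigned long n, std::string prefix)
        : n_{n}, prefix_{std::move(prefix)}
    {
    }

    std::string describe() const override
    {
        return "n=" + std::to_string(n_) + " prefix=" + prefix_;
    }

    unsigned long n_;
    std::string prefix_;
};

// Specialization: reads the constructor arguments from the local table.
// The key names are hypothetical.
template <>
std::unique_ptr<toy_analyzer>
make_toy_analyzer<tagged_ngram_analyzer>(const config_table& /* global */,
                                         const config_table& local)
{
    auto n = local.find("ngram");
    auto prefix = local.find("crf-prefix");
    if (n == local.end() || prefix == local.end())
        throw std::runtime_error{"ngram and crf-prefix are required"};

    return std::unique_ptr<toy_analyzer>(
        new tagged_ngram_analyzer(std::stoul(n->second), prefix->second));
}
```

Calling `make_toy_analyzer<tagged_ngram_analyzer>(global, local)` with both keys present returns a configured object; with either key missing it throws instead of constructing a half-initialized analyzer.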
meta::analyzers::ngram_pos_analyzer
Analyzes documents based on part-of-speech tags instead of words.
Definition: ngram_pos_analyzer.h:31
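To make "part-of-speech tags instead of words" concrete, here is a minimal sketch of counting n-grams over tags that a tagger has already assigned to one sentence. The helper `count_tag_ngrams` and the `_` separator are assumptions made for illustration; they are not part of MeTA and say nothing about how `tokenize` names its features.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper, not part of MeTA: counts n-grams over a sequence of
// part-of-speech tags already produced for a single sentence.
std::map<std::string, uint64_t>
count_tag_ngrams(const std::vector<std::string>& tags, std::size_t n)
{
    std::map<std::string, uint64_t> counts;
    if (n == 0 || tags.size() < n)
        return counts;

    for (std::size_t i = 0; i + n <= tags.size(); ++i)
    {
        // Join n consecutive tags with '_' to form one feature name.
        std::string gram = tags[i];
        for (std::size_t j = 1; j < n; ++j)
            gram += "_" + tags[i + j];
        ++counts[gram];
    }
    return counts;
}
```

For the tag sequence `DT NN VBZ DT NN` and `n = 2`, this yields `DT_NN` twice and `NN_VBZ` and `VBZ_DT` once each, so two sentences with different words but the same grammatical shape produce the same counts.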
meta::analyzers::ngram_pos_analyzer::crf_
std::shared_ptr< sequence::crf > crf_
The CRF used to tag the sentences.
Definition: ngram_pos_analyzer.h:68
meta::analyzers::ngram_pos_analyzer::ngram_pos_analyzer
ngram_pos_analyzer(uint16_t n, std::unique_ptr< token_stream > stream, const std::string &crf_prefix)
Constructor.
Definition: ngram_pos_analyzer.cpp:20
meta::analyzers::ngram_pos_analyzer::stream_
std::unique_ptr< token_stream > stream_
The token stream to be used for extracting tokens.
Definition: ngram_pos_analyzer.h:65
meta::util::multilevel_clonable
Template class to facilitate polymorphic cloning.
Definition: clonable.h:28
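The idea behind a multilevel clonable helper can be sketched with the curiously recurring template pattern. The code below is an illustration under assumptions, not the contents of `clonable.h`: the names `base_analyzer`, `word_ngram`, `pos_ngram`, and `multilevel_clonable_sketch` are hypothetical. The helper sits between an intermediate base and the most-derived class and implements `clone()` once, through the derived class's copy constructor.

```cpp
#include <memory>
#include <string>

// Hypothetical root interface, playing the role of `analyzer`.
struct base_analyzer
{
    virtual ~base_analyzer() = default;
    virtual std::unique_ptr<base_analyzer> clone() const = 0;
    virtual std::string name() const = 0;
};

// Sketch of the helper: Derived inherits from Base through this template,
// and clone() returns a pointer to Root holding a copy of Derived.
template <class Root, class Base, class Derived>
struct multilevel_clonable_sketch : public Base
{
    using Base::Base; // inherit Base's constructors

    std::unique_ptr<Root> clone() const override
    {
        // Invokes Derived's copy constructor.
        return std::unique_ptr<Root>(
            new Derived(static_cast<const Derived&>(*this)));
    }
};

// Hypothetical intermediate class, playing the role of `ngram_analyzer`.
struct word_ngram : public base_analyzer
{
    explicit word_ngram(int n) : n_{n} {}

    std::unique_ptr<base_analyzer> clone() const override
    {
        return std::unique_ptr<base_analyzer>(new word_ngram(*this));
    }

    std::string name() const override { return "word_ngram"; }

    int n_;
};

// Hypothetical leaf class, playing the role of `ngram_pos_analyzer`.
struct pos_ngram
    : public multilevel_clonable_sketch<base_analyzer, word_ngram, pos_ngram>
{
    explicit pos_ngram(int n) : multilevel_clonable_sketch(n) {}

    std::string name() const override { return "pos_ngram"; }
};
```

Cloning a `pos_ngram` through a `base_analyzer` pointer then yields another `pos_ngram` with the same `n_`, without the leaf class having to write its own `clone()`.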
meta::corpus::document
Represents an indexable document.
Definition: document.h:31
meta
The ModErn Text Analysis toolkit is a suite of natural language processing, classification, information retrieval, data mining, and other applications of text processing.
Definition: analyzer.h:24
meta::analyzers::analyzer
A class that provides a framework to produce token counts from documents.
Definition: analyzer.h:41
meta::sequence::sequence_analyzer
Analyzer that operates over sequences, generating features based on a set of "observation functions"...
Definition: sequence_analyzer.h:49
meta::analyzers::ngram_pos_analyzer::seq_analyzer_
const sequence::sequence_analyzer seq_analyzer_
Generates features for the CRF; const indicates testing mode.
Definition: ngram_pos_analyzer.h:71
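One way a "const indicates testing mode" convention could work is through const and non-const overloads, sketched below. This is an assumption about the general technique and not the actual `sequence_analyzer` code; `feature_map_sketch` and its members are hypothetical. The non-const overload assigns ids to unseen features (training), while the const overload only looks them up (testing), so a `const` member can never grow the feature set.

```cpp
#include <cstdint>
#include <limits>
#include <string>
#include <unordered_map>

// Hypothetical feature-id map, used only to illustrate the convention.
class feature_map_sketch
{
  public:
    // Sentinel returned for features that were never seen in training.
    static uint64_t unknown_id()
    {
        return std::numeric_limits<uint64_t>::max();
    }

    // Training mode: an unseen feature is assigned the next free id.
    uint64_t feature(const std::string& name)
    {
        auto it = ids_.find(name);
        if (it != ids_.end())
            return it->second;
        auto id = static_cast<uint64_t>(ids_.size());
        ids_.emplace(name, id);
        return id;
    }

    // Testing mode: lookup only, the map is never modified.
    uint64_t feature(const std::string& name) const
    {
        auto it = ids_.find(name);
        if (it == ids_.end())
            return unknown_id();
        return it->second;
    }

  private:
    std::unordered_map<std::string, uint64_t> ids_;
};
```

Declaring the member as `const`, as `seq_analyzer_` is declared in the header, makes the compiler select the lookup-only overload at every call site.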
meta::analyzers::ngram_analyzer
Analyzes documents based on an ngram word model, where the value for n is supplied by the user...
Definition: ngram_analyzer.h:27
Generated on Tue Mar 3 2015 23:20:16 for ModErn Text Analysis by Doxygen 1.8.9.1