ModErn Text Analysis
META Enumerates Textual Applications
ngram_pos_analyzer.h
1 
9 #ifndef META_NGRAM_POS_ANALYZER_H_
10 #define META_NGRAM_POS_ANALYZER_H_
11 
12 #include <string>
16 #include "sequence/crf/crf.h"
17 #include "util/clonable.h"
18 
19 namespace meta
20 {
21 namespace analyzers
22 {
23 
31 class ngram_pos_analyzer
32  : public util::multilevel_clonable<analyzer, ngram_analyzer,
33  ngram_pos_analyzer>
34 {
37 
38  public:
45  ngram_pos_analyzer(uint16_t n, std::unique_ptr<token_stream> stream,
46  const std::string& crf_prefix);
47 
52  ngram_pos_analyzer(const ngram_pos_analyzer& other);
53 
58  virtual void tokenize(corpus::document& doc) override;
59 
61  const static std::string id;
62 
63  private:
65  std::unique_ptr<token_stream> stream_;
66 
68  std::shared_ptr<sequence::crf> crf_;
69 
71  const sequence::sequence_analyzer seq_analyzer_;
72 };
73 
77 template <>
78 std::unique_ptr<analyzer>
79  make_analyzer<ngram_pos_analyzer>(const cpptoml::table&,
80  const cpptoml::table&);
81 }
82 
83 namespace sequence
84 {
88 void register_analyzers();
89 }
90 }
91 
92 #endif
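
The class in the listing above derives from util::multilevel_clonable and declares a copy constructor, which suggests that copies are made through a base-class interface. The following is a minimal sketch of that general idea (CRTP-based polymorphic cloning), not MeTA's actual implementation; the names base, clonable_sketch, pos_like, and clone_and_name are all hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical root interface; stands in for an abstract analyzer-like base.
struct base
{
    virtual ~base() = default;
    virtual std::unique_ptr<base> clone() const = 0;
    virtual std::string name() const = 0;
};

// Hypothetical CRTP helper: implements clone() once for every Derived by
// invoking Derived's copy constructor. This is only an illustration of the
// pattern and makes no claim about how util::multilevel_clonable is written.
template <class Root, class Derived>
struct clonable_sketch : Root
{
    std::unique_ptr<Root> clone() const override
    {
        return std::make_unique<Derived>(static_cast<const Derived&>(*this));
    }
};

// A concrete type only has to be copy-constructible to become clonable.
struct pos_like : clonable_sketch<base, pos_like>
{
    std::string name() const override
    {
        return "pos_like";
    }
};

// Clones through a base reference and reports the dynamic type's name.
inline std::string clone_and_name(const base& b)
{
    std::unique_ptr<base> copy = b.clone();
    assert(copy != nullptr);
    return copy->name();
}
```

Under these assumptions, calling clone() on a reference to base yields a distinct object whose dynamic type is still pos_like, which is why a derived class in such a scheme needs a working copy constructor.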
virtual void tokenize(corpus::document &doc) override
Tokenizes a file into a document.
static const std::string id
Identifier for this analyzer.
Definition: ngram_pos_analyzer.h:61
void register_analyzers()
Registers analyzers provided by the meta-sequence-analyzers library.
Definition: ngram_pos_analyzer.cpp:112
std::unique_ptr< analyzer > make_analyzer< ngram_pos_analyzer >(const cpptoml::table &, const cpptoml::table &)
Specialization of the factory method for creating ngram_pos_analyzers.
Analyzes documents based on part-of-speech tags instead of words.
Definition: ngram_pos_analyzer.h:31
std::shared_ptr< sequence::crf > crf_
The CRF used to tag the sentences.
Definition: ngram_pos_analyzer.h:68
ngram_pos_analyzer(uint16_t n, std::unique_ptr< token_stream > stream, const std::string &crf_prefix)
Constructor.
Definition: ngram_pos_analyzer.cpp:20
std::unique_ptr< token_stream > stream_
The token stream to be used for extracting tokens.
Definition: ngram_pos_analyzer.h:65
Template class to facilitate polymorphic cloning.
Definition: clonable.h:28
Represents an indexable document.
Definition: document.h:31
The ModErn Text Analysis toolkit is a suite of natural language processing, classification, information retrieval, data mining, and other applications of text processing.
Definition: analyzer.h:24
A class that provides a framework to produce token counts from documents.
Definition: analyzer.h:41
Analyzer that operates over sequences, generating features based on a set of "observation functions"...
Definition: sequence_analyzer.h:49
const sequence::sequence_analyzer seq_analyzer_
Generates features for the CRF; const indicates testing mode.
Definition: ngram_pos_analyzer.h:71
Analyzes documents based on an ngram word model, where the value for n is supplied by the user...
Definition: ngram_analyzer.h:27
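
The descriptions above say that ngram_pos_analyzer analyzes documents by part-of-speech tags rather than words, using an n-gram model with a user-supplied n. As a rough sketch of what counting n-grams over tags might look like, assuming the tags for one sentence have already been produced by some tagger (a CRF in the header above): the function count_pos_ngrams and the "_" separator are hypothetical and are not taken from MeTA.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper, not part of MeTA: counts n-grams over a sequence of
// part-of-speech tags for a single sentence. Tags within one n-gram are
// joined with '_' (an arbitrary choice made for this sketch).
std::map<std::string, std::size_t>
    count_pos_ngrams(const std::vector<std::string>& tags, std::size_t n)
{
    assert(n > 0);
    std::map<std::string, std::size_t> counts;
    if (tags.size() < n)
        return counts; // sentence too short to contain any n-gram

    // Slide a window of width n across the tag sequence.
    for (std::size_t i = 0; i + n <= tags.size(); ++i)
    {
        std::string gram = tags[i];
        for (std::size_t j = 1; j < n; ++j)
            gram += "_" + tags[i + j];
        ++counts[gram];
    }
    return counts;
}
```

For the tag sequence DT NN VBZ DT NN with n = 2, this sketch counts DT_NN twice and NN_VBZ and VBZ_DT once each. How the real analyzer handles sentence boundaries and feature naming is not shown in this listing.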