ModErn Text Analysis
META Enumerates Textual Applications
include/analyzers/tokenizers/character_tokenizer.h
#ifndef META_CHARACTER_TOKENIZER_H_
#define META_CHARACTER_TOKENIZER_H_

#include "analyzers/token_stream.h"
#include "util/clonable.h"

namespace meta
{
namespace corpus
{
class document;
}
}

namespace meta
{
namespace analyzers
{
namespace tokenizers
{

class character_tokenizer
    : public util::clonable<token_stream, character_tokenizer>
{
  public:
    character_tokenizer();

    void set_content(const std::string& content) override;

    std::string next() override;

    operator bool() const override;

    const static std::string id;

  private:
    std::string content_;

    uint64_t idx_;
};
}
}
}
#endif
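
The header only declares the interface; the definitions live in character_tokenizer.cpp, which is not reproduced on this page. As a rough illustration of how a tokenizer with this shape could behave, the sketch below assumes that next() returns one character per call and that operator bool reports whether any characters remain. The names simple_token_stream, simple_char_tokenizer, and collect_tokens are hypothetical stand-ins, not part of the toolkit, and the error handling is an assumption.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the library's token_stream interface.
class simple_token_stream
{
  public:
    virtual ~simple_token_stream() = default;
    virtual void set_content(const std::string& content) = 0;
    virtual std::string next() = 0;
    virtual operator bool() const = 0;
};

// Hypothetical tokenizer: emits each char of the buffer as its own token.
class simple_char_tokenizer : public simple_token_stream
{
  public:
    void set_content(const std::string& content) override
    {
        content_ = content;
        idx_ = 0; // restart from the beginning of the new buffer
    }

    std::string next() override
    {
        // Assumption: running past the end is reported with an exception.
        if (!*this)
            throw std::out_of_range{"no more characters"};
        return std::string(1, content_[idx_++]);
    }

    operator bool() const override
    {
        return idx_ < content_.size();
    }

  private:
    std::string content_;  // buffered text
    std::uint64_t idx_ = 0; // index of the next character to emit
};

// Drains a stream into a vector, using the same loop a caller would write.
std::vector<std::string> collect_tokens(simple_token_stream& stream,
                                        const std::string& text)
{
    stream.set_content(text);
    std::vector<std::string> tokens;
    while (stream)
        tokens.push_back(stream.next());
    return tokens;
}
```

Calling collect_tokens on a simple_char_tokenizer with the text "abc" would yield the three tokens "a", "b", and "c". Note that this sketch splits on bytes, so a multi-byte UTF-8 character would be broken into several tokens; whether the real tokenizer does the same is not stated here.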
meta::analyzers::tokenizers::character_tokenizer::set_content
void set_content(const std::string &content) override
Sets the content for the tokenizer.
Definition: character_tokenizer.cpp:24

meta::analyzers::tokenizers::character_tokenizer
Converts documents into streams of characters.
Definition: character_tokenizer.h:34

meta::util::multilevel_clonable
Template class to facilitate polymorphic cloning.
Definition: clonable.h:28

meta
The ModErn Text Analysis toolkit is a suite of natural language processing, classification, information retrieval, data mining, and other applications of text processing.
Definition: analyzer.h:24

meta::analyzers::tokenizers::character_tokenizer::idx_
uint64_t idx_
Character index into the current buffer.
Definition: character_tokenizer.h:68

meta::analyzers::tokenizers::character_tokenizer::next
std::string next() override
Definition: character_tokenizer.cpp:30

meta::analyzers::tokenizers::character_tokenizer::id
static const std::string id
Identifier for this tokenizer.
Definition: character_tokenizer.h:61

meta::analyzers::tokenizers::character_tokenizer::content_
std::string content_
The buffered string content for this tokenizer.
Definition: character_tokenizer.h:65

meta::analyzers::tokenizers::character_tokenizer::character_tokenizer
character_tokenizer()
Creates a character_tokenizer.
Definition: character_tokenizer.cpp:19
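
The class inherits from util::clonable<token_stream, character_tokenizer>, described above as a template class to facilitate polymorphic cloning. How that template is implemented is not shown on this page. The sketch below illustrates the general idea with the curiously recurring template pattern (CRTP): a helper template implements clone() once, so each derived class gets a correct deep copy without writing it by hand. All names here (stream_base, clonable_sketch, counting_stream) are hypothetical and chosen for illustration only.

```cpp
#include <memory>
#include <string>

// Hypothetical root interface that can be copied through a base pointer.
class stream_base
{
  public:
    virtual ~stream_base() = default;
    virtual std::unique_ptr<stream_base> clone() const = 0;
    virtual std::string describe() const = 0;
};

// CRTP helper: supplies clone() for any Derived that is copy-constructible.
template <class Base, class Derived>
class clonable_sketch : public Base
{
  public:
    std::unique_ptr<Base> clone() const override
    {
        // Copy-construct the most derived type, then hand it back as Base.
        return std::unique_ptr<Base>(
            new Derived(static_cast<const Derived&>(*this)));
    }
};

// Hypothetical concrete class; it never writes clone() itself.
class counting_stream : public clonable_sketch<stream_base, counting_stream>
{
  public:
    explicit counting_stream(int count) : count_{count}
    {
    }

    std::string describe() const override
    {
        return "count=" + std::to_string(count_);
    }

    void increment()
    {
        ++count_;
    }

  private:
    int count_;
};
```

A copy obtained through clone() is independent of the original: incrementing the original afterwards leaves the copy's state unchanged. The design choice is that the copy logic lives in one template instead of being repeated in every concrete stream class.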
Generated on Tue Mar 3 2015 23:20:16 for ModErn Text Analysis by Doxygen 1.8.9.1