ModErn Text Analysis
META Enumerates Textual Applications
whitespace_tokenizer.h
#ifndef META_WHITESPACE_TOKENIZER_H_
#define META_WHITESPACE_TOKENIZER_H_

#include <cstdint>
#include <string>

#include "analyzers/token_stream.h"
#include "util/clonable.h"

namespace meta
{
namespace corpus
{
class document;
}
}

namespace meta
{
namespace analyzers
{
namespace tokenizers
{

// Converts documents into streams of whitespace-delimited tokens.
class whitespace_tokenizer : public util::clonable<token_stream,
                                                   whitespace_tokenizer>
{
  public:
    // Creates a whitespace_tokenizer.
    whitespace_tokenizer();

    // Sets the content for the tokenizer to parse.
    void set_content(const std::string& content) override;

    // Returns the next token in the stream.
    std::string next() override;

    // True while more tokens are available.
    operator bool() const override;

    // Identifier for this tokenizer.
    static const std::string id;

  private:
    // Buffered string content for this tokenizer.
    std::string content_;

    // Character index into the current buffer.
    uint64_t idx_;
};
}
}
}
#endif
meta::analyzers::tokenizers::whitespace_tokenizer::id
static const std::string id
Identifier for this tokenizer.
Definition: whitespace_tokenizer.h:63
meta::analyzers::tokenizers::whitespace_tokenizer::idx_
uint64_t idx_
Character index into the current buffer.
Definition: whitespace_tokenizer.h:70
meta::analyzers::tokenizers::whitespace_tokenizer::set_content
void set_content(const std::string &content) override
Sets the content for the tokenizer to parse.
Definition: whitespace_tokenizer.cpp:26
meta::analyzers::tokenizers::whitespace_tokenizer::content_
std::string content_
Buffered string content for this tokenizer.
Definition: whitespace_tokenizer.h:67
meta::util::multilevel_clonable
Template class to facilitate polymorphic cloning.
Definition: clonable.h:28
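
As a rough illustration of what a polymorphic-cloning helper of this kind does, here is a minimal CRTP sketch. The names `shape`, `cloner`, `circle`, and `cloned_name` are hypothetical and are not part of MeTA; the actual `multilevel_clonable` in clonable.h is not shown on this page and may differ in its template parameters and details.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical root interface (an assumption for this sketch); in the header
// above, token_stream plays the role of the root type.
struct shape
{
    virtual ~shape() = default;
    virtual std::unique_ptr<shape> clone() const = 0;
    virtual std::string name() const = 0;
};

// Hypothetical CRTP helper: implements clone() once for every Derived class
// by copy-constructing the most-derived type.
template <class Root, class Derived>
struct cloner : public Root
{
    std::unique_ptr<Root> clone() const override
    {
        return std::unique_ptr<Root>(
            new Derived(static_cast<const Derived&>(*this)));
    }
};

// A concrete type gets clone() for free by inheriting from the helper.
struct circle : public cloner<shape, circle>
{
    std::string name() const override { return "circle"; }
};

// Clones through the base interface and reports the copy's dynamic name.
std::string cloned_name(const shape& s)
{
    std::unique_ptr<shape> copy = s.clone();
    assert(copy != nullptr); // clone() always allocates a new object
    return copy->name();
}
```

The design point is that each derived class names itself as a template argument, so the helper can copy the full derived object even when the caller only holds a reference to the root type.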
meta::analyzers::tokenizers::whitespace_tokenizer::next
std::string next() override
Definition: whitespace_tokenizer.cpp:32
meta
The ModErn Text Analysis toolkit is a suite of natural language processing, classification, information retrieval, data mining, and other applications of text processing.
Definition: analyzer.h:24
meta::analyzers::tokenizers::whitespace_tokenizer
Converts documents into streams of whitespace-delimited tokens.
Definition: whitespace_tokenizer.h:35
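
To make the `set_content()` / `next()` / `operator bool()` protocol concrete, the following is a standalone sketch of a tokenizer with the same three operations. It is not the MeTA implementation: whitespace_tokenizer.cpp is not shown on this page, and the class name `simple_whitespace_tokenizer`, the helper `tokenize_all`, and the choice to skip whitespace runs rather than emit them as tokens are all assumptions made for illustration. The sketch also does not derive from `token_stream`, so that it compiles on its own.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in mirroring the interface declared in the header.
class simple_whitespace_tokenizer
{
  public:
    simple_whitespace_tokenizer() : idx_{0} {}

    // Buffers new content and moves to its first non-whitespace character.
    void set_content(const std::string& content)
    {
        content_ = content;
        idx_ = 0;
        skip_whitespace();
    }

    // Returns the next run of non-whitespace characters.
    // Precondition: the stream is not exhausted.
    std::string next()
    {
        assert(static_cast<bool>(*this));
        std::uint64_t start = idx_;
        while (idx_ < content_.size() && !is_space(content_[idx_]))
            ++idx_;
        std::string token = content_.substr(start, idx_ - start);
        skip_whitespace();
        return token;
    }

    // True while at least one more token is available. Marked explicit in
    // this sketch; the header declares the operator without explicit.
    explicit operator bool() const { return idx_ < content_.size(); }

  private:
    static bool is_space(char c)
    {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }

    void skip_whitespace()
    {
        while (idx_ < content_.size() && is_space(content_[idx_]))
            ++idx_;
    }

    std::string content_; // buffered text being tokenized
    std::uint64_t idx_;   // index of the next unread character
};

// Drains a tokenizer over `text` and collects every token.
std::vector<std::string> tokenize_all(const std::string& text)
{
    simple_whitespace_tokenizer tok;
    tok.set_content(text);
    std::vector<std::string> tokens;
    while (tok)
        tokens.push_back(tok.next());
    return tokens;
}
```

Under this sketch's skip-whitespace assumption, `tokenize_all("  hello   world \n")` yields the two tokens `hello` and `world`, and an all-whitespace input yields no tokens.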
meta::analyzers::tokenizers::whitespace_tokenizer::whitespace_tokenizer
whitespace_tokenizer()
Creates a whitespace_tokenizer.
Definition: whitespace_tokenizer.cpp:22
Generated on Tue Mar 3 2015 23:20:16 for ModErn Text Analysis by Doxygen 1.8.9.1