ModErn Text Analysis
META Enumerates Textual Applications
include/utf/segmenter.h

#ifndef META_UTF_SEGMENTER_H_
#define META_UTF_SEGMENTER_H_

#include <cstdint> // int32_t
#include <string>
#include <vector>
#include "util/pimpl.h"

namespace meta
{
namespace utf
{

// Class that encapsulates segmenting Unicode strings.
class segmenter
{
  public:
    // Represents a segment within a Unicode string.
    class segment
    {
      public:
        // Creates a segment.
        segment(int32_t begin, int32_t end);

      private:
        friend segmenter;
        // using int32_t here because of ICU, which accepts only int32_t as
        // its indexes

        // The beginning index of this segment.
        int32_t begin_;

        // The ending index of this segment.
        int32_t end_;
    };

    // Constructs a segmenter.
    segmenter();

    // Copy constructor.
    segmenter(const segmenter&);

    // Destructor for segmenter.
    ~segmenter();

    // Resets the content of the segmenter to the given string.
    void set_content(const std::string& str);

    // Segments the current content into sentences.
    std::vector<segment> sentences() const;

    // Segments the current content into words.
    std::vector<segment> words() const;

    std::vector<segment> words(const segment& seg) const;

    std::string content(const segment& seg) const;

  private:
    // Implementation class for the segmenter.
    class impl;

    // A pointer to the implementation class for the segmenter.
    util::pimpl<impl> impl_;
};
}
}
#endif

meta::utf::segmenter::segment::end_
int32_t end_
The ending index of this segment.
Definition: segmenter.h:51

meta::utf::segmenter::~segmenter
~segmenter()
Destructor for segmenter.

meta::utf::segmenter::segment::segment
segment(int32_t begin, int32_t end)
Creates a segment.
Definition: segmenter.cpp:203

meta::utf::segmenter::sentences
std::vector< segment > sentences() const
Segments the current content into sentences by following the Unicode segmentation standard...
Definition: segmenter.cpp:183

meta::utf::segmenter::set_content
void set_content(const std::string &str)
Resets the content of the segmenter to the given string.
Definition: segmenter.cpp:178

meta::utf::segmenter::segment::begin_
int32_t begin_
The beginning index of this segment.
Definition: segmenter.h:49

meta::util::pimpl
Class to assist in simple pointer-to-implementation classes.
Definition: pimpl.h:26
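
The pointer-to-implementation pattern mentioned above can be sketched with the standard library alone. The interface of util::pimpl is not shown in this listing, so the sketch below uses std::unique_ptr instead, and the class name text_holder is hypothetical. It also suggests why a class built this way typically declares its own copy constructor and destructor, as segmenter does: both need the complete implementation type.

```cpp
#include <memory>
#include <string>

// Hypothetical class, not part of MeTA: a minimal pointer-to-implementation
// layout with a hand-written, deep-copying copy constructor.
class text_holder
{
  public:
    text_holder();
    text_holder(const text_holder& other);
    ~text_holder();

    void set_content(const std::string& str);
    const std::string& content() const;

  private:
    class impl; // only declared here; users of the class never see its members
    std::unique_ptr<impl> impl_;
};

// In a real project everything below would usually live in the .cpp file.
class text_holder::impl
{
  public:
    std::string text;
};

text_holder::text_holder() : impl_{new impl}
{
}

// Copy the implementation object itself rather than sharing the pointer.
text_holder::text_holder(const text_holder& other)
    : impl_{new impl(*other.impl_)}
{
}

// Defined after impl is complete so std::unique_ptr can delete it.
text_holder::~text_holder() = default;

void text_holder::set_content(const std::string& str)
{
    impl_->text = str;
}

const std::string& text_holder::content() const
{
    return impl_->text;
}
```

With this layout, changing the members of impl should not require recompiling code that only includes the header.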

meta::utf::segmenter::words
std::vector< segment > words() const
Segments the current content into words by following the Unicode segmentation standard.
Definition: segmenter.cpp:188
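
To illustrate what a list of word segments looks like as index pairs, here is a deliberately simplified sketch. It splits on ASCII whitespace only, which is not the Unicode segmentation algorithm that words() is documented to follow, and the function name whitespace_words and the half-open (begin, end) convention are assumptions made for the example.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not the MeTA implementation: returns one
// (begin, end) pair of byte indexes per run of non-whitespace characters.
// Assumption: begin is inclusive and end is exclusive.
std::vector<std::pair<int32_t, int32_t>>
whitespace_words(const std::string& text)
{
    std::vector<std::pair<int32_t, int32_t>> spans;
    const auto size = static_cast<int32_t>(text.size());

    // True when the byte at index i is ASCII whitespace.
    auto is_space = [&text](int32_t i) {
        const auto ch
            = static_cast<unsigned char>(text[static_cast<std::size_t>(i)]);
        return std::isspace(ch) != 0;
    };

    int32_t pos = 0;
    while (pos < size)
    {
        // Skip the whitespace before the next word.
        while (pos < size && is_space(pos))
            ++pos;
        if (pos == size)
            break;

        // Consume the word itself and record its boundaries.
        const int32_t begin = pos;
        while (pos < size && !is_space(pos))
            ++pos;
        spans.emplace_back(begin, pos);
    }
    return spans;
}
```

A rule-based segmenter would return boundaries in the same general shape, but would also handle punctuation, scripts without spaces, and multi-byte characters.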

meta::utf::segmenter::segment
Represents a segment within a Unicode string.
Definition: segmenter.h:33

meta
The ModErn Text Analysis toolkit is a suite of natural language processing, classification, information retrieval, data mining, and other applications of text processing.
Definition: analyzer.h:24

meta::utf::segmenter::impl_
util::pimpl< impl > impl_
A pointer to the implementation class for the segmenter.
Definition: segmenter.h:112

meta::utf::segmenter::impl
Implementation class for the segmenter.
Definition: segmenter.cpp:20

meta::utf::segmenter
Class that encapsulates segmenting Unicode strings.
Definition: segmenter.h:26

meta::utf::segmenter::content
std::string content(const segment &seg) const
Definition: segmenter.cpp:198
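
No description is given for content() here. One plausible reading is that it returns the text covered by a segment, and the sketch below shows that idea on a plain std::string. The names index_span and span_content are hypothetical, the half-open range is an assumption, and whether the real indexes count bytes or ICU code units is not stated in this listing.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a segment: a pair of int32_t indexes.
struct index_span
{
    int32_t begin; // assumed inclusive
    int32_t end;   // assumed exclusive
};

// Returns the bytes of `text` in [s.begin, s.end).
// Throws std::out_of_range when the span does not fit inside the text.
std::string span_content(const std::string& text, const index_span& s)
{
    if (s.begin < 0 || s.end < s.begin
        || static_cast<std::size_t>(s.end) > text.size())
    {
        throw std::out_of_range{"span does not fit the text"};
    }
    return text.substr(static_cast<std::size_t>(s.begin),
                       static_cast<std::size_t>(s.end - s.begin));
}
```

Storing indexes rather than copies of the text keeps each segment small, at the cost of needing the original string to resolve it.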

meta::utf::segmenter::segmenter
segmenter()
Constructs a segmenter.
Definition: segmenter.cpp:165

Generated on Tue Mar 3 2015 23:20:16 for ModErn Text Analysis by Doxygen 1.8.9.1