ModErn Text Analysis
META Enumerates Textual Applications
segmenter.h
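#ifndef META_UTF_SEGMENTER_H_
#define META_UTF_SEGMENTER_H_

#include <cstdint> // for int32_t
#include <string>
#include <vector>
#include "util/pimpl.h"

namespace meta
{
namespace utf
{

/// Class that encapsulates segmenting Unicode strings.
class segmenter
{
  public:
    /// Represents a segment within a Unicode string.
    class segment
    {
      public:
        /// Creates a segment.
        segment(int32_t begin, int32_t end);

      private:
        friend segmenter;
        // using int32_t here because of ICU, which accepts only int32_t as
        // its indexes

        /// The beginning index of this segment.
        int32_t begin_;
        /// The ending index of this segment.
        int32_t end_;
    };

    /// Constructs a segmenter.
    segmenter();

    /// Copy constructor.
    segmenter(const segmenter&);

    /// Destructor for segmenter.
    ~segmenter();

    /// Resets the content of the segmenter to the given string.
    void set_content(const std::string& str);

    /// Segments the current content into sentences by following the
    /// Unicode segmentation standard.
    std::vector<segment> sentences() const;

    /// Segments the current content into words by following the Unicode
    /// segmentation standard.
    std::vector<segment> words() const;

    /// Overload of words() that takes a segment.
    std::vector<segment> words(const segment& seg) const;

    std::string content(const segment& seg) const;

  private:
    class impl;
    /// A pointer to the implementation class for the segmenter.
    util::pimpl<impl> impl_;
};
}
}
#endif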
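
The interface above hands back segments as index ranges and leaves it to `content()` to turn a range back into text. As a rough illustration of that calling pattern, here is a minimal, self-contained sketch. `toy_segmenter` and `tokenize_by_sentence` are hypothetical names invented for this example; the toy class only mirrors the method names from the listing and splits on ASCII punctuation and whitespace, whereas the real class relies on ICU and the Unicode segmentation rules, so its results will differ.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in that mirrors the method names of meta::utf::segmenter.
// It is NOT the MeTA implementation: it uses naive ASCII rules, not ICU.
class toy_segmenter
{
  public:
    // Half-open index range [begin, end) into the current content
    // (an assumption of this sketch).
    struct segment
    {
        std::int32_t begin;
        std::int32_t end;
    };

    void set_content(const std::string& str) { content_ = str; }

    // A sentence ends after a run of '.', '!' or '?'.
    std::vector<segment> sentences() const
    {
        std::vector<segment> result;
        const auto size = static_cast<std::int32_t>(content_.size());
        std::int32_t i = 0;
        std::int32_t start = 0;
        while (i < size)
        {
            if (!is_terminator(content_[i])) { ++i; continue; }
            while (i < size && is_terminator(content_[i])) ++i; // "..." or "?!"
            result.push_back({start, i});
            while (i < size && is_space(content_[i])) ++i; // skip the gap
            start = i;
        }
        if (start < size)
            result.push_back({start, size}); // trailing text, no terminator
        return result;
    }

    // Words are maximal runs of alphanumeric characters inside seg.
    std::vector<segment> words(const segment& seg) const
    {
        std::vector<segment> result;
        std::int32_t i = seg.begin;
        while (i < seg.end)
        {
            if (!is_word_char(content_[i])) { ++i; continue; }
            const std::int32_t start = i;
            while (i < seg.end && is_word_char(content_[i])) ++i;
            result.push_back({start, i});
        }
        return result;
    }

    std::vector<segment> words() const
    {
        return words({0, static_cast<std::int32_t>(content_.size())});
    }

    // Maps an index range back to the text it covers.
    std::string content(const segment& seg) const
    {
        assert(0 <= seg.begin && seg.begin <= seg.end
               && seg.end <= static_cast<std::int32_t>(content_.size()));
        return content_.substr(seg.begin, seg.end - seg.begin);
    }

  private:
    static bool is_terminator(char c) { return c == '.' || c == '!' || c == '?'; }
    static bool is_space(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
    static bool is_word_char(char c) { return std::isalnum(static_cast<unsigned char>(c)) != 0; }

    std::string content_;
};

// Calling pattern: sentences() first, then words(sentence), then content(word).
std::vector<std::vector<std::string>> tokenize_by_sentence(const std::string& text)
{
    toy_segmenter seg;
    seg.set_content(text);
    std::vector<std::vector<std::string>> result;
    for (const auto& sentence : seg.sentences())
    {
        std::vector<std::string> tokens;
        for (const auto& word : seg.words(sentence))
            tokens.push_back(seg.content(word));
        result.push_back(tokens);
    }
    return result;
}
```

With the toy rules, `tokenize_by_sentence("Hello world. How are you?")` yields two token lists, one per sentence. Keeping segments as plain index pairs means no text is copied until `content()` is called.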
meta
The ModErn Text Analysis toolkit is a suite of natural language processing, classification, information retrieval, data mining, and other applications of text processing.
Definition: analyzer.h:24

meta::util::pimpl
Class to assist in simple pointer-to-implementation classes.
Definition: pimpl.h:26

meta::utf::segmenter
Class that encapsulates segmenting Unicode strings.
Definition: segmenter.h:26

meta::utf::segmenter::segment
Represents a segment within a Unicode string.
Definition: segmenter.h:33

meta::utf::segmenter::impl
Implementation class for the segmenter.
Definition: segmenter.cpp:20

segmenter()
Constructs a segmenter.
Definition: segmenter.cpp:165

~segmenter()
Destructor for segmenter.

void set_content(const std::string& str)
Resets the content of the segmenter to the given string.
Definition: segmenter.cpp:178

std::vector<segment> sentences() const
Segments the current content into sentences by following the Unicode segmentation standard...
Definition: segmenter.cpp:183

std::vector<segment> words() const
Segments the current content into words by following the Unicode segmentation standard.
Definition: segmenter.cpp:188

std::string content(const segment& seg) const
Definition: segmenter.cpp:198

util::pimpl<impl> impl_
A pointer to the implementation class for the segmenter.
Definition: segmenter.h:112

segment(int32_t begin, int32_t end)
Creates a segment.
Definition: segmenter.cpp:203

int32_t begin_
The beginning index of this segment.
Definition: segmenter.h:49

int32_t end_
The ending index of this segment.
Definition: segmenter.h:51
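
The header forward-declares `impl`, holds it through `util::pimpl<impl>`, and declares its copy constructor and destructor without defining them inline. That combination is typical of the pointer-to-implementation idiom, which can be sketched as follows. The source of `util::pimpl` is not part of this page, so the sketch uses `std::unique_ptr` as a stand-in; `text_holder` and its members are hypothetical names chosen for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// ---- what would live in the header ------------------------------------
class text_holder
{
  public:
    text_holder();
    text_holder(const text_holder& other); // must deep-copy the hidden state
    ~text_holder();                        // defined where impl is complete

    void set_content(const std::string& str);
    std::size_t size() const;

  private:
    class impl;                  // forward declaration only
    std::unique_ptr<impl> impl_; // stand-in for util::pimpl<impl> (assumption)
};

// ---- what would live in the .cpp file ---------------------------------
class text_holder::impl
{
  public:
    std::string content;
};

text_holder::text_holder() : impl_{new impl{}}
{
}

// Copying the pointer alone would not compile (unique_ptr is move-only),
// so the copy constructor clones the implementation object instead.
text_holder::text_holder(const text_holder& other)
    : impl_{new impl(*other.impl_)}
{
}

// Defaulted here, after impl is fully defined, so that unique_ptr can
// generate the delete call for a complete type.
text_holder::~text_holder() = default;

void text_holder::set_content(const std::string& str)
{
    impl_->content = str;
}

std::size_t text_holder::size() const
{
    assert(impl_ != nullptr);
    return impl_->content.size();
}
```

In this sketch, anything that includes the header sees only a pointer to an incomplete type, so heavy dependencies can stay confined to the `.cpp` file. The same reasoning would plausibly let `segmenter.h` avoid including any ICU headers, though the listing itself does not state the motivation.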