ModErn Text Analysis
META Enumerates Textual Applications
Class that encapsulates segmenting Unicode strings.
#include <segmenter.h>
Classes
class | impl
Implementation class for the segmenter.
class | segment
Represents a segment within a Unicode string.
Public Member Functions
segmenter ()
Constructs a segmenter.
segmenter (const segmenter &)
Copy constructs a segmenter.
~segmenter ()
Destructor for segmenter.
void | set_content (const std::string &str)
Resets the content of the segmenter to the given string.
std::vector< segment > | sentences () const
Segments the current content into sentences by following the Unicode segmentation standard.
std::vector< segment > | words () const
Segments the current content into words by following the Unicode segmentation standard.
std::vector< segment > | words (const segment &seg) const
Segments a given segment into words by following the Unicode segmentation standard.
std::string | content (const segment &seg) const
Private Attributes
util::pimpl< impl > | impl_
A pointer to the implementation class for the segmenter.
Class that encapsulates segmenting Unicode strings.
Supports segmenting content into sentences as well as into words.
meta::utf::segmenter::segmenter ()
Constructs a segmenter.
A single segmenter instance can segment many different Unicode strings. Reusing one instance is encouraged when you are segmenting many strings.
void meta::utf::segmenter::set_content (const std::string & str)
Resets the content of the segmenter to the given string.
str | A UTF-8 string that should be segmented
auto meta::utf::segmenter::sentences () const
Segments the current content into sentences by following the Unicode segmentation standard.
auto meta::utf::segmenter::words () const
Segments the current content into words by following the Unicode segmentation standard.
auto meta::utf::segmenter::words (const segment & seg) const
Segments a given segment into words by following the Unicode segmentation standard.
Typically, this is used to further segment a sentence segment into its constituent words.
seg | The segment to sub-segment into words
std::string meta::utf::segmenter::content (const segment & seg) const
Returns the content of the given segment as a string.
seg | The segment to get content for
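The sentence-then-word pattern described above can be sketched as follows. Because this page does not show how segment is defined or how the library is built, the example uses a hypothetical stand-in, sketch::segmenter, that only mirrors the member functions documented here. Its segment struct, its naive ASCII splitting rules, and the helper words_by_sentence are assumptions made for illustration; they are not META's implementation and do not follow the Unicode segmentation standard.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical stand-in for meta::utf::segmenter. It mirrors the documented
// member functions but splits naively on ASCII punctuation and non-alphanumeric
// characters; the real class follows the Unicode segmentation standard.
class segmenter {
  public:
    // Hypothetical segment: a half-open byte range [begin, end) into the content.
    struct segment {
        std::size_t begin;
        std::size_t end;
    };

    // Resets the content; the same instance can be reused for many strings.
    void set_content(const std::string& str) { text_ = str; }

    // Naive sentence split: a sentence ends after '.', '!' or '?'.
    std::vector<segment> sentences() const {
        std::vector<segment> out;
        std::size_t start = 0;
        for (std::size_t i = 0; i < text_.size(); ++i) {
            if (text_[i] == '.' || text_[i] == '!' || text_[i] == '?') {
                out.push_back({start, i + 1});
                start = i + 1;
            }
        }
        if (start < text_.size())
            out.push_back({start, text_.size()});
        return out;
    }

    // Words of the whole content.
    std::vector<segment> words() const { return words({0, text_.size()}); }

    // Words inside one segment: maximal runs of ASCII alphanumerics.
    // Unlike a real word segmenter, this drops whitespace and punctuation.
    std::vector<segment> words(const segment& seg) const {
        assert(seg.begin <= seg.end && seg.end <= text_.size());
        std::vector<segment> out;
        std::size_t i = seg.begin;
        while (i < seg.end) {
            while (i < seg.end && !is_word_char(text_[i])) ++i;
            const std::size_t start = i;
            while (i < seg.end && is_word_char(text_[i])) ++i;
            if (i > start)
                out.push_back({start, i});
        }
        return out;
    }

    // Copies the bytes covered by the segment out of the content.
    std::string content(const segment& seg) const {
        assert(seg.begin <= seg.end && seg.end <= text_.size());
        return text_.substr(seg.begin, seg.end - seg.begin);
    }

  private:
    static bool is_word_char(char c) {
        return std::isalnum(static_cast<unsigned char>(c)) != 0;
    }

    std::string text_;
};

// Hypothetical helper: splits text into sentences, then each sentence into
// words, using a segmenter instance supplied (and reusable) by the caller.
std::vector<std::vector<std::string>> words_by_sentence(segmenter& seg,
                                                        const std::string& text) {
    seg.set_content(text);
    std::vector<std::vector<std::string>> result;
    for (const auto& sentence : seg.sentences()) {
        std::vector<std::string> words;
        for (const auto& word : seg.words(sentence))
            words.push_back(seg.content(word));
        result.push_back(words);
    }
    return result;
}

} // namespace sketch
```

With this stand-in, calling words_by_sentence(seg, "Hello world. How are you?") yields the two word lists {"Hello", "world"} and {"How", "are", "you"}. The same loop over sentences(), words(seg), and content(seg) should carry over to meta::utf::segmenter, although the real class may place boundaries differently, for example around whitespace and punctuation.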