ModErn Text Analysis
META Enumerates Textual Applications
Class that encapsulates segmenting Unicode strings.
#include <segmenter.h>
Classes
| class impl | Implementation class for the segmenter. |
| class segment | Represents a segment within a Unicode string. |
Public Member Functions
| segmenter() | Constructs a segmenter. |
| segmenter(const segmenter&) | Copy constructs a segmenter. |
| ~segmenter() | Destructor for segmenter. |
| void set_content(const std::string& str) | Resets the content of the segmenter to the given string. |
| std::vector<segment> sentences() const | Segments the current content into sentences by following the Unicode segmentation standard. |
| std::vector<segment> words() const | Segments the current content into words by following the Unicode segmentation standard. |
| std::vector<segment> words(const segment& seg) const | Segments a given segment into words by following the Unicode segmentation standard. |
| std::string content(const segment& seg) const | Returns the content of the given segment as a string. |
Private Attributes
| util::pimpl<impl> impl_ | A pointer to the implementation class for the segmenter. |
Class that encapsulates segmenting Unicode strings.
Supports segmenting into sentences as well as words.
meta::utf::segmenter::segmenter()
Constructs a segmenter.
A single segmenter instance can segment many different Unicode strings. When segmenting many strings, reuse one instance rather than constructing a new one for each string.
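The reuse pattern described above can be sketched as follows. This is a minimal, self-contained illustration and does not use the library itself: `whitespace_segmenter` is a hypothetical stand-in that splits on whitespace only, and `count_words` is a helper invented for this example. Only the names `set_content` and `words` mirror the documented interface.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for meta::utf::segmenter. It splits on whitespace
// only and does NOT follow the Unicode segmentation standard.
class whitespace_segmenter
{
  public:
    // Replaces whatever content was set before.
    void set_content(const std::string& str)
    {
        content_ = str;
    }

    // Returns the whitespace-separated tokens of the current content.
    std::vector<std::string> words() const
    {
        std::istringstream stream{content_};
        std::vector<std::string> result;
        std::string word;
        while (stream >> word)
            result.push_back(word);
        return result;
    }

  private:
    std::string content_;
};

// Counts the words in each document using a single segmenter instance.
std::vector<std::size_t> count_words(const std::vector<std::string>& docs)
{
    whitespace_segmenter seg; // constructed once, outside the loop
    std::vector<std::size_t> counts;
    for (const auto& doc : docs)
    {
        seg.set_content(doc); // reset the content instead of rebuilding
        counts.push_back(seg.words().size());
    }
    return counts;
}
```

With the real class the loop would presumably have the same shape: construct one segmenter, then call `set_content` once per input string.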
void meta::utf::segmenter::set_content(const std::string& str)
Resets the content of the segmenter to the given string.
| str | A UTF-8 string that should be segmented |
std::vector<segment> meta::utf::segmenter::sentences() const
Segments the current content into sentences by following the Unicode segmentation standard.
std::vector<segment> meta::utf::segmenter::words() const
Segments the current content into words by following the Unicode segmentation standard.
std::vector<segment> meta::utf::segmenter::words(const segment& seg) const
Segments a given segment into words by following the Unicode segmentation standard.
Typically, this would be used to further segment a sentence segment into its constituent words.
| seg | the segment to sub-segment into words |
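The sentence-then-word workflow can be sketched with a self-contained toy. The class below is a hypothetical stand-in, not the library's implementation: it is ASCII-only, ends a sentence at `.`, `!` or `?`, treats a word as a run of alphanumeric characters, and exposes public `begin`/`end` offsets on its segment type purely for illustration. The real class follows the Unicode segmentation standard and may place boundaries differently. `words_by_sentence` is a helper invented for this example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, ASCII-only stand-in mirroring the documented interface.
class toy_segmenter
{
  public:
    // A half-open byte range [begin, end) into the current content.
    struct segment
    {
        std::size_t begin;
        std::size_t end;
    };

    void set_content(const std::string& str)
    {
        content_ = str;
    }

    // Naive rule: a sentence ends at '.', '!' or '?'.
    std::vector<segment> sentences() const
    {
        std::vector<segment> result;
        std::size_t start = 0;
        for (std::size_t i = 0; i < content_.size(); ++i)
        {
            const char c = content_[i];
            if (c == '.' || c == '!' || c == '?')
            {
                result.push_back({start, i + 1});
                start = i + 1;
            }
        }
        if (start < content_.size())
            result.push_back({start, content_.size()});
        return result;
    }

    // Naive rule: a word is a maximal run of alphanumeric characters.
    std::vector<segment> words(const segment& seg) const
    {
        std::vector<segment> result;
        std::size_t i = seg.begin;
        while (i < seg.end)
        {
            while (i < seg.end && !is_word_char(content_[i]))
                ++i;
            const std::size_t start = i;
            while (i < seg.end && is_word_char(content_[i]))
                ++i;
            if (i > start)
                result.push_back({start, i});
        }
        return result;
    }

    // Segments the whole content into words.
    std::vector<segment> words() const
    {
        return words(segment{0, content_.size()});
    }

    // Returns the text covered by a segment.
    std::string content(const segment& seg) const
    {
        return content_.substr(seg.begin, seg.end - seg.begin);
    }

  private:
    static bool is_word_char(char c)
    {
        return std::isalnum(static_cast<unsigned char>(c)) != 0;
    }

    std::string content_;
};

// Splits text into sentences, then each sentence into its words.
std::vector<std::vector<std::string>>
words_by_sentence(const std::string& text)
{
    toy_segmenter seg;
    seg.set_content(text);
    std::vector<std::vector<std::string>> result;
    for (const auto& sentence : seg.sentences())
    {
        std::vector<std::string> words;
        for (const auto& word : seg.words(sentence))
            words.push_back(seg.content(word));
        result.push_back(words);
    }
    return result;
}
```

The point of the sketch is the call order: `sentences()` yields segments, `words(seg)` narrows each one further, and `content(seg)` turns a segment back into text. Segments here are offsets into the content currently held by the segmenter, so they are only meaningful until the next `set_content` call; whether the real class has the same restriction is an assumption.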
std::string meta::utf::segmenter::content(const segment& seg) const
Returns the content of the given segment as a string.
| seg | the segment to get content for |