Implementation class for the segmenter.
| enum | segment_t { SENTENCES, WORDS } |
| | Tag class for the segmentation strategy. |

| | impl () |
| | Constructs a new impl. |
| | impl (const impl &other) |
| | Copy constructs an impl. |
| | impl (impl &&)=default |
| | Defaulted move constructor. |
| void | set_content (const std::string &str) |
| | Sets the content of the segmenter. |
| std::string | substr (int32_t begin, int32_t end) const |
| | Obtains a utf-8 encoded string by first extracting the utf-16 encoded substring between the given indices and converting that substring to utf-8. |
| std::vector< segment > | sentences () const |
| | Segments the entire content into sentences. |
| std::vector< segment > | words () const |
| | Segments the entire content into words. |
| std::vector< segment > | segments (int32_t first, int32_t last, segment_t type) const |
| | Generic segmentation method that operates on the substring between the given indices, using the given strategy for segmenting that substring. |

| icu::UnicodeString | u_str_ |
| | The internal ICU string. |
| std::unique_ptr< icu::BreakIterator > | sentence_iter_ |
| | A pointer to a sentence break iterator. |
| std::unique_ptr< icu::BreakIterator > | word_iter_ |
| | A pointer to a word break iterator. |
| meta::utf::segmenter::impl::impl (const impl &other) | inline |
Copy constructs an impl.
- Parameters
-
| other | The impl to copy |
| void meta::utf::segmenter::impl::set_content (const std::string &str) | inline |
Sets the content of the segmenter.
- Parameters
-
| str | The content to set |
| std::string meta::utf::segmenter::impl::substr (int32_t begin, int32_t end) const | inline |
Obtains a utf-8 encoded string by first extracting the utf-16 encoded substring between the given indices and converting that substring to utf-8.
- Parameters
-
| begin | The beginning index, in utf-16 code units |
| end | The ending index, in utf-16 code units |
- Returns
- the substring between begin and end
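Because the indices refer to the utf-16 form of the content rather than to utf-8 bytes, the same index pair can select a different number of bytes depending on the characters involved. The following is a minimal, standard-library-only sketch of that idea, not the class's actual implementation (which holds an icu::UnicodeString). The helper names `utf8_to_utf16`, `utf16_to_utf8`, and `utf16_substr` are hypothetical, the input is assumed to be well-formed utf-8, and treating `end` as exclusive is an assumption that this page does not confirm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decode utf-8 into utf-16 code units.
// Assumes well-formed input; performs no validation.
std::u16string utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    for (std::size_t i = 0; i < in.size();)
    {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (b < 0x80)      { cp = b;        len = 1; }
        else if (b < 0xE0) { cp = b & 0x1F; len = 2; }
        else if (b < 0xF0) { cp = b & 0x0F; len = 3; }
        else               { cp = b & 0x07; len = 4; }
        // Fold in the continuation bytes, six payload bits each.
        for (std::size_t k = 1; k < len && i + k < in.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += len;
        if (cp >= 0x10000)
        {
            // Code points above the BMP occupy two utf-16 units.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        else
        {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}

// Hypothetical helper: encode utf-16 code units as utf-8.
// Unpaired surrogates are not handled specially in this sketch.
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    auto put = [&out](char32_t v) { out.push_back(static_cast<char>(v)); };
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF)
        {
            // Recombine a surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        if (cp < 0x80)         { put(cp); }
        else if (cp < 0x800)   { put(0xC0 | (cp >> 6)); put(0x80 | (cp & 0x3F)); }
        else if (cp < 0x10000) { put(0xE0 | (cp >> 12)); put(0x80 | ((cp >> 6) & 0x3F));
                                 put(0x80 | (cp & 0x3F)); }
        else                   { put(0xF0 | (cp >> 18)); put(0x80 | ((cp >> 12) & 0x3F));
                                 put(0x80 | ((cp >> 6) & 0x3F)); put(0x80 | (cp & 0x3F)); }
    }
    return out;
}

// Extract utf-16 units [begin, end) and return them as utf-8.
// The half-open range is an assumption made for this sketch.
std::string utf16_substr(const std::string& content, std::int32_t begin, std::int32_t end)
{
    std::u16string u16 = utf8_to_utf16(content);
    assert(0 <= begin && begin <= end && static_cast<std::size_t>(end) <= u16.size());
    return utf16_to_utf8(u16.substr(begin, end - begin));
}
```

With the content "naïve café", the sketch's `utf16_substr(content, 0, 5)` yields "naïve" (six bytes of utf-8), whereas a byte-based `std::string::substr(0, 5)` would stop one byte short and return only "naïv".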
| std::vector<segment> meta::utf::segmenter::impl::sentences () const | inline |
Segments the entire content into sentences.
- Returns
- a vector of segments representing sentences
| std::vector<segment> meta::utf::segmenter::impl::words () const | inline |
Segments the entire content into words.
- Returns
- a vector of segments representing words
| std::vector<segment> meta::utf::segmenter::impl::segments (int32_t first, int32_t last, segment_t type) const | inline |
Generic segmentation method that operates on the substring between the given indices, using the given strategy for segmenting that substring.
- Parameters
-
| first | The index of the beginning of the string to work on |
| last | The index of the end of the string to work on |
| type | The type of segmentation to perform |
- Returns
- a vector of segments (whose meaning depends on type)
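The relationship between the three segmentation methods can be illustrated with a small sketch in which sentences() and words() simply forward the full range to segments() with the matching segment_t value. This is an illustration of the dispatch pattern only, under several assumptions: `toy_segmenter`, its `segment` struct, and `boundary_after` are hypothetical names, indices here are plain byte offsets rather than utf-16 indices, and the naive boundary rules (break after `.`, `!`, `?` for sentences; break wherever whitespace meets non-whitespace for words) are stand-ins for the icu::BreakIterator members the real class holds.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the library's segment type: a half-open range.
struct segment
{
    std::int32_t begin;
    std::int32_t end;
};

enum class segment_t { SENTENCES, WORDS };

class toy_segmenter
{
  public:
    explicit toy_segmenter(std::string content = "") : content_(std::move(content)) {}

    void set_content(const std::string& str) { content_ = str; }

    // Both convenience methods delegate to the generic one over the full range.
    std::vector<segment> sentences() const
    {
        return segments(0, size(), segment_t::SENTENCES);
    }

    std::vector<segment> words() const
    {
        return segments(0, size(), segment_t::WORDS);
    }

    // Splits [first, last) according to the requested strategy.
    std::vector<segment> segments(std::int32_t first, std::int32_t last,
                                  segment_t type) const
    {
        assert(0 <= first && first <= last && last <= size());
        std::vector<segment> result;
        std::int32_t start = first;
        for (std::int32_t i = first; i < last; ++i)
        {
            if (!boundary_after(i, last, type))
                continue;
            result.push_back({start, i + 1});
            start = i + 1;
        }
        if (start < last)
            result.push_back({start, last}); // trailing piece without a boundary
        return result;
    }

  private:
    std::int32_t size() const { return static_cast<std::int32_t>(content_.size()); }

    bool is_space(std::int32_t i) const
    {
        return std::isspace(static_cast<unsigned char>(content_[i])) != 0;
    }

    // Naive rules standing in for a real break iterator (assumption).
    bool boundary_after(std::int32_t i, std::int32_t last, segment_t type) const
    {
        if (type == segment_t::SENTENCES)
            return content_[i] == '.' || content_[i] == '!' || content_[i] == '?';
        return i + 1 < last && is_space(i) != is_space(i + 1);
    }

    std::string content_;
};
```

In this sketch the strategy is chosen with a single enum argument, so restricting word segmentation to one sentence amounts to passing that sentence's begin and end to segments() with segment_t::WORDS.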
The documentation for this class was generated from the following file: