Implementation class for the segmenter.
enum | segment_t { SENTENCES, WORDS } |
| Tag class for the segmentation strategy.
| impl () |
| Constructs a new impl.
| impl (const impl &other) |
| Copy constructs an impl.
| impl (impl &&)=default |
| Defaulted move constructor.
void | set_content (const std::string &str) |
| Sets the content of the segmenter.
std::string | substr (int32_t begin, int32_t end) const |
| Obtains a utf-8 encoded string by first extracting the utf-16 encoded substring between the given indices and converting that substring to utf-8.
std::vector< segment > | sentences () const |
| Segments the entire content into sentences.
std::vector< segment > | words () const |
| Segments the entire content into words.
std::vector< segment > | segments (int32_t first, int32_t last, segment_t type) const |
| Generic segmentation method that operates on the substring between the given indices, using the given strategy for segmenting that substring.
icu::UnicodeString | u_str_ |
| The internal ICU string.
std::unique_ptr< icu::BreakIterator > | sentence_iter_ |
| A pointer to a sentence break iterator.
std::unique_ptr< icu::BreakIterator > | word_iter_ |
| A pointer to a word break iterator.
meta::utf::segmenter::impl::impl (const impl &other) | inline
Copy constructs an impl.
- Parameters
-
void meta::utf::segmenter::impl::set_content (const std::string &str) | inline
Sets the content of the segmenter.
- Parameters
-
std::string meta::utf::segmenter::impl::substr (int32_t begin, int32_t end) const | inline
Obtains a utf-8 encoded string by first extracting the utf-16 encoded substring between the given indices and converting that substring to utf-8.
- Parameters
-
begin | The beginning index |
end | The ending index |
- Returns
- the substring between begin and end
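The extract-then-convert step described for substr can be illustrated without ICU. The following is a minimal sketch, not the class's actual implementation: it assumes the content is available as a `std::u16string`, that the indices count UTF-16 code units, and that the range is half-open. The helper name `utf16_range_to_utf8` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of MeTA or ICU): encodes the UTF-16 code
// units in the half-open range [begin, end) of `text` as UTF-8.
std::string utf16_range_to_utf8(const std::u16string& text, int32_t begin,
                                int32_t end)
{
    // Assumption: indices are code-unit offsets and must lie within the text.
    assert(0 <= begin && begin <= end
           && end <= static_cast<int32_t>(text.size()));

    std::string out;
    for (int32_t i = begin; i < end; ++i)
    {
        char32_t cp = text[static_cast<std::size_t>(i)];

        // A high surrogate followed by a low surrogate (both inside the
        // range) forms one code point above U+FFFF.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < end)
        {
            char32_t lo = text[static_cast<std::size_t>(i + 1)];
            if (lo >= 0xDC00 && lo <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i; // the low surrogate has been consumed
            }
        }

        // Any surrogate still left here is unpaired (for example, the range
        // cut a pair in half); substitute U+FFFD.
        if (cp >= 0xD800 && cp <= 0xDFFF)
            cp = 0xFFFD;

        if (cp < 0x80)
        {
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800)
        {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Because the indices refer to UTF-16 code units, a character outside the Basic Multilingual Plane occupies two index positions, so callers working from UTF-8 byte offsets would need to translate them first.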
std::vector<segment> meta::utf::segmenter::impl::sentences () const | inline
Segments the entire content into sentences.
- Returns
- a vector of segments representing sentences
std::vector<segment> meta::utf::segmenter::impl::words () const | inline
Segments the entire content into words.
- Returns
- a vector of segments representing words
std::vector<segment> meta::utf::segmenter::impl::segments (int32_t first, int32_t last, segment_t type) const | inline
Generic segmentation method that operates on the substring between the given indices, using the given strategy for segmenting that substring.
- Parameters
-
first | The index of the beginning of the string to work on |
last | The index of the end of the string to work on |
type | The type of segmentation to perform |
- Returns
- a vector of segments (whose meaning depends on type)
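To make the shape of a range-restricted, strategy-driven segmentation call concrete, here is a deliberately simplified sketch. It is a stdlib-only stand-in and does not reproduce the class's behaviour: the real class holds `icu::BreakIterator` members and presumably delegates boundary detection to them, whereas this version splits words on spaces and sentences on `.`, `!` and `?`. The names `segment_kind`, `span` and `split_range` are hypothetical, as is the assumption that each segment is a half-open pair of code-unit indices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for segment_t.
enum class segment_kind { sentences, words };

// Hypothetical stand-in for segment: a half-open index range [begin, end).
struct span
{
    int32_t begin;
    int32_t end;
};

// Splits only the part of `text` between `first` and `last`, using naive
// boundary rules chosen by `kind`. Not Unicode-aware; illustration only.
std::vector<span> split_range(const std::u16string& text, int32_t first,
                              int32_t last, segment_kind kind)
{
    assert(0 <= first && first <= last
           && last <= static_cast<int32_t>(text.size()));

    std::vector<span> out;
    int32_t start = first;
    for (int32_t i = first; i < last; ++i)
    {
        char16_t c = text[static_cast<std::size_t>(i)];
        bool boundary = (kind == segment_kind::words)
                            ? (c == u' ')
                            : (c == u'.' || c == u'!' || c == u'?');
        if (!boundary)
            continue;

        // A sentence keeps its terminator; a word drops the space.
        int32_t stop = (kind == segment_kind::sentences) ? i + 1 : i;
        if (stop > start)
            out.push_back({start, stop});
        start = i + 1;
    }
    // Whatever remains after the last boundary forms the final segment.
    if (last > start)
        out.push_back({start, last});
    return out;
}
```

Under these assumptions, the whole-content variants (sentences and words) would amount to calling the generic function with the range `0` to `text.size()` and the matching strategy.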
The documentation for this class was generated from the following file: