ModErn Text Analysis
META Enumerates Textual Applications
Functions for converting to and from various character sets.
Classes | |
| class | icu_handle |
| Internal class that ensures that ICU cleans up all of its "still-reachable" memory before program termination. | |
| class | segmenter |
| Class that encapsulates segmenting Unicode strings. | |
| class | transformer |
| Class that encapsulates transliteration of Unicode strings. | |
Functions | |
| std::string | to_utf8 (const std::string &str, const std::string &charset) |
| Converts a string from the given charset to utf8. | |
| std::u16string | to_utf16 (const std::string &str, const std::string &charset) |
| Converts a string from the given charset to utf16. | |
| std::string | to_utf8 (const std::u16string &str) |
| Converts a string from utf16 to utf8. | |
| std::u16string | to_utf16 (const std::string &str) |
| Converts a string from utf8 to utf16. | |
| std::string | tolower (const std::string &str) |
| Lowercases a utf8 string. | |
| std::string | toupper (const std::string &str) |
| Uppercases a utf8 string. | |
| std::string | foldcase (const std::string &str) |
| Folds the case of a utf8 string. | |
| std::string | transform (const std::string &str, const std::string &id) |
| Transliterates a utf8 string, using the rules defined in ICU. | |
| std::string | remove_if (const std::string &str, std::function< bool(uint32_t)> pred) |
| Removes UTF-32 codepoints that match the given function. | |
| uint64_t | length (const std::string &str) |
| bool | isalpha (uint32_t codepoint) |
| bool | isblank (uint32_t codepoint) |
| std::u16string | icu_to_u16str (const icu::UnicodeString &icu_str) |
| Helper method that converts an ICU string to a std::u16string. | |
| std::string | icu_to_u8str (const icu::UnicodeString &icu_str) |
| Helper method that converts an ICU string to a std::string in utf8. | |
| void | utf8_append_codepoint (std::string &dest, uint32_t codepoint) |
| Helper method that appends a UTF-32 codepoint to the given utf8 string. | |
| std::string meta::utf::to_utf8 | ( | const std::string & | str, |
| const std::string & | charset | ||
| ) |
Converts a string from the given charset to utf8.
| str | The string to convert |
| charset | The charset of the given string |
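To make the idea of a charset conversion concrete, here is a minimal sketch for one fixed charset, ISO-8859-1, written in plain C++ without ICU. The name `latin1_to_utf8_sketch` is hypothetical and is not part of `meta::utf`; the library function accepts a charset name at run time, which this sketch does not attempt. ISO-8859-1 is convenient because each byte value equals its Unicode codepoint.

```cpp
#include <string>

// Hypothetical helper for illustration only (not part of meta::utf):
// converts ISO-8859-1 bytes to UTF-8.
std::string latin1_to_utf8_sketch(const std::string& input)
{
    std::string out;
    out.reserve(input.size());
    for (unsigned char byte : input)
    {
        if (byte < 0x80)
        {
            // ASCII range: identical in both encodings
            out.push_back(static_cast<char>(byte));
        }
        else
        {
            // U+0080..U+00FF: two-byte sequence 110xxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xC0 | (byte >> 6)));
            out.push_back(static_cast<char>(0x80 | (byte & 0x3F)));
        }
    }
    return out;
}
```

For example, the ISO-8859-1 byte `0xE9` (é) becomes the two UTF-8 bytes `0xC3 0xA9`.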
| std::u16string meta::utf::to_utf16 | ( | const std::string & | str, |
| const std::string & | charset | ||
| ) |
Converts a string from the given charset to utf16.
| str | The string to convert |
| charset | The charset of the given string |
| std::string meta::utf::to_utf8 | ( | const std::u16string & | str | ) |
Converts a string from utf16 to utf8.
| str | The string to convert |
| std::u16string meta::utf::to_utf16 | ( | const std::string & | str | ) |
Converts a string from utf8 to utf16.
| str | The string to convert |
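The utf8 to utf16 conversion can be illustrated without ICU. The sketch below is an assumption-laden stand-in, not the library's implementation: `decode_utf8_sketch` and `utf8_to_utf16_sketch` are hypothetical names, and the decoder performs only basic structural checks (it does not reject overlong forms or encoded surrogates).

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of meta::utf): decodes one codepoint from
// UTF-8 starting at position i and advances i past it.
std::uint32_t decode_utf8_sketch(const std::string& s, std::size_t& i)
{
    auto lead = static_cast<unsigned char>(s[i]);
    std::size_t extra = 0; // number of continuation bytes
    std::uint32_t cp = 0;
    if (lead < 0x80)              { extra = 0; cp = lead; }
    else if ((lead >> 5) == 0x06) { extra = 1; cp = lead & 0x1F; }
    else if ((lead >> 4) == 0x0E) { extra = 2; cp = lead & 0x0F; }
    else if ((lead >> 3) == 0x1E) { extra = 3; cp = lead & 0x07; }
    else throw std::invalid_argument{"invalid UTF-8 lead byte"};

    if (i + extra >= s.size())
        throw std::invalid_argument{"truncated UTF-8 sequence"};

    for (std::size_t k = 1; k <= extra; ++k)
    {
        auto cont = static_cast<unsigned char>(s[i + k]);
        if ((cont & 0xC0) != 0x80)
            throw std::invalid_argument{"invalid UTF-8 continuation byte"};
        cp = (cp << 6) | (cont & 0x3F);
    }
    if (cp > 0x10FFFF)
        throw std::invalid_argument{"codepoint out of Unicode range"};
    i += extra + 1;
    return cp;
}

// Hypothetical stand-in for the single-argument to_utf16 overload.
std::u16string utf8_to_utf16_sketch(const std::string& s)
{
    std::u16string out;
    for (std::size_t i = 0; i < s.size();)
    {
        std::uint32_t cp = decode_utf8_sketch(s, i);
        if (cp < 0x10000)
        {
            // Basic Multilingual Plane: one UTF-16 code unit
            out.push_back(static_cast<char16_t>(cp));
        }
        else
        {
            // Supplementary planes: high and low surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Codepoints above U+FFFF occupy two UTF-16 code units, so the resulting `std::u16string` can be longer than the number of codepoints in the input.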
| std::string meta::utf::tolower | ( | const std::string & | str | ) |
Lowercases a utf8 string.
| str | The string to convert |
| std::string meta::utf::toupper | ( | const std::string & | str | ) |
Uppercases a utf8 string.
| str | The string to convert |
| std::string meta::utf::foldcase | ( | const std::string & | str | ) |
Folds the case of a utf8 string.
Case folding is similar to lowercasing but more general: it maps a string to a canonical form intended for caseless comparison.
| str | The string to convert |
| std::string meta::utf::transform | ( | const std::string & | str, |
| const std::string & | id | ||
| ) |
Transliterates a utf8 string, using the rules defined in ICU.
| str | The string to transliterate |
| id | The ICU identifier for the transliteration method to use |
| std::string meta::utf::remove_if | ( | const std::string & | str, |
| std::function< bool(uint32_t)> | pred | ||
| ) |
Removes UTF-32 codepoints that match the given function.
| str | The string to remove characters from |
| pred | The predicate that returns true for codepoints that should be removed |
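The behaviour described for remove_if can be sketched in plain C++ as follows. This is an illustration under stated assumptions, not the library's code: `remove_if_sketch` is a hypothetical name, the input is assumed to be utf8, and validation is limited to lead-byte and length checks.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for meta::utf::remove_if, written without ICU.
std::string remove_if_sketch(const std::string& str,
                             const std::function<bool(std::uint32_t)>& pred)
{
    std::string out;
    for (std::size_t i = 0; i < str.size();)
    {
        auto lead = static_cast<unsigned char>(str[i]);
        // Sequence length implied by the lead byte (0 marks an invalid lead)
        std::size_t len = lead < 0x80 ? 1
                        : (lead >> 5) == 0x06 ? 2
                        : (lead >> 4) == 0x0E ? 3
                        : (lead >> 3) == 0x1E ? 4 : 0;
        if (len == 0 || i + len > str.size())
            throw std::invalid_argument{"malformed UTF-8"};

        // Payload bits of the lead byte, then six bits per continuation byte
        std::uint32_t cp = len == 1 ? lead : lead & (0xFF >> (len + 1));
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(str[i + k]) & 0x3F);

        // Copy the original bytes through unless the predicate asks for removal
        if (!pred(cp))
            out.append(str, i, len);
        i += len;
    }
    return out;
}
```

The predicate sees whole codepoints rather than bytes, so a call such as `remove_if_sketch(text, [](std::uint32_t cp) { return cp > 0x7F; })` drops every non-ASCII character without leaving partial byte sequences behind.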
| uint64_t meta::utf::length | ( | const std::string & | str | ) |
| str | The string to find the length of |
| bool meta::utf::isalpha | ( | uint32_t | codepoint | ) |
| codepoint | The codepoint in question |
| bool meta::utf::isblank | ( | uint32_t | codepoint | ) |
| codepoint | The codepoint in question |
| std::u16string meta::utf::icu_to_u16str | ( | const icu::UnicodeString & | icu_str | ) | inline |
Helper method that converts an ICU string to a std::u16string.
| icu_str | The ICU string to be converted |
| std::string meta::utf::icu_to_u8str | ( | const icu::UnicodeString & | icu_str | ) | inline |
Helper method that converts an ICU string to a std::string in utf8.
| icu_str | The ICU string to be converted |
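| void meta::utf::utf8_append_codepoint | ( | std::string & | dest, |
| uint32_t | codepoint | ||
| ) | inline |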
Helper method that appends a UTF-32 codepoint to the given utf8 string.
| dest | The string to append the codepoint to |
| codepoint | The UTF-32 codepoint to append |
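Appending a codepoint to a utf8 string amounts to choosing a one- to four-byte encoding based on the codepoint's value. The sketch below shows that encoding in plain C++; `append_codepoint_sketch` is a hypothetical name, and the range check that throws on surrogates and out-of-range values is an assumption of this sketch, not documented behaviour of utf8_append_codepoint.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for utf8_append_codepoint, written without ICU.
void append_codepoint_sketch(std::string& dest, std::uint32_t cp)
{
    // Assumption of this sketch: reject values that are not scalar values
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::invalid_argument{"not a Unicode scalar value"};

    if (cp < 0x80)
    {
        // 0xxxxxxx
        dest.push_back(static_cast<char>(cp));
    }
    else if (cp < 0x800)
    {
        // 110xxxxx 10xxxxxx
        dest.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        dest.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    else if (cp < 0x10000)
    {
        // 1110xxxx 10xxxxxx 10xxxxxx
        dest.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        dest.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        dest.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    else
    {
        // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        dest.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        dest.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        dest.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        dest.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}
```

For example, appending U+20AC (the euro sign) to a string adds the three bytes `0xE2 0x82 0xAC`.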