ModErn Text Analysis
META Enumerates Textual Applications
Functions for converting to and from various character sets.
Classes
  class icu_handle
      Internal class that ensures that ICU cleans up all of its "still-reachable" memory before program termination.
  class segmenter
      Class that encapsulates segmenting Unicode strings.
  class transformer
      Class that encapsulates transliteration of Unicode strings.
Functions
  std::string to_utf8(const std::string& str, const std::string& charset)
      Converts a string from the given charset to UTF-8.
  std::u16string to_utf16(const std::string& str, const std::string& charset)
      Converts a string from the given charset to UTF-16.
  std::string to_utf8(const std::u16string& str)
      Converts a string from UTF-16 to UTF-8.
  std::u16string to_utf16(const std::string& str)
      Converts a string from UTF-8 to UTF-16.
  std::string tolower(const std::string& str)
      Lowercases a UTF-8 string.
  std::string toupper(const std::string& str)
      Uppercases a UTF-8 string.
  std::string foldcase(const std::string& str)
      Folds the case of a UTF-8 string.
  std::string transform(const std::string& str, const std::string& id)
      Transliterates a UTF-8 string using ICU's transliteration rules.
  std::string remove_if(const std::string& str, std::function<bool(uint32_t)> pred)
      Removes every code point (passed to the predicate as a UTF-32 value) for which the predicate returns true.
  uint64_t length(const std::string& str)
  bool isalpha(uint32_t codepoint)
  bool isblank(uint32_t codepoint)
  std::u16string icu_to_u16str(const icu::UnicodeString& icu_str)
      Helper method that converts an ICU string to a std::u16string.
  std::string icu_to_u8str(const icu::UnicodeString& icu_str)
      Helper method that converts an ICU string to a UTF-8 std::string.
  void utf8_append_codepoint(std::string& dest, uint32_t codepoint)
      Helper method that appends a UTF-32 code point to the given UTF-8 string.
Functions for converting to and from various character sets.
std::string meta::utf::to_utf8(const std::string& str, const std::string& charset)
Converts a string from the given charset to UTF-8.
Parameters:
  str      The string to convert
  charset  The charset of the given string
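`to_utf8` accepts an arbitrary charset name, and this page does not say which converter backs it. To illustrate what such a conversion does for one concrete charset, here is a hypothetical `latin1_to_utf8` (not part of MeTA) for ISO-8859-1. In that charset every byte value is also its Unicode code point, so ASCII bytes pass through and every byte at or above 0x80 becomes a two-byte UTF-8 sequence.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of MeTA: converts ISO-8859-1 (Latin-1) bytes
// to UTF-8. Byte value N is code point U+00NN, so no lookup table is needed.
std::string latin1_to_utf8(const std::string& latin1)
{
    std::string out;
    out.reserve(latin1.size() * 2); // worst case: every byte needs two UTF-8 bytes
    for (char ch : latin1)
    {
        auto byte = static_cast<unsigned char>(ch);
        if (byte < 0x80)
        {
            out.push_back(static_cast<char>(byte)); // ASCII is unchanged
        }
        else
        {
            // U+0080..U+00FF encode as 110xxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xC0 | (byte >> 6)));
            out.push_back(static_cast<char>(0x80 | (byte & 0x3F)));
        }
    }
    return out;
}
```

Through MeTA itself the corresponding call would presumably be `meta::utf::to_utf8(str, "ISO-8859-1")`, assuming the underlying converter recognizes that charset name.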
std::u16string meta::utf::to_utf16(const std::string& str, const std::string& charset)
Converts a string from the given charset to UTF-16.
Parameters:
  str      The string to convert
  charset  The charset of the given string
std::string meta::utf::to_utf8(const std::u16string& str)
Converts a string from UTF-16 to UTF-8.
Parameters:
  str  The string to convert
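Converting UTF-16 to UTF-8 means recombining each surrogate pair into one code point and then re-encoding every code point in one to four bytes. The following is a hypothetical standalone `utf16_to_utf8`, not MeTA's implementation (which this page does not describe); replacing unpaired surrogates with U+FFFD is a policy chosen for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Appends one code point (at most U+10FFFF) to a UTF-8 string.
void append_utf8(std::string& out, uint32_t cp)
{
    if (cp < 0x80)
    {
        out.push_back(static_cast<char>(cp));
    }
    else if (cp < 0x800)
    {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    else if (cp < 0x10000)
    {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    else
    {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Hypothetical UTF-16 -> UTF-8 converter; lone surrogates become U+FFFD.
std::string utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        uint32_t unit = in[i];
        bool is_high = unit >= 0xD800 && unit <= 0xDBFF;
        if (is_high && i + 1 < in.size() && in[i + 1] >= 0xDC00
            && in[i + 1] <= 0xDFFF)
        {
            // High + low surrogate encode one supplementary code point.
            uint32_t low = in[i + 1];
            append_utf8(out, 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00));
            ++i; // the low surrogate has been consumed
        }
        else if (unit >= 0xD800 && unit <= 0xDFFF)
        {
            append_utf8(out, 0xFFFD); // unpaired surrogate
        }
        else
        {
            append_utf8(out, unit); // BMP code point
        }
    }
    return out;
}
```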
std::u16string meta::utf::to_utf16(const std::string& str)
Converts a string from UTF-8 to UTF-16.
Parameters:
  str  The string to convert
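The reverse direction decodes each UTF-8 sequence into a code point and emits it as one UTF-16 unit, or as a surrogate pair when it lies above U+FFFF. This hypothetical `utf8_to_utf16` is a sketch, not MeTA's code; to stay short it maps malformed or truncated sequences to U+FFFD one byte at a time and does not reject overlong forms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical UTF-8 -> UTF-16 converter (simplified error handling).
std::u16string utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        auto lead = static_cast<unsigned char>(in[i]);
        uint32_t cp = 0;
        std::size_t len = 0;
        // The lead byte determines the sequence length and its payload bits.
        if (lead < 0x80) { cp = lead; len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else { out.push_back(u'\uFFFD'); ++i; continue; } // invalid lead byte

        // Each continuation byte (10xxxxxx) contributes 6 more bits.
        bool ok = i + len <= in.size();
        for (std::size_t k = 1; ok && k < len; ++k)
        {
            auto cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80)
                ok = false;
            else
                cp = (cp << 6) | (cont & 0x3F);
        }
        if (!ok || cp > 0x10FFFF)
        {
            out.push_back(u'\uFFFD');
            ++i; // resynchronize at the next byte
            continue;
        }
        i += len;

        if (cp < 0x10000)
        {
            out.push_back(static_cast<char16_t>(cp));
        }
        else
        {
            cp -= 0x10000; // split into high and low surrogates
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```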
std::string meta::utf::tolower(const std::string& str)
Lowercases a UTF-8 string.
Parameters:
  str  The string to convert
std::string meta::utf::toupper(const std::string& str)
Uppercases a UTF-8 string.
Parameters:
  str  The string to convert
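std::string meta::utf::foldcase(const std::string& str)
Folds the case of a UTF-8 string.
Unicode case folding is similar to lowercasing, but it is designed for case-insensitive matching rather than for display, and it does not depend on the locale.
Parameters:
  str  The string to convert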
std::string meta::utf::transform(const std::string& str, const std::string& id)
Transliterates a UTF-8 string using ICU's transliteration rules.
Parameters:
  str  The string to transliterate
  id   The ICU identifier of the transliterator to use (for example, "Any-Latin")
std::string meta::utf::remove_if(const std::string& str, std::function<bool(uint32_t)> pred)
Removes every code point (passed to the predicate as a UTF-32 value) for which the predicate returns true.
Parameters:
  str   The string to remove characters from
  pred  The predicate that returns true for code points that should be removed
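One way to meet this contract is to decode the UTF-8 input one sequence at a time, ask the predicate about the decoded code point, and copy the original bytes of every kept code point unchanged, so no re-encoding is needed. The sketch below uses the hypothetical names `utf8_sequence_length` and `remove_codepoints_if`; it is not MeTA's implementation, and it assumes mostly well-formed input (undecodable bytes are simply copied through).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Length of the UTF-8 sequence that starts with `lead`, or 0 if `lead`
// cannot start a sequence.
std::size_t utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 0;
}

// Hypothetical sketch of remove_if's contract: drop code points for which
// pred returns true and keep everything else byte for byte.
std::string remove_codepoints_if(const std::string& str,
                                 const std::function<bool(uint32_t)>& pred)
{
    std::string out;
    std::size_t i = 0;
    while (i < str.size())
    {
        auto lead = static_cast<unsigned char>(str[i]);
        std::size_t len = utf8_sequence_length(lead);
        if (len == 0 || i + len > str.size())
        {
            out.push_back(str[i]); // not decodable: keep the raw byte
            ++i;
            continue;
        }
        // Payload bits of the lead byte, then 6 bits per continuation byte.
        uint32_t cp = (len == 1) ? lead : (lead & (0x7F >> len));
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(str[i + k]) & 0x3F);
        if (!pred(cp))
            out.append(str, i, len); // keep the original bytes
        i += len;
    }
    return out;
}
```

With MeTA, a call such as `meta::utf::remove_if(str, [](uint32_t cp) { return meta::utf::isblank(cp); })` would, assuming `isblank` has its conventional meaning, strip blank code points such as spaces and tabs.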
uint64_t meta::utf::length(const std::string& str)
Parameters:
  str  The string to find the length of
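This page does not state the unit `length` counts. Assuming it counts code points rather than bytes, the count can be computed without fully decoding: in well-formed UTF-8 every code point has exactly one byte that is not a continuation byte (10xxxxxx). The function name `utf8_codepoint_count` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical sketch, assuming length() counts code points: count every
// byte that does not have the continuation-byte form 10xxxxxx.
uint64_t utf8_codepoint_count(const std::string& str)
{
    uint64_t count = 0;
    for (char ch : str)
        if ((static_cast<unsigned char>(ch) & 0xC0) != 0x80)
            ++count;
    return count;
}
```

For "café" encoded in UTF-8 this yields 4, while `std::string::size()` reports 5 bytes.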
bool meta::utf::isalpha(uint32_t codepoint)
Parameters:
  codepoint  The code point in question
bool meta::utf::isblank(uint32_t codepoint)
Parameters:
  codepoint  The code point in question
inline std::u16string meta::utf::icu_to_u16str(const icu::UnicodeString& icu_str)
Helper method that converts an ICU string to a std::u16string.
Parameters:
  icu_str  The ICU string to be converted
inline std::string meta::utf::icu_to_u8str(const icu::UnicodeString& icu_str)
Helper method that converts an ICU string to a UTF-8 std::string.
Parameters:
  icu_str  The ICU string to be converted
inline void meta::utf::utf8_append_codepoint(std::string& dest, uint32_t codepoint)
Helper method that appends a UTF-32 code point to the given UTF-8 string.
Parameters:
  dest       The string to append the code point to
  codepoint  The UTF-32 code point to append
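Appending a code point to a UTF-8 string amounts to choosing a one- to four-byte sequence based on the code point's magnitude and spreading its bits over the payload positions. The following hypothetical `append_codepoint` sketches that encoding; it is not MeTA's implementation, and throwing `std::invalid_argument` for surrogates and values above U+10FFFF is a policy chosen for the sketch, not documented behavior of `utf8_append_codepoint`.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical sketch: append one Unicode scalar value to a UTF-8 string.
void append_codepoint(std::string& dest, uint32_t codepoint)
{
    // Surrogates and values above U+10FFFF have no UTF-8 encoding.
    if (codepoint > 0x10FFFF || (codepoint >= 0xD800 && codepoint <= 0xDFFF))
        throw std::invalid_argument{"not a Unicode scalar value"};

    if (codepoint < 0x80) // 1 byte: 0xxxxxxx
    {
        dest.push_back(static_cast<char>(codepoint));
    }
    else if (codepoint < 0x800) // 2 bytes: 110xxxxx 10xxxxxx
    {
        dest.push_back(static_cast<char>(0xC0 | (codepoint >> 6)));
        dest.push_back(static_cast<char>(0x80 | (codepoint & 0x3F)));
    }
    else if (codepoint < 0x10000) // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    {
        dest.push_back(static_cast<char>(0xE0 | (codepoint >> 12)));
        dest.push_back(static_cast<char>(0x80 | ((codepoint >> 6) & 0x3F)));
        dest.push_back(static_cast<char>(0x80 | (codepoint & 0x3F)));
    }
    else // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    {
        dest.push_back(static_cast<char>(0xF0 | (codepoint >> 18)));
        dest.push_back(static_cast<char>(0x80 | ((codepoint >> 12) & 0x3F)));
        dest.push_back(static_cast<char>(0x80 | ((codepoint >> 6) & 0x3F)));
        dest.push_back(static_cast<char>(0x80 | (codepoint & 0x3F)));
    }
}
```

For example, appending U+0041, U+00E9, U+20AC, and U+1F600 to "x" adds 1, 2, 3, and 4 bytes respectively.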