Timeline for answer to How do I properly use std::string on UTF-8 in C++? by Matthieu M.
Current License: CC BY-SA 4.0
24 events
| When | What | Action | By | License | Comment |
|---|---|---|---|---|---|
| Jul 17, 2023 at 2:26 | review | Suggested edits | |||
| Jul 17, 2023 at 7:10 | |||||
| Jun 20, 2020 at 9:12 | history | edited | Community Bot | | Commonmark migration |
| Aug 28, 2019 at 4:41 | vote | accept | Saddle Point | ||
| Jun 13, 2018 at 8:40 | audit | First posts | |||
| Jun 13, 2018 at 8:41 | |||||
| Jun 8, 2018 at 11:06 | audit | First posts | |||
| Jun 8, 2018 at 11:18 | |||||
| May 30, 2018 at 22:36 | audit | First posts | |||
| May 30, 2018 at 22:36 | |||||
| May 24, 2018 at 1:33 | audit | First posts | |||
| May 24, 2018 at 1:33 | |||||
| May 23, 2018 at 12:15 | audit | First posts | |||
| May 23, 2018 at 12:28 | |||||
| May 20, 2018 at 23:10 | audit | First posts | |||
| May 20, 2018 at 23:10 | |||||
| May 19, 2018 at 0:59 | audit | First posts | |||
| May 19, 2018 at 0:59 | |||||
| May 18, 2018 at 13:52 | audit | First posts | |||
| May 18, 2018 at 14:12 | |||||
| May 18, 2018 at 12:45 | history | edited | Matthieu M. | CC BY-SA 4.0 | added 398 characters in body |
| May 18, 2018 at 12:41 | comment | added | Matthieu M. | | @Muzer: Ah yes indeed, only byte-for-byte matching works. I'll amend the answer with concerns about normalization/collation/locales. |
| May 18, 2018 at 12:38 | comment | added | Muzer | | str.find("...") works only if you only care about matching byte-for-byte; otherwise you'll need a proper normalisation-and-locale-aware comparison. Other than that, this seems like a pretty good answer, and it shows why I kind of hate the Unicode "support" that exists in languages like Python 3. |
| May 18, 2018 at 12:32 | history | edited | Matthieu M. | CC BY-SA 4.0 | added 219 characters in body |
| May 18, 2018 at 11:25 | history | edited | Matthieu M. | CC BY-SA 4.0 | added 70 characters in body |
| May 18, 2018 at 11:23 | comment | added | Matthieu M. | | @Quentin: Yes. I should add it to the list of alternatives! By the way, there's a nifty typedef: std::u32string. |
| May 18, 2018 at 11:22 | comment | added | Quentin | | For portability, would std::basic_string<char32_t> work as expected on both *nix and Windows? |
| May 18, 2018 at 11:16 | comment | added | Saddle Point | | My bad, str.find works. Is there any way to fix the string size/length issue and the string iteration issue? |
| May 18, 2018 at 11:06 | comment | added | Matthieu M. | | @Edityouprofile: str.find("哈") should work (see ideone.com/s9i1yf), but str.find('哈') will not, because '哈' is a multi-byte character. str.find_first_of("哈") will not work either (it only works for ASCII patterns). Regex should work fine for ASCII patterns; however, beware of character classes and "repeaters" (e.g. "哈?" may only make the last byte optional). |
| May 18, 2018 at 9:42 | comment | added | Saddle Point | | Thanks for the great details! I'm trying to take some time to figure all this out. About the original questions: besides str.find_first_of, str.find and std::regex do not seem to work for non-ASCII inputs (e.g. "哈" or u8"哈") given std::string str(u8"哈哈haha"); |
| May 18, 2018 at 8:51 | history | answered | Matthieu M. | CC BY-SA 4.0 | |