Timeline for UTF-8 decoding library
Current License: CC BY-SA 3.0
10 events
| when | what | by | license | comment | |
|---|---|---|---|---|---|
| Jun 26, 2012 at 4:28 | comment | added | Alexis Wilke | Hmmm... the EURO character is found in ISO-8859-15, -16, and -7. Not -9. Anyway, with Unicode and the Internet, ISO-8859-1 is what you hear about because all the other 8 bit encodings are not represented 1 to 1 with any other plane in Unicode. | |
| Jun 26, 2012 at 4:19 | history | edited | Alexis Wilke | CC BY-SA 3.0 | inform about way to implement mblen() |
| Jun 25, 2012 at 11:10 | comment | added | DevSolar | @Alexis Wilke: "Once converted", correct. Oh, by the way, "what people use these days", IF they're still using an 8-bit codepage, is usually ISO-8859-15. That might change once the Euro currency is history, but ATM Latin-1 is "common" because people cannot remember it's actually Latin-9... | |
| Jun 25, 2012 at 10:42 | comment | added | Alexis Wilke | Yes, the first 256 characters of UCS-2 are the same as UCS-4, UTF-16 and UTF-8 once converted. They're all ISO-8859-1. Converting to another encoding (such as CP1252) requires tables or a library such as iconv (which I recommend you avoid!) | |
| Jun 25, 2012 at 10:33 | history | edited | Alexis Wilke | CC BY-SA 3.0 | Added info about getting the length in characters |
| Jun 25, 2012 at 10:26 | comment | added | Alexis Wilke | First of all, I did not say UTF-16. On Windows they use UCS-2. They don't know what UTF-16 is. Second, the first plane of Unicode is ISO-8859-1, whatever you say, that's what it is. Third, CP1252 is specific to Windows and if you convert from UTF-8 you're not going to get CP1252 which is why I mention that you get ISO-8859-1. Then it's your problem to properly select the correct font to render the text later. If you know what encoding you have, you can do it. | |
| Jun 25, 2012 at 10:20 | comment | added | ctrl-alt-delor | Or how about converting to utf-16 or utf-32 for internal processing. | |
| Jun 25, 2012 at 10:19 | comment | added | Konrad Rudolph | That’s not what OP wants. Why would he want to convert UTF-16 losslessly to single-byte codepoints? The question doesn’t imply this anywhere. Mention of ISO-8859-1 is just misguided. "in most cases [it’s] what people use these days" is completely wrong. In fact, modern browsers actually use a different encoding even if you explicitly request this encoding because almost nobody ever means ISO-8859-1, even if they say so. | |
| Jun 25, 2012 at 10:17 | comment | added | Steve Jessop | "in most cases ISO-8859-1 is what people use these days". On the interwebs, I see CP1252 mislabelled as ISO-8859-1 fairly frequently. Not sure which one you'd say they were "using" in that case, but it pretty much doesn't matter what "most people" are using, what matters is the minority of people whose text breaks your code ;-) | |
| Jun 25, 2012 at 10:14 | history | answered | Alexis Wilke | CC BY-SA 3.0 |