
Timeline for UTF-8 decoding library

Current License: CC BY-SA 3.0

10 events

when	what	by	license	comment
Jun 26, 2012 at 4:28	comment	added	Alexis Wilke		Hmmm... the EURO character is found in ISO-8859-15, -16, and -7. Not -9. Anyway, with Unicode and the Internet, ISO-8859-1 is what you hear about because all the other 8 bit encodings are not represented 1 to 1 with any other plane in Unicode.
Jun 26, 2012 at 4:19	history	edited	Alexis Wilke	CC BY-SA 3.0	inform about way to implement mblen()
Jun 25, 2012 at 11:10	comment	added	DevSolar		@Alexis Wilke: "Once converted", correct. Oh, by the way, "what people use these days", IF they're still using an 8-bit codepage, is usually ISO-8859-15. That might change once the Euro currency is history, but ATM Latin-1 is "common" because people cannot remember it's actually Latin-9...
Jun 25, 2012 at 10:42	comment	added	Alexis Wilke		Yes, the first 256 characters of UCS-2 are the same as UCS-4, UTF-16 and UTF-8 once converted. They're all ISO-8859-1. Converting to another encoding (such as CP1252) requires tables or a library such as iconv (which I recommend you avoid!)
Jun 25, 2012 at 10:33	history	edited	Alexis Wilke	CC BY-SA 3.0	Added info about getting the length in characters
Jun 25, 2012 at 10:26	comment	added	Alexis Wilke		First of all, I did not say UTF-16. On Windows they use UCS-2. They don't know what UTF-16 is. Second, the first plane of Unicode is ISO-8859-1, whatever you say, that's what it is. Third, CP1252 is specific to Windows and if you convert from UTF-8 you're not going to get CP1252 which is why I mention that you get ISO-8859-1. Then it's your problem to properly select the correct font to render the text later. If you know what encoding you have, you can do it.
Jun 25, 2012 at 10:20	comment	added	ctrl-alt-delor		Or how about converting to utf-16 or utf-32 for internal processing.
Jun 25, 2012 at 10:19	comment	added	Konrad Rudolph		That’s not what OP wants. Why would he want to convert UTF-16 losslessly to single-byte codepoints? The question doesn’t imply this anywhere. Mention of ISO-8859-1 is just misguided. "in most cases [it’s] what people use these days" is completely wrong. In fact, modern browsers actually use a different encoding even if you explicitly request this encoding because almost nobody ever means ISO-8859-1, even if they say so.
Jun 25, 2012 at 10:17	comment	added	Steve Jessop		"in most cases ISO-8859-1 is what people use these days". On the interwebs, I see CP1252 mislabelled as ISO-8859-1 fairly frequently. Not sure which one you'd say they were "using" in that case, but it pretty much doesn't matter what "most people" are using, what matters is the minority of people whose text breaks your code ;-)
Jun 25, 2012 at 10:14	history	answered	Alexis Wilke	CC BY-SA 3.0
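
The encoding claims traded in the comments above can be checked with a few lines of Python. This is a minimal sketch using only the standard library codecs; the helper name `euro_byte` is hypothetical and does not come from the thread or the answer. It illustrates that ISO-8859-1 bytes map one-to-one onto the first 256 Unicode code points, that the euro sign is absent from ISO-8859-1 and ISO-8859-9 but present in ISO-8859-15 and CP1252, and that the byte 0x80 decodes differently under the ISO-8859-1 and CP1252 labels.

```python
# Sketch only: euro_byte is a hypothetical helper, not part of the thread.

# ISO-8859-1 bytes 0x00..0xFF decode to code points U+0000..U+00FF, one to one.
all_bytes = bytes(range(256))
latin1_text = all_bytes.decode("latin-1")
latin1_is_identity = [ord(ch) for ch in latin1_text] == list(range(256))

EURO = "\u20ac"  # EURO SIGN


def euro_byte(codec):
    """Return the bytes that encode the euro sign in codec, or None if it has none."""
    try:
        return EURO.encode(codec)
    except UnicodeEncodeError:
        return None


# The same byte means different things depending on the label applied to it.
as_latin1 = b"\x80".decode("latin-1")  # U+0080, a C1 control character
as_cp1252 = b"\x80".decode("cp1252")   # U+20AC, the euro sign

if __name__ == "__main__":
    print("latin-1 identity:", latin1_is_identity)
    for codec in ("latin-1", "iso8859-9", "iso8859-15", "cp1252"):
        print(codec, euro_byte(codec))
```

The last two assignments are the case Steve Jessop describes: text that is really CP1252 but labelled ISO-8859-1 decodes without error, yet yields control characters where the author intended printable ones.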