Questions tagged [utf-8]
UTF-8 (Unicode Transformation Format, 8 bits) is a character encoding that represents each Unicode code point as a sequence of one to four bytes. It is backwards-compatible with ASCII while still being able to represent every Unicode code point.
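As a rough illustration of the description above, the following sketch encodes a few sample characters with Python's built-in `str.encode` and prints the resulting byte sequences. The sample characters are arbitrary choices for demonstration; under the current definition of UTF-8 (RFC 3629), the longest sequence is four bytes.

```python
# Encode a few sample characters and show their UTF-8 byte sequences.
# The samples are arbitrary: one character from each byte-length class.
samples = ["A", "ä", "€", "😀"]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ')} ({len(encoded)} byte(s))")

# ASCII compatibility: pure ASCII text yields identical bytes in both encodings.
assert "plain ASCII".encode("utf-8") == "plain ASCII".encode("ascii")
```

Characters in the ASCII range occupy a single byte, which is why existing ASCII files are already valid UTF-8.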
41 questions
6
votes
1
answer
773
views
Transcoding UTF-8 to UTF-16-LE in VBA
VBA is a language that's lacking a lot of basic functionality. (Pun intended)
Most libraries, if they exist in the first place, are OS-specific, and even some of the inbuilt functions don't work on ...
3
votes
1
answer
347
views
Determining if a file is UTF-8 text by looking at its first n bytes
I'm trying to find out whether a particular file is UTF-8 encoded readable text, by which I mean printable symbols, whitespaces, \n, ...
2
votes
2
answers
381
views
Wielding .NET masterfully to encode non-alphanumeric characters into utf-8 hex representation
I have these two methods that work, but I also hate them because they can almost certainly be improved. I'm hoping to gain some guidance from others who are more knowledgeable about .NET's offering for ...
0
votes
1
answer
749
views
I wrote a header file to write German umlauts in a textfile properly
This function is about the fact that a std::wstring was used in another cpp file in order to be able to read strings with German umlauts from the console. Since it is difficult to get wstrings into a ...
7
votes
3
answers
3k
views
C++ UTF-8 decoder
While writing simple text rendering I found a lack of UTF-8 decoders. Most decoders I found required allocating enough space for the decoded string. In the worst case that would mean that the decoded string ...
2
votes
1
answer
113
views
Validator and Sanitizer for HTML 5 attribute regex according to current HTML living standard
According to https://html.spec.whatwg.org/multipage/syntax.html#attributes-2 an HTML 5 attribute name is defined like this:
Attribute names must consist of one or more characters other than
controls, ...
3
votes
2
answers
611
views
Find the UTF-8 Length of a given codepoint
A codepoint in UTF-8 can be represented by one to four bytes. Given a codepoint, I need to determine the length (in bytes) of the codepoint if it were represented in UTF-8. For this, I've written the ...
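The task described in this excerpt can be sketched as a range check on the code point. This is a minimal illustration, not the asker's code; the function name `utf8_length` is hypothetical, and rejecting surrogates and out-of-range values is an assumption about the desired behavior.

```python
def utf8_length(code_point: int) -> int:
    """Return the number of bytes UTF-8 needs for one Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        # Assumption: surrogates are rejected, since they are not valid
        # Unicode scalar values and cannot be encoded in well-formed UTF-8.
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if code_point < 0x80:       # up to 7 bits of payload
        return 1
    if code_point < 0x800:      # up to 11 bits of payload
        return 2
    if code_point < 0x10000:    # up to 16 bits of payload
        return 3
    return 4                    # up to 21 bits of payload


# Cross-check against Python's own encoder for a few sample characters.
for ch in ("A", "ä", "€", "😀"):
    assert utf8_length(ord(ch)) == len(ch.encode("utf-8"))
```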
2
votes
1
answer
709
views
A C++ function to read Code Points from an UTF-8 Stream
I've written a function that reads and returns one UTF-8 code point from an istream. I am wondering if the code is efficient or if there are some obvious problems with the implementation.
...
2
votes
1
answer
2k
views
The conversion from UTF-16 to UTF-8
I have created a function that converts from UTF-16 to UTF-8.
This function first converts from UTF-16 to code points, then from code points to UTF-8.
...
22
votes
6
answers
4k
views
Transcode UCS-4BE to UTF-8
Below is my entire program. You can read what it does thanks to the comments and specifications in particular.
My question is: can it be improved? Would it be possible, for example, to avoid writing a ...
3
votes
1
answer
220
views
Save on typing while using UTF8 encoding
Typing in something like Encoding.UTF8.GetString(...) and Encoding.UTF8.GetBytes(...) everywhere in your code could be ...
8
votes
1
answer
157
views
Checking whether a string fragment could be part of a longer UTF-8 string
Although UTF-8 validation is a common task, I'm trying to solve a slightly different task; given a string of bytes, work out whether it could potentially be a fragment of a valid UTF-8 string. That's ...
8
votes
1
answer
203
views
myUTF-8 small lib (validate UTF-8, guess language, count chars)
I'm new to the C language and never got into the details of UTF-8. After reading some articles about it, I wanted to try playing with UTF-8 in C for both fun and practicing ...
5
votes
1
answer
13k
views
Convert to UTF-8 all files in a directory
1. Summary
I can't figure out how to refactor multiple with open blocks for one file.
2. Expected behavior of program
The program detects the encoding for each file in the ...
9
votes
5
answers
8k
views
Convert UTF8 string to UTF32 string in C
I'm doing some recreational programming in C (after spending some time in C++, but professionally using only PHP/JavaScript).
I wrote a UTF8 to UTF32 converter and just wanted to know if I made some ...