Code Review

Questions tagged [utf-8]

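UTF-8 (Unicode Transformation Format, 8-bit) is a variable-width character encoding that represents each Unicode code point as a sequence of one to four bytes. (The original design allowed sequences of up to six bytes, but RFC 3629 restricted UTF-8 to four bytes, matching the Unicode range U+0000 to U+10FFFF.) It is backward-compatible with ASCII while still supporting every Unicode code point.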
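To make the byte layout concrete, here is a minimal Python sketch of encoding a single code point by hand. The function name encode_utf8 is hypothetical, and the sketch rejects surrogates (U+D800 to U+DFFF) and values above U+10FFFF, following RFC 3629. Each branch fills the x bits of one byte pattern with the code point's bits, most significant first.

```python
def encode_utf8(cp: int) -> bytes:
    """Hypothetical helper: encode one Unicode scalar value as 1-4 UTF-8 bytes."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp < 0x80:                       # 0xxxxxxx (ASCII, unchanged)
        return bytes([cp])
    if cp < 0x800:                      # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:                    # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for ch in ("A", "é", "€", "😀"):
    print(f"U+{ord(ch):04X} -> {encode_utf8(ord(ch)).hex(' ')}")
```

This prints U+0041 -> 41, U+00E9 -> c3 a9, U+20AC -> e2 82 ac and U+1F600 -> f0 9f 98 80. The one-byte case is exactly ASCII, which is where the backward compatibility comes from.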

6 votes
1 answer
773 views

Transcoding UTF-8 to UTF-16-LE in VBA

VBA is a language that's lacking a lot of basic functionality. (Pun intended) Most libraries, if they exist in the first place, are OS-specific, and even some of the inbuilt functions don't work on ...
GWD
  • 195
3 votes
1 answer
347 views

Determining if a file is UTF-8 text by looking at its first n bytes

I'm trying to find out whether a particular file is UTF-8 encoded readable text, by which I mean printable symbols, whitespaces, \n, ...
korolev
  • 33
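For the question above, one possible heuristic is sketched below in Python. It assumes that "readable" means printable characters plus newline, carriage return and tab, and the function name is hypothetical. Because the first n bytes may cut a multi-byte character in half, the sketch uses an incremental decoder with final=False. That setting holds back an incomplete trailing sequence instead of rejecting it, and this sketch accepts such a held-back sequence without checking it further.

```python
import codecs


def looks_like_utf8_text(head: bytes) -> bool:
    """Hypothetical heuristic: is this file prefix UTF-8 printable text?"""
    # final=False: an incomplete sequence at the end of `head` (cut off by
    # the n-byte read) is buffered rather than treated as an error.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        text = decoder.decode(head, final=False)
    except UnicodeDecodeError:
        return False
    # Assumed definition of "readable": printable characters plus common
    # whitespace control characters.
    return all(ch.isprintable() or ch in "\n\r\t" for ch in text)
```

To check a file, open it in binary mode and pass in the first n bytes, for example `f.read(4096)`. The size 4096 is an arbitrary choice.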
2 votes
2 answers
381 views

Wielding .NET masterfully to encode non-alphanumeric characters into utf-8 hex representation

I have these two methods that work, but that I also hate because they can almost certainly be improved. I'm hoping to gain some guidance from others who are more knowledgeable about .NET's offering for ...
Greg H
  • 23
0 votes
1 answer
749 views

I wrote a header file to write German umlauts to a text file properly

This function exists because a std::wstring was used in another .cpp file to read strings containing German umlauts from the console. Since it is difficult to get wstrings into a ...
Daniel
  • 303
7 votes
3 answers
3k views

C++ UTF-8 decoder

While writing simple text rendering, I found a lack of suitable UTF-8 decoders. Most decoders I found required allocating enough space for the decoded string. In the worst case, that would mean that the decoded string ...
KlemenPl
  • 225
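The allocation-free idea in the question above comes down to decoding one code point at a time, on demand, instead of filling an output buffer. Below is a minimal Python generator that sketches this API shape. Python cannot show the memory savings a C++ version would get, and the function name is hypothetical. The generator rejects invalid lead bytes, bad continuation bytes, truncated sequences, overlong forms, surrogates and values above U+10FFFF.

```python
from typing import Iterator


def iter_code_points(data: bytes) -> Iterator[int]:
    """Hypothetical lazy decoder: yield one code point per UTF-8 sequence."""
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        # The lead byte determines the sequence length, its payload bits,
        # and the smallest code point that length may encode.
        if b0 < 0x80:
            length, cp, min_cp = 1, b0, 0
        elif 0xC0 <= b0 < 0xE0:
            length, cp, min_cp = 2, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 < 0xF0:
            length, cp, min_cp = 3, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 < 0xF8:
            length, cp, min_cp = 4, b0 & 0x07, 0x10000
        else:
            raise ValueError(f"invalid lead byte 0x{b0:02X} at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        # Each continuation byte must look like 10xxxxxx and adds 6 bits.
        for b in data[i + 1:i + length]:
            if b & 0xC0 != 0x80:
                raise ValueError(f"invalid continuation byte after offset {i}")
            cp = (cp << 6) | (b & 0x3F)
        if cp < min_cp or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"overlong, surrogate or out-of-range value at offset {i}")
        yield cp
        i += length
```

The caller pulls values with a plain for loop, for example `for cp in iter_code_points(raw): ...`, so no decoded string is ever built.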
2 votes
1 answer
113 views

Validator and Sanitizer for HTML 5 attribute regex according to current HTML living standard

According to https://html.spec.whatwg.org/multipage/syntax.html#attributes-2 an HTML 5 attribute name is defined like this: Attribute names must consist of one or more characters other than controls, ...
3 votes
2 answers
611 views

Find the UTF-8 Length of a given codepoint

A codepoint in UTF-8 can be represented by one to four bytes. Given a codepoint, I need to determine the length (in bytes) of the codepoint if it were represented in UTF-8. For this, I've written the ...
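The length follows directly from the RFC 3629 ranges, so a chain of comparisons is enough. Here is a minimal Python sketch with a hypothetical function name. It does not reject surrogate code points (U+D800 to U+DFFF), which fall in the three-byte range even though strict UTF-8 cannot encode them.

```python
def utf8_length(cp: int) -> int:
    """Hypothetical helper: bytes needed to encode cp in UTF-8."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"code point out of range: {cp:#x}")
    if cp < 0x80:
        return 1    # U+0000..U+007F
    if cp < 0x800:
        return 2    # U+0080..U+07FF
    if cp < 0x10000:
        return 3    # U+0800..U+FFFF
    return 4        # U+10000..U+10FFFF
```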
2 votes
1 answer
709 views

A C++ function to read Code Points from a UTF-8 Stream

I've written a function that reads and returns one UTF-8 code point from an istream. I am wondering if the code is efficient or if there are some obvious problems with the implementation. ...
2 votes
1 answer
2k views

The conversion from UTF-16 to UTF-8

I have created a function that converts from UTF-16 to UTF-8. It first converts the UTF-16 input to code points, then converts those code points to UTF-8. ...
Lion King
  • 149
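The two-step approach described above can be sketched in Python. The only non-trivial part of step 1 is combining a surrogate pair: a high surrogate (U+D800 to U+DBFF) contributes the top 10 bits and a low surrogate (U+DC00 to U+DFFF) the bottom 10 bits of the value minus 0x10000. Both function names are hypothetical. For brevity, step 2 uses Python's built-in encoder instead of manual bit packing.

```python
from typing import Iterable, Iterator


def utf16_to_code_points(units: Iterable[int]) -> Iterator[int]:
    """Step 1 (hypothetical helper): combine UTF-16 code units into code points."""
    it = iter(units)
    for u in it:
        if 0xD800 <= u <= 0xDBFF:              # high (lead) surrogate
            low = next(it, None)
            if low is None or not 0xDC00 <= low <= 0xDFFF:
                raise ValueError("unpaired high surrogate")
            yield 0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00)
        elif 0xDC00 <= u <= 0xDFFF:            # low surrogate with no lead
            raise ValueError("unpaired low surrogate")
        else:
            yield u                            # BMP character, used as-is


def utf16_to_utf8(units: Iterable[int]) -> bytes:
    """Step 2 (hypothetical helper): encode each code point as UTF-8."""
    return b"".join(chr(cp).encode("utf-8") for cp in utf16_to_code_points(units))
```

For example, the pair 0xD83D, 0xDE00 combines to U+1F600, which then encodes as the four bytes f0 9f 98 80.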
22 votes
6 answers
4k views

Transcode UCS-4BE to UTF-8

Below is my entire program. You can read what it does thanks to the comments and specifications in particular. My question is: can it be improved? Would it be possible, for example, to avoid writing a ...
3 votes
1 answer
220 views

Save on typing while using UTF8 encoding

Typing in something like Encoding.UTF8.GetString(...) and Encoding.UTF8.GetBytes(...) everywhere in your code could be ...
Dmitry Nogin
  • 6,131
8 votes
1 answer
157 views

Checking whether a string fragment could be part of a longer UTF-8 string

Although UTF-8 validation is a common task, I'm trying to solve a slightly different task; given a string of bytes, work out whether it could potentially be a fragment of a valid UTF-8 string. That's ...
ais523
  • 181
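One way to approach the fragment question above is sketched here in Python. The sketch is not the asker's method, and its function name is hypothetical. A slice can start mid-character, so up to three leading continuation bytes are skipped; any such run is the tail of some valid character, e.g. one led by 0xF1. A slice can also end mid-character, so any incomplete trailing sequence is accepted only if some completion makes it valid. In a well-formed sequence, only the byte right after the lead has a restricted range (E0, ED, F0 and F4 narrow it). That is why it suffices to try every value for the first missing byte and pad the rest with 0x80.

```python
import codecs


def could_be_utf8_fragment(data: bytes) -> bool:
    """Hypothetical check: could data be a contiguous slice of valid UTF-8?"""
    # 1. Skip at most 3 leading continuation bytes (10xxxxxx).
    start = 0
    while start < min(3, len(data)) and data[start] & 0xC0 == 0x80:
        start += 1
    # 2. Decode the rest; final=False buffers an incomplete trailing
    #    sequence instead of raising.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(data[start:], final=False)
    except UnicodeDecodeError:
        return False
    pending, _ = decoder.getstate()
    if not pending:
        return True
    # 3. Accept the buffered bytes only if some completion is valid UTF-8.
    for missing in range(1, 4):
        for first in range(0x80, 0xC0):
            candidate = pending + bytes([first]) + b"\x80" * (missing - 1)
            try:
                candidate.decode("utf-8")
                return True
            except UnicodeDecodeError:
                pass
    return False
```

Step 3 tries at most 192 short candidates. That is cheap compared with listing the legal second-byte ranges, but a hand-written state machine would avoid the repeated decoding.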
8 votes
1 answer
203 views

myUTF-8 small lib (validate UTF-8, guess language, count chars)

I'm new to the C language and had never gotten into the details of UTF-8. After reading some articles about it, I wanted to try playing with UTF-8 in C, both for fun and for practicing ...
5 votes
1 answer
13k views

Convert to UTF-8 all files in a directory

1. Summary: I can't find how to refactor multiple with open statements for one file. 2. Expected behavior of program: the program detects the encoding of each file in the ...
9 votes
5 answers
8k views

Convert UTF8 string to UTF32 string in C

I'm doing some recreational programming in C (after spending some time in C++, but professionally using only PHP/JavaScript). I wrote a UTF8 to UTF32 converter and just wanted to know if I made some ...
S22h
  • 193
