25,276 questions
Best practices
0
votes
1
replies
69
views
What is the best practice for comparing byte arrays correctly?
Actually, I want to detect the correct encoding of a text, so I wrote this code:
I have to compare the byte array against each encoding, find the right match, and then know which encoding it is.
But there are many and ...
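The excerpt is cut off, so the asker's actual code is unknown. As a minimal sketch, assuming the goal is to recognise an encoding by comparing the leading bytes against known byte order marks (BOMs), the comparison could look like this in Python. The helper name `detect_bom` is hypothetical, and text without a BOM cannot be identified this way.

```python
import codecs

# (BOM bytes, encoding name). Longer BOMs come first, because the UTF-32 LE
# BOM begins with the same two bytes as the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None (hypothetical helper)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None
```

Ordering the table by BOM length is the main design choice here: a naive two-byte comparison would report UTF-16 LE for data that is really UTF-32 LE.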
-2
votes
1
answer
124
views
How to stop Directory.EnumerateFiles() from silently adding U+FFFD for non-utf8 filenames? [closed]
On Linux, I created a filename with special bytes: touch $'/tmp/bad/\xff\x1f.jpg'
C# prints out the filename as: /tmp/bad/�.jpg
I would rather C# throw an exception, instead of silently printing junk ...
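The difference between a lossy and a strict decode can be illustrated outside .NET. The sketch below uses Python purely as an analogy; it says nothing about how `Directory.EnumerateFiles()` is implemented. It decodes the same bytes as the `touch` command above in three ways.

```python
raw = b"/tmp/bad/\xff\x1f.jpg"  # the filename bytes from the touch command

# errors="replace" substitutes U+FFFD for each undecodable byte (here: 0xFF).
lossy = raw.decode("utf-8", errors="replace")

# The default, errors="strict", raises instead of substituting.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="surrogateescape" keeps the bad byte recoverable, so the original
# bytes can be reproduced exactly.
roundtrip = raw.decode("utf-8", errors="surrogateescape").encode(
    "utf-8", errors="surrogateescape"
)
```

Note that only 0xFF is invalid UTF-8; 0x1F is an ordinary control character and decodes without complaint.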
2
votes
2
answers
86
views
Possible scenarios where c16rtomb produces multibyte sequence with a size greater than 2
According to cppreference:
size_t c16rtomb( char* restrict s, char16_t c16, mbstate_t* restrict ps );
Converts a single code point from its variable-length 16-bit wide character representation (...
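Assuming the multibyte encoding of the locale is UTF-8 (an assumption; `c16rtomb` follows the current C locale), the byte counts in question can be checked without any C code. This Python sketch compares the number of UTF-16 code units with the number of UTF-8 bytes for a few characters; the helper name `sizes` is hypothetical.

```python
def sizes(ch: str):
    """Return (UTF-16 code units, UTF-8 bytes) for a single character."""
    utf16_units = len(ch.encode("utf-16-le")) // 2
    utf8_bytes = len(ch.encode("utf-8"))
    return utf16_units, utf8_bytes


samples = {
    "A": sizes("A"),                    # U+0041
    "\u00e9": sizes("\u00e9"),          # U+00E9, e with acute
    "\u20ac": sizes("\u20ac"),          # U+20AC, euro sign
    "\U0001F625": sizes("\U0001F625"),  # U+1F625, needs a surrogate pair
}
```

Under that assumption, any BMP character from U+0800 upward is one 16-bit unit but three UTF-8 bytes, and a surrogate pair yields four bytes once its second unit has been supplied.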
0
votes
1
answer
81
views
Change MicroPython Unicode notation from \xb0 to \u00b0 in string
I am using MicroPython on the ESP32.
I have the following string, which includes the Unicode character \xb0.
a = 'abc\xb0def'
First, I will need to change the notation to the \u00XX form, second I ...
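One way to produce the `\u00XX` notation is to walk the string and format each non-ASCII code point by hand. This is a minimal sketch written against standard Python; it has not been verified on MicroPython, and the function name `to_u_escapes` is hypothetical.

```python
def to_u_escapes(s: str) -> str:
    """Rewrite non-ASCII characters as \\uXXXX (or \\UXXXXXXXX) escape text."""
    out = []
    for ch in s:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                  # plain ASCII stays as-is
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)      # four hex digits for the BMP
        else:
            out.append("\\U%08x" % cp)      # eight hex digits above the BMP
    return "".join(out)


a = 'abc\xb0def'
escaped = to_u_escapes(a)
```

The result is a string of literal backslash-u text, not a differently stored character: `'\xb0'` and `'\u00b0'` in source code already denote the same code point.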
Best practices
0
votes
4
replies
132
views
How to search through Unicode text with ASCII input in Python
I have a corpus of text which includes some accented words, such as épée, and I would like people to be able to easily search through it using ASCII input. Ideally, they would simply type protege or ...
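A common approach to this kind of search is to fold both the corpus and the query to an accent-free, case-folded key before comparing. The sketch below uses `unicodedata` from the standard library; the helper names `fold` and `search` are hypothetical, and letters that do not decompose (for example ø) are left unchanged.

```python
import unicodedata


def fold(text: str) -> str:
    """Strip combining marks after NFKD decomposition, then case-fold."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


def search(corpus, query):
    """Return corpus entries whose folded form contains the folded query."""
    q = fold(query)
    return [word for word in corpus if q in fold(word)]


hits = search(["épée", "protégé", "sword"], "protege")
```

Folding the query as well means users can type either `protege` or `protégé` and get the same matches.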
0
votes
0
answers
145
views
C++ Unicode Problems/Questions
I wrote two versions of a little program in C++ with MSVC on Windows 11:
First one:
#include <iostream>
#include <Windows.h>
int main()
{
SetConsoleOutputCP(CP_UTF8);
std::cout ...
1
vote
2
answers
113
views
Excel adding Unicode symbol from R CSV output
Excel is adding a Unicode character to a summary file I save from R as a .csv, where it adds "¬" in front of "±". Is there a way to edit the R script to prevent this?
cola <- c(...
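The symptom is consistent with a UTF-8 file being read in a legacy single-byte encoding. The excerpt does not say which platform or Excel version is involved, so the following is only a hypothesis check: it shows that the two UTF-8 bytes of "±" read as Mac OS Roman give exactly "¬±", and that a UTF-8 BOM, which Excel commonly uses as an encoding hint, adds three marker bytes at the start.

```python
utf8_bytes = "±".encode("utf-8")            # two bytes: 0xC2 0xB1

# Misreading those bytes in a single-byte legacy encoding yields two characters.
as_mac_roman = utf8_bytes.decode("mac_roman")
as_cp1252 = utf8_bytes.decode("cp1252")

# "utf-8-sig" prepends the UTF-8 byte order mark EF BB BF.
with_bom = "±".encode("utf-8-sig")
```

If the stray character were "Â" instead of "¬", the same reasoning would point at Windows-1252 rather than Mac OS Roman.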
Advice
1
vote
0
replies
56
views
Does CLDR specify language-specific versions of "Label: Text"?
In German and English, we use a colon followed by a space to separate a term from its explanation, or a label from the text that follows. But other languages do it differently. Rather than have a ...
0
votes
2
answers
95
views
Is my understanding of Unicode's definition of Code Unit correct?
The Unicode Standard states:
Code unit: The minimal bit combination that can represent a unit of encoded text for processing or interchange.
The Unicode Standard uses 8-bit code units in the UTF-8 ...
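The definition can be made concrete by counting code units per encoding form for one code point. This is a small illustrative sketch, not taken from the question; the helper name `code_units` is hypothetical.

```python
def code_units(ch: str) -> dict:
    """Count code units for one character in each Unicode encoding form."""
    return {
        "utf-8": len(ch.encode("utf-8")),            # 8-bit code units
        "utf-16": len(ch.encode("utf-16-le")) // 2,  # 16-bit code units
        "utf-32": len(ch.encode("utf-32-le")) // 4,  # 32-bit code units
    }


ascii_letter = code_units("A")
emoji = code_units("\U0001F625")
```

The explicit little-endian codecs are used so that no BOM is included in the byte count.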
21
votes
4
answers
2k
views
Why doesn't printing a Unicode character with wprintf work?
I made a small C program that should print an emoji:
#include <stdio.h>
#include <windows.h>
int main(void) {
SetConsoleOutputCP(CP_UTF8);
printf("\U0001F625\n"); // 😥
...
-1
votes
1
answer
105
views
What is an explanation of how a TTF font "glyph" can be one or more Unicode code points? [closed]
I am tinkering with fonteditor-core NPM package, to figure out how to do font subsetting (get a subset of glyphs from a font).
I noticed the data model is basically:
export namespace TTF {
type ...
0
votes
1
answer
76
views
Arabic text gets corrupted to "E1-(' ('D9'DE" when creating PDF annotations with @iwater/annotpdf library
I'm using the @iwater/annotpdf library to create PDF annotations in an Angular application. When I try to create free text annotations with Arabic text, the text gets corrupted to unreadable ...
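The corrupted string in the title matches what happens when each Arabic code point is truncated to its low byte. This sketch is a guess at the mechanism, not a statement about the library's internals; the input text (the Arabic for "Hello world") is an assumption reverse-engineered from the garbled output, and `low_bytes` is a hypothetical helper.

```python
def low_bytes(text: str) -> str:
    """Keep only the low 8 bits of every code point (simulates the corruption)."""
    return "".join(chr(ord(c) & 0xFF) for c in text)


# Assumed original text, written with escapes to avoid right-to-left
# rendering problems in the source.
arabic = "\u0645\u0631\u062d\u0628\u0627 \u0628\u0627\u0644\u0639\u0627\u0644\u0645"
garbled = low_bytes(arabic)
```

If this is the cause, the text is being written as one byte per character somewhere, which would suggest looking for a path that encodes the annotation contents as 16-bit or otherwise Unicode-capable strings.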
4
votes
1
answer
224
views
Why does MSVC's std::print corrupt long Unicode strings when printing with UTF-8?
If I compile the following code in Visual Studio with the /utf-8 flag enabled:
#include <print>
int main() {
std::println("{}", "▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊...
4
votes
2
answers
182
views
How does PCRE \p{N} define a "numeric character"
In PCRE-regular expressions, \p{N} is supposed to match "Any kind of numeric character from any script". According to descriptions on RegexInfo and also on sidebar-explanations on regex101....
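`\p{N}` corresponds to the Unicode General_Category values that begin with N: Nd (decimal digit), Nl (letter number) and No (other number). Python's built-in `re` module does not support `\p{...}`, so this sketch checks the category directly with `unicodedata`; results depend on the Unicode version bundled with the interpreter, which may differ from a given PCRE build. The helper name `is_number_category` is hypothetical.

```python
import unicodedata


def is_number_category(ch: str) -> bool:
    """True if the character's General_Category is Nd, Nl or No."""
    return unicodedata.category(ch).startswith("N")


categories = {
    "5": unicodedata.category("5"),            # ASCII digit
    "\u0663": unicodedata.category("\u0663"),  # Arabic-Indic digit three
    "\u2167": unicodedata.category("\u2167"),  # Roman numeral eight
    "\u00bd": unicodedata.category("\u00bd"),  # vulgar fraction one half
    "\u4e00": unicodedata.category("\u4e00"),  # CJK ideograph for "one"
}
```

The last sample shows why the property can surprise: a character may denote a number in its script and still be classified as a letter.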
2
votes
1
answer
119
views
Display game variation tree using compact vertical layout
I want to create a multi-line string that can be printed out or displayed via curses. This string displays the moves and variations of a go/weiqi/baduk game from an SGF file using Unicode characters. ...