How to convert Unicode char to "Unicode HEX Position" in Arduino

Question 1

How can I convert a Unicode character to its "Unicode HEX Position" (its code point written in hex) in Arduino or C?

I will share a picture here:

For example, in JavaScript you can do this with charCodeAt(): the function returns the character code, which you can then convert to hex.

For example, in JavaScript I can do it like this to get the exact table value:

 var inpString = 'س';
 var myChar = 0;
 var output = 0;
 myChar = inpString.charCodeAt(0);
 output = ToHex((myChar & 0xff00) >> 8) + ToHex(myChar & 0xff);

 function ToHex(i)
 {
   var sHex = "0123456789ABCDEF";
   var Out = "";
   Out = sHex.charAt(i & 0xf);
   i >>= 4;
   Out = sHex.charAt(i & 0xf) + Out;
   return Out;
 }
 alert(output);

So how can I do that in Arduino? It is used to send Unicode characters in PDU mode. I just need to convert a Unicode character such as 'س' to the correct Unicode HEX Position, as in the picture I shared above.

For example, 'س' is 0633, 'A' is 0041, and 'ب' is 0628.

Question 2

Or simply console.log("س".charCodeAt(0).toString(16)).

Question 3

please do not post at multiple locations ... stackoverflow.com/questions/62878287/…

Question 4

Unlike JavaScript, C++ makes no distinction between a character and its code point. Thus, 'A', 0x41 and 65 are just different ways of writing the same number.

Note, however, that the char type is intended to hold ASCII only. For everything else, you may try using wide characters. For example, the program

void setup() {
 Serial.begin(9600);
 wchar_t c = L'س';
 Serial.println(c, 16);
}
void loop() {}

outputs 633 on the serial port. Note the second argument to Serial.println(), which specifies base 16; the default is to print numbers in decimal. Leading zeros are not printed, which is why the output is 633 rather than 0633.

Beware that the representation of wide characters is implementation-defined, and avr-libc doesn't provide support for manipulating them or strings made with them. If you want to transmit them, you will also have to decide for yourself how to break them down into a sequence of bytes, as that's the only thing a serial port (or I2C, or SPI for that matter) can transmit. UTF-8 is the most popular choice. I doubt wide characters are popular in embedded systems at all.
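To illustrate the padding and the "break them down into bytes" step, here is a minimal sketch in plain C++ that should also compile inside an Arduino sketch. The helper names codePointToHex4 and codePointToBytesBE are made up for this example, and the big-endian byte order is an assumption chosen to match the 0633 layout asked for in the question. It only handles code points up to U+FFFF.

```cpp
#include <stdint.h>

// Hypothetical helper: writes the zero-padded, uppercase, 4-digit hex form
// of a 16-bit code point into out, which must hold at least 5 chars.
void codePointToHex4(uint16_t cp, char out[5]) {
  static const char digits[] = "0123456789ABCDEF";
  for (int i = 3; i >= 0; --i) {
    out[i] = digits[cp & 0xF];  // lowest nibble goes to the rightmost free slot
    cp >>= 4;
  }
  out[4] = '\0';
}

// Hypothetical helper: splits a 16-bit code point into two bytes,
// high byte first (big endian, an assumption of this example).
void codePointToBytesBE(uint16_t cp, uint8_t out[2]) {
  out[0] = (uint8_t)(cp >> 8);
  out[1] = (uint8_t)(cp & 0xFF);
}
```

In a sketch, one could fill a char hex[5] buffer with codePointToHex4(c, hex) and pass it to Serial.println(hex), where c is the wide character from the program above.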

Question 5

So the difference from the sketch in my answer is the encoding of the source code versus the encoding of the Serial Monitor.

Question 6

@Juraj: The encoding of the source code is irrelevant to my answer as long as the dev environment is consistent (same encoding used by the editor and assumed by the compiler): the compiler initializes c with the code point of the character the editor shows between the quotes. It is basically equivalent to writing wchar_t c = 0x633, but in a way that hopefully makes more sense to the programmer. As soon as the program does I/O with non-ASCII characters, it will have to make a decision about the character encoding it is going to use.

Question 7

This will read and print Unicode characters from/to Serial Monitor and print their HEX codes. Please set the line ending in Serial Monitor to NL and confirm the entered character with Enter.

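void setup() {
  Serial.begin(115200);
}

void loop() {
  if (Serial.available()) {
    char buff[4];
    // Read up to 3 bytes, stopping at the newline sent by Serial Monitor.
    int l = Serial.readBytesUntil('\n', buff, sizeof(buff) - 1);
    if (l > 0) {
      buff[l] = 0;  // terminate so the bytes can be echoed as a string
      Serial.println(buff);
      // Cast to byte so values >= 0x80 are not sign-extended where char is signed.
      Serial.print((byte)buff[0], HEX);
      if (l > 1) {
        Serial.print((byte)buff[1], HEX);
      }
      Serial.println();
    }
  }
}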

Question 8

Thank you for the answer. This code prints D8B3 for 'س', not 0633, but it works correctly for ASCII characters.

Question 9

1. If buff[0] is less than 16, you would have to zero pad. 2. This program assumes the serial monitor sends characters as UCS-2BE, which is not the case. Like almost everything nowadays, it uses UTF-8 for input and output.

Question 10

@EdgarBonet, my sketch doesn't assume anything. It prints the hex values, and I know they are UTF-8. Visible characters have codes > 0x10.

Question 11

The question is about printing the code point (not the code units!) of a character in hex. Your sketch prints the hex values of pairs of bytes, concatenated. The way they are concatenated embeds the implicit assumption that these bytes represent 16-bit numbers transmitted in big endian order. Considering these 16-bit numbers as equivalent to code points is only valid if the characters are transmitted as UCS-2.
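To make the code unit versus code point distinction concrete, here is a minimal sketch of a UTF-8 decoder that turns the bytes D8 B3 into the code point 0x0633. The names decodeUtf8 and kInvalidCodePoint are invented for this example, and it assumes the input really is UTF-8. It does not reject overlong encodings or surrogates, so treat it as an illustration rather than a validating decoder.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical sentinel returned for malformed or truncated input.
const uint32_t kInvalidCodePoint = 0xFFFFFFFFUL;

// Hypothetical helper: decodes the first UTF-8 sequence in buf (len bytes)
// and returns its code point.
uint32_t decodeUtf8(const uint8_t *buf, size_t len) {
  if (len == 0) return kInvalidCodePoint;
  uint8_t b0 = buf[0];
  uint32_t cp;
  size_t extra;  // number of continuation bytes that must follow
  if (b0 < 0x80) {
    cp = b0;        extra = 0;  // 0xxxxxxx: plain ASCII
  } else if ((b0 & 0xE0) == 0xC0) {
    cp = b0 & 0x1F; extra = 1;  // 110xxxxx: 2-byte sequence
  } else if ((b0 & 0xF0) == 0xE0) {
    cp = b0 & 0x0F; extra = 2;  // 1110xxxx: 3-byte sequence
  } else if ((b0 & 0xF8) == 0xF0) {
    cp = b0 & 0x07; extra = 3;  // 11110xxx: 4-byte sequence
  } else {
    return kInvalidCodePoint;   // stray continuation byte or invalid lead byte
  }
  if (len < extra + 1) return kInvalidCodePoint;
  for (size_t i = 1; i <= extra; ++i) {
    if ((buf[i] & 0xC0) != 0x80) return kInvalidCodePoint;  // must be 10xxxxxx
    cp = (cp << 6) | (buf[i] & 0x3F);  // append 6 payload bits
  }
  return cp;
}
```

For 'س' the Serial Monitor sends D8 B3: the lead byte contributes 0x18, the continuation byte contributes 0x33, and (0x18 << 6) | 0x33 is 0x633.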

Question 12

A reliable way to output Unicode characters is to use the octal escapes of their UTF-8 bytes in the string you are printing, e.g.

Serial.print("\342\204\211");

will output ℉ (U+2109, DEGREE FAHRENHEIT), provided the receiver has a font for that character.

The page "Using Non-ASCII chars in Arduino" has a .jar file that converts between Unicode characters, \u... strings and octal.

Hex (\x..) is not used because a hex escape has no fixed length: the compiler keeps consuming hex digits, so if the character after the two hex digits is '0' to '9', 'a' to 'f' or 'A' to 'F', it becomes part of the escape. An octal escape stops after at most three digits, which avoids this problem. The GCC compiler used by Arduino also does not accept all Unicode sequences, such as \u0020.
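As a small illustration of the octal versus hex point, the following declarations show the same three UTF-8 bytes written both ways. The variable names are invented for this example. The hex form is safe only when the literal is split before any following hex digit, which relies on adjacent string literals being concatenated after escapes are processed.

```cpp
// Three octal escapes: bytes E2 84 89, the UTF-8 encoding of U+2109.
const char degF_octal[] = "\342\204\211";

// The same bytes as hex escapes. This is fine here because each escape
// is followed by a backslash or the end of the literal, not a hex digit.
const char degF_hex[] = "\xE2\x84\x89";

// An octal escape ends after three digits, so the trailing '1' stays a
// separate character: 4 bytes plus the terminating NUL.
const char octal_then_digit[] = "\342\204\2111";

// With hex, writing "\x89F" would absorb the 'F' into the escape.
// Splitting the literal keeps 'F' as an ordinary character.
const char hex_then_letter[] = "\xE2\x84\x89" "F";
```

Passing any of these arrays to Serial.print() sends the raw bytes, so the receiver still has to interpret them as UTF-8.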

score 1 · Accepted Answer · 2020-07-13 20:03:37Z

