How to convert Unicode char to "Unicode HEX Position" in Arduino or C
I will share a picture here:
For example, in JavaScript you can do that with charCodeAt().
This function returns exactly the character code, which you can then convert to hex.
For example, in JavaScript I can get the exact table value like this:
var inpString = 'س';
var myChar=0;
var output = 0;
myChar = inpString.charCodeAt(0);
output = (ToHex((myChar&0xff00)>>8 )) + (ToHex( myChar&0xff ));
function ToHex(i)
{
    var sHex = "0123456789ABCDEF";
    var Out = "";
    Out = sHex.charAt(i&0xf);
    i>>=4;
    Out = sHex.charAt(i&0xf) + Out;
    return Out;
}
alert(output);
So how can I do that in Arduino? It is used to send Unicode characters in PDU mode from an Arduino. I just need to convert a Unicode character such as 'س' to the correct Unicode HEX position shown in the picture above.
For example, 'س' is 0633, 'A' is 0041, and 'ب' is 0628.
3 Answers
Unlike JavaScript, C++ makes no difference between a character and its code point. Thus, 'A', 0x41 and 65 are just different ways of writing the same number.
Note, however, that the char type is intended to hold ASCII only. For everything else, you may try using wide characters. For example, the program
void setup() {
    Serial.begin(9600);
    wchar_t c = L'س';
    Serial.println(c, 16);
}
void loop() {}
outputs 633 on the serial port. Note the second argument to Serial.println(), which specifies base 16. The default is to print numbers in decimal.
Beware that the representation of wide characters is implementation defined, and the avr-libc doesn't provide support for manipulating them or strings made with them. If you want to transmit them, you will also have to decide for yourself how to break them down into a sequence of bytes, as that's the only thing a serial port (or I2C, or SPI for that matter) can transmit. UTF-8 is the most popular choice. I doubt wide characters are popular in embedded systems at all.
-
So the difference to the sketch in my answer is the encoding of the source code versus the encoding of Serial Monitor. Commented Jul 14, 2020 at 5:22
-
@Juraj: The encoding of the source code is irrelevant to my answer as long as the dev environment is consistent (same encoding used by the editor and assumed by the compiler): the compiler initializes c with the code point of the character the editor shows between the quotes. It is basically equivalent to writing wchar_t c = 0x633, but in a way that hopefully makes more sense to the programmer. As soon as the program does I/O with non-ASCII characters, it will have to make a decision about the character encoding it is going to use. – Edgar Bonet, commented Jul 14, 2020 at 8:51
This will read and print Unicode characters from/to Serial Monitor and print their HEX codes. Please set the line ending in Serial Monitor to NL and confirm the entered character with Enter.
void setup() {
    Serial.begin(115200);
}
void loop() {
    if (Serial.available()) {
        char buff[4];
        int l = Serial.readBytesUntil('\n', buff, sizeof(buff) - 1);
        if (l > 0) {
            buff[l] = 0;
            Serial.println(buff);
            Serial.print(buff[0], HEX);
            if (l > 1) {
                Serial.print(buff[1], HEX);
            }
            Serial.println();
        }
    }
}
-
Thank you for the answer. This code prints D8B3 for 'س', not 0633! But it works correctly for ASCII characters. – ermya, commented Jul 13, 2020 at 18:58
-
1. If buff[0] is less than 16, you would have to zero pad. 2. This program assumes the serial monitor sends characters as UCS-2BE, which is not the case. Like almost everything nowadays, it uses UTF-8 for input and output. – Edgar Bonet, commented Jul 13, 2020 at 21:02
-
@EdgarBonet, my sketch doesn't assume anything. It prints the hex values and I know it is UTF-8. And visible characters have codes > 0x10. Commented Jul 14, 2020 at 4:43
-
The question is about printing the code point (not the code units!) of a character in hex. Your sketch prints the hex values of pairs of bytes, concatenated. The way they are concatenated embeds the implicit assumption that these bytes represent 16-bit numbers transmitted in big endian order. Considering these 16-bit numbers as equivalent to code points is only valid if the characters are transmitted as UCS-2. – Edgar Bonet, commented Jul 14, 2020 at 8:33
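The gap between the bytes D8 B3 and the code point 0633 discussed in these comments is one UTF-8 decoding step. A minimal sketch of that step, assuming the input bytes are UTF-8 as the comments say; `utf8Decode` is a hypothetical helper name, and for brevity it does not reject overlong encodings or surrogate code points:

```cpp
#include <stddef.h>
#include <stdint.h>

// Decode the first UTF-8 sequence in `s` (at most `len` bytes) into a
// code point. Returns the number of bytes consumed, or 0 if the input is
// empty, truncated, or starts with an invalid byte.
// `utf8Decode` is a hypothetical helper, not an Arduino API.
size_t utf8Decode(const uint8_t *s, size_t len, uint32_t *codePoint) {
    if (len == 0) return 0;
    uint8_t b0 = s[0];
    size_t n;      // total length of the sequence in bytes
    uint32_t cp;   // payload bits collected so far
    if (b0 < 0x80)                { n = 1; cp = b0; }         // 0xxxxxxx
    else if ((b0 & 0xE0) == 0xC0) { n = 2; cp = b0 & 0x1F; }  // 110xxxxx
    else if ((b0 & 0xF0) == 0xE0) { n = 3; cp = b0 & 0x0F; }  // 1110xxxx
    else if ((b0 & 0xF8) == 0xF0) { n = 4; cp = b0 & 0x07; }  // 11110xxx
    else return 0;  // stray continuation byte or invalid lead byte
    if (len < n) return 0;  // sequence is cut short
    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;  // expected 10xxxxxx
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }
    *codePoint = cp;
    return n;
}
```

Feeding it the two bytes 0xD8 0xB3 yields 0x633: the lead byte contributes 0x18, the continuation byte contributes 0x33, and (0x18 << 6) | 0x33 = 0x633.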
A reliable way to output Unicode chars is to write their UTF-8 bytes as octal escapes in the string you are printing, e.g.
Serial.print("\342\204\211");
will output ℉ (U+2109) provided the receiver has a font for that Unicode character.
"Using Non-ASCII chars in Arduino" has a .jar file that converts between Unicode chars, \u... strings and octal.
Hex \x.. is not used because a hex escape has no fixed length: if the character after the two hex digits is itself a hex digit ('0' to '9' or 'a' to 'f', in either case), the compiler treats it as part of the escape. Using octal avoids this problem, because an octal escape ends after at most three digits. The GCC compiler used by Arduino also does not accept all Unicode sequences, such as \u0020.
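The difference between the two escape styles can be checked directly. The sketch below is only an illustration with hypothetical variable names. It relies on the standard rule that adjacent string literals are concatenated after escapes have been processed, which is one way to end a hex escape early:

```cpp
#include <string.h>

// Octal escapes end after at most three digits, so the three UTF-8 bytes
// of U+2109 (E2 84 89) can be written back to back.
const char degFOctal[] = "\342\204\211";

// The same bytes with hex escapes. Each escape sits in its own literal so
// that a following hex digit can never be absorbed into it.
const char degFHex[] = "\xE2" "\x84" "\x89";

// Writing "\x41BC" would be read as the single escape \x41BC, which does
// not fit in a char. Splitting the literal gives the intended "ABC".
const char abc[] = "\x41" "BC";
```

Both degFOctal and degFHex hold the same three bytes plus the terminating NUL, so either could be passed to Serial.print().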
console.log("س".charCodeAt(0).toString(16))