When reading and writing over a Python serial port connection to an Arduino, the results are not as expected unless I use latin-1 ('ISO-8859-1'). For example, if I have
int outP = 5;
//...
int outV = Serial.read();
analogWrite(outP, outV);
and in Python I have
serial_port.write(chr(255).encode())
I read 3.78 V from the pin, whereas if I use
serial_port.write(chr(255).encode(encoding = 'latin-1'))
I get 5.04 V. I have read that latin-1 and UTF-8 don't always match, but is there something about the Arduino that requires latin-1? Of course, the two encodings produce different bytes: encoding 255 ('ÿ') gives b'\xc3\xbf' with UTF-8 and b'\xff' with latin-1. But why does the Arduino work with latin-1?
FYI, other options that also work are:
v = 255
serial_port.write(v.to_bytes(1, byteorder = 'big'))
serial_port.write(bytes([v]))
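A quick check in a Python 3 REPL (no serial port needed) shows that the working options all produce the same one-byte payload, while the default str.encode(), which uses UTF-8, produces two bytes:

```python
v = 255

# Options that yield a single byte with value 255
latin1 = chr(v).encode('latin-1')
to_bytes = v.to_bytes(1, byteorder='big')
from_list = bytes([v])

# The default str.encode() uses UTF-8
utf8 = chr(v).encode()

print(latin1, to_bytes, from_list)  # b'\xff' b'\xff' b'\xff'
print(utf8, len(utf8))              # b'\xc3\xbf' 2
```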
but is there something about arduino that requires using latin-1?
No, not really.
What it comes down to is that Serial.read() reads bytes, one per call, irrespective of whatever encoding they may be part of. ISO-8859-1 only encodes character code points in the 0-255 range, and it encodes each one as a single byte with the same value, so when you choose a 0-255 code point and send it as ISO-8859-1, it gets sent as 1 byte, which is how your code has been written to receive it.
A 255 code point encoded as UTF-8 requires two bytes (in UTF-8, every code point of 128 or above takes more than one byte), so a single value results in two Serial.read() calls.
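The difference can be made concrete with a short Python 3 sketch that counts how many bytes each code point in the 0-255 range takes under each encoding. Only code points 0-127 (ASCII) fit in one UTF-8 byte, so values up to 127 would arrive intact with either encoding:

```python
# Encoded length for every code point a single byte could carry (0-255)
utf8_lengths = {len(chr(cp).encode('utf-8')) for cp in range(256)}
latin1_lengths = {len(chr(cp).encode('latin-1')) for cp in range(256)}

# Code points that UTF-8 cannot send as a single byte
multi_byte = [cp for cp in range(256) if len(chr(cp).encode('utf-8')) > 1]

print(latin1_lengths)                 # {1}
print(utf8_lengths)                   # {1, 2}
print(multi_byte[0], multi_byte[-1])  # 128 255

# latin-1 maps each code point to the byte with the same value
assert all(chr(cp).encode('latin-1') == bytes([cp]) for cp in range(256))
```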
If you open a Python 3 REPL and evaluate chr(255).encode(), or more explicitly chr(255).encode('utf-8'), you will see that it results in b'\xc3\xbf'. So on the Arduino side you will see this as two separate Serial.read() results, 0xC3 and 0xBF.
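This also plausibly accounts for the 3.78 V reading in the question. The sketch below is a rough Python model, not Arduino code, and rests on two assumptions: the sketch calls Serial.read() and analogWrite() once per received byte, and the meter shows the average PWM voltage, roughly (value / 255) times the 5.04 V measured at full duty. Under those assumptions 0xBF (191) is the last value written, and 191/255 of 5.04 V is about 3.78 V:

```python
# Hypothetical model: one Serial.read() + analogWrite() per received byte.
FULL_SCALE_V = 5.04  # reading from the question with a value of 255


def simulate(payload):
    """Return the sequence of values analogWrite() would receive."""
    return list(payload)  # Serial.read() yields one byte (0-255) per call


def approx_voltage(value):
    """Assumed average PWM voltage: duty cycle (value/255) times full scale."""
    return value / 255 * FULL_SCALE_V


for label, payload in [('utf-8', chr(255).encode('utf-8')),
                       ('latin-1', chr(255).encode('latin-1'))]:
    writes = simulate(payload)
    # The last analogWrite() determines the steady-state output
    print(label, [hex(w) for w in writes], round(approx_voltage(writes[-1]), 2))
# utf-8 ['0xc3', '0xbf'] 3.78
# latin-1 ['0xff'] 5.04
```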
When you're just taking in UTF-8 strings and splatting them back out via Serial.println(), the Arduino is blissfully unaware that they're UTF-8 and not ISO-8859-1. Really, you could use any encoding you wanted, the caveat being that if you're using C strings to store them, the encoding would need to be one that never puts a null byte in the middle of the string, since a null byte marks the end of a C string.
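To illustrate the C-string caveat, here is a small Python 3 check. UTF-8 only produces a 0x00 byte for the NUL character itself, so UTF-8 text is safe in a C string; UTF-16 is an example of an encoding that is not, because plain ASCII characters gain a zero byte:

```python
text = "Hi"

utf8 = text.encode('utf-8')
utf16 = text.encode('utf-16-le')  # little-endian, no byte-order mark

print(utf8)   # b'Hi'
print(utf16)  # b'H\x00i\x00'

# A C string ends at the first 0x00, so a receiver would only see "H"
print(utf16.split(b'\x00')[0])  # b'H'

# Spot-check: non-NUL code points up to U+07FF encode with no 0x00 byte
assert all(0 not in chr(cp).encode('utf-8') for cp in range(1, 0x800))
```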
You may want to look into using struct.pack rather than cobbling together Python bytestrings from sequences of chr() and .encode(whatever). struct.pack also produces a bytestring, but it is purpose-built for doing this.
You can import struct, and then struct.pack('B', 255) will result in b'\xff', the same as chr(255).encode('iso8859-1'), but it at least expresses intent. 'B' here signifies the unsigned char range (0-255); you'll find the other format specifiers in the documentation. You'll get a greater benefit once you start sending multiple fields, where struct.pack will be a lot less unwieldy than chr, encode, and string concatenation.
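As a sketch of the multi-field case, the hypothetical message below packs a pin number and a 16-bit value into one bytestring. The layout (one unsigned byte, then a little-endian unsigned 16-bit integer) and the names MESSAGE_FORMAT and make_message are assumptions for illustration; the Arduino side would have to read the same three bytes in the same order. The '<' prefix selects little-endian byte order with standard sizes and no padding, and struct.pack raises struct.error when a value does not fit its field:

```python
import struct

# Hypothetical layout: pin (unsigned char), value (unsigned short, little-endian)
MESSAGE_FORMAT = '<BH'


def make_message(pin, value):
    """Pack one command into bytes; raises struct.error if a field is out of range."""
    return struct.pack(MESSAGE_FORMAT, pin, value)


print(struct.pack('B', 255))            # b'\xff'
print(make_message(5, 1000))            # b'\x05\xe8\x03'
print(struct.calcsize(MESSAGE_FORMAT))  # 3

try:
    struct.pack('B', 256)  # does not fit in an unsigned char
except struct.error as exc:
    print('rejected:', exc)
```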
Yes, len(chr(255).encode(encoding = 'latin-1')) returns 1, while len(chr(255).encode()) returns 2. (anon, Mar 9, 2021 at 15:48)
OK, thanks. What format should be used with struct.pack? (anon, Mar 9, 2021 at 15:53)
I have migrated my previous comment into the answer and addressed your question there. (timemage, Mar 9, 2021 at 16:01)