
I'm having an issue converting a particular Uint8Array to a string and back. I'm working in the browser, in Chrome, which natively supports the TextEncoder/TextDecoder APIs.

If I start with a simple case, everything seems to work well:

const uintArray = new TextEncoder().encode('silly face demons');
// Uint8Array(17) [115, 105, 108, 108, 121, 32, 102, 97, 99, 101, 32, 100, 101, 109, 111, 110, 115]
new TextDecoder().decode(uintArray);
// silly face demons

But the following case is not giving me the results I expect. Without getting into too much of the details (it's cryptography related), let's start with the fact that I'm provided with the following Uint8Array:

Uint8Array(24) [58, 226, 7, 102, 202, 238, 58, 234, 217, 17, 189, 208, 46, 34, 254, 4, 76, 249, 169, 101, 112, 102, 140, 208]

and what I want to do is to convert that to a string and then later decrypt the string back to the original array, but I get this:

const uintArray = new Uint8Array([58, 226, 7, 102, 202, 238, 58, 234, 217, 17, 189, 208, 46, 34, 254, 4, 76, 249, 169, 101, 112, 102, 140, 208]);
new TextDecoder().decode(uintArray);
// :�f��:����."�L��epf��
new TextEncoder().encode(':�f��:����."�L��epf��');

...which results in: Uint8Array(48) [58, 239, 191, 189, 7, 102, 239, 191, 189, 239, 191, 189, 58, 239, 191, 189, 239, 191, 189, 17, 239, 191, 189, 239, 191, 189, 46, 34, 239, 191, 189, 4, 76, 239, 191, 189, 239, 191, 189, 101, 112, 102, 239, 191, 189, 239, 191, 189]

The array has doubled. Encoding is a bit out of my wheelhouse. Can anyone tell me why the array has doubled (I'm assuming it's an alternate representation of the original array...?). Also, and more importantly, is there a way I could get back to the original array (i.e. undouble the one I'm getting)?

asked Jul 17, 2018 at 0:32
  • It is simple: not all byte values correspond to string characters, neither in ASCII nor in Unicode. Also, the question mixes up encrypt/decrypt with encode/decode; they are not the same thing. Commented Jul 17, 2018 at 1:01
  • If you only want to convert it to a string and back and get the corresponding values, you can do: var str = String.fromCharCode(...uintArray) and then Uint8Array.from([...str].map(ch => ch.charCodeAt())) Commented Jul 17, 2018 at 1:30
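The idea in the comment above can be sketched as a pair of helpers that map each byte to the character with the same code (0 to 255) and back. The function names here are hypothetical, and the loop replaces the spread call on the assumption that the input could be large enough for spreading into function arguments to fail.

```javascript
// Hypothetical helper names; not part of any browser API.
// Each byte becomes one character whose char code equals the byte value.
function bytesToBinaryString(bytes) {
  let out = '';
  for (const b of bytes) {
    out += String.fromCharCode(b);
  }
  return out;
}

// Reverse mapping: each character's code (assumed to be 0-255) becomes one byte.
function binaryStringToBytes(str) {
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) {
    bytes[i] = str.charCodeAt(i);
  }
  return bytes;
}

const original = new Uint8Array([58, 226, 7, 102, 202, 238, 58, 234, 217, 17, 189, 208,
  46, 34, 254, 4, 76, 249, 169, 101, 112, 102, 140, 208]);
const roundTripped = binaryStringToBytes(bytesToBinaryString(original));
console.log(roundTripped.length); // 24
```

The resulting string contains control characters and is not meant to be displayed; it is only a lossless container for the bytes.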

2 Answers


You have bytes in the array that don't form valid UTF-8, so they can't be decoded as text. Pretty much every byte >= 128 requires special handling. Some of these are allowed, but only as leading or continuation bytes of multi-byte sequences, and some, like 254, are never allowed. When TextDecoder hits an invalid sequence it substitutes the replacement character U+FFFD (�), and encoding that character produces the three bytes [239, 191, 189], which is why your array grew. If you want to convert back and forth you will need to make sure you are creating valid UTF-8. The codepage layout here might be useful: https://en.wikipedia.org/wiki/UTF-8#Codepage_layout as might the description of illegal byte sequences: https://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences.

As a concrete example, this:

let arr = new TextDecoder().decode(new Uint8Array([194, 169]))
let res = new TextEncoder().encode(arr) // => [194, 169]

works because [194, 169] is valid utf-8 for © but:

let arr = new TextDecoder().decode(new Uint8Array([194, 27]))
let res = new TextEncoder().encode(arr) // => [239, 191, 189, 27]

doesn't because it's not a valid sequence.
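If you want to find out up front whether a byte array will survive the round trip, one option is to decode it with the `fatal` option, which makes `TextDecoder` throw instead of silently substituting U+FFFD. This is a minimal sketch; `isValidUtf8` is a hypothetical helper name, not a built-in.

```javascript
// Hypothetical helper: returns true only if the bytes are well-formed UTF-8.
function isValidUtf8(bytes) {
  try {
    // With fatal: true, decode() throws a TypeError on malformed input
    // instead of inserting replacement characters.
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch (e) {
    return false;
  }
}

console.log(isValidUtf8(new Uint8Array([194, 169]))); // true
console.log(isValidUtf8(new Uint8Array([194, 27])));  // false
```

When this returns false, the bytes are not text, and a binary-safe representation is probably the better choice.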

answered Jul 17, 2018 at 1:16

3 Comments

Thanks. That makes sense. I think I may just be using the wrong encoding here. Perhaps base64 would help me?
Just for the record: base64 did the trick for me. I guess I just had bytes in my Uint8Array that can't be represented in UTF-8.
You should probably never use UTF-8 for this type of work. Strings built from individual bytes should use a single-byte character set, where there is no such thing as an invalid character, especially if you are bit mapping.
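For reference, the base64 approach mentioned in the comments above might look roughly like this. It is a sketch that assumes the `btoa` and `atob` globals are available (they are in browsers); the helper names are hypothetical.

```javascript
// Hypothetical helper: bytes -> base64 text that is safe to store or transmit.
function bytesToBase64(bytes) {
  let binary = '';
  for (const b of bytes) {
    binary += String.fromCharCode(b); // one character per byte, codes 0-255
  }
  return btoa(binary);
}

// Hypothetical helper: base64 text -> the original bytes.
function base64ToBytes(b64) {
  // atob returns one character per byte, so each char code is the byte value.
  return Uint8Array.from(atob(b64), (ch) => ch.charCodeAt(0));
}

const bytes = new Uint8Array([58, 226, 7, 102, 202, 238, 58, 234, 217, 17, 189, 208,
  46, 34, 254, 4, 76, 249, 169, 101, 112, 102, 140, 208]);
const encoded = bytesToBase64(bytes);
const decoded = base64ToBytes(encoded);
console.log(encoded.length, decoded.length); // 32 24
```

The string is about a third longer than the input, but it contains only printable ASCII characters, so it can be passed through text-only channels unchanged.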

To get string from Uint8Array and back:

var u8arr = new Uint8Array([34, 128, 255]);
var u8str = u8arr.toString(); // Convert Uint8Array to String
console.log(u8str);
var u8arr2 = Uint8Array.from(u8str.split(',').map(x=>parseInt(x,10)));
console.log(u8arr2); // back to Uint8Array

This does not suffer from utf-8 issues.

answered Jun 9, 2022 at 8:24

2 Comments

There's obviously something encoded in the Uint8Array. This answer doesn't attempt to decode anything.
@DavidR. This answers OP's question of how to convert Uint8Array to string and back. OP is not trying to encode/decode to/from UTF-8, but only trying to convert Uint8Array to string and back. However he is using the TextEncoder/Decoder functions which is why he is facing issues.
