I want to write a function that takes a string of hex digits (two hex digits represent a u8 value) and returns a vector of u8 values, e.g. the string 1f 0a d1 should be converted into [31, 10, 209]. The input string may contain non-hex characters, which the function must filter out, such as:
1f\x0ad1
1f \x0a\xD1
\x1F \x0a \xd1
\x1f \x0A\xd1
...
All of them lead to the output [31, 10, 209]. My solution is the following:
    fn parse_hex(hex_asm: &str) -> Vec<u8> {
        let hex_chars: Vec<char> = hex_asm.as_bytes().iter().filter_map(|b| {
            let ch = char::from(*b);
            if ('0' <= ch && ch <= '9') || ('a' <= ch && ch <= 'f') || ('A' <= ch && ch <= 'F') {
                Some(ch)
            } else {
                None
            }
        }).collect();

        let mut index = 0usize;
        let (odd_chars, even_chars): (Vec<char>, Vec<char>) = hex_chars.into_iter().partition(|_| {
            index = index + 1;
            index % 2 == 1
        });

        odd_chars.into_iter().zip(even_chars.into_iter()).map(|(c0, c1)| {
            fn hexchar2int(ch: char) -> u8 {
                if '0' <= ch && ch <= '9' {
                    ch as u8 - '0' as u8
                } else {
                    0xa +
                    if 'a' <= ch && ch <= 'f' {
                        ch as u8 - 'a' as u8
                    } else if 'A' <= ch && ch <= 'F' {
                        ch as u8 - 'A' as u8
                    } else {
                        unreachable!()
                    }
                }
            }
            hexchar2int(c0) * 0x10 + hexchar2int(c1)
        }).collect::<Vec<u8>>()
    }
But it does not make me happy. There are several problems that I can think of:
- It needs to create two sub-vectors, one for the odd-index elements and one for the even-index elements
- For performance reasons, I wrote the function hexchar2int (which is ugly) to convert two hex digits into a u8 value instead of using u8::from_str_radix
I wonder if there is a better method to do that.
1 Answer
Note that this code will not work on any non-ASCII strings. These are more and more common, especially considering the global community we are a part of (and don't forget emoji 🙃).
- Use Vec<_> to avoid redundantly specifying the inner type when collecting.
- Use byte literals (b'x') instead of casting characters to bytes.
- You can match on ranges of characters. I find this aesthetically pleasing.
- Don't collect an iterator into a Vec just to call into_iter on the Vec. Instead, use the original iterator with partition.
- Don't convert from u8 to char and back. You only need to make a single transformation.
- There's no need to use the turbofish with the final collect since it's being returned and the type can be inferred.
- There's no need to call into_iter for zip's argument; it's implied because it takes an IntoIterator.
- Instead of partitioning into vectors, just grab two values out of the iterator at a time. Using fuse allows calling next after it's already returned None.
- Your code handles the check of a hex digit twice, leading to the unreachable. Instead, perform the conversion when performing the check.
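    fn parse_hex(hex_asm: &str) -> Vec<u8> {
        // Check and convert each hex digit in one step; drop everything else.
        // (`..=` replaces the older `...` range-pattern syntax.)
        let mut hex_bytes = hex_asm.as_bytes().iter().filter_map(|&b| {
            match b {
                b'0'..=b'9' => Some(b - b'0'),
                b'a'..=b'f' => Some(b - b'a' + 10),
                b'A'..=b'F' => Some(b - b'A' + 10),
                _ => None,
            }
        }).fuse();

        let mut bytes = Vec::new();
        // Take two nibbles at a time; a trailing unpaired digit is ignored.
        while let (Some(h), Some(l)) = (hex_bytes.next(), hex_bytes.next()) {
            bytes.push(h << 4 | l);
        }
        bytes
    }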
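The "convert while checking" idea can also lean on the standard library. The following is only a sketch of a variant, not part of the code above: the name parse_hex_to_digit is hypothetical, and it uses char::to_digit(16), which returns Some(value) for a hex digit and None for any other character.

```rust
// Hypothetical variant; the function name is illustrative only.
fn parse_hex_to_digit(hex_asm: &str) -> Vec<u8> {
    // to_digit(16) yields Some(0..=15) for hex digits and None otherwise,
    // so filtering and conversion happen in a single step.
    let mut nibbles = hex_asm
        .chars()
        .filter_map(|c| c.to_digit(16))
        .map(|d| d as u8) // safe: the value is always below 16
        .fuse();

    let mut bytes = Vec::new();
    // Combine the high and low nibble; an unpaired trailing digit is dropped.
    while let (Some(h), Some(l)) = (nibbles.next(), nibbles.next()) {
        bytes.push(h << 4 | l);
    }
    bytes
}
```

Called with the raw string r"\x1F \x0a \xd1", this returns [31, 10, 209]. Raw string literals (r"...") matter when testing, because in an ordinary Rust string literal \x0a is an escape sequence rather than the four characters from the question's examples.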
u8::from_str_radix is overkill because in this case I need to create a string of two characters.
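For comparison, here is a rough sketch of how u8::from_str_radix could be used without building a new String for every pair: each two-byte chunk is borrowed as a &str via std::str::from_utf8. The name parse_hex_radix is hypothetical, and this version still allocates one intermediate Vec for the filtered digits, so it is not necessarily faster than the hand-written conversion.

```rust
use std::str;

// Hypothetical helper; the name and structure are illustrative only.
fn parse_hex_radix(hex_asm: &str) -> Vec<u8> {
    // Keep only ASCII hex digits, as raw bytes.
    let digits: Vec<u8> = hex_asm
        .bytes()
        .filter(|b| b.is_ascii_hexdigit())
        .collect();

    digits
        .chunks_exact(2) // an unpaired trailing digit is ignored
        .filter_map(|pair| {
            // The pair is pure ASCII, so borrowing it as &str succeeds.
            let s = str::from_utf8(pair).ok()?;
            u8::from_str_radix(s, 16).ok()
        })
        .collect()
}
```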