Given a C# string which is a set of hexadecimal numbers such as:
string HexString = "202048656c6c6f20576f726c64313233212020";
Where those hexadecimal numbers represent the ASCII text:
"  Hello World123!  "
I need to convert the HexString to a String, and I have the following code, which works but looks so excessive!
string HexStringToString(string HexString) {
    string stringValue = "";
    for (int i = 0; i < HexString.Length / 2; i++) {
        string hexChar = HexString.Substring(i * 2, 2);
        int hexValue = Convert.ToInt32(hexChar, 16);
        stringValue += Char.ConvertFromUtf32(hexValue);
    }
    return stringValue;
}
Am I missing some elegant method?
5 Answers
The method does two different things and thus should be split in two:
- Interpret a hex string as a sequence of bytes. You can find many possible implementations at "How do you convert Byte Array to Hexadecimal String, and vice versa?".

Yours has quadratic runtime (due to the string concatenation pattern RobH noted) and creates a new string object for each byte.

Keeping it similar to yours, but reducing it to linear runtime:

public static byte[] HexStringToBytes(string hexString)
{
    if (hexString == null)
        throw new ArgumentNullException("hexString");
    if (hexString.Length % 2 != 0)
        throw new ArgumentException("hexString must have an even length", "hexString");
    var bytes = new byte[hexString.Length / 2];
    for (int i = 0; i < bytes.Length; i++)
    {
        string currentHex = hexString.Substring(i * 2, 2);
        bytes[i] = Convert.ToByte(currentHex, 16);
    }
    return bytes;
}

This code is still relatively slow, creating a new substring for each byte and using Convert.ToByte, but I'd only complicate that after benchmarking revealed this as a relevant cost.

- Interpret the sequence of bytes as an ISO-8859-1 encoded string. This is equivalent to your code, since the first 256 code-points in Unicode match the ISO-8859-1 single-byte encoding.
I'd use:
Encoding.GetEncoding("ISO-8859-1").GetString(bytes)
You should consider using UTF-8 instead, so you can support any Unicode code-point and not just those common in western Europe.
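A minimal sketch of the two-step split described above. The `HexText` class and the names `ParseNibble`, `ToBytes`, and `Decode` are hypothetical and not from the answer. It parses each digit arithmetically instead of calling `Substring`, and it takes the encoding as a parameter so the caller states it explicitly. Whether avoiding the substrings matters in practice is something only a benchmark can tell.

```csharp
using System;
using System.Text;

public static class HexText
{
    // Maps one hex digit to its value 0..15; throws on anything else.
    private static int ParseNibble(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        throw new FormatException("Invalid hex digit: " + c);
    }

    // Step 1: hex string -> bytes, without allocating a substring per byte.
    public static byte[] ToBytes(string hexString)
    {
        if (hexString == null) throw new ArgumentNullException(nameof(hexString));
        if (hexString.Length % 2 != 0)
            throw new ArgumentException("hexString must have an even length", nameof(hexString));

        var bytes = new byte[hexString.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
        {
            int high = ParseNibble(hexString[2 * i]);
            int low = ParseNibble(hexString[2 * i + 1]);
            bytes[i] = (byte)((high << 4) | low);
        }
        return bytes;
    }

    // Step 2: bytes -> text, with the encoding chosen by the caller.
    public static string Decode(string hexString, Encoding encoding)
    {
        if (encoding == null) throw new ArgumentNullException(nameof(encoding));
        return encoding.GetString(ToBytes(hexString));
    }
}

public static class Program
{
    public static void Main()
    {
        const string hex = "202048656c6c6f20576f726c64313233212020";
        // Pure ASCII input decodes identically under both encodings.
        Console.WriteLine("[" + HexText.Decode(hex, Encoding.GetEncoding("ISO-8859-1")) + "]");
        Console.WriteLine("[" + HexText.Decode(hex, Encoding.UTF8) + "]");
    }
}
```

For the question's input, both calls print the same text, because every byte is below 0x80. The two encodings only diverge once the data contains bytes from 0x80 upwards.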
- I think this answer is better than mine - I certainly think you're right about splitting it into two methods. Good point about being explicit about the encoding as well. (RobH, Jul 24, 2015 at 18:21)
- ANSI Windows-1252 encoding is favored over ISO-8859-1. (dfhwze, Jun 5, 2019 at 5:24)
I don't think there is a built-in method. Yours is pretty good but we could make some improvements:
- Parameters should be camelCase => hexString.
- You should favour StringBuilder when building up strings.
- You should step through the string in increments of 2 to cut down on the maths.
- You should validate the argument.
- You should prefer var when the type is obvious.
Result of those points:
string HexStringToString(string hexString)
{
    if (hexString == null || (hexString.Length & 1) == 1)
    {
        throw new ArgumentException();
    }
    var sb = new StringBuilder();
    for (var i = 0; i < hexString.Length; i += 2)
    {
        var hexChar = hexString.Substring(i, 2);
        sb.Append((char)Convert.ToByte(hexChar, 16));
    }
    return sb.ToString();
}
- I really appreciate your improvements. My code is typically just reviewed by me, so I don't get much good advice! I will have to start using StringBuilder as it really is not in my repertoire (I have a C/assy background). As far as the math of i+2 vs i*2, I would "assume" the compiler sees *2 as a shift and no real multiplication. (I totally misunderstand var.) In this case the hexString is known to consist of pairs of hex digits on entry; however, to make it reusable, definitely validate. As to camelCase, I have to admit my functions, properties, methods and parameters are ALLOverthePlace! (frog_jr, Jul 24, 2015 at 19:45)
- @frog_jr - I simplified the maths to make it easier to read. I don't generally worry about perf unless I see a problem when profiling. (RobH, Jul 26, 2015 at 11:34)
- This solution is OK for ASCII, but is bad for UTF-8 encoded hex. (nemke, Dec 2, 2015 at 13:38)
- To elaborate: if you try to convert the hex string
3C 6E 61 6D 65 3E D0 9D D0 B5 D0 BC D0 B0 D1 9A D0 B0 3C 2F 6E 61 6D 65 3E
which represents my Cyrillic name within a name tag, <name>Немања</name>, you will get garbage like <name>ÐемаÑа</name> instead, and that's all because you are grouping characters by 2. (nemke, Dec 2, 2015 at 14:13)
- @nemke - I realized what you meant so I deleted my comment. The OP specifically mentions ASCII but you are of course correct that it doesn't scale to other character encodings. (RobH, Dec 2, 2015 at 14:17)
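To illustrate the UTF-8 point raised in the comments above, here is a small sketch that decodes the same hex dump as UTF-8 instead of casting each byte to a char. It assumes .NET 5 or later, where `Convert.FromHexString` is available (it postdates this thread). The `Utf8HexDemo` class and `DecodeUtf8Hex` method are hypothetical names.

```csharp
using System;
using System.Text;

public static class Utf8HexDemo
{
    // Decodes a hex dump (optionally space-separated) as UTF-8 text.
    public static string DecodeUtf8Hex(string spacedHex)
    {
        if (spacedHex == null) throw new ArgumentNullException(nameof(spacedHex));
        // Convert.FromHexString expects digits only, so drop the spaces first.
        byte[] bytes = Convert.FromHexString(spacedHex.Replace(" ", ""));
        return Encoding.UTF8.GetString(bytes);
    }

    public static void Main()
    {
        const string dump =
            "3C 6E 61 6D 65 3E D0 9D D0 B5 D0 BC D0 B0 D1 9A D0 B0 3C 2F 6E 61 6D 65 3E";

        string text = DecodeUtf8Hex(dump);

        // The dump holds 25 bytes but only 19 characters, because each
        // Cyrillic letter occupies two bytes in UTF-8.
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(text);
        Console.WriteLine(text.Length);
    }
}
```

The byte pair D0 9D, for example, decodes to the single letter Н (U+041D). Casting the two bytes to chars one at a time produces two unrelated Latin-1 characters instead.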
I'm going to depart from the other answers and focus on this bit:
Am I missing some elegant method?
Do you consider Regex to be elegant? You could reduce the amount of code required at the cost of performance. Take the following Regex expression:
(?<=\G..)(?!$)
Broken down:
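(?<=    # Look-behind: checks the text before the position without consuming it
\G      # Zero-width assertion: the position where the previous match ended (the start of the string for the first match)
..      # Match exactly two characters
)(?!$)  # Do not match an empty group at the end of the string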
Then it's just a matter of transforming the string array into a collection of characters and joining them all back together. Using LINQ's Select and the string.Join method, this can be done quickly.
A short implementation may look like:
string HexStringToString(string hexString)
{
    string[] hexValues = Regex.Split(hexString, "(?<=\\G..)(?!$)");
    var characters = hexValues.Select(hex => (char)Convert.ToByte(hex, 16));
    return string.Join(string.Empty, characters);
}
Elegant? Sure. You could even do it all on a single line:
string HexStringToString(string hexString)
{
    return string.Join("", Regex.Split(hexString, "(?<=\\G..)(?!$)").Select(x => (char)Convert.ToByte(x, 16)));
}
But elegance is never more valuable than readability and maintainability.
As @CodesInChaos said, your method is doing multiple things and should be split apart. I would even break out the functionality of separating a string into its own method, perhaps as an extension method off of the String class.
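One way the suggested extension method might look, as a sketch only. `SplitIntoChunks` and the surrounding class names are hypothetical. It uses plain `Substring` calls rather than the regex, and leaves the hex conversion to the caller.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StringChunkExtensions
{
    // Yields consecutive substrings of length chunkSize. The last chunk is
    // shorter when the string length is not a multiple of chunkSize.
    public static IEnumerable<string> SplitIntoChunks(this string source, int chunkSize)
    {
        // Validate eagerly; the iterator below only runs once enumerated.
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        return Iterate(source, chunkSize);
    }

    private static IEnumerable<string> Iterate(string source, int chunkSize)
    {
        for (int i = 0; i < source.Length; i += chunkSize)
        {
            yield return source.Substring(i, Math.Min(chunkSize, source.Length - i));
        }
    }
}

public static class ChunkedHex
{
    // Same byte-to-char conversion as the regex version above.
    public static string HexStringToString(string hexString)
    {
        return string.Concat(
            hexString.SplitIntoChunks(2).Select(hex => (char)Convert.ToByte(hex, 16)));
    }

    public static void Main()
    {
        Console.WriteLine(ChunkedHex.HexStringToString("48656c6c6f"));
    }
}
```

Note that this sketch does not reject odd-length input. A trailing single digit would be parsed as its own value, so a caller that needs strictness should check the length first.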
- This is much better 🙂 string HexStringToString(string hexString) => string.Join("", Regex.Split(hexString, "(?<=\\G..)(?!$)").Select(x => (char)Convert.ToByte(x, 16))); (Auto, Mar 10, 2022 at 17:27)
If you are looking for elegant you can consider the functional paradigm. Here I have added a function SelectPair which maps 2 elements of an IEnumerable to a single element; allowing the 2 characters of the hex string to be extracted together.
The main code then reduces to
static string HexStringToString(string hexString)
{
    return
        String.Join(
            "",
            hexString
                .ToCharArray()
                .SelectPair(
                    (ch1, ch2) => ch1.ToString() + ch2)
                .Select(
                    hexChar => (char) Convert.ToByte(hexChar, 16)));
}
SelectPair is an extension method, which can be reused elsewhere.
public static class LinqExt
{
    public static IEnumerable<TResult> SelectPair<TSource, TResult>(
        this IEnumerable<TSource> list,
        Func<TSource, TSource, TResult> onPair)
    {
        var odd = default(TSource);
        var isOdd = true;
        foreach (var item in list)
        {
            if (isOdd)
            {
                odd = item;
            }
            else
            {
                yield return onPair(odd, item);
            }
            isOdd = !isOdd;
        }
    }
}
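On newer runtimes the pairing step may not need a custom extension. The sketch below assumes .NET 6 or later, where LINQ ships `Enumerable.Chunk`. The `ChunkDemo` class name is hypothetical, and this is an alternative to `SelectPair`, not the answer's own method.

```csharp
using System;
using System.Linq;

public static class ChunkDemo
{
    public static string HexStringToString(string hexString)
    {
        if (hexString == null) throw new ArgumentNullException(nameof(hexString));

        return string.Concat(
            hexString
                .Chunk(2) // IEnumerable<char[]>, two characters per array
                .Select(pair => (char)Convert.ToByte(new string(pair), 16)));
    }

    public static void Main()
    {
        Console.WriteLine(HexStringToString("48656c6c6f"));
    }
}
```

Unlike `SelectPair`, which silently drops an unpaired final element, `Chunk` returns a shorter final array, so odd-length input would still be converted digit by digit unless the length is validated beforehand.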
I like the original code best. I don't know of a better way than iterating through the string and the original code is actually pretty readable and likely no slower than any of the other options.
It could be made static, as it appears that it doesn't access any class instance data. Further, it looks like an excellent candidate to be an extension method (simply change the parameter to read this string HexString after making the method static).
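A sketch of that suggestion, keeping the original loop body unchanged and only adding `static` and `this`. The containing class name `HexStringExtensions` is hypothetical, since an extension method has to live in a static class.

```csharp
using System;

public static class HexStringExtensions
{
    // The original method, now static and callable as an extension on string.
    public static string HexStringToString(this string HexString)
    {
        string stringValue = "";
        for (int i = 0; i < HexString.Length / 2; i++)
        {
            string hexChar = HexString.Substring(i * 2, 2);
            int hexValue = Convert.ToInt32(hexChar, 16);
            stringValue += Char.ConvertFromUtf32(hexValue);
        }
        return stringValue;
    }
}

public static class ExtensionDemo
{
    public static void Main()
    {
        // The call site now reads as if the method belonged to string.
        Console.WriteLine("48656c6c6f".HexStringToString());
    }
}
```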