I need to write a byte[] -> T extension method, and it needs to be fast (it does not need to be pretty).
This function will be called hundreds of thousands of times in very quick succession in an absolutely performance-critical environment.
We're currently optimizing at the level of individual ticks; every tick translates to a couple of milliseconds higher up the call stack, hence the need for raw speed over maintainability (not how I like to design software, but the reasoning behind this is out of scope).
Consider the following code. It's clean and maintainable, but relatively slow (probably due to boxing and unboxing). Can it be made faster?
public static T ConvertTo<T>(this byte[] bytes, int offset = 0)
{
var type = typeof(T);
if (type == typeof(sbyte)) return bytes[offset].As<T>();
if (type == typeof(byte)) return bytes[offset].As<T>();
if (type == typeof(short)) return BitConverter.ToInt16(bytes, offset).As<T>();
if (type == typeof(ushort)) return BitConverter.ToUInt32(bytes, offset).As<T>();
if (type == typeof(int)) return BitConverter.ToInt32(bytes, offset).As<T>();
if (type == typeof(uint)) return BitConverter.ToUInt32(bytes, offset).As<T>();
if (type == typeof(long)) return BitConverter.ToInt64(bytes, offset).As<T>();
if (type == typeof(ulong)) return BitConverter.ToUInt64(bytes, offset).As<T>();
throw new NotImplementedException();
}
public static T As<T>(this object o)
{
return (T)o;
}
And yes, it needs to be generic, sadly.
- In my opinion, you've not given us enough context here. We have no idea what the input looks like or why you insist that it must be a generic method. Both of these are very relevant to the review. Also, are you already sure it isn't performant enough? Do you have well defined parameters of how fast is fast enough? Have you already benchmarked the code against that target? – RubberDuck, Aug 22, 2015 at 11:14
- My guess is late-bound data from data tables with binary fields, but based on OP's comments, any improvements to the whole data mapping paradigm have to be made at a larger architectural level that can't be done at this time. Even if it's not DataTables the same thing applies to pretty much any data mapping scenario. – moarboilerplate, Aug 22, 2015 at 11:40
- If you want performance, then throw out LINQ and access the array directly. (I saw a 4x speed-up in a simple test.) – hangy, Aug 23, 2015 at 10:12
1 Answer
First, you seem to have a minor typo on this line:
if (type == typeof(ushort)) return BitConverter.ToUInt32(bytes, offset).As<T>();
That should be:
if (type == typeof(ushort)) return BitConverter.ToUInt16(bytes, offset).As<T>();
There is also a more serious bug: any attempt to convert to sbyte with your method throws an exception, because the value is boxed as a byte and a boxed byte cannot be unboxed as an sbyte:
if (type == typeof(sbyte)) return bytes[offset].As<T>();
Should be:
if (type == typeof(sbyte)) return ((sbyte)bytes[offset]).As<T>();
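To illustrate why the cast has to happen before boxing, here is a minimal sketch. The class and method names (UnboxDemo, ThrowsOnByteToSByte, ConvertFirst) are made up for this example; only the As<T> helper mirrors the one in the question.

```csharp
using System;

public static class UnboxDemo
{
    // Same shape as the As<T>() helper in the question.
    public static T As<T>(object o)
    {
        return (T)o;
    }

    // Returns true if unboxing a boxed byte as sbyte throws.
    public static bool ThrowsOnByteToSByte()
    {
        object boxed = (byte)200;      // the box holds a System.Byte
        try
        {
            As<sbyte>(boxed);          // unboxing needs the exact boxed type
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    // Converts first, so the box already holds an sbyte.
    public static sbyte ConvertFirst(byte value)
    {
        // unchecked keeps the bit pattern even in a checked build (200 becomes -56).
        object boxed = unchecked((sbyte)value);
        return As<sbyte>(boxed);
    }
}
```

The corrected line above does the same thing as ConvertFirst: the byte is converted to sbyte while it is still an unboxed value, and only then boxed.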
If you really need speed, you should probably not use the BitConverter class in this situation. Use bitwise operators, as they are much faster. See this answer for a comparison.
You should also avoid .As<T>() and instead cast within the method using (T)(object). This eliminates the overhead of an unnecessary method call.
Two improvements we can make immediately:
- Replace all the BitConverter work with bitwise work.
- Replace all the .As<T>() calls with (T)(object) casts.
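As a rough sketch of what "bitwise work" means here, the following assembles values by shifting and OR-ing the bytes, least significant byte first. The class name LittleEndian is invented for this example. One assumption worth stating: this always reads little-endian, whereas BitConverter follows the machine's byte order (see BitConverter.IsLittleEndian), so the two only agree on little-endian hardware.

```csharp
using System;

public static class LittleEndian
{
    // Assembles a 16-bit value from two bytes, least significant byte first.
    public static short ReadInt16(byte[] bytes, int offset)
    {
        return (short)(bytes[offset] | bytes[offset + 1] << 8);
    }

    // Assembles a 32-bit value from four bytes, least significant byte first.
    // Each byte is promoted to int before the shift, so no casts are needed.
    public static int ReadInt32(byte[] bytes, int offset)
    {
        return bytes[offset]
             | bytes[offset + 1] << 8
             | bytes[offset + 2] << 16
             | bytes[offset + 3] << 24;
    }
}
```

For the bytes { 1, 41, 6, 45 }, ReadInt32 gives 755378433, the same int value that appears in the verification output further down.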
Performance Evaluation
The performance difference when those changes are made is significant for certain types of T.
ConvertTo1<byte> on 50000000 rounds, time (in ms) taken: 3465
ConvertTo1<short> on 50000000 rounds, time (in ms) taken: 4217
ConvertTo1<int> on 50000000 rounds, time (in ms) taken: 5586
ConvertTo1<long> on 50000000 rounds, time (in ms) taken: 7665
ConvertTo2<byte> on 50000000 rounds, time (in ms) taken: 4995
ConvertTo2<short> on 50000000 rounds, time (in ms) taken: 5775
ConvertTo2<int> on 50000000 rounds, time (in ms) taken: 6945
ConvertTo2<long> on 50000000 rounds, time (in ms) taken: 8492
ConvertToInt on 50000000 rounds, time (in ms) taken: 1092
Verifying results of conversions are same:
SByte test: 1, 1: True
Byte test: 1, 1: True
Short test: 10497, 10497: True
UShort test: 10497, 10497: True
Int test: 755378433, 755378433: True
UInt test: 755378433, 755378433: True
Long test: 4041804345027995905, 4041804345027995905: True
ULong test: 4041804345027995905, 4041804345027995905: True
Now, to explain: ConvertTo1<T> is the optimized method, ConvertTo2<T> is the original, and ConvertToInt is a strongly-typed method for converting directly to int.
By replacing .As<T>() with (T)(object), and replacing all the BitConverter work with bitwise work, we cut the running time down to:
- 69% of the original time for byte
- 73% of the original time for short
- 80% of the original time for int
- 90% of the original time for long
Also note that we returned the exact same values for all situations.
Just how bad is performance dragging because of the boxing?
Badly. If we consider another method, ConvertToInt, which is strongly typed to return an int, we get the following result:
ConvertToInt on 50000000 rounds, time (in ms) taken: 1034
Now that is with a method that only returns an int value.
This leads us to conclude that boxing the value and then casting it back to the generic type costs a lot of performance: the generic int conversion takes roughly five times as long as the strongly-typed one (5586 ms versus 1034 ms). The issue isn't the fact that you use generics; the issue is the boxing within the method, which creates extra overhead.
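One way to keep the method generic without boxing on every call is to cache a strongly-typed delegate per T. This is a different technique from the one measured in this answer, and I have not benchmarked it; the delegate invocation has its own cost, so treat the sketch below as something to measure rather than a known win. All names (CachedConvert, ConvertToCached, Reader<T>) are invented for the example, and only three types are wired up.

```csharp
using System;

public static class CachedConvert
{
    // Hypothetical generic entry point: no value is boxed per call.
    public static T ConvertToCached<T>(this byte[] bytes, int offset = 0)
    {
        return Reader<T>.Read(bytes, offset);
    }

    // One delegate per closed generic type, created once on first use.
    // An unsupported T surfaces as a TypeInitializationException wrapping
    // the NotSupportedException thrown below.
    private static class Reader<T>
    {
        public static readonly Func<byte[], int, T> Read = Create();

        private static Func<byte[], int, T> Create()
        {
            var type = typeof(T);
            // Casting a delegate through object is a reference conversion,
            // so nothing is boxed here or at call time.
            if (type == typeof(byte))
                return (Func<byte[], int, T>)(object)new Func<byte[], int, byte>(ReadByte);
            if (type == typeof(short))
                return (Func<byte[], int, T>)(object)new Func<byte[], int, short>(ReadInt16);
            if (type == typeof(int))
                return (Func<byte[], int, T>)(object)new Func<byte[], int, int>(ReadInt32);
            throw new NotSupportedException(type.Name);
        }
    }

    private static byte ReadByte(byte[] b, int o)
    {
        return b[o];
    }

    private static short ReadInt16(byte[] b, int o)
    {
        return (short)(b[o] | b[o + 1] << 8);
    }

    private static int ReadInt32(byte[] b, int o)
    {
        return b[o] | b[o + 1] << 8 | b[o + 2] << 16 | b[o + 3] << 24;
    }
}
```

Usage is the same as the other variants, for example bytes.ConvertToCached<int>(0). The type check runs once per T instead of once per call.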
Using an .As<T>() method
While it is reasonable to expect a helper like this to be cheap, it is actually much slower than casting inline because of the extra method call. Replacing .As<T>() with (T)(object) immediately gave back a significant amount of performance. It may even be acceptable to make only that change and continue using BitConverter.
In fact, upon further research, it seems that if you replace the ConvertTo2 method I supplied with this variant:
public static T ConvertTo2<T>(this byte[] bytes, int offset = 0)
{
var type = typeof(T);
if (type == typeof(sbyte)) return (T)(object)((sbyte)bytes[offset]);
if (type == typeof(byte)) return (T)(object)bytes[offset];
if (type == typeof(short)) return (T)(object)BitConverter.ToInt16(bytes, offset);
if (type == typeof(ushort)) return (T)(object)BitConverter.ToUInt16(bytes, offset);
if (type == typeof(int)) return (T)(object)BitConverter.ToInt32(bytes, offset);
if (type == typeof(uint)) return (T)(object)BitConverter.ToUInt32(bytes, offset);
if (type == typeof(long)) return (T)(object)BitConverter.ToInt64(bytes, offset);
if (type == typeof(ulong)) return (T)(object)BitConverter.ToUInt64(bytes, offset);
throw new NotImplementedException();
}
then the performance difference between ConvertTo1 and ConvertTo2 is minimal. (Removing the BitConverter work still seems to make it slightly faster, but the difference is negligible at this level.)
Food for thought.
Code
Here's the code I used for conversions:
using System;

public static class Extensions
{
public static T ConvertTo1<T>(this byte[] bytes, int offset = 0)
{
var type = typeof(T);
if (type == typeof(sbyte)) return (T)(object)((sbyte)bytes[offset]);
if (type == typeof(byte)) return (T)(object)bytes[offset];
if (type == typeof(short)) return (T)(object)((short)(bytes[offset + 1] << 8 | bytes[offset]));
if (type == typeof(ushort)) return (T)(object)((ushort)(bytes[offset + 1] << 8 | bytes[offset]));
if (type == typeof(int)) return (T)(object)(bytes[offset + 3] << 24 | bytes[offset + 2] << 16 | bytes[offset + 1] << 8 | bytes[offset]);
if (type == typeof(uint)) return (T)(object)((uint)bytes[offset + 3] << 24 | (uint)bytes[offset + 2] << 16 | (uint)bytes[offset + 1] << 8 | bytes[offset]);
if (type == typeof(long)) return (T)(object)((long)bytes[offset + 7] << 56 | (long)bytes[offset + 6] << 48 | (long)bytes[offset + 5] << 40 | (long)bytes[offset + 4] << 32 | (long)bytes[offset + 3] << 24 | (long)bytes[offset + 2] << 16 | (long)bytes[offset + 1] << 8 | bytes[offset]);
if (type == typeof(ulong)) return (T)(object)((ulong)bytes[offset + 7] << 56 | (ulong)bytes[offset + 6] << 48 | (ulong)bytes[offset + 5] << 40 | (ulong)bytes[offset + 4] << 32 | (ulong)bytes[offset + 3] << 24 | (ulong)bytes[offset + 2] << 16 | (ulong)bytes[offset + 1] << 8 | bytes[offset]);
throw new NotImplementedException();
}
public static T ConvertTo2<T>(this byte[] bytes, int offset = 0)
{
var type = typeof(T);
if (type == typeof(sbyte)) return ((sbyte)bytes[offset]).As<T>();
if (type == typeof(byte)) return bytes[offset].As<T>();
if (type == typeof(short)) return BitConverter.ToInt16(bytes, offset).As<T>();
if (type == typeof(ushort)) return BitConverter.ToUInt16(bytes, offset).As<T>();
if (type == typeof(int)) return BitConverter.ToInt32(bytes, offset).As<T>();
if (type == typeof(uint)) return BitConverter.ToUInt32(bytes, offset).As<T>();
if (type == typeof(long)) return BitConverter.ToInt64(bytes, offset).As<T>();
if (type == typeof(ulong)) return BitConverter.ToUInt64(bytes, offset).As<T>();
throw new NotImplementedException();
}
public static int ConvertToInt(this byte[] bytes, int offset = 0)
{
return (bytes[offset + 3] << 24 | bytes[offset + 2] << 16 | bytes[offset + 1] << 8 | bytes[offset]);
}
public static T As<T>(this object o)
{
return (T)o;
}
}
And the test code:
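// Needs System (Console) and System.Diagnostics (Stopwatch); keep these at the top of the file.
using System;
using System.Diagnostics;

public class CR_101636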
{
public static void _Main(string[] args)
{
var rounds = 50000000;
var bytes = new byte[] { 1, 41, 6, 45, 163, 95, 23, 56 };
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < rounds; i++)
{
byte result = bytes.ConvertTo1<byte>(0);
}
sw.Stop();
Console.WriteLine("ConvertTo1<byte> on {0} rounds, time (in ms) taken: {1}", rounds, sw.ElapsedMilliseconds);
sw.Reset();
sw.Start();
for (int i = 0; i < rounds; i++)
{
short result = bytes.ConvertTo1<short>(0);
}
sw.Stop();
Console.WriteLine("ConvertTo1<short> on {0} rounds, time (in ms) taken: {1}", rounds, sw.ElapsedMilliseconds);
sw.Reset();
sw.Start();
for (int i = 0; i < rounds; i++)
{
int result = bytes.ConvertTo1<int>(0);
}
sw.Stop();
Console.WriteLine("ConvertTo1<int> on {0} rounds, time (in ms) taken: {1}", rounds, sw.ElapsedMilliseconds);
sw.Reset();
sw.Start();
for (int i = 0; i < rounds; i++)
{
long result = bytes.ConvertTo1<long>(0);
}
sw.Stop();
Console.WriteLine("ConvertTo1<long> on {0} rounds, time (in ms) taken: {1}", rounds, sw.ElapsedMilliseconds);
sw.Reset();
sw.Start();
for (int i = 0; i < rounds; i++)
{
byte result = bytes.ConvertTo2<byte>(0);
}
sw.Stop();
Console.WriteLine("ConvertTo2<byte> on {0} rounds, time (in ms) taken: {1}", rounds, sw.ElapsedMilliseconds);
sw.Reset();
sw.Start();
for (int i = 0; i < rounds; i++)
{
short result = bytes.ConvertTo2<short>(0);
}
sw.Stop();
Console.WriteLine("ConvertTo2<short> on {0} rounds, time (in ms) taken: {1}", rounds, sw.ElapsedMilliseconds);
sw.Reset();
sw.Start();
for (int i = 0; i < rounds; i++)
{
int result = bytes.ConvertTo2<int>(0);
}
sw.Stop();
Console.WriteLine("ConvertTo2<int> on {0} rounds, time (in ms) taken: {1}", rounds, sw.ElapsedMilliseconds);
sw.Reset();
sw.Start();
for (int i = 0; i < rounds; i++)
{
long result = bytes.ConvertTo2<long>(0);
}
sw.Stop();
Console.WriteLine("ConvertTo2<long> on {0} rounds, time (in ms) taken: {1}", rounds, sw.ElapsedMilliseconds);
sw.Reset();
sw.Start();
for (int i = 0; i < rounds; i++)
{
int result = bytes.ConvertToInt(0);
}
sw.Stop();
Console.WriteLine("ConvertToInt on {0} rounds, time (in ms) taken: {1}", rounds, sw.ElapsedMilliseconds);
sw.Reset();
Console.WriteLine("");
Console.WriteLine("Verifying results of conversions are same:");
Console.WriteLine("SByte test: {0}, {1}: {2}", bytes.ConvertTo1<sbyte>(), bytes.ConvertTo2<sbyte>(), bytes.ConvertTo1<sbyte>() == bytes.ConvertTo2<sbyte>());
Console.WriteLine("Byte test: {0}, {1}: {2}", bytes.ConvertTo1<byte>(), bytes.ConvertTo2<byte>(), bytes.ConvertTo1<byte>() == bytes.ConvertTo2<byte>());
Console.WriteLine("Short test: {0}, {1}: {2}", bytes.ConvertTo1<short>(), bytes.ConvertTo2<short>(), bytes.ConvertTo1<short>() == bytes.ConvertTo2<short>());
Console.WriteLine("UShort test: {0}, {1}: {2}", bytes.ConvertTo1<ushort>(), bytes.ConvertTo2<ushort>(), bytes.ConvertTo1<ushort>() == bytes.ConvertTo2<ushort>());
Console.WriteLine("Int test: {0}, {1}: {2}", bytes.ConvertTo1<int>(), bytes.ConvertTo2<int>(), bytes.ConvertTo1<int>() == bytes.ConvertTo2<int>());
Console.WriteLine("UInt test: {0}, {1}: {2}", bytes.ConvertTo1<uint>(), bytes.ConvertTo2<uint>(), bytes.ConvertTo1<uint>() == bytes.ConvertTo2<uint>());
Console.WriteLine("Long test: {0}, {1}: {2}", bytes.ConvertTo1<long>(), bytes.ConvertTo2<long>(), bytes.ConvertTo1<long>() == bytes.ConvertTo2<long>());
Console.WriteLine("ULong test: {0}, {1}: {2}", bytes.ConvertTo1<ulong>(), bytes.ConvertTo2<ulong>(), bytes.ConvertTo1<ulong>() == bytes.ConvertTo2<ulong>());
}
}
In your Program (or wherever):
CR_101636._Main(args);
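The test code repeats the same stopwatch pattern for every case. If you want to add more cases, a small helper along these lines could cut the repetition. Bench and Time are hypothetical names, not part of the code I ran for the numbers above, and calling through an Action adds a delegate invocation to every round, so absolute timings would not be comparable with the ones reported here (relative comparisons between candidates should still hold).

```csharp
using System;
using System.Diagnostics;

public static class Bench
{
    // Runs body once as a warm-up (so JIT compilation is not timed),
    // then times `rounds` further calls and returns elapsed milliseconds.
    public static long Time(int rounds, Action body)
    {
        body();
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < rounds; i++)
        {
            body();
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

A call would then look like Bench.Time(rounds, () => bytes.ConvertTo1<int>(0)), with the result passed to Console.WriteLine as before.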
- @moarboilerplate I added a little clarification on that. Thanks! :) – Der Kommissar, Aug 25, 2015 at 16:11
- With that said, a generic method means that the type is known at compile time, then turned into a runtime type, so you would need a pretty specific use case... – moarboilerplate, Aug 25, 2015 at 16:24