Following the avro schema documentation for decimals I've created a method to turn a decimal into a byte array. The goals are:
- Should be represented by a non-scaled integer
- Should be big-endian
I've tested the code I wrote, but I'd like to know if the program still works in any unforeseen corner cases.
private static byte[] ToBigEndianByteArray(decimal value, int scale)
{
var bigInteger = new BigInteger(value * (decimal)Math.Pow(10, scale));
var bytes = bigInteger.ToByteArray();
return BitConverter.IsLittleEndian ? bytes.Reverse().Select(ReverseEndianness).ToArray() : bytes;
static byte ReverseEndianness(byte input)
{
const int bits = 8;
return (byte)Enumerable.Range(0, bits - 1)
.Aggregate(0, (accumulator, index) =>
BitAtIndexIsSet(input, index)
? SetBitAtIndex(accumulator, bits - 1 - index)
: accumulator);
static int SetBitAtIndex(int value, int index) => value | 1 << index;
static bool BitAtIndexIsSet(int value, int index) => (value & (1 << index)) != 0;
}
}
Example usage: ToBigEndianByteArray(8.45m, 2)
produces 0000001101001101
(845 in decimal).
-
\$\begingroup\$ Here we can review code which works as expected. So, asking a question like this someone could tell me if this looks correct or not does indicate that you are not sure whether it works as expected. \$\endgroup\$Peter Csala– Peter Csala2022年04月04日 11:25:42 +00:00Commented Apr 4, 2022 at 11:25
-
1\$\begingroup\$ BTW why do you have so many nested static local functions? \$\endgroup\$Peter Csala– Peter Csala2022年04月04日 11:29:51 +00:00Commented Apr 4, 2022 at 11:29
-
2\$\begingroup\$ @PeterCsala as far as I can see I followed the spec and it seems to produce the correct output. This may not be the goal of the site, but I'd expect a code review to cover whether the code actually does what it intends to. As to your second question I've scoped them to the only place that they're used. \$\endgroup\$NickL– NickL2022年04月04日 13:16:58 +00:00Commented Apr 4, 2022 at 13:16
-
2\$\begingroup\$ @PeterCsala With the given explanation, it's fine. Code doesn't have to be perfect, it has to work to the best of the authors knowledge. That seems to be the case. \$\endgroup\$Mast– Mast ♦2022年04月04日 17:23:03 +00:00Commented Apr 4, 2022 at 17:23
-
2\$\begingroup\$ Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers . \$\endgroup\$Mast– Mast ♦2022年04月04日 17:23:06 +00:00Commented Apr 4, 2022 at 17:23
1 Answer 1
It does not look correct to me, in two ways:
- Computing
Math.Pow(10, scale)
. This is not exact when scale is negative or large. Since it's fine for moderate positive scales and the amount of inexactness in other cases typically won't be very high, it may be difficult to actually produce a wrong result, but I would not dare to rely on it - especially for a format that is supposed to be exact. - Big-endian does not mean reversing the bits in every byte, which I think the
ReverseEndianness
does, but it's a relatively confusing function, and I don't mean the bitwise part of it but rather everything except that. I would have used something like this adapted for C#. Doesn't really matter in the end, because the specification does not say to reverse the bits to begin with, so you can remove that whole function.
Furthermore I think the API is hard to use correctly, because it assumes that you already have a scale, which is not trivial to find. Passing the wrong scale can very easily result in information being lost.
Alternatively, you may be able to use the GetBits method on the decimal, extract its internal scale (and use it directly as the scale in avro format, which uses the same kind of scales), and convert the integer part that it has internally to a big-endian (but not bit-reversed) array of bytes. It seems to me that this would severely limit the possibility of both incorrect API usage, and incorrect implementation.
-
\$\begingroup\$ > Alternatively, you may be able to use the GetBits method on the decimal, extract its internal scale I'm not sure about this, the schema is static and the input values for the fields shouldn't be changing scale. How do you suppose to do that dynamically? > Big-endian does not mean reversing the bits in every byte, which I think the ReverseEndianness does I'm struggling to understand this. At first I just reversed the byte array, but then thought that can't possibly be correct. \$\endgroup\$NickL– NickL2022年04月04日 13:27:44 +00:00Commented Apr 4, 2022 at 13:27
-
\$\begingroup\$ @NickL if the scale is fixed then please ignore that suggestion. As for big-endian, reversing the bytes (but not the bits within them) is right. \$\endgroup\$user555045– user5550452022年04月04日 13:33:03 +00:00Commented Apr 4, 2022 at 13:33
-
\$\begingroup\$ Thank you very much for your help. I'd been running
string.Join("", bytes.SelectMany(b => new BitArray(new[] { b }).Cast<bool>()).Select(b => b ? 1 : 0))
. My rationale was that it was reversing a collection of collections of bits. If only the outer collection was reversed, then each inner collection would need to be reversed too \$\endgroup\$NickL– NickL2022年04月04日 13:37:34 +00:00Commented Apr 4, 2022 at 13:37 -
1\$\begingroup\$ I totally agree
decimal.GetBits
should be used. I can comment onMath.Pow
. As I explained in an answer on SO, powers of ten generally cannot be exact in adouble
. Try for example(1e23).ToString("G17")
orMath.Pow(10.0, 23.0).ToString("G17")
. The subsequent cast fromdouble
todecimal
can lose precision again in a way that can actually save you, but it is fragile! There isBigInteger.Pow
which is exact. \$\endgroup\$Jeppe Stig Nielsen– Jeppe Stig Nielsen2022年04月05日 05:59:52 +00:00Commented Apr 5, 2022 at 5:59 -
\$\begingroup\$ @JeppeStigNielsen thanks for the information. I don't believe I can use
decimal.GetBits
because it has to match the schema, which is set at design time. I will look at usingBigInteger.Pow
, but realistically I doubt scales that high would ever be used. \$\endgroup\$NickL– NickL2022年04月05日 10:38:11 +00:00Commented Apr 5, 2022 at 10:38