I've created an Open-source ADO.NET driver for ClickHouse database: ClickHouse.Client
Would appreciate if someone has any hints to improve code quality/performance (or suggest new ideas)
The core part of binary deserialization (as the 'hot path' in profiling):
using System;
using System.Net;
using System.Numerics;
using System.Text;
using ClickHouse.Client.Types;
namespace ClickHouse.Client.Formats
{
internal class BinaryStreamReader : IDisposable
{
private readonly ExtendedBinaryReader reader;
public BinaryStreamReader(ExtendedBinaryReader reader)
{
this.reader = reader;
}
public void Dispose() => reader.Dispose();
public object ReadValue(ClickHouseType databaseType, bool nullAsDbNull)
{
switch (databaseType.TypeCode)
{
case ClickHouseTypeCode.UInt8:
return reader.ReadByte();
case ClickHouseTypeCode.UInt16:
return reader.ReadUInt16();
case ClickHouseTypeCode.UInt32:
return reader.ReadUInt32();
case ClickHouseTypeCode.UInt64:
return reader.ReadUInt64();
case ClickHouseTypeCode.Int8:
return reader.ReadSByte();
case ClickHouseTypeCode.Int16:
return reader.ReadInt16();
case ClickHouseTypeCode.Int32:
return reader.ReadInt32();
case ClickHouseTypeCode.Int64:
return reader.ReadInt64();
case ClickHouseTypeCode.Float32:
return reader.ReadSingle();
case ClickHouseTypeCode.Float64:
return reader.ReadDouble();
case ClickHouseTypeCode.String:
return reader.ReadString();
case ClickHouseTypeCode.FixedString:
var stringInfo = (FixedStringType)databaseType;
return Encoding.UTF8.GetString(reader.ReadBytes(stringInfo.Length));
case ClickHouseTypeCode.Array:
var arrayTypeInfo = (ArrayType)databaseType;
var length = reader.Read7BitEncodedInt();
var data = new object[length];
for (var i = 0; i < length; i++)
data[i] = ReadValue(arrayTypeInfo.UnderlyingType, nullAsDbNull);
return data;
case ClickHouseTypeCode.Nullable:
var nullableTypeInfo = (NullableType)databaseType;
if (reader.ReadByte() > 0)
{
return nullAsDbNull ? DBNull.Value : null;
}
else
{
return ReadValue(nullableTypeInfo.UnderlyingType, nullAsDbNull);
}
case ClickHouseTypeCode.Date:
var days = reader.ReadUInt16();
return TypeConverter.DateTimeEpochStart.AddDays(days);
case ClickHouseTypeCode.DateTime:
var seconds = reader.ReadUInt32();
return TypeConverter.DateTimeEpochStart.AddSeconds(seconds);
case ClickHouseTypeCode.UUID:
// Weird byte manipulation because of C#'s strange Guid implementation
var bytes = new byte[16];
reader.Read(bytes, 6, 2);
reader.Read(bytes, 4, 2);
reader.Read(bytes, 0, 4);
reader.Read(bytes, 8, 8);
Array.Reverse(bytes, 8, 8);
return new Guid(bytes);
case ClickHouseTypeCode.IPv4:
var ipv4bytes = reader.ReadBytes(4);
Array.Reverse(ipv4bytes);
return new IPAddress(ipv4bytes);
case ClickHouseTypeCode.IPv6:
var ipv6bytes = reader.ReadBytes(16);
return new IPAddress(ipv6bytes);
case ClickHouseTypeCode.Tuple:
var tupleTypeInfo = (TupleType)databaseType;
var count = tupleTypeInfo.UnderlyingTypes.Length;
var contents = new object[count];
for (var i = 0; i < count; i++)
{
// Underlying data in Tuple should always be null, not DBNull
contents[i] = ReadValue(tupleTypeInfo.UnderlyingTypes[i], false);
if (contents[i] is DBNull)
contents[i] = null;
}
return tupleTypeInfo.MakeTuple(contents);
case ClickHouseTypeCode.Decimal:
var decimalTypeInfo = (DecimalType)databaseType;
var factor = (int)Math.Pow(10, decimalTypeInfo.Scale);
var value = new BigInteger(reader.ReadBytes(decimalTypeInfo.Size));
return (decimal)value / factor;
case ClickHouseTypeCode.Nothing:
break;
case ClickHouseTypeCode.Nested:
throw new NotSupportedException("Nested types cannot be read directly");
case ClickHouseTypeCode.Enum8:
var enum8TypeInfo = (EnumType)databaseType;
return enum8TypeInfo.Lookup(reader.ReadSByte());
case ClickHouseTypeCode.Enum16:
var enum16TypeInfo = (EnumType)databaseType;
return enum16TypeInfo.Lookup(reader.ReadInt16());
case ClickHouseTypeCode.LowCardinality:
var lcCardinality = (LowCardinalityType)databaseType;
return ReadValue(lcCardinality.UnderlyingType, nullAsDbNull);
}
throw new NotImplementedException($"Reading of {databaseType.TypeCode} is not implemented");
}
}
}
1 Answer 1
it's fine with me, but since it's open-source, you should consider the various skill levels that would work with this library. So, to make it more readable, I would prefer to divide it into smaller blocks that would handle each type separately.
For instance, for integer types (uint8, uint16 ..etc), you can move it to a private method like this :
private object IsInteger(ClickHouseType databaseType)
{
switch(databaseType.TypeCode)
{
case ClickHouseTypeCode.UInt8:
return reader.ReadByte();
case ClickHouseTypeCode.UInt16:
return reader.ReadUInt16();
case ClickHouseTypeCode.UInt32:
return reader.ReadUInt32();
case ClickHouseTypeCode.UInt64:
return reader.ReadUInt64();
case ClickHouseTypeCode.Int8:
return reader.ReadSByte();
case ClickHouseTypeCode.Int16:
return reader.ReadInt16();
case ClickHouseTypeCode.Int32:
return reader.ReadInt32();
case ClickHouseTypeCode.Int64:
return reader.ReadInt64();
case ClickHouseTypeCode.Float32:
return reader.ReadSingle();
case ClickHouseTypeCode.Float64:
return reader.ReadDouble();
default:
return null;
}
}
and call it back from ReadValue
method. You can apply the same thing on all types, and use if
statement if you see that the code has multiple lines (like Array, UUID..etc) types. This would improve readability, and also would give you more flexibility to maintain each type and expand it as needed.
Also, if you use this approach, I would highly recommend to return the specific type of each type instead of object
on your private methods.
Example :
private Guid IsGuid(ClickHouseType databaseType)
{
if (databaseType.TypeCode == ClickHouseTypeCode.UUID)
{
var bytes = new byte[16];
reader.Read(bytes, 6, 2);
reader.Read(bytes, 4, 2);
reader.Read(bytes, 0, 4);
reader.Read(bytes, 8, 8);
Array.Reverse(bytes, 8, 8);
return new Guid(bytes);
}
return Guid.Empty;
}
You can either try to take object
and then convert and return the corresponded type, or define each type explicit in your class as you don't need to base your whole project into castings from object to another type, you need to do it once, and do the rest of the process on that type.
UPDATE
I've just need to add this approach as well :
private readonly ClickHouseType _databaseType;
public T ReadValue<T>()
{
var type = typeof(T);
if (type == typeof(ushort))
{
return (T)Convert.ChangeType(reader.ReadUInt16(), type);
}
if (type == typeof(uint))
{
return (T)Convert.ChangeType(reader.ReadUInt32(), type);
}
if (type == typeof(ulong))
{
return (T)Convert.ChangeType(reader.ReadUInt64(), type);
}
if (type == typeof(short))
{
return (T)Convert.ChangeType(reader.ReadInt16(), type);
}
if (type == typeof(int))
{
return (T)Convert.ChangeType(reader.ReadInt32(), type);
}
if (type == typeof(long))
{
return (T)Convert.ChangeType(reader.ReadInt64(), type);
}
if (type == typeof(float))
{
return (T)Convert.ChangeType(reader.ReadSingle(), type);
}
if (type == typeof(double))
{
return (T)Convert.ChangeType(reader.ReadDouble(), type);
}
if (type == typeof(decimal))
{
var decimalTypeInfo = (DecimalType) _databaseType;
var factor = (int)Math.Pow(10, decimalTypeInfo.Scale);
var value = new BigInteger(reader.ReadBytes(decimalTypeInfo.Size));
var result = (decimal)value / factor;
return (T)Convert.ChangeType(result, type);
}
if (type == typeof(sbyte))
{
return (T)Convert.ChangeType(reader.ReadSByte(), type);
}
if (type == typeof(byte))
{
return (T)Convert.ChangeType(reader.ReadByte(), type);
}
if (type == typeof(string))
{
return (T)Convert.ChangeType(reader.ReadString(), type);
}
if (type == typeof(FixedStringType))
{
var stringInfo = (FixedStringType) _databaseType;
return (T)Convert.ChangeType(Encoding.UTF8.GetString(reader.ReadBytes(stringInfo.Length)), type);
}
if (type == typeof(Guid))
{
var bytes = new byte[16];
reader.Read(bytes, 6, 2);
reader.Read(bytes, 4, 2);
reader.Read(bytes, 0, 4);
reader.Read(bytes, 8, 8);
Array.Reverse(bytes, 8, 8);
return (T)Convert.ChangeType(new Guid(bytes), type);
}
return default;
}
example usage :
var result = ReadValue<decimal>();
what I think you need is a class to handle the conversion for each type, as this would really be useful in your project. Then, you can just use this class across your project to just convert the types as needed. Some custom types that I've seen needs to be implemented and used as a strong-typed object instead of using Enum
. Like UInt8
for example. If I'm in your place, I would create a class for each type, and define the defaults values, conditions (if any), and also converters for each. Then, I Create a static converter class which would call back the ConvertTo
method in each type (which it's been already implemented in each class).
-
\$\begingroup\$ While I appreciate the effort, these advices would kill performance.
switch
is compiled to lookup table in release mode, whereasif-else
would cause a lot of extra checks.TypeCode
comparison was used to avoid castingis
checks which happen implicitly inEquals
checks.(T)Convert.ChangeType(reader.ReadUInt16(), type)
means that the value would be boxed in anobject
, then unboxed to aT
, then boxed intoobject
again because ultimately the value is stored inobject[]
(and type is not known in advance). Also method calledIsGuid
returningGuid
sounds like a bit of oxymoron \$\endgroup\$DarkWanderer– DarkWanderer2020年02月16日 10:15:41 +00:00Commented Feb 16, 2020 at 10:15 -
\$\begingroup\$ @DarkWanderer
these advices would kill performance
well, instead of guessing, test then ask. that's for performance.For the genericT
, you should know thatT
is a placeholder, and there is no actual casting (or boxing) in generics, as they will be compiled for a specified type at compile-time. For theIsGuid
(you might be aware) that naming convention may or may not be perfect in examples as they're for demonstration purpose only and not intended to be used in production environments. and yesIsGuid
is not a good naming convention for returningGuid
which I did not notice it. \$\endgroup\$iSR5– iSR52020年02月16日 20:29:57 +00:00Commented Feb 16, 2020 at 20:29