I am writing code that reads data from a DB, where boolean values are stored as 0/1 nullable integers.
I decided to write a simple function to encapsulate the logic which converts the integer values to boolean
. Thing is, I sometimes need to regard the db null
value as boolean
false, and sometimes just as null
.
So I've come up with the following two extension methods:
// Read the specified column value from row as nullable boolean,
// treating integer 0 as false, and any other non-null value as true.
public static bool? ReadAsBoolNullable(this DataRow row, string columnName)
{
if (row.IsNull(columnName))
{
return null;
}
else
{
return row[columnName] != 0;
}
}
// Read the specified column value from row as boolean,
// treating null and integer 0 as false, and any other value as true.
public static bool ReadAsBool(this DataRow row, string columnName)
{
bool? data = row.ReadAsBoolNullable(columnName);
return data.HasValue ? data.Value : false;
}
My main concern about this code is that I potentially have hundreds of millions of records, each having up to 10 boolean (not nullable) columns. So the second function will be executed a lot and will have to call the first one a lot. Could this damage performance significantly, given that the function itself doesn't do much?
2 Answers 2
As with all performance related questions, if and only if you can gather profiling evidence that shows that your current code is meaningfully detrimental to performance, don't call the first method in the second one, but rather repeat the code. This goes slightly against DRY (just barely, considering the very short code), but you will have to weigh the value of maintainability vs that of whatever performance gain is achieved.
In practice, I suspect that the effect on performance will be negligible if not non-existent. It's not as if you're not calling a slew of other methods and properties anyways.
The key point is profile profile profile.
-
1\$\begingroup\$ I would bet the database call is the actual bottleneck here. Time would probably be better served optimizing the database query or data manipulation algorithms if there is a performance issue. But as @Rotem said, it comes down to what comes out after profiling. \$\endgroup\$Cemafor– Cemafor2015年09月30日 21:09:52 +00:00Commented Sep 30, 2015 at 21:09
After re-reading your question and some research the easiest way would be to just use the provided extension methods of the DataRowExtensions
class in the System.Data.DataSetExtensions
namespace.
This would result in the former methods to be changed like so
public static bool? ReadAsBoolNullable(this DataRow row, string columnName)
{
return row.Field<bool?>(columnName);
}
public static bool ReadAsBool(this DataRow row, string columnName)
{
return row.Field<bool>(columnName);
}
or much better update the calling code to just use the extension methods provided by the DataRowExtensions
class.
This will work only if the type of column in question is mysqlserver/sqlserver 'bit' type.
For a "true" integer type you should use the following
public static bool? ReadAsBoolNullable(this DataRow row, string columnName)
{
int? currentValue = row.Field<int?>(columnName);
if (currentValue.HasValue)
{
return currentValue.Value != 0;
}
return null;
}
public static bool ReadAsBool(this DataRow row, string columnName)
{
return row.Field<int>(columnName) != 0;
}
which uses the said extensionmethods too and additional removes the redundant else
of the ReadAsBoolNullable()
method.
A style note
For an if..else
construct like
if (someCondition)
{
return someValue;
}
else
{
return someOtherValue;
}
the else
is redundant because if the condition is true
it will never be reached.
Edit based on the comment
It might not have been very clear from the question, but I treat all the columns as potentially nullable. The difference between the two methods is just how they treat that potential null value.
This just makes no sense for me. If I query a table where I know that a column can't be null I would only call the ReadAsBool()
method. It will slow down the whole process if you first call ReadAsNullable()
and based on that result either return true
or false
.
So either you distinguish between nullable and non nullable columns or you don't.
-
\$\begingroup\$ The way I understood the problem was that the consumer may want to treat a null in the database as null or as false, but this code doesn't know which they want so he is exposing both as an option. I think his concern is just that the second function may end up calling the first a lot and he is worried about the performance of that call. \$\endgroup\$Cemafor– Cemafor2015年09月29日 18:01:53 +00:00Commented Sep 29, 2015 at 18:01
-
\$\begingroup\$ no reason for code duplication here \$\endgroup\$George Polevoy– George Polevoy2015年09月29日 23:03:51 +00:00Commented Sep 29, 2015 at 23:03
-
\$\begingroup\$ @Cemafor is right -
ReadAsBool
should treat null as false, but your code will throw aNullReferenceException
(attempting to assign anull
value to anint
). And even if it wereField<int?>
, it would still returntrue
fornull
, asnull != 0
, right? And thanks for the style note, will keep that in mind. @GeorgePolevoy Mind to elaborate on that? The code duplication is very small, as the method doesn't do much anyway, but it would save the hundreds of millions of (potential) function calls to the first method. \$\endgroup\$Johan Hirsch– Johan Hirsch2015年09月30日 07:37:24 +00:00Commented Sep 30, 2015 at 7:37 -
\$\begingroup\$ @Heslacher I do distinguish between them (based on tables\columns in a table), so I need to expose both. \$\endgroup\$Johan Hirsch– Johan Hirsch2015年09月30日 07:44:58 +00:00Commented Sep 30, 2015 at 7:44
-
\$\begingroup\$ Then both versions (bit and integer) will work. The comment of Cemafor had been on an older version of this answer. The
ReadAsBool
doesn't have to treat null as false, because it is only used on non nullable columns. \$\endgroup\$Heslacher– Heslacher2015年09月30日 07:47:34 +00:00Commented Sep 30, 2015 at 7:47