Reading data with different but similar semantics

Question 1

I am writing code that reads data from a DB, where boolean values are stored as 0/1 nullable integers.

I decided to write a simple function to encapsulate the logic which converts the integer values to boolean. Thing is, I sometimes need to regard the db null value as boolean false, and sometimes just as null.

So I've come up with the following two extension methods:

// Read the specified column value from row as nullable boolean,
// treating integer 0 as false, and any other non-null value as true.
public static bool? ReadAsBoolNullable(this DataRow row, string columnName)
{
 if (row.IsNull(columnName))
 {
 return null;
 } 
 else 
 {
 return row[columnName] != 0;
 } 
}
// Read the specified column value from row as boolean,
// treating null and integer 0 as false, and any other value as true.
public static bool ReadAsBool(this DataRow row, string columnName) 
{
 bool? data = row.ReadAsBoolNullable(columnName);
 return data.HasValue ? data.Value : false;
}

My main concern about this code is that I potentially have hundreds of millions of records, each having up to 10 boolean (not nullable) columns. So the second function will be executed a lot and will have to call the first one a lot. Could this damage performance significantly, given that the function itself doesn't do much?

Question 2

As with all performance related questions, if and only if you can gather profiling evidence that shows that your current code is meaningfully detrimental to performance, don't call the first method in the second one, but rather repeat the code. This goes slightly against DRY (just barely, considering the very short code), but you will have to weigh the value of maintainability vs that of whatever performance gain is achieved.

In practice, I suspect that the effect on performance will be negligible if not non-existent. It's not as if you're not calling a slew of other methods and properties anyways.

The key point is profile profile profile.

Question 3

I would bet the database call is the actual bottleneck here. Time would probably be better served optimizing the database query or data manipulation algorithms if there is a performance issue. But as @Rotem said, it comes down to what comes out after profiling.

Question 4

After re-reading your question and some research the easiest way would be to just use the provided extension methods of the DataRowExtensions class in the System.Data.DataSetExtensions namespace.

This would result in the former methods to be changed like so

public static bool? ReadAsBoolNullable(this DataRow row, string columnName)
{
 return row.Field<bool?>(columnName);
}
public static bool ReadAsBool(this DataRow row, string columnName) 
{
 return row.Field<bool>(columnName);
}

or much better update the calling code to just use the extension methods provided by the DataRowExtensions class.

This will work only if the type of column in question is mysqlserver/sqlserver 'bit' type.

For a "true" integer type you should use the following

public static bool? ReadAsBoolNullable(this DataRow row, string columnName)
{
 int? currentValue = row.Field<int?>(columnName);
 if (currentValue.HasValue)
 {
 return currentValue.Value != 0;
 }
 return null;
}
public static bool ReadAsBool(this DataRow row, string columnName)
{
 return row.Field<int>(columnName) != 0;
}

which uses the said extensionmethods too and additional removes the redundant else of the ReadAsBoolNullable() method.

A style note

For an if..else construct like

if (someCondition)
{
 return someValue;
}
else
{
 return someOtherValue;
}

the else is redundant because if the condition is true it will never be reached.

Edit based on the comment

It might not have been very clear from the question, but I treat all the columns as potentially nullable. The difference between the two methods is just how they treat that potential null value.

This just makes no sense for me. If I query a table where I know that a column can't be null I would only call the ReadAsBool() method. It will slow down the whole process if you first call ReadAsNullable() and based on that result either return true or false.

So either you distinguish between nullable and non nullable columns or you don't.

Question 5

The way I understood the problem was that the consumer may want to treat a null in the database as null or as false, but this code doesn't know which they want so he is exposing both as an option. I think his concern is just that the second function may end up calling the first a lot and he is worried about the performance of that call.

Question 6

no reason for code duplication here

Question 7

@Cemafor is right - ReadAsBool should treat null as false, but your code will throw a NullReferenceException (attempting to assign a null value to an int). And even if it were Field<int?>, it would still return true for null, as null != 0, right? And thanks for the style note, will keep that in mind. @GeorgePolevoy Mind to elaborate on that? The code duplication is very small, as the method doesn't do much anyway, but it would save the hundreds of millions of (potential) function calls to the first method.

Question 8

@Heslacher I do distinguish between them (based on tables\columns in a table), so I need to expose both.

Question 9

Then both versions (bit and integer) will work. The comment of Cemafor had been on an older version of this answer. The ReadAsBool doesn't have to treat null as false, because it is only used on non nullable columns.

Rotem Rotem 6935 silver badges13 bronze badges · Accepted Answer · 2015-09-30 08:08:05Z

As with all performance related questions, if and only if you can gather profiling evidence that shows that your current code is meaningfully detrimental to performance, don't call the first method in the second one, but rather repeat the code. This goes slightly against DRY (just barely, considering the very short code), but you will have to weigh the value of maintainability vs that of whatever performance gain is achieved.

In practice, I suspect that the effect on performance will be negligible if not non-existent. It's not as if you're not calling a slew of other methods and properties anyways.

The key point is profile profile profile.

I would bet the database call is the actual bottleneck here. Time would probably be better served optimizing the database query or data manipulation algorithms if there is a performance issue. But as @Rotem said, it comes down to what comes out after profiling.

Stack Exchange Network

Reading data with different but similar semantics

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Reading data with different but similar semantics

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions