I am trying to check if a class has only numeric values in it and then return a bool
, however, some instances there may be non numeric chars in an object that should only contain numeric values.
At the present I am using reflection to loop through the object (Which in some cases can have up 200 properties) and then check the property value with Regex
to see if it is a numeric value. The reason for the regex instead of Int.TryParse
or Int64.TryParse
is b/c every now and again a non integer value is mixed in with the text, but I know the file is an all integer file from the way it is setup. This however is not the issue as the Regex is working as expected.
The structure of the StringCsv
and IntergerCsv
Entities are different (What I mean is that the structure of the CSV's with string's and int's are fundamentally different) , but I need to save the data to an Entity that is consistent, which means I need to check which CSV is being imported to differentiate between the methods needed to save to the database. ie, There is a date column, but it differs between the Interger
CSV and the String
CSV.
Also, I can't add the data dirrectly into the Entities StringCsv
and IntergerCsv
b/c they need an Id
which is a Guid
and the CSV file does not contain that, which is why I read to the the class GovernmentCsvRecord
and then covert the object to the Entity.
Below are samples of CSV files, sorry I can't post actual values.
The string file enter image description here
The Int file enter image description here
I should also mention that the first property will always be a GUID, that also needs to be taken into consideration. (I should probably be skipping that property anyway in my current code, but obviously don't.)
I should also mention I am using a custom library to import the CSV files.
To check the value is numeric:
static bool IsMatch(string str)
{
return Regex.IsMatch(str, @"^[abcd][\d0-9]{5}$|[\d0-9]",
RegexOptions.IgnoreCase | RegexOptions.Multiline);
}
Loop through each property and use above code to check the value,
static bool IsNumericFile<T>(List<T> list)
{
foreach (var item in list)
{
List<string> values
= typeof(T).GetProperties()
.Select(propertyInfo => propertyInfo.GetValue(item, null))
.Where(s => s != null)
.Select(s => s.ToString())
.Where(str => str.Length > 0)
.ToList();
foreach (var propertyString in values)
{
var isNumeric = IsMatch(propertyString);
//Jump out if non numeric value is found
if (!isNumeric) return false;
}
}
return true;
}
Reading the file,
// Get the data from the CSV
static List<T> ImportCsv<T>(string file, string delimeter = ",", bool hasHeader = false)
{
using (var reader = File.OpenText(file))
{
var csv = new CsvReader(reader);
csv.Configuration.Delimiter = delimeter;
csv.Configuration.HasHeaderRecord = hasHeader;
csv.Configuration.MissingFieldFound = null;
csv.Configuration.HeaderValidated = null;
csv.Configuration.Encoding = Encoding.GetEncoding("Shift_JIS");
return csv.GetRecords<T>().ToList();
}
}
//Convert the data to the required entities
public IEnumerable<StringCsv> ReadStringCsvFromFile(string filePath)
{
return ImportCsv<GovernmentCsvRecord>(filePath).Skip(3).Select(GovernmentCsvRecord.StringCsvGovernmentToEntity);
}
public IEnumerable<IntegerCsv> ReadIntCsvFromFile(string filePath)
{
return ImportCsv<GovernmentCsvRecord>(filePath).Skip(3).Select(GovernmentCsvRecord.IntCsvGovernmentToEntity);
}
I am purposefully not posting the complete GovernmentCsvRecord
object as it is an object with 200 string properties. I have posted the class with limited properties to illustrate how the class is set up with its properties and functions.
I convert to each entity via the functions in the GovernmentCsvRecord
, StringCsvGovernmentToEntity
and IntCsvGovernmentToEntity
.
The StringCsv
entity
public class GovernmentCsvRecord
{
public string Col1 { get; set; }
public string Col2 { get; set; }
public string Col3 { get; set; }
//etc 200 properties in class
public static StringCsv StringCsvGovernmentToEntity(GovernmentCsvRecord data)
{
StringCsv entity = new StringCsv();
entity.Id = Guid.NewGuid();
entity.Col1 = data.Col1;
entity.Col2 = data.Col2;
//etc until 200 properties are filled
}
//The `IntegerCsv` entity
public static IntegerCsv IntCsvGovernmentToEntity(GovernmentCsvRecord data)
{
IntegerCsv entity = new IntegerCsv();
entity.Id = Guid.NewGuid();
entity.Col1 = data.Col1;
entity.Col2 = data.Col2;
//etc until 200 properties are filled
}
}
Then the usage,
List<StringCsv> StringList = new List<StringCsv>();
List<IntegerCsv> IntList = new List<IntegerCsv>();
StringList = ReadStringCsvFromFile(FileName).ToList();
if (!IsNumericFile(StringList))
{
IntList = ReadIntCsvFromFile(FileName).ToList();
}
if (IntList.Count > 0)
{
//Do stuff with intList
}
else
{
//Do stuff with stringList
}
At present, it takes about 8 to 10ms to complete a check of one object, so performance wise it is not a huge issue, but I was wondering if there maybe was a better approach to do something like this?
1 Answer 1
- Instead of throwing away that list of
GovernmentCsvRecord
objects after converting it to aStringCsv
list, you could keep it around so you don't have to read the csv file again if theIsNumericFile
check fails. - It looks like you can validate the list of
GovernmentCsvRecord
objects directly instead of first converting them toStringCsv
objects. However, with the way you're converting them, and taking the numeric check into account, it's probably better to read each row into an array of strings. Apparently the easiest way to do that is to giveGovernmentCsvRecord
a singlestring[]
property. That lets you validate fields without having to use reflection. - That regular expression can be simplified a lot. The first part matches strings like
"a12345"
- anything that starts with an a, b, c or d, followed by exactly 5 digits. The second part matches anything that contains at least one digit - which covers everything that the first part covers, and more, so just the second part is sufficient. Also note that\d
matches decimal digits from a variety of scripts, including0-9
, so[\d0-9]
can be simplified to just\d
. But you may want to use0-9
instead, unless you also want to accept digits like'೬'
and'६'
.
-
1\$\begingroup\$ +1. And you absolutely correct. As I was re-writing the question, saw I was doing a fair bit of redundant code. As you say, validate the
GovernmentCsvReord
class and then convert to the needed entities. Thanks for the advice on the Regex, I am not very good with it, so that helps enormously as well. \$\endgroup\$KyloRen– KyloRen2019年07月10日 23:22:35 +00:00Commented Jul 10, 2019 at 23:22 -
2\$\begingroup\$ Another optimization is to create and reuse a single (compiled)
Regex
instance instead of calling the staticIsMatch
method, so the regex engine doesn't have to parse the pattern each time. Or, for such a simple pattern,string.Any(char.IsDigit)
(for\d
) orstring.Any(c => c >= '0' && c <= '9')
(for0-9
) is even faster. \$\endgroup\$Pieter Witvoet– Pieter Witvoet2019年07月10日 23:38:47 +00:00Commented Jul 10, 2019 at 23:38 -
1\$\begingroup\$ Doing that instead of Regex brings 1500 line file with 200 properties down to 20ms from 30ms. One third faster is on just that is awesome. Overall I think I have reduced run time to half its original time it took. Thanks for the help.There are a couple more things I think I can do, but this has got me on the right track. \$\endgroup\$KyloRen– KyloRen2019年07月11日 00:29:31 +00:00Commented Jul 11, 2019 at 0:29
StringCsv
andIntegerCsv
? The nameReadIntCsvFromFile
implies that it reads numeric data, but it's only called when the input contains non-numeric fields, so what doesIntCsvGovernmentToEntity
do thatStringCsvGovernmentToEntity
does not? And why read intoGovernmentCsvRecord
objects only to convert immediately afterwards - why not read directly into aStringCsv
object, and convert that to anIntegerCsv
when necessary? \$\endgroup\$