This is legacy, so I'm not really looking for improvements on architectural approach.
Here's what the method does:
- Extracts filename from a string containing full file path (passed in as parameter)
- Checks to see if the filename extracted is not null or empty or just white spaces
If check in step 2 passes, then it uses naming conventions in the filename, to determine the logical "Type" for the file as follows:
- If filename contains
__Import__
, then setType
toSource Data
- If filename contains
__Export__
, then setType
toDestination Data
- If filename contains
__Transform__
, then setType
toTransformational Data
- If none of the conditions from 3.1 through 3.3 are met, then default the "Type" to
General
.
- If filename contains
Is there a way to make the if...else if... else if...
block more concise, and elegant without loss of functionality?
private static string GetFileDataType(string fullFileNameWithPath)
{
// extract filename only
var fileName = Path.GetFileNameWithoutExtension(fullFileNameWithPath);
var fileDataType = string.Empty;
// if filename is not empty
if (!string.IsNullOrWhiteSpace(fileName))
{
// if... else... to determine type of file contents based on file name convention
// legacy code, so not looking for architectural improvements etc... :)
if (fileName.Contains("__Import__"))
{
fileDataType = "Source Data";
}
else if (fileName.Contains("__Export__"))
{
fileDataType = "Destination Data";
}
else if (fileName.Contains("__Transform__"))
{
fileDataType = "Transformational Data";
}
else
{
fileDataType = "General";
}
}
return fileDataType;
}
-
1\$\begingroup\$ You need to explain what your code is doing first. \$\endgroup\$t3chb0t– t3chb0t2018年08月01日 17:24:20 +00:00Commented Aug 1, 2018 at 17:24
-
\$\begingroup\$ Apologies, thought it was "self describing"... my bad. I've added the explanation to my question. \$\endgroup\$Shiva– Shiva2018年08月01日 17:38:00 +00:00Commented Aug 1, 2018 at 17:38
-
2\$\begingroup\$ I changed the title so that it describes what the code does per site goals: "State what your code does in your title, not your main concerns about it.". Feel free to edit and give it a different title if there is something more appropriate. \$\endgroup\$Sᴀᴍ Onᴇᴌᴀ– Sᴀᴍ Onᴇᴌᴀ ♦2018年08月01日 17:43:20 +00:00Commented Aug 1, 2018 at 17:43
-
\$\begingroup\$ @SamOnela I looked at the site goals link you provided and you are right. The title looks appropriate :) \$\endgroup\$Shiva– Shiva2018年08月01日 17:57:12 +00:00Commented Aug 1, 2018 at 17:57
5 Answers 5
I find the cleanest way is to replace the ugly if/else if/else
with a dictionary:
private static IDictionary<string, string> FileDataTypes = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
["Import"] = "Source Data",
// ..
}
The also not so pretty Contains
can be replaced with a nice regex that will grow or shirnk automatically if you add/remove mappings:
private static string GetFileDataType(string fullFileNameWithPath)
{
var fileName = Path.GetFileNameWithoutExtension(fullFileNameWithPath);
var fileContentsMatch = Regex.Match(fileName, $"__(?<Type>{string.Join("|", FileDataTypes.Keys)})__", RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase);
return
fileContentsMatch.Success && FileDataTypes.TryGetValue(fileContentsMatch.Groups["Type"].Value, out var fileDataType)
? fileDataType
: FileDataTypes[string.Empty];
}
Notice that both the dictionary and the regex are case insensitive. You should always make paths case insensitive in Windows. You are lucky that this worked for so long.
(Dislaimer: It's just an example so I was to lazy to implement all the empty/null checks)
The method you have provided looks clean, is short and is easy to understand. It does what it should do but calls Contains()
in the worst case 3 times which could be avoided.
A filedatatype other than General
can only happen if at least the delimiter __
is found twice. So why don't we Split()
the filname and if we get an array which length is at least 3
we have something to work with.
Let us assume the filename without extension reads SomeFile__Export__SomeMoreText
if we call Split()
using __
as splitting argument we would get {"SomeFile", "Export", "SomeMoreText"}
now we can easily iterate over the array starting at the second element and ending at the element before last and using a switch
on each element to gain the desired filedatatype.
Overall this would look like
private static string GetFileDataType(string fullFileNameWithPath)
{
var fileName = Path.GetFileNameWithoutExtension(fullFileNameWithPath);
if (string.IsNullOrWhiteSpace(fileName)) { return string.Empty; }
var possibleFileDataTypes = fileName.Split(new string[] { "__" }, StringSplitOptions.None);
if (possibleFileDataTypes.Length < 3) { return "General"; }
for (var i = 1; i < possibleFileDataTypes.Length - 1; i++)
{
switch (possibleFileDataTypes[i])
{
case "Import":
return "Source Data";
case "Export":
return "Destination Data";
case "Transform":
return "Transformational Data";
}
}
return "General";
}
and produces the following results
GetFileDataType(string.Empty) -> string.Emtpty
GetFileDataType(null) -> string.Emtpty
GetFileDataType(@"C:\folder\Text__Export__SomeOtherText.txt") -> "Destination Data"
GetFileDataType(@"C:\folder\Text__SomeOtherText__SomeMoreText.txt") -> "General"
GetFileDataType(@"C:\folder\Some__File__Import__Some__More__Text.txt") -> "Source Data"
GetFileDataType(@"C:\folder\__Transform__Some__More__Text.txt") -> "Transformational Data"
GetFileDataType(@"C:\folder\__Export__.txt") -> "Destination Data"
GetFileDataType(@"C:\folder\__Export.txt") -> "General"
GetFileDataType(@"C:\folder\____.txt") -> "General"
GetFileDataType(@"C:\folder\____.txt") -> "General"
You're searching through the file name for each type until you find one. If you use the __
as delimiters and find the indexes of the string between them, you will only search the whole string once.
By extracting the substring with the type, now you can use a switch
block to return the appropriate value:
private static string GetFileDataType(string fullFileNameWithPath)
{
const string Delimiter = "__";
// extract filename only
var fileName = Path.GetFileNameWithoutExtension(fullFileNameWithPath);
var fileDataType = string.Empty;
int indexA = fileName.IndexOf(Delimiter);
string type = "";
if(indexA != -1 && indexA != 0)
{
indexA += 2;
int indexB = fileName.IndexOf(Delimiter, indexA);
if(indexB != -1)
{
type = fileName.Substring(indexA, indexB - indexA);
}
}
switch(type)
{
case "Import":
return "Source Data";
case "Export":
return "Destination Data";
case "Transform":
return "Transformational Data";
default:
return "General";
}
}
If the string is empty or malformed "General" will be returned. Because the check for this is in one place it can easily be changed, if desired.
-
\$\begingroup\$ You can just use
.Trim('_')
to remove the leading and trailing underscores ;-) \$\endgroup\$t3chb0t– t3chb0t2018年08月01日 18:53:24 +00:00Commented Aug 1, 2018 at 18:53 -
2\$\begingroup\$ Oh, or not... I now see it can by anywhere... well, then let's regex it. \$\endgroup\$t3chb0t– t3chb0t2018年08月01日 19:01:04 +00:00Commented Aug 1, 2018 at 19:01
-
1\$\begingroup\$ That would probably work. But typically filenames aren't very long and unless they're processing millions of filenames, I'm not sure if there would be any appreciable performance gain. \$\endgroup\$user33306– user333062018年08月01日 19:04:10 +00:00Commented Aug 1, 2018 at 19:04
-
1\$\begingroup\$ @Shiva exactly, it's the pattern that matches the names and stores the match in the named group
Type
. \$\endgroup\$t3chb0t– t3chb0t2018年08月01日 19:57:11 +00:00Commented Aug 1, 2018 at 19:57 -
2\$\begingroup\$ This doesn't produce the same result like the original code. For
__Transform__SomeMoreText
it returns "General", fornull
it throws an exception, forstring.Empty
it returns "General" instead ofstring.empty
, for__Some__Export__SomeMoreText
it returns "General" instead ofDestination Data
\$\endgroup\$Heslacher– Heslacher2018年08月02日 05:45:39 +00:00Commented Aug 2, 2018 at 5:45
A different approach based on the idea to use something like a dictonary (t3chb0t's answer) is using two arrays of strings internally. One to check if this "string-component" is present in the file name, the other one to return the file type.
Using these two you need to iterate the first array and return from the second array using same index if you find something.
This removes the if/else structure and seems to be easy to read to me, eventough (or maybe because) it is not as advanced as the other answers and keeps the contains().
private static string GetFileDataType(string fullFileNameWithPath)
{
// extract filename only
var fileName = Path.GetFileNameWithoutExtension(fullFileNameWithPath);
if (string.IsNullOrWhiteSpace(fileName))
return string.Empty;
String[] fileNameComponents = { "__Import__", "__Export__", "__Transform__" };
String[] fileTypes = { "Source Data", "Destination Data", "Transformational Data" };
for (int i = 0; i < fileNameComponents.Length; i++)
{
if (fileName.Contains(fileNameComponents[i]))
return fileTypes[i];
}
return "General";
}
This approach only aims at the core of the question (removing the if/else) whitout considering any improvments to performance.
Using a declarative approach instead of an imperative one allows a more expressive code.
private static string GetFileDataType(string fullFileNameWithPath)
{
var fileName = Path.GetFileNameWithoutExtension(fullFileNameWithPath);
return matchingRules.First(tuple => tuple.Item1(fileName)).Item2;
}
With matchingRules
declared as follow:
var matchingRules = new List<(Func<string, bool>, string)>();
matchingRules.Add((f => !string.IsNullOrWhiteSpace(f) && f.Contains("__Import__"), "Source Data"));
matchingRules.Add((f => !string.IsNullOrWhiteSpace(f) && f.Contains("__Export__"), "Destination Data"));
matchingRules.Add((f => !string.IsNullOrWhiteSpace(f) && f.Contains("__Transform__"), "Transformational Data"));
matchingRules.Add((f => !string.IsNullOrWhiteSpace(f), "General"));
matchingRules.Add((f => true, string.Empty));
There are two drawbacks with this solution however:
- Since each condition should be auto-sufficient, one has to repeat the
IsNullOrWhiteSpace
many times - Rules ordering matters (not a real drawback because the same ordering also applies with the imperative approach)