I have a small function that I would like to make cleaner and/or more efficient.
There is a directory that contains files in the format of <name>-<counter>-<datetime>.xml
.
When a new file is added to the directory, I have a function that reads the counter part of the file name of every file. Then it increments this value and uses it as the counter for the new file.
The code works as expected and has no bugs that I've found
One thing I'm not sure of, in terms of best practice, is if I should pass the path
into the function, then extract the files there, or if I should extract the files in the calling code and pass the list of files to my function.
public int GetCounterValue(string path)
{
var files = Directory.GetFiles(path, "*.xml", SearchOption.TopDirectoryOnly);
// Return 1 for initial value if no files are present in the directory.
if (files.Count() == 0)
return 1;
var counterValues = new List<int>();
foreach (var file in files)
{
// The [1] index is a magic number that corresponds to the placement of the counter value in the file names.
if (int.TryParse(file.Split('-')[1], out var counterValue))
counterValues.Add(counterValue);
}
// Take the value of the highest counter and increment by one for the next item in the sequence.
return counterValues.Max() + 1;
}
2 Answers 2
Passing in the directory or the files I don't feel is a big decision. Just need to pick a direction and go. If passing in an IEnumerable gives just a bit more freedom of where the files resides. Now they are in a local directory but in future maybe in an FTP site or someplace else or could even be split into multiple locations. Again nothing I would worry too much now. If moving from directory to http something is going to have to refactor at some point.
I would change Directory.GetFiles to Directory.EnumerateFiles will not have to wait for entire directory to enumerate before working with the files and should save some memory if the directory contains a lot of files.
Also the wild card for the directory should contain the separator. That way you know you can safely split the file and not get an index out of bound error when looking for array position 2 in the code. Look something like this
const string filePartSeparator = "-";
var fileCounterParts = Directory.EnumerateFiles(path, $"*{filePartSeparator}*{filePartSeparator}*.xml", SearchOption.TopDirectoryOnly)
.Select(file => file.Split(filePartSeparator)) // split into file parts
.Select(fileParts => fileParts[1]);
This will give back an IEnumerable of string that is just the "counter" in the file and because of wild card we will know can safely do fileParts[1].
For finding the max instead of building up a list, again keeping data in memory that doesn't need to be can just keep a running value and increase it if we find a bigger value. Something like
var nextId = 1;
foreach (var fileCounterPart in fileCounterParts)
{
if (int.TryParse(fileCounterPart, out var counterValue) && counterValue >= nextId)
{
nextId = ++counterValue;
}
}
return nextId;
Now just a general concern is nothing prevents two calls to this method and having both return the same id as a new file hasn't been created yet. It's not very multi thread or process friendly. Could implement some sort of locking but would need to be at a higher level than this code.
As alternative you can use regex to find the <counter>
part
-([^-]+)(?:-)
-
: It should start with a hyphen([^-]+)
: It should capture as a group[^-]
: It should not contain hyphen+
: It should match for the previous token 1 or many times
(?:-)
: It is a no-capturing group for hyphen
And here is a sample usage
static Regex counterPattern = new Regex("-([^-]+)(?:-)", RegexOptions.Compiled);
static void Main()
{
var filePath = "<name>-<counter>-<datetime>.xml";
var match = counterPattern.Match(filePath);
Console.WriteLine(match.Groups.Cast<Group>().Last());
}
- The regex is compiled for better performance
- The
Match
function call returns aMatch
- Its
Groups
property is defined asGroupCollection
so it should be casted toGroup
to be able to use Linq - The
Groups
contains two groups (-<counter>-
and<counter>
)- Since you want to parse it as an integer you need the second (
Last
)
- Since you want to parse it as an integer you need the second (
- Its