I am working with a 3th party library which has following interface defined for a processing task:
public interface IProcessor {
void Execute( FileContent fileContent );
}
Such an IProcessor
gives me the possibility to change the FileContent
before it gets written to the database. For legacy reasons, and for integration reasons with an unnamed ERP, its unique identifiers are string identifiers which have a limitation in length, and they are padded on the left with padding char '0'
.
Names will therefore always have the form of 00100
or 02301
or 10179
for a 5 digit identifier.
Sometimes, we need to add new content inside the fileContent.Components
, but we are not allowed to change the existing identifiers. Theoretically any number of items could be added to the components, but for the question at hand, I simplified it to just adding 20 items.
The existing identifiers do not have any order to them, nor can I be sure that the highest number inside the string would be lower than the theoretical maximum number for this identifier.
To handle this addition, I thought I could use a IEnumerator<string>
function (not class) as an iterator, that creates the next number starting from the theoretical minimum value to the theoretical maximum value, and throw when it exceeds the maximum value.
public static class GapHelper {
public static IEnumerator<string> GetNextName( int maxSize, IEnumerable<string> items, Func<int, string> transform ) {
var maxLength = (long)Math.Pow( maxSize, 10 );
var hashSet = new HashSet<string>( items );
for (var i = 0; i < maxLength; i++) {
var text = transform(i);
if (hashSet.Contains( text )) {
continue;
}
yield return text;
}
throw new ArgumentOutOfRangeException( "Iteration exceeded maximum numbers of items at " + maxSize );
}
public static IEnumerator<string> GetNextName( int maxSize, IEnumerable<string> items) {
return GetNextName( maxSize, items, i => i.ToString().PadLeft( maxSize, '0' ) );
}
}
and this would be used inside such an IProcessor
in the following way:
public class ProcessingTask : IProcessor {
public void Execute( FileContent fileContent ) {
using (var nameGenerator = GapHelper.GetNextName( 5, fileContent.Components.Select( c => c.Name ) ) ) {
for (var i = 0; i < 20; i++) {
nameGenerator.MoveNext();
fileContent.Components.Add( new Component {
Name = nameGenerator.Current, Content = "Foo"
} );
}
}
}
}
What I cannot change:
- How the processor is called, and with which parameters
- How the file is processed afterwards, and might have been processed before
- My bracket positioning, we have them at the end of each line
- I am aware I didn't add documentation to the methods
A full simplified example can be found in this dotnetfiddle.
I am interested in how this code looks from maintainability perspective, and what other potential options I might have overlooked. I am also curious if having an IEnumerator<string>
implemented as a method rather than a class makes a big difference, either for this implementation or from an iterator perspective.
1 Answer 1
Possible Bug
Since Func<int, string> transform
is not guarded against collisions (different int
values may yield the same string result
), the next line is wrong. It only checks for unique elements in the already available data items
, not in the new results fetched from transform
.
if (hashSet.Contains( text )) {
This could be solved by trying to add each element to the set instead:
if (!hashSet.Add( text )) {
LINQ
Furthermore, I would use an IEnumerable
to allow for some LINQ chaining.
public static IEnumerable<string> GetNextName(
int maxSize, IEnumerable<string> items, Func<int, string> transform) {
So we could just call:
public class ProcessingTask : IProcessor {
public void Execute( FileContent fileContent ) {
GapHelper.GetNextName(5, fileContent.Components
.Select(c => c.Name)).Take(20).Select(x
=> new Component { Name = x, Content = "Foo" })
.ToList().ForEach(fileContent.Components.Add);
}
}
Which yields better readability. Also a fun fact about disposing an enumerator from the initial code:
using (var nameGenerator = // ..
-
1\$\begingroup\$ Ha, good one on the bug, I had actually refactored that for the question, but didn't test it with anything else ^_^ before that the padding was done in the method, but it didn't feel right :) On the topic of chaining, that would be nice, but I am iterating an external file given to me (I know that the question didn't reflect that, so sorry), and I will have 3 different iterators there \$\endgroup\$Icepickle– Icepickle2019年10月01日 08:10:57 +00:00Commented Oct 1, 2019 at 8:10
-
\$\begingroup\$ I only don't get the comment on disposing an enumerator, I honestly don't mind it, it makes sense to have it disposed since I will probably never really iterate the full list, so having it in a using feels like second nature to me, as it implements the
IDisposable
\$\endgroup\$Icepickle– Icepickle2019年10月01日 08:29:19 +00:00Commented Oct 1, 2019 at 8:29 -
\$\begingroup\$ Eric Lippert suggests not to bother with disposing enumerators, since its only purpose is to for some interop. But it can't hurt ether if you keep using it. \$\endgroup\$dfhwze– dfhwze2019年10月01日 09:33:35 +00:00Commented Oct 1, 2019 at 9:33
-
-
\$\begingroup\$ @t3chb0t He has a more recent post where he argued otherwise. \$\endgroup\$dfhwze– dfhwze2019年10月06日 08:58:31 +00:00Commented Oct 6, 2019 at 8:58
n
number of items or can have. In other words do you add fake items to the result as you are doing here or do you stop when there are no more items to add. \$\endgroup\$