I've build simple application that allows me to generate all numbers that have desired length and start with prefix.
For example if I specify 12
as prefix and set desired length to 4 it will generate numbers from 1200
to 1299
.
I'm storing all generated numbers in List<int>
then using shuffe method from Jon's answer I'm saving them in pseudo-random order to files, 200000 records per file.
This is my code:
public partial class Form1 : Form
{
private List<int> _prefixes;
private List<int> _results;
private readonly Object _resultsLock = new Object();
public Form1()
{
InitializeComponent();
}
private void button1_Click(object sender, EventArgs e)
{
_prefixes = new List<int>();
_results = new List<int>();
int i = 0;
foreach (string line in AllPrefixes.Lines.Where(l=>!string.IsNullOrWhiteSpace(l) && Int32.TryParse(l, out i)))
{
_prefixes.Add(i);
}
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
_prefixes.AsParallel().ForAll(item =>
{
string prefix = item.ToString();
//here I'm determining how many numbers i must generate?
//can't his be done simpler?
var count = Convert.ToInt32("1".PadRight(9 - prefix.Count() + 1, '0'));
for (int j = 0; j < count; j++)
{
var res = prefix + j.ToString().PadLeft(9 - prefix.Length, '0');
lock (_resultsLock)
{
_results.Add(Convert.ToInt32(res));
}
}
});
stopwatch.Stop();
Debug.WriteLine("Time elapsed (s): {0}", stopwatch.Elapsed.TotalSeconds);
}
//Can't this be done simpler and also parallel?
private void SaveClick(object sender, EventArgs e)
{
const string dir = @"C:\TESTS";
int fileCount = 1;
var file = Path.Combine(dir, string.Format("{0}.csv", fileCount));
var sw = new StreamWriter(file, false);
int i = 0;
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
var rnd = new Random();
foreach (int res in _results.Shuffle(rnd))
{
sw.WriteLine(res);
i++;
if (i % 200000 != 0) continue;
fileCount++;
sw.Close();
file = Path.Combine(dir, string.Format("{0}.csv", fileCount));
sw = new StreamWriter(file, false);
}
sw.Close();
stopwatch.Stop();
Debug.WriteLine("Time elapsed (s): {0}", stopwatch.Elapsed.TotalSeconds);
}
}
For 200 prefixes all 9 digits numbers are generated on my PC in about 80-90 seconds, when I add some more prefixes I get OutOfMemory
exception.
Saving to files takes about 6-8 minutes, probably because I store 200000 results per file and I have more than 100 milion of generated results.
I'd like to optimize this as much as possible, time is priority, but memory usage is most important.
All suggestions are welcome!
First fix (thanks to RobH) - remove AsParallel
private void button1_Click(object sender, EventArgs e)
{
_prefixes = new List<int>();
_results = new List<int>();
int i = 0;
foreach (string line in AllPrefixes.Lines.Where(l=>!string.IsNullOrWhiteSpace(l) && Int32.TryParse(l, out i)))
{
_prefixes.Add(i);
}
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
foreach (var p in _prefixes)
{
string prefix = p.ToString();
var count = Convert.ToInt32("1".PadRight(9 - prefix.Count() + 1, '0'));
for (int j = 0; j < count; j++)
{
var res = prefix + j.ToString().PadLeft(9 - prefix.Length, '0');
_simpleResults.Add(Convert.ToInt32(res));
}
}
stopwatch.Stop();
Debug.WriteLine("Time elapsed (s): {0}", stopwatch.Elapsed.TotalSeconds);
}
This decreased time by 40 seconds (almost 50%), from 80-90, to about 40-50 seconds. Thanks!
2 Answers 2
//here I'm determining how many numbers i must generate?
//can't his be done simpler?
var count = Convert.ToInt32("1".PadRight(9 - prefix.Count() + 1, '0'));
Yes, I think it can be simpler. You basically want to generate from the minimum value to the maximum value while padding zeros to make sure everything is the same length as enteredLength - prefixLength
.
I think you would be better off doing something like this:
List<string> generatedItems = new List<string>();
int lengthToGenerate = enteredLength - prefix.Length;
for (int i = int.Parse(prefix + new string('0', lengthToGenerate)); i <= int.Parse(prefix + new string('9', lengthToGenerate)); i++)
{
generatedItems.Add(i.ToString());
}
Let's take a closer look at this. You'll notice that I save the generated items as a collection of strings. We can do this because they need to be written to a file anyway so it doesn't matter if it's a string or an integer.
int i = int.Parse(prefix + new string('0', lengthToGenerate)); i <= int.Parse(prefix + new string('9', lengthToGenerate))
This line of code allows us to loop from 0
to 9
or 99
or 999
whatever length we need. In effect this means we can iterate every value inbetween.
Afterwards we create our entire new string by simply calling .ToString()
on the generated number.
The difference with your code is that it now skips every Convert.ToInt32()
call and a whole lot of padding.
Since it's going to be a bunch of unique values anyway, you might as well use a HashSet<T>
instead of a List<T>
.
You're constantly locking the result set. It's probably faster to just omit the parallellism altogether since we're not doing that much work anyway but if you really want to have it, I would suggest looking into splitting the to-generate dataset beforehand, computing the data and storing it in separate collections and at the end combining those collections. Though that might be slightly overdoing things.
Automatically dispose of your StreamWriter
by using a using
block:
foreach(int res in _results.Shuffle(rnd))
{
using(var sw = new StreamWriter(file, false)
{
// Get funky
}
}
-
\$\begingroup\$ Thank You for all those suggestions. as first I've tried using
List<string>
but I gotOutOfMemory
exceptions a lot. I'll try to apply all improvements suggested by You and I'll get back to You. P.S. You suggested usingTake
to get items I must save it single file, How to use it to save all items? How should I use it in loop? \$\endgroup\$Misiu– Misiu2015年10月13日 10:48:19 +00:00Commented Oct 13, 2015 at 10:48 -
\$\begingroup\$ Mhhm yeah, I guess that a string will take more memory than an integer. I might consider writing the data to a file sooner so you can let the garbage collector collect those unnecessary strings but that would require a change in the way your application works. The
Take()
suggestion wasn't a good one -- I removed it. \$\endgroup\$Jeroen Vannevel– Jeroen Vannevel2015年10月13日 10:53:01 +00:00Commented Oct 13, 2015 at 10:53 -
\$\begingroup\$ I don't mind changing my application. It must generate all numbers from given prefixes, shuffle them and save to files, 200 000 records per file. I know how to use StreamWriter in
using
but I must create new file for each group (200 000 records) \$\endgroup\$Misiu– Misiu2015年10月13日 10:56:04 +00:00Commented Oct 13, 2015 at 10:56 -
\$\begingroup\$ You can just put the
file
reassignment above theusing(StreamWriter)
statement and it will use the new file for each new writer. \$\endgroup\$Jeroen Vannevel– Jeroen Vannevel2015年10月13日 10:59:43 +00:00Commented Oct 13, 2015 at 10:59 -
\$\begingroup\$ I've edited my question with fix based on Your suggestion. Thank You for that! Only writing results to files are left. \$\endgroup\$Misiu– Misiu2015年10月13日 11:20:36 +00:00Commented Oct 13, 2015 at 11:20
In addition to what has been said already (and if I understood the problem correctly), I'd say that you are complicating things too much. Let's think about it (on a single prefix for now):
- We have a prefix indicated by a string (that has a numeric format alright, but it is a string)
- We have a max length, let's call it
maxLength
, of the numbers you can consider. - The allowed length of the prefix, let's call it
prefixLength
, should be<= maxLength
- We need all the numbers that start with the given prefix and have the given
maxLength
Given the premises, we could do the following (in pseudo code, similar to C#):
List<int> GetAllNumbers(string prefix, int maxLength)
{
int minNumber = int.Parse(prefix);
int maxNumber = minNumber + 1;
int lengthDifference = maxLength - prefix.Length;
List<int> result = new List<int>();
for(int i = 0; i < lengthDifference; i++)
{
minNumber *= 10;
maxNumber *= 10;
}
for(int i = minNumber; i < maxNumber; i++)
{
result.Add(i);
}
return result;
}
Let's take an example. If we have that prefix = "15"
and maxLength = 5
we'd need the numbers from 15000
to 15999
. We just parse prefix
to int
, multiply it by 10 for 3 times (length of prefix
is 2 and maxLength
is 5) and return the numbers from 15000 to 15999 (maxNumber
is set to 16000 which is the first value that is not allowed).
In addition, if you just need an IEnumerable<int>
the previous method would be something like the following:
IEnumerable<int> GetAllNumbers(string prefix, int maxLength)
{
int minNumber = int.Parse(prefix);
int maxNumber = minNumber + 1;
int lengthDifference = maxLength - prefix.Length;
for(int i = 0; i < lengthDifference; i++)
{
minNumber *= 10;
maxNumber *= 10;
}
for(int i = minNumber; i < maxNumber; i++)
{
yield return i;
}
}
The methods are not tested, they are just an idea of how to approach the problem for a single prefix.
In the case of multiple prefixes we could also execute the method in a different thread for each prefix.
Also, I'd suggest you transform private List<int> _prefixes;
into private IEnumerable<int> _prefixes;
or, better yet, into private IEnumerable<string> _prefixes;
(see point 1 and this page for the reason why). In the last case we could transform:
_prefixes = new List<int>();
// other code here
foreach (string line in AllPrefixes.Lines.Where(l=>!string.IsNullOrWhiteSpace(l) && Int32.TryParse(l, out i)))
{
_prefixes.Add(i);
}
into something like:
_prefixes = AllPrefixes.Lines.Where(l=>!string.IsNullOrWhiteSpace(l) && Int32.TryParse(l, out i));
After that we just need to do something like:
foreach (var prefix in _prefixes)
{
_results.AddRange(GetAllNumbers(prefix, maxLength));
}
and then proceed as you are already doing (randomize the _results
list and write it in a file, or whatever the case may be).
Let me know if something is unclear.
Explore related questions
See similar questions with these tags.
AsParallel
and get rid of the locking. \$\endgroup\$123
,1254
,22456
. So they can be different length. \$\endgroup\$