We have a
string text =
"SOME OTHER TEXT
WHITE SPACES AND NEW LINES
[HitObjects]
109,192,7241,1,0,0:0:0:0:
256,192,7413,1,0,0:0:0:0:
475,192,75865,1,0,0:0:0:0:
329,192,86524,1,0,0:0:0:0:
182,192,256242,1,0,0:0:0:0:
256,192,306521,1,2,0:0:0:0:
WHITE SPACES AND NEW LINES
"
The third number of every row is a timestamp in milliseconds. I need to know the number of seconds between the first and the last timestamp. Right now I'm doing it like this:
text = text.Substring(text.IndexOf("Objects]") + 8).Trim();
string[] lines = text.Split('\n');
string[] firstLine = lines[0].Split(',');
string[] lastLine = lines[lines.Length - 1].Split(',');
int length = Convert.ToInt32(lastLine[2]) - Convert.ToInt32(firstLine[2]);
length = length / 1000;
I need to do this with thousands of these texts though. Are there any optimizations or other methods possible?
More information, at the request of @Slothario:
- the string text is the full text of 1 file
- file type: .osu
- average file size: 25KB
- average characters: 30000
- ratio of SOME OTHER TEXT lines to HitObjects lines: about 1:100
- average amount of files to check: 500
1 Answer
If you want to support insanely large files with a small hardware footprint, you should use streaming. Something like:
using System;
using System.IO;
using System.Threading.Tasks;

public static class Program
{
    public static void Main(string[] args)
    {
        string text =
@"SOME OTHER TEXT
WHITE SPACES AND NEW LINES
[HitObjects]
109,192,7241,1,0,0:0:0:0:
256,192,7413,1,0,0:0:0:0:
475,192,75865,1,0,0:0:0:0:
329,192,86524,1,0,0:0:0:0:
182,192,256242,1,0,0:0:0:0:
256,192,306521,1,2,0:0:0:0:
WHITE SPACES AND NEW LINES
";
        Task.Run(async () =>
        {
            // This should be streamed from a disk or network stream or similar.
            using (var reader = new StringReader(text))
            {
                string line;
                var inScope = false;
                int? start = null;
                int last = 0;
                while ((line = await reader.ReadLineAsync()) != null)
                {
                    if (inScope)
                    {
                        // A hit object line has exactly six comma-separated
                        // fields; anything else marks the end of the section.
                        var values = line.Split(',');
                        if (values.Length != 6)
                            break;
                        // Remember the first timestamp, keep overwriting the last.
                        last = int.Parse(values[2]);
                        if (!start.HasValue)
                            start = last;
                    }
                    else if (line.StartsWith("[HitObjects]"))
                        inScope = true;
                }
                // Milliseconds to whole seconds.
                Console.WriteLine((last - start) / 1000);
            }
        });
        Console.ReadLine(); // Keeps the demo process alive until the task prints.
    }
}
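For real files you swap the StringReader for a StreamReader and the loop stays the same. Here is a minimal sketch of how the ~500 files could be processed this way; the folder path and the GetLengthInSecondsAsync helper are made up for illustration:

using System;
using System.IO;
using System.Threading.Tasks;

public static class OsuLengthScanner
{
    // Streams a single file and returns (last - first) hit object
    // timestamp in whole seconds, or null if no [HitObjects] section exists.
    public static async Task<int?> GetLengthInSecondsAsync(string path)
    {
        using (var reader = new StreamReader(path))
        {
            string line;
            var inScope = false;
            int? start = null;
            int last = 0;
            while ((line = await reader.ReadLineAsync()) != null)
            {
                if (inScope)
                {
                    var values = line.Split(',');
                    if (values.Length != 6)
                        break;
                    last = int.Parse(values[2]);
                    if (!start.HasValue)
                        start = last;
                }
                else if (line.StartsWith("[HitObjects]"))
                    inScope = true;
            }
            return start.HasValue ? (int?)((last - start.Value) / 1000) : null;
        }
    }

    public static async Task Main()
    {
        // Hypothetical folder; point this at wherever the .osu files live.
        foreach (var path in Directory.EnumerateFiles(@"C:\osu\maps", "*.osu"))
            Console.WriteLine($"{path}: {await GetLengthInSecondsAsync(path)} s");
    }
}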
Substring, Trim and Split all create copies that are not really needed. With custom parsing you could skip all this copying, but it will make the code much less simple. That said, have you tested whether this parsing is the bottleneck? Usually, when reading files from disk, a little inefficiency in the processing won't matter too much.
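A minimal sketch of what that custom parsing could look like, assuming .NET Core 2.1 or later, where int.Parse accepts a ReadOnlySpan<char>; ParseThirdField is a hypothetical helper, not part of the code above. It walks the commas with IndexOf instead of Split, so the only remaining allocation is the line string itself:

using System;

public static class HitObjectParser
{
    // Returns the third comma-separated field as an int, or null for
    // lines that are not hit objects (headers, blank lines, trailing text).
    public static int? ParseThirdField(string line)
    {
        int first = line.IndexOf(',');
        if (first < 0) return null;
        int second = line.IndexOf(',', first + 1);
        if (second < 0) return null;
        int third = line.IndexOf(',', second + 1);
        if (third < 0) return null;
        // Parsing the span directly avoids the Substring copy.
        return int.Parse(line.AsSpan(second + 1, third - second - 1));
    }
}

For example, ParseThirdField("109,192,7241,1,0,0:0:0:0:") returns 7241, while ParseThirdField("[HitObjects]") returns null.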