The following snippet counts the lines of a CSV file using BinaryReader. Currently it treats \r followed by \n as the line delimiter.
    private static int GetLineCount(string fileName)
    {
        BinaryReader reader = new BinaryReader(File.OpenRead(fileName));
        int lineCount = 0;
        char lastChar = reader.ReadChar();
        char newChar = new char();
        do
        {
            newChar = reader.ReadChar();
            if (lastChar == '\r' && newChar == '\n')
            {
                lineCount++;
            }
            lastChar = newChar;
        } while (reader.PeekChar() != -1);
        return lineCount;
    }
I want to use the Environment.NewLine string instead, so that the code works on both Windows and Unix.
I want to refactor the code above so that it looks for occurrences of a multi-character string and matches against Environment.NewLine. The problem is that I cannot work out how to refactor the following loop to handle a string (more specifically, how to change lastChar and newChar into an array).
    do
    {
        newChar = reader.ReadChar();
        if (lastChar == '\r' && newChar == '\n')
        {
            lineCount++;
        }
        lastChar = newChar;
    } while (reader.PeekChar() != -1);
1 Answer
On Windows, Environment.NewLine always returns \r\n, whatever line endings the file itself uses, so it won't help you parse different line endings.
If your task is just to count the number of lines, it would be much easier to do something like:
    // Requires: using System.IO; using System.Linq;
    private static int GetLineCount(string fileName)
    {
        return File.ReadLines(fileName).Count();
    }
The ReadLines method reads the file lazily, one line at a time, and automatically recognizes \r, \n, and \r\n as line endings.
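If allocating a string per line turns out to be too costly on very large files, the same line-ending rules can be applied at the byte level. The following is only a minimal sketch and not part of the original answer: the class name LineCounter, the method name CountLines, and the 64 KB buffer size are assumptions chosen for illustration, and it assumes an ASCII-compatible encoding such as UTF-8 (it would miscount UTF-16 files).

```csharp
using System;
using System.IO;

public static class LineCounter
{
    // Hypothetical helper: counts lines in a stream, treating "\r\n", "\n",
    // and a lone "\r" each as a single line terminator.
    // Assumes an ASCII-compatible encoding (ASCII, UTF-8, Latin-1).
    public static long CountLines(Stream stream)
    {
        var buffer = new byte[64 * 1024]; // buffer size is an arbitrary choice
        long lineCount = 0;
        bool previousWasCr = false;   // the last byte seen was '\r'
        bool pendingContent = false;  // bytes seen since the last terminator
        int bytesRead;

        while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < bytesRead; i++)
            {
                byte b = buffer[i];
                if (b == (byte)'\r')
                {
                    lineCount++;
                    previousWasCr = true;
                    pendingContent = false;
                }
                else if (b == (byte)'\n')
                {
                    // The '\n' of a "\r\n" pair was already counted at the '\r'.
                    if (!previousWasCr) lineCount++;
                    previousWasCr = false;
                    pendingContent = false;
                }
                else
                {
                    previousWasCr = false;
                    pendingContent = true;
                }
            }
        }

        // Count a final line that has no trailing terminator.
        if (pendingContent) lineCount++;
        return lineCount;
    }

    // Convenience overload that opens the file and disposes the stream.
    public static long CountLines(string fileName)
    {
        using (FileStream stream = File.OpenRead(fileName))
        {
            return CountLines(stream);
        }
    }
}
```

Because the state is carried in two booleans rather than in a buffer position, a \r\n pair that happens to straddle two reads is still counted once.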
- 1. Environment.NewLine returns the platform's line terminator. On Unix it is LF (\n), on Mac it is CR (\r), on Windows it is CRLF (\r\n). 2. Try running the code on large files (> 2 GB); it will throw OutOfMemoryException. – Tilak, Jan 12, 2013 at 12:57
- @Tilak: Why would you count the lines of a 2 GB file? – user21077, Jan 12, 2013 at 19:49
- 2 GB is for illustration. The files are large (100 MB to 1 GB), but to answer your question, I need to show a message like "Processing x row in Total N rows". – Tilak, Jan 12, 2013 at 19:53
- @Tilak, I've used the ReadLines method, which returns a stream of lines. You probably confused it with ReadAllLines, which indeed loads all lines into memory. The only case in which this code would throw OutOfMemoryException is a file containing a single line longer than 2 GB, which is very unlikely for text files. – almaz, Jan 12, 2013 at 20:22
- @Tilak, about Environment.NewLine: are you talking about running this .NET code on Mac/Unix, or about parsing files coming from those platforms? The .NET Framework explicitly specifies \r\n as the value of Environment.NewLine; not sure about Mono. – almaz, Jan 12, 2013 at 20:30
- BinaryReader and File.OpenRead() are IDisposable resources; wrap them in a using statement for deterministic disposal.
- [...] Environment.NewLine. At a higher level, I am finding the exact line count for big files (> 1 GB) for progress reporting. Currently it is done using the stream's current read position divided by the file length, but that is not acceptable in our overall context (it is inconsistent because other file types are involved).
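The disposal advice in the comments could be applied to the question's method roughly as follows. This is only a sketch: it keeps the question's \r\n-only counting, the class name CrLfCounter is a made-up wrapper, and checking PeekChar before each read (so that empty or one-character files do not throw) is an addition that the original code does not have. It assumes plain ASCII content.

```csharp
using System.IO;

public static class CrLfCounter
{
    // Counts "\r\n" pairs, as the question's code does, but disposes the
    // stream and the reader deterministically.
    public static int GetLineCount(string fileName)
    {
        using (FileStream stream = File.OpenRead(fileName))
        using (var reader = new BinaryReader(stream))
        {
            int lineCount = 0;
            int lastChar = -1; // -1 means "no character read yet"

            // Peek before reading so that short or empty files are handled.
            while (reader.PeekChar() != -1)
            {
                char newChar = reader.ReadChar();
                if (lastChar == '\r' && newChar == '\n')
                {
                    lineCount++;
                }
                lastChar = newChar;
            }
            return lineCount;
        }
    }
}
```

Stacking the two using statements releases the file handle as soon as the method returns, even if ReadChar throws part-way through the file.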