I have written a simple program to read file that contains record spread between multiple lines. There is requirement to read file from source that is dirty and contains line breaks which are unexpected.
I would like to tune this if there is any suggestions. I am very new to Java programming. This is just to demonstrate to a technical team a algorithm to perform the same. It might be dirty but any feedback in welcome.
import java.io.*;
import java.util.*;
import java.util.function.Consumer;
import java.util.stream.Stream;
final class RawFile{
private InputStreamReader filereader;
private char dlm; // Delimeter char of the file
private int maxdlmcnt; // depending on number of columns assign num of dlm expected
private int cntdlm; // this is a counter used while looping the file
private StringBuffer bufferline; // this is to buffer if there is unexpected line breaks
RawFile(String filename,int numcolmns) throws FileNotFoundException
{
try {
filereader=new InputStreamReader(new FileInputStream(filename), "UTF-8");
dlm=' ';
} catch (UnsupportedEncodingException e) {
// TODO Auto-generated catch block
System.out.println("Encoding not supported");
}
}
RawFile(String filename,int numcolmns, char seperator) throws FileNotFoundException
{
try {
filereader=new InputStreamReader(new FileInputStream(filename), "UTF-8");
dlm=seperator;
maxdlmcnt=numcolmns-1;
} catch (UnsupportedEncodingException e) {
// TODO Auto-generated catch block
System.out.println("Encoding not supported");
}
}
int countdelimiters(char chararr[])
{
int cnt=0;
for(int i=0; i<= chararr.length-1;i++){
if(chararr[i]==dlm) cnt=cnt+1;
}
return cnt;
}
List<String> readline() throws IOException
{
List<String> recordlist=new ArrayList<String>();
BufferedReader filerecords = null;
Stream<String> recordstream = null;
try{
bufferline=new StringBuffer();
filerecords=new BufferedReader(filereader);
recordstream=filerecords.lines();
Consumer<String> readline = (rawline) -> {
int cnt=0;
if(rawline!=null)
{
cnt=countdelimiters(rawline.toCharArray());
cntdlm=cntdlm+cnt;
if (cnt==maxdlmcnt) { recordlist.add(rawline);cntdlm=0; }
else if(cntdlm>=maxdlmcnt)
{
bufferline.append(rawline);
recordlist.add(bufferline.toString());
cntdlm=0; //reset the line count
}
else bufferline.append(rawline);
}
};
recordstream.forEach(readline);
}
finally
{
recordstream.close();
filerecords.close();
}
return recordlist;
}
}
public class ReadRawFile {
public static void main(String args[]) throws FileNotFoundException, IOException {
List<String> outputrecords=new ArrayList<String>();
String pathstr="C:\\HOMEWARE\\ITEC-Toolbox\\projects\\Training\\SAMPLE_COPY.txt";
RawFile recordlist=new RawFile(pathstr,29,'|');
outputrecords=recordlist.readline();
Consumer<String> strline=(n)->System.out.println(n);
outputrecords.forEach(strline);
}
}
Let me also put some example record,
05|XXX476-319|458|AJKSDGAKSJDGJASDKJGASD|ASDJGASDFHANDVSNVSD|Babu|Villian|Pharmacy - Clinical Trials|London||United Kingdom|KJASDHKASJDGKASJDGAJKSDGJKASGD|A|Arewerwayo|Site Coordinator|Site Coordinator|||||0255555555||Denmark Hill|||London||SR5 3CS|United Kingdom
05|XXX476-419|432|KAJSHDKASDHAKSDHASDG|FKASDHASDKHASDKHASDSD|Mary|Wickel|E Daer Pantel|Gent||Belgium|JKASDGJASDGASJDGHASGDGASDASD|Els|Kristens|Pharma|Pharma|||||:+32(9)93326666|[email protected]|JGASDJGAS|C. He
ymasewe 10|Entrance
T32 - Route 5678
|Gant||B-
9995|Belgium
05|XXX476-319|458|AJKSDGAKSJDGJASDKJGASD|ASDJGASDFHANDVSNVSD|Babu|Villian|Pharmacy - Clinical Trials|London||United Kingdom|KJASDHKASJDGKASJDGAJKSDGJKASGD|A|Arewerwayo|Site Coordinator|Site Coordinator|||||0255555555||Denmark Hill|||London||SR5 3CS|United Kingdom
-
\$\begingroup\$ What kind of records? An example of an input file and the corresponding output might be helpful in order to understand the goal better. \$\endgroup\$Martin R– Martin R2018年09月05日 07:03:43 +00:00Commented Sep 5, 2018 at 7:03
-
\$\begingroup\$ I have pasted sample record for your reference. There are actually 3 records to be read into a destination in above case. This having 29 columns with 28 DLM '|' char. Hope this helps. \$\endgroup\$Sunil K-Standard Chartered– Sunil K-Standard Chartered2018年09月05日 07:12:25 +00:00Commented Sep 5, 2018 at 7:12
-
\$\begingroup\$ Sorry, made some modification for readability and exception handling. \$\endgroup\$Sunil K-Standard Chartered– Sunil K-Standard Chartered2018年09月05日 07:43:14 +00:00Commented Sep 5, 2018 at 7:43
1 Answer 1
Don’t use comments to explain what a variable holds. Instead, use descriptive variable names. For instance, use delimiter
instead of dlm
. The former is much easier to read.
Use camelCase when naming methods like countDelimiters
and variables like numColumns
.
In Java, it’s conventional to put the open curly brace on the same line as the preceding code, not on a line by itself.
In Java, it’s conventional to put else
or finally
on the same line as the }
that precedes it.
In Java, it’s conventional to use whitespace after commas and on either side of operators and equals signs. This makes them easier to read.
In Java, it’s conventional to put a space in between if
or for
or while
and the (
that follows it. This makes them visually distinct from method calls.
In Java, it’s conventional to put the array designation next to the variable type, not the variable name. So prefer final char[] chararr
to final char chararr[]
.
Please don’t squish the if
code onto one line and spread out the else
code. If you have a trivial if
block and no else
, it’s arguably ok to put it all on one line. Otherwise, use vertical space.
Closeable resources need to always be closed when you’re done with them. Keeping a local instance of a Reader
open is a problem, because if the client never calls readLine()
, your reader never gets closed. Also, if you call readLine
more than once, the reader will already be closed. Save the filename
in the instance and make a new reader when you need it.
It’s somewhat confusing to switch from a separator
to a delimiter
. Also, you misspelled separator
.
You can use chained constructors to pass default values so you’re only writing your constructor code once.
All kinds of things can go wrong when reading a file. If you want to check if the file exists when you make a RawFile
instance, that’s fine, but it might not exist when somebody tries to read from it later. Is it really worth checking at construction time?
Errors should be logged to System.err
or a logging framework. Don’t swallow the exception - the stack trace should also be logged. You can accomplish that with e.printStackTrace(System.err);
Make methods and variables as private as you can. countDelimiters
should almost certainly be private
, not default
.
Variables that don’t get reassigned should be made final
to reduce cognitive load and make your intent clear to the reader.
countDelimiters
isn’t wrong, but it would be easier to read if written using Streams.
try-with-resources
is more canonical in modern Java than try-finally
.
Variables should be declared as close to where they’re used as possible, and in the minimum scope possible.
You don’t need to check if rawline
is null. The stream won’t give you null lines.
Streaming the lines of the file directly would be preferable to using readers.
It’s cleaner to inline the Consumer<String>
rather than define it and then call forEach()
.
Given that you’re saving state between each line, and that you’re really operating on "X columns" rather than "X lines", you should consider using a Scanner
rather than parsing by line.
If you applied all of my suggestions above, your code might look something like:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;
final class RawFile {
private final String filename;
private final char delimiter;
private final int expectedDelimitersPerLine;
private int currentDelimiters; // this is a counter used while looping the file
RawFile(final String filename, final int numColumns) {
this(filename, numColumns, ' ');
}
RawFile(final String filename, final int numColumns, final char delimiter) {
this.filename = filename;
this.delimiter = delimiter;
this.expectedDelimitersPerLine = numColumns - 1;
}
private int countDelimiters(final String line) {
return (int) line.chars().filter(c -> c == this.delimiter).count();
}
List<String> readline() throws IOException {
final List<String> recordList = new ArrayList<String>();
final StringBuffer lineBuffer = new StringBuffer();
try (final Stream<String> lines = Files.lines(Paths.get(this.filename))) {
lines.forEach(rawline -> {
final int delimiters = this.countDelimiters(rawline);
this.currentDelimiters = this.currentDelimiters + delimiters;
if (delimiters == this.expectedDelimitersPerLine) {
recordList.add(rawline);
this.currentDelimiters = 0;
} else if (this.currentDelimiters >= this.expectedDelimitersPerLine) {
lineBuffer.append(rawline);
recordList.add(lineBuffer.toString());
this.currentDelimiters = 0;
} else {
lineBuffer.append(rawline);
}
});
}
return recordList;
}
}
-
\$\begingroup\$ Small addition: if you want to specify the charset explicitly (like the original code did),
Files.lines()
defines an overloaded version with a charset parameter. Esp., you can pass a standard charset there to get rid of the pesky UnsupportedEncodingExceptions: e.g.Files.lines(path, StandardCharsets.UTF_8)
\$\endgroup\$mtj– mtj2018年09月05日 15:09:56 +00:00Commented Sep 5, 2018 at 15:09 -
\$\begingroup\$ Thanks for the charset comment. I have added that as part of the RawFile class definition and used that in Files.lines as overloaded attribute for additional option. \$\endgroup\$Sunil K-Standard Chartered– Sunil K-Standard Chartered2018年09月13日 15:59:32 +00:00Commented Sep 13, 2018 at 15:59