I need to parse a file, with a standard formatting:
200 lines of uninteresting data
=====================================================================
Column(s)
=====================================================================
Column Description : HP-PONA
Inventory# : autoID-1
Diameter : 200.0 μm
Length : 50.0 m
The data must be outputted to a POJO outside of the project scope (i.e. I cannot change the POJO definition - ColumnInnerDiameter
and ChromatographyColumnLength
could/should be the same class or inherit for the same base class, but I cannot change this.):
public class ChromatographyColumnDocument { private ChromatographyColumnLength chromatographyColumnLength; private ColumnInnerDiameter columnInnerDiameter; public ChromatographyColumnLength getChromatographyColumnLength() { return chromatographyColumnLength; } public void setChromatographyColumnLength(ChromatographyColumnLength chromatographyColumnLength) { this.chromatographyColumnLength = chromatographyColumnLength; } public ColumnInnerDiameter getColumnInnerDiameter() { return columnInnerDiameter; } public void setColumnInnerDiameter(ColumnInnerDiameter columnInnerDiameter) { this.columnInnerDiameter = columnInnerDiameter; } } public class ColumnInnerDiameter { private Double value; private String unit; public Double getValue() { return value; } public void setValue(Double value) { this.value = value; } public String getUnit() { return unit; } public void setUnit(String unit) { this.unit = unit; } } public class ChromatographyColumnLength{ private Double value; private String unit; public Double getValue() { return value; } public void setValue(Double value) { this.value = value; } public String getUnit() { return unit; } public void setUnit(String unit) { this.unit = unit; } }
Finally, the code under review:
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class ColumnInformationMapper {
private ColumnInformationMapper() {
throw new IllegalStateException("Utility class");
}
private static final String valueAndUnitPattern = "([\\d|\\.]*) (\\S*)";
public static ChromatographyColumnDocument readColumnDocumentFromFile(String folderPath) throws Exception {
String line;
ChromatographyColumnDocument columnDocument = new ChromatographyColumnDocument();
File file = new File(folderPath,"acq.txt");
try (BufferedReader br = new BufferedReader(
new InputStreamReader(
new FileInputStream(file), StandardCharsets.UTF_16))){
/* Looking for pattern
=====================================================================
Column(s)
=====================================================================
Column Description : HP-PONA
*/
while ((line = br.readLine()) != null) {
if (line.contains("======")) {
line = br.readLine();
if (line.contains("Column(s)")) {
br.readLine();// === line
br.readLine();// empty line
break;
}
}
}
br.readLine(); //Column Description - Not in model
br.readLine(); //Inventory # - Not in model
columnDocument.setColumnInnerDiameter(parseColumnInnerDiameter(
StringUtils.substringAfterLast(br.readLine(), ":").trim())); //Column diameter
columnDocument.setChromatographyColumnLength(parseColumnLength(
StringUtils.substringAfterLast(br.readLine(), ":").trim())); //Column length
return columnDocument;
}
}
private static ColumnInnerDiameter parseColumnInnerDiameter(String inputString) {
Pattern p = Pattern.compile(valueAndUnitPattern);
Matcher m = p.matcher(inputString);
if (m.find()) {
ColumnInnerDiameter columnInnerDiameter = new ColumnInnerDiameter();
columnInnerDiameter.setValue(Double.parseDouble(m.group(1)));
columnInnerDiameter.setUnit(m.group(2));
return columnInnerDiameter;
}
throw new NumberFormatException();
}
private static ChromatographyColumnLength parseColumnLength(String inputString) {
Pattern p = Pattern.compile(valueAndUnitPattern);
Matcher m = p.matcher(inputString);
if (m.find()) {
ChromatographyColumnLength columnLength = new ChromatographyColumnLength();
columnLength.setValue(Double.parseDouble(m.group(1)));
columnLength.setUnit(m.group(2));
return columnLength;
}
throw new NumberFormatException();
}
}
My questions:
To skip a line I am not interested in, I use
readLine()
and just don't use the results. SonarQube is very mad at this, and calls it a Major Bug: "When a method is called that returns data read from some data source, that data should be stored rather than thrown away. Any other course of action is surely a bug.". Is there a more correct way of skipping lines?The methods to parse
ColumnInnerDiameter
andChromatographyColumnLength
are identical. I have tried to fiddle with generics to get rid of code repetition, but could not get it to work. Is there any way to improve on this side, without modifying the POJO classes?Any advice on how to improve the code overall is most welcome!
2 Answers 2
Your questions:
- SonarQube is wrong and you are right. That is the correct way to skip a line. You can use
@SuppressWarnings
to get rid of the warnings, or you can assign the skipped line to some local variable, such asString skip = br.nextLine()
. - Since the two Column classes do not share an interface which defines those methods, I do not see a good way to remove the duplication.
- See below:
Avoid * imports. It makes it harder to determine where dependencies are coming from, and harder to determine what dependencies a class actually has.
Consider importing UTF_16
statically.
There's no need to throw an exception from a private constructor. Having the private constructor is enough.
Constants (and all variables) belong before the constructor, not after. Constants should use UPPER_SNAKE_CASE
variable names rather than camelCase
.
Throw the most specific exceptions possible. A client cannot handle exception types differently if all it knows it might get is an Exception
.
Avoid acronyms and abbreviations when naming code. bufferedReader
would be preferable to br
. Code should strive to be readable. "bee arr dot readline" vs. "buffered reader dot readline".
The pattern currently used to determine if something is a number is not correct. It will clearly match invalid numbers.
Prefer passing in a File
instead of String
for the file path. A String
can be anything. Consider changing the file path to a File
object as soon as possible so the code gains type safety.
Rather than constructing nested readers, consider creating separate named instances. This aligns the code more neatly. Since the use is literally on the next line, it's reasonable to use abbreviated names here. The tradeoff is that the stream and the reader are now in scope for the try block, where they weren't before. Given the size of the method, I think the readability tradeoff is worth it.
Consider breaking out the skip portion of readColumnDocumentFromFile
into its own method.
readColumnDocumentFromFile
will throw a NullPointerException if the columns
block is not in the text file. It would be better to catch the NoSuchElementException and rethrow something that gives more context into what went wrong.
The two parse
methods are throwing NumberFormatException
, which doesn't make sense when the real problem is that the column data is invalid.
Pattern
is intended to be compiled once, and then used as a factory for Matcher
instances. The Pattern
instance should be the constant, not the String.
The code would probably read more cleanly if a Scanner
was used. The methods would probably also then be unnecessary, as Scanner can cleanly locate the next double or String value.
If you made all these changes, your code might look more like:
import static java.nio.charset.StandardCharsets.UTF_16;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Scanner;
import java.util.regex.Pattern;
public final class ColumnInformationMapper {
private static final Pattern COLUMN_NAME_PATTERN = Pattern.compile(".*:");
private ColumnInformationMapper() {
}
/**
* @throws FileNotFoundException if there is no `acq.txt` file in the given folderPath.
* @throws IOException if the file cannot be read.
*/
public static ChromatographyColumnDocument readColumnDocumentFromFile(File folderPath)
throws FileNotFoundException, IOException {
ChromatographyColumnDocument columnDocument = new ChromatographyColumnDocument();
File file = new File(folderPath, "acq.txt");
try (FileInputStream fis = new FileInputStream(file);
InputStreamReader isr = new InputStreamReader(fis, UTF_16);
Scanner acqScanner = new Scanner(isr)) {
skipToColumnInnerDiameterLine(acqScanner);
ColumnInnerDiameter columnInnerDiameter = new ColumnInnerDiameter();
acqScanner.next(COLUMN_NAME_PATTERN); // skip column name
columnInnerDiameter.setValue(acqScanner.nextDouble());
columnInnerDiameter.setUnit(acqScanner.next());
columnDocument.setColumnInnerDiameter(columnInnerDiameter);
ChromatographyColumnLength columnLength = new ChromatographyColumnLength();
acqScanner.next(COLUMN_NAME_PATTERN); // skip column name
columnLength.setValue(acqScanner.nextDouble());
columnLength.setUnit(acqScanner.next());
columnDocument.setChromatographyColumnLength(columnLength);
return columnDocument;
}
}
private static void skipToColumnInnerDiameterLine(Scanner acqScanner) {
/* Looking for pattern
=====================================================================
Column(s)
=====================================================================
Column Description : HP-PONA
*/
String line;
while ((line = acqScanner.nextLine()) != null) {
if (line.contains("======")) {
line = acqScanner.nextLine();
if (line.contains("Column(s)")) {
acqScanner.nextLine();// === line
acqScanner.nextLine();// empty line
break;
}
}
}
acqScanner.nextLine(); //Column Description - Not in model
acqScanner.nextLine(); //Inventory # - Not in model
}
}
-
\$\begingroup\$ I disagree, Sonar is correct. If you read a line with the intention to skip it AND you know what the line is supposed to contain, then you need to verify that the skipped line actually contains it. Othwerise errors in the input will just be ignored. Maybe write a helper method for asserting that a string contains a substring and throwing an exception if it does not. \$\endgroup\$TorbenPutkonen– TorbenPutkonen2022年09月29日 04:41:57 +00:00Commented Sep 29, 2022 at 4:41
Avoid code duplication
First, to remove duplicate code present in your methods parseColumnInnerDiameter
and parseColumnLength
, try to find some similarities between this two classes ColumnInnerDiameter
and ChromatographyColumnLength
.
They have both Unit and Value attributes. So you can create a POJO with these two attributes.
ValueInit.class
import lombok.Getter;
import lombok.RequiredArgsConstructor;
import lombok.Setter;
@RequiredArgsConstructor
@Getter
@Setter
public class ValueUnit {
private final Double value;
private final String unit;
}
Now you can make a new class that extracts Value and Unit from a document line.
ValueUnitExtractor.class
public class ValueUnitExtractor {
private static final String VALUE_AND_UNIT_REGEX = "([\\d|\\.]*) (\\S*)";
// Pattern declaration have to be static final because compiling regex may cost a lot
private static final Pattern VALUE_AND_UNIT_PATTERN = Pattern.compile(VALUE_AND_UNIT_REGEX);
/**
* Allow extracting the Value and Unit of a line
*/
public ValueUnit extract(String line) {
Matcher m = VALUE_AND_UNIT_PATTERN.matcher(line);
if (!m.find()) {
throw new NumberFormatException();
}
return new ValueUnit(Double.parseDouble(m.group(1)), m.group(2));
}
}
To apply this extractor on your two similar objects, you can use a strategy pattern. We will use an enum for that point.
LineType.class
public enum LineType {
COLUMN_INNER_DIAMETER() {
@Override
protected ColumnInnerDiameter create(ValueUnit valueUnit) {
ColumnInnerDiameter columnLength = new ColumnInnerDiameter();
columnLength.setValue(valueUnit.getValue());
columnLength.setUnit(valueUnit.getUnit());
return columnLength;
}
},
CHROMATOGRAPHY_COLUMN_LENGTH() {
@Override
protected ChromatographyColumnLength create(ValueUnit valueUnit) {
ChromatographyColumnLength columnLength = new ChromatographyColumnLength();
columnLength.setValue(valueUnit.getValue());
columnLength.setUnit(valueUnit.getUnit());
return columnLength;
}
};
// The common ValueUnit extractor
private final ValueUnitExtractor extractor = new ValueUnitExtractor();
// The only method expose by your enum to parse a document line and create ChromatographyColumnLength or ColumnInnerDiameter object
// I didn't find a way to return generic type here...
public Object parse(String line) {
ValueUnit valueUnit = extractor.extract(line);
return create(valueUnit);
}
// This method allow you to create specific object from the ValueUnit POJO
protected abstract Object create(ValueUnit valueUnit);
After that, you can create a ChromatographyColumnDocument factory to help you build ChromatographyColumnDocument instance and hide the parsing document line : ChromatographyColumnDocumentFactory.class
public class ChromatographyColumnDocumentFactory {
public static ChromatographyColumnDocument create(String firstLine, String secondLing) {
ChromatographyColumnDocument columnDocument = new ChromatographyColumnDocument();
columnDocument.setColumnInnerDiameter(parseColumnInnerDiameter(extractRightPart(firstLine)));
columnDocument.setChromatographyColumnLength(parseColumnLength(extractRightPart(secondLing)));
return columnDocument;
}
private static String extractRightPart(String fullLine) {
// For null-safety, use StringUtils.trim on StringUtils.substringAfterLast() result
return StringUtils.trim(StringUtils.substringAfterLast(fullLine, ":"));
}
// Here you use a specific enum value to build ColumnInnerDiameter
private static ColumnInnerDiameter parseColumnInnerDiameter(String line) {
return (ColumnInnerDiameter) LineType.COLUMN_INNER_DIAMETER.parse(line);
}
// Here you use a specific enum value to build ChromatographyColumnLength
private static ChromatographyColumnLength parseColumnLength(String line) {
return (ChromatographyColumnLength) LineType.CHROMATOGRAPHY_COLUMN_LENGTH.parse(line);
}
}
Sonar alert
If you looking for Sonar happiness, you could verify all lines after pattern matching and throwing custom error. DocumentReader.class
public class DocumentReader {
public static void readAndFindPattern(BufferedReader br) throws IOException {
boolean isFound = false;
String line;
while ((line = br.readLine()) != null) {
if (line.contains("=====")) {
line = br.readLine();
if (line.contains("Column(s)")) {
isFound = true;
break;
}
}
}
if (!isFound) {
throw new InvalidFileContentException("Pattern not found");
}
if (!br.readLine().contains("=====")) {
throw new InvalidFileContentException("Line must contains '====='");
}
if (!br.readLine().isBlank()) {
throw new InvalidFileContentException("Line after '=====' must be blank");
}
if (!br.readLine().contains("Column Description")) {
throw new InvalidFileContentException("Line must contains 'Column Description'");
}
if (!br.readLine().contains("Inventory #")) {
throw new InvalidFileContentException("Line must contains 'Inventory #'");
}
}
}
InvalidFileContentException.class
public class InvalidFileContentException extends RuntimeException {
public InvalidFileContentException(String message) {
super(message);
}
}
Finally, your ColumnInformationMapper uses DocumentReader to find the pattern and ChromatographyColumnDocumentFactory to create ChromatographyColumnDocument
ColumnInformationMapper.class
@UtilityClass
public class ColumnInformationMapper {
public static ChromatographyColumnDocument readColumnDocumentFromFile(String folderPath) throws Exception {
File file = new File(folderPath, "acq.txt");
try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_16))) {
DocumentReader.readAndFindPattern(br);
return ChromatographyColumnDocumentFactory.create(br.readLine(), br.readLine());
}
}
}
-
2\$\begingroup\$ I generally don't think
lombok
is worth pulling in the dependency when Java hasrecord
s. \$\endgroup\$Reinderien– Reinderien2022年09月28日 23:44:52 +00:00Commented Sep 28, 2022 at 23:44 -
1\$\begingroup\$ Hi Guillaume, welcome to the site. In my opinion, this design is over-engineered. It adds 6 new classes to address 3 lines of duplicate code. \$\endgroup\$Eric Stein– Eric Stein2022年09月29日 00:46:04 +00:00Commented Sep 29, 2022 at 0:46