I am currently dealing with an app that has several classes used to compare files in various formats (xls, csv, xml, html, pdf...). They all implement an interface defined like this:
```java
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

public interface ReportComparator {
    // compare files (1 - match, 0 - no match, -1 - both files empty)
    int compareTwoFiles(InputStream fileA, InputStream fileB) throws IOException;
    // gets a list of format-specific differences between two files
    List<String> getDifferences(InputStream fileA, InputStream fileB) throws IOException;
}
```
The specific comparators are instantiated through the class Report and are later used by a GUI application that prints the results of the comparison.
Report:
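```java
public class Report {
    private final String name, format;
    private final ReportComparator comparator;

    public Report(String name, String format) {
        this.name = name;
        this.format = format;
        // produce comparator specific for the format
        this.comparator = comparatorFor(format);
    }

    // hypothetical factory: the real format-to-comparator mapping is not shown
    private static ReportComparator comparatorFor(String format) {
        throw new UnsupportedOperationException("no comparator for " + format);
    }

    ReportComparator getComparator() {
        return this.comparator;
    }
}
```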
The problem is that when I am dealing with large files, I quickly run out of memory when calling getDifferences() (increasing memory is currently not an option). Therefore, I thought of using something like a Python generator, but I am having trouble visualizing it in Java. Since lambdas allow deferred execution, they would be a candidate for that, but would it be possible to keep this API, or at least avoid refactoring a lot of code (practically rewriting it from scratch)?
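The closest plain-Java analogue of a Python generator is an Iterator whose next() computes one difference on demand. The sketch below is a hypothetical illustration, not part of the app: it assumes a line-oriented format such as csv, where a "difference" is a pair of lines at the same position that differ, and the class name LineDiffIterator and the message format are made up. Only the current pair of lines is held in memory, and the read position lives in the BufferedReader fields between calls.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical lazy comparator: each next() call yields one line-level difference.
class LineDiffIterator implements Iterator<String> {
    private final BufferedReader a, b;
    private int lineNo = 0;        // current line, preserved between calls
    private String pending;        // next difference, found ahead of time by advance()
    private boolean done = false;  // true once both files are exhausted

    LineDiffIterator(InputStream fileA, InputStream fileB) {
        this.a = new BufferedReader(new InputStreamReader(fileA, StandardCharsets.UTF_8));
        this.b = new BufferedReader(new InputStreamReader(fileB, StandardCharsets.UTF_8));
    }

    // Reads line pairs until a differing pair is found or both files end.
    private void advance() {
        try {
            while (pending == null && !done) {
                String la = a.readLine(); // null at end of file A
                String lb = b.readLine(); // null at end of file B
                lineNo++;
                if (la == null && lb == null) {
                    done = true;
                } else if (la == null || lb == null || !la.equals(lb)) {
                    pending = "line " + lineNo + ": " + la + " <> " + lb;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // Iterator methods cannot throw IOException
        }
    }

    @Override
    public boolean hasNext() {
        advance();
        return pending != null;
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        String diff = pending;
        pending = null;
        return diff;
    }
}
```

A caller can consume it with `while (it.hasNext()) print(it.next());` so that only one difference exists at a time. Exposing such an Iterator (or a Stream built from it) would mean changing the return type of getDifferences, or adding a second method next to it, so the existing List-based API is not kept unchanged.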
Answer:
I am not sure whether this is idiomatic Java, since I'm not really a Java programmer, but a nice way to do it involves rewriting your comparator as a state machine, keeping the state in some object. For example:
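```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public interface Coroutine {
    // differences produced so far (optional, only if you need them later)
    List<String> getPrevDiffs();
    // computes and returns the next difference, resuming from the saved state
    String getDiff();
}

class SomeSpecificReportComparison implements Coroutine {
    private final InputStream a, b;
    private final List<String> prevDiffs = new ArrayList<>();
    private int state = 0; // which step of the getDiff() state machine comes next

    SomeSpecificReportComparison(InputStream a, InputStream b) {
        this.a = a;
        this.b = b;
    }

    @Override
    public List<String> getPrevDiffs() {
        return prevDiffs;
    }

    @Override
    public String getDiff() {
        String diff = null;
        // diff generation goes here: continue reading a and b according to
        // `state`, compute the next difference and update `state`
        if (diff != null) {
            prevDiffs.add(diff); // add diff to prevDiffs
        }
        return diff;
    }
}
```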
The state variable stores which state the getDiff state machine is in, and any other variables that need to be preserved between calls must also be instance variables. Your comparator function would then just return an instance of this class.
For more details, see Simon Tatham's article on coroutines in C, which uses a similar technique.
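To make the state machine more concrete, here is a hypothetical filled-in version of getDiff(). It assumes a line-by-line comparison and uses an enum instead of the int state; the class name LineDiffCoroutine and its messages are illustrative only. The implements Coroutine clause is left out so the block compiles on its own, and getDiff() declares IOException because it reads from the streams. Each call resumes in the saved state, and the BufferedReader fields remember how far each file has been read, so nothing restarts from the beginning. A null return means both files are exhausted.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical state machine: every getDiff() call returns one difference,
// or null once both files have been fully read.
class LineDiffCoroutine {
    private enum State { BOTH, ONLY_A, ONLY_B, DONE }

    private final BufferedReader a, b;  // the readers keep the file positions
    private State state = State.BOTH;   // where the next call resumes
    private int lineNo = 0;             // current line number, kept between calls

    LineDiffCoroutine(InputStream fileA, InputStream fileB) {
        a = new BufferedReader(new InputStreamReader(fileA, StandardCharsets.UTF_8));
        b = new BufferedReader(new InputStreamReader(fileB, StandardCharsets.UTF_8));
    }

    String getDiff() throws IOException {
        while (true) {
            switch (state) {
                case BOTH: {
                    String la = a.readLine();
                    String lb = b.readLine();
                    lineNo++;
                    if (la == null && lb == null) { state = State.DONE; break; }
                    if (la == null) { state = State.ONLY_B; return "line " + lineNo + ": only in B: " + lb; }
                    if (lb == null) { state = State.ONLY_A; return "line " + lineNo + ": only in A: " + la; }
                    if (!la.equals(lb)) return "line " + lineNo + ": " + la + " <> " + lb;
                    break; // equal lines: keep scanning
                }
                case ONLY_A: { // file B ended: report the rest of A line by line
                    String la = a.readLine();
                    lineNo++;
                    if (la == null) { state = State.DONE; break; }
                    return "line " + lineNo + ": only in A: " + la;
                }
                case ONLY_B: { // file A ended: report the rest of B line by line
                    String lb = b.readLine();
                    lineNo++;
                    if (lb == null) { state = State.DONE; break; }
                    return "line " + lineNo + ": only in B: " + lb;
                }
                case DONE:
                    return null;
            }
        }
    }
}
```

The `break` statements leave the switch, not the loop, so the loop re-dispatches on the updated state until a difference is returned or DONE is reached.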
-
Thank you, but that doesn't solve my problem: I still need a list to keep the diffs. I want to hold just one diff in memory at a time. – DCzo, Sep 14, 2017 at 10:39
-
Sorry, @DCzo, I forgot about that requirement. You don't need the list for this to work; it is only for keeping previous diffs in case you need them later. – Min4Builder, Sep 16, 2017 at 13:43
-
Thanks for the clarification. Now the problem is: how do I keep track of my position inside a file, or rather a Stream? Starting from the beginning every time would definitely take too much time... – DCzo, Sep 17, 2017 at 12:49
-
It's technically in the answer, but it may not be clear. The idea is to keep all necessary state in instance variables. – Min4Builder, Sep 18, 2017 at 16:24
-
I don't know how I could have missed it :). Thank you. – DCzo, Sep 23, 2017 at 13:40
-
Would it be okay for ReportComparator::getDifferences to return a Stream? You need a return type that isn't required to be strict here.
-
With Stream I get StackOverflowErrors, because I add the differences using Stream.concat in a loop, and the loop is long. Can you think of a solution to that problem?
-
I switched to a Supplier to 'feed' the Stream. The problem now is: how do I keep track of the current position in the file that I am parsing?
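The Stream.concat Javadoc itself warns that accessing an element of a deeply concatenated stream can cause deep call chains or even StackOverflowError, so repeated concatenation in a loop is the wrong tool. A sketch of the alternative: build a single stream from a Supplier and keep the file position inside the Supplier object. This assumes Java 9+ (for the three-argument Stream.iterate) and a line-by-line comparison; DiffStreams and lineDiffs are hypothetical names. Note that the first difference (the seed) is computed eagerly when lineDiffs is called, and the caller remains responsible for closing the input streams.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Objects;
import java.util.function.Supplier;
import java.util.stream.Stream;

final class DiffStreams {
    private DiffStreams() {}

    // Hypothetical helper: one lazy Stream of differences, no Stream.concat involved.
    static Stream<String> lineDiffs(InputStream fileA, InputStream fileB) {
        BufferedReader a = new BufferedReader(new InputStreamReader(fileA, StandardCharsets.UTF_8));
        BufferedReader b = new BufferedReader(new InputStreamReader(fileB, StandardCharsets.UTF_8));

        // The Supplier is an object: its field keeps the line number between calls,
        // and the captured readers remember how far each file has been read.
        Supplier<String> nextDiff = new Supplier<String>() {
            private int lineNo = 0;

            @Override
            public String get() {
                try {
                    while (true) {
                        String la = a.readLine();
                        String lb = b.readLine();
                        lineNo++;
                        if (la == null && lb == null) return null; // end marker
                        if (!Objects.equals(la, lb)) {
                            return "line " + lineNo + ": " + la + " <> " + lb;
                        }
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        };

        // Ordered stream: seed = first diff, then one get() per requested element,
        // stopping at the null end marker.
        return Stream.iterate(nextDiff.get(), Objects::nonNull, previous -> nextDiff.get());
    }
}
```

Because the stream pulls one element at a time, a terminal operation such as forEach(System.out::println) keeps only the current difference in memory, provided the results are not collected into a list.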