
I'm trying to parse a file that contains information in the following format:

TABLE_NAME
VARIABLE_LIST_OF_COLUMNS
VARIABLE_NUMBER_OF_ROWS (fields separated by a tab)

An example (using ',' as the separator for this question; the actual separator is a tab):

STUDENTS
ID
NAME
1,Mike
2,Kimberly

The goal is to build a list of SQL INSERT statements (that is the context for the code snippet below).

What I want to know is whether this kind of multiline parsing is at all possible using the Java 8 Streams API. This is what I have at the moment:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.List;

import static java.util.stream.Collectors.joining;
import static java.util.stream.Collectors.toList;

public final class StatementGeneratorMain {

    public static void main(final String[] args) throws Exception {
        List<String> fileNames = Arrays.asList("STUDENTS.txt");
        fileNames.stream().forEach(fileName -> {
            String tableName;
            List<String> columnNames;
            List<String[]> dataRows;
            // First pass: the first line is the table name.
            try (BufferedReader br = getBufferedReader(fileName)) {
                tableName = br.lines().findFirst().get();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
            // Second pass: skip the first line because it has been processed;
            // column names are the lines that do not split on the delimiter.
            try (BufferedReader br = getBufferedReader(fileName)) {
                columnNames = br.lines().skip(1)
                        .filter(v -> v.split("\t").length == 1)
                        .collect(toList());
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
            // Third pass: skip the first line and the column lines to get the data;
            // data rows are identified as being splittable on the delimiter.
            try (BufferedReader br = getBufferedReader(fileName)) {
                dataRows = br.lines().skip(1 + columnNames.size())
                        .map(s -> s.split("\t"))
                        .collect(toList());
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
            String columns = columnNames.stream().collect(joining(", ", "(", ")"));
            List<String> dataRow = dataRows.stream()
                    .map(arr -> Arrays.stream(arr).map(x -> "'" + x + "'").collect(joining(",", "(", ")")))
                    .map(row -> String.format("INSERT INTO %s %s VALUES %s;", tableName, columns, row))
                    .collect(toList());
            dataRow.forEach(l -> System.out.println(l));
        });
    }

    private static BufferedReader getBufferedReader(String fileName) {
        return new BufferedReader(new InputStreamReader(
                StatementGeneratorMain.class.getClassLoader().getResourceAsStream(fileName)));
    }
}

This piece of code does the job for me, but I don't really like it because I read the same file thrice (once for table name, again to deduce the columns, again to get the rows). I also don't think that it is proper functional style.

What I am looking for is a more elegant way to do this kind of multiline/multirecord parsing using the streams API.

For completeness, the output is:

INSERT INTO STUDENTS (ID, NAME) VALUES ('1','Mike');
INSERT INTO STUDENTS (ID, NAME) VALUES ('2','Kimberly');

I'm not too particular about stuff like numeric column and null values at this point.

asked May 26, 2015 at 21:30
  • If your code works and you are looking for a way to improve it, then you probably should post your question on codereview.stackexchange.com instead of Stack Overflow. Commented May 26, 2015 at 21:47
  • BTW, I am not sure why you need the getBufferedReader method. If you want to get a stream of lines from a file, simply use Files.lines(Paths.get(fileName)) (you can also add a charset if it is needed). Commented May 26, 2015 at 21:49
  • @Pshemo has good advice. I will add that your life and your code will be simpler if the column names are a CSV on one line (rather than each column name on a separate line), because you actually need them as a CSV and it solves the problem of figuring out where the column names stop and the rows start. Commented May 26, 2015 at 21:52
  • By the way, little Bobby Tables likes how you quote your database inputs. Commented May 27, 2015 at 1:22
  • @Pshemo Thanks, I think I will take the question there. Commented May 27, 2015 at 5:49

2 Answers


I am not sure that streams are the correct approach here, since they were meant to iterate over data once or, to be more precise, to handle all of the data in one way. If you need to handle separate chunks of the data differently, you should probably use good old loops or iterators. One of the simplest solutions that comes to mind is using a Scanner, so your code can look like:

Pattern oneWordLine = Pattern.compile("^\\w+$", Pattern.MULTILINE);
List<String> files = Arrays.asList("input.txt");
for (String file : files) {
    try (Scanner sc = new Scanner(new File(file))) {
        // Treat each line as a single token so hasNext(oneWordLine) tests whole
        // lines; with the default whitespace delimiter a tab-separated data row
        // such as "1\tMike" would be split at the tab and mistaken for a column name.
        sc.useDelimiter("\\R");
        String tableName = sc.nextLine();
        StringJoiner columnNamesJoiner = new StringJoiner(", ", "(", ")");
        // iterate over the lines containing a single word (the column names)
        while (sc.hasNext(oneWordLine)) {
            columnNamesJoiner.add(sc.nextLine());
        }
        String columns = columnNamesJoiner.toString();
        List<String> dataRow = new ArrayList<>();
        // iterate over the rest of the lines (the data rows)
        while (sc.hasNextLine()) {
            String values = Arrays.stream(sc.nextLine().split("\t"))
                    .collect(joining("', '", "('", "')"));
            dataRow.add(String.format("INSERT INTO %s %s VALUES %s;",
                    tableName, columns, values));
        }
        dataRow.forEach(System.out::println);
    } catch (Exception e) {
        e.printStackTrace(); // no need to rethrow a RuntimeException
    }
}
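
If you would still like to express it with streams, one option is to read the whole file into memory once and then slice the resulting list, so the file is not opened three times. A rough sketch of that idea (it assumes the file fits in memory, uses the tab separator described in the question, and reads STUDENTS.txt from the working directory rather than the classpath; the class name is only for the example):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

import static java.util.stream.Collectors.joining;
import static java.util.stream.Collectors.toList;

public class ReadOnceMain {
    public static void main(String[] args) throws Exception {
        // Read every line exactly once.
        List<String> lines = Files.readAllLines(Paths.get("STUDENTS.txt"));
        String tableName = lines.get(0);

        // Column names are the lines after the table name that contain no tab;
        // the first line with a tab marks the start of the data rows.
        int firstDataRow = 1;
        while (firstDataRow < lines.size() && !lines.get(firstDataRow).contains("\t")) {
            firstDataRow++;
        }
        String columns = lines.subList(1, firstDataRow).stream()
                .collect(joining(", ", "(", ")"));

        List<String> inserts = lines.subList(firstDataRow, lines.size()).stream()
                .map(row -> Arrays.stream(row.split("\t"))
                        .map(x -> "'" + x + "'")
                        .collect(joining(",", "(", ")")))
                .map(values -> String.format("INSERT INTO %s %s VALUES %s;",
                        tableName, columns, values))
                .collect(toList());

        inserts.forEach(System.out::println);
    }
}

For the STUDENTS example from the question this prints the same two INSERT statements shown there.
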
answered May 26, 2015 at 22:27



You can move this piece, "BufferedReader br = getBufferedReader(fileName)", further up and then read everything from that single reader as required. I don't think it is necessary to read the file three times.
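
A minimal sketch of that single-read idea, reusing the getBufferedReader helper and the tab separator from the question (it assumes a static import of Collectors.joining, the usual java.io/java.util imports, and that it runs inside a method that can throw IOException):

try (BufferedReader br = getBufferedReader("STUDENTS.txt")) {
    String tableName = br.readLine();

    // Column names are the lines without a tab; the first tabbed line is data.
    List<String> columnNames = new ArrayList<>();
    String line = br.readLine();
    while (line != null && !line.contains("\t")) {
        columnNames.add(line);
        line = br.readLine();
    }
    String columns = columnNames.stream().collect(joining(", "));

    // The current line and everything after it are data rows.
    while (line != null) {
        String values = Arrays.stream(line.split("\t"))
                .map(v -> "'" + v + "'")
                .collect(joining(","));
        System.out.println(String.format("INSERT INTO %s (%s) VALUES (%s);",
                tableName, columns, values));
        line = br.readLine();
    }
}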

answered May 26, 2015 at 21:42

