7
\$\begingroup\$

I have a parser for CNF formulas in Dimacs format, which is very slow. Any suggestion on how to improve its speed? I did some profiling and I might have to replace Scanner. Is there anything faster?

A possible input for the parser is:

c A sample .cnf file.
p cnf 3 2
1 -3 0
2 3 -1 0

The code:

/**
 * Parses a stream for a CNF instance. 
 *
 * @param source input stream 
 * @return read skeleton
 * @throws ParseException if stream contains an invalid instance 
 */
private static Skeleton parseStream(final InputStream source) 
 throws IOException, ParseException { 
 Scanner scanner = new Scanner(source); 
 // Skip comments
 try { 
 String token = scanner.next(); 
 while (token.equals("c")) { 
 scanner.nextLine();
 token = scanner.next(); 
 }
 if (!token.equals("p")) {
 throw new ParseException( 
 "Excepted 'p', but '" + token + "' was found"); 
 }
 } catch (NoSuchElementException e) {
 throw new ParseException("Header not found"); 
 }
 // Reads header
 int numVariables, numClauses; 
 try {
 String cnf = scanner.next(); 
 if (!cnf.equals("cnf")) { 
 throw new ParseException( 
 "Expected 'cnf', but '" + cnf + "' was found"); 
 }
 numVariables = scanner.nextInt(); 
 numClauses = scanner.nextInt();
 } catch (NoSuchElementException e) {
 throw new ParseException("Incomplete header"); 
 }
 logger.debug("p cnf " + numVariables + " " + numClauses); 
 // Reads clauses
 Skeleton skeleton = new Skeleton(numVariables);
 try {
 while (numClauses > 0) {
 int literal = scanner.nextInt();
 skeleton.cnf.add(literal);
 if (literal == 0) {
 numClauses--;
 }
 }
 } catch (NoSuchElementException e) {
 throw new ParseException(
 "Incomplete problem: " + numClauses + " clauses are missing");
 }
 return skeleton; 
} 
Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked Feb 15, 2011 at 0:04
\$\endgroup\$
2
  • \$\begingroup\$ I suggest you to use ANTLR to generate parsers. It also provides you with some useful operations. You can find all you need. When you don't want to just learn how to write a parser from scratch then you should not reinvent the wheel. \$\endgroup\$ Commented Apr 8, 2011 at 7:51
  • \$\begingroup\$ Are you sure it's the parsing that is the bottleneck here? Simply reading data from disk is an expensive operation, so, it could be just that. \$\endgroup\$ Commented Apr 8, 2011 at 19:26

1 Answer 1

7
\$\begingroup\$

Use a BufferedInputStream to speed up the disk access. If that's not enough, you can read the file line-by-line and use split() to break it into individual numbers.

answered Feb 15, 2011 at 7:29
\$\endgroup\$
1
  • \$\begingroup\$ Same exact suggestion I was going to give \$\endgroup\$ Commented Feb 16, 2011 at 21:20

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.