\$\begingroup\$

I created a method that tests whether the given input is an int, a double or a String, and then passes it to a 3rd party library which has overloaded doFoo methods that accept a String, a double or an int. This is the method that I would like feedback on:

public static void testString(String val) {
    System.out.print("Original '" + val + "' ");
    String x = val.trim();
    try {
        int i = Integer.parseInt(x);
        System.out.println("It's an integer: " + i);
        doFoo(i);
    } catch (NumberFormatException e) {
        try {
            double d = Double.parseDouble(x);
            System.out.println("It's a double: " + d);
            doFoo(d);
        } catch (NumberFormatException e2) {
            System.out.println("It's a String: " + x);
            doFoo(x);
        }
    }
}

Is this good code? Could it be improved? I don't like the throwing and catching of Exceptions.

Some test code to prove it works:

testString("N/A");
testString("19.");
testString("19.0");
testString("19.4");
testString(" 1 ");
testString(" 1");
testString("1 ");
testString("1");
testString(" ");

Results in:

Original 'N/A' It's a String: N/A
Original '19.' It's a double: 19.0
Original '19.0' It's a double: 19.0
Original '19.4' It's a double: 19.4
Original ' 1 ' It's an integer: 1
Original ' 1' It's an integer: 1
Original '1 ' It's an integer: 1
Original '1' It's an integer: 1
Original ' ' It's a String: 
Jamal
asked Mar 31, 2017 at 17:03
\$\endgroup\$
  • \$\begingroup\$ What is the use-case for this? What is it that doFoo is really doing? \$\endgroup\$ Commented Mar 31, 2017 at 17:21
  • \$\begingroup\$ You do realize that if you redefine foo to have a different signature, the compiler will determine which doFoo to call? This is unlike a language like Python. \$\endgroup\$ Commented Mar 31, 2017 at 18:13
  • \$\begingroup\$ doFoo is actually part of Apache-Poi. It is setting an Excel cell value and there are 3 methods which are for double, String and int. \$\endgroup\$ Commented Mar 31, 2017 at 21:43

3 Answers

\$\begingroup\$

I don't like the throwing and catching of Exceptions

This can be made much cleaner with the use of a Scanner. It might not be the most performant way, but it's fast and easy to use.

try (Scanner scanner = new Scanner(x)) {
 if (scanner.hasNextInt()) doFoo(scanner.nextInt());
 else if (scanner.hasNextDouble()) doFoo(scanner.nextDouble());
 else doFoo(x);
}

However, if this is going to be called hundreds of thousands of times, the try/catch approach might be faster, though you should encapsulate each try/catch in its own function. You'd need to profile to be sure which is faster, but I believe try/catch would win because Scanner.hasNextFoo uses regular expressions:

public static boolean isInteger(String str) {
    try {
        Integer.parseInt(str);
        return true;
    } catch (NumberFormatException e) {
        return false;
    }
}
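
One drawback of a boolean check like isInteger is that the caller has to parse the string a second time to get the value. A minimal sketch of a variant that returns the parsed value instead is shown below; the class name NumberParsing and the method names tryParseInt and tryParseDouble are hypothetical and not part of the original code.

```java
import java.util.OptionalDouble;
import java.util.OptionalInt;

// Hypothetical helper class; names are illustrative only.
final class NumberParsing {
    private NumberParsing() {}

    // Returns the parsed int, or an empty OptionalInt if str is not a valid int.
    static OptionalInt tryParseInt(String str) {
        try {
            return OptionalInt.of(Integer.parseInt(str));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    // Returns the parsed double, or an empty OptionalDouble if str is not a valid double.
    static OptionalDouble tryParseDouble(String str) {
        try {
            return OptionalDouble.of(Double.parseDouble(str));
        } catch (NumberFormatException e) {
            return OptionalDouble.empty();
        }
    }
}
```

A caller could then check isPresent() on the int result first, fall back to the double result, and finally treat the input as a plain String. The exceptions are still thrown internally, so this only tidies the call site; it does not change the cost.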

Also, your function is doing multiple things: printing/reporting, and parsing/forwarding to doFoo. This is not a good thing. I'd recommend removing the print statements and handling the reporting where it's more appropriate:

public static void testString(String val) { 
 String x = val.trim();
 try (Scanner scanner = new Scanner(x)) {
 if (scanner.hasNextInt()) doFoo(scanner.nextInt());
 else if (scanner.hasNextDouble()) doFoo(scanner.nextDouble());
 else doFoo(x);
 }
}

That was much shorter. Now if you wanted the same functionality, it would look like so:

public static void testTestString(String val) {
 System.out.print("Original '" + val + "' ");
 testString(val);
}
// ...
public static void doFoo(int i) {
 System.out.println("It's an integer: " + i);
 // ...
}

If you want your code to be extremely extensible, there is another way. Notice how the new function I suggested still does multiple things:

  • It detects the type of the string
  • It parses the value from the string
  • It forwards the value on to another function

We can separate these into their own components.

This is only really worth it if you can foresee adding types to be a common feature, but especially if the "another function" you forward to should be selectable by the user (say, if you packaged these functions as member functions of an object):

// Class is the easiest type we can return
private static Class<?> determineType(String val) {
 try (Scanner scanner = new Scanner(val)) {
 if (scanner.hasNextInt()) return Integer.class;
 if (scanner.hasNextDouble()) return Double.class;
 return String.class;
 }
}
private static final Map<Class<?>, Function<String, ?>> parsers = new IdentityHashMap<>();
private static final Map<Class<?>, Consumer<Object>> functionSwitch = new IdentityHashMap<>();
static {
 parsers.put(Integer.class, Integer::parseInt);
 parsers.put(Double.class, Double::parseDouble);
 parsers.put(String.class, Function.identity());
 // Note that, due to limitations in the type system,
 // i is of type Object, so we need to cast it to the appropriate
 // class before forwarding on to the function.
 functionSwitch.put(Integer.class, i -> doFoo((Integer) i));
 functionSwitch.put(Double.class, d -> doFoo((Double) d));
 functionSwitch.put(String.class, str -> doFoo((String) str));
}
public static void testString(String val) {
 val = val.trim(); // This could even be part of the parser's responsibility
 Class<?> stringType = determineType(val);
 Function<String, ?> parser = parsers.get(stringType);
 functionSwitch.get(stringType).accept(parser.apply(val));
}
answered Mar 31, 2017 at 18:49
\$\endgroup\$
  • \$\begingroup\$ Very nice answer. Unfortunately I should have said that performance is key here as my method may be called thousands of times. I was worried about the try / catch being un-performant. However, thank you for the detailed answer - I've found it illuminating \$\endgroup\$ Commented Mar 31, 2017 at 21:45
  • \$\begingroup\$ @Phil Even for thousands of calls, Scanner is quite fast. Assuming it is used for input (which is already slow anyway), Scanner is not what is going to be taking up all your time. But the try {} catch way is about 15x faster \$\endgroup\$ Commented Mar 31, 2017 at 23:42
\$\begingroup\$

Background

This question was brought to my attention in The 2nd Monitor chat room because in the past I have claimed that using exception handling to handle parse exceptions is "a bad idea and slow". This is exactly what your code is doing, and it's a bad idea, and slow.... at least, that's what I thought, until I benchmarked your code.

Now, in the past, I wrote a CSV parser that used a similar system to yours to handle the values in a field, and I discovered that I got a significant speed-up (like 100% faster) when I prevalidated the values to a large extent before doing a parseInt or parseDouble on them. I found that it is much better to "identify" a value of a certain type to a high degree of confidence, and thus reduce the number of exceptions thrown.

In your code, if the values are 1/3 integers, 1/3 double, and 1/3 string, then on average you are creating 1 exception for each value (none for ints, 1 for doubles, and 2 for strings). Worst case, if all your values are strings, you'll create 2 exceptions per value.
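
The arithmetic above can be checked with a small sketch that mirrors the int, then double, then String fallback and counts the exceptions thrown per value. The class and method names (ExceptionCount, exceptionsFor, averageExceptions) are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not part of the benchmark code in this answer.
class ExceptionCount {

    // Mirrors the int -> double -> String fallback from the question and
    // returns how many NumberFormatExceptions were thrown for one value.
    static int exceptionsFor(String val) {
        String x = val.trim();
        try {
            Integer.parseInt(x);
            return 0; // parsed as int: no exception
        } catch (NumberFormatException e) {
            try {
                Double.parseDouble(x);
                return 1; // int parse failed, double parse succeeded
            } catch (NumberFormatException e2) {
                return 2; // both parses failed: plain String
            }
        }
    }

    // Average number of exceptions per value over the given inputs.
    static double averageExceptions(List<String> values) {
        int total = 0;
        for (String v : values) {
            total += exceptionsFor(v);
        }
        return (double) total / values.size();
    }

    public static void main(String[] args) {
        // One int, one double, one String: (0 + 1 + 2) / 3
        List<String> sample = Arrays.asList("42", "19.4", "N/A");
        System.out.println(averageExceptions(sample)); // prints 1.0
    }
}
```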

What if you could (almost) guarantee that all your parseInt and parseDouble calls will succeed, and you'll have (almost) no exceptions? Is the work to check the value "worth it"?

My claim is yes, it's worth it.

So, I have tried to prove it, and ... the results are interesting.

I used my MicroBench performance system to run the benchmark, and I built a dummy "load" for the doFoo function. Let's look at my test-rig:

public class ParseVal {
 
 private final LongAdder intsums = new LongAdder();
 private final DoubleAdder doubsums = new DoubleAdder();
 private final LongAdder stringsums = new LongAdder();
 
 private final void doFoo(int val) {
 intsums.add(val);
 }
 
 private final void doFoo(double val) {
 doubsums.add(val);
 }
 
 private final void doFoo(String val) {
 stringsums.add(val.length());
 }
 
 @Override
 public String toString() {
 return String.format("IntSum %d - DoubleSum %.9f - StringLen %d", intsums.longValue(), doubsums.doubleValue(), stringsums.longValue());
 }
 public static final String testFunction(BiConsumer<ParseVal, String> fn, String[] data) {
 ParseVal pv = new ParseVal();
 for (String v : data) {
 fn.accept(pv, v);
 }
 return pv.toString();
 }
 
 public static final String[] testData(int count) {
 String[] data = new String[count];
 Random rand = new Random(count);
 for (int i = 0; i < count; i++) {
 String base = String.valueOf(1000000000 - rand.nextInt(2000000000));
 switch(i % 3) {
 case 0:
 break;
 case 1:
 base += "." + rand.nextInt(10000);
 break;
 case 2:
 base += "foobar";
 break;
 }
 data[i] = base;
 }
 return data;
 }
 
 .......
 public void testStringOP(String val) { 
 String x = val.trim();
 try {
 int i = Integer.parseInt(x);
 doFoo(i);
 } catch (NumberFormatException e) {
 try {
 double d = Double.parseDouble(x);
 doFoo(d);
 } catch (NumberFormatException e2) {
 doFoo(x);
 }
 }
 }
 
 public static void main(String[] args) {
 String[] data = testData(1000);
 String expect = testFunction((pv, v) -> pv.testStringOP(v), data);
 System.out.println(expect);
 
 ....
 }
}

The doFoo methods have an accumulator mechanism (adding up ints, doubles, and the string lengths) and make the results available in a toString method.

Also, I have put your function in there as testStringOP.

There is a testData function which builds an array of input strings where there are approximately equal numbers of int, double, and string values.

Finally, the benchmark function:

public static final String testFunction(BiConsumer<ParseVal, String> fn, String[] data) {
 ParseVal pv = new ParseVal();
 for (String v : data) {
 fn.accept(pv, v);
 }
 return pv.toString();
}

That function takes an input function and the test data as arguments, and returns the String summary as a result. You would use this function like it's used in the main method....

String expect = testFunction((pv, v) -> pv.testStringOP(v), data);

which runs the testStringOP function on all the input data values, and returns the accumulated string results.

What's nice is that I can now create other functions to test performance, for example testStringMyFn and call:

String myresult = testFunction((pv, v) -> pv.testStringMyFn(v), data);

This is the basic tool I can use for the MicroBench system: https://github.com/rolfl/MicroBench

Scanner option

Let's start by comparing your function to the Scanner type system recommended in another answer... Here's the code I used for the Scanner:

public void testStringScanner(String val) {
 val = val.trim();
 try (Scanner scanner = new Scanner(val)) {
 if (scanner.hasNextInt()) {
 doFoo(scanner.nextInt());
 } else if (scanner.hasNextDouble()) {
 doFoo(scanner.nextDouble());
 } else {
 doFoo(val);
 }
 }
}

and here's how I benchmarked that code:

public static void main(String[] args) {
 String[] data = testData(1000);
 String expect = testFunction((pv, v) -> pv.testStringOP(v), data);
 System.out.println(expect);
 
 UBench bench = new UBench("IntDoubleString Parser")
 .addTask("OP", () -> testFunction((pv, v) -> pv.testStringOP(v), data), s -> expect.equals(s))
 .addTask("Scanner", () -> testFunction((pv, v) -> pv.testStringScanner(v), data), s -> expect.equals(s));
 bench.press(10).report("Warmup");
 bench.press(100).report("Final");
}

That runs the benchmark on both your function and the Scanner function, and does a warmup run (to get JIT optimizations done), and a "Final" run to get real results.... what are the results, you ask?

Task IntDoubleString Parser -> OP: (Unit: MILLISECONDS)
 Count : 100 Average : 1.6914
 Fastest : 1.5331 Slowest : 3.2561
 95Pctile : 2.0277 99Pctile : 3.2561
 TimeBlock : 1.794 2.037 1.674 1.654 1.674 1.588 1.665 1.588 1.634 1.606
 Histogram : 99 1
Task IntDoubleString Parser -> Scanner: (Unit: MILLISECONDS)
 Count : 100 Average : 69.9713
 Fastest : 67.2338 Slowest : 98.4322
 95Pctile : 73.8073 99Pctile : 98.4322
 TimeBlock : 77.028 70.050 69.325 69.860 69.094 68.498 68.547 68.779 69.586 68.945
 Histogram : 100

What does that mean? It means that, on average, your code is about 40 times faster than the Scanner. Your code processes 1000 input values in 1.7 milliseconds, and the Scanner takes 70 milliseconds.

So, a Scanner is a bad idea if performance is required, right? I agree.

Alternative

But, what about a RegEx pre-validation check? Note that the regex will not guarantee a clean parse, but it can go a long way. For example, the regex [+-]?\d+ will match any integer, right, but is -999999999999999999999 a valid integer? No, it's too big. But, it is a valid double. We will still need to have a try/catch block even if we pass the regex prevalidation. That's going to eliminate almost all exceptions, though....
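
To make that overflow case concrete, here is a minimal sketch showing that a value can pass the integer regex and still fail Integer.parseInt, while Double.parseDouble accepts it. The class name IntegerPrecheck and its helper methods are hypothetical, chosen only for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of why the try/catch is still needed after the regex check.
class IntegerPrecheck {

    // Same shape as the integer regex discussed above: optional sign, then digits.
    static final Pattern INTEGER_LIKE = Pattern.compile("[+-]?\\d+");

    // True if the string has the textual shape of an integer.
    static boolean looksLikeInteger(String s) {
        return INTEGER_LIKE.matcher(s).matches();
    }

    // True only if the string actually fits in a 32-bit int.
    static boolean parsesAsInt(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String tooBig = "-999999999999999999999";
        System.out.println(looksLikeInteger(tooBig)); // prints true
        System.out.println(parsesAsInt(tooBig));      // prints false
        // The same text is accepted as a double, so the fallback still finds a number.
        System.out.println(Double.parseDouble(tooBig) < 0);  // prints true
    }
}
```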

So, what do we do to prevalidate things? Well, the Double.valueOf(String) function documents a regex for matching double values in Strings. It's complicated, and I made a few modifications because we have already trimmed our inputs, but here are a couple of patterns for prevalidating double values and integer values:

private static final String Digits = "(\\p{Digit}+)";
private static final String HexDigits = "(\\p{XDigit}+)";
private static final String Exp = "[eE][+-]?"+Digits;
private static final String fpRegex =
 ( //"[\\x00-\\x20]*"+ // Optional leading "whitespace"
 "[+-]?(" + // Optional sign character
 "NaN|" + // "NaN" string
 "Infinity|" + // "Infinity" string
 "((("+Digits+"(\\.)?("+Digits+"?)("+Exp+")?)|"+
 "(\\.("+Digits+")("+Exp+")?)|"+
 "((" +
 "(0[xX]" + HexDigits + "(\\.)?)|" +
 "(0[xX]" + HexDigits + "?(\\.)" + HexDigits + ")" +
 ")[pP][+-]?" + Digits + "))" +
 "[fFdD]?))"); // +
 //"[\\x00-\\x20]*");// Optional trailing "whitespace"
Pattern isDouble = Pattern.compile(fpRegex);
Pattern isInteger = Pattern.compile("[+-]?[0-9]+");

We can use those patterns to build the code:

public void testStringRegex(String val) { 
 String x = val.trim();
 if (isInteger.matcher(x).matches()) {
 try {
 doFoo(Integer.parseInt(x));
 } catch (NumberFormatException nfe) {
 try {
 doFoo(Double.parseDouble(x));
 } catch (NumberFormatException e) {
 doFoo(x);
 }
 }
 } else if (isDouble.matcher(x).matches()) {
 try {
 doFoo(Double.parseDouble(x));
 } catch (NumberFormatException e) {
 doFoo(x);
 }
 } else {
 doFoo(x);
 }
}

Now, that's pretty complicated, right? Well, it does a "quick" integer regex check, and if the value is likely an integer, it tries to parse it as an integer, then falls back to a double, and then to a string....

If it's not likely an integer, it checks if it's a double, and so on.....

How can this code be faster, you ask? Well, we're almost certainly having clean parses when we do them, and we'll have almost no exceptions... But, is it actually faster?

Here are the results:

Task IntDoubleString Parser -> OP: (Unit: MILLISECONDS)
 Count : 100 Average : 1.6689
 Fastest : 1.5580 Slowest : 2.1572
 95Pctile : 1.8012 99Pctile : 2.1572
 TimeBlock : 1.695 1.752 1.709 1.670 1.641 1.648 1.643 1.639 1.662 1.630
 Histogram : 100
Task IntDoubleString Parser -> Regex: (Unit: MILLISECONDS)
 Count : 100 Average : 1.9580
 Fastest : 1.8379 Slowest : 2.5713
 95Pctile : 2.1004 99Pctile : 2.5713
 TimeBlock : 1.978 2.022 1.949 1.966 2.020 1.933 1.890 1.940 1.955 1.928
 Histogram : 100
Task IntDoubleString Parser -> Scanner: (Unit: MILLISECONDS)
 Count : 100 Average : 69.8886
 Fastest : 67.1848 Slowest : 77.2769
 95Pctile : 71.9153 99Pctile : 77.2769
 TimeBlock : 70.940 69.735 69.879 69.381 69.579 69.180 69.611 70.412 70.123 70.045
 Histogram : 100

If you look, you'll see the regex version is slower than the exception version... it runs in 1.96 ms but the exception version runs in 1.67 ms.

Exceptions

But, there's a catch. In these tests, the stack trace for the exceptions is really small... and the "cost" of an exception depends on the depth of the trace, so let's increase the stack depths for the regex and exception code. We'll add a recursive function to simulate a deeper stack:

public void testStringDeepOP(String val, int depth) {
 if (depth <= 0) {
 testStringOP(val);
 } else {
 testStringDeepOP(val, depth - 1);
 }
}
public void testStringDeepRegex(String val, int depth) {
 if (depth <= 0) {
 testStringRegex(val);
 } else {
 testStringDeepRegex(val, depth - 1);
 }
}

and we will test the OP and Regex code at different "depths" of nesting: 5, 10, and 20 layers deep. The benchmark code is:

 UBench bench = new UBench("IntDoubleString Parser")
 .addTask("OP", () -> testFunction((pv, v) -> pv.testStringOP(v), data), s -> expect.equals(s))
 .addTask("OP D5", () -> testFunction((pv, v) -> pv.testStringDeepOP(v, 5), data), s -> expect.equals(s))
 .addTask("OP D10", () -> testFunction((pv, v) -> pv.testStringDeepOP(v, 10), data), s -> expect.equals(s))
 .addTask("OP D20", () -> testFunction((pv, v) -> pv.testStringDeepOP(v, 20), data), s -> expect.equals(s))
 .addTask("Regex", () -> testFunction((pv, v) -> pv.testStringRegex(v), data), s -> expect.equals(s))
 .addTask("Regex D5", () -> testFunction((pv, v) -> pv.testStringDeepRegex(v, 5), data), s -> expect.equals(s))
 .addTask("Regex D10", () -> testFunction((pv, v) -> pv.testStringDeepRegex(v, 10), data), s -> expect.equals(s))
 .addTask("Regex D20", () -> testFunction((pv, v) -> pv.testStringDeepRegex(v, 20), data), s -> expect.equals(s))
 .addTask("Scanner", () -> testFunction((pv, v) -> pv.testStringScanner(v), data), s -> expect.equals(s));
 bench.press(10).report("Warmup");
 bench.press(100).report("Final");

What are the results?

Final
=====
Task IntDoubleString Parser -> OP: (Unit: MILLISECONDS)
 Count : 100 Average : 1.7005
 Fastest : 1.5260 Slowest : 3.9813
 95Pctile : 1.9346 99Pctile : 3.9813
 TimeBlock : 1.682 1.624 1.612 1.675 1.708 1.658 1.727 1.738 1.672 1.910
 Histogram : 99 1
Task IntDoubleString Parser -> OP D5: (Unit: MILLISECONDS)
 Count : 100 Average : 1.9288
 Fastest : 1.7325 Slowest : 4.9673
 95Pctile : 2.0897 99Pctile : 4.9673
 TimeBlock : 2.124 1.812 1.828 1.873 1.925 1.877 1.855 1.869 1.903 2.221
 Histogram : 98 2
Task IntDoubleString Parser -> OP D10: (Unit: MILLISECONDS)
 Count : 100 Average : 2.2271
 Fastest : 2.0171 Slowest : 4.7395
 95Pctile : 2.4904 99Pctile : 4.7395
 TimeBlock : 2.392 2.125 2.129 2.152 2.246 2.169 2.189 2.203 2.247 2.420
 Histogram : 98 2
Task IntDoubleString Parser -> OP D20: (Unit: MILLISECONDS)
 Count : 100 Average : 2.9278
 Fastest : 2.6838 Slowest : 6.3169
 95Pctile : 3.2415 99Pctile : 6.3169
 TimeBlock : 2.870 2.822 2.860 2.794 2.956 2.861 3.041 3.012 2.853 3.211
 Histogram : 99 1
Task IntDoubleString Parser -> Regex: (Unit: MILLISECONDS)
 Count : 100 Average : 2.0739
 Fastest : 1.9338 Slowest : 3.8368
 95Pctile : 2.2744 99Pctile : 3.8368
 TimeBlock : 2.229 2.083 2.034 2.013 2.021 2.004 2.013 2.096 2.059 2.186
 Histogram : 100
Task IntDoubleString Parser -> Regex D5: (Unit: MILLISECONDS)
 Count : 100 Average : 2.0565
 Fastest : 1.9377 Slowest : 3.2857
 95Pctile : 2.2646 99Pctile : 3.2857
 TimeBlock : 2.148 2.075 2.035 2.038 2.035 2.031 2.026 2.000 2.032 2.145
 Histogram : 100
Task IntDoubleString Parser -> Regex D10: (Unit: MILLISECONDS)
 Count : 100 Average : 2.0647
 Fastest : 1.9598 Slowest : 2.6360
 95Pctile : 2.2906 99Pctile : 2.6360
 TimeBlock : 2.073 2.094 2.051 2.048 2.072 2.029 2.057 2.124 2.057 2.042
 Histogram : 100
Task IntDoubleString Parser -> Regex D20: (Unit: MILLISECONDS)
 Count : 100 Average : 2.0891
 Fastest : 1.9930 Slowest : 2.6483
 95Pctile : 2.2587 99Pctile : 2.6483
 TimeBlock : 2.108 2.070 2.078 2.066 2.071 2.091 2.048 2.090 2.137 2.132
 Histogram : 100
Task IntDoubleString Parser -> Scanner: (Unit: MILLISECONDS)
 Count : 100 Average : 71.7199
 Fastest : 67.9621 Slowest : 152.0714
 95Pctile : 75.2141 99Pctile : 152.0714
 TimeBlock : 71.006 69.896 70.160 69.734 70.824 69.854 71.473 71.888 73.607 78.756
 Histogram : 99 1

Here it is expressed as a table (average times in milliseconds, by added stack depth):

Depth    0        5        10       20
OP       1.7005   1.9288   2.2271   2.9278
RegEx    2.0739   2.0565   2.0647   2.0891

Conclusion

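The exception-based version gets slower as the stack gets deeper (1.70 ms at depth 0, 2.93 ms at depth 20), while the regex version stays flat at roughly 2.07 ms. So, that's the real problem with exceptions: the performance is unpredictable. For example, if you run it inside a Tomcat container, with stacks hundreds of levels deep, you may find this completely destroys your performance.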
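
The link between call depth and exception cost comes from the stack trace that a Throwable captures when it is constructed. The sketch below is not part of the benchmark; it only reports how many frames a failed parse records at a given recursion depth. The class name StackDepthDemo and the method traceLengthAtDepth are hypothetical.

```java
// Hypothetical illustration: the recorded stack trace grows with the call depth.
class StackDepthDemo {

    // Recurses `depth` times, then triggers a failed parse and returns
    // the number of frames recorded in the exception's stack trace.
    static int traceLengthAtDepth(int depth) {
        if (depth > 0) {
            return traceLengthAtDepth(depth - 1);
        }
        try {
            Integer.parseInt("foobar");
            return -1; // not reached: "foobar" is never a valid int
        } catch (NumberFormatException e) {
            return e.getStackTrace().length;
        }
    }

    public static void main(String[] args) {
        System.out.println("depth 0:  " + traceLengthAtDepth(0) + " frames");
        System.out.println("depth 20: " + traceLengthAtDepth(20) + " frames");
    }
}
```

On a typical JVM the second number should be larger than the first, since every extra caller adds a frame that has to be walked and stored when the exception is created.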

answered Apr 1, 2017 at 14:22
\$\endgroup\$
  • \$\begingroup\$ Great detailed answer - it's certainly taught me a lot. Thanks very much for spending your time to add this. \$\endgroup\$ Commented Apr 2, 2017 at 20:20
\$\begingroup\$

Is this good code?

Yes, except for using System.out.println statements for logging.

Could it be improved? I don't like the throwing and catching of Exceptions.

There isn't much to improve other than the logging and the design improvements @Justin suggested. Your code performs better than the Scanner methods. The Scanner methods have some overhead from validating the input, and ultimately call the respective parse methods anyway. If you were to use the above code in a high-performance application, then no further improvements are needed.

answered Mar 31, 2017 at 21:01
\$\endgroup\$
  • \$\begingroup\$ Gah! I knew I should not have left those System.outs in! I do this for my own debugging (I am very old school) and I felt that it helped in the example. Consider the System.outs irrelevant. I was concerned that the throwing and catching was expensive and that there would be some other way (Apache Commons?) that would provide me with the solution I required. \$\endgroup\$ Commented Mar 31, 2017 at 21:46
  • \$\begingroup\$ Note that if your requirement is strict about not parsing Long numbers as Double, you have to handle that case explicitly instead of falling through to Double as you do now. \$\endgroup\$ Commented Apr 1, 2017 at 0:19
