I created a method to test whether the specified input is a double
, int
or String
and then pass it to a 3rd party library which has the methods doFoo
which accepts either a String
, double
or int
. This is the method that I would like feedback on:
public static void testString(String val) {
System.out.print("Original '" + val + "' ");
String x = val.trim();
try {
int i = Integer.parseInt(x);
System.out.println("It's an integer: " + i);
doFoo(i);
} catch (NumberFormatException e) {
try {
double d = Double.parseDouble(x);
System.out.println("It's a double: " + d);
doFoo(d);
} catch (NumberFormatException e2) {
System.out.println("It's a String: " + x);
doFoo(x);
}
}
}
Is this good code? Could it be improved? I don't like the throwing and catching of Exception
s.
Some test code to prove it works:
testString("N/A");
testString("19.");
testString("19.0");
testString("19.4");
testString(" 1 ");
testString(" 1");
testString("1 ");
testString("1");
testString(" ");
Results in:
Original 'N/A' It's a String: N/A
Original '19.' It's a double: 19.0
Original '19.0' It's a double: 19.0
Original '19.4' It's a double: 19.4
Original ' 1 ' It's an integer: 1
Original ' 1' It's an integer: 1
Original '1 ' It's an integer: 1
Original '1' It's an integer: 1
Original ' ' It's a String:
3 Answers 3
I don't like the throwing and catching of Exceptions
This can be made much cleaner with the use of a Scanner
. It might not be the most performant way, but it's fast and easy to use.
try (Scanner scanner = new Scanner(x)) {
if (scanner.hasNextInt()) doFoo(scanner.nextInt());
else if (scanner.hasNextDouble()) doFoo(scanner.nextDouble());
else doFoo(x);
}
However, if this is going to be called hundreds of thousands of times, the try catch
method might be faster, though you should encapsulate those into their own functions. You'd need to profile to be sure which is faster, but I believe it would be this because Scanner.hasNextFoo
uses regular expressions:
public static boolean isInteger(String str) {
try {
Integer.parse(str);
return true;
} catch (NumberFormatException e) {
return false;
}
}
Also, your function is doing multiple things: printing/reporting, and parsing/forwarding to doFoo
. This is not a good thing. I'd recommend removing those and handling them where it's more appropriate:
public static void testString(String val) {
String x = val.trim();
try (Scanner scanner = new Scanner(x)) {
if (scanner.hasNextInt()) doFoo(scanner.nextInt());
else if (scanner.hasNextDouble()) doFoo(scanner.nextDouble());
else doFoo(x);
}
}
That was much shorter. Now if you wanted the same functionality, it would look like so:
public static void testTestString(String val) {
System.out.print("Original '" + val + "' ");
testString(val);
}
// ...
public static void doFoo(int i) {
System.out.println("It's an integer: " + i);
// ...
}
If you want your code to be extremely extensible, there is another way. Notice how the new function I suggested still does multiple things:
- It detects the type of the string
- It parses the value from the string
- It forwards the value on to another function
We can separate these into their own components.
This is only really worth it if you can foresee adding types to be a common feature, but especially if the "another function" you forward to should be selectable by the user (say, if you packaged these functions as member functions of an object):
// Class is the easiest type we can return
private static Class<?> determineType(String val) {
try (Scanner scanner = new Scanner(val)) {
if (scanner.hasNextInt()) return Integer.class;
if (scanner.hasNextDouble()) return Double.class;
return String.class;
}
}
private static final Map<Class<?>, Function<String, ?>> parsers = new IdentityHashMap<>();
private static final Map<Class<?>, Consumer<Object>> functionSwitch = new IdentityHashMap<>();
static {
parsers.put(Integer.class, Integer::parseInt);
parsers.put(Double.class, Double::parseDouble);
parsers.put(String.class, Function.identity());
// Note that, due to limitations in the type system,
// i is of type Object, so we need to cast it to the appropriate
// class before forwarding on to the function.
functionSwitch.put(Integer.class, i -> doFoo((Integer) i));
functionSwitch.put(Double.class, d -> doFoo((Double) d));
functionSwitch.put(String.class, str -> doFoo((String) str));
}
public static void testString(String val) {
val = val.trim(); // This could even be part of the parser's responsibility
Class<?> stringType = determineType(val);
Function<String, ?> parser = parsers.get(stringType);
functionSwitch.get(stringType).accept(parser.apply(val));
}
-
\$\begingroup\$ Very nice answer. Unfortunately I should have said that performance is key here as my method may be called thousands of times. I was worried about the try / catch being un-performant. However, thank you for the detailed answer - I've found it illuminating \$\endgroup\$Phil– Phil2017年03月31日 21:45:15 +00:00Commented Mar 31, 2017 at 21:45
-
2\$\begingroup\$ @Phil Even for thousands of time,
Scanner
is quite fast. Assuming it is used for input (which is already slow anyway),Scanner
is not what is going to be taking up all your time. But thetry {} catch
way is about 15x faster \$\endgroup\$Justin– Justin2017年03月31日 23:42:06 +00:00Commented Mar 31, 2017 at 23:42
Background
This question was brought to my attention in The 2nd Monitor chat room because in the past I have claimed that using exception handling to handle parse exceptions is "a bad idea and slow". This is exactly what your code is doing, and it's a bad idea, and slow.... at least, that's what I thought, until I benchmarked your code.
Now, in the past, I wrote a CSV parser and it used a similar system to yours to handle the values in a field, and I discovered that I got a significant speed-up (like 100% faster) when I prevalidated the values to an large extent, before doing a parseInt
or parseDouble
on them. I found that it is much better to "identify" a value of a certain type to a high degree of confidence, and thus reduce the number of exceptions thrown.
In your code, if the values are 1/3 integers, 1/3 double, and 1/3 string, then on average you are creating 1 exception for each value (none for ints, 1 for doubles, and 2 for strings). Worst case, if all your values are strings, you'll create 2 exceptions per value.
What if you could (almost) guarantee that all your parseInt
and parseDouble
calls will succeed, and you'll have (almost) no exceptions? Is the work to check the value "worth it"?
My claim is yes, it's worth it.
So, I have tried to prove it, and ... the results are interesting.
I used my MicroBench performance system to run the benchmark, and I built a dummy "load" for the doFoo
function. Let's look at my test-rig:
public class ParseVal {
private final LongAdder intsums = new LongAdder();
private final DoubleAdder doubsums = new DoubleAdder();
private final LongAdder stringsums = new LongAdder();
private final void doFoo(int val) {
intsums.add(val);
}
private final void doFoo(double val) {
doubsums.add(val);
}
private final void doFoo(String val) {
stringsums.add(val.length());
}
@Override
public String toString() {
return String.format("IntSum %d - DoubleSum %.9f - StringLen %d", intsums.longValue(), doubsums.doubleValue(), stringsums.longValue());
}
public static final String testFunction(BiConsumer<ParseVal, String> fn, String[] data) {
ParseVal pv = new ParseVal();
for (String v : data) {
fn.accept(pv, v);
}
return pv.toString();
}
public static final String[] testData(int count) {
String[] data = new String[count];
Random rand = new Random(count);
for (int i = 0; i < count; i++) {
String base = String.valueOf(1000000000 - rand.nextInt(2000000000));
switch(i % 3) {
case 0:
break;
case 1:
base += "." + rand.nextInt(10000);
break;
case 2:
base += "foobar";
break;
}
data[i] = base;
}
return data;
}
.......
public void testStringOP(String val) {
String x = val.trim();
try {
int i = Integer.parseInt(x);
doFoo(i);
} catch (NumberFormatException e) {
try {
double d = Double.parseDouble(x);
doFoo(d);
} catch (NumberFormatException e2) {
doFoo(x);
}
}
}
public static void main(String[] args) {
String[] data = testData(1000);
String expect = testFunction((pv, v) -> pv.testStringOP(v), data);
System.out.println(expect);
....
}
}
The doFoo
methods have an accumulator mechanism (adding up ints, doubles, and the string lengths) and making the results available in a toString
method.
Also, I have put your function in there as testStringOP
.
There is a testData
function which builds an array if input strings where there are approximately equal numbers of int, double, and string values.
Finally, the benchmark function:
public static final String testFunction(BiConsumer<ParseVal, String> fn, String[] data) {
ParseVal pv = new ParseVal();
for (String v : data) {
fn.accept(pv, v);
}
return pv.toString();
}
That function takes an input function and the test data as an argument, and returns the String summary as a result. You would use this function like it's used in the main method....
String expect = testFunction((pv, v) -> pv.testStringOP(v), data);
which runs the testStringOP
function on all the input data
values, and returns the accumulated string results.
What's nice is that I can now create other functions to test performance, for example testStringMyFn
and call:
String myresult = testFunction((pv, v) -> pv.testStringMyFn(v), data);
This is the basic tool I can use for the MicroBench system: https://github.com/rolfl/MicroBench
Scanner option
Let's start by comparing your function to the Scanner
type system recommended in another answer... Here's the code I used for the Scanner
:
public void testStringScanner(String val) {
val = val.trim();
try (Scanner scanner = new Scanner(val)) {
if (scanner.hasNextInt()) {
doFoo(scanner.nextInt());
} else if (scanner.hasNextDouble()) {
doFoo(scanner.nextDouble());
} else {
doFoo(val);
}
}
}
and here's how I benchmarked that code:
public static void main(String[] args) {
String[] data = testData(1000);
String expect = testFunction((pv, v) -> pv.testStringOP(v), data);
System.out.println(expect);
UBench bench = new UBench("IntDoubleString Parser")
.addTask("OP", () -> testFunction((pv, v) -> pv.testStringOP(v), data), s -> expect.equals(s))
.addTask("Scanner", () -> testFunction((pv, v) -> pv.testStringScanner(v), data), s -> expect.equals(s));
bench.press(10).report("Warmup");
bench.press(100).report("Final");
}
That runs the benchmark on both your function, and the Scanner function, and does a warmup run (to get JIT optimzations done), and a "Final" run to get real results.... what are the results, you ask?
Task IntDoubleString Parser -> OP: (Unit: MILLISECONDS) Count : 100 Average : 1.6914 Fastest : 1.5331 Slowest : 3.2561 95Pctile : 2.0277 99Pctile : 3.2561 TimeBlock : 1.794 2.037 1.674 1.654 1.674 1.588 1.665 1.588 1.634 1.606 Histogram : 99 1 Task IntDoubleString Parser -> Scanner: (Unit: MILLISECONDS) Count : 100 Average : 69.9713 Fastest : 67.2338 Slowest : 98.4322 95Pctile : 73.8073 99Pctile : 98.4322 TimeBlock : 77.028 70.050 69.325 69.860 69.094 68.498 68.547 68.779 69.586 68.945 Histogram : 100
What does that mean? It means, on average, your code is 40-times faster than the Scanner. Your code runs in 1.7Milliseconds to process 1000 input values, and the scanner runs in 70 milliseconds.
So, a Scanner
is a bad idea if performance is required, right? I agree.
Alternative
But, what about a RegEx pre-validation check? Note that the regex will not guarantee a clean parse, but it can go a long way. For example, the regex [+-]?\d+
will match any integer, right, but is -999999999999999999999
a valid integer? No, it's too big. But, it is a valid double. We will still need to have a try/catch block even if we pass the regex prevalidation. That's going to eliminate almost all exceptions, though....
So, what do we do to prevalidate things? Well, the Double.valueOf(String)
function documents a regex for matching double values in Strings. It's complicated, and I made a few modifications because we don't have already trimmed our inputs, but here's a couple of patterns for prevalidating double values, and integer values:
private static final String Digits = "(\\p{Digit}+)";
private static final String HexDigits = "(\\p{XDigit}+)";
private static final String Exp = "[eE][+-]?"+Digits;
private static final String fpRegex =
( //"[\\x00-\\x20]*"+ // Optional leading "whitespace"
"[+-]?(" + // Optional sign character
"NaN|" + // "NaN" string
"Infinity|" + // "Infinity" string
"((("+Digits+"(\\.)?("+Digits+"?)("+Exp+")?)|"+
"(\\.("+Digits+")("+Exp+")?)|"+
"((" +
"(0[xX]" + HexDigits + "(\\.)?)|" +
"(0[xX]" + HexDigits + "?(\\.)" + HexDigits + ")" +
")[pP][+-]?" + Digits + "))" +
"[fFdD]?))"); // +
//"[\\x00-\\x20]*");// Optional trailing "whitespace"
Pattern isDouble = Pattern.compile(fpRegex);
Pattern isInteger = Pattern.compile("[+-]?[0-9]+");
We can use those functions to build the code:
public void testStringRegex(String val) {
String x = val.trim();
if (isInteger.matcher(x).matches()) {
try {
doFoo(Integer.parseInt(x));
} catch (NumberFormatException nfe) {
try {
doFoo(Double.parseDouble(x));
} catch (NumberFormatException e) {
doFoo(x);
}
}
} else if (isDouble.matcher(x).matches()) {
try {
doFoo(Double.parseDouble(x));
} catch (NumberFormatException e) {
doFoo(x);
}
} else {
doFoo(x);
}
}
Now, that's pretty complicated, right? Well, it does a "quick" integer regex check, and if it's likely an integer, it tries to parse it as an integer, and fails over to a double, and then to a string....
If it's not likely an integer, it checks if it's a double, and so on.....
How can this code be faster, you ask? Well, we're almost certainly having clean parses when we do them, and we'll have almost no exceptions... But, is it actually faster?
Here are the results:
Task IntDoubleString Parser -> OP: (Unit: MILLISECONDS) Count : 100 Average : 1.6689 Fastest : 1.5580 Slowest : 2.1572 95Pctile : 1.8012 99Pctile : 2.1572 TimeBlock : 1.695 1.752 1.709 1.670 1.641 1.648 1.643 1.639 1.662 1.630 Histogram : 100 Task IntDoubleString Parser -> Regex: (Unit: MILLISECONDS) Count : 100 Average : 1.9580 Fastest : 1.8379 Slowest : 2.5713 95Pctile : 2.1004 99Pctile : 2.5713 TimeBlock : 1.978 2.022 1.949 1.966 2.020 1.933 1.890 1.940 1.955 1.928 Histogram : 100 Task IntDoubleString Parser -> Scanner: (Unit: MILLISECONDS) Count : 100 Average : 69.8886 Fastest : 67.1848 Slowest : 77.2769 95Pctile : 71.9153 99Pctile : 77.2769 TimeBlock : 70.940 69.735 69.879 69.381 69.579 69.180 69.611 70.412 70.123 70.045 Histogram : 100
If you look, you'll see the regex version is Slower than the exception version... it runs in 1.95ms but the exception version runs in 1.67ms
Exceptions
But, there's a catch. In these tests, the stack trace for the exceptions is really small... and the "cost" of an exception depends on the depth of the trace, so let's increase the stack depths for the regex and exception code. Well add a recursive function to simulate a deeper stack:
public void testStringDeepOP(String val, int depth) {
if (depth <= 0) {
testStringOP(val);
} else {
testStringDeepOP(val, depth - 1);
}
}
public void testStringDeepRegex(String val, int depth) {
if (depth <= 0) {
testStringRegex(val);
} else {
testStringDeepRegex(val, depth - 1);
}
}
and we will test the OP and Regex code a different "depths" of nesting, 5, 10, and 20 layers deep. The benchmark code is:
UBench bench = new UBench("IntDoubleString Parser")
.addTask("OP", () -> testFunction((pv, v) -> pv.testStringOP(v), data), s -> expect.equals(s))
.addTask("OP D5", () -> testFunction((pv, v) -> pv.testStringDeepOP(v, 5), data), s -> expect.equals(s))
.addTask("OP D10", () -> testFunction((pv, v) -> pv.testStringDeepOP(v, 10), data), s -> expect.equals(s))
.addTask("OP D20", () -> testFunction((pv, v) -> pv.testStringDeepOP(v, 20), data), s -> expect.equals(s))
.addTask("Regex", () -> testFunction((pv, v) -> pv.testStringRegex(v), data), s -> expect.equals(s))
.addTask("Regex D5", () -> testFunction((pv, v) -> pv.testStringDeepRegex(v, 5), data), s -> expect.equals(s))
.addTask("Regex D10", () -> testFunction((pv, v) -> pv.testStringDeepRegex(v, 10), data), s -> expect.equals(s))
.addTask("Regex D20", () -> testFunction((pv, v) -> pv.testStringDeepRegex(v, 20), data), s -> expect.equals(s))
.addTask("Scanner", () -> testFunction((pv, v) -> pv.testStringScanner(v), data), s -> expect.equals(s));
bench.press(10).report("Warmup");
bench.press(100).report("Final");
What are the results?
Final ===== Task IntDoubleString Parser -> OP: (Unit: MILLISECONDS) Count : 100 Average : 1.7005 Fastest : 1.5260 Slowest : 3.9813 95Pctile : 1.9346 99Pctile : 3.9813 TimeBlock : 1.682 1.624 1.612 1.675 1.708 1.658 1.727 1.738 1.672 1.910 Histogram : 99 1 Task IntDoubleString Parser -> OP D5: (Unit: MILLISECONDS) Count : 100 Average : 1.9288 Fastest : 1.7325 Slowest : 4.9673 95Pctile : 2.0897 99Pctile : 4.9673 TimeBlock : 2.124 1.812 1.828 1.873 1.925 1.877 1.855 1.869 1.903 2.221 Histogram : 98 2 Task IntDoubleString Parser -> OP D10: (Unit: MILLISECONDS) Count : 100 Average : 2.2271 Fastest : 2.0171 Slowest : 4.7395 95Pctile : 2.4904 99Pctile : 4.7395 TimeBlock : 2.392 2.125 2.129 2.152 2.246 2.169 2.189 2.203 2.247 2.420 Histogram : 98 2 Task IntDoubleString Parser -> OP D20: (Unit: MILLISECONDS) Count : 100 Average : 2.9278 Fastest : 2.6838 Slowest : 6.3169 95Pctile : 3.2415 99Pctile : 6.3169 TimeBlock : 2.870 2.822 2.860 2.794 2.956 2.861 3.041 3.012 2.853 3.211 Histogram : 99 1 Task IntDoubleString Parser -> Regex: (Unit: MILLISECONDS) Count : 100 Average : 2.0739 Fastest : 1.9338 Slowest : 3.8368 95Pctile : 2.2744 99Pctile : 3.8368 TimeBlock : 2.229 2.083 2.034 2.013 2.021 2.004 2.013 2.096 2.059 2.186 Histogram : 100 Task IntDoubleString Parser -> Regex D5: (Unit: MILLISECONDS) Count : 100 Average : 2.0565 Fastest : 1.9377 Slowest : 3.2857 95Pctile : 2.2646 99Pctile : 3.2857 TimeBlock : 2.148 2.075 2.035 2.038 2.035 2.031 2.026 2.000 2.032 2.145 Histogram : 100 Task IntDoubleString Parser -> Regex D10: (Unit: MILLISECONDS) Count : 100 Average : 2.0647 Fastest : 1.9598 Slowest : 2.6360 95Pctile : 2.2906 99Pctile : 2.6360 TimeBlock : 2.073 2.094 2.051 2.048 2.072 2.029 2.057 2.124 2.057 2.042 Histogram : 100 Task IntDoubleString Parser -> Regex D20: (Unit: MILLISECONDS) Count : 100 Average : 2.0891 Fastest : 1.9930 Slowest : 2.6483 95Pctile : 2.2587 99Pctile : 2.6483 TimeBlock : 2.108 2.070 2.078 2.066 2.071 2.091 2.048 2.090 2.137 2.132 Histogram : 100 Task IntDoubleString Parser -> Scanner: (Unit: MILLISECONDS) Count : 100 Average : 71.7199 Fastest : 67.9621 Slowest : 152.0714 95Pctile : 75.2141 99Pctile : 152.0714 TimeBlock : 71.006 69.896 70.160 69.734 70.824 69.854 71.473 71.888 73.607 78.756 Histogram : 99 1
Here it is expressed as a table (using the average times):
0 5 10 20 OP 1.7005 1.9288 2.2271 2.9278 RegEx 2.0739 2.0565 2.0647 2.0891
Conclusion
So, that's the real problem with exceptions, the performance is unpredictable... and, for example, if you run it inside a Tomcat container, with stacks hundreds of levels deep, you may find this completely destroys your performance.
-
\$\begingroup\$ Great detailed answer - its certainly taught me a lot. Thanks very much for spending your time to add this. \$\endgroup\$Phil– Phil2017年04月02日 20:20:49 +00:00Commented Apr 2, 2017 at 20:20
Is this good code?
Yes, except for using System.out.println
statements for logging.
Could it be improved? I don't like the throwing and catching of Exceptions.
There isn't much to improve other than logging and some design improvements @Justin suggested. Your code is better performance wise than using Scanner methods. The Scanner methods have some overhead trying to validate the input and ultimately calls respective parse methods. If you were to use above code in a high performance application than no further improvements needed.
-
\$\begingroup\$ Gah! I knew I should not have left those System.outs in! I do this for my own debugging (I am very old school) and I felt that it helped in the example. Consider the System.outs irrelevant. I was concerned that the throwing and catching was expensive and that there would be some other way (Apache Commons?) that would provide me with the solution I required. \$\endgroup\$Phil– Phil2017年03月31日 21:46:41 +00:00Commented Mar 31, 2017 at 21:46
-
\$\begingroup\$ Note that if your requirement is strict on not parsing Long numbers as Double you've to handle that case and not parse it as Double as you've done. \$\endgroup\$VinPro– VinPro2017年04月01日 00:19:09 +00:00Commented Apr 1, 2017 at 0:19
doFoo
is really doing? \$\endgroup\$doFoo
to call? This is unlike a language like Python. \$\endgroup\$