I have a complex strings like the following:
$company=>Company(code->MyCompany)
$brand=>Brand(code->Brand_1, company->$company)
and then I prepare objects out of them as follows:
Macro [key=$company, type=Company, map={code=MyCompany}]
Macro [key=$brand, type=Brand, map={ company=$company, code=Brand_1}]
using the following code for parsing. Is there a better way?
public class Macro {
private String key;
private String type;
private Map<String, String> map;
}
public static Macro getMacro(String str) {
final Macro macro = new Macro();
final Map<String, String> map = new HashMap<String, String>();
final Matcher matcher = Pattern.compile("\\(([^)]+)\\)").matcher(str);
while (matcher.find()) {
final List<String> list = Arrays.asList(matcher.group(1).split(",")).stream()
.collect(Collectors.toList());
list.stream().forEach(a ->
{
final String[] ar = a.split("->");
map.put(ar[0], ar[1]);
});
}
str = str.replaceAll("\\(.*\\)", "");
final String[] s = str.split("=>");
macro.setMap(map);
macro.setKey(s[0]);
macro.setType(s[1]);
return macro;
}
3 Answers 3
Pattern.compile("\\(([^)]+)\\)");
The Pattern is always the same so you may just as well make it a private static final
constant. This will avoid having to recompile the pattern each time.
Arrays.asList(/*..*/).stream().collect(Collectors.toList());
Why this dance? Arrays.asList returns a List. You only do a stream on the return result.
final List<String> list = Arrays.asList(matcher.group(1).split(","));
list.stream().forEach(a ->
{
final String[] ar = a.split("->");
map.put(ar[0], ar[1]);
});
However a Pattern an return a Stream split according to the pattern directly:
Pattern COMMA_PATTERN = Pattern.compile("\\s*,\\s*");
COMMA_PATTERN.splitAsStream(matcher.group(3)).forEach(a ->
{
//...
}
You can use a single match of a Pattern to find both the leading values and the string inside the parenthesis:
private static final Pattern FULL_PATTERN = Pattern.compile("^(\\$\\w+)\\s*=>\\s*(\\w+)\\s*\\(([^)]+)\\)$");
private static final Pattern KEY_VALUE_PATTERN = Pattern.compile("(\\w+)\\s*->\\s*([\\$\\w]+)");
private static final Pattern COMMA_PATTERN = Pattern.compile("\\s*,\\s*");
public static Macro getMacro(String str) {
final Macro macro = new Macro();
final Map<String, String> map = new HashMap<String, String>();
final Matcher matcher = FULL_PATTERN.matcher(str);
if(!matcher.matches()) {
throw new IllegalArgumentException("...");
}
COMMA_PATTERN.splitAsStream(matcher.group(3)).forEach(a ->
{
final Matcher kvMatcher = KEY_VALUE_PATTERN.matcher(a);
if(!kvMatcher.matches()){
throw new IllegalArgumentException("...");
}
map.put(kvMatcher.group(1), kvMatcher.group(2));
});
macro.setMap(map);
macro.setKey(matcher.group(1));
macro.setType(matcher.group(2));
return macro;
}
You'll note I added \\s*
to various points in each pattern, This lets you ignore the whitespace that may surround the operators.
The \\w
in a Pattern means a word character. If you only want alphanumeric +undescore then you'llneed to replace each instance of it with [\\p{Alnum}_]
-
\$\begingroup\$ Thank you sir. When i check your Full pattern ^(\\$\\w+)\\s*=>\\s*(\\w+)\\s*\(([^)]+)\)$ to the following regex tool regex101.com, it is not matching anything. Will it be possible for you to have a look \$\endgroup\$Saurabh Kumar– Saurabh Kumar2017年08月03日 10:19:51 +00:00Commented Aug 3, 2017 at 10:19
-
\$\begingroup\$ @SaurabhKumar if I insert
^(\$\w+)\s*=>\s*(\w+)\s*\(([^)]+)\)$
into regex101 it properly matches your test strings: regex101.com/r/E7uvML/1 \$\endgroup\$ratchet freak– ratchet freak2017年08月03日 10:25:19 +00:00Commented Aug 3, 2017 at 10:25 -
\$\begingroup\$ Correct but for (\w+)\s*->\s*(\w+) regex sir it's not matching the following str for example company->$company. Its goes into IllegalArgument. \$\endgroup\$Saurabh Kumar– Saurabh Kumar2017年08月03日 10:37:13 +00:00Commented Aug 3, 2017 at 10:37
-
\$\begingroup\$ @SaurabhKumar Ah I see it was missing the
$
, should have been(\w+)\s*->\s*([\$\w]+)
fixed in the answer \$\endgroup\$ratchet freak– ratchet freak2017年08月03日 10:41:10 +00:00Commented Aug 3, 2017 at 10:41 -
\$\begingroup\$ Yup. Thank you very much for helping me out. I learnt new things about regex from you :) \$\endgroup\$Saurabh Kumar– Saurabh Kumar2017年08月03日 10:44:29 +00:00Commented Aug 3, 2017 at 10:44
Simplifications:
Arrays.asList(...).stream()
can be shorter (and faster) asArrays.stream(...)
In the same vein, the code collects the stream into a list, just to restream the list. Additionally using an
Action
(as inforEach
) to collect results is semantically questionable. The whole while-loop should be written like this:while (matcher.find()) { map = Arrays.stream(matcher.group(1).split(",")) .map(content -> content.split("->") .collect(Collectors.toMap(a -> a[0], a -> a[1])); }
Since the
Pattern
instance you have does not change, I'd extract it into a static variableInstead of making this an external static method, I'd put it into the
Macro
class. This enables calling it as follows:Macro.fromString(...)
.
Last but not least this is a pretty nicely usable place for regex as follows:
static final Pattern macroPattern =
Pattern.compile("(\\$[^=]+)=>([A-Z][^\\(]+)\\((([^-]+)->([^\\)]+)(?:\\s*,\\s*)?\\)");
Now if java had named capture groups this would suddenly become much easier (~hint, hint, nudge nudge). How to use this pattern is something left as exercise to the reader
-
1\$\begingroup\$ But Java does have named-capturing groups!
(?<key>\$.+?)=>
etc. Also, your regex does not compile. You have to use double backslashes in string literals to represent a single backslash in the regular expression. \$\endgroup\$Stingy– Stingy2017年08月03日 22:42:04 +00:00Commented Aug 3, 2017 at 22:42 -
\$\begingroup\$ thanks for catching the backslashes and the apparently needed refresher on named capture groups... \$\endgroup\$Vogel612– Vogel6122017年08月03日 23:14:27 +00:00Commented Aug 3, 2017 at 23:14
You are calling Stream.forEach(Consumer)
with a Consumer
whose action accesses shared state (namely the local and final
variable map
) without taking care of synchronization (a HashMap
is not thread-safe). This is bad. Stream.forEach(Consumer)
might look like a simple for-each loop, but it is more than that, because unlike the latter, it might process multiple stream elements simultaneously in multiple threads, so accessing shared state without synchronization is dangerous. A safer approach would have been to use forEachOrdered(Consumer)
instead of forEach(Consumer)
.
The solution proposed by Vogel612 for this problem is even more elegant, but, technically, it is not equivalent to your procedure, because your procedure doesn't create a new Map
for every iteration of while (matcher.find())
, as opposed to Vogel612's code, which assigns a new Map
to map
for every loop iteration. However, the while (matcher.find())
loop is probably a design flaw anyway, because as far as I understand your question, the input string is not supposed to contain multiple parentheses-enclosed "maps" (not to be confused with key-value-"mappings"). This can be rectified by using a single regular expression for the whole input string, as the other answers have already suggested.
-
\$\begingroup\$ FWIW making "synchronization" an issue when referring to a local variable is a separate level of overkill. You're of course correct in principle, but it doesn't matter that much, given that the stream isn't a
parallelStream
in the first place. Unless there is aparallel
in the pipeline, Java / JIT is highly unlikely if not disallowed from making the Stream parallel \$\endgroup\$Vogel612– Vogel6122017年08月03日 22:44:31 +00:00Commented Aug 3, 2017 at 22:44 -
\$\begingroup\$ @Vogel612 It is irrelevant whether the variable is local or a field. The point is that it can be accessed concurrently and therefore behave in an unexpected manner. However, you're right, the documentation of
Collection.stream()
says explicitly that it returns a sequential stream. Thank you for pointing that out, I wasn't aware of that. Still, I think it is better to point out a practice that has the potential of causing trouble, even if, in this particular case, it is unproblematic. \$\endgroup\$Stingy– Stingy2017年08月03日 23:07:59 +00:00Commented Aug 3, 2017 at 23:07