I have the following string
aaaa#include(soap1.xml)bbbb #include(soap2.xml)cccc #include(soap2.xml)
I want to find all occurrences of #include([anyfilename]) where [anyfilename] varies.
I have the regex (?<=#include\()(.*?)(?=\)*\)), which matches [anyfilename], but performing a replace with it leaves the #include() behind.
Can someone show me how to find/replace the entire #include([anyfilename])?
1 Answer
You may use the following regex:
#include\(([^)]*)\)
See the regex demo
I replaced the lookarounds with consuming equivalents. Lookarounds are zero-width assertions: they do not consume text, so the text they check is not part of the match value and a replace leaves it in place.
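To make the difference concrete, here is a minimal sketch that runs both patterns through String.replaceAll on a shortened input. The class name LookaroundVsConsuming and the helper strip are made up for this illustration and are not part of the original answer.

```java
public class LookaroundVsConsuming {
    // Pattern from the question: only the filename is part of the match.
    static final String LOOKAROUND = "(?<=#include\\()(.*?)(?=\\)*\\))";
    // Pattern from this answer: "#include(", the filename and ")" are all consumed.
    static final String CONSUMING = "#include\\(([^)]*)\\)";

    // Hypothetical helper: remove every match of regex from input.
    static String strip(String regex, String input) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String input = "aaaa#include(soap1.xml)bbbb";
        System.out.println(strip(LOOKAROUND, input)); // => aaaa#include()bbbb
        System.out.println(strip(CONSUMING, input));  // => aaaabbbb
    }
}
```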
The regex breakdown:
#include\( - match the literal character sequence #include(
([^)]*) - Group 1 (we'll refer to the value inside the group with matcher.group(1)) matching zero or more characters other than )
\) - match a literal )
The same pattern can be used to retrieve the filenames, and remove whole #include()s from the input.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IncludeDemo {
    public static void main(String[] args) {
        String str = "aaaa#include(soap1.xml)bbbb#include(soap2.xml)cccc";
        String p = "#include\\(([^)]*)\\)";
        Pattern ptrn = Pattern.compile(p);
        Matcher matcher = ptrn.matcher(str);
        List<String> arr = new ArrayList<>();
        while (matcher.find()) {
            arr.add(matcher.group(1)); // Get the Group 1 value, the file name
        }
        System.out.println(arr); // => [soap1.xml, soap2.xml]
        System.out.println(str.replaceAll(p, "")); // => aaaabbbbcccc
    }
}
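If the goal is to replace each #include(...) with something derived from the filename rather than with an empty string, one possible approach on Java 9 or later is Matcher.replaceAll with a function. The sketch below is only an illustration: the class IncludeExpander, the method expand and the contents map are hypothetical stand-ins for however the real file contents would be obtained.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IncludeExpander {
    private static final Pattern INCLUDE = Pattern.compile("#include\\(([^)]*)\\)");

    // 'contents' is an assumed lookup table from filename to replacement text.
    static String expand(String input, Map<String, String> contents) {
        return INCLUDE.matcher(input).replaceAll(m -> {
            // Group 1 is the filename; unknown names are replaced with an empty string.
            String replacement = contents.getOrDefault(m.group(1), "");
            // Escape '$' and '\' so the text is inserted literally.
            return Matcher.quoteReplacement(replacement);
        });
    }

    public static void main(String[] args) {
        Map<String, String> contents = Map.of("soap1.xml", "<one/>", "soap2.xml", "<two/>");
        String str = "aaaa#include(soap1.xml)bbbb#include(soap2.xml)cccc";
        System.out.println(expand(str, contents)); // => aaaa<one/>bbbb<two/>cccc
    }
}
```

Matcher.quoteReplacement matters here because the string returned by the function is still scanned for group references such as $1.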
#include(soap2.xml) and #include(soap1.xml)?
#include()" well, look-around mechanisms are zero-length (they are not included in the match, which is the part you want to replace), so that behaviour is expected. What else did you expect, and why?
\)*\) is the same as \)+, which is clearer on intent. As for that, why match multiple close parentheses?
#include\(.*?\)+ (see regex101 for result).
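The claim that \)*\) and \)+ behave the same can be checked with a small sketch. It compares a consuming pattern ending in \)*\) with the #include\(.*?\)+ pattern from the comment; the class ParenQuantifierDemo and the helper allMatches are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParenQuantifierDemo {
    // Hypothetical helper: collect the full text of every match of regex in input.
    static List<String> allMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "aaaa#include(soap1.xml)bbbb #include(soap2.xml)cccc";
        // Both print [#include(soap1.xml), #include(soap2.xml)]
        System.out.println(allMatches("#include\\(.*?\\)*\\)", input));
        System.out.println(allMatches("#include\\(.*?\\)+", input));
    }
}
```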