My text looks like this:
| birth_date = {{birth date|1925|09|2|df=y}}
| birth_place = [[Bristol]], [[England]], UK
| death_date = {{death date and age|2000|11|16|1925|09|02|df=y}}
| death_place = [[Eastbourne]], [[Sussex]], England, UK
| origin =
| instrument = [[Piano]]
| genre =
| occupation = [[Musician]]
I would like to get everything that is inside [[ ]]. I tried to use replaceAll to remove everything that is not inside [[ ]], and then split the result on newlines to get a list of the [[ ]] items:
input = input.replaceAll("^[\\[\\[(.+)\\]\\]]", "");
Required output:
[[Bristol]]
[[England]]
[[Eastbourne]]
[[Sussex]]
[[Piano]]
[[Musician]]
But this does not give the desired output. What am I missing here? There are thousands of documents; is this the fastest way to do it? If not, please tell me the best way to get the desired output.
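You need to match the links rather than replace everything else:
// \\w+ matches only letters, digits and underscores, so a link containing spaces would be skipped
Matcher m = Pattern.compile("\\[\\[\\w+\\]\\]").matcher(input);
while (m.find()) {
    String link = m.group(); // the matched link, e.g. "[[Bristol]]"
}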
answered Oct 4, 2013 at 16:22
Anirudha
Use Matcher.find. For example:
import java.util.regex.*;
...
String text =
"| birth_date = {{birth date|1925|09|2|df=y}}\n" +
"| birth_place = [[Bristol]], [[England]], UK\n" +
"| death_date = {{death date and age|2000|11|16|1925|09|02|df=y}}\n" +
"| death_place = [[Eastbourne]], [[Sussex]], England, UK\n" +
"| origin = \n" +
"| instrument = [[Piano]]\n" +
"| genre = \n" +
"| occupation = [[Musician]]\n";
Pattern pattern = Pattern.compile("\\[\\[.+?\\]\\]");
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
System.out.println(matcher.group());
}
answered Oct 4, 2013 at 16:22
falsetru
Just for fun, using replaceAll:
String output = input.replaceAll("(?s)(\\]\\]|^).*?(\\[\\[|$)", "$1\n$2");
answered Oct 4, 2013 at 16:37
femtoRgon
(.+) is a "greedy" quantifier that will grab as many characters as it can between [[ and ]], meaning that for birth_place you'll get "Bristol]], [[England" as one of the matches. Adding ? after .+, as in falsetru's answer, prevents this.
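To make the greedy versus reluctant difference concrete, here is a minimal sketch run against the birth_place line from the question. The class name WikiLinkDemo and the helper findAll are hypothetical names chosen for illustration, not part of any answer above. Both patterns are compiled once as constants, which is a reasonable habit when the same expression is applied to thousands of documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WikiLinkDemo {
    // Greedy: .+ runs on to the LAST "]]" on the line.
    public static final Pattern GREEDY = Pattern.compile("\\[\\[(.+)\\]\\]");
    // Reluctant: .+? stops at the FIRST "]]" it can.
    public static final Pattern LAZY = Pattern.compile("\\[\\[(.+?)\\]\\]");

    // Hypothetical helper: collects every full match of the pattern, in order.
    public static List<String> findAll(Pattern pattern, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "| birth_place = [[Bristol]], [[England]], UK";
        // The greedy pattern swallows both links as a single match.
        System.out.println(findAll(GREEDY, line).size()); // prints 1
        // The reluctant pattern yields one match per link.
        System.out.println(findAll(LAZY, line).size());   // prints 2
    }
}
```

The sizes are printed rather than the lists themselves because List.toString() would render both results identically here, hiding the difference.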