How to use Pattern and Matcher in Java?

Question 1

Let's say the string is <title>xyz</title> and I want to extract xyz from it. I used:

Pattern titlePattern = Pattern.compile("&lttitle&gt\\s*(.+?)\\s*&lt/title&gt");
Matcher titleMatcher = titlePattern.matcher(line);
String title = titleMatcher.group(1);

However, I am getting an error on the line titlePattern.matcher(line);

Question 2

You need to tell people what the problem is if you expect us to debug it.

Question 3

Somebody warn this guy about re + html!

Question 4

You say the error occurs earlier (what is the actual error? The code runs without one for me). Once that is solved, you still need to call find() on the matcher to actually search for the pattern before calling group():

if(titleMatcher.find()){
 String title = titleMatcher.group(1);
}
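
To make the find() point concrete, here is a minimal sketch of what happens with and without that call. The class and method names (FindBeforeGroupDemo, firstGroup, groupWithoutFindThrows) are made up for illustration; only Pattern and Matcher come from java.util.regex. It relies on the documented behavior that group() throws IllegalStateException when no match has been attempted yet.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindBeforeGroupDemo {
    // Hypothetical helper: returns capture group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        // find() runs the search; group(1) is only valid after it returns true.
        return m.find() ? m.group(1) : null;
    }

    // Hypothetical helper: reports whether group(1) fails when find() was never called.
    static boolean groupWithoutFindThrows(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        try {
            m.group(1); // no match attempted yet
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String regex = "<title>\\s*(.+?)\\s*</title>";
        String line = "<title>xyz</title>";
        System.out.println(groupWithoutFindThrows(regex, line)); // true
        System.out.println(firstGroup(regex, line));             // xyz
    }
}
```

Returning null for "no match" is just one choice for the sketch; an Optional<String> would work equally well.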

Note that if you are really matching against a string containing literal (unescaped) angle brackets, like

<title>xyz</title>

then your regular expression has to use those literal characters as well, not the escaped entities:

"<title>\\s*(.+?)\\s*</title>"

Also, you should be careful about how far you try to get with this, as you can't really parse HTML or XML with regular expressions. If you are working with XML, it's much easier to use an XML parser, e.g. JDOM.
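
For the simple single-line case from the question, the difference between the two patterns can be checked directly. This is only a sketch: the class name PatternComparisonDemo and its helpers are invented for the example, and the "escaped" pattern is copied as it appears in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternComparisonDemo {
    // Pattern as posted in the question: it looks for the literal text "&lttitle&gt".
    static final String ESCAPED = "&lttitle&gt\\s*(.+?)\\s*&lt/title&gt";
    // Pattern from the answer: it looks for real angle brackets.
    static final String LITERAL = "<title>\\s*(.+?)\\s*</title>";

    // Hypothetical helper: true if the regex is found anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Hypothetical helper: capture group 1 of the first match, or null.
    static String title(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "<title>xyz</title>";
        System.out.println(found(ESCAPED, line));                    // false
        System.out.println(found(LITERAL, line));                    // true
        System.out.println(title(LITERAL, "<title>  xyz  </title>")); // xyz
    }
}
```

The escaped pattern is not invalid as a regex; it simply describes different text, which would explain find() returning false on a line that contains real angle brackets.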

Question 5

Yeah, I can't seem to find it. Is this line wrong? Pattern titlePattern = Pattern.compile("&lttitle&gt\\s*(.+?)\\s*&lt/title&gt");

Question 6

Are you getting an exception or are you just not getting the correct result?

Question 7

It seems titleMatcher.find() always evaluates to false even though the string is <title>xyz</title>. So my only concern is whether this part is wrong: &lttitle&gt\\s*(.+?)\\s*&lt/title&gt

Question 8

Yes, see my addition to the answer. I tried the regular expression given at the end of the answer against the string shown just above it, and it works for me.

Question 9

Not technically an answer, but you shouldn't be using regular expressions to parse HTML. You can try, and you can get away with it for simple tasks, but HTML can get ugly. There are a number of Java libraries that can parse HTML/XML just fine. If you're going to be working a lot with HTML/XML, it would be worth your time to learn them.

Question 10

As others have suggested, it's probably not a good idea to parse HTML/XML with regex. You can parse XML documents with the standard Java API, but I don't recommend it. As Fabian Steeg already answered, it's probably better to use JDOM or a similar open source library for parsing XML.

With javax.xml.parsers you can do the following:

String xml = "<title>abc</title>";
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document doc = docBuilder.parse(new InputSource(new StringReader(xml)));
NodeList nodeList = doc.getElementsByTagName("title");
String title = nodeList.item(0).getTextContent();

This parses your XML string into a Document object which you can use for further lookups. The API is kinda horrible though.
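
For anyone who wants to paste and run it, here is the same DOM lookup as a self-contained sketch with the imports it needs. The class name DomTitleDemo and the method extractTitle are made up for the example, and wrapping the checked exceptions in IllegalArgumentException is just one way to keep the call site short.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomTitleDemo {
    // Hypothetical helper: text of the first <title> element, or null if there is none.
    static String extractTitle(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Parse the string through a StringReader; parse(String) would treat it as a URI.
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList titles = doc.getElementsByTagName("title");
            return titles.getLength() > 0 ? titles.item(0).getTextContent() : null;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Malformed input ends up here; the input must be well-formed XML.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extractTitle("<title>abc</title>")); // abc
    }
}
```

Note that this only works for well-formed XML; typical real-world HTML will usually make the parser throw.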

Another way is to use XPath for the lookup:

XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xPath = xpathFactory.newXPath();
String titleByXpath = xPath.evaluate("/title/text()", new InputSource(new StringReader(xml)));
// or use the Document for lookup
String titleFromDomByXpath = xPath.evaluate("/title/text()", doc);
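
The XPath variant can be wrapped the same way. Again a sketch only: XPathTitleDemo and evaluateAsString are invented names, and the second expression (//title/text()) is my own addition to show a title that is not the root element.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathTitleDemo {
    // Hypothetical helper: evaluates an XPath expression against an XML string
    // and returns the result converted to a string.
    static String evaluateAsString(String xml, String expression) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            return xPath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            // Thrown for an invalid expression or input that cannot be parsed.
            throw new IllegalArgumentException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Root-level title, as in the answer above.
        System.out.println(evaluateAsString("<title>abc</title>", "/title/text()")); // abc
        // Title nested deeper in the document (assumed example input).
        System.out.println(evaluateAsString(
                "<html><head><title>xyz</title></head></html>", "//title/text()")); // xyz
    }
}
```

When the expression selects nothing, the string form of evaluate returns an empty string rather than null, so check for that if a missing title matters.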

Fabian Steeg · Accepted Answer · 2009-12-31 16:35:31Z

