0

I want to match some links in the content of a web page. I know I can use file_get_contents(url) to fetch the page in PHP. How would I do this in JavaScript? As for the regular expression, given a link like

<a href="someurl/something" id="someid">contents</a>

How can I use a JavaScript regular expression to match this (only once, non-greedy)? I tried this:

/^\<a href=\"someurl\/something\" id=\"someid\"\>(+?)\<\/a\>$/

but it doesn't work. Can someone help? Thanks!

mhyfritz
8,572 reputation, 2 gold badges, 31 silver badges, 30 bronze badges
asked Jul 18, 2011 at 8:59
1
  • 3
    You don't want to use regular expressions for this. Your error, by the way, is a missing dot: (.+?) instead of (+?) makes the regex at least syntactically valid. (You also don't need any of those backslashes except the ones before slashes.) Commented Jul 18, 2011 at 9:02
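
A minimal sketch of the fix described in this comment, assuming the input string is exactly the link from the question and nothing else (the `^` and `$` anchors require that). The variable names are illustrative, not from the original post.

```javascript
// Sample markup taken from the question.
const html = '<a href="someurl/something" id="someid">contents</a>';

// (.+?) is a lazy capture group; only the forward slashes need escaping.
const pattern = /^<a href="someurl\/something" id="someid">(.+?)<\/a>$/;

// exec returns null when there is no match, so guard before reading group 1.
const match = pattern.exec(html);
const contents = match ? match[1] : null;

console.log(contents); // prints "contents"
```

If the link sits inside a larger page, the anchors would have to be dropped, since the string no longer starts and ends with the tag.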

5 Answers

4

You should know that parsing HTML with regex is not the optimal way to solve this problem, and if you have access to a live DOM of the page, you should use DOM methods instead. As in, you should use

document.getElementById('someid').innerHTML // this will return 'contents'

instead of a regex.
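
As a rough sketch, the DOM lookup can be wrapped in a small helper that also handles a missing element. The helper name `getContentsById` and its `doc` parameter are assumptions for illustration; in a browser you would pass the global `document`.

```javascript
// doc is any object with a getElementById method, normally window.document.
// Returns the element's inner HTML, or null if no element has that id.
function getContentsById(doc, id) {
  const el = doc.getElementById(id);
  return el ? el.innerHTML : null;
}
```

In a page containing the link from the question, `getContentsById(document, 'someid')` would give `'contents'`.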

answered Jul 18, 2011 at 9:02

3

I'd highly recommend using a library like jQuery to get the element and then getting its contents via a .text() call. It's much simpler and more reliable than trying to parse HTML with regex.

answered Jul 18, 2011 at 9:02

2 Comments

Why jQuery? I keep seeing jQuery being recommended for the simplest things, that don't even have any browser quirkiness, lack of elegance or anything to need it. Personally, I'd just use the native DOM API for this.
@Delan Azabani - if this particular example is all that is being done, yes, the DOM works fine. But typically things like this are not done in isolation, and honestly, jQuery is just nice to work with.
0

The DOM and jQuery suggestions are better, but if you still want to use a regex, try this:

/^<a href=".*?" id=".*?">(.*?)<\/a>$/
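
Because of the `^` and `$` anchors, that pattern only matches when the whole string is a single link. A hedged sketch of how it might be adapted to pull links out of a larger page: drop the anchors, capture the attributes, and add the `g` flag. The sample `page` string and the variable names are made up for illustration.

```javascript
// Hypothetical page fragment containing two links.
const page = [
  '<p>Intro</p>',
  '<a href="someurl/something" id="someid">contents</a>',
  '<a href="other/url" id="otherid">more</a>',
].join('\n');

// Same idea as the pattern above, minus ^ and $, with the g flag so that
// matchAll can walk through every link. All three groups are lazy.
const linkPattern = /<a href="(.*?)" id="(.*?)">(.*?)<\/a>/g;

const links = Array.from(page.matchAll(linkPattern), (m) => ({
  href: m[1],
  id: m[2],
  text: m[3],
}));

// Only the first match, as asked in the question.
const first = links[0];
console.log(first.text); // prints "contents"
```

This still assumes every link has exactly the attributes `href` then `id`, in that order and double-quoted, which is one reason the DOM approaches are more robust.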
Delan Azabani
81.8k reputation, 30 gold badges, 174 silver badges, 215 bronze badges
answered Jul 18, 2011 at 9:06

2 Comments

No need to escape characters like \" and \<?
No need to escape anything except the "/" character. Check out the demo here.
0

You might as well create the elements with jQuery:

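// Parse into a detached wrapper so that top-level <a> elements are found too
// (.find() only searches descendants)
var elements = $('<div>').html(html);
var links = elements.find('a');
links.each(function (i, link) {
  // Do the regexp matching in here if you wish to search for specific URLs only
});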

In bigger documents, using the DOM is way quicker than regexping the whole thing as text.

answered Jul 18, 2011 at 9:16

0

Try this (note that the snippet is Java, not JavaScript):

try {
 boolean foundMatch = subjectString.matches("(?im)<a[^>]*href=(\"[^\"]*\"|'[^']*'|[^\\s>]*)[^>]*>.*?</a>");
} catch (PatternSyntaxException ex) {
 // Syntax error in the regular expression
}

This matches href values in double quotes, in single quotes, or with no quotes at all:

<a href="someurl/something" id="someid">contents</a>
<a href='someurl/something' id='someid'>contents</a>
<a href=someurl/something id=someid>contents</a>
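
Since the question asks about JavaScript and the snippet in this answer is Java, here is a rough JavaScript translation of the same pattern, offered as a sketch only. The inline `(?im)` flags become the `i` flag on a regex literal (`m` has no effect here because the pattern has no anchors), and only the `/` in `</a>` needs escaping. The quote-stripping step and the variable names are my additions.

```javascript
// The three sample links from this answer.
const samples = [
  '<a href="someurl/something" id="someid">contents</a>',
  "<a href='someurl/something' id='someid'>contents</a>",
  '<a href=someurl/something id=someid>contents</a>',
];

// Group 1 captures the href value: double-quoted, single-quoted, or bare.
const anchorPattern = /<a[^>]*href=("[^"]*"|'[^']*'|[^\s>]*)[^>]*>.*?<\/a>/i;

const hrefs = samples.map((sample) => {
  const m = anchorPattern.exec(sample);
  // Strip one surrounding quote character on each side, if present.
  return m ? m[1].replace(/^["']|["']$/g, '') : null;
});

console.log(hrefs); // each entry is "someurl/something"
```

Unlike Java's `String.matches`, which requires the whole string to match, `exec` finds the pattern anywhere in the input.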
answered Jul 18, 2011 at 9:50
