
HTML comment scraping in PHP

Asked 16 years, 3 months ago

Viewed 3k times


I've been looking around but have yet to find a solution. I'm trying to scrape an HTML document and get the text between two comments, but so far I haven't managed to do it.

I'm using PHP and have tried the PHP Simple HTML DOM Parser, which has been recommended here many times, but I can't seem to get it to do what I want.

Here's (part of) the page that I wish to parse:

<div class="class">
 <!-- blah -->
 text
 <!-- end blah -->
 Text I want
 <!-- blah -->
 text
 <!-- end blah -->
</div>

Thanks


edited Dec 28, 2009 at 16:22 by Charles Stewart

asked Aug 26, 2009 at 5:55 by Pep

Could you show us your current code?

– Randell, Aug 26, 2009 at 6:01


2 Answers

Assuming each comment is distinct (i.e. the "blah" in the first section differs from the one in the second), you can grab everything between them with a couple of strpos calls. Regular expressions are not necessary.

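$startStr = '<!-- end blah1 -->';
$endStr = '<!-- start blah2 -->';
// strpos() returns false when a marker is missing, so check before slicing.
$startPos = strpos($HTML, $startStr);
$endPos = ($startPos === false) ? false : strpos($HTML, $endStr, $startPos + strlen($startStr));
$textYouWant = null;
if ($startPos !== false && $endPos !== false) {
    $startPos += strlen($startStr);
    $textYouWant = substr($HTML, $startPos, $endPos - $startPos);
}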

If both sets of comments are identical, modify this to find the second "blah" by passing strpos's offset parameter, so the search for the opening comment starts after the first closing comment.
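For the markup in the question, where both sections use the same comments, the offset idea might look roughly like the sketch below. The helper name textBetweenComments is hypothetical, and the sample string is just the snippet from the question, not the real page.

```php
<?php
// Sample markup copied from the question; both sections use identical comments.
$html = <<<'HTML'
<div class="class">
 <!-- blah -->
 text
 <!-- end blah -->
 Text I want
 <!-- blah -->
 text
 <!-- end blah -->
</div>
HTML;

/**
 * Hypothetical helper: returns the trimmed text between the first $closeMarker
 * and the next $openMarker that follows it, or null if either is missing.
 */
function textBetweenComments(string $html, string $closeMarker, string $openMarker): ?string
{
    $closePos = strpos($html, $closeMarker);
    if ($closePos === false) {
        return null;
    }
    $start = $closePos + strlen($closeMarker);

    // The third argument (offset) makes strpos() skip the first "<!-- blah -->"
    // and find the one that comes after the closing comment.
    $end = strpos($html, $openMarker, $start);
    if ($end === false) {
        return null;
    }

    return trim(substr($html, $start, $end - $start));
}

$textYouWant = textBetweenComments($html, '<!-- end blah -->', '<!-- blah -->');
echo $textYouWant, "\n"; // Text I want
```

The helper returns null rather than an empty string when a marker is missing, so the caller can tell "no match" apart from "matched, but nothing in between".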

answered Aug 26, 2009 at 12:00 by DisgruntledGoat

Maybe you can use regular expressions?

$text = '
<div class="class">
 <!-- blah -->
 text
 <!-- end blah -->
 Text I want
 <!-- blah -->
 text
 <!-- end blah -->
</div>
';
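// The s modifier lets . match newlines; the lazy .*? stops at the next opening comment.
$regex = '/(<!-- end blah -->)(.*?)(<!-- blah -->)/ims';
// $matches[2] then holds the text between the comments (untrimmed) for every match.
$match = preg_match_all($regex, $text, $matches);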
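The snippet above stops at the preg_match_all call. As a rough sketch of reading the result, assuming the same sample markup and pattern (the $count and $wanted names are only for illustration), the second capture group can be pulled out and trimmed like this:

```php
<?php
// Same sample markup as in the answer above.
$text = <<<'HTML'
<div class="class">
 <!-- blah -->
 text
 <!-- end blah -->
 Text I want
 <!-- blah -->
 text
 <!-- end blah -->
</div>
HTML;

$regex = '/(<!-- end blah -->)(.*?)(<!-- blah -->)/ims';

// preg_match_all() returns the number of full matches (or false on error).
$count = preg_match_all($regex, $text, $matches);

// With the default PREG_PATTERN_ORDER flag, $matches[2] lists what the
// second group (.*?) captured in each match, including surrounding whitespace.
$wanted = ($count > 0) ? array_map('trim', $matches[2]) : [];

print_r($wanted);
```

On this sample there is a single match, because only the first closing comment is followed by another opening comment.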

answered Aug 26, 2009 at 6:14 by Deniss Kozlovs

2 Comments

Obligatory "now you have two problems" comment ;)

– DisgruntledGoat, Aug 26, 2009 at 12:01

"Every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins".

– Jon Winstanley, Jul 10, 2010 at 10:45
