How to parse JavaScript in a webpage?

Asked 10 years, 5 months ago

Viewed 146 times

I am trying to parse a webpage using Python 2.7, and I want to read its entire HTML. But the result looks like this:

<html><head><script type="text/javascript">
location.replace( "http://captcha.search.daum.net/captcha/show?url=http%3A%2F%2Fsearch.daum.net%2Fsearch%3Fw%3Dnews%26nil_search%3Dbtn%26DA%3DNTB%26enc%3Dutf8%26cluster%3Dy%26cluster_page%3D1%26q%3D%25EB%25B3%25B4%25EA%25B3%25A0%25EC%2584%259C" );
</script>
</head></html>

I think this webpage is using JavaScript. How can I get the entire HTML that the JavaScript leads to?

My Python code is this:

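# -*- coding: utf-8 -*-
import urllib2
from bs4 import BeautifulSoup

url = "http://search.daum.net/search?w=news&nil_search=btn&DA=NTB&enc=utf8&cluster=y&cluster_page=1&q=%EB%B3%B4%EA%B3%A0%EC%84%9C"
# Fetch the page; no custom headers are sent here.
page = urllib2.urlopen(url)
# Name the parser explicitly so BeautifulSoup does not have to guess one.
soup = BeautifulSoup(page.read(), "html.parser")
print soup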

edited Nov 15, 2015 at 12:54

Tom Zych

asked Aug 8, 2015 at 6:42

Suk-ju Lee


The page is sending you JavaScript that redirects to a CAPTCHA. This means the site is trying to prevent you from reading it.

– Thom Wiggers
Commented Aug 8, 2015 at 6:49
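
To illustrate the comment above, here is a minimal sketch of how a script could notice that the response is only a JavaScript redirect to a CAPTCHA instead of the real page. The helper names (find_js_redirect, looks_like_captcha) are hypothetical, the "captcha" substring check is only a heuristic assumption, and the sample response is a shortened version of the one shown in the question.

```python
import re

try:  # Python 3
    from urllib.parse import urlparse, parse_qs
except ImportError:  # Python 2.7, as used in the question
    from urlparse import urlparse, parse_qs

# Matches location.replace("...") with single or double quotes.
REDIRECT_RE = re.compile(r"""location\.replace\(\s*["']([^"']+)["']\s*\)""")


def find_js_redirect(html):
    """Return the URL passed to location.replace(), or None if there is none."""
    match = REDIRECT_RE.search(html)
    return match.group(1) if match else None


def looks_like_captcha(redirect_url):
    """Heuristic (an assumption): the host or path mentions 'captcha'."""
    if redirect_url is None:
        return False
    parts = urlparse(redirect_url)
    return "captcha" in (parts.netloc + parts.path).lower()


# Shortened copy of the response body from the question.
sample = """<html><head><script type="text/javascript">
location.replace( "http://captcha.search.daum.net/captcha/show?url=http%3A%2F%2Fsearch.daum.net%2Fsearch" );
</script>
</head></html>"""

target = find_js_redirect(sample)
# The CAPTCHA URL carries the originally requested URL in its "url" parameter.
original = parse_qs(urlparse(target).query)["url"][0]
```

If looks_like_captcha(target) is true, parsing the body further will not help, because the real search results were never sent.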

I've rolled back your question because replacing the body with an image doesn't really help.

– Thom Wiggers
Commented Aug 8, 2015 at 6:53

1 Answer

It seems some headers are required for this page to be shown properly.

Try adding headers to your HTTP request (the urlopen call, not the soup command), sending the same headers your browser sends, so that you get the same result you see in the browser.
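
A minimal sketch of that idea, assuming the site mainly checks for browser-like headers. The header values below are hypothetical placeholders (copy the real ones from your browser's developer tools), build_request and fetch_html are illustrative names, and the site may still answer with the CAPTCHA redirect. The import fallback lets the same code run on Python 3 and on Python 2.7.

```python
try:  # Python 3
    from urllib.request import Request, urlopen
except ImportError:  # Python 2.7, as used in the question
    from urllib2 import Request, urlopen

# Hypothetical values: replace them with the headers your own browser sends.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/44.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "ko-KR,ko;q=0.8,en-US;q=0.6,en;q=0.4",
}


def build_request(url, headers=BROWSER_HEADERS):
    """Create a request object that carries browser-like headers."""
    return Request(url, headers=dict(headers))


def fetch_html(url):
    """Download the page body as bytes using the headers above."""
    response = urlopen(build_request(url))
    try:
        return response.read()
    finally:
        response.close()
```

The returned bytes can then be parsed as before, for example with BeautifulSoup(fetch_html(url), "html.parser").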

answered Aug 8, 2015 at 6:57

alizelzele
