Hi, at the moment I am trying to parse some HTML news for our new fan page, because the company does not offer an RSS feed.

I have a new JS file that contains this:

function getNews() {
    // Declare everything locally instead of leaking implicit globals.
    var y = 0;                        // index of the current table in the framed page
    var news = new Array(7);          // one combined string per news entry
    var news_content = new Array(5);  // image, two spans, text and link of one entry
    for (var i = 0; i < news.length; i++) {
        var table = document.getElementById('news').contentWindow.getElementsByTagName('table')[y];
        news_content[0] = table.rows[0].cells[0].getElementsByTagName('img')[0].src;
        news_content[1] = table.rows[0].cells[1].getElementsByTagName('span')[0].innerHTML;
        news_content[2] = table.rows[0].cells[2].getElementsByTagName('span')[0].innerHTML;
        news_content[3] = table.rows[1].cells[0].getElementsByTagName('p')[0].innerHTML;
        news_content[4] = table.rows[0].cells[0].getElementsByTagName('a')[0].href;
        //alert(news[0] + "\n" + news[1] + "\n" + news[2] + "\n" + news[3] + "\n" + news[4]);
        news[i] = news_content[0] + "\n" + news_content[1] + "\n" + news_content[2] + "\n" + news_content[3] + "\n" + news_content[4] + "\n";
        y = y + 2;                    // step to every second table
    }
    alert(news[0] + "\n" + news[1] + "\n" + news[2] + "\n" + news[3] + "\n" + news[4]);
}

and this HTML:

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
<script src="test.js"></script>
</head>
<body>
<a href="page.html" onclick="getNews()">Click here</a>
<iframe id="news" src="http://www.aerosoft-shop.com/list_news.php?cat=fs&lang=de"></iframe>
</body>
</html>

In the end, if I paste the source code into the HTML file, it works. But is there no way to parse it from an external page?

asked Oct 25, 2011 at 10:31

1 Answer

If you debug your code with a tool like Firebug, you will see an error message like this: Permission denied to access property 'getElementsByTagName'

It is indeed not possible in JavaScript to access an iframe that points to a different domain. (Accessing a different subdomain of your own domain is possible; see the comment on this answer.) The real question is whether the site owner wants you to crawl their site, or has at least given you permission to do so, because being crawled by third parties is generally not welcome (traffic and possibly copyright problems).
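
To illustrate the restriction, here is a minimal sketch of the kind of origin comparison that decides whether a script may reach into a frame. The helper name isSameOrigin and the fan page address http://www.example.com/index.html are assumptions made up for this example; only the iframe URL comes from the question.

```javascript
// Sketch only: two URLs share an origin when scheme, host and port all match.
// isSameOrigin is a hypothetical helper, not a browser API.
function isSameOrigin(urlA, urlB) {
  return new URL(urlA).origin === new URL(urlB).origin;
}

// Hypothetical address of the fan page (assumption).
var fanPage = 'http://www.example.com/index.html';
// iframe source taken from the question.
var newsFrame = 'http://www.aerosoft-shop.com/list_news.php?cat=fs&lang=de';

console.log(isSameOrigin(fanPage, newsFrame)); // false
```

Because the two origins differ, the browser refuses the access and reports the permission error quoted above.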

answered Oct 25, 2011 at 11:08

2 Comments

Actually, accessing content from a different subdomain (of the same domain) is possible if you add document.domain = "yourdomain.com"; in both documents.
Thanks for the clarification. I edited my answer and pointed to your comment.
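
The document.domain approach from the first comment could be sketched roughly as follows. The helper relaxToParentDomain is hypothetical, and "yourdomain.com" is the placeholder used in the comment; the function takes the document as a parameter only so the logic can be tried outside a browser.

```javascript
// Sketch only: relax the origin check between pages on the same parent domain.
// relaxToParentDomain is a hypothetical helper, not a browser API.
function relaxToParentDomain(doc, parentDomain) {
  var host = doc.location.hostname;
  // Only the current hostname or one of its parent domains is a legal value.
  if (host === parentDomain || host.endsWith('.' + parentDomain)) {
    doc.domain = parentDomain;
    return true;
  }
  return false;
}

// In a browser, both the outer page and the framed page would have to run:
// relaxToParentDomain(document, 'yourdomain.com');
```

This only helps when you control both pages, which is not the case for the external news site in the question. Current browsers have also deprecated setting document.domain, so treat this as a description of the technique rather than a recommendation.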
