I'm working on a web-based application that loads the HTML content of a URL through a call to http://www.whateverorigin.org/. This avoids violating the same-origin policy.
var url = 'http://' + document.getElementById("urlText").value;
$.getJSON('http://whateverorigin.org/get?url=' + encodeURIComponent(url) + '&callback=?', function(data){
    // data.contents holds the fetched page as an HTML string
    var doc = new DOMParser().parseFromString(data.contents, 'text/html');
});
Is there a way to extract the meaningful, visible text from this HTML string, similar to what BeautifulSoup does in Python? I'm a beginner at JavaScript.
2 Answers
Use jQuery to find and iterate over the appropriate elements, then decide what to print, for example the text of each visible element. Here is a jsfiddle with a working example: http://jsfiddle.net/w147o9f6/1/
HTML:
<body>
    <div id="outputTexts">OUTPUT:</div>
</body>
JavaScript:
var parser = new DOMParser();
var doc;
var meaningfulTexts = [];
$.getJSON('http://whateverorigin.org/get?url=' + encodeURIComponent('https://www.facebook.com') + '&callback=?', function(data){
    // Parse the fetched HTML string into a detached document.
    doc = parser.parseFromString(data.contents, "text/html");
    var ELMS = $(doc).find("div, p, a, span");
    ELMS.each(function(index, element) {
        var text = $(element).text().trim();
        // Only the inline style is checked here; elements hidden by a stylesheet are not detected.
        if (element.style.display != "none" && text != "") {
            // Append as a text node so the fetched text is not interpreted as HTML.
            $("#outputTexts").append('<br>', document.createTextNode(element.tagName + ' - ' + text));
            meaningfulTexts.push(text);
        }
    });
});
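One limitation of selecting "div, p, a, span" and calling .text() on each match is that the text of nested elements is collected more than once. As a rough sketch of an alternative, the parsed document can be walked recursively, keeping only text nodes and skipping tags such as script and style. The names collectVisibleText and SKIPPED_TAGS are hypothetical, chosen for illustration. The visibility check is an assumption that only covers the hidden attribute and an inline display:none, because a document produced by DOMParser is never rendered and stylesheet rules are not applied to it.

```javascript
// Tags whose text content is never shown to the reader (assumed list, extend as needed).
var SKIPPED_TAGS = { SCRIPT: true, STYLE: true, NOSCRIPT: true, TEMPLATE: true, HEAD: true };

// Hypothetical helper: walks a DOM node and returns an array of trimmed text pieces.
function collectVisibleText(node, out) {
    out = out || [];
    if (node.nodeType === 3) {
        // Text node: collapse runs of whitespace and keep non-empty text.
        var text = node.nodeValue.replace(/\s+/g, ' ').trim();
        if (text !== '') {
            out.push(text);
        }
    } else if (node.nodeType === 1 || node.nodeType === 9 || node.nodeType === 11) {
        if (node.nodeType === 1) {
            // Element node: skip non-content tags and elements hidden inline.
            if (SKIPPED_TAGS[node.nodeName.toUpperCase()]) {
                return out;
            }
            if (node.hidden || (node.style && node.style.display === 'none')) {
                return out;
            }
        }
        // Elements, documents and fragments: recurse into the children in order.
        for (var i = 0; i < node.childNodes.length; i++) {
            collectVisibleText(node.childNodes[i], out);
        }
    }
    return out;
}
```

Inside the $.getJSON callback this could be used as collectVisibleText(doc.body).join(' ') to get one string, or without the join to keep the pieces separate.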
3 Comments
span tag). I don't know if it's a problem with my code or with Google's site. Is google.com the website you intend to work with?
It looks like this is what you need. The code below fetches google.nl through whateverorigin.org and adds the result to a div. If not, please explain what else you need.
jQuery:
$(document).ready(function() {
    $.getJSON('http://whateverorigin.org/get?url=' + encodeURIComponent('http://www.google.nl') + '&callback=?', function(data){
        $('.result').html(data.contents);
    });
});
HTML:
<div class="result"></div>
Example: http://jsfiddle.net/qddekhnc/1/
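Both answers hard-code the target URL, while the question builds it from a text field and always prepends 'http://'. A minimal sketch of a helper that builds the whateverorigin.org request URL from user input might look like the following. The name buildProxyUrl is hypothetical, and the choice to add a scheme only when one is missing is an assumption about the desired behavior.

```javascript
// Hypothetical helper: builds the JSONP request URL used with $.getJSON.
function buildProxyUrl(userInput) {
    var target = userInput.trim();
    // Prepend a scheme only when the user did not type one (assumed behavior).
    if (!/^https?:\/\//i.test(target)) {
        target = 'http://' + target;
    }
    // jQuery replaces the trailing "callback=?" with a generated JSONP callback name.
    return 'http://whateverorigin.org/get?url=' + encodeURIComponent(target) + '&callback=?';
}
```

With this, the request becomes $.getJSON(buildProxyUrl(document.getElementById("urlText").value), function(data){ ... }).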