I am trying to get the HTML source of a website from within my Java project. I was able to fetch the page, but through some exploring I found it never loads the client-side content (I'm not 100% sure what the difference would be). My guess is that the site's JavaScript, which adds the rest of the HTML I'm looking for, never runs when I fetch the page this way. Any help would be greatly appreciated!
Edit: here is the code that ended up working for me:
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

public class GetHtml {

    public static WebDriver driver = new FirefoxDriver();

    public static String get() throws Exception {
        // Connect to the website
        driver.get("website");
        // Sleep for 5 seconds so the page's JavaScript can finish loading
        // (an actual sleep instead of a busy-wait loop, which burns CPU)
        Thread.sleep(5000);
        // Collect the text of every element with the class "userlist"
        List<WebElement> users = driver.findElements(By.className("userlist"));
        StringBuilder s = new StringBuilder();
        for (WebElement w : users) {
            s.append(w.getText());
        }
        return s.toString();
    }
}
This opens the "website" in a Firefox browser, waits for it to load, then finds the HTML elements with the class name "userlist" and returns a string containing the names of all the users currently in the user list.
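The fixed five-second sleep is fragile: too short and the page hasn't loaded, too long and every call wastes time. A generic polling helper (plain Java, no Selenium dependency; the class and method names here are illustrative, not from any library) waits only as long as the condition actually takes:

```java
import java.util.function.BooleanSupplier;

public class Poll {
    // Repeatedly check the condition every pollMillis, giving up after
    // timeoutMillis. Returns true if the condition became true in time.
    public static boolean until(BooleanSupplier condition,
                                long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(pollMillis); // yield the CPU between checks
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }
}
```

With Selenium this could wrap a check like `!driver.findElements(By.className("userlist")).isEmpty()`; Selenium also ships its own `WebDriverWait` class for exactly this purpose, which is worth preferring in real code.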
Please show the code that you already have. – Gregory Basior, Feb 22, 2015
2 Answers
If the content is dynamically generated with JavaScript or loaded by a follow-up request, one approach is to use the Selenium browser automation framework: https://code.google.com/p/selenium/wiki/GettingStarted
A simple example to get all elements from a page:
// Launch Firefox and load the page, letting its JavaScript run
WebDriver driver = new FirefoxDriver();
driver.get("http://www.example.com");
// The "*" CSS selector matches every element in the rendered DOM
List<WebElement> el = driver.findElements(By.cssSelector("*"));
The jsoup library can fetch a URL and parse the resulting HTML as part of its scraping and parsing duties:
Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
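Note that jsoup only parses the HTML the server sends; it does not execute JavaScript, so it will hit the exact problem described in the question if the user list is built client-side. Once you have a `Document`, though, extracting elements is a one-liner. A minimal sketch (the sample HTML and class name mirror the question; `Elements.text()` joins the matched elements' text with spaces):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class JsoupExample {
    // Parse an HTML string and return the combined text of all
    // elements carrying the "userlist" class.
    public static String userNames(String html) {
        Document doc = Jsoup.parse(html);
        Elements users = doc.select(".userlist");
        return users.text();
    }

    public static void main(String[] args) {
        String html = "<div class='userlist'>alice</div>"
                    + "<div class='userlist'>bob</div>";
        System.out.println(userNames(html)); // prints "alice bob"
    }
}
```

For a live page you would swap `Jsoup.parse(html)` for `Jsoup.connect(url).get()`, as in the answer above.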