
I am trying to get the HTML code of a website from within my Java project. I was able to do this, but after some exploring I found that it never loads the client side of the site (I'm not 100% sure what the difference is). I suspect the website's JavaScript never runs, and that script is what adds the rest of the HTML I am looking for. Any help would be greatly appreciated!
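To make the difference concrete, here is a minimal, self-contained sketch that uses only the JDK (Java 11+ for java.net.http). It starts a throwaway local server that serves a hypothetical page whose "userlist" is filled in by a script, then downloads that page over plain HTTP the way a non-browser client would. The page content, the class name RawHtmlDemo and the method demo() are illustrative assumptions, not the real site.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class RawHtmlDemo {
    // Hypothetical page: the server sends an empty list, and the script fills it in
    // only when a browser actually executes the JavaScript.
    static final String PAGE =
        "<html><body>\n"
        + "<ul class=\"userlist\"></ul>\n"
        + "<script>\n"
        + "  var li = document.createElement('li');\n"
        + "  li.textContent = 'alice';\n"
        + "  document.querySelector('.userlist').appendChild(li);\n"
        + "</script>\n"
        + "</body></html>\n";

    // Starts a local server on a free port, fetches the page over plain HTTP,
    // stops the server, and returns the raw response body.
    static String demo() throws IOException, InterruptedException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            byte[] body = PAGE.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        try {
            int port = server.getAddress().getPort();
            HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .build();
            HttpRequest request = HttpRequest
                .newBuilder(URI.create("http://127.0.0.1:" + port + "/"))
                .build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.body();
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        String html = demo();
        System.out.println(html);
        // The list arrives empty: no <li>alice</li> exists until a browser runs the script.
        System.out.println("Contains <li>alice</li>? " + html.contains("<li>alice</li>"));
    }
}
```

The printed body contains the empty list followed by the script source: a plain HTTP client receives the server's HTML but never executes the JavaScript, which is why a browser-driving tool is needed when content is generated on the client side.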

Edit: Here is the code that ended up working for me:

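import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

public class GetHtml {
    public static WebDriver driver = new FirefoxDriver();

    public static String get() throws Exception {
        // Connect to the website (replace "website" with the page's URL)
        driver.get("website");
        // Wait 5 seconds so the page's JavaScript can finish loading the content.
        // Thread.sleep avoids the CPU-burning busy loop of the earlier version.
        Thread.sleep(5000);
        // Get the elements with the class name "userlist"
        List<WebElement> users = driver.findElements(By.className("userlist"));
        StringBuilder s = new StringBuilder();
        for (WebElement w : users) {
            // One entry per line so the names do not run together
            s.append(w.getText()).append('\n');
        }
        return s.toString();
    }
}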

This opens "website" in a Firefox browser, waits for it to load, then finds the HTML elements with the class name "userlist" and returns a string with the names of all the users currently in the user list.
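A fixed five-second pause either wastes time or is too short on a slow page. A common alternative is to poll until the elements appear. Selenium provides WebDriverWait for this; the stdlib-only sketch below only illustrates the polling idea, using a hypothetical helper named waitForNonEmpty and a simulated findElements that starts returning names after about 300 ms.

```java
import java.time.Duration;
import java.util.List;
import java.util.function.Supplier;

class PollingWait {
    // Hypothetical helper: repeatedly calls `lookup` until it returns a non-empty
    // list or the timeout passes. Returns the last (possibly empty) result.
    static <T> List<T> waitForNonEmpty(Supplier<List<T>> lookup, Duration timeout, Duration interval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        List<T> result = lookup.get();
        while (result.isEmpty() && System.nanoTime() < deadline) {
            Thread.sleep(interval.toMillis()); // sleep between checks instead of spinning
            result = lookup.get();
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated page: the "userlist" appears only after ~300 ms, as if rendered by a script.
        long appearsAt = System.currentTimeMillis() + 300;
        Supplier<List<String>> fakeFindElements = () ->
            System.currentTimeMillis() >= appearsAt ? List.of("alice", "bob") : List.<String>of();

        List<String> users = waitForNonEmpty(fakeFindElements, Duration.ofSeconds(5), Duration.ofMillis(50));
        System.out.println(String.join("\n", users));
    }
}
```

With Selenium you could pass something like `() -> driver.findElements(By.className("userlist"))` as the supplier (an untested assumption about how you would wire it in). The helper returns as soon as the list is non-empty instead of always sleeping for the full duration.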

Adrian Cid Almaguer
asked Feb 22, 2015 at 5:16
    Please show the code that you already have. Commented Feb 22, 2015 at 5:24

2 Answers


If the content is generated dynamically by JavaScript or by another request, one approach is to use the Selenium browser automation framework: https://code.google.com/p/selenium/wiki/GettingStarted

A simple example to get all elements from a page:

WebDriver driver = new FirefoxDriver();
driver.get("http://www.example.com");
List<WebElement> el = driver.findElements(By.cssSelector("*"));
answered Feb 22, 2015 at 5:35

Comment

Thanks, this works for what I need. Having it open a browser is a bit annoying, but it gets the job done. Thanks!

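The jsoup library can fetch a URL as part of its HTML scraping and parsing. Note that jsoup does not execute JavaScript, so content added on the client side will not appear in the parsed document: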

Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
answered Feb 22, 2015 at 5:28
