I am trying to get the HTML source of a website from within my Java project. I was able to fetch the page, but through some exploring I found it never loads the client-side content (I'm not 100% sure what the difference would be). My guess is that the site's JavaScript, which adds the rest of the HTML I'm looking for, never runs when I fetch the page this way. Any help would be greatly appreciated!
Edit: here is the code that ended up working for me:
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

public class GetHtml {

    public static WebDriver driver = new FirefoxDriver();

    public static String get() throws Exception {
        // Connect to the website
        driver.get("website");
        // Sleep for 5 seconds so the page's JavaScript can finish loading
        // (an actual sleep instead of a busy-wait loop, which burns CPU)
        Thread.sleep(5000);
        // Collect the text of every element with the class "userlist"
        List<WebElement> users = driver.findElements(By.className("userlist"));
        StringBuilder s = new StringBuilder();
        for (WebElement w : users) {
            s.append(w.getText());
        }
        return s.toString();
    }
}
This opens the "website" in a Firefox browser, waits for it to load, then finds the HTML elements with the class name "userlist" and returns a string containing the names of all the users currently in the user list.
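The fixed five-second sleep is fragile: too short and the page hasn't loaded, too long and every call wastes time. A generic polling helper (plain Java, no Selenium dependency; the class and method names here are illustrative, not from any library) waits only as long as the condition actually takes:

```java
import java.util.function.BooleanSupplier;

public class Poll {
    // Repeatedly check the condition every pollMillis, giving up after
    // timeoutMillis. Returns true if the condition became true in time.
    public static boolean until(BooleanSupplier condition,
                                long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(pollMillis); // yield the CPU between checks
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }
}
```

With Selenium this could wrap a check like `!driver.findElements(By.className("userlist")).isEmpty()`; Selenium also ships its own `WebDriverWait` class for exactly this purpose, which is worth preferring in real code.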
Please show the code that you already have. – Gregory Basior, Feb 22, 2015
2 Answers
If the content is dynamically generated with JavaScript or loaded by a follow-up request, one approach is to use the Selenium browser automation framework: https://code.google.com/p/selenium/wiki/GettingStarted
A simple example to get all elements from a page:
// Launch Firefox and load the page, letting its JavaScript run
WebDriver driver = new FirefoxDriver();
driver.get("http://www.example.com");
// The "*" CSS selector matches every element in the rendered DOM
List<WebElement> el = driver.findElements(By.cssSelector("*"));
The jsoup library can fetch a URL and parse the resulting HTML as part of its scraping and parsing duties:
Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
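Note that jsoup only parses the HTML the server sends; it does not execute JavaScript, so it will hit the exact problem described in the question if the user list is built client-side. Once you have a `Document`, though, extracting elements is a one-liner. A minimal sketch (the sample HTML and class name mirror the question; `Elements.text()` joins the matched elements' text with spaces):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class JsoupExample {
    // Parse an HTML string and return the combined text of all
    // elements carrying the "userlist" class.
    public static String userNames(String html) {
        Document doc = Jsoup.parse(html);
        Elements users = doc.select(".userlist");
        return users.text();
    }

    public static void main(String[] args) {
        String html = "<div class='userlist'>alice</div>"
                    + "<div class='userlist'>bob</div>";
        System.out.println(userNames(html)); // prints "alice bob"
    }
}
```

For a live page you would swap `Jsoup.parse(html)` for `Jsoup.connect(url).get()`, as in the answer above.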