I don't know anything about HTML, but I want to extract the % probability of the Northern lights being visible from this webpage http://aurorawatch.ca/. I entered this into the URL field, and selected HTML version 1.
When I inspect the text element I want and copy the Xpath I get the following:
"/html/body/div[1]/div[5]/table/tbody/tr[1]/td/div[1]/table[2]/tbody/tr[2]/td/div/table/tbody/tr/td[2]/span"
and enter this into the "parse string" field of the ThingSpeak API app, but get the following error:
"Error parsing document, try a different parse string."
Most people who get this error seem to be getting data that doesn't load with the original page, so I think I must be missing something.
If there is a better way to pull this number to my ESP8266 than creating a ThingSpeak API, I would love to hear it too!
-
Post your code for a better understanding of the question. – leoc7, Dec 23, 2018 at 19:14
-
Maybe the website is blocking thingspeak as a scraper. You would want to make the ESP8266 do the GET request and then parse the data. I'd suggest using aurorawatch.ca/… as it's a) less data, and b) less likely to change. – Majenko, Dec 23, 2018 at 19:18
-
@Majenko Thanks for the quick response! That's a good suggestion, using the alternate page. I thought that was possible too, but when I leave the parser string blank, the thingspeak api request returns the full site. I know this is beyond the scope of the question, but do you have any recommendation for where to start learning how to make the GET request I want from the ESP8266? – Alex Kuebel, Dec 23, 2018 at 19:24
-
Why do you select HTTP 1.0 (HTML 1?) – Juraj ♦, Dec 23, 2018 at 20:06
-
@Juraj I believe that's the version the website I'm attempting to get data from is built on. – Alex Kuebel, Dec 23, 2018 at 20:11
1 Answer
You should use this page, since the other is the front page of the site and more likely to be dynamic:
The XPath is also slightly different. The first tbody in the reported path doesn't actually exist. You should also add /text() to the end of the path to just return the textual content of the node:
/html/body/div[1]/div[5]/table[2]/tr[3]/td/table/tbody/tr/td[2]/span/text()
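To illustrate why a path copied from the browser inspector can fail: browsers insert a tbody element into the DOM when a table's rows sit directly under the table tag, so the inspector reports a tbody that is not in the HTML as served. The sketch below is only an illustration, not how ThingSpeak parses pages. The markup in SAMPLE is a made-up, simplified stand-in for the real page, the helper first_text is hypothetical, and the "42%" value is invented. It uses Python's standard xml.etree.ElementTree, which supports only a subset of XPath and has no /text(), so the text is read from the node's .text attribute instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for the page source.
# As in the served HTML, the table has no <tbody> element.
SAMPLE = """
<html>
  <body>
    <table>
      <tr>
        <td>Probability</td>
        <td><span>42%</span></td>
      </tr>
    </table>
  </body>
</html>
"""

root = ET.fromstring(SAMPLE)  # root is the <html> element


def first_text(path):
    """Return the text of the first node matching path, or None if no match."""
    node = root.find(path)
    return node.text if node is not None else None


# Path shaped like the one a browser inspector reports (includes tbody).
browser_path = "./body/table/tbody/tr/td[2]/span"
# Path that matches the markup as actually written.
source_path = "./body/table/tr/td[2]/span"

with_tbody = first_text(browser_path)
without_tbody = first_text(source_path)

print("with tbody:   ", with_tbody)     # no match against the raw source
print("without tbody:", without_tbody)  # the span's text
```

The same check applies to any path copied from the inspector: compare it against the page source ("view source"), not the rendered DOM, and drop any tbody that appears only in the latter.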