I need to extract a variable from an HTML webpage. Could anyone assist me?

Here is what the webpage contains:

<script>
 var id = "5010"; 
</script>

I just need to extract that value from the webpage in Python. Any help would be appreciated; sorry if this is hard to understand.

asked Nov 2, 2018 at 0:00

2 Answers

You can do this using urllib and regular expression searching.

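import re
import urllib.request

url = "https://stackoverflow.com/questions/53111019/python-get-data-value-from-inside-script-html-tag"
response = urllib.request.urlopen(url)
html = response.read().decode('utf-8')
# re.DOTALL lets '.' match newlines, since the script body spans several lines;
# the non-greedy '.*?' stops at the first closing tag instead of the last one.
between_script_tags = re.search(r'<script>(.*?)</script>', html, re.DOTALL)
if between_script_tags:
    print(between_script_tags.group(1).strip())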

urllib fetches the HTML from the page, and re.search() then finds the text between the '<script>' and '</script>' tags.

However, this only gives you the script contents as plain text. In your case, between_script_tags.group(1).strip() would be the string 'var id = "5010";'.

You could go further and split this on whitespace:

output = between_script_tags.group(1).split()

This would make output a list of four items: ['var', 'id', '=', '"5010";']

From here it is quite simple to extract the data you want; for example, output[3].strip('";') gives '5010'.
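
If the script block holds several variables and only one is wanted, a more targeted pattern may be easier than splitting. The following is only a sketch: the helper name get_js_var and the sample html string are made up for illustration, and it assumes the value is always written in double quotes.

```python
import re

# Sample input standing in for the downloaded page (hypothetical data).
html = """
<script>
 var id = "5010";
 var id2 = "8888";
</script>
"""

def get_js_var(source, name):
    """Return the double-quoted value assigned to `var <name>`, or None."""
    # re.escape guards against regex metacharacters in the variable name;
    # requiring '\s*=' right after the name keeps "id" from matching "id2".
    pattern = r'\bvar\s+' + re.escape(name) + r'\s*=\s*"([^"]*)"'
    match = re.search(pattern, source)
    return match.group(1) if match else None

print(get_js_var(html, "id"))
print(get_js_var(html, "id2"))
```

Returning None when nothing matches lets the caller decide how to handle a page that does not contain the variable.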

answered Nov 2, 2018 at 0:10

I find it easy to use the Python string split() method to handle this sort of thing.

EDIT: big update to handle new requirements

Something simple like:

html = """
<script>
 var id = "5010";
 var id2 = "8888";
 var idX = "XoX";
</script>"""
varlist = {}
entries = html.split("var ")[1:]  # one entry per var; [1:] drops the text before the first one
for v in entries:
    name = v.split("=")[0].strip()  # the part before "=" is the var name
    value = v.split("\"")[1]  # the part between the first pair of quotes is the value
    varlist[name] = value  # store it for printing below
print("Varlist - " + str(varlist))
---------------------
OUTPUT: Varlist - {'id': '5010', 'id2': '8888', 'idX': 'XoX'}

split() returns a list of strings, broken apart around the separator you pass in. An optional second parameter (maxsplit) sets the maximum number of splits. So by splitting on a string, optionally restricting it to one split, and then taking the [0] or [1] element, it's possible to pick the input apart to get the data needed.

In the above, the first split is on var. This gives a list with one entry for each place var occurred, so each entry begins with a variable name; the [1:] slice throws away the junk before the first var.

Then the code loops over each of these entries, fetching the var name by splitting on = and taking the [0] side. Next is the var value, which is always contained in quotes, so splitting on " should give a 3-item list, the [1] element being the value of the var. These are added to a Python dictionary just for the purposes of the example.
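
The maxsplit parameter described earlier is not actually used in the code above. As a small sketch of what it changes, assuming a made-up line whose value itself contains an = sign:

```python
# Hypothetical input: the value contains an "=" of its own.
line = 'var token = "a=b";'

print(line.split("="))     # every "=" is a split point, so the value is cut in two
print(line.split("=", 1))  # maxsplit=1: only the first "=" is used

name_part, value_part = line.split("=", 1)
name = name_part.split()[-1]  # last word before "=" is the variable name
# Drop surrounding whitespace, then the trailing ";", then the quotes.
value = value_part.strip().rstrip(";").strip('"')
print(name, value)
```

Limiting the split to one keeps the whole right-hand side together, which matters whenever the separator can also appear inside the value.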

If your values aren't always in quotes, you could split on the ; instead, and so on. Any guaranteed pattern in the input can be used.
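
A rough sketch of the semicolon variant follows. The function name parse_vars and the sample script text are invented for the example, and it assumes every statement ends in ; and that no value contains a ; itself.

```python
# Hypothetical script body mixing quoted and unquoted values.
script = """
 var id = "5010";
 var count = 42;
 var name = 'abc';
"""

def parse_vars(source):
    """Map each `var name = value;` statement to a name -> value string pair."""
    result = {}
    for statement in source.split(";"):
        statement = statement.strip()
        # Skip empty leftovers and anything that is not a var assignment.
        if not statement.startswith("var ") or "=" not in statement:
            continue
        name, value = statement[len("var "):].split("=", 1)
        # Strip whitespace first, then any surrounding single or double quotes.
        result[name.strip()] = value.strip().strip("\"'")
    return result

variables = parse_vars(script)
print(variables)
print(variables["id"])
```

All values come back as strings, so a number such as 42 would still need int() if it is to be used numerically.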

answered Nov 2, 2018 at 0:13

1 Comment

I forgot to mention that var id isn't the only thing inside the <script>; there are a couple of other variables. How could I get only var id?
