I don't have much experience with HTML, so I hope I'm using the right terminology to explain myself.

I have the following HTML:

 <script type="text/javascript">
 var graph_raw_data = [{"parent_id": 844, "process_id": 236, "process_name": "C0nw0nk Steam Patcher.exe","first_seen":
 "2355-02-21 00:00:00,183", "calls": [{"category": "system",
 "timestamp": "2355-02-21 00:00:00,193", "api": "LdrGetDllHandle"},
 {"category": "process", "timestamp": "2015-02-21 18:59:49,584",
 "api": "ExitProcess"}]}];
 </script>

This node is nested within a few nodes with the following pattern:

<div class="tab-content">

How can I load graph_raw_data into a Python variable, something similar to a dictionary?

Basically, I need to iterate through all the nodes and find the desired one. How can I do that in Python?

I fetch the HTML data with this Python code:

import urllib2  # Python 2; on Python 3 use urllib.request instead
from bs4 import BeautifulSoup

f = urllib2.urlopen(url)
page_data = f.read()
soup = BeautifulSoup(page_data)
asked Feb 22, 2015 at 12:48
  • HTML is just markup, it does not have any variables :-( Commented Feb 22, 2015 at 12:49
  • graph_raw_data is a JavaScript object, nicely decorated as JSON. Use Python to parse the JSON. How that works, I don't know, but JSON is there to allow us to send data between languages. Commented Feb 22, 2015 at 12:53

1 Answer

Use a regex to extract the string assigned to the variable, then use json.loads to convert it into a Python object (here, a list of dictionaries).

import json
import re
html="""<script type="text/javascript">
 var graph_raw_data = [{"parent_id": 844, "process_id": 236, "process_name": "C0nw0nk Steam Patcher.exe","first_seen":
 "2355-02-21 00:00:00,183", "calls": [{"category": "system",
 "timestamp": "2355-02-21 00:00:00,193", "api": "LdrGetDllHandle"},
 {"category": "process", "timestamp": "2015-02-21 18:59:49,584",
 "api": "ExitProcess"}]}];
 </script>"""
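# Strip newlines so "." can match across the whole assignment, then capture
# everything between "var graph_raw_data = " and the first ";".
graph_raw_data = re.search(r'var graph_raw_data = (.*?);', html.replace('\n', '')).group(1)
data = json.loads(graph_raw_data)
print(data)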
>>>[{'parent_id': 844, 'calls': [{'timestamp': '2355-02-21 00:00:00,193', 'category': 'system', 'api': 'LdrGetDllHandle'}, {'timestamp': '2015-02-21 18:59:49,584', 'category': 'process', 'api': 'ExitProcess'}], 'process_name': 'C0nw0nk Steam Patcher.exe', 'first_seen': '2355-02-21 00:00:00,183', 'process_id': 236}]
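
One caveat with the regex above: the non-greedy match stops at the first ";", so it would cut the data short if a string value ever contained a semicolon. A possible alternative, sketched here under assumptions, is to locate the start of the assignment and let json.JSONDecoder.raw_decode parse exactly one JSON value from that position. The helper name extract_js_json and the shortened sample HTML are made up for illustration; they are not from the question's page.

```python
import json
import re


def extract_js_json(text, var_name):
    """Return the JSON value assigned to `var <var_name> = ...` in text, or None.

    Hypothetical helper for illustration only.
    """
    # Find where the assignment starts; the value begins right after the "=".
    match = re.search(r'var\s+' + re.escape(var_name) + r'\s*=\s*', text)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and ignores
    # whatever follows it (the trailing ";" and "</script>"), so a ";" inside
    # a string value does not truncate the data.
    value, _end = json.JSONDecoder().raw_decode(text, match.end())
    return value


# Made-up sample with the same shape as the question's data, shortened,
# and with a ";" placed inside a string on purpose.
html = """<div class="tab-content">
<script type="text/javascript">
 var graph_raw_data = [{"parent_id": 844, "process_id": 236,
 "process_name": "a;b.exe",
 "calls": [{"category": "process", "api": "ExitProcess"}]}];
</script>
</div>"""

data = extract_js_json(html, "graph_raw_data")
for process in data:
    # Each element is a plain dict, so normal indexing and iteration work.
    print(process["process_name"], [call["api"] for call in process["calls"]])
```

Because the helper searches the whole text, it can be given the raw page_data from the question directly. If you would rather walk the nodes, looping over soup.find_all('script') and passing each tag's .string (skipping those where it is None) should work the same way.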
answered Feb 22, 2015 at 13:02