I can currently scrape JavaScript data from the response to a POST request that I send with requests and parse with BeautifulSoup. However, I only want the product plu, sku, description and brand. I am struggling to find a way to print just the data I need rather than the whole script. Below is the text that is printed after I extract the script using BeautifulSoup. I will be scraping more than one product from multiple POST requests, so the chunk idea is not really suitable.
<script type="text/javascript">
var dataObject = {
platform: 'desktop',
pageType: 'basket',
orderID: '',
pageName: 'Basket',
orderTotal: '92.99',
orderCurrency: 'GBP',
currency: 'GBP',
custEmail: '',
custId: '',
items: [
{
plu: '282013',
sku: '653460',
category: 'Footwear',
description: 'Mayfly Lite Pinnacle Women's',
colour: '',
brand: 'Nike',
unitPrice: '90',
quantity: '1',
totalPrice: '90',
sale: 'false'
} ]
};
As you can see, it is far too much information.
1 Answer
How about this:
- Assign the captured text to a multiline string variable called "chunk".
- Make a list of the keys you are looking for.
- Loop over each line, check whether it starts with one of those keys, and print the line if it does:
chunk = '''
<script type="text/javascript">
var dataObject = {
.........blah blah.......
plu: '282013',
sku: '653460',
category: 'Footwear',
description: 'Mayfly Lite Pinnacle Women's',
colour: '',
brand: 'Nike',
..... blah .......
};'''

keys = ['plu', 'sku', 'description', 'brand']

for line in chunk.splitlines():
    # The text before the first colon is the property name.
    if line.split(':')[0].strip() in keys:
        print(line.strip())
Result:
plu: '282013',
sku: '653460',
description: 'Mayfly Lite Pinnacle Women's',
brand: 'Nike',
You could obviously clean up the result using similar applications of split, strip, replace, etc.
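As one possible way to do that clean-up, here is a minimal sketch that collects just the values into one dict per product, which may suit the multi-product case better than printing raw lines. The helper name extract_items and the regular expression are my own assumptions, not part of any library. It also assumes every property sits on its own line as name: 'value', and that a wanted key appearing a second time means a new product has started.

```python
import re

# Matches lines of the form  name: 'value',  (the trailing comma is optional).
# The greedy (.*) keeps apostrophes that appear inside the value, e.g. Women's.
FIELD_RE = re.compile(r"^\s*(\w+)\s*:\s*'(.*)'\s*,?\s*$")


def extract_items(script_text, wanted=("plu", "sku", "description", "brand")):
    """Return one dict per product found in script_text (hypothetical helper)."""
    items, current = [], {}
    for line in script_text.splitlines():
        match = FIELD_RE.match(line)
        if not match:
            continue
        key, value = match.groups()
        if key not in wanted:
            continue
        if key in current:
            # Assumption: a repeated key means the next product has started.
            items.append(current)
            current = {}
        current[key] = value
    if current:
        items.append(current)
    return items


sample = """
items: [
{
plu: '282013',
sku: '653460',
category: 'Footwear',
description: 'Mayfly Lite Pinnacle Women's',
colour: '',
brand: 'Nike',
} ]
"""

for item in extract_items(sample):
    print(item)
```

You would pass the text of each script tag from each response to extract_items instead of the sample string. If the site ever puts several properties on one line, this line-based approach will miss them and a real JavaScript parser would be the safer choice.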
3 Comments
keys? or the text after the colon for each key?