I know very little about Python, but I was trying to achieve something in Extract, Transform and Load (ETL) using a small Python script. I get the desired result, but I still want to understand this script.
from bs4 import BeautifulSoup
import urllib
import re
import string
import csv

# Download the page (Python 2 style urllib) and parse it
urlHandle = urllib.urlopen("http://finance.yahoo.com/q/cp?s=^DJI")
html = urlHandle.read()
soup = BeautifulSoup(html)

# Locate the table by its id and collect its rows
table = soup.find('table', attrs={'id': 'yfncsumtab'})
rows = table.findAll('tr')

a = ''
csvfile = open("F:/data/yahoofinance.csv", 'w')
for tr in rows[5:]:
    # Build one quoted, comma-separated line per table row
    for td in tr.find_all('td', attrs={'class': 'yfnc_tabledata1'}):
        a += '"' + td.get_text() + '",'
    a += '\n'
    csvfile.write(a)
    a = ''
csvfile.close()
My questions: in this code, soup is an object returned by the BeautifulSoup(html) call. Am I right? In the next statement I guess table is also an object, which would mean we are searching the soup object with the find function and it is returning another object?
Please correct my understanding of the code above:
urlHandle is a class, urllib is what? And urlopen is a static method.
html is an object, urlHandle is a class, read is a method.
soup is an object, BeautifulSoup(html) is a function.
Please give me feedback on my understanding, and correct me where I am wrong.
4 Answers
urlHandle is an object, urllib is a module, and urlopen is a function.
html is an object and read is a method.
soup is an object and BeautifulSoup(html) is the constructor for a BeautifulSoup object.
It can be quite confusing, but in general you can keep in mind that CamelCased names are, by convention, classes, which makes CamelCase() the constructor call. What you import is a module, which can contain classes and/or functions.
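To illustrate the module / function / class split, here is a minimal sketch using the standard-library json module as a stand-in (it is not part of the question's script, and it assumes Python 3). The inspect helpers just confirm what kind of thing each name is.

```python
import inspect
import json

# What you import is a module
print(inspect.ismodule(json))             # True

# A module can contain plain functions (lowercase name by convention) ...
print(inspect.isfunction(json.loads))     # True

# ... and classes (CamelCase name by convention)
print(inspect.isclass(json.JSONDecoder))  # True

# Calling the class constructs an instance of it
decoder = json.JSONDecoder()

# decode is a method, called on that instance
result = decoder.decode('{"symbol": "MMM"}')
print(result)                             # {'symbol': 'MMM'}
```

The naming convention is only a hint, not a rule the language enforces, so checking with inspect (or type()) is the reliable way to tell.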
To be technical, I think it's important to understand that EVERYTHING in Python is an object. So, classes are objects, functions are objects, everything is an object.
That being said, we make distinctions after that, such as "function", "class", etc.
urllib, in particular, is something we call a module.
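A small sketch of what "everything is an object" means in practice, assuming Python 3. The names double and Ticker are made up for the illustration; they do not come from the question.

```python
import math


def double(x):
    """Hypothetical helper function, used only for this illustration."""
    return 2 * x


class Ticker:
    """Hypothetical empty class, used only for this illustration."""


# A module, a function, a class, an instance, and plain values
things = [math, double, Ticker, Ticker(), 42, "text", None]

# Every one of them is an instance of object
all_objects = all(isinstance(thing, object) for thing in things)
print(all_objects)                # True

# Because a function is an object, it can be bound to another name
alias = double
print(alias(21))                  # 42

# A class is itself an object: an instance of type
print(isinstance(Ticker, type))   # True
```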
soup is an instance of the BeautifulSoup class. urlHandle is again an instance, urllib is a module, and urlopen is a function belonging to this module. html is an object and read is a method that is called on urlHandle.
You can find these out yourself using the type() function.
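For example, a rough sketch of checking with type(), assuming Python 3 (where urlopen lives in urllib.request rather than directly in urllib as in the question's Python 2 code). To avoid a network call, an io.StringIO object stands in for urlHandle here; that substitution is an assumption made for the illustration.

```python
import io
import urllib.request

# Stand-in for urlHandle: a file-like object with a read() method
handle = io.StringIO("<html></html>")
html = handle.read()

print(type(urllib.request).__name__)          # module
print(type(urllib.request.urlopen).__name__)  # function
print(type(io.StringIO) is type)              # True: StringIO is a class
print(type(handle) is io.StringIO)            # True: handle is an instance
print(type(html).__name__)                    # str
print(callable(handle.read))                  # True: read can be called
```

The same calls work on the names in the question, for instance type(soup) or type(table).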
In Python, basically everything is an object!
When you use import, you are including a certain module, such as urllib.
A statement like soup = BeautifulSoup(html) creates an instance (also an object) of the BeautifulSoup class, constructed by passing in the html object.
Calls like soup.find(...) are methods that use the instance to do a certain job, in this case getting the first HTML table whose id attribute has the value 'yfncsumtab'. It returns a BeautifulSoup Tag object.
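Here is a minimal sketch of that last step, assuming the bs4 package is installed. The HTML string is a made-up stand-in for the downloaded page, and the built-in "html.parser" is passed explicitly so no extra parser is needed.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Made-up stand-in for the page the question downloads
html = """
<html><body>
<table id="other"><tr><td>ignored</td></tr></table>
<table id="yfncsumtab">
<tr><td class="yfnc_tabledata1">MMM</td><td class="yfnc_tabledata1">3M Company</td></tr>
</table>
</body></html>
"""

# Constructing the class gives an instance
soup = BeautifulSoup(html, "html.parser")

# find returns the first matching element as a Tag object
table = soup.find("table", attrs={"id": "yfncsumtab"})

# When nothing matches, find returns None instead of a Tag
missing = soup.find("table", attrs={"id": "no-such-id"})

# A Tag has find_all too, so the search can continue inside it
cells = [td.get_text()
         for td in table.find_all("td", attrs={"class": "yfnc_tabledata1"})]

print(type(soup).__name__)     # BeautifulSoup
print(isinstance(table, Tag))  # True
print(missing)                 # None
print(cells)                   # ['MMM', '3M Company']
```

Since find can return None, a script like the one in the question would fail with an AttributeError on table.findAll if the page layout changed, so a check for None before using table is probably worthwhile.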