I've built a web crawler using the BeautifulSoup library that pulls stock ticker data from CSV files on Yahoo Finance and charts the data using matplotlib. I'm wondering if there are any ways to improve the code I've written, because there are some parts that I think could be a lot better.
import urllib.request
from matplotlib import pyplot as plt
from bs4 import BeautifulSoup
import requests

def chartStocks(*tickers):
    # Run loop for each ticker passed in as an argument
    for ticker in tickers:
        # Convert URL into text for parsing
        url = "http://finance.yahoo.com/q/hp?s=" + str(ticker) + "+Historical+Prices"
        sourceCode = requests.get(url)
        plainText = sourceCode.text
        soup = BeautifulSoup(plainText, "html.parser")
        # Find all links on the page
        for link in soup.findAll('a'):
            href = link.get('href')
            link = []
            for c in href[:48]:
                link.append(c)
            link = ''.join(link)
            # Find the URL for the stock ticker CSV file and convert the data to text
            if link == "http://real-chart.finance.yahoo.com/table.csv?s=":
                csv_url = href
                res = urllib.request.urlopen(csv_url)
                csv = res.read()
                csv_str = str(csv)
                # Parse the CSV to create a list of data points
                point = []
                points = []
                curDay = 0
                day = []
                commas = 0
                lines = csv_str.split("\\n")
                lineOne = True
                for line in lines:
                    commas = 0
                    if lineOne == True:
                        lineOne = False
                    else:
                        for c in line:
                            if c == ",":
                                commas += 1
                            if commas == 4:
                                point.append(c)
                            elif commas == 5:
                                for x in point:
                                    if x == ",":
                                        point.remove(x)
                                point = ''.join(point)
                                point = float(point)
                                points.append(point)
                                day.append(curDay)
                                curDay += 1
                                point = []
                                commas = 0
                points = list(reversed(points))
                # Plot the data
                plt.plot(day, points)
                plt.ylabel(ticker)
                plt.show()
1 Answer
Composing small functions
chartStocks would be more readable if you split it into a handful of smaller functions, roughly like this:
def chartStocks(*tickers):
    for ticker in tickers:
        page = getTickerPage(ticker)
        csv_url = findCSVUrl(page)
        csv = getCSV(csv_url)
        day, points = parseCSV(csv)
        plot_data(ticker, day, points)

        # Or, if you're allergic to temporary variables:
        day, points = parseCSV(getCSV(findCSVUrl(getTickerPage(ticker))))
This approach lets you clearly see the "pipeline" your data is passing through, and lets you test and reuse smaller pieces individually.
Arguably it would be even cleaner to define def chartStock(ticker) to handle the case of one ticker, so that chartStocks is just:

def chartStocks(*tickers):
    for ticker in tickers:
        chartStock(ticker)
The only thing to be careful of here is to make sure to design your functions to handle errors properly -- either check that each return value is not None before calling the next function, or allow them to take None as a parameter and just return nothing in that case.
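One way to make that None handling systematic (a sketch of mine, not part of the original answer): a tiny pipe helper that feeds a value through each stage and bails out as soon as any stage returns None, so no later function is ever called on None:

```python
def pipe(value, *functions):
    # Feed value through each function in turn, stopping early
    # as soon as any step produces None.
    for fn in functions:
        if value is None:
            return None
        value = fn(value)
    return value

# chartStock could then be expressed as (hypothetical helpers from above):
# day_points = pipe(ticker, getTickerPage, findCSVUrl, getCSV, parseCSV)
```

With this shape a failed page fetch or a missing CSV link short-circuits the rest of the pipeline instead of raising an AttributeError on None.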
str.startswith
This:

# Find all links on the page
for link in soup.findAll('a'):
    href = link.get('href')
    link = []
    for c in href[:48]:
        link.append(c)
    link = ''.join(link)
    # Find the URL for the stock ticker CSV file and convert the data to text
    if link == "http://real-chart.finance.yahoo.com/table.csv?s=":
        # ...

could be simplified with str.startswith:
def findCSVUrl(soupPage):
    CSV_URL_PREFIX = 'http://real-chart.finance.yahoo.com/table.csv?s='
    for link in soupPage.findAll('a'):
        href = link.get('href', '')
        if href.startswith(CSV_URL_PREFIX):
            return href
I've also provided a not-found value of '' so that if link has no href, startswith won't get called on None.
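Tag.get here behaves like dict.get, so the same pattern can be illustrated with a plain dictionary standing in for an &lt;a&gt; tag that lacks an href (my illustration, not from the answer):

```python
link_attrs = {"class": "nav"}  # pretend this is an <a> tag with no href attribute

no_default = link_attrs.get("href")        # → None
# no_default.startswith(...) would raise AttributeError here

with_default = link_attrs.get("href", "")  # → ""
safe = with_default.startswith("http://")  # → False, no crash
```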
Skipping first row
Instead of using a lineOne flag when looping over lines:

lineOne = True
for line in lines:
    if lineOne == True:
        lineOne = False
    else:
        # continue parsing line...
you can just start from after the first row with a slice:
for line in lines[1:]:
    # ... continue parsing line
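To see the slice in action, here are a couple of invented rows in the old Yahoo CSV layout (Date,Open,High,Low,Close,Volume,Adj Close -- the values are made up for illustration):

```python
lines = ["Date,Open,High,Low,Close,Volume,Adj Close",
         "2015-01-05,108.29,108.65,105.41,106.25,64285500,99.68",
         "2015-01-02,111.39,111.44,107.35,109.33,53204600,102.57"]

# The [1:] slice drops the header, so only data rows are parsed.
closes = [line.split(",")[4] for line in lines[1:]]
# closes → ['106.25', '109.33']
```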
CSV parsing
Python has a built-in CSV parsing module that could simplify a lot of your parsing. It would do the splitting-by-commas for you and return either a list or dict of fields for each row, depending on what you ask for. You would end up with roughly this:
import csv  # built-in module; note it would shadow the csv variable in the original code

def parseCSV(csv_text):
    # Skip the header row and drop blank lines so float() only sees data fields
    csv_rows = csv.reader(line for line in csv_text.split('\n')[1:] if line)
    days = []
    points = []
    for day, row in enumerate(csv_rows):
        close = float(row[4])
        days.append(day)
        points.append(close)
    return days, points
where the enumerate function would give you the same zero-based days list as you currently have.

Actually, since it seems like days will just be the list [0 .. len(points)], you could skip the enumerate and simply define days after you've parsed all the points, and toss in a list comprehension if you'd like for good measure:
def parseCSV(csv_text):
    # As above, skip the header row and drop blank lines before parsing
    csv_rows = csv.reader(line for line in csv_text.split('\n')[1:] if line)
    points = [float(row[4]) for row in csv_rows]
    days = list(range(len(points)))
    return days, points
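As a quick, runnable sanity check (sample rows invented for illustration; this self-contained version skips the header row and drops blank lines so float() only ever sees data fields):

```python
import csv

def parseCSV(csv_text):
    # Skip the header row and drop blank lines (e.g. from a trailing newline).
    csv_rows = csv.reader(line for line in csv_text.split('\n')[1:] if line)
    points = [float(row[4]) for row in csv_rows]
    days = list(range(len(points)))
    return days, points

# Made-up rows in the old Yahoo layout: Date,Open,High,Low,Close,Volume,Adj Close
sample = ("Date,Open,High,Low,Close,Volume,Adj Close\n"
          "2015-01-05,108.29,108.65,105.41,106.25,64285500,99.68\n"
          "2015-01-02,111.39,111.44,107.35,109.33,53204600,102.57\n")

days, points = parseCSV(sample)
# days → [0, 1], points → [106.25, 109.33]
```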