
I'm beginning with Python, so I wrote a program which is supposed to get the number of connected people on a forum (like this one: http://www.jeuxvideo.com/forums/0-51-0-1-0-1-0-blabla-18-25-ans.htm) and store that number with the datetime in a database file (SQLite3). Every forum has its own table.

My code is supposed to do this:

  1. Create an object for each forum we want to monitor, using the Forum class.
  2. Store these objects in a list so they can be used in a for loop.
  3. Fetch the web page (with requests) where the number of connected people is written in a span tag with the class "nb-connect-fofo", which looks like this: <span class="nb-connect-fofo">1799 connecté(s)</span>. I'm using BeautifulSoup to get the string and a regex to extract the number. This is done for every forum.
  4. Execute a SQLite3 query to store the datetime and the number in the table with the same name as the forum being retrieved.

Here's my code:

#!/usr/bin/python3
from bs4 import BeautifulSoup
from time import sleep
import sqlite3
import datetime
import requests
import re


class Forum:
    def __init__(self, forum, url_forum):  # Initialize each object with its name and URL
        self.forum = forum
        self.url_forum = url_forum
        pattern = '([0-9]{1,5})'
        self.pattern = re.compile(pattern)

    def add_to_database(self):  # Add the number of connected people and the datetime to the forum's own table
        connection = sqlite3.connect("database.db")
        c = connection.cursor()
        now = datetime.datetime.today()
        nb_co = self.recup_co()
        text = "INSERT INTO {0}(datetime, nb_co) VALUES('{1}', '{2}')".format(self.forum, now, nb_co)
        c.execute(text)
        connection.commit()
        connection.close()
        print(now, self.forum, str(nb_co))
        sleep(1)

    def recup_co(self):  # Retrieve the page and extract the number of connected people with a regex
        r = requests.get(self.url_forum)
        page_html = str(r.text)
        page = BeautifulSoup(page_html, 'html.parser')
        resultat = page.select(".nb-connect-fofo")
        nb_co = re.search(self.pattern, str(resultat))
        return nb_co.group(0)


def main():
    # All forums which are scanned are listed here
    dixhuit_vingtcinq = Forum("dixhuit_vingtcinq", "http://www.jeuxvideo.com/forums/0-51-0-1-0-1-0-blabla-18-25-ans.htm")
    moins_quinze = Forum("moins_quinze", "http://www.jeuxvideo.com/forums/0-15-0-1-0-1-0-blabla-moins-de-15-ans.htm")
    quinze_dixhuit = Forum("quinze_dixhuit", "http://www.jeuxvideo.com/forums/0-50-0-1-0-1-0-blabla-15-18-ans.htm")
    overwatch = Forum("overwatch", "http://www.jeuxvideo.com/forums/0-33972-0-1-0-1-0-overwatch.htm")
    # All forum objects are stored in a list so they can be looped over
    forums = [dixhuit_vingtcinq, moins_quinze, quinze_dixhuit, overwatch]
    while(True):
        for forum in forums:
            try:
                forum.add_to_database()
            except:
                print("An error occurred with the forum '{0}' at {1}".format(forum.forum, datetime.datetime.today()))
                sleep(5)
        sleep(60)

main()

I will use it later to make graphs and small statistics to improve my skills with Python. Maybe I will add more forums and expand my program to scrape the website and get every post on these forums (if I do, it will be much later).

So I'm asking you for some improvements/ideas. As a beginner, I have obviously made some mistakes that can be quite annoying, and I really want to improve.

Also, my code is running on one of my own servers. Would it be better to buy a cheap VPS for 2€ instead?

Thanks for reading, and thank you in advance.

PS: If there are mistakes in my post about the website, please tell me.

asked Jun 30, 2017 at 14:06

1 Answer


Code smells

  • your code is vulnerable to SQL injection attacks because you are using string formatting to put query parameters into the query. You need to properly parameterize your query with the help of the database driver:

    query = """
     INSERT INTO {table} (datetime, nb_co)
     VALUES(?, ?)
    """.format(table=self.forum)
    c.execute(query, (now, nb_co))
    

    Note that this way you also don't need to worry about Python-to-database type conversions and quotes inside parameters - it will all be handled by the database driver.
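
    As a quick standalone check of that point (the table and column names below are made up just for the demo), the driver adapts a datetime object and safely handles a quote inside a value:

        import datetime
        import sqlite3

        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE demo (datetime TEXT, nb_co TEXT)")
        # The datetime is adapted to text and the embedded quote is escaped by the driver
        conn.execute("INSERT INTO demo (datetime, nb_co) VALUES (?, ?)",
                     (datetime.datetime.today(), "1'799"))
        print(conn.execute("SELECT * FROM demo").fetchall())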

Performance

  • instead of re-connecting to the database for every insert, think about connecting to the database once, processing all the data, and then closing the connection afterwards
  • the same idea applies to requests - you may initialize a Session() once and reuse it (both ideas are combined in the sketch after this list)
  • use lxml instead of html.parser as the underlying parser used by BeautifulSoup
  • you can use the SoupStrainer class to parse only the desired element, which will then allow you to simply get the text and split by whitespace instead of applying a regular expression:

    from bs4 import SoupStrainer  # import this alongside BeautifulSoup

    parse_only = SoupStrainer(class_="nb-connect-fofo")
    page = BeautifulSoup(page_html, 'lxml', parse_only=parse_only)
    return page.get_text().split()[0]
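
Putting the performance ideas together, here is a rough sketch of how the whole loop could look: one database connection and one requests.Session() for the entire run, plus the SoupStrainer-based parsing from above. It assumes lxml is installed and the tables already exist, and it drops the Forum class and error handling to keep the sketch short:

    import datetime
    import sqlite3
    from time import sleep

    import requests
    from bs4 import BeautifulSoup, SoupStrainer


    def recup_co(session, url):
        # Reuse the HTTP session and parse only the counter element
        page_html = session.get(url).text
        parse_only = SoupStrainer(class_="nb-connect-fofo")
        page = BeautifulSoup(page_html, 'lxml', parse_only=parse_only)
        return page.get_text().split()[0]


    def main():
        forums = {
            "dixhuit_vingtcinq": "http://www.jeuxvideo.com/forums/0-51-0-1-0-1-0-blabla-18-25-ans.htm",
            "moins_quinze": "http://www.jeuxvideo.com/forums/0-15-0-1-0-1-0-blabla-moins-de-15-ans.htm",
            "quinze_dixhuit": "http://www.jeuxvideo.com/forums/0-50-0-1-0-1-0-blabla-15-18-ans.htm",
            "overwatch": "http://www.jeuxvideo.com/forums/0-33972-0-1-0-1-0-overwatch.htm",
        }
        connection = sqlite3.connect("database.db")  # one connection for the whole run
        session = requests.Session()                 # one HTTP session, reused for every request
        try:
            while True:
                for name, url in forums.items():
                    nb_co = recup_co(session, url)
                    query = "INSERT INTO {table} (datetime, nb_co) VALUES (?, ?)".format(table=name)
                    connection.execute(query, (datetime.datetime.today(), nb_co))
                    connection.commit()
                sleep(60)
        finally:
            connection.close()


    main()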
    
answered Jun 30, 2017 at 17:16
