Commit cface0d

Merge pull request avinashkranjan#899 from Ayushjain2205/udemy-scraper

Udemy scraper

2 parents 6914085 + 00b9082 commit cface0d

File tree

4 files changed: +155 -0 lines changed

Udemy Scraper

`Udemy Scraper/README.md`

Lines changed: 30 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,30 @@`
	`1`	`+# Udemy Scraper`
	`2`	`+There are 2 scripts in this project:`
	`3`	`+1. fetcher.py - This script scrapes course data from Udemy for the category the user enters as input.`
	`4`	`+2. display.py - This script displays the scraped courses from the database in the terminal.`
	`5`	`+`
	`6`	`+## Setup instructions`
	`7`	`+To run these scripts, you need Python and pip installed on your system. Once they are installed, run the following command in a terminal from the project folder to install the requirements.`
	`8`	+```
	`9`	`+pip install -r requirements.txt`
	`10`	+```
	`11`	`+`
	`12`	`+After installing the requirements, open a terminal in the project folder and run`
	`13`	+```
	`14`	`+python fetcher.py`
	`15`	`+python display.py`
	`16`	+```
	`17`	`+or`
	`18`	+```
	`19`	`+python3 fetcher.py`
	`20`	`+python3 display.py`
	`21`	+```
	`22`	`+depending on your Python version. Make sure that you run the commands from the same virtual environment in which the required modules are installed.`
	`23`	`+`
	`24`	`+## Output`
	`25`	`+![Sample output of fetcher script](https://i.postimg.cc/SNCmzfhp/fetcher.png)`
	`26`	`+`
	`27`	`+![Sample output of display script](https://i.postimg.cc/7h7r0wjN/display.png)`
	`28`	`+`
	`29`	`+## Author`
	`30`	`+[Ayush Jain](https://github.com/Ayushjain2205)`
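
Both scripts in this commit open the database at the relative path `./Udemy Scraper/udemyDatabase.db`, which SQLite resolves against the current working directory, while the setup instructions above say to run the scripts from the project folder. One way to make the location independent of where the command is run is to derive it from the script's own location. The following is a minimal sketch, assuming a hypothetical helper named `db_path_for` that is not part of this commit:

```python
from pathlib import Path

DB_NAME = "udemyDatabase.db"  # file name used by fetcher.py and display.py


def db_path_for(script_file: str) -> Path:
    """Return the database path located next to the given script file.

    Hypothetical helper (not in the commit). A script would pass its own
    __file__, for example:
        con = sqlite3.connect(db_path_for(__file__))
    """
    return Path(script_file).parent / DB_NAME
```

With this approach the same database file would be used whether the scripts are started from the repository root or from the `Udemy Scraper` folder.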

`Udemy Scraper/display.py`

Lines changed: 43 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,43 @@`
	`1`	`+import sqlite3`
	`2`	`+from sqlite3 import Error`
	`3`	`+`
	`4`	`+# Function to connect to the SQL Database`
	`5`	`+def sql_connection():`
	`6`	`+    try:`
	`7`	`+        con = sqlite3.connect('./Udemy Scraper/udemyDatabase.db')`
	`8`	`+        return con`
	`9`	`+    except Error as e:`
	`10`	`+        print(e)  # print the actual error message, not the exception class`
	`11`	`+`
	`12`	`+con = sql_connection()`
	`13`	`+`
	`14`	`+# Function to Fetch courses from database`
	`15`	`+def sql_fetch(con):`
	`16`	`+    cursorObj = con.cursor()`
	`17`	`+    try:`
	`18`	`+        cursorObj.execute('SELECT * FROM courses') # SQL search query`
	`19`	`+    except Error:`
	`20`	`+        print("Database empty... Fetch courses using fetcher script")`
	`21`	`+        return`
	`22`	`+`
	`23`	`+    rows = cursorObj.fetchall()`
	`24`	`+`
	`25`	`+    # Print table header`
	`26`	`+    print("{:^30}".format("Title"),"{:^30}".format("Description"),"{:^20}".format("Instructor"),`
	`27`	`+          "{:<15}".format("Current Price"),"{:<18}".format("Original Price"),"{:^10}".format("Rating"),`
	`28`	`+          "{:^10}".format("Hours"),"{:^10}".format("Lectures"))`
	`29`	`+`
	`30`	`+    # Print all rows`
	`31`	`+    for row in rows:`
	`32`	`+        # Format individual data items for printing in a table like manner`
	`33`	`+        title = "{:<30}".format(row[0] if len(row[0])<30 else row[0][:26]+"...")`
	`34`	`+        description = "{:<30}".format(row[1] if len(row[1])<30 else row[1][:26]+"...")`
	`35`	`+        instructor = "{:<20}".format(row[2] if len(row[2])<20 else row[2][:16]+"...")`
	`36`	`+        current_price = "{:^15}".format(row[3])`
	`37`	`+        original_price= "{:^18}".format(row[4])`
	`38`	`+        rating = "{:^10}".format(row[5])`
	`39`	`+        hours= "{:^10}".format(row[6])`
	`40`	`+        lectures = "{:^10}".format(row[7])`
	`41`	`+        print(title,description,instructor,current_price,original_price,rating,hours,lectures)`
	`42`	`+`
	`43`	`+sql_fetch(con)`
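
The row formatting in `display.py` repeats the same truncate-then-pad expression for each text column, with the width and the length threshold written separately each time. A small helper can keep the two in sync. This is a minimal sketch, assuming a hypothetical function named `fit` that does not appear in the commit and a column width of at least 3:

```python
def fit(text, width: int) -> str:
    """Left-align text in exactly `width` characters.

    Values longer than `width` are cut and end in '...'.
    Assumes width >= 3 so the ellipsis always fits.
    """
    text = str(text)
    if len(text) > width:
        text = text[: width - 3] + "..."
    return text.ljust(width)


# Made-up sample values, for illustration only.
sample_title = "An example course title that is longer than thirty characters"
sample_instructor = "Example Instructor"
print(fit(sample_title, 30), fit(sample_instructor, 20))
```

Because the threshold is derived from `width`, a column such as the 20-character instructor field cannot overflow its slot.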

`Udemy Scraper/fetcher.py`

Lines changed: 79 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,79 @@`
	`1`	`+import requests`
	`2`	`+from bs4 import BeautifulSoup`
	`3`	`+from selenium import webdriver`
	`4`	`+from selenium.webdriver.common.keys import Keys`
	`5`	`+import time`
	`6`	`+import sqlite3`
	`7`	`+from sqlite3 import Error`
	`8`	`+`
	`9`	`+# Function to connect to the SQL Database`
	`10`	`+def sql_connection():`
	`11`	`+    try:`
	`12`	`+        con = sqlite3.connect('./Udemy Scraper/udemyDatabase.db')`
	`13`	`+        return con`
	`14`	`+    except Error as e:`
	`15`	`+        print(e)  # print the actual error message, not the exception class`
	`16`	`+`
	`17`	`+# Function to create table`
	`18`	`+def sql_table(con):`
	`19`	`+    cursorObj = con.cursor()`
	`20`	`+    cursorObj.execute("CREATE TABLE IF NOT EXISTS courses(title text, description text, instructor text,current_price INTEGER, original_price INTEGER, rating REAL, hours REAL, lectures INTEGER)")`
	`21`	`+    con.commit()`
	`22`	`+`
	`23`	`+# Call functions to connect to database and create table`
	`24`	`+con = sql_connection()`
	`25`	`+sql_table(con)`
	`26`	`+`
	`27`	`+# Function to insert into table`
	`28`	`+def sql_insert(con, entities):`
	`29`	`+    cursorObj = con.cursor()`
	`30`	`+    cursorObj.execute('INSERT INTO courses(title, description, instructor, current_price, original_price, rating, hours, lectures) VALUES(?, ?, ?, ?, ?, ?, ?, ?)', entities)`
	`31`	`+    con.commit()`
	`32`	`+`
	`33`	`+`
	`34`	`+# Get chrome driver path`
	`35`	`+driver_path = input("Enter chrome driver path: ")`
	`36`	`+`
	`37`	`+print("\nSome Categories Available on Udemy include:\nDevelopment - Python, Web Development, Javascript, Java \nDesign - Photoshop, Blender, Graphic design\n")`
	`38`	`+`
	`39`	`+# Get input for course category to scrape`
	`40`	`+category = input("Enter course category: ")`
	`41`	`+`
	`42`	`+url = 'https://www.udemy.com/courses/search/?src=ukw&q={}'.format(category)`
	`43`	`+`
	`44`	`+# initiating the webdriver. Parameter includes the path of the webdriver.`
	`45`	`+driver = webdriver.Chrome(driver_path)`
	`46`	`+driver.get(url)`
	`47`	`+`
	`48`	`+# this is just to ensure that the page is loaded`
	`49`	`+time.sleep(5)`
	`50`	`+html = driver.page_source`
	`51`	`+`
	`52`	`+# Now apply bs4 to html variable`
	`53`	`+soup = BeautifulSoup(html, "html.parser")`
	`54`	`+course_divs = soup.find_all("div", {"class": "course-card--container--3w8Zm course-card--large--1BVxY"})`
	`55`	`+`
	`56`	`+# Get all course divs and extract information from individual divs`
	`57`	`+for course_div in course_divs:`
	`58`	`+    title = course_div.find("div",{"class":"udlite-focus-visible-target udlite-heading-md course-card--course-title--2f7tE"}).text.strip()`
	`59`	`+    description = course_div.find("p",{"class":"udlite-text-sm course-card--course-headline--yIrRk"}).text.strip()`
	`60`	`+    instructor = course_div.find("div",{"class":"udlite-text-xs course-card--instructor-list--lIA4f"}).text.strip()`
	`61`	`+`
	`62`	`+    current_price = course_div.find("div",{"class":"price-text--price-part--Tu6MH course-card--discount-price--3TaBk udlite-heading-md"}).text.strip()`
	`63`	`+    current_price = current_price.replace("Current price₹","")`
	`64`	`+`
	`65`	`+    original_price = course_div.find("div",{"class":"price-text--price-part--Tu6MH price-text--original-price--2e-F5 course-card--list-price--2AO6G udlite-text-sm"}).text.strip()`
	`66`	`+    original_price = original_price.replace("Original Price₹","")`
	`67`	`+`
	`68`	`+    rating = course_div.find("span",{"class":"udlite-heading-sm star-rating--rating-number--3lVe8"}).text.strip()`
	`69`	`+`
	`70`	`+    hours = course_div.find_all("span",{"class":"course-card--row--1OMjg"})[0].text.strip().split()[0]`
	`71`	`+`
	`72`	`+    lectures = course_div.find_all("span",{"class":"course-card--row--1OMjg"})[1].text.strip().split()[0]`
	`73`	`+`
	`74`	`+    entities = (title, description, instructor, current_price, original_price, rating, hours, lectures)`
	`75`	`+    sql_insert(con, entities)`
	`76`	`+`
	`77`	`+print("Saved successfully in database!")`
	`78`	`+`
	`79`	`+driver.close() # closing the webdriver`
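
Two details of `fetcher.py` may be worth a closer look: the category is placed in the search URL without encoding (so a value such as "Web Development" contains a raw space), and prices are stored as the text left after removing the "Current price₹" prefix, although the columns are declared `INTEGER`. The sketch below shows one possible way to handle both, using only the standard library and an in-memory database so that it runs without a browser. The helpers `build_search_url` and `parse_price` are hypothetical names, the sample row is made up, and the price parsing assumes whole-number amounts as in the commit:

```python
import re
import sqlite3
from urllib.parse import urlencode


def build_search_url(category: str) -> str:
    """Build the search URL used in fetcher.py with the category encoded."""
    query = urlencode({"src": "ukw", "q": category})
    return "https://www.udemy.com/courses/search/?" + query


def parse_price(text: str):
    """Return the first whole-number amount in text, or None if absent.

    Thousands separators are dropped, e.g. 'Original Price₹3,499' -> 3499.
    """
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    return int(match.group().replace(",", ""))


# In-memory database with the same schema as the commit's courses table.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE IF NOT EXISTS courses(title text, description text, "
    "instructor text, current_price INTEGER, original_price INTEGER, "
    "rating REAL, hours REAL, lectures INTEGER)"
)

# Made-up sample row, for illustration only.
row = (
    "Example title", "Example headline", "Example instructor",
    parse_price("Current price₹455"), parse_price("Original Price₹3,499"),
    4.5, 12.0, 80,
)
with con:  # commits on success, rolls back on error
    con.execute("INSERT INTO courses VALUES(?, ?, ?, ?, ?, ?, ?, ?)", row)

print(build_search_url("Web Development"))
print(con.execute("SELECT current_price, original_price FROM courses").fetchall())
```

Storing the prices as integers lets later queries sort or filter on them numerically instead of comparing strings.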

`Udemy Scraper/requirements.txt`

Lines changed: 3 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,3 @@`
	`1`	`+requests`
	`2`	`+beautifulsoup4`
	`3`	`+selenium`

0 commit comments
