Commit cface0d

Merge pull request avinashkranjan#899 from Ayushjain2205/udemy-scraper
Udemy scraper
2 parents 6914085 + 00b9082 commit cface0d

4 files changed: 155 lines added, 0 lines removed

Udemy Scraper/README.md

Lines changed: 30 additions & 0 deletions
````markdown
# Udemy Scraper

This project contains two scripts:

1. `fetcher.py` scrapes course data from Udemy for a category entered by the user and saves it to a local SQLite database.
2. `display.py` prints the scraped courses from that database to the terminal.

## Setup instructions

To run these scripts, you need Python and pip installed on your system. Once they are installed, run the following command from the project folder to install the requirements:

```
pip install -r requirements.txt
```

`fetcher.py` also drives Google Chrome through Selenium, so you need Chrome and a matching ChromeDriver; the script asks for the driver's path when it starts.

After installing the requirements, open a terminal in the project folder and run

```
python fetcher.py
python display.py
```

or

```
python3 fetcher.py
python3 display.py
```

depending on your Python installation. Make sure you run these commands from the same virtual environment in which the required modules are installed.

## Output

![Sample output of fetcher script](https://i.postimg.cc/SNCmzfhp/fetcher.png)

![Sample output of display script](https://i.postimg.cc/7h7r0wjN/display.png)

## Author

[Ayush Jain](https://github.com/Ayushjain2205)
````
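The two scripts share data only through the `courses` table in `udemyDatabase.db`: `fetcher.py` writes rows and `display.py` reads them back. A minimal sketch of that round trip, using an in-memory database and one made-up course row (the values are illustrative, not scraped data). It also shows SQLite's column type affinity at work: numeric strings inserted into the INTEGER and REAL columns come back as numbers, while a string that is not a well-formed number, such as "3,499", stays TEXT.

```python
import sqlite3

# Same schema that fetcher.py creates
SCHEMA = ("CREATE TABLE IF NOT EXISTS courses(title text, description text, instructor text, "
          "current_price INTEGER, original_price INTEGER, rating REAL, hours REAL, lectures INTEGER)")


def round_trip(row):
    """Insert one course row as fetcher.py does, then read all rows back as display.py does."""
    con = sqlite3.connect(":memory:")  # throwaway database for the demo
    try:
        con.execute(SCHEMA)
        con.execute("INSERT INTO courses VALUES(?, ?, ?, ?, ?, ?, ?, ?)", row)
        con.commit()
        return con.execute("SELECT * FROM courses").fetchall()
    finally:
        con.close()


# Hypothetical course; every value is a string, as BeautifulSoup's .text returns them
sample = ("Intro to Python", "Learn the basics", "Jane Doe", "455", "3499", "4.5", "12.5", "80")
rows = round_trip(sample)
print(rows)  # [('Intro to Python', 'Learn the basics', 'Jane Doe', 455, 3499, 4.5, 12.5, 80)]
```

Because display.py only formats whatever `SELECT *` returns, a price stored as TEXT still prints, but it cannot be sorted or compared numerically in SQL.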

Udemy Scraper/display.py

Lines changed: 43 additions & 0 deletions
```python
import os
import sqlite3
from sqlite3 import Error

# Keep the database next to this script so it is found regardless of the working directory
DB_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)), "udemyDatabase.db")


# Function to connect to the SQL Database
def sql_connection():
    try:
        con = sqlite3.connect(DB_PATH)
        return con
    except Error as e:
        print(e)  # print the actual error, not the Error class
        return None


# Function to fetch courses from the database and print them as a table
def sql_fetch(con):
    cursorObj = con.cursor()
    try:
        cursorObj.execute('SELECT * FROM courses')  # SQL search query
    except Error:
        # Raised when the courses table does not exist yet
        print("Database empty... Fetch courses using fetcher script")
        return

    rows = cursorObj.fetchall()

    # Print table header
    print("{:^30}".format("Title"), "{:^30}".format("Description"), "{:^20}".format("Instructor"),
          "{:<15}".format("Current Price"), "{:<18}".format("Original Price"), "{:^10}".format("Rating"),
          "{:^10}".format("Hours"), "{:^10}".format("Lectures"))

    # Print all rows
    for row in rows:
        # Truncate long text fields with "..." so every column keeps its width
        title = "{:<30}".format(row[0] if len(row[0]) < 30 else row[0][:26] + "...")
        description = "{:<30}".format(row[1] if len(row[1]) < 30 else row[1][:26] + "...")
        instructor = "{:<20}".format(row[2] if len(row[2]) < 20 else row[2][:16] + "...")
        current_price = "{:^15}".format(row[3])
        original_price = "{:^18}".format(row[4])
        rating = "{:^10}".format(row[5])
        hours = "{:^10}".format(row[6])
        lectures = "{:^10}".format(row[7])
        print(title, description, instructor, current_price, original_price, rating, hours, lectures)


con = sql_connection()
if con is not None:
    sql_fetch(con)
    con.close()
```
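display.py keeps its table aligned by cutting long strings to fit and padding every value to a fixed width with `str.format`. A minimal sketch of that idea as one reusable helper (`fit` is a hypothetical name, not part of this commit), assuming the same rule the script uses for titles: a value of `width` characters or more is cut to `width - 4` characters plus "...", which leaves at least one space before the next column.

```python
def fit(value, width, align="<"):
    """Format value as a string exactly `width` characters wide.

    Values of `width` characters or more are cut to `width - 4` characters
    followed by "...", the same rule display.py applies to titles.
    `align` is a str.format alignment flag: "<", "^" or ">".
    """
    if width < 4:
        raise ValueError("width must be at least 4 to fit the '...' marker")
    text = str(value)
    if len(text) >= width:
        text = text[:width - 4] + "..."
    # Nested replacement fields build the spec at runtime, e.g. "{:<30}"
    return "{:{align}{width}}".format(text, align=align, width=width)


header = " ".join([fit("Title", 30, "^"), fit("Rating", 10, "^")])
line = " ".join([fit("A deliberately long course title used as an example", 30), fit(4.5, 10, "^")])
print(header)
print(line)
```

Passing the width as a parameter means each column width is defined once and shared by the header and the rows, instead of being repeated in separate format strings that can drift apart.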

Udemy Scraper/fetcher.py

Lines changed: 79 additions & 0 deletions
```python
import os
import time
import sqlite3
from sqlite3 import Error
from urllib.parse import quote_plus

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Keep the database next to this script so display.py finds it regardless of the working directory
DB_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)), "udemyDatabase.db")


# Function to connect to the SQL Database
def sql_connection():
    try:
        con = sqlite3.connect(DB_PATH)
        return con
    except Error as e:
        print(e)  # print the actual error, not the Error class
        return None


# Function to create the courses table if it does not exist yet
def sql_table(con):
    cursorObj = con.cursor()
    cursorObj.execute("CREATE TABLE IF NOT EXISTS courses(title text, description text, instructor text, "
                      "current_price INTEGER, original_price INTEGER, rating REAL, hours REAL, lectures INTEGER)")
    con.commit()


# Function to insert one course into the table
def sql_insert(con, entities):
    cursorObj = con.cursor()
    cursorObj.execute('INSERT INTO courses(title, description, instructor, current_price, original_price, '
                      'rating, hours, lectures) VALUES(?, ?, ?, ?, ?, ?, ?, ?)', entities)
    con.commit()


# Call functions to connect to database and create table
con = sql_connection()
sql_table(con)

# Get chrome driver path
driver_path = input("Enter chrome driver path: ")

print("\nSome Categories Available on Udemy include:\nDevelopment - Python, Web Development, Javascript, Java \nDesign - Photoshop, Blender, Graphic design\n")

# Get input for course category to scrape
category = input("Enter course category: ")

# URL-encode the category so multi-word input such as "Web Development" is safe in the query string
url = 'https://www.udemy.com/courses/search/?src=ukw&q={}'.format(quote_plus(category))

# Start Chrome; Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(driver_path))
driver.get(url)

# Crude fixed wait so the JavaScript-rendered results have time to load
time.sleep(5)
html = driver.page_source

# Now apply bs4 to the rendered html
soup = BeautifulSoup(html, "html.parser")
# These hashed class names come from Udemy's front end and are likely to change over time
course_divs = soup.find_all("div", {"class": "course-card--container--3w8Zm course-card--large--1BVxY"})

# Extract information from each course card
for course_div in course_divs:
    title = course_div.find("div", {"class": "udlite-focus-visible-target udlite-heading-md course-card--course-title--2f7tE"}).text.strip()
    description = course_div.find("p", {"class": "udlite-text-sm course-card--course-headline--yIrRk"}).text.strip()
    instructor = course_div.find("div", {"class": "udlite-text-xs course-card--instructor-list--lIA4f"}).text.strip()

    # Drop the label prefix and thousands separators so SQLite can store the price as a number
    current_price = course_div.find("div", {"class": "price-text--price-part--Tu6MH course-card--discount-price--3TaBk udlite-heading-md"}).text.strip()
    current_price = current_price.replace("Current price₹", "").replace(",", "")

    original_price = course_div.find("div", {"class": "price-text--price-part--Tu6MH price-text--original-price--2e-F5 course-card--list-price--2AO6G udlite-text-sm"}).text.strip()
    original_price = original_price.replace("Original Price₹", "").replace(",", "")

    rating = course_div.find("span", {"class": "udlite-heading-sm star-rating--rating-number--3lVe8"}).text.strip()

    # Hours and lectures are the first word of the first two "row" spans on the card
    card_rows = course_div.find_all("span", {"class": "course-card--row--1OMjg"})
    hours = card_rows[0].text.strip().split()[0]
    lectures = card_rows[1].text.strip().split()[0]

    entities = (title, description, instructor, current_price, original_price, rating, hours, lectures)
    sql_insert(con, entities)

print("Saved successfully in database!")

driver.quit()  # end the browser session; close() would only close the current window
con.close()
```
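fetcher.py turns card text into values by stripping a fixed prefix ("Current price₹", "Original Price₹") or taking the first word, and it builds the search URL straight from user input. A more tolerant sketch of both steps, using only the standard library; the helper names `first_number` and `search_url` are hypothetical, and the sample strings below are assumptions about what the card text looks like, not captured Udemy output. `first_number` pulls out the first number, removes thousands separators, and returns None when the text holds no number (for example a free course).

```python
import re
from urllib.parse import quote_plus

# First number in a string, allowing thousands separators and a decimal part
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def first_number(text):
    """Return the first number in text as an int or float, or None if there is none."""
    match = NUMBER.search(text)
    if match is None:
        return None
    digits = match.group().replace(",", "")
    return float(digits) if "." in digits else int(digits)


def search_url(category):
    """Build the search URL fetcher.py requests, URL-encoding the user's category."""
    return "https://www.udemy.com/courses/search/?src=ukw&q={}".format(quote_plus(category))


print(first_number("Current price₹3,499"))  # 3499
print(first_number("12.5 total hours"))  # 12.5
print(first_number("Free"))  # None
print(search_url("Web Development"))  # https://www.udemy.com/courses/search/?src=ukw&q=Web+Development
```

Returning numbers rather than strings means the INTEGER and REAL columns receive real numeric values, and a missing value becomes SQL NULL instead of leftover text.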

Udemy Scraper/requirements.txt

Lines changed: 3 additions & 0 deletions
```
requests
beautifulsoup4
selenium
```
