This code takes a website URL and downloads all the .jpg images on the page. It only supports pages that contain `<img>` elements whose `src` attribute contains a .jpg link.
import random
import urllib.request
import requests
from bs4 import BeautifulSoup


def Download_Image_from_Web(url):
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    raw_text = r'links.txt'
    with open(raw_text, 'w') as fw:
        for link in soup.findAll('img'):
            image_links = link.get('src')
            if '.jpg' in image_links:
                for i in image_links.split("\\n"):
                    fw.write(i + '\n')
    num_lines = sum(1 for line in open('links.txt'))
    if num_lines == 0:
        print("There is 0 photo in this web page.")
    elif num_lines == 1:
        print("There is", num_lines, "photo in this web page:")
    else:
        print("There are", num_lines, "photos in this web page:")
    k = 0
    while k <= (num_lines-1):
        name = random.randrange(1, 1000)
        fullName = str(name) + ".jpg"
        with open('links.txt', 'r') as f:
            lines = f.readlines()[k]
            urllib.request.urlretrieve(lines, fullName)
            print(lines+fullName+'\n')
        k += 1


Download_Image_from_Web("https://pixabay.com")
- Welcome to Code Review! Please do not update the code in your question to incorporate feedback from answers; doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers. – Simon Forsberg, Apr 29, 2017 at 19:27
3 Answers
Unnecessary file operations
This is horribly inefficient:
k = 0
while k <= (num_lines-1):
    name = random.randrange(1, 1000)
    fullName = str(name) + ".jpg"
    with open('links.txt', 'r') as f:
        lines = f.readlines()[k]
        urllib.request.urlretrieve(lines, fullName)
        print(lines+fullName+'\n')
    k += 1
It re-reads the same file `num_lines` times, just to pick out the k-th line on each iteration!
Btw, do you really need to write the list of urls to a file? Why not just keep them in a list? Even if you want the urls in a file, you could keep them in a list in memory and never read that file, only write.
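A minimal sketch of that idea (the variable names are illustrative, not from the question): read the file once up front, or skip the file entirely and keep the list from the scraping step.

import urllib.request

# read all the links once instead of re-opening the file on every iteration
with open('links.txt') as f:
    links = [line.strip() for line in f]

for k, link in enumerate(links):
    full_name = str(k) + ".jpg"
    urllib.request.urlretrieve(link, full_name)
    print(link, '--->', full_name)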
Code organization
Instead of having all the code in a single function that does multiple things, it would be better to organize your program into smaller functions, each with a single responsibility.
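As a sketch of what that split could look like (the function names here are my own suggestions, not from the original code):

import urllib.request
import requests
from bs4 import BeautifulSoup

def fetch_page(url):
    """Fetch the HTML of the page at url."""
    return requests.get(url).text

def extract_jpg_links(html):
    """Return the src of every img element that links to a .jpg."""
    soup = BeautifulSoup(html, "html.parser")
    return [img["src"] for img in soup.find_all("img", src=True)
            if ".jpg" in img["src"]]

def download_image(link, filename):
    """Download a single image to the given file."""
    urllib.request.urlretrieve(link, filename)

def download_images_from_web(url):
    """Orchestrate the steps: fetch, extract, download."""
    for k, link in enumerate(extract_jpg_links(fetch_page(url))):
        download_image(link, "{}.jpg".format(k))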
Python conventions
Python has a well-defined set of coding conventions in PEP8, many of which are violated here. I suggest reading through that document and following it as much as possible.
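For instance, PEP8 calls for snake_case function names (`download_image_from_web` rather than `Download_Image_from_Web`), snake_case variables (`full_name` rather than `fullName`), 4-space indentation, and two blank lines between top-level definitions.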
- A list doesn't work for me; it separates each link into its own list and I don't know how to merge those links into one list. – Salah Eddine, Apr 29, 2017 at 17:16
- @SalahEddine perhaps you're looking for the `extend` function, for example `all_links.extend(links)`. – janos, Apr 29, 2017 at 17:18
- Please look at it now, I have solved some problems. – Salah Eddine, Apr 29, 2017 at 19:25
- codereview.stackexchange.com/questions/162160/… – Salah Eddine, Apr 30, 2017 at 5:24
Aside from the things others mentioned, you can also improve the way you locate the `img` elements whose `src` attribute ends with `.jpg`. Instead of using `findAll` and `if` conditions, you can do it in one go with a CSS selector:
for img in soup.select("img[src$=jpg]"):
    print(img["src"])
How about the following?
import random
import requests
from bs4 import BeautifulSoup


# got from http://stackoverflow.com/a/16696317
def download_file(url):
    local_filename = url.split('/')[-1]
    print("Downloading {} ---> {}".format(url, local_filename))
    # NOTE the stream=True parameter
    r = requests.get(url, stream=True)
    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
    return local_filename


def Download_Image_from_Web(url):
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    for link in soup.findAll('img'):
        image_links = link.get('src')
        if not image_links.startswith('http'):
            image_links = url + '/' + image_links
        download_file(image_links)


Download_Image_from_Web("https://pixabay.com")
- It works pretty well, I feel bad about my code lol. – Salah Eddine, Apr 30, 2017 at 5:43
- I learned quite a lot from this code, thanks for sharing it. – Salah Eddine, Apr 30, 2017 at 5:53