
This code takes a website URL and downloads all .jpg images on the page. It only supports pages whose <img> elements have a src attribute containing a .jpg link.


import random
import urllib.request
import requests
from bs4 import BeautifulSoup

def Download_Image_from_Web(url):
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    raw_text = r'links.txt'
    with open(raw_text, 'w') as fw:
        for link in soup.findAll('img'):
            image_links = link.get('src')
            if '.jpg' in image_links:
                for i in image_links.split("\\n"):
                    fw.write(i + '\n')
    num_lines = sum(1 for line in open('links.txt'))
    if num_lines == 0:
        print("There is 0 photo in this web page.")
    elif num_lines == 1:
        print("There is", num_lines, "photo in this web page:")
    else:
        print("There are", num_lines, "photos in this web page:")
    k = 0
    while k <= (num_lines-1):
        name = random.randrange(1, 1000)
        fullName = str(name) + ".jpg"
        with open('links.txt', 'r') as f:
            lines = f.readlines()[k]
            urllib.request.urlretrieve(lines, fullName)
            print(lines + fullName + '\n')
        k += 1

Download_Image_from_Web("https://pixabay.com")
asked Apr 29, 2017 at 15:46
  • Welcome to Code Review! Please do not update the code in your question to incorporate feedback from answers; doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers. – Commented Apr 29, 2017 at 19:27

3 Answers


Unnecessary file operations

This is horribly inefficient:

k = 0
while k <= (num_lines-1):
    name = random.randrange(1, 1000)
    fullName = str(name) + ".jpg"
    with open('links.txt', 'r') as f:
        lines = f.readlines()[k]
        urllib.request.urlretrieve(lines, fullName)
        print(lines+fullName+'\n')
    k += 1

Re-reading the same file num_lines times, just to pick out the k-th line on each iteration!

By the way, do you really need to write the list of URLs to a file? Why not just keep them in a list? Even if you want the URLs in a file, you could keep them in a list in memory and only ever write that file, never read it back.
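For instance, here is a sketch of that pattern with a hypothetical list of links standing in for what the soup.findAll('img') loop would collect; the file is written once and never read again, and the download loop iterates the in-memory list (the download call is left commented out so the sketch runs offline):

```python
import random

# Hypothetical links, standing in for what the scraping loop collects
links = ["https://example.com/a.jpg", "https://example.com/b.jpg"]

# Write the file once, purely as a record -- never read it back
with open("links.txt", "w") as fw:
    for link in links:
        fw.write(link + "\n")

# Iterate the list in memory; no repeated file reads
for link in links:
    full_name = "{}.jpg".format(random.randrange(1, 1000))
    # urllib.request.urlretrieve(link, full_name)  # the actual download
    print(link, "->", full_name)
```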

Code organization

Instead of having all the code in a single function that does multiple things, it would be better to organize your program into smaller functions, each with a single responsibility.
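One possible split might look like this (the function names and structure here are illustrative suggestions, not a prescribed design; the download step is left as a stub):

```python
import requests
from bs4 import BeautifulSoup

def fetch_page(url):
    """Return the HTML of the page at url."""
    return requests.get(url).text

def extract_jpg_links(html):
    """Return the src of every <img> whose src contains '.jpg'."""
    soup = BeautifulSoup(html, "html.parser")
    return [img.get("src") for img in soup.find_all("img")
            if img.get("src") and ".jpg" in img.get("src")]

def download_images(links):
    """Download each link to a local file (stubbed out here)."""
    for link in links:
        ...

def main(url):
    links = extract_jpg_links(fetch_page(url))
    print("Found", len(links), "photos in this web page.")
    download_images(links)
```

Each function can now be tested in isolation; for example, extract_jpg_links can be exercised against a literal HTML string without any network access.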

Python conventions

Python has a well-defined set of coding conventions in PEP 8, many of which are violated here. I suggest reading through that document and following it as much as possible.

200_success
answered Apr 29, 2017 at 16:25
  • a list doesn't work for me: it puts each link in a separate list and I don't know how to merge those links into one list. – Commented Apr 29, 2017 at 17:16
  • @SalahEddine perhaps you're looking for the extend function, for example all_links.extend(links) – Commented Apr 29, 2017 at 17:18
  • Please look at it now; I have solved some of the problems. – Commented Apr 29, 2017 at 19:25
  • codereview.stackexchange.com/questions/162160/… – Commented Apr 30, 2017 at 5:24

Aside from the things others have mentioned, you can also improve how you locate the img elements whose src attribute ends with .jpg. Instead of findAll and if conditions, you can do it in one go with a CSS selector:

for img in soup.select("img[src$=jpg]"):
    print(img["src"])
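To see the selector in action without fetching a real page, here is a self-contained sketch against a small hand-written HTML snippet (the snippet itself is illustrative):

```python
from bs4 import BeautifulSoup

# Illustrative HTML standing in for a downloaded page
html = '<img src="cat.jpg"><img src="logo.png"><img src="dog.jpg">'
soup = BeautifulSoup(html, "html.parser")

# img[src$=jpg] matches <img> elements whose src attribute ends with "jpg"
jpgs = [img["src"] for img in soup.select("img[src$=jpg]")]
print(jpgs)  # ['cat.jpg', 'dog.jpg']
```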
answered Apr 30, 2017 at 11:40

How about the following?

import random
import requests
from bs4 import BeautifulSoup

# got from http://stackoverflow.com/a/16696317
def download_file(url):
    local_filename = url.split('/')[-1]
    print("Downloading {} ---> {}".format(url, local_filename))
    # NOTE the stream=True parameter
    r = requests.get(url, stream=True)
    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)
    return local_filename

def Download_Image_from_Web(url):
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    for link in soup.findAll('img'):
        image_links = link.get('src')
        if not image_links.startswith('http'):
            image_links = url + '/' + image_links
        download_file(image_links)

Download_Image_from_Web("https://pixabay.com")
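One caveat: building absolute URLs with url + '/' + image_links is fragile for root-relative paths like /static/a.jpg and protocol-relative ones like //cdn.example.com/a.jpg. The standard library's urljoin handles all of these cases; a quick sketch (the base URL and paths here are made up for illustration):

```python
from urllib.parse import urljoin

# Hypothetical page URL used as the base for resolution
base = "https://pixabay.com/photos/"

# Plain relative path: resolved against the base path
print(urljoin(base, "a.jpg"))                    # https://pixabay.com/photos/a.jpg
# Root-relative path: replaces the base path
print(urljoin(base, "/static/a.jpg"))            # https://pixabay.com/static/a.jpg
# Protocol-relative URL: keeps the scheme, swaps the host
print(urljoin(base, "//cdn.example.com/a.jpg"))  # https://cdn.example.com/a.jpg
```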
answered Apr 29, 2017 at 18:20
  • it works pretty well, I feel bad about my code lol – Commented Apr 30, 2017 at 5:43
  • I learned quite a lot from this code, thanks for sharing it – Commented Apr 30, 2017 at 5:53
