Get most recent downloaded file using python-Selenium

Question 1

My Python Selenium script downloads a particular file, but the download takes a long time to start and to complete. It starts 25 to 30 seconds after the click and finishes after 50 to 60 seconds, and the time varies. I want to print the name of the most recent file once the download has completed and then quit the browser.

I am using glob.glob, but max() raises an error about an empty sequence for the code below, because it runs before the download has completed.

import glob
import os

mypath = "C:/Users/Desktop/*.xlsx"
ReportFile = max(glob.glob(mypath), key=os.path.getmtime)
print(ReportFile)
driver.quit()

OR

list_of_files = glob.glob('C:/Users/Desktop/*.xlsx')
latest_file = max(list_of_files, key=os.path.getmtime)
print(latest_file)
driver.quit()

Currently I am using the code below:

time.sleep(100)
driver.quit()

But this is not acceptable, because the download time fluctuates and the script can still fail. I also tried a lambda function, but that did not work either.
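For reference, the empty-sequence error described above can be reproduced without a browser. This is only a sketch: an empty temporary folder stands in for the download folder before the file has arrived, and the `default=None` argument to max() is shown as one way to avoid the exception, not as a way to detect that the download has finished.

```python
import glob
import os
import tempfile

# An empty temporary folder stands in for the download folder (assumption).
with tempfile.TemporaryDirectory() as folder:
    pattern = os.path.join(folder, "*.xlsx")
    matches = glob.glob(pattern)  # empty list: nothing has been downloaded yet

    try:
        max(matches, key=os.path.getmtime)
        raised = False
    except ValueError:
        # max() raises ValueError when given an empty sequence
        raised = True

    # With default=, max() returns the fallback value instead of raising
    latest = max(matches, key=os.path.getmtime, default=None)

print(raised, latest)  # prints: True None
```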

Answer

You can do this only if at least one of the following holds:

1. You know the unique file name of the new download.

   Try to open the file, or search for that specific file, and keep retrying until the file is found and no exception is thrown (catch the exception and retry until there is no error).

2. You know a unique pattern for the new download, e.g. file_name+timestamp.xlsx.

   Same as before, but take the timestamp part of the newest (max) file in the list and check that it is greater than a timestamp taken just before the download started.

3. No other files will be downloaded in the meantime.

   This assumes that you know the initial files and that only one file will be downloaded after you click download.

#!/usr/bin/python
# -*- coding: utf-8 -*-
import glob
import time

# Get the file list before the download starts
list_of_files_before_download = glob.glob('C:/Users/Desktop/*.xlsx')
# Record the loop start time (current time)
start = time.time()
# Elapsed time in seconds, initially zero
elapsed = 0
newfile = []
# Write your code to click download here
# Then loop for up to two minutes, checking whether a new file has been created
while elapsed < 120:
    # Update the elapsed time so the loop stops after 120 seconds
    elapsed = time.time() - start
    # Get the current file list
    list_of_files_after_download = glob.glob('C:/Users/Desktop/*.xlsx')
    newfile = list(set(list_of_files_after_download).difference(list_of_files_before_download))
    # If a new file has been created, break out of the loop
    if len(newfile):
        break
    # Pause briefly between checks instead of polling continuously
    time.sleep(1)
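The loop above covers the third case. The first two cases could be sketched roughly as below. The function names, the timeout and the polling interval are my own choices for illustration. Note that the second function compares the file's modification time with the start time rather than parsing a timestamp out of the file name, because the name format is not known here; it also assumes the browser stamps the file with the time of the download.

```python
import glob
import os
import time


def wait_for_named_file(path, timeout=120, poll_interval=1.0):
    """Case 1: wait until a file with a known name exists and can be opened.

    Returns the path, or None if the timeout expires first.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            # open() raises OSError while the file is missing or not readable
            with open(path, "rb"):
                return path
        except OSError:
            time.sleep(poll_interval)
    return None


def wait_for_file_newer_than(pattern, started_at, timeout=120, poll_interval=1.0):
    """Case 2: wait for a file matching the pattern, modified at or after started_at.

    Returns the newest such path, or None if the timeout expires first.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        candidates = [
            f for f in glob.glob(pattern)
            if os.path.getmtime(f) >= started_at
        ]
        if candidates:
            return max(candidates, key=os.path.getmtime)
        time.sleep(poll_interval)
    return None
```

For the second function you would take `started_at = time.time()` just before clicking download and then call, for example, `wait_for_file_newer_than('C:/Users/Desktop/*.xlsx', started_at)`.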

Comment 1

newfile is a list containing the latest file.
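To print just the file name from that list, something like the following should work. The path used here is a made-up example of what the loop might return, not a real result.

```python
import os

# Hypothetical result of the polling loop: a list with one new path.
newfile = ["C:/Users/Desktop/report.xlsx"]

if newfile:
    latest_path = newfile[0]                     # the only new file in the list
    latest_name = os.path.basename(latest_path)  # strip the folder part
    print(latest_name)                           # prints: report.xlsx
```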

Comment 2

Yes, it works fine, but can't we remove while elapsed < 120: and use some other logic, since the download time varies and the exact time at which the file finishes downloading is not known?

Comment 3

@Amaze_Rock The 120 is there to ensure the loop cannot run forever. You can set it to any duration you want; if you set it to 300, the loop runs for up to 5 minutes. This ensures that the code won't get stuck if the file download fails.

Comment 4

The code won't necessarily run for 2 minutes. It runs for at most 2 minutes, and if the list becomes non-empty before that, the loop exits early.
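To make the "at most" behaviour explicit, the same loop could be wrapped in a function with the limit as a parameter. This is a sketch only: the function name and parameters are hypothetical, and it uses time.monotonic() for the deadline instead of time.time().

```python
import glob
import time


def wait_for_new_file(pattern, files_before, timeout=120, poll_interval=1.0):
    """Return the set of new paths matching the pattern, or an empty set on timeout."""
    known = set(files_before)
    deadline = time.monotonic() + timeout
    while True:
        new_files = set(glob.glob(pattern)) - known
        if new_files:
            # A new file appeared: return immediately, well before the deadline
            return new_files
        if time.monotonic() >= deadline:
            # The upper bound was reached without a new file, so give up
            return set()
        time.sleep(poll_interval)
```

The timeout is only an upper bound: a download that finishes after 50 seconds makes the function return after about 50 seconds, whether the limit is 120 or 300.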

Comment 5

Yes, I got it, thanks! One more query: when I remove the break and just put driver.quit() there, it works fine, but it does not work with the break.
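It is hard to tell from the comment alone why break fails in that script. One arrangement that keeps the break is to put driver.quit() after the loop, since break only leaves the loop and execution continues with whatever follows it. In the sketch below, FakeDriver is a hypothetical stand-in for the Selenium driver so the example runs on its own, and check_for_new_files stands for the glob comparison from the answer.

```python
import time


class FakeDriver:
    """Hypothetical stand-in for the Selenium driver, used only for this sketch."""

    def __init__(self):
        self.closed = False

    def quit(self):
        self.closed = True


def poll_then_quit(driver, check_for_new_files, timeout=120, poll_interval=1.0):
    start = time.time()
    newfile = []
    while time.time() - start < timeout:
        newfile = check_for_new_files()
        if newfile:
            break  # leaves only the loop; execution continues below
        time.sleep(poll_interval)
    # Runs whether the loop ended through break or through the timeout
    driver.quit()
    return newfile
```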

Answered by PDHide (accepted answer) · 2020-06-29 07:54:58Z

