My Python Selenium script downloads a particular file, but the download takes a long time to start and to finish. It starts 25-30 seconds after the click and completes after 50-60 seconds, and the time varies. I want to print the name of the most recent file once the download has completed and quit the browser afterwards.
I am using glob.glob, but max() fails with an empty-sequence error for the code below, because it executes before the download has completed.
mypath = "C:/Users/Desktop/*.xlsx"
ReportFile = max(glob.glob(mypath), key=os.path.getmtime)
print(ReportFile)
driver.quit()
OR
list_of_files = glob.glob('C:/Users/Desktop/*.xlsx')
latest_file = max(list_of_files, key=os.path.getmtime)
print (latest_file)
driver.quit()
Currently I am using the code below.
time.sleep(100)
driver.quit()
But this is not acceptable: the download time fluctuates, so the script can still fail. I also tried a lambda function, but that did not work either.
1 Answer
You can do this only if you have one of the following pieces of information:
- You know the unique file name of the new download:
Keep trying to open the file, or keep searching for that specific file, until it is found and no exception is thrown (catch the exceptions and retry until there is no error).
- You know a unique pattern for the new download, e.g. file_name+timestamp.xlsx:
Same as before, but take the timestamp part from the max file in the list and check that it is greater than a timestamp taken just before the download started.
- No other files will be downloaded in between:
This assumes that you know the initial files and that only one file will be downloaded after you click download:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import glob
import time

# get the file list before the download
list_of_files_before_download = glob.glob('C:/Users/Desktop/*.xlsx')
# get the loop start time (current time)
start = time.time()
# elapsed time in seconds, set to zero initially
elapsed = 0
newfile = []
# write your code to click download here
# then loop for up to two minutes, checking whether a new file has been created
while elapsed < 120:
    # get the new file list
    list_of_files_after_download = glob.glob('C:/Users/Desktop/*.xlsx')
    newfile = list(set(list_of_files_after_download).difference(list_of_files_before_download))
    # if a new file has been created, break out of the loop
    if len(newfile):
        break
    # wait a little before checking again
    time.sleep(1)
    # update the elapsed time so the loop stops after 120 seconds
    elapsed = time.time() - start
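The same polling idea can be wrapped in a reusable function. This is only a sketch: the name wait_for_new_file and its timeout and poll parameters are made up for illustration, and it still assumes that nothing else writes a matching file to the folder while you wait.

```python
import glob
import os
import time


def wait_for_new_file(pattern, before, timeout=120, poll=1.0):
    """Return the path of a file matching `pattern` that is not in `before`,
    or None if no new file appears within `timeout` seconds.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    known = set(before)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        new_files = set(glob.glob(pattern)) - known
        if new_files:
            # If several new files appeared, return the most recently modified.
            return max(new_files, key=os.path.getmtime)
        # Sleep between checks instead of spinning the CPU.
        time.sleep(poll)
    return None
```

Take the glob.glob snapshot before clicking download, then call wait_for_new_file('C:/Users/Desktop/*.xlsx', snapshot) after the click, print the result and call driver.quit(). A None result means the timeout expired. Some browsers download to a temporary name first (Chrome uses a .crdownload suffix) and rename the file when it is finished, so a matching .xlsx usually means the download is complete, but check this for your browser.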
-
newfile is a list containing the latest file. – PDHide, Jun 29, 2020 at 8:11
-
Yes, it's working fine, but can't we remove while elapsed < 120: and use some other logic, since the download time varies and the exact time at which the file finishes downloading is not known? – ADS KUL, Jun 29, 2020 at 9:15
-
@Amaze_Rock The 120 is there to ensure the loop cannot run forever. You can set it to any time you want; if you set it to 300, the loop runs for up to 5 minutes. This ensures that the code won't get stuck if the file download fails. – PDHide, Jun 29, 2020 at 9:19
-
The code won't necessarily run for 2 minutes; it runs for at most 2 minutes, and if the list becomes non-empty before that, the loop exits earlier. – PDHide, Jun 29, 2020 at 9:21
-
Yes, I got it, thanks! One more query: when I remove the break and just put driver.quit() there, it works fine, but not with the break. – ADS KUL, Jun 29, 2020 at 9:21