Commit 14215e5

Merge pull request avinashkranjan#422 from vybhav72954/iss_413
Added Duplicate File Finder
2 parents 91fc7bd + 0d40566 commit 14215e5

3 files changed: +201 -0 lines changed


‎Duplicate-File-Finder/README.md

Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
# Duplicate File Finder

[![forthebadge made-with-python](http://ForTheBadge.com/images/badges/made-with-python.svg)](https://www.python.org/)

Duplicate files tend to pile up in our directories, especially Documents and Downloads, for various
reasons:

- downloading the same file from several sources,
- automatic backups to the cloud,
- forgetting that we already downloaded the file in the first place, etc.

Picking them out manually is a hassle, so why do such a boring task when
automation can do the trick? This simple script compares the files in a
directory, finds the duplicates, lists them, and then offers to delete them.

**Sweet!!!**

## Setup

- Set up a `python 3.x` virtual environment.
- Activate the environment.
- Install the dependencies using ```pip3 install -r requirements.txt```
- You are all set, and the [script](file_finder.py) is ready to run.
- Carefully follow the instructions provided in the comments of the script.

### Usage

From the command line, run the script using:

`python file_finder.py <path of folder1> <path of folder2> ...`

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1. folder1 - *Parent Folder*

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 2. folder2, folder3 .... - *Subsequent Folders*

>- The Parent Folder acts as the reference for duplicate files, i.e. it holds the original copies, so a file that is duplicated only in the Subsequent Folders is never deleted from it. Duplicates inside the Parent Folder itself are still reduced to a single copy.
>- Comparisons are done within each folder, and from the Parent Folder to the Subsequent Folders.
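
The keep-and-delete rule described above can be sketched as follows. This is a minimal illustration rather than the script's own code: `plan_deletions` is a hypothetical helper, and it assumes every list of paths is ordered with the Parent Folder's files first, so the first path of each group is kept and the rest are marked for deletion.

```python
def plan_deletions(groups):
    """Split duplicate groups into paths to keep and paths to delete.

    `groups` maps a checksum to a list of paths, ordered so that paths
    from the Parent Folder come first (an assumption of this sketch).
    """
    keep, delete = [], []
    for paths in groups.values():
        keep.append(paths[0])      # first copy found is treated as the original
        delete.extend(paths[1:])   # every later copy is a duplicate
    return keep, delete


# Made-up checksums and paths, for illustration only
example = {
    "hash-a": ["Parent/a.txt", "Duplicate/a.txt", "Duplicate_1/a.txt"],
    "hash-b": ["Parent/b.txt"],
}
keep, delete = plan_deletions(example)
print("keep:  ", keep)
print("delete:", delete)
```

Printing the plan before removing anything gives a dry run that can be checked by eye.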

## Dependencies

1. python3
2. keyboard
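
In the script, `keyboard` is used only to read the [y] confirmation before deleting. If installing it is not an option, a prompt built on the standard library could serve the same purpose. The sketch below is one possible shape for that; `confirm` and its `read` parameter are hypothetical names and not part of the script.

```python
def confirm(prompt, read=input):
    """Return True only if the user answers 'y' or 'yes' (case-insensitive).

    `read` defaults to the built-in input(); it is a parameter only so the
    function can be exercised without a real terminal.
    """
    answer = read(prompt + " [y/N] ")
    return answer.strip().lower() in ("y", "yes")
```

With this in place, the deletion step could be guarded by `if confirm("Delete the duplicate files?"):`, and any answer other than `y` or `yes` would leave the files untouched.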

## Detailed explanation

The script works on a simple principle: two files with the same [`md5checksum`](https://en.wikipedia.org/wiki/MD5) will
almost certainly have identical contents. So all the script does is compute the checksum of every file, compare the checksums, and report the files that share one as duplicates.
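
The checksum-and-compare idea can be tried on a throwaway folder. The sketch below is a self-contained illustration under stated assumptions, not the script itself: `md5_of`, `group_by_checksum`, and `demo` are hypothetical names, and the three sample files are made up.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def md5_of(path, blocksize=65536):
    """Return the hex MD5 digest of a file, read in fixed-size blocks."""
    hasher = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(blocksize), b""):
            hasher.update(block)
    return hasher.hexdigest()


def group_by_checksum(folder):
    """Map each checksum to the sorted list of file paths that share it."""
    groups = defaultdict(list)
    for dir_name, _subdirs, file_names in os.walk(folder):
        for name in file_names:
            path = os.path.join(dir_name, name)
            groups[md5_of(path)].append(path)
    return {digest: sorted(paths) for digest, paths in groups.items()}


def demo():
    """Create three small files, two of them identical, and list duplicates."""
    samples = [("a.txt", b"hello"), ("b.txt", b"hello"), ("c.txt", b"world")]
    with tempfile.TemporaryDirectory() as folder:
        for name, data in samples:
            with open(os.path.join(folder, name), "wb") as handle:
                handle.write(data)
        groups = group_by_checksum(folder)
        # Keep only the checksums shared by more than one file
        return [[os.path.basename(p) for p in paths]
                for paths in groups.values() if len(paths) > 1]


if __name__ == "__main__":
    print(demo())
```

Reading in blocks keeps memory use flat for large files, which is the same reason the script reads 65536 bytes at a time.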

## Output

- Running the script on a single folder, `Stand_Alone`. In this example I pressed [n] so that nothing was deleted.

![Script output for the Stand_Alone folder](https://i.imgur.com/pcABYx4.png)

- `Stand_Alone` folder before deleting the files.

![Stand_Alone folder before deletion](https://i.imgur.com/PqwxrPQ.png)

- After deleting the files, i.e. pressing [y] at the prompt.

![Stand_Alone folder after deletion](https://i.imgur.com/34fR6w3.png)

- `Parent`, `Duplicate`, and `Duplicate_1` folders before running the script.

![Parent, Duplicate and Duplicate_1 folders before the run](https://i.imgur.com/XcJ0we3.png)

- Running the script on the folders and deleting the duplicate files.

![Script output for the three folders](https://i.imgur.com/ZaEcroF.png)

![Script output for the three folders, continued](https://i.imgur.com/tfo2day.png)

- Final result. Notice that all the files in the `Parent` folder remain as they are.

>Also notice that similar files with different extensions are not deleted, because technically they aren't the same: their contents, and hence their checksums, differ.

![Folders after the duplicates were deleted](https://i.imgur.com/d8VXy5m.png)
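
One consequence is worth spelling out: the checksum is computed from a file's bytes only, so the file name and extension play no part in it. The sketch below uses made-up names and contents to illustrate this. A copy that is merely renamed would still count as a duplicate, while a file re-saved in another format would not.

```python
import hashlib


def digest(data):
    """Hex MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


# Hypothetical file names mapped to made-up contents
files = {
    "photo.jpg": b"same bytes",
    "photo_copy.png": b"same bytes",       # renamed copy: identical bytes
    "photo_resaved.png": b"other bytes",   # re-saved in another format: different bytes
}

digests = {name: digest(data) for name, data in files.items()}
print(digests["photo.jpg"] == digests["photo_copy.png"])     # True
print(digests["photo.jpg"] == digests["photo_resaved.png"])  # False
```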

## Author(s)

Made by [Vybhav Chaturvedi](https://www.linkedin.com/in/vybhav-chaturvedi-0ba82614a/)

‎Duplicate-File-Finder/file_finder.py

Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
# Imports
import hashlib
import os
import sys
import keyboard


def image_finder(parent_folder):
    # A dictionary to store the hash of each file against the matching paths
    """
    Sample -
    {hash:[names]}
    """
    duplicate_img = {}
    for dirName, subdirs, fileList in os.walk(parent_folder):
        # Iterating over various Sub-Folders
        print('Scanning %s...' % dirName)
        for filename in fileList:
            # Get the path to the file
            path = os.path.join(dirName, filename)
            # Calculate hash
            file_hash = hash_file(path)
            # Add or append the file path in the dictionary
            if file_hash in duplicate_img:
                duplicate_img[file_hash].append(path)
            else:
                duplicate_img[file_hash] = [path]
    return duplicate_img


def delete_duplicate(duplicate_img):
    # For every hash, keep the first path and delete all later copies
    for key in duplicate_img:
        file_list = duplicate_img[key]
        while len(file_list) > 1:
            item = file_list.pop()
            os.remove(item)


# Joins two dictionaries (dict1 is updated in place)
def join_dicts(dict1, dict2):
    for key in dict2.keys():
        if key in dict1:
            dict1[key] = dict1[key] + dict2[key]
        else:
            dict1[key] = dict2[key]


# For finding Hash of various Files
# If 2 files have the same md5checksum, they most likely have the same content
def hash_file(path, blocksize=65536):
    hasher = hashlib.md5()
    # Read the file in blocks so that large files are not loaded at once
    with open(path, 'rb') as img_file:
        buf = img_file.read(blocksize)
        while len(buf) > 0:
            hasher.update(buf)
            buf = img_file.read(blocksize)
    # Return Hex MD5
    return hasher.hexdigest()


def print_results(dict1):
    # Only hashes shared by more than one file are duplicates
    results = list(filter(lambda x: len(x) > 1, dict1.values()))
    if len(results) > 0:
        print('Found Duplicated Images - ')
        print('Details -')
        print('<--------------------->')
        for result in results:
            # Print Path of Files
            for subresult in result:
                print('\t%s' % subresult)
            print('<--------------------->')
    else:
        print('Unable to identify Similar Images')


if __name__ == '__main__':
    if len(sys.argv) > 1:
        duplicate = {}
        folders = sys.argv[1:]
        for i in folders:
            # Iterate the folders given
            if os.path.exists(i):
                # Find the duplicated files and append them to the dictionary
                join_dicts(duplicate, image_finder(i))
            else:
                print('%s is not a valid path, please verify' % i)
                sys.exit()
        print_results(duplicate)
        # Delete Duplicate Images
        # Comment if not required
        print("Do you want to delete the Duplicate Images (If Any)? Press [y] for Yes.")
        # A single key press decides: [y] deletes, any other key keeps everything
        if keyboard.read_key() == "y":
            print("Deleting Duplicate Files\n")
            delete_duplicate(duplicate)
            print("Thank You\n")
        else:
            print("Nothing Deleted!!! Thank You\n")
    else:
        print("Use Command Line Interface")
        print("Hint: python file_finder.py <path of folders>")
        print("Please Read comments for greater detailing")
'''
Suggestions :------
Usage - python file_finder.py <path of folder1> <path of folder2> ...
folder1 - Parent Folder
folder2, folder3 .... - Subsequent Folders
Comparisons are done within each folder, and from Parent to Subsequent Folders.

No files are deleted from the Parent Folder, but files in the Subsequent Folders that duplicate them are
deleted. Make sure that the paths are correct.

Be careful during Keyboard Input.
'''

‎Duplicate-File-Finder/requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
keyboard==0.13.5
