I've gone through all the similar questions and installed a few command-line utilities such as duff, but with no success.
In my Images directory, I have images with the same name but different extensions (jpg and png). I just want to print the names of files that share a base name but differ in extension (e.g. foo.jpg and foo.png), or at least the shared base name (foo).
So far I have tried these commands:
find . -exec bash -c 'basename "$0" ".${0##*.}"' {} \; | sort | uniq
find . -type f \( -name "*.jpg" -o -name "*.png" \)
Most of these commands return either nothing, all the files, or the unique file names, but not the duplicated ones.
3 Answers
I would suggest a modification of your second command:
find . -type f -name "*.jpg" | \
while read -r f; do [ -e "${f%.jpg}.png" ] && echo "${f%.jpg}"; done
This finds all .jpg files, checks whether the corresponding .png file exists, and prints the path (relative to the starting directory) without the extension.
Note that if there are a lot fewer .png files it will be more efficient to search for these and check for the corresponding .jpg files.
[Tested with bash on Ubuntu 18.04.1.]
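For readers on a system where the shell loop misbehaves, the same check can be sketched in Python. This is only an illustration of the idea above, not part of the tested answer: the function name stems_with_both and its parameters are made up for this example, and the match on the extension is case-sensitive on Linux, just like -name "*.jpg".

```python
from pathlib import Path


def stems_with_both(top, first=".jpg", second=".png"):
    """Return paths (without extension) that exist with both extensions.

    Hypothetical helper: walks `top` recursively, looks at every file
    ending in `first`, and keeps it if a sibling ending in `second` exists.
    """
    matches = []
    for path in Path(top).rglob("*" + first):
        # Same test as [ -e "${f%.jpg}.png" ] in the shell loop
        if path.is_file() and path.with_suffix(second).exists():
            # Drop the extension, like "${f%.jpg}"
            matches.append(str(path.with_suffix("")))
    return sorted(matches)
```

Calling print("\n".join(stems_with_both("Images"))) would list one line per matching pair. As with the shell version, swapping the two extensions may be faster if there are far fewer .png files.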
find: illegal option -- t
– Naveed Abbas, Oct 11, 2018 at 12:11
@ToughGuy - I didn't use any -t option, but on Linux the . in find . is implicit if no directory is given, and maybe it isn't on OSX. I have added the . and quoted the file name in the echo in case you have some odd file names which could appear as options.
– AFH, Oct 11, 2018 at 12:21
Yes, find on Mac requires . or any path; POSIX requires it. Here on Super User I advise never omitting . for the sake of portability.
– Kamil Maciorowski, Oct 11, 2018 at 16:24
@KamilMaciorowski - Thanks: there are things I have just got used to doing. Without a Mac, I can't check every option of every command in case it's not compatible.
– AFH, Oct 11, 2018 at 16:30
I accept and appreciate the answer. Meanwhile, I got this Python script that came close to what I was looking for. I tried to find the source but couldn't locate it among the hundreds of tabs I searched.
#!/usr/bin/env python
# Syntax: duplicates.py DIRECTORY
import os, sys
top = sys.argv[1]
d = {}
for root, dirs, files in os.walk(top, topdown=False):
    for name in files:
        fn = os.path.join(root, name)
        basename, extension = os.path.splitext(name)
        basename = basename.lower() # ignore case
        if basename in d:
            print(d[basename])
            print(fn)
        else:
            d[basename] = fn
Save this file as duplicates.py, make it executable (chmod +x duplicates.py), and then run it on the folder:
./duplicates.py Images
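One quirk of the script above is that when three or more files share a base name, the first path is printed again for every later match. A possible variant, sketched here under the assumption that one group per base name is what is wanted, collects all paths first and reports each group once. The name group_by_basename is invented for this example; like the original, it ignores case and matches names across subdirectories.

```python
import os
from collections import defaultdict


def group_by_basename(top):
    """Map each lower-cased base name to every path that shares it.

    Hypothetical helper: only base names used by two or more files
    are returned.
    """
    groups = defaultdict(list)
    for root, _dirs, files in os.walk(top):
        for name in files:
            base, _ext = os.path.splitext(name)
            groups[base.lower()].append(os.path.join(root, name))
    # Keep only the base names that occur more than once
    return {base: sorted(paths) for base, paths in groups.items() if len(paths) > 1}
```

Printing the keys of the returned dictionary gives just the duplicated names (foo), while the values give the full paths, which can be redirected to a text file.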
Matching on name and alternate suffixes is useful but it doesn't guarantee the files are actual duplicates. You would need to include a comparison of file sizes and, if they match, actually compare the file contents.
– Hogstrom, Oct 12, 2018 at 13:00
@Hogstrom Good idea, but my requirement was different. There are plenty of duplicate-finder utilities, but none was promising, as I needed the output in a text file.
– Naveed Abbas, Jan 23, 2019 at 11:02
You have to use uniq -c to get the counts, then reverse-sort to list the duplicates first. Finally, awk keeps only the lines whose count is 2 or higher.
$ find . -type f -exec sh -c 'basename "${0%.*}"' {} \; | sort | uniq -c | sort -r | awk 'int($1)>=2'
2 foo
Here, %.* strips the extension, so foo.x.y becomes foo.x.
Instead of a simple find . -type f, which would find all files, you could also filter for *.jpg or *.png files like in your second command.
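The same counting idea, restricted to the two extensions, can be sketched in Python with collections.Counter. This is an illustrative equivalent rather than a replacement for the pipeline: the function name duplicate_stems and the extensions parameter are assumptions made for the example. os.path.splitext removes only the last extension, so foo.x.y becomes foo.x here as well.

```python
import os
from collections import Counter


def duplicate_stems(top, extensions=(".jpg", ".png")):
    """Count base names among files with the given extensions.

    Hypothetical helper: returns only the names counted two or more
    times, which is what the final awk filter does in the pipeline.
    """
    counts = Counter()
    for _root, _dirs, files in os.walk(top):
        for name in files:
            stem, ext = os.path.splitext(name)
            if ext.lower() in extensions:
                counts[stem] += 1
    return {stem: n for stem, n in counts.items() if n >= 2}
```

Like the pipeline, this counts base names across all subdirectories, so two files named foo in different folders are counted together.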
Certainly it gave me a long list of all the files with the count of duplicates. Can I get only the duplicates? (e.g. foo, which is used twice or thrice)
– Naveed Abbas, Oct 11, 2018 at 12:00