I have two files which look identical to me (including trailing whitespace and newlines) but diff still says they differ. Even when I do a diff -y side-by-side comparison, the lines look exactly the same. The output from diff is the whole of both files.
Any idea what's causing it?
7 Answers
Try:
diff file1 file2 | cat -t
The -t option will cause cat to show any special characters clearly, e.g. ^M for CR, ^I for tab.
From the man page (OS X):
-t  Display non-printing characters (see the -v option), and display tab characters as `^I'.
-v  Display non-printing characters so they are visible. Control characters print as `^X' for control-X; the delete character (octal 0177) prints as `^?'. Non-ASCII characters (with the high bit set) are printed as `M-' (for meta) followed by the character for the low 7 bits.
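To see the effect on a small example, here is a minimal sketch. The file names unix.txt and dos.txt are made up for the illustration, and printf is used to create one file with LF endings and one with CRLF endings in a scratch directory.

```shell
# Work in a scratch directory so no real files are touched.
tmp=$(mktemp -d)
printf 'hello\tworld\n' > "$tmp/unix.txt"     # LF line ending
printf 'hello\tworld\r\n' > "$tmp/dos.txt"    # CRLF line ending

# Printed normally, the two lines look the same on most terminals.
cat "$tmp/unix.txt" "$tmp/dos.txt"

# cat -t makes the tab (^I) and the carriage return (^M) visible.
visible=$(cat -t "$tmp/dos.txt")
printf '%s\n' "$visible"
```

The second command should print the line with ^I in place of the tab and a trailing ^M, which is the kind of marker to look for in the diff output.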
Odd... can you try cmp? You may want to use the -b option too.
cmp man page - Compare two files byte by byte.
This is one of the nice things about Unix/Linux .. so many tools :)
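A small sketch of what cmp reports for a CRLF/LF mismatch, assuming GNU cmp (the -b option may be missing or print differently in other implementations). The file names are hypothetical and the files are created in a scratch directory.

```shell
tmp=$(mktemp -d)
printf 'first line\nsecond line\n' > "$tmp/unix.txt"
printf 'first line\r\nsecond line\r\n' > "$tmp/dos.txt"

# cmp stops at the first differing byte. With -b it also prints the two
# bytes in octal plus a printable form, e.g. 15 ^M (carriage return)
# against 12 ^J (line feed).
status=0
cmp -b "$tmp/dos.txt" "$tmp/unix.txt" || status=$?

# Exit status 0 means identical, 1 means the files differ.
echo "cmp exit status: $status"
```

Here the first difference is at byte 11, where the DOS file has a carriage return and the UNIX file already has its line feed.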
- Thanks for that! I got: byte 19, line 1 is 15 ^M 12 ^J. What does it mean? – InsightClip, Aug 17, 2012 at 13:22
- Looks like carriage return and linefeed according to this table. – Levon, Aug 17, 2012 at 13:25
- Tried -b with diff and it seems to be working for me. The man page says -b is for "ignore changes in the amount of white space". – rahul.deshmukhpatil, Aug 26, 2016 at 6:22
- cmp is not always very efficient: in my case, it only provides a useless "/tmp/w /tmp/z differ: byte 48, line 1 is 12 ^J 12 ^J", while cat -t clearly states that the difference is that lines of the first file end with ^M. – Skippy le Grand Gourou, Apr 7, 2023 at 8:43
Might the differences be caused by DOS vs. UNIX line endings, or something similar?
What if you hexdump them? This might show differences more obviously, e.g.:
hexdump -C file1 > file1.hex
hexdump -C file2 > file2.hex
diff file1.hex file2.hex
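If hexdump is not installed, od is a common substitute. The following is only a sketch using od rather than the hexdump command above, with hypothetical file names, to show how the extra 0d byte stands out.

```shell
tmp=$(mktemp -d)
printf 'abc\n' > "$tmp/unix.txt"
printf 'abc\r\n' > "$tmp/dos.txt"

# od -An -tx1 prints each byte as two hex digits without the offset
# column; tr then removes the spaces and newlines between them.
unix_hex=$(od -An -tx1 "$tmp/unix.txt" | tr -d ' \n')
dos_hex=$(od -An -tx1 "$tmp/dos.txt" | tr -d ' \n')

echo "unix: $unix_hex"   # ends in 0a
echo "dos:  $dos_hex"    # ends in 0d0a
```

A 0d 0a pair at the end of each line is CRLF; a lone 0a is LF.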
- Well, the two hexes are different. Every time there's a 0d 0a in one file, the other one just has 0a. – InsightClip, Aug 17, 2012 at 13:29
- In one, you have DOS line endings (CRLF) and in the other, UNIX line endings (LF). That's why they look different to diff but not when you look at them visually. Look at en.wikipedia.org/wiki/Newline#Conversion_utilities – mrb, Aug 17, 2012 at 13:32
- Got it! Thanks a lot. Levon's suggestion of using cmp shows the difference more clearly though :) – InsightClip, Aug 17, 2012 at 13:39
My first guess, which turns out to be confirmed, is that the files use different line endings. It could be some other difference in whitespace, such as the presence of trailing whitespace (but you typically wouldn't get that on many lines) or different indentation (tabs vs spaces). Use a command that prints out whitespace and control characters in a visible form, such as
diff <(cat -A file1) <(cat -A file2)
diff <(sed -n l file1) <(sed -n l file2)
You can confirm that the differences only have to do with line endings by normalizing them first. You may have a dos2unix utility; if not, remove the extra CR (^M, \r, \015) character explicitly:
diff <(tr -d '\r' <file1) <(tr -d '\r' <file2)
or, if file1 is the one with DOS endings:
tr -d '\r' <file1 | diff - file2
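As a quick check of the normalization approach, here is a minimal sketch that builds two throwaway files (their contents are invented for the example) and compares them before and after deleting the CR characters. It uses the pipe form so that it also works in shells without process substitution.

```shell
tmp=$(mktemp -d)
printf 'one\r\ntwo\r\n' > "$tmp/file1"   # DOS endings
printf 'one\ntwo\n' > "$tmp/file2"       # UNIX endings

# Raw comparison: diff exits with 1 because every line differs.
raw=0
diff "$tmp/file1" "$tmp/file2" > /dev/null || raw=$?

# Comparison after deleting CR from file1: diff exits with 0.
normalized=0
tr -d '\r' < "$tmp/file1" | diff - "$tmp/file2" > /dev/null || normalized=$?

echo "raw diff status: $raw, normalized diff status: $normalized"
```

If the normalized status is 0 while the raw status is 1, line endings were the only difference.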
Other answers are complete enough, but they all provide ways of making these otherwise invisible differences visible. However, there's another option: ignoring these differences when they are unimportant. In some cases, it's not useful to be informed about them.
The diff command has some useful options for this:
--strip-trailing-cr
strip trailing carriage return on input
-B, --ignore-blank-lines
ignore changes where lines are all blank
-Z, --ignore-trailing-space
ignore white space at line end
Personally, I found --strip-trailing-cr useful, especially when using the -r (i.e. --recursive) option on large projects, or when Git's core.autocrlf is not false (i.e. is either true or input).
For more information on these options and more, see its man page (or run man diff).
Note: Using these options affects performance, especially with huge files or directories. In one of my own cases, the run time increased from 0.321s to 0.422s.
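A short sketch of --strip-trailing-cr in action, assuming GNU diff (other diff implementations may not accept this long option). The two test files are hypothetical and differ only in their line endings.

```shell
tmp=$(mktemp -d)
printf 'alpha\r\nbeta\r\n' > "$tmp/dos.txt"
printf 'alpha\nbeta\n' > "$tmp/unix.txt"

# Without the option, diff reports a difference (exit status 1).
plain=0
diff "$tmp/dos.txt" "$tmp/unix.txt" > /dev/null || plain=$?

# With the option, the trailing CRs are ignored (exit status 0).
stripped=0
diff --strip-trailing-cr "$tmp/dos.txt" "$tmp/unix.txt" > /dev/null || stripped=$?

echo "plain: $plain, with --strip-trailing-cr: $stripped"
```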
For anyone on Windows you can do this with fc. It can use binary compare.
fc /B file1 file2
- Hi Johan, and welcome to the UNIX & Linux Stack Exchange! Our target systems here are UNIX/Linux systems; you might find Windows-centric answers more on-topic at SuperUser or Server Fault. Thank you! Commented Nov 18, 2020 at 17:01
In side-by-side view, add --suppress-common-lines to the options.
All other answers and comments here are good to know, but not sufficient. The original question is explicitly about side-by-side comparison. Even files produced with cp will be completely listed in side-by-side mode, all problems with line feeds, spaces or special characters aside. You will always need --suppress-common-lines to get the desired result.
This may not be obvious to non-native English speakers, as 'common' may be interpreted as 'normal' rather than 'mutual'. Perhaps it would be easier if the option were called 'suppress-equal-lines' or similar. And I was really surprised that there is no short, one-letter option for such a 'common' :) task.
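The difference between the two side-by-side modes can be seen with a small sketch, assuming GNU diff. The files a.txt and b.txt are invented for the example and differ in one line out of three.

```shell
tmp=$(mktemp -d)
printf 'same\nold\nsame again\n' > "$tmp/a.txt"
printf 'same\nnew\nsame again\n' > "$tmp/b.txt"

# Plain side-by-side mode lists every line, changed or not.
# "|| true" keeps diff's exit status 1 from stopping a strict script.
full=$(diff -y "$tmp/a.txt" "$tmp/b.txt" || true)

# With --suppress-common-lines only the changed line remains.
short=$(diff -y --suppress-common-lines "$tmp/a.txt" "$tmp/b.txt" || true)

full_lines=$(printf '%s\n' "$full" | wc -l)
short_lines=$(printf '%s\n' "$short" | wc -l)
echo "plain -y: $full_lines lines, with --suppress-common-lines: $short_lines"
```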
- sed -n l filename. If it won't help, add a data example and diff output here.
- file command will hint you about file content, including things like "ASCII text, with CRLF line terminators" vs "ASCII text".
- ls -l (or stat) on both files and compare the sizes (and include that information in any question). That's a minimal, obvious first step toward diagnosing the situation.
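These first checks can be sketched as follows. The file names are hypothetical, wc -c stands in for reading the size column of ls -l, and the file command is only run if it happens to be installed.

```shell
tmp=$(mktemp -d)
printf 'line\r\n' > "$tmp/dos.txt"
printf 'line\n' > "$tmp/unix.txt"

# A CRLF file is one byte longer per line than its LF twin.
dos_size=$(wc -c < "$tmp/dos.txt")
unix_size=$(wc -c < "$tmp/unix.txt")
echo "dos: $dos_size bytes, unix: $unix_size bytes"

# file, when available, reports the line terminator style.
if command -v file > /dev/null 2>&1; then
    file "$tmp/dos.txt" "$tmp/unix.txt"
fi
```

A size difference equal to the number of lines is a strong hint that only the line endings differ.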