diff reports two files differ, although they are the same!

Question 1

I have two files which look identical to me (including trailing whitespace and newlines), but diff still says they differ. Even when I do a diff -y side-by-side comparison, the lines look exactly the same. The diff output consists of both files in their entirety.

Any idea what's causing it?

Question 2

Try comparing the unprintable characters. The simplest way to see them is sed -n l filename. If that doesn't help, add a data sample and the diff output here.

Question 3

Ah yes, thank you. The lines in one file end with $ and the lines in the other end with \r$.
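For anyone who wants to reproduce this, here is a minimal sketch. The file names and contents are made up for illustration; they are not the asker's files.

```shell
# Hypothetical sample files: one with LF endings, one with CRLF endings.
dir=$(mktemp -d)
printf 'alpha\nbeta\n'     > "$dir/unix.txt"
printf 'alpha\r\nbeta\r\n' > "$dir/dos.txt"

# sed -n l prints each line unambiguously: $ marks the end of the line,
# and a carriage return shows up as \r right before it.
unix_view=$(sed -n l "$dir/unix.txt")
dos_view=$(sed -n l "$dir/dos.txt")

printf '%s\n' "$unix_view"   # alpha$ and beta$
printf '%s\n' "$dos_view"    # alpha\r$ and beta\r$

rm -rf "$dir"
```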

Question 4

A quick fix is to run dos2unix on both files (or just the one you suspect came from a Windows machine).

Question 5

As a complement to the existing answers: the file command gives you a hint about a file's content, reporting things like "ASCII text, with CRLF line terminators" versus plain "ASCII text".

Question 6

I know that I’m late to the party, and this specific question has been answered (i.e., MinaHany’s problem has been solved), but — anybody who has a problem like this should do an ls -l (or stat) on both files and compare the sizes (and include that information in any question). That’s a minimal, obvious first step toward diagnosing the situation.
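A rough sketch of that size check, using made-up sample files. It uses wc -c because its output is easy to capture in a script; ls -l or stat would report the same byte counts.

```shell
# Hypothetical sample files with the same text but different line endings.
dir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n'       > "$dir/unix.txt"
printf 'alpha\r\nbeta\r\ngamma\r\n' > "$dir/dos.txt"

# Size in bytes of each file, and the number of lines.
size_unix=$(wc -c < "$dir/unix.txt")
size_dos=$(wc -c < "$dir/dos.txt")
lines=$(wc -l < "$dir/unix.txt")

# With CRLF endings, every line carries exactly one extra byte.
extra=$((size_dos - size_unix))
echo "unix: $size_unix bytes, dos: $size_dos bytes, extra: $extra, lines: $lines"

rm -rf "$dir"
```

A size difference equal to the line count is a strong hint that line endings are the culprit.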

Question 7

Try:

diff file1 file2 | cat -t

The -t option makes cat show special characters visibly, e.g. ^M for CR and ^I for tab.

From the man page (OS X):

 -t Display non-printing characters (see the -v option), and display tab characters as `^I'.
 -v Display non-printing characters so they are visible.
 Control characters print as `^X' for control-X; the delete character
 (octal 0177) prints as `^?'. Non-ASCII characters
 (with the high bit set) are printed as `M-' (for meta) followed by the
 character for the low 7 bits.
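Here is what that looks like on a pair of made-up files (the names and contents are assumptions for the demo). It assumes a cat that supports -t, which both the BSD and GNU versions do.

```shell
# Hypothetical sample files: same text, LF vs. CRLF line endings.
dir=$(mktemp -d)
printf 'alpha\n'   > "$dir/unix.txt"
printf 'alpha\r\n' > "$dir/dos.txt"

# Pipe the diff through cat -t so the carriage return becomes a visible ^M.
shown=$(diff "$dir/unix.txt" "$dir/dos.txt" | cat -t)
printf '%s\n' "$shown"

rm -rf "$dir"
```

The line from the second file comes out as "> alpha^M", while the line from the first file is plain "< alpha".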

Question 8

Odd .. can you try cmp? You may want to use the '-b' option too.

cmp man page - Compare two files byte by byte.

This is one of the nice things about Unix/Linux .. so many tools :)

Question 9

Thanks for that! I got: byte 19, line 1 is 15 ^M 12 ^J. What does it mean?

Question 10

Looks like carriage return (octal 15, ^M) and line feed (octal 12, ^J), according to the ASCII table.
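If you don't have an ASCII table handy, the shell can decode those octal values for you. A small sketch:

```shell
# A leading 0 makes an integer constant octal in shell arithmetic.
cr_dec=$((015))   # octal 15 -> decimal 13, carriage return
lf_dec=$((012))   # octal 12 -> decimal 10, line feed
echo "octal 15 = decimal $cr_dec, octal 12 = decimal $lf_dec"

# Emit the two bytes and let od -c name them.
bytes=$(printf '\015\012' | od -An -c | tr -d ' ')
printf '%s\n' "$bytes"   # prints \r\n
```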

Question 11

I tried -b with diff and it seems to work for me. The man page says -b ignores changes in the amount of white space.

Question 12

cmp is not always very informative: in my case, it only reports an unhelpful /tmp/w /tmp/z differ: byte 48, line 1 is 12 ^J 12 ^J, while cat -t clearly shows that the difference is that the lines of the first file end with ^M.

Question 13

Might the differences be caused by DOS vs. UNIX line endings, or something similar?

What if you hexdump them? That might show the differences more obviously, e.g.:

hexdump -C file1 > file1.hex
hexdump -C file2 > file2.hex
diff file1.hex file2.hex

Question 14

Well, the two hex dumps are different: wherever one file has 0d 0a, the other has just 0a.
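To see the same thing without reading through a full dump, you can count the 0d bytes directly. This is only a sketch on made-up files; od -An -tx1 stands in for hexdump -C here because od is specified by POSIX.

```shell
# Hypothetical sample files: same text, LF vs. CRLF line endings.
dir=$(mktemp -d)
printf 'alpha\nbeta\n'     > "$dir/unix.txt"
printf 'alpha\r\nbeta\r\n' > "$dir/dos.txt"

# Bare hex bytes without offsets, squeezed onto a single line.
hex_unix=$(od -An -tx1 "$dir/unix.txt" | tr -s ' \n' ' ')
hex_dos=$(od -An -tx1 "$dir/dos.txt" | tr -s ' \n' ' ')
echo "unix:$hex_unix"
echo "dos: $hex_dos"

# Count carriage-return bytes (0x0d) by deleting everything else.
cr_unix=$(tr -cd '\r' < "$dir/unix.txt" | wc -c)
cr_dos=$(tr -cd '\r' < "$dir/dos.txt" | wc -c)
echo "CR bytes: unix=$cr_unix dos=$cr_dos"

rm -rf "$dir"
```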

Question 15

In one, you have DOS line endings (CRLF) and in the other, UNIX line endings (LF). That's why they look different to diff but not when you look at them visually. Look at en.wikipedia.org/wiki/Newline#Conversion_utilities

Question 16

Got it! Thanks a lot. Levon's suggestion of using cmp shows the difference more clearly though :)

Question 17

My first guess, which turns out to be confirmed, is that the files use different line endings. It could be some other difference in whitespace, such as the presence of trailing whitespace (but you typically wouldn't get that on many lines) or different indentation (tabs vs spaces). Use a command that prints out whitespace and control characters in a visible form, such as

diff <(cat -A file1) <(cat -A file2)
diff <(sed -n l file1) <(sed -n l file2)

You can confirm that the differences only have to do with line endings by normalizing them first. You may have a dos2unix utility; if not, remove the extra CR (^M, \r, \015) character explicitly:

diff <(tr -d '\r' <file1) <(tr -d '\r' <file2)

or, if file1 is the one with DOS endings

 tr -d '\r' <file1 | diff - file2
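If you want this as a yes/no check in a script, something along these lines should work in plain POSIX sh, where process substitution is not available. The function name same_ignoring_cr and the sample files are made up for illustration.

```shell
# Hypothetical helper: succeeds (status 0) if the two files are identical
# once all carriage returns have been removed from both.
same_ignoring_cr() {
    tmp=$(mktemp) || return 2
    tr -d '\r' < "$2" > "$tmp"
    tr -d '\r' < "$1" | cmp -s - "$tmp"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Made-up sample files: LF endings, CRLF endings, and a real content change.
dir=$(mktemp -d)
printf 'alpha\nbeta\n'     > "$dir/unix.txt"
printf 'alpha\r\nbeta\r\n' > "$dir/dos.txt"
printf 'alpha\nBETA\n'     > "$dir/other.txt"

if same_ignoring_cr "$dir/unix.txt" "$dir/dos.txt"; then
    verdict_dos=same
else
    verdict_dos=different
fi
if same_ignoring_cr "$dir/unix.txt" "$dir/other.txt"; then
    verdict_other=same
else
    verdict_other=different
fi
echo "unix vs dos: $verdict_dos, unix vs other: $verdict_other"

rm -rf "$dir"
```

Note that tr -d '\r' removes every CR, not only those at the end of a line, so a stray CR in the middle of a line would also be ignored.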

Question 18

The other answers are thorough, but they all focus on making these invisible differences visible. There is another option: ignoring the differences altogether, since in some cases they are unimportant and being told about them is not useful.

The diff command has some useful options for this:

--strip-trailing-cr
 strip trailing carriage return on input
-B, --ignore-blank-lines
 ignore changes where lines are all blank
-Z, --ignore-trailing-space
 ignore white space at line end

Personally, I found --strip-trailing-cr useful, especially when using -r (i.e. --recursive) option on large projects, or when Git's core.autocrlf is not false (i.e. is either true or input).

For more information on these and other options, see the diff man page (man diff).

Note: Using these options affects performance, especially with huge files or directories. In one of my own cases, the run time increased from 0.321s to 0.422s.
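A quick way to convince yourself that --strip-trailing-cr does what you expect, assuming GNU diff (other diff implementations may not have this option) and a pair of made-up files:

```shell
# Hypothetical sample files: same text, LF vs. CRLF line endings.
dir=$(mktemp -d)
printf 'alpha\nbeta\n'     > "$dir/unix.txt"
printf 'alpha\r\nbeta\r\n' > "$dir/dos.txt"

# Plain diff treats every line as changed and exits with status 1.
if diff "$dir/unix.txt" "$dir/dos.txt" > /dev/null; then
    plain=same
else
    plain=different
fi

# With --strip-trailing-cr (GNU diff) the files compare as equal.
if diff --strip-trailing-cr "$dir/unix.txt" "$dir/dos.txt" > /dev/null; then
    stripped=same
else
    stripped=different
fi
echo "plain: $plain, with --strip-trailing-cr: $stripped"

rm -rf "$dir"
```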

Question 19

For anyone on Windows you can do this with fc. It can use binary compare.

fc /B file1 file2

Question 20

Hi Johan, and welcome to the UNIX & Linux Stack Exchange! Our target systems here are UNIX/Linux systems; you might find Windows-centric answers more on-topic at SuperUser or Server Fault. Thank you!

Question 21

In side-by-side view, add --suppress-common-lines to the options.

All the other answers and comments here are good to know, but they are not sufficient. The original question is explicitly about side-by-side comparison. Even two files produced with cp will be listed completely in side-by-side mode, regardless of any problems with line feeds, spaces or special characters. You will always need --suppress-common-lines to get the desired result.

This may not be obvious to non-native English speakers, as "common" may be read as "normal" rather than "mutual". Perhaps it would be clearer if the option were called "suppress-equal-lines" or similar. I was also really surprised that there is no short, one-letter option for such a "common" :) task.
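A small illustration of the difference, assuming GNU diff (the -y and --suppress-common-lines options come from there) and two made-up three-line files that differ in one line:

```shell
# Hypothetical sample files that differ only in the second line.
dir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$dir/a.txt"
printf 'one\nTWO\nthree\n' > "$dir/b.txt"

# Side-by-side mode lists every line pair, changed or not.
all=$(diff -y "$dir/a.txt" "$dir/b.txt" | wc -l)

# With --suppress-common-lines only the differing pair is printed.
changed=$(diff -y --suppress-common-lines "$dir/a.txt" "$dir/b.txt" | wc -l)

echo "lines shown: $all without the option, $changed with it"

rm -rf "$dir"
```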

Accepted answer (the diff file1 file2 | cat -t answer above): JosephH, 2013-12-04 17:12:49Z

