50

I have two files that look identical to me (including trailing whitespace and newlines), but diff still says they differ. Even in a diff -y side-by-side comparison, the lines look exactly the same. The diff output consists of both files in their entirety.

Any idea what's causing it?

Raphael Ahrens
asked Aug 17, 2012 at 13:16
5
  • 5
    Try to compare unprintable characters. The simplest way to watch them is sed -n l filename. If it won't help, add a data example and diff output here. Commented Aug 17, 2012 at 13:18
  • 1
    Ahh yes thank you, the lines in a file are ending with $ and in the other one ending with \r$ Commented Aug 17, 2012 at 13:33
  • 1
    A quick fix is to use dos2unix on both the files (or the one you suspect to be from a Windows machine). Commented Jun 2, 2015 at 18:43
  • As a complement to existing answers: the file command gives hints about file content, reporting things like "ASCII text, with CRLF line terminators" vs. "ASCII text". Commented Dec 15, 2015 at 16:44
  • I know that I’m late to the party, and this specific question has been answered (i.e., MinaHany’s problem has been solved), but — anybody who has a problem like this should do an ls -l (or stat) on both files and compare the sizes (and include that information in any question). That’s a minimal, obvious first step toward diagnosing the situation. Commented Mar 3, 2020 at 7:58
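The size check suggested in the last comment can be sketched as follows. The file names and contents are hypothetical samples, and wc -c is used here (instead of reading the ls -l column by eye) so the byte counts can be compared in a script. When only the line endings differ, the size difference should equal the number of lines.

```shell
# Hypothetical sample files: same text, different line endings.
tmpdir=$(mktemp -d)
printf 'hello\nworld\n'     > "$tmpdir/unix.txt"   # LF endings
printf 'hello\r\nworld\r\n' > "$tmpdir/dos.txt"    # CRLF endings

# ls -l shows the sizes for a quick visual check.
ls -l "$tmpdir"

# wc -c gives the byte counts; $(( )) strips any padding some wc versions add.
size_unix=$(( $(wc -c < "$tmpdir/unix.txt") ))
size_dos=$(( $(wc -c < "$tmpdir/dos.txt") ))
echo "unix.txt: $size_unix bytes, dos.txt: $size_dos bytes"
echo "difference: $(( size_dos - size_unix )) bytes (one extra CR per line)"

rm -rf "$tmpdir"
```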

7 Answers

37

Try:

diff file1 file2 | cat -t

The -t option makes cat display non-printing characters visibly, e.g. ^M for CR and ^I for tab.

From the man page (OS X):

 -t Display non-printing characters (see the -v option), and display tab characters as `^I'.
 -v Display non-printing characters so they are visible.
 Control characters print as `^X' for control-X; the delete character
 (octal 0177) prints as `^?'. Non-ASCII characters
 (with the high bit set) are printed as `M-' (for meta) followed by the
 character for the low 7 bits.
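As a minimal sketch of what this looks like in practice, the snippet below builds two hypothetical sample files that differ only in line endings and pipes the diff through cat -t. The lines coming from the CRLF file should show a trailing ^M.

```shell
# Hypothetical sample files: same text, different line endings.
tmpdir=$(mktemp -d)
printf 'hello\nworld\n'     > "$tmpdir/unix.txt"   # LF endings
printf 'hello\r\nworld\r\n' > "$tmpdir/dos.txt"    # CRLF endings

# diff exits with status 1 when the files differ; "|| true" keeps that
# from aborting a script that runs under "set -e" or "pipefail".
marked=$( { diff "$tmpdir/dos.txt" "$tmpdir/unix.txt" || true; } | cat -t )
printf '%s\n' "$marked"

rm -rf "$tmpdir"
```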
Raphael Ahrens
answered Dec 4, 2013 at 17:12
35

Odd. Can you try cmp? You may also want its -b option, which prints the differing bytes.

cmp man page - Compare two files byte by byte.

This is one of the nice things about Unix/Linux .. so many tools :)
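A small sketch of the cmp approach, assuming GNU cmp (where -b is short for --print-bytes) and two hypothetical sample files. cmp stops at the first differing byte and prints both values in octal: 15 is carriage return (^M) and 12 is line feed (^J).

```shell
# Hypothetical sample files: same text, different line endings.
tmpdir=$(mktemp -d)
printf 'hello\nworld\n'     > "$tmpdir/unix.txt"   # LF endings
printf 'hello\r\nworld\r\n' > "$tmpdir/dos.txt"    # CRLF endings

# cmp exits with status 1 when the files differ, hence "|| true".
first_diff=$(cmp -b "$tmpdir/dos.txt" "$tmpdir/unix.txt" || true)
printf '%s\n' "$first_diff"

rm -rf "$tmpdir"
```

Here the first difference is at byte 6: the CRLF file has a carriage return where the LF file already has its line feed.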

answered Aug 17, 2012 at 13:20
4
  • 2
    Thanks for that! I got: byte 19, line 1 is 15 ^M 12 ^J what does it mean? Commented Aug 17, 2012 at 13:22
  • 4
    looks like carriage return and linefeed according to this table Commented Aug 17, 2012 at 13:25
  • 3
    tried -b with the diff and it seems to be working for me. man page says -b is for ignore changes in the amount of white space. Commented Aug 26, 2016 at 6:22
  • cmp is not always very helpful: in my case, it only reports a useless /tmp/w /tmp/z differ: byte 48, line 1 is 12 ^J 12 ^J, while cat -t clearly shows that the difference is that lines of the first file end with ^M. Commented Apr 7, 2023 at 8:43
19

Might the differences be caused by DOS vs. UNIX line endings, or something similar?

What if you hexdump them? This might show differences more obviously, eg:

hexdump -C file1 > file1.hex
hexdump -C file2 > file2.hex
diff file1.hex file2.hex
answered Aug 17, 2012 at 13:22
3
  • Well, the two hexes are different. every time there's a 0d 0a in a file the other one just has 0a Commented Aug 17, 2012 at 13:29
  • 5
    In one, you have DOS line endings (CRLF) and in the other, UNIX line endings (LF). That's why they look different to diff but not when you look at them visually. Look at en.wikipedia.org/wiki/Newline#Conversion_utilities Commented Aug 17, 2012 at 13:32
  • 1
    Got it! Thanks a lot. Levon's suggestion of using cmp shows the difference more clearly though :) Commented Aug 17, 2012 at 13:39
6

My first guess, which turns out to be confirmed, is that the files use different line endings. It could be some other difference in whitespace, such as the presence of trailing whitespace (but you typically wouldn't get that on many lines) or different indentation (tabs vs spaces). Use a command that prints out whitespace and control characters in a visible form, such as

diff <(cat -A file1) <(cat -A file2)
diff <(sed -n l file1) <(sed -n l file2)
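To see what the visible form looks like before diffing it, here is a minimal sketch using sed -n l on two hypothetical sample files. It avoids process substitution, so it should also work in a plain POSIX shell. Each line ends with $, and a carriage return appears as \r just before it.

```shell
# Hypothetical sample files: same text, different line endings.
tmpdir=$(mktemp -d)
printf 'hello\nworld\n'     > "$tmpdir/unix.txt"   # LF endings
printf 'hello\r\nworld\r\n' > "$tmpdir/dos.txt"    # CRLF endings

# "l" prints each line unambiguously: $ marks the end of the line,
# and control characters such as CR are shown as escapes.
visible_unix=$(sed -n l "$tmpdir/unix.txt")
visible_dos=$(sed -n l "$tmpdir/dos.txt")
printf 'unix.txt:\n%s\n' "$visible_unix"
printf 'dos.txt:\n%s\n' "$visible_dos"

rm -rf "$tmpdir"
```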

You can confirm that the differences only have to do with line endings by normalizing them first. You may have a dos2unix utility; if not, remove the extra CR (^M, \r, \015) character explicitly:

diff <(tr -d '\r' <file1) <(tr -d '\r' <file2)

or, if file1 is the one with DOS endings:

 tr -d '\r' <file1 | diff - file2
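The same check can be turned into a yes/no answer by looking at diff's exit status. This is only a sketch with hypothetical sample files; note that tr -d '\r' removes every carriage return, not just those at line ends.

```shell
# Hypothetical sample files: same text, different line endings.
tmpdir=$(mktemp -d)
printf 'hello\nworld\n'     > "$tmpdir/unix.txt"   # LF endings
printf 'hello\r\nworld\r\n' > "$tmpdir/dos.txt"    # CRLF endings

# diff exits 0 when its two inputs are identical.
if tr -d '\r' < "$tmpdir/dos.txt" | diff - "$tmpdir/unix.txt" > /dev/null; then
    verdict="identical once CR characters are removed"
else
    verdict="other differences remain"
fi
echo "$verdict"

rm -rf "$tmpdir"
```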
answered Aug 18, 2012 at 0:00
4

The other answers are thorough, but they all focus on making the otherwise invisible differences visible. There is another option: ignoring those differences when they don't matter to you.

The diff command has several useful options for this:

--strip-trailing-cr
 strip trailing carriage return on input
-B, --ignore-blank-lines
 ignore changes where lines are all blank
-Z, --ignore-trailing-space
 ignore white space at line end

Personally, I found --strip-trailing-cr useful, especially when using the -r (i.e. --recursive) option on large projects, or when Git's core.autocrlf is not false (i.e. is either true or input).

For more information on these and other options, see the diff man page (man diff).
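A brief sketch of the effect, assuming GNU diff and two hypothetical sample files that differ only in line endings: a plain diff reports a difference (exit status 1), while the same comparison with --strip-trailing-cr reports none (exit status 0).

```shell
# Hypothetical sample files: same text, different line endings.
tmpdir=$(mktemp -d)
printf 'hello\nworld\n'     > "$tmpdir/unix.txt"   # LF endings
printf 'hello\r\nworld\r\n' > "$tmpdir/dos.txt"    # CRLF endings

# Plain comparison: the trailing CRs count as differences.
strict_status=0
diff "$tmpdir/dos.txt" "$tmpdir/unix.txt" > /dev/null || strict_status=$?

# Same comparison with trailing CRs stripped from the input.
lenient_status=0
diff --strip-trailing-cr "$tmpdir/dos.txt" "$tmpdir/unix.txt" > /dev/null || lenient_status=$?

echo "plain diff exit status: $strict_status"
echo "with --strip-trailing-cr: $lenient_status"

rm -rf "$tmpdir"
```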

Note: These options add some processing overhead, which is most noticeable on huge files or directories. In one of my own cases, the run time increased from 0.321s to 0.422s.

answered Apr 25, 2020 at 19:06
1

On Windows, you can do this with fc, which supports binary comparison:

fc /B file1 file2
answered Nov 18, 2020 at 15:38
1
  • Hi Johan, and welcome to the UNIX & Linux Stack Exchange! Our target systems here are UNIX/Linux systems; you might find Windows-centric answers more on-topic at SuperUser or Server Fault. Thank you! Commented Nov 18, 2020 at 17:01
0

In side-by-side view, add --suppress-common-lines to the options.

All the other answers and comments here are good to know, but they are not sufficient. The original question is explicitly about side-by-side comparison. Even identical files produced with cp are listed in full in side-by-side mode, regardless of any line-ending, whitespace, or special-character issues. You always need --suppress-common-lines to get the desired result.

This may not be obvious to non-native English speakers, since 'common' may be read as 'normal' rather than 'mutual'. Perhaps 'suppress-equal-lines' or similar would have been clearer. I was also surprised that there is no short, one-letter option for such a 'common' :) task.
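To illustrate, here is a minimal sketch assuming GNU diff and two hypothetical three-line files that differ in one line. Plain diff -y prints every line, while adding --suppress-common-lines leaves only the changed one.

```shell
# Hypothetical sample files that differ only in their second line.
tmpdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$tmpdir/left.txt"
printf 'one\nTWO\nthree\n' > "$tmpdir/right.txt"

# diff exits with status 1 when the files differ, hence "|| true".
full=$(diff -y "$tmpdir/left.txt" "$tmpdir/right.txt" || true)
brief=$(diff -y --suppress-common-lines "$tmpdir/left.txt" "$tmpdir/right.txt" || true)

echo "side by side, all lines:"
printf '%s\n' "$full"
echo "side by side, differing lines only:"
printf '%s\n' "$brief"

rm -rf "$tmpdir"
```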

answered Jan 3, 2021 at 13:02
