How to diff files ignoring comments (lines starting with #)?

Question 1

I have two configuration files: the original from the package manager and a customized one that I modified myself. I've added some comments to describe the behavior.

How can I run diff on the configuration files, skipping the comments? A commented line is defined by:

- optional leading whitespace (tabs and spaces)
- a hash sign (#)
- any other characters

The simplest regular expression, ignoring the first requirement, would be #.*. I tried the --ignore-matching-lines=RE (-I RE) option of GNU diff 3.0, but I couldn't get it to work with that RE. I also tried .*#.* and .*\#.* without luck. Using a literal line (Port 631) as the RE does not match anything either, and putting the RE between slashes does not help.
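For comparison, here is a quick check of how the unanchored #.* differs from a pattern that allows only optional leading whitespace before the #. This is only a sketch: sample.conf and its contents are made up, and [[:blank:]] is the POSIX character class for space and tab.

```shell
# Work in a throwaway directory; sample.conf is a hypothetical example file.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
# A comment, an indented comment (tab + spaces), a directive, and a
# directive followed by a trailing comment.
printf '# comment\n\t  # indented comment\nPort 631\nListen 80 # trailing\n' > sample.conf

# Unanchored: any line that contains a '#' somewhere matches.
loose=$(grep -c '#.*' sample.conf)
# Anchored: only blanks may precede the '#', so only whole-line comments match.
strict=$(grep -c '^[[:blank:]]*#' sample.conf)
echo "unanchored: $loose, anchored: $strict"   # unanchored: 3, anchored: 2
```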

As suggested in "diff" tool's flavor of regex seems lacking?, I tried grep -G:

grep -G '#.*' file

This seems to match the comments, but it does not work for diff -I '#.*' file1 file2.

So, how should this option be used? How can I make diff skip certain lines (in my case, comments)? Please do not suggest grepping the file and comparing the temporary files.

Question 2

The -I option causes a block to be ignored only if all its lines match the regexp. So you can ignore a comment-only change that way, but not the comment changes that are near a non-comment change.

Question 3

@Gilles: Thanks, now I understand why diff -I does not behave as I expected. I updated my answer with an example that clarified this behavior for me.

Question 4

According to Gilles, the -I option only ignores a hunk (a set of adjacent changed lines) if every changed line in it matches the -I regular expression. I didn't fully get it until I tested it.

The Test

Three files are involved in my test:
File test1:

 text

File test2:

 text
 #comment

File test3:

 changed text
 #comment

The commands:

$ # comparing files with comment-only changes
$ diff -u -I '#.*' test{1,2}
$ # comparing files with both comment and regular changes
$ diff -u -I '#.*' test{2,3}
--- test2 2011年07月20日 16:38:59.717701430 +0200
+++ test3 2011年07月20日 16:39:10.187701435 +0200
@@ -1,2 +1,2 @@
-text
+changed text
 #comment
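The same test can be scripted so it is easy to rerun. A minimal sketch, assuming GNU diff; it recreates test1, test2 and test3 in a temporary directory and captures both outputs.

```shell
# Recreate the three test files in a scratch directory (assumes GNU diff).
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf 'text\n' > test1
printf 'text\n#comment\n' > test2
printf 'changed text\n#comment\n' > test3

# diff exits with status 1 when it reports differences; '|| true' keeps
# that from aborting a script that runs under 'set -e'.
# Comment-only change: the only changed line matches '#.*', so the hunk is suppressed.
out12=$(diff -u -I '#.*' test1 test2 || true)
# Non-comment change: 'text' -> 'changed text' does not match, so the hunk
# is printed, with '#comment' shown as a context line.
out23=$(diff -u -I '#.*' test2 test3 || true)

printf 'test1 vs test2: %s\n' "${out12:-(no output)}"
printf 'test2 vs test3:\n%s\n' "$out23"
```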

The alternative way

Since there is no answer so far explaining how to use the -I option correctly, I'll provide an alternative which works in bash shells:

diff -u -B <(grep -vE '^\s*(#|$)' test1) <(grep -vE '^\s*(#|$)' test2)

diff -u - unified diff
- -B - ignore blank lines
<(command) - a bash feature called process substitution, which exposes the command's output as a file (e.g. /dev/fd/63); this removes the need for a temporary file
grep - command for printing lines (not) matching a pattern
- -v - show non-matching lines
- -E - use extended regular expressions
- '^\s*(#|$)' - a regular expression matching comments and empty lines
  - ^ - match the beginning of a line
  - \s* - match whitespace (tabs and spaces), if any; \s is a GNU grep extension, and [[:space:]] is the portable POSIX equivalent
  - (#|$) - match a hash mark or, alternatively, the end of a line

Question 5

*CHEF'S KISS*

Question 6

Try:

diff -b -I '^#' -I '^ #' file1 file2

Note that the regex has to match every changed line in the hunk, in both files; otherwise diff still shows the difference.

Put the pattern in single quotes to protect it from shell expansion, and escape regex-reserved characters (e.g. brackets) that should match literally.

The diffutils manual says:

However, -I only ignores the insertion or deletion of lines that contain the regular expression if every changed line in the hunk (every insertion and every deletion) matches the regular expression.

In other words, for each non-ignorable change, diff prints the complete set of changes in its vicinity, including the ignorable ones. You can specify more than one regular expression for lines to ignore by using more than one -I option. diff tries to match each line against each regular expression, starting with the last one given.
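The rule quoted from the manual can be checked with two small edits to the same file: one that touches only a comment, and one where a comment change sits next to a real change. A sketch assuming GNU diff; the files one, two and three are made up, and a single -I '^[[:blank:]]*#' is used instead of the two -I options above so that comments indented with tabs are covered too.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
# Hypothetical files: 'two' changes only the comment, 'three' also changes b -> B.
printf 'a\n# old note\nb\n' > one
printf 'a\n  # new note\nb\n' > two
printf 'a\n  # new note\nB\n' > three

re='^[[:blank:]]*#'
# Every changed line in the hunk matches $re, so the hunk is ignored.
only_comment=$(diff -I "$re" one two || true)
# The hunk also contains b -> B, so it is printed in full, comments included.
mixed=$(diff -I "$re" one three || true)

printf 'one vs two: %s\n' "${only_comment:-(no output)}"
printf 'one vs three:\n%s\n' "$mixed"
```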

This behavior is also well explained by armel here.

Related: How can I perform a diff that ignores all comments?

Question 7

After searching around the web, I found a method similar to Lekensteyn's.

But I want to use the diff output as input to patch, and grep -v removes lines (which shifts the line numbers), so I can't.

Here's a possible improvement:

diff -u -B <(sed 's/^[[:blank:]]*#.*$/ /' file1) <(sed 's/^[[:blank:]]*#.*$/ /' file2)

It's not perfect, but line numbers are kept in the patch file.
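The point about line numbers can be checked directly: grep -v shortens the file, while the sed substitution keeps one output line per input line. A sketch with a made-up config.txt; grep -c '' is used here simply to count lines.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
# Hypothetical input file with one comment line in the middle.
printf 'text\n#comment\nother text\n' > config.txt

# grep -v drops comment lines, so every later line moves up.
dropped=$(grep -vc '^[[:blank:]]*#' config.txt)
# sed replaces each comment line with a single space, so line N stays line N.
replaced=$(sed 's/^[[:blank:]]*#.*$/ /' config.txt | grep -c '')
echo "grep -v: $dropped lines, sed: $replaced lines"   # grep -v: 2 lines, sed: 3 lines
```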

However, if a new line is added next to a comment line, the comment causes the hunk to fail when patching:

File test1:
 text
 #comment
 other text
File test2:
 text
 new line
 #comment changed
 other text changed

Testing that data with our command:

$ printf '%s\n' '#!/usr/bin/sed -f' 's/^[[:blank:]]*#.*$/ /' > outcom.sed
$ printf '%s\n' '#!/bin/bash' 'diff -u -B <(./outcom.sed "1ドル") <(./outcom.sed "2ドル")' > mydiff.sh
$ chmod +x mydiff.sh outcom.sed
$ ./mydiff.sh file1 file2 > file.dif
$ cat file.dif
--- /dev/fd/63 2014年08月23日 10:05:08.000000000 +0200
+++ /dev/fd/62 2014年08月23日 10:05:08.000000000 +0200
@@ -1,3 +1,4 @@
 text
+new line
 
-other text
+other text changed

/dev/fd/62 and /dev/fd/63 are the files produced by process substitution. The line between "+new line" and "-other text" is the single space that our sed expression puts in place of each comment.

Applying that patch gives this error:

$ patch -p0 file1 < file.dif 
patching file file1
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file file1.rej

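One solution is to use the normal diff format instead of the unified one, i.e. without -u. Normal diffs contain no context lines, so the space that replaced the comment never ends up in the patch: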

$ printf '%s\n' '#!/bin/bash' 'diff -B <(./outcom.sed "1ドル") <(./outcom.sed "2ドル")' > mydiff.sh
$ ./mydiff.sh file1 file2 > file.dif
$ cat file.dif
1a2
> new line
3c4
< other text
---
> other text changed
$ patch -p0 file1 < file.dif 
patching file file1
$ cat file1
text
new line
#comment
other text changed

Now the patch file works (no guarantees for anything more complex, though).

Question 8

Your unified diff fails to apply due to the context differences. You can use diff -U0 one two to disable the context. For patching, there are a bunch of tools that may be better suited such as kdiff3.
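A sketch of what -U0 changes. The files a.txt and b.txt are hypothetical stand-ins for the sed-preprocessed versions above (the comment is already replaced by a single space). The exact @@ headers can differ between diff implementations, so the sketch only checks whether any context lines (lines starting with a space) remain.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
# Hypothetical preprocessed files: the comment line is already a single space.
printf 'text\n \nother text\n' > a.txt
printf 'text\nnew line\n \nother text changed\n' > b.txt

# -U0: unified format with zero lines of context around each change.
out=$(diff -U0 a.txt b.txt || true)
printf '%s\n' "$out"
# Context lines in unified format start with a space; with -U0 there are none.
if printf '%s\n' "$out" | grep -q '^ '; then
    echo "context lines present"
else
    echo "no context lines"
fi
```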

Question 9

Thank you for the -U0 option to disable context. Note: kdiff3 is a graphical tool; I need an automatic tool to manage git merge attributes.

Question 10

vimdiff supports three-way merges; it might be worth looking at.

Question 11

To be more precise, I need a scriptable tool for automating the git merge process with excludes in an SQL script. kdiff3 and vimdiff are interactive tools, not usable in my case.

Question 12

I usually ignore this clutter by either:

- generating non-commented versions using grep -v "^#" | cat -s and diffing those, or
- using vim -d to look at the files. Syntax highlighting makes comment vs. non-comment differences quite obvious, and the in-line diff highlighting shows at a glance which values (or parts of values) have changed, which makes this my favorite.
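The grep -v | cat -s approach relies on cat -s squeezing the runs of empty lines that removed comment blocks leave behind. A sketch with made-up input; note that '^#' only catches comments starting in the first column, and that cat -s is available in GNU and BSD cat but is not required by POSIX.

```shell
# Hypothetical input: two commented sections separated by empty lines.
# grep -v leaves "a", "", "", "b"; cat -s squeezes the two empty lines into one.
cleaned=$(printf 'a\n# section 1\n\n# section 2\n\nb\n' | grep -v '^#' | cat -s)
printf '%s\n' "$cleaned"
```

In bash, two such cleaned versions could then be compared with diff <(grep -v '^#' file1 | cat -s) <(grep -v '^#' file2 | cat -s).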

Question 13

Here is what I use to remove all comment lines (even those starting with a tab or space) and blank lines:

grep -Ev '^$|^[[:space:]]*#' /path/to/file

Or, to also strip trailing comments and squeeze repeated empty lines:

sed -e '/^#.*/d' -e 's/#.*//g' /path/to/file | cat -s
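The two commands treat trailing comments differently: the grep version keeps a line such as b = 2 # trailing unchanged, while the sed version cuts every line at its first #, which also truncates values that legitimately contain one. A sketch with a made-up demo.conf; grep -E is used as the equivalent of egrep.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
# Hypothetical config with full-line, indented, and trailing comments,
# plus a value that contains a '#'.
printf 'a = 1\n# full-line comment\n   # indented comment\nb = 2 # trailing\n\ncolor = #ff0000\n' > demo.conf

# Drops empty lines and whole-line comments; trailing comments survive.
first=$(grep -Ev '^$|^[[:space:]]*#' demo.conf)
# Deletes unindented comment lines, then strips from the first '#' to the
# end of every remaining line (indented comments become whitespace-only lines).
second=$(sed -e '/^#.*/d' -e 's/#.*//g' demo.conf | cat -s)

printf '%s\n---\n%s\n' "$first" "$second"
```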

Question 14

My open-source Linux tool 'dif' compares files while ignoring various differences such as comments.

It has many other options for ignoring whitespace or timestamps, sorting the input files, doing search/replace, ignoring certain lines, etc.

After preprocessing the input files, it runs the Linux tools meld, gvimdiff, tkdiff, or kompare on these intermediate files.

Installation is not required; just download and run the 'dif' executable from https://github.com/koknat/dif

For your use case, try the "comments" option:

dif file1 file2 -comments

To get the standard diff output on stdout instead of a GUI, add the -diff option:

dif file1 file2 -comments -diff

Lekensteyn · Accepted answer · 2011年07月20日 13:49:42Z

*CHEF'S KISS* (comment by CivFan, May 7, 2020 at 20:46)
