I've two configuration files, the original from the package manager and a customized one modified by myself. I've added some comments to describe behavior.
How can I run diff on the configuration files, skipping the comments? A commented line is defined by:
- optional leading whitespace (tabs and spaces)
- hash sign (#)
- any other characters
The (simplest) regular expression, skipping the first requirement, would be #.*. I tried the --ignore-matching-lines=RE (-I RE) option of GNU diff 3.0, but I couldn't get it working with that RE. I also tried .*#.* and .*\#.* without luck. Literally putting the line (Port 631) as RE does not match anything, nor does it help to put the RE between slashes.
As suggested in "diff" tool's flavor of regex seems lacking?, I tried grep -G:
grep -G '#.*' file
This seems to match the comments, but it does not work for diff -I '#.*' file1 file2.
So, how should this option be used? How can I make diff skip certain lines (in my case, comments)? Please do not suggest grepping the file and comparing the temporary files.
6 Answers
According to Gilles, the -I option only ignores a change if every changed line in that hunk matches the regular expression given to -I. I didn't fully get it until I tested it.
The Test
Three files are involved in my test:
File test1:
text
File test2:
text
#comment
File test3:
changed text
#comment
The commands:
$ # comparing files with comment-only changes
$ diff -u -I '#.*' test{1,2}
$ # comparing files with both comment and regular changes
$ diff -u -I '#.*' test{2,3}
--- test2 2011-07-20 16:38:59.717701430 +0200
+++ test3 2011-07-20 16:39:10.187701435 +0200
@@ -1,2 +1,2 @@
-text
+changed text
 #comment
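The test above can be reproduced in a throwaway directory. This is a minimal sketch assuming GNU diff; the temporary directory and the variable names are only illustrative and not part of the original test.

```bash
# Recreate the three test files in a temporary directory (illustrative location).
dir=$(mktemp -d)
printf 'text\n' > "$dir/test1"
printf 'text\n#comment\n' > "$dir/test2"
printf 'changed text\n#comment\n' > "$dir/test3"

# Comment-only change: every changed line matches the RE, so diff prints nothing.
only_comments=$(diff -I '#.*' "$dir/test1" "$dir/test2")

# Regular change: the changed lines do not match the RE, so the change is printed.
mixed=$(diff -I '#.*' "$dir/test2" "$dir/test3")

printf 'comment-only diff: [%s]\n' "$only_comments"
printf 'mixed diff:\n%s\n' "$mixed"
rm -rf "$dir"
```

The first result should be empty, while the second still lists the text change.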
The alternative way
Since there is no answer so far explaining how to use the -I option correctly, I'll provide an alternative which works in bash shells:
diff -u -B <(grep -vE '^\s*(#|$)' test1) <(grep -vE '^\s*(#|$)' test2)
- diff -u - unified diff
  - -B - ignore blank lines
- <(command) - a bash feature called process substitution which opens a file descriptor for the command; this removes the need for a temporary file
- grep - command for printing lines (not) matching a pattern
  - -v - show non-matching lines
  - -E - use extended regular expressions
- '^\s*(#|$)' - a regular expression matching comments and empty lines
  - ^ - match the beginning of a line
  - \s* - match whitespace (tabs and spaces), if any
  - (#|$) - match a hash mark, or alternatively, the end of a line
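If this comparison is needed often, the one-liner can be wrapped in a small bash function. The sketch below is only one way to do it: the name diff_nocomments is made up for illustration, and it uses [[:space:]] in place of \s, on the assumption that the POSIX character class is the more portable spelling.

```bash
#!/usr/bin/env bash
# diff_nocomments is a hypothetical helper name, not part of diffutils.
# It removes blank lines and '#' comment lines from both inputs and
# compares what is left; process substitution avoids temporary files.
diff_nocomments() {
    local strip='^[[:space:]]*(#|$)'
    diff -u <(grep -vE "$strip" "$1") <(grep -vE "$strip" "$2")
}
```

It would be called as diff_nocomments file1 file2. Because lines are removed before comparing, the line numbers in the output refer to the filtered text, not to the original files.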
Try:
diff -b -I '^#' -I '^ #' file1 file2
Please note that the regex must match every changed line in the hunk (every inserted and every deleted line, in both files) for the change to be ignored; otherwise diff still shows the difference.
Use single quotes to protect the pattern, including characters such as brackets, from expansion by the shell.
We can read in the diffutils manual:
However, -I only ignores the insertion or deletion of lines that contain the regular expression if every changed line in the hunk (every insertion and every deletion) matches the regular expression.
In other words, for each non-ignorable change, diff prints the complete set of changes in its vicinity, including the ignorable ones. You can specify more than one regular expression for lines to ignore by using more than one -I option. diff tries to match each line against each regular expression, starting with the last one given.
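The hunk rule quoted above can be seen with two pairs of files. This is a sketch assuming GNU diff; the file names and contents are invented for the demonstration.

```bash
dir=$(mktemp -d)
re='^[[:space:]]*#'

# Case 1: the comment change and the real change are separated by unchanged
# lines, so they are two separate changes and the comment-only one is ignored.
{ printf '# old note\n'; printf 'line %s\n' 1 2 3 4 5 6 7 8; printf 'Port 631\n'; } > "$dir/far1"
{ printf '# new note\n'; printf 'line %s\n' 1 2 3 4 5 6 7 8; printf 'Port 632\n'; } > "$dir/far2"
far=$(diff -I "$re" "$dir/far1" "$dir/far2")

# Case 2: the comment change sits directly next to the real change, so both
# belong to one change; not every line matches, and all of it is printed.
printf '# old note\nPort 631\n' > "$dir/near1"
printf '# new note\nPort 632\n' > "$dir/near2"
near=$(diff -I "$re" "$dir/near1" "$dir/near2")

printf 'far apart:\n%s\n\nadjacent:\n%s\n' "$far" "$near"
rm -rf "$dir"
```

In the first case only the Port change should be reported; in the second the comment lines appear as well, because they share a change with a line that does not match.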
This behavior is also well explained by armel here.
Related: How can I perform a diff that ignores all comments?
After searching around the web, I found a method similar to Lekensteyn's.
But I want to use the diff output as input to patch, and grep -v removes lines and so shifts the line numbers, so I can't.
Here's an improvement, maybe:
diff -u -B <(sed 's/^[[:blank:]]*#.*$/ /' file1) <(sed 's/^[[:blank:]]*#.*$/ /' file2)
It's not perfect, but line numbers are kept in the patch file.
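To illustrate why the line numbers are kept, the sketch below counts lines after each kind of filtering. The sample file content is made up for the demonstration.

```bash
dir=$(mktemp -d)
printf 'text\n  # note\nother text\n' > "$dir/sample"

# grep -v drops the comment line, so every later line moves up by one.
grep_lines=$(grep -v '^[[:blank:]]*#' "$dir/sample" | wc -l)

# sed replaces the comment with a single space, so the line count is unchanged.
sed_lines=$(sed 's/^[[:blank:]]*#.*$/ /' "$dir/sample" | wc -l)

echo "after grep -v: $grep_lines lines, after sed: $sed_lines lines"
rm -rf "$dir"
```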
However, if a new line is added next to a comment line, the replaced comment ends up as context in the hunk and causes it to fail while patching:
File file1:
text
#comment
other text
File file2:
text
new line
#comment changed
other text changed
Testing that data with our command:
$ echo -e '#!/usr/bin/sed -f\ns/^[[:blank:]]*#.*$/ /' > outcom.sed
$ echo "diff -u -B <(./outcom.sed \$1) <(./outcom.sed \$2)" > mydiff.sh
$ chmod +x mydiff.sh outcom.sed
$ ./mydiff.sh file1 file2 > file.dif
$ cat file.dif
--- /dev/fd/63 2014-08-23 10:05:08.000000000 +0200
+++ /dev/fd/62 2014-08-23 10:05:08.000000000 +0200
@@ -1,2 +1,3 @@
text
+new line
-other text
+other text changed
/dev/fd/62 and /dev/fd/63 are the files produced by process substitution. The line between "+new line" and "-other text" is the single space character that our sed expression substitutes for comments.
Applying that patch gives us an error:
$ patch -p0 file1 < file.dif
patching file file1
Hunk #1 FAILED at 1.
1 out of 1 hunk FAILED -- saving rejects to file file1.rej
A solution is to not use the unified diff format, that is, to drop -u:
$ echo "diff -B <(./outcom.sed \$1) <(./outcom.sed \$2)" > mydiff.sh
$ ./mydiff.sh file1 file2 > file.dif
$ cat file.dif
1a2
> new line
3c4
< other text
---
> other text changed
$ patch -p0 file1 < file.dif
patching file file1
$ cat file1
text
new line
#comment
other text changed
Now the patch file works (no guarantees for anything more complex, though).
- Your unified diff fails to apply due to the context differences. You can use diff -U0 one two to disable the context. For patching, there are a bunch of tools that may be better suited, such as kdiff3. – Lekensteyn, Aug 23, 2014 at 11:22
- Thank you for the -U0 option to disable context. Note: kdiff3 is a graphical tool. I need an automatic tool to manage git merge attributes. – syjust, Aug 29, 2014 at 12:49
- vimdiff supports three-way merges, might be worth looking at. – Lekensteyn, Aug 29, 2014 at 14:21
- To be more precise, I need a scripting tool for automating the git merge process with excludes in an SQL script. kdiff3 and vimdiff are interactive tools, not usable in my case. – syjust, Aug 29, 2014 at 14:37
I usually ignore this clutter by either:
- Generating non-commented versions using grep -v "^#" | cat -s and diffing those, or...
- Using vim -d to look at the files. The syntax highlighting takes care of making comment vs. non-comment differences quite obvious. The diff highlighting of in-line differences, which lets you see at a glance what values or parts of values have been changed, makes this my favorite.
Here is what I use to remove all commented lines -even those starting with a tab or space- and the blank ones:
egrep -v "^$|^[[:space:]]*#" /path/to/file
or you can do
sed -e '/^#.*/d' -e 's/#.*//g' | cat -s
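The two commands above do not do quite the same thing, which the sketch below tries to show on a made-up sample. It uses grep -E, which is assumed here to be equivalent to egrep.

```bash
# Hypothetical sample: whole-line, inline and indented comments plus a blank line.
sample=$(printf '# top comment\nPort 631   # inline\n\t# indented\n\nListen localhost\n')

# Drops blank lines and whole-line comments; the inline comment is kept.
whole_lines=$(printf '%s\n' "$sample" | grep -Ev '^$|^[[:space:]]*#')

# Deletes lines starting with '#', then cuts everything from any '#' onward,
# so inline comments go too (as would a '#' that is part of a real value).
inline_too=$(printf '%s\n' "$sample" | sed -e '/^#.*/d' -e 's/#.*//g' | cat -s)

printf 'grep variant:\n%s\n\nsed variant:\n%s\n' "$whole_lines" "$inline_too"
```

Note that the sed variant leaves an indented comment behind as a whitespace-only line, which cat -s does not squeeze.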
My open-source Linux tool 'dif' compares files while ignoring various differences such as comments.
It has many other options for ignoring whitespace or timestamps, sorting the input files, doing search/replace, ignoring certain lines, etc.
After preprocessing the input files, it runs the Linux tools meld, gvimdiff, tkdiff, or kompare on these intermediate files.
Installation is not required, just download and run the 'dif' executable from https://github.com/koknat/dif
For your use case, try the "comments" option:
dif file1 file2 -comments
If standard "diff" stdout should be used instead of a gui, add the "diff" option
dif file1 file2 -comments -diff
- The -I option causes a block to be ignored only if all its lines match the regexp. So you can ignore a comment-only change that way, but not the comment changes that are near a non-comment change.
- diff -I does not behave as I expected. I updated my answer with an example which clarified this behavior for me.