I have read about diff and patch, but I can't figure out how to apply them to what I need. I guess it's pretty simple, so to show my problem, take these two files:
a.xml
<resources>
<color name="same_in_b">#AAABBB</color>
<color name="not_in_b">#AAAAAA</color>
<color name="in_b_but_different_val">#AAAAAA</color>
<color name="not_in_b_too">#AAAAAA</color>
</resources>
b.xml
<resources>
<color name="same_in_b">#AAABBB</color>
<color name="in_b_but_different_val">#BBBBBB</color>
<color name="not_in_a">#AAAAAA</color>
</resources>
I want to have an output, which looks like this (order doesn't matter):
<resources>
<color name="same_in_b">#AAABBB</color>
<color name="not_in_b">#AAAAAA</color>
<color name="in_b_but_different_val">#BBBBBB</color>
<color name="not_in_b_too">#AAAAAA</color>
<color name="not_in_a">#AAAAAA</color>
</resources>
The merge should contain all lines according to these simple rules:
- any line which is only in one of the files
- if a line has the same name tag but a different value, take the value from the second
I want to do this inside a bash script, so it does not necessarily need to be done with diff and patch if another program is a better fit.
8 Answers
You don't need patch for this; it's for extracting changes and sending them on without the unchanged part of the file.
The tool for merging two versions of a file is merge, but as @vonbrand wrote, you need the "base" file from which your two versions diverged. To do a merge without it, use diff like this:
diff -DVERSION1 file1.xml file2.xml > merged.xml
It will enclose each set of changes in C-style #ifdef/#ifndef "preprocessor" commands, like this:
#ifndef VERSION1
<stuff only in file1.xml>
#endif
...
#ifdef VERSION1
<stuff only in file2.xml>
#endif
If a line or region differs between the two files, you'll get a "conflict", which looks like this:
#ifndef VERSION1
<version 1>
#else /* VERSION1 */
<version 2>
#endif /* VERSION1 */
So save the output in a file, and open it in an editor. Search for any places where #else comes up, and resolve them manually. Then save the file and run it through grep -v to get rid of the remaining #if(n)def and #endif lines:
grep -v '^#if' merged.xml | grep -v '^#endif' > clean.xml
In the future, save the original version of the file. merge can give you much better results with the help of the extra information. (But be careful: merge edits one of the files in-place, unless you use -p. Read the manual.)
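If the base file is available, the three-way merge can be sketched as follows. This uses diff3 -m from GNU diffutils instead of merge, which is a substitution made here so that no file is edited in place; the function name three_way_merge and its argument order are illustrative assumptions, not part of the original answer.

```shell
# three_way_merge MINE BASE THEIRS
# Hypothetical wrapper: prints the merged result to stdout and leaves all
# three input files untouched. diff3 exits with status 1 when it finds
# conflicts; the output then contains <<<<<<< / >>>>>>> markers that have
# to be resolved by hand.
three_way_merge() {
    diff3 -m "$1" "$2" "$3"
}
```

Calling three_way_merge file1.xml base.xml file2.xml > merged.xml would write the merged result to merged.xml, assuming base.xml is the version both files were derived from.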
- I added something for the case where I had a conflict: sed -e "s/^#else.*$/\/\/ conflict/g" (lockwobr, Aug 5, 2016 at 4:31)
- I don't think that's a good idea. As I wrote in my answer, you should be removing the #else lines manually, in the editor during conflict resolution. (alexis, May 19, 2017 at 14:53)
sdiff(1) - side-by-side merge of file differences
Use the --output option; this will interactively merge any two files. You use simple commands to select a change or edit a change.
You should make sure that the EDITOR environment variable is set. The default editor for commands like "eb" is usually ed, a line editor.
EDITOR=nano sdiff -o merged.txt file1.txt file2.txt
- I find vim better as the EDITOR. But this is the best solution; it comes with the diff command too! (CMCDragonkai, Jun 13, 2018 at 1:16)
merge(1) is probably nearer to what you want, but that requires a common ancestor to your two files.
A (dirty!) way of doing it is:
- Get rid of the first and last lines; use grep(1) to exclude them
- Smash the results together
- sort -u leaves a sorted list and eliminates duplicates
- Replace the first/last line
Hmm... something along these lines:
echo '<resources>'; cat file1 file2 | grep -v resources | sort -u; echo '</resources>'
might do.
- This does work in this particular example, but NOT in general: if the name in_b_but_different_val has a value of #00AABB, sort will put that on top and erase the second value instead of the first one. (Rafael T, Feb 2, 2013 at 2:16)
- For the optimal solution in this case you'd have to parse the XML with a real XML parser, not the hacks above, and produce a new merged XML output from that. diff / patch / sort etc. are all just hacks tailored to "particular examples"; for a general solution they're simply the wrong tools. (frostschutz, Feb 2, 2013 at 12:06)
- @alzheimer, whip up something simple to show us... (vonbrand, Feb 2, 2013 at 12:22)
- Apparently diff3 works the same way, requiring a common ancestor file. Why is there no simple CLI tool that just merges 2 files together based on what diff shows? (CMCDragonkai, Jun 13, 2018 at 1:08)
Here is a simple solution that works for merging up to 10 files:
#!/bin/bash
strip(){
    i=0
    for f; do
        sed -r '
            /<\/?resources>/ d
            s/>/>'$((i++))'/
        ' "$f"
    done
}
strip "$@" | sort -u -k1,1 -t'>' | sed '
    1 s|^|<resources>\n|
    s/>[0-9]/>/
    $ a </resources>
'
Please note that the argument that comes first takes precedence, so you have to call:
script b.xml a.xml
to keep the common values from b.xml rather than a.xml.
script b.xml a.xml
outputs:
<resources>
<color name="in_b_but_different_val">#BBBBBB</color>
<color name="not_in_a">#AAAAAA</color>
<color name="not_in_b">#AAAAAA</color>
<color name="not_in_b_too">#AAAAAA</color>
<color name="same_in_b">#AAABBB</color>
</resources>
Another horrible hack - could be simplified, but :P
#!/bin/bash
i=0
while read line
do
    if [ "${line:0:13}" == '<color name="' ]
    then
        a_keys[$i]="${line:13}"
        a_keys[$i]="${a_keys[$i]%%\"*}"
        a_values[$i]="$line"
        i=$((i+1))
    fi
done < a.xml
i=0
while read line
do
    if [ "${line:0:13}" == '<color name="' ]
    then
        b_keys[$i]="${line:13}"
        b_keys[$i]="${b_keys[$i]%%\"*}"
        b_values[$i]="$line"
        i=$((i+1))
    fi
done < b.xml
echo "<resources>"
i=0
for akey in "${a_keys[@]}"
do
    print=1
    for bkey in "${b_keys[@]}"
    do
        if [ "$akey" == "$bkey" ]
        then
            print=0
            break
        fi
    done
    if [ $print == 1 ]
    then
        echo " ${a_values[$i]}"
    fi
    i=$(($i+1))
done
for value in "${b_values[@]}"
do
    echo " $value"
done
echo "</resources>"
OK, second try, now in Perl (not production quality, no checking!):
#!/usr/bin/perl
use strict;
use warnings;

my %nv;
# Read a.xml first, then b.xml, so values from b.xml overwrite those from a.xml.
foreach my $file ("a.xml", "b.xml") {
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    while (<$fh>) {
        # Skip the enclosing <resources> tags.
        next if m;^\s*</?resources>\s*$;;
        my ($name, $value) = m;^\s*<color\s+name\s*=\s*"([^"]+)">([^<]+)</color>\s*$;;
        $nv{$name} = $value if defined $name;
    }
    close($fh);
}
print "<resources>\n";
foreach (keys(%nv)) {
    print " <color name=\"$_\">$nv{$_}</color>\n";
}
print "</resources>\n";
Another one, using cut and grep... (takes a.xml b.xml as arguments)
#!/bin/bash
zap='"('"`grep '<color' "$2" | cut -d '"' -f 2 | tr '\n' '|'`"'")'
echo "<resources>"
grep '<color' "$1" | grep -E -v "$zap"
grep '<color' "$2"
echo "</resources>"
- echo is the default action, so xargs echo is superfluous. Why don't you simply tr '\n' '|' anyway? (tripleee, Feb 2, 2013 at 12:54)
- Good point - it's just a quick hack. I'll edit it. (frostschutz, Feb 2, 2013 at 13:03)
You can also use join:
JOIN(1) User Commands JOIN(1)
NAME
join - join lines of two files on a common field
SYNOPSIS
join [OPTION]... FILE1 FILE2
DESCRIPTION
For each pair of input lines with identical join fields, write a line to
standard output. The default join field is the first, delimited by blanks.
I found it here: https://stackoverflow.com/questions/10364455/merge-two-files-by-key-if-exists-in-the-first-file-bash-script
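One way join might be applied to the two files from the question is sketched below. It is not taken from the linked post. It assumes one <color name="..."> element per line with name as the first attribute, so that splitting on the double quote puts the name in field 2. The function name join_merge is made up for illustration.

```shell
# join_merge A B
# Hypothetical helper: prints the <color> lines that exist only in A, followed
# by every <color> line of B, so B wins whenever both files define a name.
join_merge() {
    a_sorted=$(mktemp) || return 1
    b_sorted=$(mktemp) || return 1
    # join needs both inputs sorted on the join field (field 2 when splitting on ")
    grep '<color' "$1" | LC_ALL=C sort -t'"' -k2,2 > "$a_sorted"
    grep '<color' "$2" | LC_ALL=C sort -t'"' -k2,2 > "$b_sorted"
    echo '<resources>'
    # -v 1 prints only the lines of A whose name has no match in B;
    # -o rebuilds each original line from its three "-separated fields
    LC_ALL=C join -t'"' -1 2 -2 2 -v 1 -o 1.1,1.2,1.3 "$a_sorted" "$b_sorted"
    cat "$b_sorted"
    echo '</resources>'
    rm -f "$a_sorted" "$b_sorted"
}
```

Called as join_merge a.xml b.xml, this prints the entries unique to a.xml first and then all of b.xml, sorted by name within each group. A value containing a double quote would break the field layout, so this stays a line-oriented hack rather than real XML handling.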
- Could you explain how join would be used in this particular case? (Stephen Kitt, Apr 17, 2020 at 8:40)
- So, "using join" may be a correct answer, but it's useless unless one knows how to apply join to this particular issue. The join utility crucially does not read XML, for example. (Apr 17, 2020 at 10:54)
- The problem / requirements in the question you linked to are significantly different from those in this question. (G-Man Says 'Reinstate Monica', Apr 17, 2020 at 22:19)
diff can tell you which lines are in one file but not the other, but only at the granularity of entire lines. patch is only suitable for making the same changes to a similar file (perhaps a different version of the same file, or an entirely different file where the line numbers and surrounding lines for each change are nevertheless identical to your original file). So no, they are not particularly suitable for this task. You might want to have a look at wdiff, but the solution probably requires a custom script. Since your data looks like XML, you might want to look for some XSL tool.
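As a rough idea of what such a custom script could look like, here is a minimal sketch that keys each line on its name attribute using awk. It is not a real XML parser. It assumes one <color name="..."> element per line with name as the first attribute, and the function name merge_colors is invented for this example.

```shell
# merge_colors FILE...
# Hypothetical helper: later files override earlier ones for the same name;
# names keep the order in which they were first seen.
merge_colors() {
    echo '<resources>'
    awk -F'"' '
        /<color / {
            if (!($2 in line)) order[++n] = $2   # remember first-seen order of names
            line[$2] = $0                        # a later file replaces an earlier line
        }
        END { for (i = 1; i <= n; i++) print line[order[i]] }
    ' "$@"
    echo '</resources>'
}
```

With the two files from the question, merge_colors a.xml b.xml takes the value of in_b_but_different_val from b.xml and keeps every name that occurs in only one of the files.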