Problem:
- I need to compare two files,
- remove from the first file the lines that also appear in the second file,
- then append the remaining lines of file1 to file2.
Illustration by example:
Suppose the two files are test1 and test2.
$ cat test2
www.xyz.com/abc-2
www.xyz.com/abc-3
www.xyz.com/abc-4
www.xyz.com/abc-5
www.xyz.com/abc-6
And test1 is
$ cat test1
www.xyz.com/abc-1
www.xyz.com/abc-2
www.xyz.com/abc-3
www.xyz.com/abc-4
www.xyz.com/abc-5
Comparing test1 to test2 and removing the duplicates from test1
Required result:
$ cat test1
www.xyz.com/abc-1
and then appending this test1 data to test2:
$ cat test2
www.xyz.com/abc-2
www.xyz.com/abc-3
www.xyz.com/abc-4
www.xyz.com/abc-5
www.xyz.com/abc-6
www.xyz.com/abc-1
Solutions tried:
join -v1 -v2 <(sort test1) <(sort test2)
which produced this (wrong) output:
$ join -v1 -v2 <(sort test1) <(sort test2)
www.xyz.com/abc-1
www.xyz.com/abc-6
Another solution I tried was:
fgrep -vf test1 test2
which printed nothing.
Does this answer your question? Deleting lines from one file which are in another file (CertainPensioner, Oct 17, 2022 at 23:42)
4 Answers
To remove the lines from test1 that also appear in test2:
$ grep -vxFf test2 test1
www.xyz.com/abc-1
To overwrite test1:
grep -vxFf test2 test1 >test1.tmp && mv test1.tmp test1
To append the new test1 to the end of test2:
cat test1 >>test2
The grep options
grep normally prints matching lines. -v tells grep to do the reverse: it prints only the lines that do not match.
-x tells grep to do whole-line matches.
-F tells grep that we are using fixed strings, not regular expressions.
-f test2 tells grep to read those fixed strings, one per line, from file test2.
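The three steps above can be tried end to end on the sample data from the question. The following is a minimal sketch, not part of the original answer: it assumes a POSIX shell and works in a scratch directory created with mktemp so that no real files are touched. One detail worth noting is that grep exits with status 1 when it selects no lines, so if every line of test1 were already in test2, chaining with && would skip the mv and leave test1 unchanged. The sketch therefore treats only a status above 1 as a failure.

```shell
#!/bin/sh
# Work in a scratch directory (an assumption of this sketch) so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data copied from the question.
printf '%s\n' www.xyz.com/abc-2 www.xyz.com/abc-3 www.xyz.com/abc-4 \
              www.xyz.com/abc-5 www.xyz.com/abc-6 > test2
printf '%s\n' www.xyz.com/abc-1 www.xyz.com/abc-2 www.xyz.com/abc-3 \
              www.xyz.com/abc-4 www.xyz.com/abc-5 > test1

# Keep only the lines of test1 that do not appear in test2.
grep -vxFf test2 test1 > test1.tmp
# Status 0 = some lines selected, 1 = none selected, 2 or more = real error.
[ $? -le 1 ] && mv test1.tmp test1

# Append what is left of test1 to test2.
cat test1 >> test2

cat test2
```

With the sample data, test1 ends up holding only www.xyz.com/abc-1, and that line becomes the last line of test2.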
1 Comment
$ grep -vxFf test2 test1 prints nothing for me. No output.
With awk:
% awk 'NR == FNR { a[$0] = 1; next } !a[$0]' test2 test1
www.xyz.com/abc-1
Breakdown:
NR == FNR {   # True only while reading the first file (test2)
a[$0] = 1     # Store the whole line as a key in an associative array
next          # Skip the rest of the program for this line
}
!a[$0]        # Print the lines from test1 that are not in a
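The two-file idiom in the breakdown can be checked on a tiny input. This is a sketch under stated assumptions: the file names seen.txt and candidates.txt are hypothetical stand-ins for test2 and test1, and the variant below uses ($0 in a) instead of !a[$0], which tests membership without adding empty entries to the array as a side effect. One caveat of the NR == FNR test is that it is also true for the second file when the first file is empty, in which case nothing is printed.

```shell
#!/bin/sh
# Scratch directory and file names are assumptions made for this illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' b c d > seen.txt         # plays the role of test2
printf '%s\n' a b c > candidates.txt   # plays the role of test1

# First file: remember every line as an array key.
# Second file: print only the lines that were not remembered.
new_lines=$(awk 'NR == FNR { a[$0]; next } !($0 in a)' seen.txt candidates.txt)

printf '%s\n' "$new_lines"
```

Here only the line a is printed, because b and c are already present in seen.txt.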
Solution to problems 1 and 2:
diff test1 test2 | grep '^<' | sed 's/< \+//g' > test1.tmp && mv test1.tmp test1
Here is the output:
$ cat test1
www.xyz.com/abc-1
Solution to problem 3:
cat test1 >> test2
Here is the output:
$ cat test2
www.xyz.com/abc-2
www.xyz.com/abc-3
www.xyz.com/abc-4
www.xyz.com/abc-5
www.xyz.com/abc-6
www.xyz.com/abc-1
2 Comments
$ cat test1 output is < www.xyz.com/abc-1. Why is this < there?
sed 's/< \+//g' already handles that. Please make sure to keep the files in the diff command in the order shown.
If the lines in each file are unique, as in your sample input, and sorted output is acceptable (your attempted solutions already sort the input files), then this is all you need:
$ sort -u test1 test2
www.xyz.com/abc-1
www.xyz.com/abc-2
www.xyz.com/abc-3
www.xyz.com/abc-4
www.xyz.com/abc-5
www.xyz.com/abc-6
If you need something else then edit your question to clarify your requirements and provide sample input/output that would cause this to break.
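For comparison, the sketch below runs sort -u next to an order-preserving alternative on the sample data. The awk one-liner is not part of this answer; it is a common idiom, shown here on the assumption that the original order of test2 should be kept, as in the required output in the question. It prints each line the first time it is seen, so it also drops duplicates within a single file. The scratch directory and the variable names sorted and ordered are choices made for this illustration.

```shell
#!/bin/sh
# Scratch directory is an assumption of this sketch; no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data copied from the question.
printf '%s\n' www.xyz.com/abc-2 www.xyz.com/abc-3 www.xyz.com/abc-4 \
              www.xyz.com/abc-5 www.xyz.com/abc-6 > test2
printf '%s\n' www.xyz.com/abc-1 www.xyz.com/abc-2 www.xyz.com/abc-3 \
              www.xyz.com/abc-4 www.xyz.com/abc-5 > test1

# Sorted union of both files, duplicates removed.
sorted=$(sort -u test1 test2)

# Order-preserving union: test2 first, then the lines of test1 not yet seen.
ordered=$(awk '!seen[$0]++' test2 test1)

printf 'sort -u:\n%s\n\nawk:\n%s\n' "$sorted" "$ordered"
```

Both results contain the same six lines. With sort -u the line www.xyz.com/abc-1 comes first, while the awk variant leaves it at the end, after the original contents of test2.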