Comparing two files by lines and removing duplicates from first file

Question 1

Problem:

I need to compare two files,
remove from the first file any lines that also appear in the second,
and then append the remaining lines of the first file to the second.

Illustration by example

Suppose the two files are test1 and test2.

$ cat test2
www.xyz.com/abc-2
www.xyz.com/abc-3
www.xyz.com/abc-4
www.xyz.com/abc-5
www.xyz.com/abc-6

And test1 is

$ cat test1
www.xyz.com/abc-1
www.xyz.com/abc-2
www.xyz.com/abc-3
www.xyz.com/abc-4
www.xyz.com/abc-5

Comparing test1 to test2 and removing the duplicates from test1

Result Required:

$ cat test1
www.xyz.com/abc-1

and then appending the remaining test1 data to test2

$ cat test2
www.xyz.com/abc-2
www.xyz.com/abc-3
www.xyz.com/abc-4
www.xyz.com/abc-5
www.xyz.com/abc-6
www.xyz.com/abc-1

Solutions Tried:

join -v1 -v2 <(sort test1) <(sort test2)

which produced this (the wrong output):

$ join -v1 -v2 <(sort test1) <(sort test2)
www.xyz.com/abc-1
www.xyz.com/abc-6

Another solution I tried was:

fgrep -vf test1 test2

which printed nothing.
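
For reference, the join attempt above prints the unpairable lines from both files because it passes both -v1 and -v2. A minimal sketch, assuming the sample data from the question, that keeps only -v 1 so that just the lines unique to test1 are printed. The temporary directory, the .sorted file names and the variable names are illustrative and not part of the original post.

```shell
#!/bin/sh
# Sketch: contrast "join -v 1" with "join -v 1 -v 2" on the sample data.
LC_ALL=C; export LC_ALL                 # same collation for sort and join
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Recreate the sample files from the question.
printf 'www.xyz.com/abc-%s\n' 1 2 3 4 5 > test1
printf 'www.xyz.com/abc-%s\n' 2 3 4 5 6 > test2

# join needs sorted input; plain files are used instead of <(...) so this runs in sh.
sort test1 > test1.sorted
sort test2 > test2.sorted

# -v 1 alone: lines of the first file that have no match in the second.
only_in_test1=$(join -v 1 test1.sorted test2.sorted)

# -v 1 -v 2 together: lines that appear in exactly one of the two files.
in_one_file_only=$(join -v 1 -v 2 test1.sorted test2.sorted)

printf 'only in test1:\n%s\n' "$only_in_test1"
printf 'in exactly one file:\n%s\n' "$in_one_file_only"
```

Because the sample lines contain no whitespace, join treats each whole line as the join field. Lines with spaces would need a different tool.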

Question 2

Does this answer your question? Deleting lines from one file which are in another file

Question 3

Remove lines from test1 because they are in test2:

$ grep -vxFf test2 test1
www.xyz.com/abc-1

To overwrite test1:

grep -vxFf test2 test1 >test1.tmp && mv test1.tmp test1

To append the new test1 to the end of test2:

cat test1 >>test2

The grep options

grep normally prints matching lines. -v tells grep to do the reverse: it prints only lines that do not match.

-x tells grep to do whole-line matches.

-F tells grep that we are using fixed strings, not regular expressions.

-f test2 tells grep to read those fixed strings, one per line, from file test2.
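
The steps above can be put together as a small script. This is a sketch under stated assumptions: it recreates the sample files in a temporary directory (an illustrative choice), and it checks the exit status of grep explicitly. grep exits with 1 when it selects no lines, so with a plain && the mv step would be skipped whenever every line of test1 is already in test2, leaving test1 unchanged.

```shell
#!/bin/sh
# Sketch: remove from test1 every line that is also in test2, then append to test2.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Recreate the sample files from the question.
printf 'www.xyz.com/abc-%s\n' 1 2 3 4 5 > test1
printf 'www.xyz.com/abc-%s\n' 2 3 4 5 6 > test2

# Keep only the lines of test1 that are not whole-line matches for a line of test2.
grep -vxFf test2 test1 > test1.tmp
status=$?

# grep exit status: 0 = lines selected, 1 = no lines selected, >1 = error.
if [ "$status" -le 1 ]; then
    mv test1.tmp test1
else
    rm -f test1.tmp
    echo "grep failed with status $status" >&2
    exit "$status"
fi

# Append what is left of test1 to the end of test2.
cat test1 >> test2

echo "test1 now contains:"; cat test1
echo "test2 now contains:"; cat test2
```

On the sample data, test1 ends up holding only www.xyz.com/abc-1, and that line becomes the last line of test2.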

Question 4

Running grep -vxFf test2 test1 gives me nothing. There is no output.

Question 5

With awk:

% awk 'NR == FNR{ a[$0] = 1;next } !a[$0]' test2 test1
www.xyz.com/abc-1

Breakdown:

NR == FNR { # True only while reading the first file, test2
 a[$0] = 1 # Store the whole line as a key in the associative array
 next # Skip the remaining rule for this line
}
!a[$0] # Print lines from test1 that are not in a
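
The awk one-liner above only prints the result. A minimal sketch of the full workflow built on it, assuming the sample data from the question. The temporary directory and the array name seen are illustrative choices.

```shell
#!/bin/sh
# Sketch: awk variant of "filter test1 against test2, then append to test2".
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Recreate the sample files from the question.
printf 'www.xyz.com/abc-%s\n' 1 2 3 4 5 > test1
printf 'www.xyz.com/abc-%s\n' 2 3 4 5 6 > test2

# First file (test2): remember every line. Second file (test1): print unseen lines.
# Caveat: if test2 is empty, NR == FNR is also true for test1 and nothing is printed.
awk 'NR == FNR { seen[$0] = 1; next } !seen[$0]' test2 test1 > test1.tmp &&
    mv test1.tmp test1

# Append the filtered test1 to test2.
cat test1 >> test2

echo "test1 now contains:"; cat test1
echo "test2 now contains:"; cat test2
```

Unlike grep, awk exits with status 0 even when it prints nothing, so the && before mv is safe here.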

Question 6

Solution to steps 1 and 2 of the problem:

diff test1 test2 | grep "<" | sed 's/< \+//g' > test1.tmp && mv test1.tmp test1

Here is the output:

$ cat test1
www.xyz.com/abc-1

Solution to step 3 of the problem:

cat test1 >> test2

Here is the output:

$ cat test2
www.xyz.com/abc-2
www.xyz.com/abc-3
www.xyz.com/abc-4
www.xyz.com/abc-5
www.xyz.com/abc-6
www.xyz.com/abc-1

Question 7

The output of cat test1 is "< www.xyz.com/abc-1". Why is the < there?

Question 8

I tested this in bash. Which shell are you using? The sed 's/< \+//g' step already strips the < prefix. Please make sure to keep the files in the same order as shown in the diff command.
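
One possible explanation for the stray <, which the thread does not confirm: \+ is a GNU extension to basic regular expressions, so a sed without that extension may not match the prefix and would leave it in place. A sketch that avoids \+ and lets sed do the filtering as well, assuming the sample data from the question. The temporary directory and the variable name unique are illustrative.

```shell
#!/bin/sh
# Sketch: extract the lines diff marks as "only in test1" without using \+.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Recreate the sample files from the question.
printf 'www.xyz.com/abc-%s\n' 1 2 3 4 5 > test1
printf 'www.xyz.com/abc-%s\n' 2 3 4 5 6 > test2

# diff prefixes lines found only in its first argument with "< ".
# -n suppresses default printing; the p flag prints only lines where the
# substitution succeeded, so no separate grep is needed.
unique=$(diff test1 test2 | sed -n 's/^< //p')

printf '%s\n' "$unique"
```

This relies on diff seeing the shared lines in the same order in both files, as they are in the sample. For files in arbitrary order, a whole-line comparison with grep or awk is the safer choice.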

Question 9

If the lines in each file are unique, as in your sample input, and sorted output is acceptable (your attempted solutions already sort the input files), then this is all you need:

$ sort -u test1 test2
www.xyz.com/abc-1
www.xyz.com/abc-2
www.xyz.com/abc-3
www.xyz.com/abc-4
www.xyz.com/abc-5
www.xyz.com/abc-6

If you need something else then edit your question to clarify your requirements and provide sample input/output that would cause this to break.
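
If the sorted union is acceptable, it can also be written straight back to test2. A minimal sketch, assuming the sample data from the question. The -o option of sort names the output file, and POSIX allows it to be one of the input files. The temporary directory is an illustrative choice.

```shell
#!/bin/sh
# Sketch: replace test2 with the sorted, de-duplicated union of both files.
LC_ALL=C; export LC_ALL                 # predictable sort order
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Recreate the sample files from the question.
printf 'www.xyz.com/abc-%s\n' 1 2 3 4 5 > test1
printf 'www.xyz.com/abc-%s\n' 2 3 4 5 6 > test2

# -u keeps one copy of each line; -o test2 writes the result back to test2.
sort -u -o test2 test1 test2

cat test2
```

This changes the line order of test2 and leaves test1 untouched, so it does not reproduce the exact layout requested in the question.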

Question 10

I guess you didn't read the question properly. I want to remove the duplicates from the test1 file and then append what remains to the test2 file.

Question 11

I read it perfectly, but people often ask for A when they actually want B, and your question sounds like you are describing what you think are the steps required to solve a problem, not the problem itself. Why do you care where the lines from each file end up as long as the result is the unique set of lines from both files?

John1024 · Accepted Answer · 2016-05-28 19:59:05Z

