Compare two files and output the differences

Question 1

I am aware of diff and of using loops, but I just can't seem to get what I need with diff. I'm basically looking to compare two files (file1.txt and file2.txt) and just get the output of what is missing between them.

Objective 1: Find what is missing in file2.txt from file1.txt

Objective 2: Find what is missing in either file. Some lines may exist in file2.txt that aren't in file1.txt. I'd like to know about them as well.

diff only tells me that the two files aren't the same, comparing them line by line. What I need is a program that goes through the whole file and doesn't care about line positions. If a line containing '/bin/mount' is on line 2 of file1.txt and on line 59 of file2.txt, then I don't need to know about it. I only want to know what isn't there at all. Can this be done?

Answer 1 · choroba · 2014-08-31 19:30:19Z

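If you don't care about the line order, sort the files first. To see which lines are missing from which file, use comm instead of diff. It prints three columns: lines only in the first file, lines only in the second file, and lines common to both.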

comm <(sort file1) <(sort file2)
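
For readers who want to see what comm computes, here is a rough Python sketch of its three output columns. The helper name comm_columns and the sample lines are made up for illustration. It treats each file as a set of lines, so, unlike the real comm, it collapses repeated lines.

```python
def comm_columns(lines1, lines2):
    """Return (only_in_1, only_in_2, in_both) as sorted lists.

    Hypothetical helper: mimics the three columns printed by comm,
    but treats each input as a set, so duplicates are collapsed.
    """
    set1, set2 = set(lines1), set(lines2)
    only_in_1 = sorted(set1 - set2)   # column 1: lines unique to the first file
    only_in_2 = sorted(set2 - set1)   # column 2: lines unique to the second file
    in_both = sorted(set1 & set2)     # column 3: lines common to both files
    return only_in_1, only_in_2, in_both


# Made-up sample data: the line order differs between the two "files".
file1 = ["/bin/mount", "/bin/ls", "/bin/cat"]
file2 = ["/bin/cat", "/usr/bin/vim", "/bin/mount"]

col1, col2, col3 = comm_columns(file1, file2)
print(col1)
print(col2)
print(col3)
```

The -1, -2 and -3 options of comm suppress the corresponding column, which is how a single column is selected on the command line.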

Comments:

unixpipe (2014-08-31 19:38:45Z): What a simple and easy command. I never knew about it. But how can I pick out only the unique entries from whichever column I want?

choroba (2014-08-31 19:39:31Z): @unixpipe: Have you read man comm?

unixpipe (2014-08-31 19:44:48Z): I just did, apologies. I am able to suppress either column. These are great answers. I wish I could accept both answers, because they are both right!

Answer 2 · cuonglm · 2014-08-31 19:41:14Z

Objective 1: Find what is missing in file2.txt from file1.txt

With grep:

grep -xvFf file2.txt file1.txt

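With comm (-2 and -3 suppress the lines unique to file2.txt and the lines common to both, leaving only the lines unique to file1.txt):

comm -23 <(sort file1.txt) <(sort file2.txt)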

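With sort and uniq (file2.txt is listed twice so that none of its lines can survive uniq -u; a line repeated within file1.txt is dropped as well):

sort file2.txt file2.txt file1.txt | uniq -u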

Objective 2: Find what is missing in either file. Some lines may exist in file2.txt that aren't in file1.txt. I'd like to know about them as well.

With grep:

grep -xvFf file1.txt file2.txt; grep -xvFf file2.txt file1.txt

With comm:

comm -3 <(sort file1.txt) <(sort file2.txt) | tr -d '\t'

With sort and uniq:

sort file1.txt file2.txt | uniq -u
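
The same two objectives can also be sketched in Python with set operations. This is only an illustration under some assumptions: the function names are invented here, lines are compared whole (as with grep -x), and repeated lines within a file are collapsed. The demo writes made-up contents to temporary files so that it runs on its own.

```python
import os
import tempfile


def read_lines(path):
    """Return the set of lines in a file, without trailing newlines."""
    with open(path, "r") as handle:
        return {line.rstrip("\n") for line in handle}


def missing_from_second(path1, path2):
    """Objective 1: lines present in path1 but absent from path2."""
    return sorted(read_lines(path1) - read_lines(path2))


def missing_from_either(path1, path2):
    """Objective 2: lines present in exactly one of the two files."""
    return sorted(read_lines(path1) ^ read_lines(path2))


# Demo with made-up contents written to temporary files.
with tempfile.TemporaryDirectory() as tmp:
    p1 = os.path.join(tmp, "file1.txt")
    p2 = os.path.join(tmp, "file2.txt")
    with open(p1, "w") as f:
        f.write("/bin/mount\n/bin/ls\n/bin/cat\n")
    with open(p2, "w") as f:
        f.write("/bin/cat\n/usr/bin/vim\n/bin/mount\n")
    objective1 = missing_from_second(p1, p2)
    objective2 = missing_from_either(p1, p2)

print(objective1)
print(objective2)
```

Here '/bin/mount' sits on different lines of the two files and is not reported, which is the behaviour the question asks for.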

Comments:

unixpipe (2014-08-31 19:43:39Z): You guys are the best

Answer 3 · Navneet Singh · 2020-08-16 17:35:23Z

Here is some simple code that computes a similarity percentage between two files, based on the Levenshtein (edit) distance between their contents:

import numpy as np


def levenshtein(seq1, seq2):
    """Return the Levenshtein (edit) distance between two sequences."""
    size_x = len(seq1) + 1
    size_y = len(seq2) + 1
    # matrix[x, y] holds the distance between seq1[:x] and seq2[:y].
    matrix = np.zeros((size_x, size_y), dtype=int)
    for x in range(size_x):
        matrix[x, 0] = x
    for y in range(size_y):
        matrix[0, y] = y
    for x in range(1, size_x):
        for y in range(1, size_y):
            # Substitution is free when the two characters match.
            cost = 0 if seq1[x - 1] == seq2[y - 1] else 1
            matrix[x, y] = min(
                matrix[x - 1, y] + 1,         # deletion
                matrix[x - 1, y - 1] + cost,  # match or substitution
                matrix[x, y - 1] + 1,         # insertion
            )
    return int(matrix[size_x - 1, size_y - 1])


def read_compact(path):
    """Read a file and drop newlines and spaces before comparing."""
    with open(path, 'r') as file:
        return file.read().replace('\n', '').replace(' ', '')


str1 = read_compact('original.txt')
str2 = read_compact('target.txt')
# Normalise by the longer string; two empty files count as identical.
length = max(len(str1), len(str2))
similarity = 100.0 if length == 0 else 100 - levenshtein(str1, str2) / length * 100
print(round(similarity, 2), '% Similarity')

To run it, create two files, "original.txt" and "target.txt", with some content, in the same directory as the script.
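
The full matrix above needs memory proportional to the product of the two file lengths, which can become large for big files. As a possible alternative, here is a sketch of the same distance that keeps only two rows and needs no NumPy. The names levenshtein_two_rows and similarity_percent are invented for this example and are not part of the answer's code.

```python
def levenshtein_two_rows(seq1, seq2):
    """Edit distance using two rows instead of a full matrix."""
    previous = list(range(len(seq2) + 1))  # distances from "" to seq2[:y]
    for x, char1 in enumerate(seq1, start=1):
        current = [x]  # distance from seq1[:x] to ""
        for y, char2 in enumerate(seq2, start=1):
            cost = 0 if char1 == char2 else 1
            current.append(min(
                previous[y] + 1,         # deletion
                current[y - 1] + 1,      # insertion
                previous[y - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def similarity_percent(str1, str2):
    """Similarity as 100 * (1 - distance / longer length); 100.0 if both are empty."""
    length = max(len(str1), len(str2))
    if length == 0:
        return 100.0
    return round(100 - levenshtein_two_rows(str1, str2) / length * 100, 2)


distance = levenshtein_two_rows("kitten", "sitting")
percent = similarity_percent("kitten", "sitting")
print(distance, percent)
```

For the classic pair "kitten" and "sitting" the distance is 3: two substitutions and one insertion.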
