I am aware of diff and of using loops, but I just can't seem to get what I need with diff. I'm basically looking to compare two files (file1.txt and file2.txt) and just get the output of what is missing between them.
Objective 1: Find what is missing in file2.txt from file1.txt
Objective 2: Find what is missing in either file. Some lines may exist in file2.txt that aren't in file1.txt. I'd like to know about them as well.
diff only tells me that the two files aren't the same, comparing them line by line. What I need is a program that goes through the whole file and ignores line positions. If a line containing '/bin/mount' is found on line 2 of file1.txt and on line 59 of file2.txt, then I don't need to know about it. I only want to know what is missing from a file as a whole. Can this be done?
3 Answers
If you don't care about the line order, sort the files first. To see what lines are missing in what file, use comm instead of diff:
comm <(sort file1) <(sort file2)
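If it helps to see what comm is computing, here is a rough Python sketch of the same idea: walk the two sorted inputs in step and put each line into "only in the first", "only in the second", or "in both", which correspond to comm's three output columns. The helper name comm_columns and the sample lines are made up for illustration, and this is not a claim about how comm itself is implemented.

```python
def comm_columns(lines1, lines2):
    """Classify lines into comm-style columns (hypothetical helper)."""
    a, b = sorted(lines1), sorted(lines2)
    only1, only2, both = [], [], []
    i = j = 0
    # Merge step: advance whichever side currently holds the smaller line
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            both.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            only1.append(a[i])
            i += 1
        else:
            only2.append(b[j])
            j += 1
    # Whatever is left on either side has no partner in the other input
    only1.extend(a[i:])
    only2.extend(b[j:])
    return only1, only2, both


# Illustrative data, not taken from the question's real files
file1 = ["/bin/mount", "/bin/ls", "/bin/cat"]
file2 = ["/bin/ls", "/bin/mount", "/usr/bin/awk"]
only1, only2, both = comm_columns(file1, file2)
print(only1, only2, both)
```

Suppressing a column with comm's -1, -2 or -3 options corresponds to ignoring one of the three returned lists.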
What a simple and easy command. Never knew about it. But how can I grep out only the unique entries from whatever column I want? – unixpipe, Aug 31, 2014 at 19:38
@unixpipe: Have you read man comm? – choroba, Aug 31, 2014 at 19:39
I just did now, apologies. I am able to suppress either column. These are great answers. I wish I could accept both answers, because they are both right! – unixpipe, Aug 31, 2014 at 19:44
Objective 1: Find what is missing in file2.txt from file1.txt
With grep:
grep -xvFf file2.txt file1.txt
With comm:
comm -23 <(sort file1.txt) <(sort file2.txt)
With sort and uniq:
sort file2.txt file2.txt file1.txt | uniq -u
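If the check has to live inside a script, the same filter can be sketched in Python. This is only a minimal sketch that assumes both files fit in memory; the helper name missing_from and the sample lines are illustrative. Like grep -xvFf it compares whole lines, and it keeps the order of file1.txt.

```python
def missing_from(reference_lines, candidate_lines):
    """Return the lines of reference_lines that never occur in candidate_lines."""
    seen = set(candidate_lines)  # set lookup keeps the scan linear
    return [line for line in reference_lines if line not in seen]


# Illustrative stand-ins for the contents of file1.txt and file2.txt
file1_lines = ["/bin/mount", "/bin/ls", "/bin/cat"]
file2_lines = ["/bin/cat", "/usr/bin/awk", "/bin/mount"]

missing_in_file2 = missing_from(file1_lines, file2_lines)
print(missing_in_file2)
```

With real files, the two lists could come from something like open('file1.txt').read().splitlines().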
Objective 2: Find what is missing in either file. Some lines may exist in file2.txt that aren't in file1.txt. I'd like to know about them as well.
With grep:
grep -xvFf file1.txt file2.txt; grep -xvFf file2.txt file1.txt
With comm:
comm -3 <(sort file1.txt) <(sort file2.txt) | tr -d '\t'
With sort and uniq:
sort file1.txt file2.txt | uniq -u
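One thing worth checking before relying on the sort and uniq variant: uniq -u keeps only lines that occur exactly once in the combined input, so a line that is repeated inside a single file is dropped even if the other file lacks it. The Python sketch below mimics both behaviours on made-up data; uniq_u and symmetric_difference are hypothetical helper names, not part of any tool.

```python
from collections import Counter


def uniq_u(lines):
    """Mimic `sort | uniq -u`: keep lines that occur exactly once overall."""
    counts = Counter(lines)
    return sorted(line for line, n in counts.items() if n == 1)


def symmetric_difference(lines1, lines2):
    """Lines present in exactly one of the two files, ignoring repeats."""
    return sorted(set(lines1) ^ set(lines2))


# Illustrative data: /bin/ls is repeated inside the first file only
file1_lines = ["/bin/ls", "/bin/ls", "/bin/mount"]
file2_lines = ["/bin/mount", "/usr/bin/awk"]

via_uniq = uniq_u(file1_lines + file2_lines)
via_sets = symmetric_difference(file1_lines, file2_lines)
print(via_uniq)
print(via_sets)
```

If neither file contains repeated lines, the two approaches should agree.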
You guys are the best. – unixpipe, Aug 31, 2014 at 19:43
Here is some simple code to compute the similarity percentage between two files:
import numpy as np

def levenshtein(seq1, seq2):
    # matrix[x, y] holds the edit distance between seq1[:x] and seq2[:y]
    size_x = len(seq1) + 1
    size_y = len(seq2) + 1
    matrix = np.zeros((size_x, size_y))
    # Reaching or leaving the empty prefix costs one edit per character
    for x in range(size_x):
        matrix[x, 0] = x
    for y in range(size_y):
        matrix[0, y] = y

    for x in range(1, size_x):
        for y in range(1, size_y):
            if seq1[x-1] == seq2[y-1]:
                # Characters match: the diagonal step is free
                matrix[x, y] = min(
                    matrix[x-1, y] + 1,
                    matrix[x-1, y-1],
                    matrix[x, y-1] + 1
                )
            else:
                # Characters differ: deletion, substitution or insertion
                matrix[x, y] = min(
                    matrix[x-1, y] + 1,
                    matrix[x-1, y-1] + 1,
                    matrix[x, y-1] + 1
                )
    return matrix[size_x - 1, size_y - 1]

# Read each file and strip newlines and spaces before comparing
with open('original.txt', 'r') as file:
    str1 = file.read().replace('\n', '').replace(' ', '')

with open('target.txt', 'r') as file:
    str2 = file.read().replace('\n', '').replace(' ', '')

# Normalise by the longer string; "or 1" avoids dividing by zero for two empty files
length = max(len(str1), len(str2)) or 1
print(100 - round((levenshtein(str1, str2) / length) * 100, 2), '% Similarity')
Create two files, "original.txt" and "target.txt", with some content in the same directory as the script.
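To sanity-check the idea without creating any files, the sketch below computes the same edit distance in plain Python on two short strings. It is a variant, not the code above: it keeps only two rows of the table instead of a full NumPy matrix, and the names levenshtein_distance and similarity_percent are chosen here for illustration.

```python
def levenshtein_distance(seq1, seq2):
    """Edit distance using two rows of the table (no NumPy needed)."""
    previous = list(range(len(seq2) + 1))
    for x, ch1 in enumerate(seq1, start=1):
        current = [x]
        for y, ch2 in enumerate(seq2, start=1):
            cost = 0 if ch1 == ch2 else 1
            current.append(min(
                previous[y] + 1,         # deletion
                current[y - 1] + 1,      # insertion
                previous[y - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def similarity_percent(str1, str2):
    """Similarity as a percentage, normalised by the longer string."""
    longest = max(len(str1), len(str2))
    if longest == 0:
        return 100.0  # assumption: two empty inputs count as identical
    return 100 - round(levenshtein_distance(str1, str2) / longest * 100, 2)


distance = levenshtein_distance("kitten", "sitting")
print(distance, similarity_percent("kitten", "sitting"))
```

Note that this measures character-level similarity of the whole text, so it answers a different question from the line-based comparisons with comm and grep.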