
I am trying to compare two files and produce an output file containing the names common to both.

File1

1990.A.BHT.s_fil 4.70 
1991.H.BHT.s_fil 2.34 
1992.O.BHT.s_fil 3.67 
1993.C.BHT.s_fil -1.50
1994.I.BHT.s_fil -3.29 
1995.K.BHT.s_fil -4.01

File2

1990.A.BHT_ScS.dat 1537 -2.21
1993.C.BHT_ScS.dat 1494 1.13
1994.I.BHT_ScS.dat 1545 0.15
1995.K.BHT_ScS.dat 1624 1.15 

I want to compare the first part of each name (e.g. 1990.A.BHT) across the two files and write the file1 lines whose names appear in both files, including their second-column values, to file3.

ex: file3 (output)

1990.A.BHT.s_fil 4.70 
1993.C.BHT.s_fil -1.50
1994.I.BHT.s_fil -3.29
1995.K.BHT.s_fil -4.01

I tried the following code, which uses the grep command:

while read line 
do
grep $line file1 >> file3
done < file2

and

grep -wf file1 file2 > file3 

I sorted the files before running these commands, but I get an empty file3. Can someone help me with this, please?

Barmar
asked Jun 10, 2021 at 22:45
  • None of the lines in the two files match. file1 has 2 columns, file2 has 3 columns. And the first column in file1 ends with .s_fil while the corresponding names in file2 end with .dat. Commented Jun 10, 2021 at 23:14
  • Do you just want to compare the names without those suffixes? None of the code you've written removes the suffixes and just compares that field. Commented Jun 10, 2021 at 23:15
  • All the tools you've tried to use require exact matches of the lines, it doesn't seem like you've made any attempt to check just the part you care about. Commented Jun 10, 2021 at 23:16
  • @Barmar Yes I want to compare names without the suffixes and output file1 values to file3 Commented Jun 10, 2021 at 23:17
  • So what have you tried? The code you posted is clearly not an attempt to do that. Commented Jun 10, 2021 at 23:19

2 Answers

1

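You need to remove everything starting from _ScS.dat from the lines in file2 (note the mixed case, as in your sample data). Then you can use what is left as a list of patterns to match lines in file1.

grep -F -f <(sed 's/_ScS\.dat.*//' file2) file1 > file3

The -F option matches fixed strings rather than treating them as regular expressions, so the dots in the names are matched literally. The <(...) process substitution requires bash or a similar shell.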
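
Note that grep -F will match a name anywhere in the line. If you would rather compare only the name part of the first column, here is a rough awk sketch of the same idea. It assumes the suffixes are always exactly .s_fil in file1 and _ScS.dat in file2, as in your sample; the temp directory and the names key and seen are only for illustration.

```shell
# Sample data copied from the question; the temp directory is only for this demo.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > file1 <<'EOF'
1990.A.BHT.s_fil 4.70
1991.H.BHT.s_fil 2.34
1992.O.BHT.s_fil 3.67
1993.C.BHT.s_fil -1.50
1994.I.BHT.s_fil -3.29
1995.K.BHT.s_fil -4.01
EOF

cat > file2 <<'EOF'
1990.A.BHT_ScS.dat 1537 -2.21
1993.C.BHT_ScS.dat 1494 1.13
1994.I.BHT_ScS.dat 1545 0.15
1995.K.BHT_ScS.dat 1624 1.15
EOF

# First pass (file2): remember each bare name, e.g. 1990.A.BHT.
# Second pass (file1): print the line if its bare name was seen in file2.
awk '
    NR == FNR { key = $1; sub(/_ScS\.dat$/, "", key); seen[key] = 1; next }
    { key = $1; sub(/\.s_fil$/, "", key); if (key in seen) print }
' file2 file1 > file3

cat file3
```

Neither file needs to be sorted for this, and the output keeps the order of file1.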

answered Jun 10, 2021 at 23:27

1 Comment

@Monika Accept the answer if it solved your problem.
-1

In your example data, the lines appear to be in sorted order. If you can guarantee that they always are, comm -1 -2 file1 file2 would do the job. If they may be unsorted, sort them first:

comm -1 -2 <(sort file1) <(sort file2)
answered Jun 11, 2021 at 8:01

1 Comment

How can this possibly work when the lines don't actually match, only the beginning of the first fields match? This is the same mistake the OP made.
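
For what it's worth, the sort-and-compare idea can be made to work if both files are first reduced to the bare name, so that there is a common field to compare. The following is only a sketch under that assumption: it uses join rather than comm so the file1 values can be carried along, it assumes the .s_fil and _ScS.dat suffixes from the sample data, and keyed1 and keys2 are hypothetical helper file names.

```shell
# Sample data copied from the question; the temp directory is only for this demo.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > file1 <<'EOF'
1990.A.BHT.s_fil 4.70
1991.H.BHT.s_fil 2.34
1992.O.BHT.s_fil 3.67
1993.C.BHT.s_fil -1.50
1994.I.BHT.s_fil -3.29
1995.K.BHT.s_fil -4.01
EOF

cat > file2 <<'EOF'
1990.A.BHT_ScS.dat 1537 -2.21
1993.C.BHT_ScS.dat 1494 1.13
1994.I.BHT_ScS.dat 1545 0.15
1995.K.BHT_ScS.dat 1624 1.15
EOF

# Prefix each file1 line with its bare name so the name becomes the join key.
awk '{ key = $1; sub(/\.s_fil$/, "", key); print key, $0 }' file1 |
    LC_ALL=C sort -k1,1 > keyed1

# Reduce file2 to its bare names only.
sed 's/_ScS\.dat.*//' file2 | LC_ALL=C sort -u > keys2

# Join on the bare name, then drop the helper key column again.
LC_ALL=C join keyed1 keys2 | cut -d ' ' -f 2- > file3

cat file3
```

Both inputs to join are sorted under the same locale (LC_ALL=C here), since join expects its inputs ordered on the join field.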
