Print matched pattern in a file along with matched lines

Question 1

In a complex script, I am using grep with a pattern file to get the matching lines.

For example, here is the file containing the text:

$ cat file.txt
abc$(SEQ)asdasd
wwww$(SEQ)asqqqqqq
efg hij$(SEQ)asdasdasd$(SEQ)zzzzzz
klmn$(SEQ)11111111
op$(SEQ)44444444
qrs$(SEQ)777
tuv$(SEQ)mmmmmmmmm
qrs$(SEQ)777444
asdsd777hdhfgjdfasd
wxyzfhdfghdfh

and here is the pattern file

$ cat pattern.txt
444
777
asd

I am using the following grep command to get the matched lines

On the command line I can see which pattern matched, but not in the logs when the output is logged. So I need a way to print the matched line and the pattern that matched. The output should look something like this, with the pattern printed after a TAB (or in any recognizable format):

abc$(SEQ)asdasd <TAB> asd
efg hij$(SEQ)asdasdasd$(SEQ)zzzzzz <TAB> asd
op$(SEQ)44444444 <TAB> 444
qrs$(SEQ)777 <TAB> 777
qrs$(SEQ)777444 <TAB> 777444
asdsd777hdhfgjdfasd <TAB> asd777

I can use grep with -o but I am not able to combine both (i.e. with and without -o) together.

It is not necessary to use grep, I am happy to use any other commands that can accomplish this.

Question 2

Do you only want to list each pattern once (at the end of the line), regardless of how many times it matches in the line?

Question 3

@markp-fuso, yes, I prefer that: all matched patterns once at the end, preferably after a TAB.

Question 4

Please, do not show text with images. They are not searchable, not copy-paste-able and much heavier than needed. Moreover they affect accessibility negatively. Please copy-paste the text in your question and format it properly, instead.

Question 5

This is a case where a screenshot might actually be preferred, because it shows how the matches are colorized to stand out.

Question 6

Never just use the word "pattern" when talking about pattern matching, as it's ambiguous: do you want to do string or regexp pattern matching? See how-do-i-find-the-text-that-matches-a-pattern to learn more about the issue.

Question 7

One awk idea:

awk '
BEGIN   { sep1 = "\t"; sep2 = "," }                       # predefine our separators; modify as desired
FNR==NR { ptns[$0]; next }                                # 1st file: save each line as a new index in our ptns[] array
        { sfx = ""                                        # 2nd file: reset our suffix
          for (ptn in ptns)                               # loop through the indices (aka patterns) of the ptns[] array
              if (index($0,ptn))                          # if the pattern exists in the current line (ie, index() returns a value > 0) then ...
                  sfx = sfx (sfx == "" ? "" : sep2) ptn   # append the pattern to our suffix
          if (sfx != "")                                  # if the suffix is not blank then we found at least one match so ...
              print $0 sep1 sfx                           # print current line and append the suffix
        }
' pattern.txt file.txt

Alternatively, place the body of the awk script in a file and then access via awk -f ...:

$ cat my_grep.awk
BEGIN   { sep1 = "\t"; sep2 = "," }
FNR==NR { ptns[$0]; next }
        { sfx = ""
          for (ptn in ptns)
              if (index($0,ptn))
                  sfx = sfx (sfx == "" ? "" : sep2) ptn
          if (sfx != "")
              print $0 sep1 sfx
        }
$ awk -f my_grep.awk pattern.txt file.txt

NOTES:

assumes lines in pattern.txt do not have any leading/trailing white space, which would cause the index() call to fail
(ptn in ptns) does not guarantee the order in which the patterns are processed, so there is no guarantee of the ordering of the patterns printed at the end of the line. Additional code could address an ordering requirement, but OP would need to provide more details, including how to handle duplicate and/or overlapping patterns (eg, a and as would match at the same index() position, so which pattern would be considered the actual match?)
since index() only finds the 1st occurrence of a pattern, and we make no attempt to match beyond it, this approach only tells us that there is at least one match. Determining the number of matches would need additional code, plus additional details from OP on how to process duplicate and/or overlapping patterns (eg, how many times do 4 and 44 match against 44444444?)

Both approaches generate:

abc$(SEQ)asdasd asd
efg hij$(SEQ)asdasdasd$(SEQ)zzzzzz asd
op$(SEQ)44444444 444
qrs$(SEQ)777 777
qrs$(SEQ)777444 444,777
asdsd777hdhfgjdfasd asd,777
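
The ordering and white-space notes above could be handled along these lines. This is only a sketch, not part of the original answer: it keeps the patterns in a numerically indexed array (ptns[1..n], with a helper seen[] array; both names are chosen here for illustration) so they are reported in pattern-file order, trims leading/trailing blanks and tabs, and skips empty or duplicate pattern lines. The scratch directory and the shortened sample data are assumptions made so the snippet runs on its own.

```shell
# Sample data taken from the question, written to a scratch directory (assumption).
tmp=$(mktemp -d)
printf '%s\n' '444' '777' 'asd' > "$tmp/pattern.txt"
printf '%s\n' 'abc$(SEQ)asdasd' 'klmn$(SEQ)11111111' \
              'qrs$(SEQ)777444' 'asdsd777hdhfgjdfasd' > "$tmp/file.txt"

ordered=$(awk '
BEGIN   { sep1 = "\t"; sep2 = "," }
FNR==NR { gsub(/^[ \t]+|[ \t]+$/, "")            # trim leading/trailing blanks and tabs
          if ($0 != "" && !($0 in seen)) {       # skip empty and duplicate pattern lines
              seen[$0]
              ptns[++n] = $0                     # remember the order of first appearance
          }
          next
        }
        { sfx = ""
          for (i = 1; i <= n; i++)               # walk the patterns in pattern-file order
              if (index($0, ptns[i]))
                  sfx = sfx (sfx == "" ? "" : sep2) ptns[i]
          if (sfx != "")
              print $0 sep1 sfx
        }
' "$tmp/pattern.txt" "$tmp/file.txt")

printf '%s\n' "$ordered"
rm -rf "$tmp"
```

With the patterns listed as 444, 777, asd, the last two sample lines get the suffixes 444,777 and 777,asd, ie, pattern-file order rather than the order in which the matches appear in the line.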

Question 8

To do this in Perl, use this:

perl -lne '
BEGIN {
 chomp( @pats = `cat pattern.txt` );
 $pat = join "|", @pats;
}
if ( @matches = m{($pat)}g ) {
 %seen = ();
 @uniq = grep !$seen{$_}++, @matches;
 $uniq = join ",", @uniq;
 print "$_\t$uniq";
}' file.txt

Output:

abc$(SEQ)asdasd asd
efg hij$(SEQ)asdasdasd$(SEQ)zzzzzz asd
op$(SEQ)44444444 444
qrs$(SEQ)777 777
qrs$(SEQ)777444 777,444
asdsd777hdhfgjdfasd asd,777

The Perl "one-liner" uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.

The regex uses this modifier:
g : Match the pattern repeatedly.

chomp( @pats = `cat pattern.txt` );

The above line reads the contents of the pattern.txt file into array @pats and strips the newlines (chomp).

$pat = join "|", @pats; : Joins the patterns into a single string, delimited by | (= OR operator).
@matches = m{($pat)}g : matches the patterns repeatedly (m{...}g) against the current line read from file.txt. All the matches are stored in array @matches (which may contain repetitions of the same pattern, if it occurs more than once).
if ( @matches = ... ) : @matches evaluates to TRUE if there is at least one match.
@uniq = grep !$seen{$_}++, @matches; : Makes matches unique and stores these in @uniq array.
$uniq = join ",", @uniq; : Joins the unique matches on a comma, and stores the result in a single string $uniq.

# Perl solution to print all the matched sequences
# No deduplication but can be added easily if necessary
$ grep --color=always -f pattern.txt file.txt | while read -r line; do
    printf "%s\t=== Patterns: " "$line"
    echo "$line" | perl -nE 'while (/\x1b\[[0-9;]+m(.*?)\x1b\[m/g) {
        print "$1 "; }; print "\n";'
done
abc$(SEQ)asdasd === Patterns: asd asd
efg hij$(SEQ)asdasdasd$(SEQ)zzzzzz === Patterns: asd asd asd
op$(SEQ)44444444 === Patterns: 444 444
qrs$(SEQ)777 === Patterns: 777
qrs$(SEQ)777444 === Patterns: 777 444
asdsd777hdhfgjdfasd === Patterns: asd 777 asd
# Rough sed solution that doesn't print all multi-patterns correctly,
# but probably enough for human consumption
$ grep --color=always -f pattern.txt file.txt | while read line; do \
    printf "$line\t=== Patterns: " | sed -E 's/\x1b\[[0-9;]*[A-Za-z]//g'
    echo "$line" | sed -E 's/^.*\x1b\[[0-9;]+m(.+)\x1b\[m/\1/g'
done
abc$(SEQ)asdasd === Patterns: asd
efg hij$(SEQ)asdasdasd$(SEQ)zzzzzz === Patterns: asd$(SEQ)zzzzzz
op$(SEQ)44444444 === Patterns: 44444
qrs$(SEQ)777 === Patterns: 777
qrs$(SEQ)777444 === Patterns: 444
asdsd777hdhfgjdfasd === Patterns: asd

If you want to preserve the color for later use, you can also do that directly:

$ grep --color=always -f pattern.txt in.txt > out.txt # Forced color output
$ cat out.txt # Print out colorized result
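
For completeness, the plain and -o forms of grep mentioned in the question can be combined by running grep -o a second time on each matched line, without parsing color codes. A rough sketch, assuming the patterns are fixed strings (hence -F, which the commands above do not use) and using a scratch directory with shortened sample data. It starts one extra grep per matched line, so it is likely slower than the awk or perl answers on large inputs.

```shell
# Scratch directory with a subset of the sample data from the question (assumption).
tmp=$(mktemp -d)
printf '%s\n' '444' '777' 'asd' > "$tmp/pattern.txt"
printf '%s\n' 'op$(SEQ)44444444' 'tuv$(SEQ)mmmmmmmmm' \
              'qrs$(SEQ)777444' 'asdsd777hdhfgjdfasd' > "$tmp/file.txt"

# First pass selects the matching lines. Second pass (-o) lists every match in
# that line, awk drops repeats, and paste joins what is left with commas.
combined=$(
    grep -F -f "$tmp/pattern.txt" "$tmp/file.txt" |
    while IFS= read -r line; do
        matched=$(printf '%s\n' "$line" |
                  grep -oF -f "$tmp/pattern.txt" |
                  awk '!seen[$0]++' |
                  paste -sd, -)
        printf '%s\t%s\n' "$line" "$matched"
    done
)

printf '%s\n' "$combined"
rm -rf "$tmp"
```

As with the other grep-based scripts, this reports the text that matched, in match order; with fixed-string patterns that text is the same as the pattern itself.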

Question 10

This is impressive.

Question 11

@RamananT thanks, I've just added a simple way to print out matches with perl

Question 12

That is printing the text that matched the pattern, not the pattern which is why in some of the scripts there are duplicate patterns in the output, and if the patterns are intended to be used for regexp matching then the output would be wrong. Also, some of the output includes strings that don't appear in the patterns file, e.g. that last script shows Patterns: asd$(SEQ)zzzzzz in the 2nd line of output and Patterns: 44444 later.

Question 13

@EdMorton I know. sed doesn't have a way to loop through the matches and doesn't support non-greedy matching, so the result is rough but good enough for human reading. The perl output is much better. Deduplicating the patterns should be easy, but I didn't spend time on that.

Accepted Answer (the awk solution) · markp-fuso · 2025-04-17 04:17:02Z

