
In a complex script I am using grep to get matched lines using a pattern file

For example: Here is the file containing text

$ cat file.txt
abc$(SEQ)asdasd
wwww$(SEQ)asqqqqqq
efg hij$(SEQ)asdasdasd$(SEQ)zzzzzz
klmn$(SEQ)11111111
op$(SEQ)44444444
qrs$(SEQ)777
tuv$(SEQ)mmmmmmmmm
qrs$(SEQ)777444
asdsd777hdhfgjdfasd
wxyzfhdfghdfh

and here is the pattern file

$ cat pattern.txt
444
777
asd

I am using the following grep command to get the matched lines

The

On the command line I can see which pattern matched, but not in the logs when the output is logged. So I need a way to print the matched line together with the pattern that matched. The output should look something like this, with the pattern printed after a TAB (or in any recognizable format):

abc$(SEQ)asdasd <TAB> asd
efg hij$(SEQ)asdasdasd$(SEQ)zzzzzz <TAB> asd
op$(SEQ)44444444 <TAB> 444
qrs$(SEQ)777 <TAB> 777
qrs$(SEQ)777444 <TAB> 777444
asdsd777hdhfgjdfasd <TAB> asd777 

I can use grep with -o but I am not able to combine both (i.e. with and without -o) together.

It is not necessary to use grep, I am happy to use any other commands that can accomplish this.

Timur Shtatland
asked Apr 17, 2025 at 3:29
  • You only want to list a pattern once (at the end of the line) regardless of how many times it matches in the line? Commented Apr 17, 2025 at 3:57
  • @markp-fuso, yes, I prefer that: all matched patterns once at the end, preferably after a TAB. Commented Apr 17, 2025 at 4:02
  • Please, do not show text with images. They are not searchable, not copy-paste-able and much heavier than needed. Moreover they affect accessibility negatively. Please copy-paste the text in your question and format it properly, instead. Commented Apr 17, 2025 at 4:31
  • This is a case where a screenshot might actually be preferred, because it shows how the matches are colorized to stand out. Commented Apr 17, 2025 at 5:02
  • Never just use the word "pattern" when talking about pattern matching as it's ambiguous - do you want to do string or regexp pattern matching? See how-do-i-find-the-text-that-matches-a-pattern to learn more about the issue. Commented Apr 17, 2025 at 10:11

3 Answers


One awk idea:

awk '
BEGIN   { sep1 = "\t"; sep2 = "," }                      # predefine our separators; modify as desired
FNR==NR { ptns[$0]; next }                               # 1st file: save each line as a new index in our ptns[] array
        { sfx = ""                                       # 2nd file: reset our suffix
          for (ptn in ptns)                              # loop through the indices (aka patterns) of the ptns[] array
              if (index($0,ptn))                         # if the pattern exists in the current line (ie, index() returns a value > 0) then ...
                  sfx = sfx (sfx == "" ? "" : sep2) ptn  # append the pattern to our suffix
          if (sfx != "")                                 # if the suffix is not blank then we found at least one match so ...
              print $0 sep1 sfx                          # print current line and append the suffix
        }
' pattern.txt file.txt

Alternatively, place the body of the awk script in a file and then access via awk -f ...:

$ cat my_grep.awk
BEGIN   { sep1 = "\t"; sep2 = "," }
FNR==NR { ptns[$0]; next }
        { sfx = ""
          for (ptn in ptns)
              if (index($0,ptn))
                  sfx = sfx (sfx == "" ? "" : sep2) ptn
          if (sfx != "")
              print $0 sep1 sfx
        }
$ awk -f my_grep.awk pattern.txt file.txt

NOTES:

  • assumes lines in pattern.txt do not have any leading/trailing white space, which would cause the index() call to fail
  • (ptn in ptns) does not guarantee the order in which the patterns are processed which means there's no guarantee of the ordering of said patterns when printed at the end of the line; while additional code could be added to address an ordering requirement, OP would need to provide more details to include how to handle duplicate and/or overlapping patterns (eg, a and as would match at the same index() position so which pattern would be considered the actual match?)
  • since index() will only find the 1st occurrence of a pattern, and we make no attempt to match beyond that first match, this approach only tells us that there is at least one match; additional coding would be needed to determine the number of matches but would also require additional details from OP on how to process duplicate and/or overlapping patterns (eg, how many times do 4 and 44 match against 44444444?)

Both approaches generate:

abc$(SEQ)asdasd asd
efg hij$(SEQ)asdasdasd$(SEQ)zzzzzz asd
op$(SEQ)44444444 444
qrs$(SEQ)777 777
qrs$(SEQ)777444 444,777
asdsd777hdhfgjdfasd asd,777
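
If a stable ordering of the appended patterns matters, one option is to keep the patterns in a numerically indexed array so they are always tested in pattern-file order. The following is only a minimal sketch of that idea, not part of the original answer: the function name annotate_matches is hypothetical, and skipping empty and duplicate pattern lines is an assumption about the desired behavior. Like the script above, it does plain string matching with index().

```shell
# Hypothetical helper: annotate_matches PATTERN_FILE INPUT_FILE
# Prints each matching line followed by a TAB and the matched patterns,
# listed in the order they appear in PATTERN_FILE.
annotate_matches() {
    awk '
    BEGIN   { sep1 = "\t"; sep2 = "," }
    FNR==NR {                                   # 1st file: remember patterns in file order
              if ($0 != "" && !($0 in seen)) {  # assumption: skip empty and duplicate patterns
                  seen[$0]
                  ptns[++n] = $0
              }
              next
            }
            {                                   # 2nd file: test patterns in stored order
              sfx = ""
              for (i = 1; i <= n; i++)
                  if (index($0, ptns[i]))
                      sfx = sfx (sfx == "" ? "" : sep2) ptns[i]
              if (sfx != "")
                  print $0 sep1 sfx
            }
    ' "$1" "$2"
}
```

Usage would be annotate_matches pattern.txt file.txt. As with any FNR==NR script, an empty pattern file makes awk treat the input file as the first file, so nothing is printed in that case.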
answered Apr 17, 2025 at 4:17

To do this in Perl, use this:

perl -lne '
BEGIN {
 chomp( @pats = `cat pattern.txt` );
 $pat = join "|", @pats;
}
if ( @matches = m{($pat)}g ) {
 %seen = ();
 @uniq = grep !$seen{$_}++, @matches;
 $uniq = join ",", @uniq;
 print "$_\t$uniq";
}' file.txt

Output:

abc$(SEQ)asdasd asd
efg hij$(SEQ)asdasdasd$(SEQ)zzzzzz asd
op$(SEQ)44444444 444
qrs$(SEQ)777 777
qrs$(SEQ)777444 777,444
asdsd777hdhfgjdfasd asd,777

The Perl "one-liner" uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.

The regex uses this modifier:
g : Match the pattern repeatedly.

chomp( @pats = `cat pattern.txt` );

The above line reads the contents of the pattern.txt file into array @pats and strips the newlines (chomp).

$pat = join "|", @pats; : Joins the patterns into a single string, delimited by | (= OR operator).
@matches = m{($pat)}g : matches the patterns repeatedly (m{...}g) against the current line read from file.txt. All the matches are stored in array @matches (which may contain repetitions of the same pattern, if it occurs more than once).
if ( @matches = ... ) : @matches evaluates to TRUE if there is at least one match.
@uniq = grep !$seen{$_}++, @matches; : Makes matches unique and stores these in @uniq array.
$uniq = join ",", @uniq; : Joins the unique matches on a comma, and stores the result in a single string $uniq.
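
For comparison, roughly the same scan-and-deduplicate steps can be sketched in awk using match(). This is an illustration under assumptions, not a drop-in equivalent of the Perl code: the function name annotate_regex_matches is hypothetical, each pattern line is assumed to be a valid POSIX extended regular expression, and awk's match() picks the leftmost-longest match whereas Perl's alternation takes the first alternative that matches, so results can differ for overlapping patterns. Because the line is consumed piece by piece, anchors such as ^ would also behave differently.

```shell
# Hypothetical helper: annotate_regex_matches PATTERN_FILE INPUT_FILE
# Joins the patterns with "|" and lists each distinct matched text once,
# in order of first appearance in the line.
annotate_regex_matches() {
    awk '
    FNR==NR { if ($0 != "") re = re (re == "" ? "" : "|") $0; next }
    re != "" {
        rest = $0; sfx = ""
        split("", seen)                      # portable way to clear the array
        while (match(rest, re)) {
            if (RLENGTH == 0) break          # guard against patterns that match nothing
            m = substr(rest, RSTART, RLENGTH)
            if (!(m in seen)) {              # keep only the first occurrence
                seen[m]
                sfx = sfx (sfx == "" ? "" : ",") m
            }
            rest = substr(rest, RSTART + RLENGTH)   # continue after this match
        }
        if (sfx != "") print $0 "\t" sfx
    }
    ' "$1" "$2"
}
```

Usage would be annotate_regex_matches pattern.txt file.txt.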

answered Apr 17, 2025 at 18:23


Since any recognizable format is allowed, the simplest solution is to replace the ANSI sequences emitted by grep directly with any separator you want, like a tab:

$ grep --color=always -f pattern.txt file.txt | \
 sed -E 's/(\x1b\[[0-9;]*[A-Za-z])+/\t/g'
abc$(SEQ) asd asd
efg hij$(SEQ) asd asd asd $(SEQ)zzzzzz
op$(SEQ) 444 444 44
qrs$(SEQ) 777
qrs$(SEQ) 777 444
 asd sd 777 hdhfgjdf asd

It's also possible to capture each match between the start and end ANSI sequences. For example, here I wrap each match within []:

$ grep --color=always -f pattern.txt file.txt | \
 sed -E -e 's#\x1b\[m#]\t#g' -e 's#\x1b\[[0-9;]+m#\t[#g'
abc$(SEQ) [asd] [asd]
efg hij$(SEQ) [asd] [asd] [asd] $(SEQ)zzzzzz
op$(SEQ) [444] [444] 44
qrs$(SEQ) [777]
qrs$(SEQ) [777] [444]
 [asd] sd [777] hdhfgjdf [asd]

In the former command \x1b\[[0-9;]*[A-Za-z] is the regex for a general ANSI sequence. In the latter \x1b\[m is the pattern to match the clear format ANSI sequence, and it'll be replaced with ]\t which ends the matched pattern; then any other formatting sequence will be replaced with \t[

To print the patterns at the end of each line, you can use something like this:

# Perl solution to print all the matched sequences
# No deduplication but can be added easily if necessary
$ grep --color=always -f pattern.txt file.txt | while read -r line; do
 printf "%s\t=== Patterns: " "$line"
 echo "$line" | perl -nE 'while (/\x1b\[[0-9;]+m(.*?)\x1b\[m/g) {
 print "$1 "; }; print "\n";'
done
abc$(SEQ)asdasd === Patterns: asd asd
efg hij$(SEQ)asdasdasd$(SEQ)zzzzzz === Patterns: asd asd asd
op$(SEQ)44444444 === Patterns: 444 444
qrs$(SEQ)777 === Patterns: 777
qrs$(SEQ)777444 === Patterns: 777 444
asdsd777hdhfgjdfasd === Patterns: asd 777 asd
# Rough sed solution that doesn't print all multi-patterns correctly,
# but probably enough for human consumption
$ grep --color=always -f pattern.txt file.txt | while read -r line; do
 printf '%s\t=== Patterns: ' "$line" | sed -E 's/\x1b\[[0-9;]*[A-Za-z]//g'
 echo "$line" | sed -E 's/^.*\x1b\[[0-9;]+m(.+)\x1b\[m/\1/g'
done
abc$(SEQ)asdasd === Patterns: asd
efg hij$(SEQ)asdasdasd$(SEQ)zzzzzz === Patterns: asd$(SEQ)zzzzzz
op$(SEQ)44444444 === Patterns: 44444
qrs$(SEQ)777 === Patterns: 777
qrs$(SEQ)777444 === Patterns: 444
asdsd777hdhfgjdfasd === Patterns: asd

If you want to preserve color for use later you can also do that directly

$ grep --color=always -f pattern.txt in.txt > out.txt # Forced color output
$ cat out.txt # Print out colorized result
answered Apr 17, 2025 at 6:55

4 Comments

This is impressive.
@RamananT thanks, I've just added a simple way to print out matches with perl
That is printing the text that matched the pattern, not the pattern which is why in some of the scripts there are duplicate patterns in the output, and if the patterns are intended to be used for regexp matching then the output would be wrong. Also, some of the output includes strings that don't appear in the patterns file, e.g. that last script shows Patterns: asd$(SEQ)zzzzzz in the 2nd line of output and Patterns: 44444 later.
@EdMorton I know. That's because sed has no way to loop through the matches and doesn't support non-greedy matching, so the result is only rough, good enough for human reading. The perl output is much better. Deduplicating the patterns should be easy but I didn't spend time on that
