In a complex script I am using grep to get the lines that match any entry in a pattern file.
For example, here is the file containing the text:
$ cat file.txt
abc$(SEQ)asdasd
wwww$(SEQ)asqqqqqq
efg hij$(SEQ)asdasdasd$(SEQ)zzzzzz
klmn$(SEQ)11111111
op$(SEQ)44444444
qrs$(SEQ)777
tuv$(SEQ)mmmmmmmmm
qrs$(SEQ)777444
asdsd777hdhfgjdfasd
wxyzfhdfghdfh
And here is the pattern file:
$ cat pattern.txt
444
777
asd
I am using the following grep command to get the matched lines
On the command line I can see which pattern matched, but not in the logs once the output is logged. So I need a way to print the matched line together with the pattern that matched. The output should look something like this, with the pattern printed after a TAB (or in any recognizable format):
abc$(SEQ)asdasd <TAB> asd
efg hij$(SEQ)asdasdasd$(SEQ)zzzzzz <TAB> asd
op$(SEQ)44444444 <TAB> 444
qrs$(SEQ)777 <TAB> 777
qrs$(SEQ)777444 <TAB> 777444
asdsd777hdhfgjdfasd <TAB> asd777
I can use grep with -o but I am not able to combine both (i.e. with and without -o) together.
It is not necessary to use grep, I am happy to use any other commands that can accomplish this.
- You only want to list a pattern once (at the end of the line) regardless of how many times it matches in the line? – markp-fuso, Apr 17, 2025 at 3:57
- @markp-fuso, yes, I prefer that: all matched patterns once at the end, preferably after a TAB. – Ramanan T, Apr 17, 2025 at 4:02
- Please do not show text with images. They are not searchable, not copy-paste-able and much heavier than needed. Moreover, they affect accessibility negatively. Please copy-paste the text into your question and format it properly instead. – Renaud Pacalet, Apr 17, 2025 at 4:31
- This is a case where a screenshot might actually be preferred, because it shows how the matches are colorized to stand out. – Shawn, Apr 17, 2025 at 5:02
- Never just use the word "pattern" when talking about pattern matching, as it's ambiguous: do you want to do string or regexp pattern matching? See how-do-i-find-the-text-that-matches-a-pattern to learn more about the issue. – Ed Morton, Apr 17, 2025 at 10:11
3 Answers
One awk idea:
awk '
BEGIN   { sep1 = "\t"; sep2 = "," }                      # predefine our separators; modify as desired
FNR==NR { ptns[$0]; next }                               # 1st file: save each line as a new index in our ptns[] array
        { sfx = ""                                       # 2nd file: reset our suffix
          for (ptn in ptns)                              # loop through the indices (aka patterns) of the ptns[] array
              if (index($0,ptn))                         # if the pattern exists in the current line (ie, index() returns a value > 0) then ...
                  sfx = sfx (sfx == "" ? "" : sep2) ptn  # append the pattern to our suffix
          if (sfx != "")                                 # if the suffix is not blank then we found at least one match so ...
              print $0 sep1 sfx                          # print current line and append the suffix
        }
' pattern.txt file.txt
Alternatively, place the body of the awk script in a file and then access via awk -f ...:
$ cat my_grep.awk
BEGIN   { sep1 = "\t"; sep2 = "," }
FNR==NR { ptns[$0]; next }
        { sfx = ""
          for (ptn in ptns)
              if (index($0,ptn))
                  sfx = sfx (sfx == "" ? "" : sep2) ptn
          if (sfx != "")
              print $0 sep1 sfx
        }
$ awk -f my_grep.awk pattern.txt file.txt
NOTES:
- assumes lines in pattern.txt do not have any leading/trailing white space, which would cause the index() call to fail
- (ptn in ptns) does not guarantee the order in which the patterns are processed, which means there's no guarantee of the ordering of said patterns when printed at the end of the line; while additional code could be added to address an ordering requirement, OP would need to provide more details, including how to handle duplicate and/or overlapping patterns (eg, a and as would match at the same index() position, so which pattern would be considered the actual match?)
- since index() will only find the 1st occurrence of a pattern, and we make no attempt to match beyond that first match, this approach only tells us that there is at least one match; additional coding would be needed to determine the number of matches but would also require additional details from OP on how to process duplicate and/or overlapping patterns (eg, how many times do 4 and 44 match against 44444444?)
Both approaches generate:
abc$(SEQ)asdasd asd
efg hij$(SEQ)asdasdasd$(SEQ)zzzzzz asd
op$(SEQ)44444444 444
qrs$(SEQ)777 777
qrs$(SEQ)777444 444,777
asdsd777hdhfgjdfasd asd,777
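If the order of the appended patterns matters, one possible variation is sketched below. It is not part of the script above: it assumes that "order" means the order of the lines in the pattern file, and it keeps the same fixed-string index() test. The function name match_in_order and the temporary sample files are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# match_in_order is a hypothetical helper name used only for this sketch.
# Usage: match_in_order PATTERN_FILE INPUT_FILE
match_in_order() {
    awk '
    BEGIN   { sep1 = "\t"; sep2 = "," }
    FNR==NR { ptns[++n] = $0; next }        # 1st file: store patterns under 1..n, in file order
            { sfx = ""
              for (i = 1; i <= n; i++)      # numeric loop visits patterns in that same order
                  if (index($0, ptns[i]))   # fixed-string test, as in the script above
                      sfx = sfx (sfx == "" ? "" : sep2) ptns[i]
              if (sfx != "")
                  print $0 sep1 sfx
            }
    ' "$1" "$2"
}

# Small sample data in a temporary directory (assumed, not the real files).
tmp=$(mktemp -d)
printf '%s\n' 444 777 asd > "$tmp/pattern.txt"
printf '%s\n' 'qrs$(SEQ)777444' 'asdsd777hdhfgjdfasd' 'wxyzfhdfghdfh' > "$tmp/file.txt"

match_in_order "$tmp/pattern.txt" "$tmp/file.txt"
rm -rf "$tmp"
```

With this sample data the two matching lines get the suffixes 444,777 and 777,asd, ie, always in pattern-file order. A pattern that appears twice in the pattern file would be listed twice; the open questions about overlapping patterns are not addressed by this sketch either.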
To do this in Perl, use this:
perl -lne '
    BEGIN {
        chomp( @pats = `cat pattern.txt` );
        $pat = join "|", @pats;
    }
    if ( @matches = m{($pat)}g ) {
        %seen = ();
        @uniq = grep !$seen{$_}++, @matches;
        $uniq = join ",", @uniq;
        print "$_\t$uniq";
    }' file.txt
Output:
abc$(SEQ)asdasd asd
efg hij$(SEQ)asdasdasd$(SEQ)zzzzzz asd
op$(SEQ)44444444 444
qrs$(SEQ)777 777
qrs$(SEQ)777444 777,444
asdsd777hdhfgjdfasd asd,777
The Perl "one-liner" uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
The regex uses this modifier:
g : Match the pattern repeatedly.
chomp( @pats = `cat pattern.txt` );
The above line reads the contents of the pattern.txt file into array @pats and strips the newlines (chomp).
$pat = join "|", @pats; : Joins the patterns into a single string, delimited by | (= OR operator).
@matches = m{($pat)}g : matches the patterns repeatedly (m{...}g) against the current line read from file.txt. All the matches are stored in array @matches (which may contain repetitions of the same pattern, if it occurs more than once).
if ( @matches = ... ) : @matches evaluates to TRUE if there is at least one match.
@uniq = grep !$seen{$_}++, @matches; : Makes matches unique and stores these in @uniq array.
$uniq = join ",", @uniq; : Joins the unique matches on a comma, and stores the result in a single string $uniq.
Since any recognizable format is allowed, the simplest solution is to replace the ANSI sequences emitted by grep directly with any separator you want, like a tab:
$ grep --color=always -f pattern.txt file.txt | \
sed -E 's/(\x1b\[[0-9;]*[A-Za-z])+/\t/g'
abc$(SEQ) asd asd
efg hij$(SEQ) asd asd asd $(SEQ)zzzzzz
op$(SEQ) 444 444 44
qrs$(SEQ) 777
qrs$(SEQ) 777 444
asd sd 777 hdhfgjdf asd
It's also possible to capture each match between the start and end ANSI sequence, for example here I wrap each match within []
$ grep --color=always -f pattern.txt file.txt | \
sed -E -e 's#\x1b\[m#]\t#g' -e 's#\x1b\[[0-9;]+m#\t[#g'
abc$(SEQ) [asd] [asd]
efg hij$(SEQ) [asd] [asd] [asd] $(SEQ)zzzzzz
op$(SEQ) [444] [444] 44
qrs$(SEQ) [777]
qrs$(SEQ) [777] [444]
[asd] sd [777] hdhfgjdf [asd]
In the former command, \x1b\[[0-9;]*[A-Za-z] is the regex for a general ANSI sequence. In the latter, \x1b\[m matches the ANSI sequence that clears the formatting; it is replaced with ]\t, which closes the matched pattern. Any other formatting sequence is then replaced with \t[.
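To see the general regex at work without running grep, here is a small sketch. The helper name strip_ansi is hypothetical, and the sample line only imitates colorized grep output: the exact escape codes grep emits depend on its version and on GREP_COLORS, so treat them as an assumption. The ESC byte is produced with printf so that sed does not need to understand \x1b itself.

```shell
#!/bin/sh
# strip_ansi is a hypothetical helper name: it deletes ANSI CSI sequences from stdin.
strip_ansi() {
    esc=$(printf '\033')                    # literal ESC byte, so sed need not parse \x1b
    sed -E "s/${esc}\[[0-9;]*[A-Za-z]//g"   # same regex as above, with ESC substituted in
}

# Sample line imitating colorized grep output (the exact codes are an assumption).
printf 'abc$(SEQ)\033[01;31m\033[Kasd\033[m\033[K\n' | strip_ansi
```

This prints the plain line abc$(SEQ)asd. Replacing the empty replacement with \t gives the tab-separated form shown in the first command.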
To print patterns at the end you can use something like this
# Perl solution to print all the matched sequences
# No deduplication but can be added easily if necessary
$ grep --color=always -f pattern.txt file.txt | while read -r line; do
    printf "%s\t=== Patterns: " "$line"
    echo "$line" | perl -nE 'while (/\x1b\[[0-9;]+m(.*?)\x1b\[m/g) { print "$1 "; }; print "\n";'
done
abc$(SEQ)asdasd === Patterns: asd asd
efg hij$(SEQ)asdasdasd$(SEQ)zzzzzz === Patterns: asd asd asd
op$(SEQ)44444444 === Patterns: 444 444
qrs$(SEQ)777 === Patterns: 777
qrs$(SEQ)777444 === Patterns: 777 444
asdsd777hdhfgjdfasd === Patterns: asd 777 asd

# Rough sed solution that doesn't print all multi-patterns correctly,
# but probably enough for human consumption
$ grep --color=always -f pattern.txt file.txt | while read line; do \
    printf "$line\t=== Patterns: " | sed -E 's/\x1b\[[0-9;]*[A-Za-z]//g'
    echo "$line" | sed -E 's/^.*\x1b\[[0-9;]+m(.+)\x1b\[m/\1/g'
done
abc$(SEQ)asdasd === Patterns: asd
efg hij$(SEQ)asdasdasd$(SEQ)zzzzzz === Patterns: asd$(SEQ)zzzzzz
op$(SEQ)44444444 === Patterns: 44444
qrs$(SEQ)777 === Patterns: 777
qrs$(SEQ)777444 === Patterns: 444
asdsd777hdhfgjdfasd === Patterns: asd
If you want to preserve color for use later you can also do that directly
$ grep --color=always -f pattern.txt in.txt > out.txt # Forced color output
$ cat out.txt # Print out colorized result
4 Comments
Patterns: asd$(SEQ)zzzzzz in the 2nd line of output and Patterns: 44444 later.