I have a huge file that contains two types of patterns, say pattern1 and pattern2. pattern1 may appear many times before pattern2 appears. I want to grep the last occurrence of pattern1 before each pattern2.
Input file:
some text
pattern1=1
some lines
pattern1=2
some lines
pattern1=3
some lines
pattern2
some lines
pattern1=4
some lines
pattern1=5
some lines
pattern1=6
some lines
pattern1=7
some lines
pattern2
Desired output:
pattern1=3
pattern1=7
I tried grep, which works when I know the number of lines between pattern2 and the previous pattern1:
grep -B400 "pattern2" file | grep "pattern1"
but I need a single command that can be run over any file regardless of the number of lines between the two patterns.
6 Answers
$ awk '/pattern1/{x=$0} /pattern2/{print x}' input
pattern1=3
pattern1=7
This saves the pattern1 matches (the whole line) to the variable x and prints that when pattern2 happens. It will print a blank line if there's a pattern2 before any pattern1, which would take more logic to detect if that's not desirable. It will drop any trailing pattern1 lines that are not followed by a pattern2 before the end of the input.
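As noted above, this one-liner silently drops a trailing pattern1 line. A minimal sketch of one way to at least report that case, assuming a POSIX shell and an awk that can write to "/dev/stderr" (gawk treats that name specially, and on Linux it is also an ordinary device file). The function name last_pattern1 is hypothetical, and the blank-line behavior for an early pattern2 is deliberately left unchanged:

```shell
# Hypothetical helper name. Same logic as the one-liner above, plus an END
# check that warns (on standard error) about a trailing, unmatched pattern1.
last_pattern1() {
    awk '
        /pattern1/ { x = $0; pending = 1 }   # remember the latest pattern1 line
        /pattern2/ { print x; pending = 0 }  # print it when pattern2 shows up
        END {
            if (pending)
                print "warning: trailing pattern1 not followed by pattern2: " x > "/dev/stderr"
        }
    ' "$@"
}

# Example: pattern1=2 is never followed by pattern2, so it is only reported.
printf '%s\n' 'some text' pattern1=1 pattern2 pattern1=2 | last_pattern1
```

Standard output still contains only the matches, so the function can be used in a pipeline exactly like the original command.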
@thrig's answer is good, but I made some modifications to handle some extra test cases. The following script:
- Will not print an empty line if pattern2 appears before the first appearance of pattern1.
- Will not print duplicate lines if pattern2 appears multiple times after pattern1.
With the modified input file:
pattern2
some text
pattern1=1
some lines
pattern1=2
some lines
pattern1=3
some lines
pattern2
pattern2
some lines
pattern1=4
some lines
pattern1=5
pattern2
some lines
pattern1=6
some lines
pattern1=7
some lines
pattern2
The following script seems to do what you describe in the text:
$ awk '/pattern1/{x=$0} length(x) && /pattern2/{print x;x=""}' file
pattern1=3
pattern1=5
pattern1=7
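For readability, here is the same awk program spread over several lines with comments, wrapped in a hypothetical shell function and run on the modified input above. It assumes a POSIX shell and any POSIX awk:

```shell
# Hypothetical wrapper name; same logic as the one-liner above.
last_pattern1_dedup() {
    awk '
        /pattern1/ { x = $0 }          # remember the most recent pattern1 line
        length(x) && /pattern2/ {      # act only if a pattern1 line is pending
            print x
            x = ""                     # consume it so a repeated pattern2 prints nothing
        }
    ' "$@"
}

# The modified input from this answer, one line per argument.
printf '%s\n' pattern2 'some text' pattern1=1 'some lines' pattern1=2 \
    'some lines' pattern1=3 'some lines' pattern2 pattern2 'some lines' \
    pattern1=4 'some lines' pattern1=5 pattern2 'some lines' pattern1=6 \
    'some lines' pattern1=7 'some lines' pattern2 |
    last_pattern1_dedup
```

Resetting x after printing is what suppresses the duplicate for consecutive pattern2 lines, and the length(x) test is what suppresses the empty line for a pattern2 that comes before any pattern1.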
Three grep calls:
- Extract only the lines that match ^pattern1= or ^pattern2$ from the original input file: grep -e '^pattern1=' -e '^pattern2$' file
- Get the lines that match ^pattern2$, and the lines immediately before these (using the non-standard -B option): grep -B1 '^pattern2$'
- Get all lines matching ^pattern1= from these: grep '^pattern1='
All together:
grep -e '^pattern1=' -e '^pattern2$' file |
grep -B1 '^pattern2$' |
grep '^pattern1='
This handles the same edge cases as user000001's answer, namely that it does not output duplicate lines if there are many pattern2 lines with no pattern1 line between them, and it does not produce empty lines for pattern2 lines at the start of the file.
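A small sketch, assuming a POSIX shell and a grep that supports -B (GNU and BSD grep do), that wraps the pipeline in a hypothetical function and also prints the intermediate output of the second stage. With GNU grep, that stage prints a -- line between groups of context lines that are not adjacent. The final grep '^pattern1=' drops those separators along with the pattern2 lines:

```shell
# Hypothetical helper name: the three-grep pipeline from the answer above.
# Reads the files given as arguments, or standard input if there are none.
last_before_pattern2() {
    grep -e '^pattern1=' -e '^pattern2$' "$@" |  # 1: keep only pattern1/pattern2 lines
        grep -B1 '^pattern2$' |                  # 2: each pattern2 line plus the line before it
        grep '^pattern1='                        # 3: keep pattern1 lines (drops pattern2 and -- lines)
}

# The sample input from the question.
sample() {
    printf '%s\n' 'some text' pattern1=1 'some lines' pattern1=2 'some lines' \
        pattern1=3 'some lines' pattern2 'some lines' pattern1=4 'some lines' \
        pattern1=5 'some lines' pattern1=6 'some lines' pattern1=7 'some lines' \
        pattern2
}

printf '%s\n' 'Stage 2 output:'
sample | grep -e '^pattern1=' -e '^pattern2$' | grep -B1 '^pattern2$'
printf '%s\n' 'Final result:'
sample | last_before_pattern2
```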
Using sed:
sed -e '/^pattern1=/ { h; d; }' \
-e '/^pattern2$/ x' \
-e '/^pattern1=/ !d' file
- If the current line is a pattern1 line, save it into the hold space and discard it.
- If the current line is a pattern2 line, swap in the hold space.
- If the current line is now not a pattern1 line, discard it.
- (Implicitly) print the current line. Given the preceding commands, the current line must be a pattern1 line swapped in from the hold space because a pattern2 line was found. The hold space therefore necessarily holds a pattern2 line, which ensures that the pattern1 line is not output multiple times.
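The same sed program can also be written as a commented multi-line script. This is a sketch wrapped in a hypothetical shell function, assuming a POSIX sed (which accepts # comments and one command per line). It is run here on the modified input from user000001's answer:

```shell
# Hypothetical wrapper name; the same three sed commands as above, one per line.
last_pattern1_sed() {
    sed -e '
        # pattern1 line: copy it to the hold space, print nothing
        /^pattern1=/ {
            h
            d
        }
        # pattern2 line: swap it with the held pattern1 line
        /^pattern2$/ x
        # anything that is not (now) a pattern1 line is discarded
        /^pattern1=/ !d
    ' "$@"
}

printf '%s\n' pattern2 'some text' pattern1=1 'some lines' pattern1=2 \
    'some lines' pattern1=3 'some lines' pattern2 pattern2 'some lines' \
    pattern1=4 'some lines' pattern1=5 pattern2 'some lines' pattern1=6 \
    'some lines' pattern1=7 'some lines' pattern2 |
    last_pattern1_sed
```

On a second consecutive pattern2 line, the swap brings the previous pattern2 back out of the hold space, and the last command deletes it.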
❯ printf 'g/pattern2/?pattern1?p\n' | ed -s in.txt
pattern1=3
pattern1=7
Explanation:
- g/pattern2/ → for every line matching pattern2, do:
- ?pattern1? → search backwards for pattern1
- p → print
Limitations:
- The backward search wraps around to the end of the file, so a pattern2 line with no pattern1 before it selects the last pattern1 in the file.
- It prints the same line multiple times if there are multiple pattern2 lines after one pattern1 line. E.g.
pattern1=1
pattern2
pattern2
This will print pattern1=1 twice.
Comment (Aug 23, 2022 at 14:52): Could you use /pattern1/,$ (or something like it) as the address range for the g command, possibly, to fix the wrapping issue? I'm on a phone and can't test...
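A sketch of the commenter's suggestion, assuming ed is installed (it often is not on minimal systems); the function name is hypothetical. Restricting g to the lines from the first pattern1 line to the end means a pattern2 line that precedes every pattern1 is never visited, so the backward search always finds a pattern1 line above it without wrapping. It does not address the duplicate-output limitation, and if the file contains no pattern1 line at all, the /pattern1/ address fails and ed reports an error instead:

```shell
# Hypothetical helper name; requires ed(1). The range /pattern1/,$ starts at the
# first pattern1 line, so every pattern2 line that g visits has a pattern1 line
# somewhere above it and ?pattern1? never wraps to the end of the buffer.
last_pattern1_ed() {
    printf '%s\n' '/pattern1/,$g/pattern2/?pattern1?p' q | ed -s "$1"
}

# Example: the leading pattern2 is skipped, but the repeated pattern2
# still prints pattern1=1 twice.
tmp=$(mktemp)
printf '%s\n' pattern2 'some text' pattern1=1 pattern2 pattern2 > "$tmp"
last_pattern1_ed "$tmp"
rm -f "$tmp"
```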
grep -E "^pattern1|^pattern2" <file> | grep -B 1 "^pattern2" | grep "^pattern1"
The first grep -E (the current spelling of the obsolescent egrep) gets only the lines that contain either pattern, stripping all other lines from the output. The second grep gets each pattern2 line and whatever line is before it, which drops every pattern1 line except the one immediately preceding a pattern2. The third grep just returns the remaining pattern1 lines.
awk '{a[++i]=$0}/pattern2/{for(x=NR-2;x<=NR;x++)print a[x]}' file1.txt | awk '/pattern1/'
Output:
pattern1=3
pattern1=7
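In the sample input, each pattern1 line of interest is separated from the following pattern2 by one line, so the awk command above prints the two lines before each pattern2 (plus the pattern2 line itself), and the second awk keeps only the pattern1 lines.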
Comment (Aug 23, 2022 at 8:39): You should probably mention the assumptions you're making, such as there not being more than two lines between the patterns. It's unclear why you have that second awk call, as you could do the same thing with an if statement in front of the print.
Comment (Ed Morton, Aug 23, 2022 at 14:29): Think about the values your variable i will have. Now think about the values the builtin variable NR will have, and figure out why you introduced i.
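Applying the two comments above gives something like the following sketch (function name hypothetical; POSIX shell and awk assumed). It uses NR instead of a separate counter and replaces the second awk with an if inside the loop. As an addition not suggested in the comments, it deletes old array entries so a huge file is not held in memory. It still ASSUMES the wanted pattern1 line is one of the two lines before each pattern2, and, like the original, it prints both of those lines if both match pattern1:

```shell
# Hypothetical helper name. ASSUMES the wanted pattern1 line is one of the two
# lines immediately before each pattern2 line (the same assumption as above).
last_pattern1_window() {
    awk '
        {
            a[NR] = $0        # NR already numbers the lines; no extra counter needed
            delete a[NR - 3]  # keep only the last three lines in memory
        }
        /pattern2/ {
            for (x = NR - 2; x < NR; x++)   # the two lines before this pattern2
                if (a[x] ~ /pattern1/)      # replaces the second awk call
                    print a[x]
        }
    ' "$@"
}

printf '%s\n' 'some text' pattern1=1 'some lines' pattern1=2 'some lines' \
    pattern1=3 'some lines' pattern2 | last_pattern1_window
```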
pattern 1 that appears before any instance of pattern 2 regardless of what other strings are between the two of them, yes?
pattern everywhere in your question with whatever you really want to match - full or partial + word or line + string or regexp + anchored or not. Right now the answers you have assume you want unanchored partial-line regexp matches, which seems unlikely to be the most robust solution for you, but that's based on what your code does, since you haven't yet stated/shown what you actually need.
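The comment above asks what kind of matching is really wanted. If, for example, pattern1 lines were those starting with the literal string pattern1= and pattern2 lines were those consisting of exactly pattern2 (an assumption; the question does not say), user000001's logic could use literal, anchored comparisons instead of unanchored regexps. A sketch, with a hypothetical function name:

```shell
# Hypothetical helper name. ASSUMES pattern1 lines begin with the literal text
# "pattern1=" and pattern2 lines are exactly "pattern2"; no regexps involved.
last_pattern1_literal() {
    awk '
        index($0, "pattern1=") == 1 { x = $0; next }     # literal prefix match
        $0 == "pattern2" && x != "" { print x; x = "" }  # whole-line string match
    ' "$@"
}

# Lines that merely contain the patterns somewhere are ignored.
printf '%s\n' 'see pattern1=9' pattern1=1 'pattern2 follows' pattern2 pattern2 |
    last_pattern1_literal
```

With the unanchored regexps used in the answers, the first and third lines of this example would also have been treated as matches.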