So I have a bunch of Apache logs using the standard log format. I want to get all the log lines that did not come from a web crawler.
So let's say I have a file robot_patterns with entries like
Googlebot
msnbot-media
YandexBot
bingbot
If I run the command grep -f robot_patterns *.log
I will get all the entries by bots matching the above patterns. My actual list has ~30 entries of bots and agents that I wish to ignore.
But I want to find all the entries that are NOT from bots. So I try grep -v -f robot_patterns *.log
and no results are returned by grep. This is not what I expect or desire, and I am not finding an obvious way to get what I want. It looks as if, when the -v option is combined with multiple patterns in a file, grep only returns a line if it matches EVERY pattern.
3 Answers
You can try:
grep -vE 'Googlebot|msnbot-media|YandexBot|bingbot' yourlogfile
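With around 30 patterns the alternation gets long to type by hand, so one option is to build it from the pattern file. This is only a sketch: the scratch directory, the sample log lines, and the variable names `alternation` and `humans` are made up for illustration, and it assumes the patterns contain no `|` or other characters you want taken literally.

```shell
# Hypothetical scratch files, created only for this demonstration.
workdir=$(mktemp -d)
printf 'Googlebot\nmsnbot-media\nYandexBot\nbingbot\n' > "$workdir/robot_patterns"
printf '%s\n' \
  'GET /a "Mozilla/5.0 (compatible; bingbot/2.0)"' \
  'GET /b "Mozilla/5.0 (Windows NT 10.0) Chrome"' \
  'GET /c "Mozilla/5.0 (compatible; YandexBot/3.0)"' \
  > "$workdir/access.log"

# Join the non-empty lines of the pattern file with "|" into one ERE alternation.
alternation=$(grep -v '^$' "$workdir/robot_patterns" | paste -s -d '|' -)

# -v inverts the match: keep only lines that match none of the alternatives.
# "|| true" keeps the script going if grep selects no lines (exit status 1).
humans=$(grep -vE "$alternation" "$workdir/access.log" || true)

printf 'pattern: %s\n' "$alternation"
printf 'kept:    %s\n' "$humans"
rm -r "$workdir"
```

Empty lines are dropped before joining because an empty alternative would match every line and -v would then return nothing.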
- Welcome to Unix & Linux. The OP has a list of approximately 30 strings that he wants to ignore, and the four that he presented as examples have an average length of ten characters each, so your command is likely to be over 300 characters long. This is likely to be hard to maintain (and even to read). Can you modify your answer to be driven by the OP's list of strings? P.S. Did you notice that the answer has been found? The OP has learned how to get his original approach to work. – G-Man Says 'Reinstate Monica', Oct 30, 2015 at 11:33
- Why negatively evaluate my response? :/ – Orsius, Oct 31, 2015 at 10:17
- Great answer. Has regex OR and the -vE option was helpful. – Kirt Carson, Mar 31, 2017 at 18:46
- This is the answer to the question most people are probably trying to resolve. – dsm, Jun 15, 2018 at 7:57
If there is an empty line in the patterns file, it will match every line, causing no lines to be returned with -v. This is because the lines are interpreted as regular expressions, and an empty regular expression will always match.
This isn't a problem with -F, however, because grep ignores empty lines with -F.
-F causes grep to interpret the lines as simple strings to search for and may speed up grep if regular expressions aren't needed.
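The effect of an empty line can be reproduced on a small scale, and stripping empty lines from the pattern list before handing it to grep avoids it. A minimal sketch, where the scratch directory, the sample log lines, and the `robot_patterns.clean` file name are all made up for illustration:

```shell
# Hypothetical scratch files, created only for this demonstration.
workdir=$(mktemp -d)
printf 'Googlebot\nbingbot\n\n' > "$workdir/robot_patterns"   # last line is empty
printf '%s\n' \
  '1.2.3.4 - - "GET / HTTP/1.1" 200 "Mozilla/5.0 (compatible; Googlebot/2.1)"' \
  '5.6.7.8 - - "GET / HTTP/1.1" 200 "Mozilla/5.0 (X11; Linux x86_64) Firefox"' \
  > "$workdir/access.log"

# The empty pattern matches every line, so -v has nothing left to print.
# "|| true" keeps the script going when grep selects no lines (exit status 1).
with_empty=$(grep -v -f "$workdir/robot_patterns" "$workdir/access.log" || true)

# Drop empty lines from the pattern list, then filter with the cleaned list.
grep -v '^$' "$workdir/robot_patterns" > "$workdir/robot_patterns.clean"
without_empty=$(grep -v -f "$workdir/robot_patterns.clean" "$workdir/access.log" || true)

printf 'with empty line:    [%s]\n' "$with_empty"
printf 'without empty line: [%s]\n' "$without_empty"
rm -r "$workdir"
```

Cleaning the file this way does not depend on how a particular grep version treats empty lines.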
- GNU fgrep ignoring that trailing empty string was a bug that was fixed in 2.19 (commit 2d3832e1ff772dc1a374bfad5dcc1338350cc48b), so you shouldn't rely on it. – Stéphane Chazelas, Oct 29, 2015 at 11:17
The way I do it is to chain the greps:
cat text | grep -v Googlebot | grep -v msnbot-media | grep -v YandexBot | grep -v bingbot
- Until now I did the same, but I believe that grep -v -e Googlebot -e YandexBot ... is better. There is just one grep and one stream. You do not need the first cat at all; use grep -v Googlebot text (as text is a file in this case, IMHO). – Betlista, Nov 16, 2023 at 10:49
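A quick way to check that the chained form and the single grep with repeated -e options select the same lines is to run both on a small file. This is only a sketch; the temporary file and its four sample lines are made up for illustration.

```shell
# Hypothetical sample input, created only for this demonstration.
log=$(mktemp)
printf '%s\n' 'a Googlebot' 'b Firefox' 'c YandexBot' 'd Safari' > "$log"

# Chained form: one grep process per pattern, each filtering the previous output.
chained=$(grep -v Googlebot "$log" | grep -v YandexBot || true)

# Single process: -v with several -e patterns keeps lines that match none of them.
single=$(grep -v -e Googlebot -e YandexBot "$log" || true)

if [ "$chained" = "$single" ]; then
  printf 'same result:\n%s\n' "$single"
fi
rm "$log"
```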
GNU grep 2.6.3.