Package: grep;
Reported by: Aaron Zhou Qian <aaronzhouqian <at> ucla.edu>
Date: 2016年10月14日 15:36:02 UTC
Severity: normal
To reply to this bug, email your comments to 24689 AT debbugs.gnu.org.
the display of automated, internal messages from the tracker.
View this report as an mbox folder, status mbox, maintainer mbox
bug-grep <at> gnu.org:bug#24689; Package grep.
(2016年10月14日 15:36:02 GMT) Full text and rfc822 format available.Aaron Zhou Qian <aaronzhouqian <at> ucla.edu>:bug-grep <at> gnu.org.
(2016年10月14日 15:36:02 GMT) Full text and rfc822 format available.Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
From: Aaron Zhou Qian <aaronzhouqian <at> ucla.edu> To: bug-grep <at> gnu.org Subject: Multithreaded grep Available Date: 2016年10月14日 08:14:42 -0700
[Message part 1 (text/plain, inline)]
Hi, I've multithreaded the grep at the file granularity when used with -r or -R on directories. The four files that I changed i.e. grep.c dosbuff.c dfasearch.c search.h under src/ is in this repository: https://github.com/AaronZhouQian/grep Please take a look at the README there. The default number of threads is the number of cores online in the system. To specify a custom number of threads use -p or --parallel followed by the number of threads. Currently multithreading does not support context i.e. -A -B -C I'd like to invite everyone to test this patch. Thank you, Aaron
[Message part 2 (text/html, inline)]
bug-grep <at> gnu.org:bug#24689; Package grep.
(2016年10月15日 05:05:02 GMT) Full text and rfc822 format available.Message #8 received at 24689 <at> debbugs.gnu.org (full text, mbox):
From: Jim Meyering <jim <at> meyering.net> To: Aaron Zhou Qian <aaronzhouqian <at> ucla.edu>, Zev Weiss <zev <at> bewilderbeest.net> Cc: 24689 <at> debbugs.gnu.org Subject: Re: bug#24689: Multithreaded grep Available Date: 2016年10月14日 22:03:52 -0700
On Fri, Oct 14, 2016 at 8:14 AM, Aaron Zhou Qian <aaronzhouqian <at> ucla.edu> wrote: > Hi, > > I've multithreaded the grep at the file granularity when used with -r or -R > on directories. The four files that I changed i.e. grep.c dosbuff.c > dfasearch.c search.h under src/ is in this repository: > > https://github.com/AaronZhouQian/grep > > Please take a look at the README there. > > > The default number of threads is the number of cores online in the system. > > To specify a custom number of threads use -p or --parallel followed by the > number of threads. > > Currently multithreading does not support context i.e. -A -B -C > > > I'd like to invite everyone to test this patch. Thank you for working on that. Before even looking at your patch, I have to ask whether you have filed copyright assignment papers with the FSF. If not, I can get you started. Secondly, did you know that Zev Weiss has submitted some patches to make grep multithreaded? He also helped to make dfa.c (now in gnulib) multithread safe recently. It's been a while, and I don't know the status of his patch set. At a minimum, please tell us how your approach compares to his. https://github.com/zevweiss/grep
bug-grep <at> gnu.org:bug#24689; Package grep.
(2017年1月10日 00:47:02 GMT) Full text and rfc822 format available.Message #11 received at 24689 <at> debbugs.gnu.org (full text, mbox):
From: Paul Eggert <eggert <at> cs.ucla.edu> To: Aaron Zhou Qian <aaronzhouqian <at> ucla.edu> Cc: 24689 <at> debbugs.gnu.org Subject: Fwd: bug#22239: New Project Date: Mon, 9 Jan 2017 16:46:30 -0800
Oops, sorry, I gave you the wrong bug number. I am forwarding this to Bug#24689, which is what I should have told you in the first place. -------- Forwarded Message -------- Subject: bug#22239: New Project Resent-Date: 2017年1月08日 06:05:02 +0000 Resent-From: Aaron Zhou Qian <aaronzhouqian <at> ucla.edu> Resent-CC: bug-grep <at> gnu.org Date: Sat, 7 Jan 2017 22:04:00 -0800 From: Aaron Zhou Qian <aaronzhouqian <at> ucla.edu> To: Paul Eggert <eggert <at> cs.ucla.edu>, 22239 <at> debbugs.gnu.org On Sat, Jan 7, 2017 at 9:29 AM, Paul Eggert <eggert <at> cs.ucla.edu> wrote: > > Could you remind us about the latest status of your proposal compared to > Zev Weiss's? Does <http://bugs.gnu.org/24689> contain the latest thing > you have? Zev Weiss's latest version is at <https://github.com/zevweiss/g > rep>. Comparing the two was the thing Jim Meyering asked for at < > http://bugs.gnu.org/22239#8>, and you can follow up by sending email to > 22239 <at> debbugs.gnu.org. Yes that github link is the latest version. I haven't made any changes to that since last year September. Basically the main thread traverses the file tree and assign the file to be searched to each thread. There is also a dynamic buffer so that the output is identical to the original grep program. I tested the program on a server. On a directory containing 4 files, grep -r on that directory is 4 times faster. On a directory containing 8 files, grep -r is 6 times faster. On a directory containing 12 files, grep -r is 8.5 times faster. I think using multithreading is essentially different from not using multithreading, and we also don't use multithreading all the time for grep. When we're not using multithreading, i.e. when we pass in other options for grep, more functions would call those functions whose function signatures we changed. This is hard to keep track of, because the program is fairly complicated. If we had overloading in C++ I would overload those functions. But since we don't, I made it very clear in the code which functions are the counterparts of the original versions. I did this to contain any potential problems so that if there are any problems with multithreading it would not affect the sequential program, whereas if we interleave the two scenarios we might lose track of what's going on. At least this is what I initially thought. I saw that there were some recent commits by Zev together with Jim, for example: in commit 9365ed6536d4fabf42ec17fef1bbe5d78884f950 * src/grep.c (compile_fp_t): Now returns an opaque pointer (the compiled pattern). (execute_fp_t): Now passed the pointer returned by a compile_fp_t. All call sites updated accordingly. (compiled_pattern): New static variable. * src/dfasearch.c (GEAcompile): Return a void pointer (dummy NULL). (EGexecute): Receive a void pointer argument (unused). * src/kwsearch.c (Fcompile): Return a void pointer (dummy NULL). (Fexecute): Receive a void pointer argument (unused). * src/pcresearch.c (Pcompile): Return a void pointer (dummy NULL). (Pexecute): Receive a void pointer argument (unused). * src/search.h: Update compile/execute function prototypes. So we have different approaches. They are trying to add extra pointer arguments for the multithreading case. The pointer argument would be NULL in the case multithreading is not in effect. Whereas my approach is to replicate the functions so the counterparts of the original functions are used in the multithreading scenario. This was done in an attempt to reduce the complexity of each of the functions and make the program less monolithic. I leave you guys to decide.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.