Re: BFS vs. mainline scheduler benchmarks and measurements
From: Jens Axboe
Date: Wed Sep 09 2009 - 07:54:36 EST
On Wed, Sep 09 2009, Jens Axboe wrote:
> On Wed, Sep 09 2009, Mike Galbraith wrote:
> > On Wed, 2009-09-09 at 08:13 +0200, Ingo Molnar wrote:
> > > * Jens Axboe <jens.axboe@xxxxxxxxxx> wrote:
> > >
> > > > On Tue, Sep 08 2009, Peter Zijlstra wrote:
> > > > > On Tue, 2009-09-08 at 11:13 +0200, Jens Axboe wrote:
> > > > > > And here's a newer version.
> > > > >
> > > > > I tinkered a bit with your proglet and finally found the
> > > > > problem.
> > > > >
> > > > > You used a single pipe per child; this means the loop in
> > > > > run_child() would consume what it just wrote out until it got
> > > > > force-preempted by the parent, which would also get woken.
> > > > >
> > > > > This results in the child spinning for a while (its full quota)
> > > > > and only reporting the last timestamp to the parent.
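
To make the failure mode concrete, here is a minimal reconstruction of
the pattern Peter describes (a sketch, not latt's actual run_child()):
a single pipe carries both the parent's wakeup poke and the child's
timestamp reply, so the child's next read() usually consumes the stamp
it just wrote and the child spins until preempted. The fix is to use
one pipe per direction.

#include <stdio.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	int fds[2], i;
	struct timeval tv;

	if (pipe(fds) < 0)
		return 1;

	if (fork() == 0) {
		/* child: wait for a poke, timestamp, report back */
		for (i = 0; i < 100000; i++) {
			read(fds[0], &tv, sizeof(tv));
			gettimeofday(&tv, NULL);
			/*
			 * BUG: the reply goes into the same pipe the poke
			 * came from, so the read() above usually consumes
			 * our own timestamp on the next iteration.
			 */
			write(fds[1], &tv, sizeof(tv));
		}
		_exit(0);
	}

	/* parent: poke once, then try to collect a timestamp */
	gettimeofday(&tv, NULL);
	write(fds[1], &tv, sizeof(tv));

	/*
	 * The parent usually loses this read() race until the child is
	 * preempted, so the stamp it finally sees is a stale one.
	 */
	read(fds[0], &tv, sizeof(tv));
	printf("stamp: %ld.%06ld\n", (long)tv.tv_sec, (long)tv.tv_usec);

	/* unblock the child in case our read() won a round */
	write(fds[1], &tv, sizeof(tv));
	wait(NULL);
	return 0;
}
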
>
> > >
>
> > > Oh doh, that's not well thought out. Well it was a quick hack :-)
>
> > > Thanks for the fixup, now it's at least usable to some degree.
>
> >
>
> > What kind of latencies does it report on your box?
>
> >
>
> > Our vanilla scheduler default latency targets are:
>
> >
>
> > single-core: 20 msecs
>
> > dual-core: 40 msecs
>
> > quad-core: 60 msecs
>
> > opto-core: 80 msecs
>
> >
>
> > You can enable CONFIG_SCHED_DEBUG=y and set it directly as well via
>
> > /proc/sys/kernel/sched_latency_ns:
>
> >
>
> > echo 10000000 > /proc/sys/kernel/sched_latency_ns
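
Those per-topology defaults look like a single 20 msec base scaled by
core count. A sketch of the rule the table implies (the kernel derives
a similar factor with ilog2(); the exact in-kernel code differs):

#include <stdio.h>

/* floor(log2(n)) for n >= 1, like the kernel's ilog2() */
static int ilog2(unsigned int n)
{
	int log = 0;

	while (n >>= 1)
		log++;
	return log;
}

int main(void)
{
	unsigned int cpus;

	/* 20 msecs scaled by 1 + ilog2(ncpus) reproduces the table:
	 * 1 -> 20, 2 -> 40, 4 -> 60, 8 -> 80 */
	for (cpus = 1; cpus <= 8; cpus *= 2)
		printf("%u cores: %u msecs\n", cpus, 20 * (1 + ilog2(cpus)));
	return 0;
}
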
>
> >
> > He would also need to lower min_granularity; otherwise it'd be
> > larger than the whole latency target.
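
Something along these lines, next to the latency knob above (the
2 msec value is only illustrative):

echo 2000000 > /proc/sys/kernel/sched_min_granularity_ns
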
>
> >
> > I'm testing right now, and one thing that is definitely a problem
> > is the amount of sleeper fairness we're giving. A full latency is
> > just too much short-term fairness in my testing. While sleepers
> > are catching up, hogs languish. That's the biggest issue going on.
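
For context, the "full latency" is the sleeper credit applied at
wakeup placement. Below is a compilable paraphrase of that CFS
placement logic (the stubs are mine, and the real kernel/sched_fair.c
code has more cases):

#include <stdio.h>

typedef unsigned long long u64;

struct sched_entity { u64 vruntime; };
struct cfs_rq { u64 min_vruntime; };

/* 20 msecs in nsecs, the single-core default mentioned above */
static u64 sysctl_sched_latency = 20000000ULL;

static u64 max_vruntime(u64 a, u64 b)
{
	return a > b ? a : b;
}

static void place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
	/*
	 * A waking sleeper is credited a full sched_latency of
	 * vruntime, so it runs ahead of the cpu hogs until the
	 * credit is consumed: the short-term unfairness measured
	 * above.
	 */
	u64 vruntime = cfs_rq->min_vruntime - sysctl_sched_latency;

	/* but never move a task's clock backwards */
	se->vruntime = max_vruntime(se->vruntime, vruntime);
}

int main(void)
{
	struct cfs_rq rq = { 100000000ULL };
	struct sched_entity sleeper = { 50000000ULL };

	place_entity(&rq, &sleeper);
	printf("placed at %llu, min_vruntime %llu\n",
	       sleeper.vruntime, rq.min_vruntime);
	return 0;
}
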
>
> >
> > I've also been doing some timings of make -j4 (looking at idle
> > time), and find that child_runs_first is mildly detrimental to
> > fork/exec load, as are buddies.
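
For quick experiments without a patch, that one is runtime-togglable
too; flipping it off is the obvious test (not necessarily what Mike's
patch below changes):

echo 0 > /proc/sys/kernel/sched_child_runs_first
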
>
> >
> > I'm running with the below at the moment. (The kthread/workqueue
> > thing is just because I don't see any reason for it to exist, so
> > consider it to be a waste of perfectly good math. ;)
>
> Using latt, it seems better than -rc9. The below are entries logged
> while running make -j128 on a 64 thread box. I did two runs on each,
> and latt is using 8 clients.
>
> -rc9
>
>         Max          23772 usec
>         Avg           1129 usec
>         Stdev         4328 usec
>         Stdev mean     117 usec
>
>         Max          32709 usec
>         Avg           1467 usec
>         Stdev         5095 usec
>         Stdev mean     136 usec
>
> -rc9 + patch
>
>         Max          11561 usec
>         Avg           1532 usec
>         Stdev         1994 usec
>         Stdev mean      48 usec
>
>         Max           9590 usec
>         Avg           1550 usec
>         Stdev         2051 usec
>         Stdev mean      50 usec
>
> Max latency is way down, and the run-to-run variation is much smaller
> as well.
Things are much better with this patch on the notebook! I cannot compare
with BFS, as that still doesn't run anywhere I want it to run, but it's
way better than stock -rc9-git. The latt numbers on the notebook show a
third of the max latency, a lower average, and a much smaller stddev too.
--
Jens Axboe