Re: [RFC] IO scheduler based IO controller V9
From: Jerome Marchand
Date: Fri Sep 11 2009 - 09:18:38 EST
Vivek Goyal wrote:
> On Thu, Sep 10, 2009 at 04:52:27PM -0400, Vivek Goyal wrote:
>> On Thu, Sep 10, 2009 at 05:18:25PM +0200, Jerome Marchand wrote:
>>> Vivek Goyal wrote:
>>>> Hi All,
>>>>
>>>> Here is the V9 of the IO controller patches generated on top of 2.6.31-rc7.
>>>
>>> Hi Vivek,
>>>
>>> I've run some postgresql benchmarks for io-controller. Tests have been
>>> made with a 2.6.31-rc6 kernel, without the io-controller patches (when
>>> relevant) and with the io-controller v8 and v9 patches.
>>> I set up two instances of the TPC-H database, each running in its
>>> own io-cgroup. I ran two clients against these databases and issued on
>>> each this simple request:
>>> $ select count(*) from LINEITEM;
>>> where LINEITEM is the biggest table of TPC-H (6001215 entries,
>>> 720MB). That request generates a steady stream of IOs.
>>>
>>> Time is measured by psql (\timing switched on). Each test is run twice,
>>> or more if there is any significant difference between the first two
>>> runs. Before each run, the cache is flushed:
>>> $ echo 3 > /proc/sys/vm/drop_caches
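
[For anyone reproducing this, the groups can be created along these
lines -- a rough sketch, assuming the controller's cgroup subsystem is
named "io" and the weight file "io.weight" as in the patch series;
adjust to the actual interface if it differs:

  # mount a cgroup hierarchy with only the io controller attached
  $ mount -t cgroup -o io none /cgroup
  $ mkdir /cgroup/db1 /cgroup/db2
  # equal weights (1000), default BE class, for the first test
  $ echo 1000 > /cgroup/db1/io.weight
  $ echo 1000 > /cgroup/db2/io.weight
  # move each postgres instance into its group; $PID1 and $PID2 stand
  # for the postmaster PIDs of the two database instances
  $ echo $PID1 > /cgroup/db1/tasks
  $ echo $PID2 > /cgroup/db2/tasks
]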
>>>
>>> Results with 2 groups of the same io policy (BE) and the same io weight (1000):
>>>
>>>           w/o io-controller    io-controller v8     io-controller v9
>>>           first     second     first     second     first     second
>>>           DB        DB         DB        DB         DB        DB
>>>
>>> CFQ       48.4s     48.4s      48.2s     48.2s      48.1s     48.5s
>>> Noop      138.0s    138.0s     48.3s     48.4s      48.5s     48.8s
>>> AS        46.3s     47.0s      48.5s     48.7s      48.3s     48.5s
>>> Deadl.    137.1s    137.1s     48.2s     48.3s      48.3s     48.5s
>>>
>>> As you can see, there is no significant difference for the CFQ
>>> scheduler.
>> Thanks Jerome.
>>
>>> There is a big improvement for the noop and deadline schedulers
>>> (why is that happening?).
>>
>> I think it is because now related IO is in a single queue and it gets to run
>> for 100ms or so (like CFQ). Previously, IO from both instances
>> would go into a single queue, which should lead to more seeks as requests
>> from the two groups kind of get interleaved.
>>
>> With the io controller, both groups have separate queues, so requests from
>> the two database instances will not get interleaved (this almost
>> becomes like CFQ, where there are separate queues for each io context
>> and, for a sequential reader, one io context gets to run nicely for a
>> certain number of ms based on its priority).
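
[A side note: one way to check this on the test machine would be to watch
the request stream with blktrace while the two streams run, e.g.

  $ blktrace -d /dev/sdX -w 30 -o - | blkparse -i - > trace.txt

and see how often the completed requests jump between the two tables'
extents with and without the patches; /dev/sdX and the 30-second window
are placeholders.]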
>>
>>> The performance with the anticipatory scheduler
>>> is a bit lower (~4%).
>>>
>
> Hi Jerome,
>
> Can you also run the AS test with the io controller patches and both
> databases in the root group (basically, don't put them into separate
> groups)? I suspect that this regression might come from the fact that we
> now have to switch between queues, and in AS we wait for the request from
> the previous queue to finish before the next queue is scheduled in, and
> probably that is slowing things down a bit... just a wild guess.
>

Hi Vivek,

I guess that's not the reason. I got 46.6s for both DBs in the root group
with the io-controller v9 patches. I also reran the test with the DBs in
different groups and found about the same results as above (48.3s and 48.6s).

Jerome

> Thanks
> Vivek
>
>> I will run some tests with AS and see if I can reproduce this lower
>> performance and attribute it to a particular piece of code.
>>
>>> Results with 2 groups of the same io policy (BE), different io weights and
>>> the CFQ scheduler:
>>>
>>>                          io-controller v8     io-controller v9
>>>                          first     second     first     second
>>>                          DB        DB         DB        DB
>>>
>>> weights = 1000, 500      35.6s     46.7s      35.6s     46.7s
>>> weights = 1000, 250      29.2s     45.8s      29.2s     45.6s
>>>
>>> The result in terms of fairness is close to what we can expect from the
>>> ideal theoretical case: with io weights of 1000 and 500 (1000 and 250),
>>> the first request gets 2/3 (4/5) of the io time as long as it runs and
>>> thus finishes in about 3/4 (5/8) of the total time.
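
[To spell the expectation out with the numbers above: with equal weights
both DBs finish in about 48s, so each needs roughly 24s of disk time on
its own. With weights 1000:500 the first group gets 2/3 of the disk and
should finish after about 24s / (2/3) = 36s, close to the measured 35.6s;
with 1000:250 it gets 4/5, i.e. about 24s / (4/5) = 30s vs. the measured
29.2s. Those are the 3/4 and 5/8 fractions of the ~48s total mentioned
above.]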
>>
>> Jerome, after 36.6 seconds, the disk will be fully given to the second group.
>> Hence these times might not reflect an accurate measure of who got how
>> much disk time.
>>
>> Can you just capture the output of the "io.disk_time" file in both cgroups
>> at the time of completion of the task in the higher-weight group? Alternatively,
>> you can just run a script in a loop which prints the output of
>> "cat io.disk_time | grep major:minor" every 2 seconds. That way we can
>> see how disk time is being distributed between the groups.
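
[Something like this small loop would do it -- a sketch; /cgroup/db1,
/cgroup/db2 and the literal major:minor are placeholders for the actual
group directories and the device numbers of the test disk:

  $ cat disktime.sh
  #!/bin/sh
  # print the per-device disk time of both groups every 2 seconds
  while true; do
          grep major:minor /cgroup/db1/io.disk_time /cgroup/db2/io.disk_time
          sleep 2
  done
]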
>>
>>> Results with 2 groups of different io policies, the same io weight and
>>> the CFQ scheduler:
>>>
>>>                          io-controller v8     io-controller v9
>>>                          first     second     first     second
>>>                          DB        DB         DB        DB
>>>
>>> policy = RT, BE          22.5s     45.3s      22.4s     45.0s
>>> policy = BE, IDLE        22.6s     44.8s      22.4s     45.0s
>>>
>>> Here again, the result in terms of fairness is very close to what we
>>> expect.
>>
>> Same as above in this case too.
>>
>> These seem to be good tests for fairness measurement in the case of
>> streaming readers. I think one more interesting test case would be to see
>> how the random read latencies behave when multiple streaming readers are
>> present.
>>
>> So if we can launch 4-5 dd processes in one group and then issue some
>> random small queries on postgresql in a second group, I am keen to see
>> how quickly the queries can be completed with and without the io
>> controller. It would be interesting to see the results for all 4 io
>> schedulers.
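
[A rough sketch of such a test, for reference -- the group paths, file
names and the query are only illustrative:

  # group 1: a few streaming readers
  $ echo $$ > /cgroup/readers/tasks
  $ for i in 1 2 3 4 5; do dd if=/data/file$i of=/dev/null bs=1M & done

  # group 2, from another shell: time one small random-access query
  $ echo $$ > /cgroup/db/tasks
  $ time psql -c "select * from LINEITEM where L_ORDERKEY = 12345;" tpch

Comparing the elapsed time of the query with and without the io-controller
patches, for each scheduler, would give the latency numbers asked for.]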
>>
>> Thanks
>> Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/