Re: [PATCH v2 2/3] mm: Force update of mem cgroup soft limit tree on usage excess
From: Tim Chen
Date: Thu Feb 25 2021 - 17:50:57 EST

On 2/24/21 3:53 AM, Michal Hocko wrote:
> On Mon 22-02-21 11:48:37, Tim Chen wrote:
>>
>> On 2/22/21 11:09 AM, Michal Hocko wrote:
>>
>>>> I actually have tried adjusting the threshold but found that it doesn't work well for
>>>> the case with uneven memory access frequency between cgroups. The soft
>>>> limit for the low memory event cgroup could creep up quite a lot, exceeding
>>>> the soft limit by hundreds of MB, even
>>>> if I drop the SOFTLIMIT_EVENTS_TARGET from 1024 to something like 8.
>>>
>>> What was the underlying reason? Higher order allocations?
>>>
>>
>> Not high order allocation.
>>
>> The reason was that the runaway memcg asks for memory much less often, compared
>> to the other memcgs in the system. So it escapes the sampling update and
>> was not put onto the tree and exceeds the soft limit
>> pretty badly. Even if it was put onto the tree and gets pages reclaimed below the
>> limit, it could escape the sampling the next time it exceeds the soft limit.
>
> I am sorry but I really do not follow. Maybe I am missing something
> obvious but the rate of events (charge/uncharge) shouldn't be really
> important. There is no way to exceed the limit without charging memory
> (either a new charge or via task migration in v1 and immigrate_on_move). If you
> have SOFTLIMIT_EVENTS_TARGET 8 then it should take 128 * 8 events to
> re-evaluate. Huge pages can make the runaway much bigger but how would it
> be possible to run away outside of that bound?

Michal,

Let's take an extreme case where memcg 1 always generates the
first event and memcg 2 generates the remaining 128*8-1 events,
and the pattern repeats. The tree update happens on the 128*8th event,
so memcg 1 never triggers a tree update. In this case we will
keep missing memcg 1's events and never put memcg 1 on the tree.
Something like this pattern of memory events:

    cg1 cg2 cg2 cg2 ....cg2 cg1 cg2 cg2 cg2....cg2 cg1 cg2 .....
                         ^                      ^
                    update tree            update tree

Of course in real life the update events are random in nature.
However, because memcg 1's events occur so rarely, the probability
that one of them lands on a tree update is low, and we can miss
updating it for a long time.
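
As a rough illustration of the pattern above, here is a toy model in
Python. It assumes a single shared event counter that triggers the tree
update on every 128*8th event, which is how the example is framed. The
names PERIOD and tree_updates are made up for illustration, and this is
not the kernel code.

```python
from collections import Counter

PERIOD = 128 * 8  # events between tree updates, as in the example above


def tree_updates(cycles: int) -> Counter:
    """Count which memcg's event lands on each tree update.

    Assumption: one shared event counter, with the tree update
    triggered by whichever memcg raises every PERIOD-th event.
    """
    updates = Counter()
    events = 0
    for _ in range(cycles):
        # One cycle: memcg 1 raises the first event, memcg 2 the rest.
        for memcg in ["cg1"] + ["cg2"] * (PERIOD - 1):
            events += 1
            if events % PERIOD == 0:
                updates[memcg] += 1  # only this memcg is re-evaluated
    return updates


print(tree_updates(1000))
```

In this model the count for cg1 stays at zero no matter how many cycles
run, because its event always falls on the first slot of the period.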

> Btw. do we really need SOFTLIMIT_EVENTS_TARGET at all? Why cannot we
> just stick with a single threshold? mem_cgroup_update_tree can be made
> effectively a noop when there is no soft limit in place so overhead
> shouldn't matter for the vast majority of workloads.

I think there are two limits because the original code wants
mem_cgroup_threshold to be updated more frequently than the
soft_limit_tree. The soft limit tree update is more costly.

Tim