
Re: [Xen-devel] [xen-unstable test] 6374: regressions - FAIL

To: "Ian Jackson" <Ian.Jackson@xxxxxxxxxxxxx>, "Juergen Gross" <juergen.gross@xxxxxxxxxxxxxx>
Subject: Re: [Xen-devel] [xen-unstable test] 6374: regressions - FAIL
From: "Jan Beulich" <JBeulich@xxxxxxxxxx>
Date: Mon, 14 Mar 2011 10:33:39 +0000
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
Delivery-date: Mon, 14 Mar 2011 03:34:02 -0700
Envelope-to: www-data@xxxxxxxxxxxxxxxxxxx
In-reply-to: <19834.24888.630582.491364@xxxxxxxxxxxxxxxxxxxxxxxx>
List-help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-id: Xen developer discussion <xen-devel.lists.xensource.com>
List-post: <mailto:xen-devel@lists.xensource.com>
List-subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
List-unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>, <mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
References: <osstest-6374-mainreport@xxxxxxx> <19834.24888.630582.491364@xxxxxxxxxxxxxxxxxxxxxxxx>
Sender: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
>>> On 11.03.11 at 18:51, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx> wrote:
> xen.org writes ("[Xen-devel] [xen-unstable test] 6374: regressions - FAIL"):
>> flight 6374 xen-unstable real [real]
>> Tests which did not succeed and are blocking:
>> test-amd64-i386-pv 5 xen-boot fail REGR. vs. 6369
>
> Xen crash in scheduler (non-credit2).
>
> Mar 11 13:46:53.646796 (XEN) Watchdog timer detects that CPU1 is stuck!
> Mar 11 13:46:57.922794 (XEN) ----[ Xen-4.1.0-rc7-pre x86_64 debug=y Not tainted ]----
> Mar 11 13:46:57.931763 (XEN) CPU: 1
> Mar 11 13:46:57.931784 (XEN) RIP: e008:[<ffff82c480100140>] __bitmap_empty+0x0/0x7f
> Mar 11 13:46:57.931817 (XEN) RFLAGS: 0000000000000047 CONTEXT: hypervisor
> Mar 11 13:46:57.946773 (XEN) rax: ffff82c4802d1ac0 rbx: ffff8301a7fafc78 rcx: 0000000000000002
> Mar 11 13:46:57.946813 (XEN) rdx: ffff82c4802d0cc0 rsi: 0000000000000080 rdi: ffff8301a7fafc78
> Mar 11 13:46:57.954780 (XEN) rbp: ffff8301a7fafcb8 rsp: ffff8301a7fafc00 r8: 0000000000000002
> Mar 11 13:46:57.966770 (XEN) r9: 0000ffff0000ffff r10: 00ff00ff00ff00ff r11: 0f0f0f0f0f0f0f0f
> Mar 11 13:46:57.966805 (XEN) r12: ffff8301a7fafc68 r13: 0000000000000001 r14: 0000000000000001
> Mar 11 13:46:57.975780 (XEN) r15: ffff82c4802d1ac0 cr0: 000000008005003b cr4: 00000000000006f0
> Mar 11 13:46:57.987771 (XEN) cr3: 00000000d7c9c000 cr2: 00000000c45e5770
> Mar 11 13:46:57.987800 (XEN) ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0000 cs: e008
> Mar 11 13:46:57.998773 (XEN) Xen stack trace from rsp=ffff8301a7fafc00:
>...
> Mar 11 13:46:58.154777 (XEN) Xen call trace:
> Mar 11 13:46:58.154798 (XEN) [<ffff82c480100140>] __bitmap_empty+0x0/0x7f
> Mar 11 13:46:58.163767 (XEN) [<ffff82c480119582>] csched_cpu_pick+0xe/0x10
> Mar 11 13:46:58.163802 (XEN) [<ffff82c480122c8d>] vcpu_migrate+0xfb/0x230
> Mar 11 13:46:58.178768 (XEN) [<ffff82c480122e24>] context_saved+0x62/0x7b
> Mar 11 13:46:58.178799 (XEN) [<ffff82c480157f17>] context_switch+0xd98/0xdca
> Mar 11 13:46:58.183766 (XEN) [<ffff82c4801226b4>] schedule+0x5fc/0x624
> Mar 11 13:46:58.183795 (XEN) [<ffff82c480123837>] __do_softirq+0x88/0x99
> Mar 11 13:46:58.198784 (XEN) [<ffff82c4801238b2>] do_softirq+0x6a/0x7a

I suppose that's a result of 22957:c5c4688d5654 - as I understand it,
exiting the loop is only possible if two consecutive invocations of
pick_cpu return the same result. This, however, is precisely what the
pCPU's idle_bias is supposed to prevent on hyper-threaded/multi-core
systems (so that it's not always the same entity that gets selected).
But even beyond that particular aspect, relying on any form of
"stability" of the returned value isn't correct.
Plus running pick_cpu repeatedly without actually using its result
is wrong with respect to idle_bias updating too - that's why
cached_vcpu_acct() calls _csched_cpu_pick() with the commit
argument set to false (which makes it likely that a subsequent call -
through pick_cpu - with the argument set to true will return the
same value, but there's no correctness dependency on that). So
22948:2d35823a86e7 already wasn't really correct in putting a loop
around pick_cpu.
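
For illustration, a hedged sketch of that commit-flag idea (again not the
real _csched_cpu_pick(); the names and layout are assumptions made for the
example): a probe with commit set to false must leave idle_bias alone, so
a later committed pick will usually, but not necessarily, agree with it.

#include <stdbool.h>

struct pcpu_sketch {
    int idle_bias;                   /* advanced only by committed picks */
};

static int cpu_pick_sketch(struct pcpu_sketch *p, bool commit)
{
    int choice = p->idle_bias ^ 1;   /* next equally idle sibling */

    /* A probe (commit == false) must not move the bias; only the pick
     * whose result is actually used advances it. */
    if ( commit )
        p->idle_bias = choice;

    return choice;
}

Any committed pick in between moves the bias and breaks the match, which
is exactly why no correctness may be attached to two results being equal.
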
It's also not clear to me what the surrounding
if ( old_lock == per_cpu(schedule_data, old_cpu).schedule_lock )
is supposed to filter, as the lock pointer gets set only when a
CPU gets brought up.

As I don't really understand what this is trying to achieve, I also
can't really suggest a possible fix other than reverting both
offending changesets.

Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel