OK, now this is starting to get interesting. I previously had
xen-netback statically compiled into the kernel.
Static drivers are hard to debug, so I changed it to build as a
module. And lo and behold, the kernel oops disappeared.
It is not a stable solution, though: sometimes I still get the same
oops, so it looks like a race condition.
I'm running 2.6.32.14 from Jeremy's xen/stable-2.6.32.x.
On 01.06.2010 11:53, Helmut Wieser wrote:
http://lists.xensource.com/archives/html/xen-devel/2010-05/msg01462.html
But as it's incomplete it didn't help me with my configuration.
I even tried to compile 2.6.32.14 and still have the same issue.
This is the relevant part of my drivers/xen/netback/netbus.c:
static int netback_uevent(struct xenbus_device *xdev,
			  struct kobj_uevent_env *env)
{
	struct backend_info *be;
	struct xen_netif *netif;
	char *val;

	DPRINTK("netback_uevent");

	be = dev_get_drvdata(&xdev->dev);
	if (!be)
		return 0;
	netif = be->netif;

	val = xenbus_read(XBT_NIL, xdev->nodename, "script", NULL);
	if (IS_ERR(val)) {
		int err = PTR_ERR(val);
		xenbus_dev_fatal(xdev, err, "reading script");
		return err;
	} else {
		if (add_uevent_var(env, "script=%s", val)) {
			kfree(val);
			return -ENOMEM;
		}
		kfree(val);
	}

	if (add_uevent_var(env, "vif=%s", netif->dev->name))
		return -ENOMEM;

	return 0;
}
This is the dmesg output when I start an HVM domU for the first time:
BUG: unable to handle kernel NULL pointer dereference at
0000000000000110
IP: [<ffffffff8123610a>] netback_uevent+0x8e/0xbf
PGD 1e2bc067 PUD 1dd03067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/vif-1-0/uevent
CPU 7
Modules linked in: bridge stp llc ipv6 xen_netfront firewire_sbp2
snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm
snd_timer snd tpm_tis soundcore tpm serio_raw snd_page_alloc pcspkr
tpm_bios wmi firewire_ohci usb_storage firewire_core crc_itu_t tg3
floppy [last unloaded: scsi_wait_scan]
Pid: 2141, comm: udevd Not tainted 2.6.32.14 #6 HP Z600 Workstation
RIP: e030:[<ffffffff8123610a>] [<ffffffff8123610a>]
netback_uevent+0x8e/0xbf
RSP: e02b:ffff88001d21fda8 EFLAGS: 00010246
RAX: 00200000000000c1 RBX: ffff88001cccde00 RCX: 0000000000800046
RDX: ffff88001d7e3b00 RSI: ffffea00006739a8 RDI: 00200000000002c0
RBP: ffff88001d21fdc8 R08: 0000000000000000 R09: ffffffff815c7cf0
R10: ffff88001e292904 R11: ffff88001e292154 R12: ffff88001e292000
R13: 0000000000000000 R14: ffff88001d7e3b80 R15: ffff88001e7be000
FS: 00007f7591154790(0000) GS:ffff880002ca2000(0000)
knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000110 CR3: 000000001d240000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process udevd (pid: 2141, threadinfo ffff88001d21e000, task
ffff88001e6a16e0)
Stack:
ffff88001cccde40 ffff88001e292000 ffff88001cccde00 ffffffff815fe0e8
<0> ffff88001d21fdf8 ffffffff8122badc ffff88001cccde40
ffff88001e292000
<0> ffff88001fc49f60 ffff88001cccde50 ffff88001d21fe28
ffffffff81266772
Call Trace:
[<ffffffff8122badc>] xenbus_uevent_backend+0x90/0xab
[<ffffffff81266772>] dev_uevent+0x102/0x146
[<ffffffff81267459>] show_uevent+0x81/0xd8
[<ffffffff81266434>] dev_attr_show+0x22/0x49
[<ffffffff810a0e41>] ? __get_free_pages+0x9/0x46
[<ffffffff8112561c>] sysfs_read_file+0xac/0x12e
[<ffffffff810d459f>] vfs_read+0xa6/0x103
[<ffffffff810d46b2>] sys_read+0x45/0x69
[<ffffffff81012a82>] system_call_fastpath+0x16/0x1b
Code: c6 79 48 53 81 31 c0 4c 89 e7 e8 ea 03 f8 ff 85 c0 74 10 4c 89 f7
41 bc f4 ff ff ff e8 99 3e e9 ff eb 2d 4c 89 f7 e8 8f 3e e9 ff
<49> 8b 95 10 01 00 00 4c 89 e7 31 c0 48 c7 c6 83 48 53 81 41 bc
RIP [<ffffffff8123610a>] netback_uevent+0x8e/0xbf
RSP <ffff88001d21fda8>
CR2: 0000000000000110
---[ end trace 4f88c9bf70342ee1 ]---
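One way to read the oops above: CR2 holds the faulting address (0x110), R13 is 0, and the bytes at the marked instruction (49 8b 95 10 01 00 00) decode to a 64-bit load from 0x110(%r13). That is the usual signature of reading a struct member through a NULL pointer: the fault address is simply the member's offset. The user-space sketch below only reproduces that arithmetic. struct mock_netif and its padding are made up so that dev lands at offset 0x110; it is not the real struct xen_netif layout, which is an assumption here.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for struct xen_netif: the padding is chosen only
 * so that 'dev' sits at offset 0x110, matching CR2 in the oops. */
struct mock_netif {
	char pad[0x110];
	void *dev;
};

/* Address the CPU would touch when reading base->dev. */
static uintptr_t member_address(uintptr_t base)
{
	return base + offsetof(struct mock_netif, dev);
}

/* Prints the address a NULL base pointer would fault on. */
static void print_fault_address(void)
{
	printf("fault address for NULL base: 0x%016lx\n",
	       (unsigned long)member_address(0));
}
```

Calling print_fault_address() from a small main() shows the computed address, which can be compared against the CR2 line of an oops. If the real offset of dev in struct xen_netif differs in a given build, the same reasoning applies with that offset.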
I don't get it, because the patch is supposed to prevent NULL pointers.
Either xdev itself is corrupt, or it is returning corrupt data.
I'm stumped.
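A possible explanation, offered as a guess rather than a diagnosis: the check in netback_uevent() covers only be. If be is already set but be->netif has not been created yet when udevd reads the uevent file, the netif->dev->name read would dereference NULL, and that would fit the race-like behaviour. Below is a minimal user-space sketch of guarding both levels. All mock_* names are hypothetical stand-ins for the kernel structures, and returning early without a "vif" variable is an assumed policy, not something taken from the patch.

```c
#include <stddef.h>
#include <stdio.h>

/* Mock stand-ins for the kernel structures; only the fields used here. */
struct mock_net_device { char name[16]; };
struct mock_netif { struct mock_net_device *dev; };
struct mock_backend_info { struct mock_netif *netif; };

/*
 * Writes "vif=<name>" into buf and returns 1, or returns 0 without
 * touching netif when the interface does not exist yet.
 */
static int mock_uevent_vif(const struct mock_backend_info *be,
			   char *buf, size_t len)
{
	/* Guard both levels: be may exist before be->netif is set up. */
	if (!be || !be->netif || !be->netif->dev)
		return 0;

	snprintf(buf, len, "vif=%s", be->netif->dev->name);
	return 1;
}
```

In the driver itself the equivalent would be a test on be->netif before the add_uevent_var() call for "vif=%s"; whether skipping the variable is acceptable to the hotplug scripts would need checking.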
On 01.06.2010 09:03, Helmut Wieser wrote:
<maxk@xxxxxxxxxxxx>
[ 168.668587] device tap1.0 entered promiscuous mode
[ 168.668694] eth0: port 3(tap1.0) entering forwarding state
[ 168.688096] alloc irq_desc for 825 on node 0
[ 168.688170] alloc kstat_irqs on node 0
But the machine comes up for the first time and everything seems to be
working fine.
I use udevd 151.
On 31.05.2010 16:27, Niels Dettenbach wrote:
On Monday, 31 May 2010, 16:13:14, Helmut Wieser wrote:
No, this doesn't help.
I'm currently trying to ditch the Debian kernel and compile one of
Jeremy's kernels with a config close to the one from Debian.
...you may try this:
1.) make sure udev is <=151 (I use 141 currently)
2.) set in your Xen kernel config (if not already set):
CONFIG_SYSFS_DEPRECATED=y
CONFIG_SYSFS_DEPRECATED_V2=y
CONFIG_ACPI_SYSFS_POWER=y
CONFIG_WIRELESS_EXT_SYSFS=y *
CONFIG_GPIO_SYSFS=y *
CONFIG_VIDEO_PVRUSB2_SYSFS=y
CONFIG_RTC_INTF_SYSFS=y
CONFIG_XEN_SYSFS=y
CONFIG_SYSFS=y
(* only if it applies to your hardware)
(not sure if it's optimal, but it seems to work for me with 3.4x and 4.x)
=> reboot
3.) run
mount -t sysfs sys /sys
=> if you still have a sysfs mounted, you might try to unmount it before this
step
I have a line
sys /sys sysfs auto 0 0
in my fstab which seems to help...
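To verify that step 3 took effect, one could check the mount table for a sysfs entry at /sys. The sketch below does this with glibc's setmntent()/getmntent(); has_mount() is a hypothetical helper name, and reading /proc/mounts as the table is an assumption about the system.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 if 'table' (a file in fstab/mtab format, e.g. /proc/mounts)
 * lists a filesystem of type 'type' at mount point 'dir', 0 if not,
 * and -1 if the table cannot be opened.
 */
static int has_mount(const char *table, const char *dir, const char *type)
{
	FILE *fp = setmntent(table, "r");
	struct mntent *ent;
	int found = 0;

	if (!fp)
		return -1;
	while ((ent = getmntent(fp)) != NULL) {
		if (strcmp(ent->mnt_dir, dir) == 0 &&
		    strcmp(ent->mnt_type, type) == 0) {
			found = 1;
			break;
		}
	}
	endmntent(fp);
	return found;
}
```

For example, has_mount("/proc/mounts", "/sys", "sysfs") should return 1 once sysfs is mounted; pointing it at /etc/fstab instead checks whether the fstab line is present.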
Maybe this is mostly unnecessary, but it seems to help me - so please don't hit me...
;)
Another thing is that you might have leftovers of your (too new) udev config
from before downgrading.
I'm working with Gentoo, which compiles things the way I want, so I can't really
judge what your distribution and package management do well or badly
with your (udev) configs...
Maybe this helps,
Niels.
-
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users