I'm deploying a 2-node Pacemaker/DRBD-backed Xen cluster to run a
mixture of Linux PV and Windows HVM VMs. I have this up and running on
a pair of development machines, with both automatic and manual failover
working perfectly. Live migrations succeed every time for both the PV
and HVM VMs.
I've replicated the setup onto a pair of high-end live machines, but
there the HVM live migrations only succeed around 10% of the time. PV
live migrations still complete every time. The configurations on the
development and live machines are identical in every way except the
physical hardware.
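For reference, the manual migrations are just standard xm live
migrations run from dom0, along the lines of the following (the peer
hostname here is a placeholder, not the real machine name); the
automatic ones go through the Pacemaker resource agent:

xm migrate --live web peer-dom0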
When a migration fails, the source host (the one the VM is migrating
from) logs the following:
[2010-04-07 14:42:45 6211] DEBUG (XendCheckpoint:103) [xc_save]: /usr/lib64/xen/bin/xc_save 30 18 0 0 5
[2010-04-07 14:42:45 6211] INFO (XendCheckpoint:403) xc_save: could not read suspend event channel
[2010-04-07 14:42:45 6211] WARNING (XendDomainInfo:1617) Domain has crashed: name=migrating-web id=18.
[2010-04-07 14:42:45 6211] DEBUG (XendDomainInfo:2389) XendDomainInfo.destroy: domid=18
[2010-04-07 14:42:45 6211] DEBUG (XendDomainInfo:2406) XendDomainInfo.destroyDomain(18)
[2010-04-07 14:42:48 6211] DEBUG (XendDomainInfo:1939) Destroying device model
[2010-04-07 14:42:48 6211] INFO (XendCheckpoint:403) Saving memory pages: iter 1 10%ERROR Internal error: Error peeking shadow bitmap
[2010-04-07 14:42:48 6211] INFO (XendCheckpoint:403) Warning - couldn't disable shadow modeSave exit rc=1
[2010-04-07 14:42:48 6211] ERROR (XendCheckpoint:157) Save failed on domain web (18) - resuming.
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/xen/xend/XendCheckpoint.py", line 125, in save
    forkHelper(cmd, fd, saveInputHandler, False)
  File "/usr/lib/python2.5/site-packages/xen/xend/XendCheckpoint.py", line 391, in forkHelper
    raise XendError("%s failed" % string.join(cmd))
XendError: /usr/lib64/xen/bin/xc_save 30 18 0 0 5 failed
The following also appears in /var/log/xen/qemu-dm-web.log:
xenstore_process_logdirty_event: key=000000006b8b4567 size=335816
Log-dirty: mapped segment at 0x7fb56c136000
Triggered log-dirty buffer switch
The destination host (the one the VM is migrating to) logs the
following:
[2010-04-07 14:42:45 6227] INFO (XendCheckpoint:403) Reloading memory pages: 0%
[2010-04-07 14:42:48 6227] INFO (XendCheckpoint:403) ERROR Internal error: Error when reading batch size
[2010-04-07 14:42:48 6227] INFO (XendCheckpoint:403) Restore exit with rc=1
[2010-04-07 14:42:48 6227] DEBUG (XendDomainInfo:2389) XendDomainInfo.destroy: domid=26
[2010-04-07 14:42:48 6227] DEBUG (XendDomainInfo:2406) XendDomainInfo.destroyDomain(26)
[2010-04-07 14:42:48 6227] ERROR (XendDomainInfo:2418) XendDomainInfo.destroy: xc.domain_destroy failed.
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/xen/xend/XendDomainInfo.py", line 2413, in destroyDomain
    xc.domain_destroy(self.domid)
Error: (3, 'No such process')
Some basic config details:
Xen version: 3.3.0
Kernel: 2.6.24-27-xen
dom0 OS: Ubuntu 8.04 64-bit
domU OS: Windows 2008 64-bit
VM config for the above example:
name = "web"
kernel = "/usr/lib/xen/boot/hvmloader"
builder='hvm'
memory = 10240
shadow_memory = 8
vif = [ 'bridge=eth1' ]
acpi = 1
apic = 1
disk = [ 'phy:/dev/drbd0,hda,w', 'phy:/dev/drbd1,hdb,w' ]
device_model = '/usr/lib64/xen/bin/qemu-dm'
boot="dc"
sdl=0
vnc=1
vncconsole=1
vncpasswd='XXXXXXXXXXXX'
serial='pty'
usbdevice='tablet'
vcpus=8
on_poweroff = 'destroy'
on_reboot = 'restart'
on_crash = 'destroy'
The DRBD resources are handled by Jefferson Ogata's qemu-dm.drbd wrapper
(http://www.antibozo.net/xen/qemu-dm.drbd) and a slightly modified
version of DRBD's block-drbd script.
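In case it's relevant, each DRBD resource is defined roughly along
these lines (the hostnames, backing devices and addresses below are
placeholders, not the real values), with allow-two-primaries enabled so
both dom0s can hold the device primary for the duration of a live
migration:

resource drbd0 {
  protocol C;
  net {
    allow-two-primaries;  # both nodes primary only while a live migration is in flight
  }
  on node1 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   192.168.1.1:7788;
    meta-disk internal;
  }
  on node2 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   192.168.1.2:7788;
    meta-disk internal;
  }
}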
The dom0s are allocated 1GB of memory each, and the two live machines
are identical in both software and hardware configuration. Each machine
has 24GB of memory in total.
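The 1GB dom0 allocation is pinned with dom0_mem on the hypervisor line
in GRUB, roughly as below (treat the exact paths and remaining
arguments as illustrative rather than copied from the live boxes):

kernel /boot/xen-3.3.0.gz dom0_mem=1024M
module /boot/vmlinuz-2.6.24-27-xen root=<dom0 root device> ro console=tty0
module /boot/initrd.img-2.6.24-27-xen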
Thanks