[one-users] Nebula 4.0.1 Xen 4.1.4 Debian Wheezy - MIGRATION problem..

Jacek Jarosiewicz nebula at supermedia.pl
Wed Jun 26 05:02:50 PDT 2013


I've created shared storage on both hosts, but live migration still 
fails with this error:

[2013-06-26 13:21:19 2721] DEBUG (XendCheckpoint:305) [xc_restore]: 
/usr/lib/xen-4.1/bin/xc_restore 18 4 1 2 0 0 0 0
[2013-06-26 13:24:29 2721] INFO (XendCheckpoint:423) xc: error: Failed 
to pin batch of 1024 page tables (22 = Invalid argument): Internal error
[2013-06-26 13:24:30 2721] DEBUG (XendDomainInfo:3071) 
XendDomainInfo.destroy: domid=4
[2013-06-26 13:24:30 2721] ERROR (XendDomainInfo:3085) 
XendDomainInfo.destroy: domain destruction failed.
Traceback (most recent call last):
   File "/usr/lib/xen-4.1/bin/../lib/python/xen/xend/XendDomainInfo.py", 
line 3078, in destroy
     xc.domain_pause(self.domid)
Error: (3, 'No such process')
[2013-06-26 13:24:30 2721] DEBUG (XendDomainInfo:2406) No device model
[2013-06-26 13:24:30 2721] DEBUG (XendDomainInfo:2408) Releasing devices
[2013-06-26 13:24:30 2721] DEBUG (XendDomainInfo:2414) Removing vif/0
[2013-06-26 13:24:30 2721] DEBUG (XendDomainInfo:1276) 
XendDomainInfo.destroyDevice: deviceClass = vif, device = vif/0
[2013-06-26 13:24:30 2721] DEBUG (XendDomainInfo:2414) Removing vkbd/0
[2013-06-26 13:24:30 2721] DEBUG (XendDomainInfo:1276) 
XendDomainInfo.destroyDevice: deviceClass = vkbd, device = vkbd/0
[2013-06-26 13:24:30 2721] DEBUG (XendDomainInfo:2414) Removing console/0
[2013-06-26 13:24:30 2721] DEBUG (XendDomainInfo:1276) 
XendDomainInfo.destroyDevice: deviceClass = console, device = console/0
[2013-06-26 13:24:30 2721] DEBUG (XendDomainInfo:2414) Removing vbd/51712
[2013-06-26 13:24:30 2721] DEBUG (XendDomainInfo:1276) 
XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/51712
[2013-06-26 13:24:30 2721] DEBUG (XendDomainInfo:2414) Removing vbd/51728
[2013-06-26 13:24:30 2721] DEBUG (XendDomainInfo:1276) 
XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/51728
[2013-06-26 13:24:30 2721] DEBUG (XendDomainInfo:2414) Removing vfb/0
[2013-06-26 13:24:30 2721] DEBUG (XendDomainInfo:1276) 
XendDomainInfo.destroyDevice: deviceClass = vfb, device = vfb/0
[2013-06-26 13:24:30 2721] INFO (XendDomain:1126) Domain one-33 
(c80f42e8-c47e-8f96-a26c-0f98b966167b) deleted.
[2013-06-26 13:24:30 2721] ERROR (XendCheckpoint:357) 
/usr/lib/xen-4.1/bin/xc_restore 18 4 1 2 0 0 0 0 failed
Traceback (most recent call last):
   File "/usr/lib/xen-4.1/bin/../lib/python/xen/xend/XendCheckpoint.py", 
line 309, in restore
     forkHelper(cmd, fd, handler.handler, True)
   File "/usr/lib/xen-4.1/bin/../lib/python/xen/xend/XendCheckpoint.py", 
line 411, in forkHelper
     raise XendError("%s failed" % string.join(cmd))
XendError: /usr/lib/xen-4.1/bin/xc_restore 18 4 1 2 0 0 0 0 failed
[2013-06-26 13:24:30 2721] ERROR (XendDomain:1194) Restore failed
Traceback (most recent call last):
   File "/usr/lib/xen-4.1/bin/../lib/python/xen/xend/XendDomain.py", 
line 1178, in domain_restore_fd
     dominfo = XendCheckpoint.restore(self, fd, paused=paused, 
relocating=relocating)
   File "/usr/lib/xen-4.1/bin/../lib/python/xen/xend/XendCheckpoint.py", 
line 358, in restore
     raise exn
XendError: /usr/lib/xen-4.1/bin/xc_restore 18 4 1 2 0 0 0 0 failed
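
In case it matters: the restore fails on the destination host, and Javier 
suggested below checking that both hosts run the same Xen and the same 
CPUs (mine differ: Xeon E5430 vs Core i5 760). A rough sketch of that 
check (assuming passwordless ssh from the front-end to both hosts, and 
that the ssh user can run xm via sudo without a password; the host names 
nebula0 and nebula1 come from the logs):

import subprocess

HOSTS = ["nebula0", "nebula1"]  # host names taken from the migration logs

def remote(host, cmd):
    # run a command on the remote host over ssh and return its output
    return subprocess.check_output(["ssh", host, cmd]).decode().strip()

def cpu_flags(host):
    # CPU feature flags reported by the kernel on that host
    line = remote(host, "grep -m1 '^flags' /proc/cpuinfo")
    return set(line.split(":", 1)[1].split())

for host in HOSTS:
    # Xen version as reported by the toolstack (needs root, hence sudo)
    print("%s: %s" % (host, remote(host, "sudo /usr/sbin/xm info | grep xen_version")))

flags_a, flags_b = (cpu_flags(h) for h in HOSTS)
print("flags only on %s: %s" % (HOSTS[0], sorted(flags_a - flags_b)))
print("flags only on %s: %s" % (HOSTS[1], sorted(flags_b - flags_a)))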


The shared storage is MooseFS, mounted on both hosts, and permissions are 
OK, but live migration still fails with the error above. Cold migration 
doesn't work either. I've posted my logs to the xen-users list, but got 
no response :(
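
To rule out the mount itself, here is a small sanity check (the directory 
name is a guess: VM 33 as in the "Domain one-33" line above, on system 
datastore 0 like in the earlier logs; adjust the path to the VM actually 
being migrated):

import subprocess

HOSTS = ["nebula0", "nebula1"]           # host names from the logs
VM_DIR = "/var/lib/one/datastores/0/33"  # assumed path for VM 'one-33'

for host in HOSTS:
    # list the VM directory on each host; if the moosefs mount is consistent,
    # both hosts should show the same disk.* and checkpoint files and sizes
    try:
        out = subprocess.check_output(["ssh", host, "ls -l %s" % VM_DIR])
        print("%s:\n%s" % (host, out.decode()))
    except subprocess.CalledProcessError as err:
        print("%s: cannot list %s (exit %d)" % (host, VM_DIR, err.returncode))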

Help!
J

On 06/24/2013 12:52 PM, Javier Fontan wrote:
> I cannot find a reason why cold migration is not working.
>
> Yes, live migration only works with shared storage.
>
> On Thu, Jun 13, 2013 at 11:03 AM, Jacek Jarosiewicz
> <nebula at supermedia.pl> wrote:
>> both hosts are exactly the same software-wise (same versions of OS, same
>> distributions, same versions of opennebula, same versions of xen).
>>
>> processors are different though, one host has Intel Xeon E5430, and the
>> other has Intel Core i5 760.
>>
>> so live migration can be done only with shared storage?
>>
>> J
>>
>>
>> On 06/13/2013 10:14 AM, Javier Fontan wrote:
>>>
>>> In live migration nobody copies the image; it needs to reside in a
>>> shared filesystem mounted on both hosts.
>>>
>>> The cold migration problem is a bit trickier, as it suspends the VM,
>>> copies everything, and starts it again on the new host. Can you check
>>> that both hosts have the exact same version of Xen? Check also that
>>> the processors are the same.
>>>
>>>
>>> On Thu, Jun 13, 2013 at 9:17 AM, Jacek Jarosiewicz <nebula at supermedia.pl>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> No, it's not a persistent disk, it's just a regular OS image.
>>>> Yes, it doesn't get copied to the other nebula host. But it seems like it
>>>> doesn't even try to copy the image. The live migration error shows almost
>>>> immediately, and the VM keeps running on the original host.
>>>>
>>>> I'm not entirely sure whether it's nebula's job to copy the image, or
>>>> Xen's.
>>>>
>>>> And the other one, cold migration, doesn't work either :(
>>>> It copies the image and the checkpoint file to the other host, but then
>>>> when it tries to boot the VM I get the error below.
>>>>
>>>> Cheers,
>>>> J
>>>>
>>>>
>>>> On 12.06.2013 18:29, Javier Fontan wrote:
>>>>>
>>>>>
>>>>> It looks like it cannot find an image file:
>>>>>
>>>>> VmError: Device 51712 (vbd) could not be connected.
>>>>> /var/lib/one//datastores/0/28/disk.0 does not exist.
>>>>>
>>>>> Is that image a persistent disk? If so, is it located in a shared
>>>>> datastore that is not mounted on that host?
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Wed, Jun 12, 2013 at 3:13 PM, Jacek Jarosiewicz
>>>>> <nebula at supermedia.pl>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have a problem migrating VMs between hosts; both cold and live
>>>>>> migration fail.
>>>>>>
>>>>>> Cold migration log is:
>>>>>> Wed Jun 12 12:32:24 2013 [LCM][I]: New VM state is RUNNING
>>>>>> Wed Jun 12 12:32:41 2013 [VMM][I]: ExitCode: 0
>>>>>> Wed Jun 12 12:39:56 2013 [LCM][I]: New VM state is SAVE_MIGRATE
>>>>>> Wed Jun 12 12:40:40 2013 [VMM][I]: ExitCode: 0
>>>>>> Wed Jun 12 12:40:40 2013 [VMM][I]: Successfully execute virtualization
>>>>>> driver operation: save.
>>>>>> Wed Jun 12 12:40:40 2013 [VMM][I]: ExitCode: 0
>>>>>> Wed Jun 12 12:40:40 2013 [VMM][I]: Successfully execute network driver
>>>>>> operation: clean.
>>>>>> Wed Jun 12 12:40:40 2013 [LCM][I]: New VM state is PROLOG_MIGRATE
>>>>>> Wed Jun 12 12:40:40 2013 [TM][I]: ExitCode: 0
>>>>>> Wed Jun 12 12:41:18 2013 [LCM][E]: monitor_done_action, VM in a wrong
>>>>>> state
>>>>>> Wed Jun 12 12:46:29 2013 [LCM][E]: monitor_done_action, VM in a wrong
>>>>>> state
>>>>>> Wed Jun 12 12:51:40 2013 [LCM][E]: monitor_done_action, VM in a wrong
>>>>>> state
>>>>>> Wed Jun 12 12:56:09 2013 [TM][I]: mv: Moving
>>>>>> nebula1:/var/lib/one/datastores/0/29 to
>>>>>> nebula0:/var/lib/one/datastores/0/29
>>>>>> Wed Jun 12 12:56:09 2013 [TM][I]: ExitCode: 0
>>>>>> Wed Jun 12 12:56:09 2013 [LCM][I]: New VM state is BOOT
>>>>>> Wed Jun 12 12:56:09 2013 [VMM][I]: ExitCode: 0
>>>>>> Wed Jun 12 12:56:09 2013 [VMM][I]: Successfully execute network driver
>>>>>> operation: pre.
>>>>>> Wed Jun 12 12:56:32 2013 [VMM][I]: Command execution fail:
>>>>>> /var/tmp/one/vmm/xen4/restore /var/lib/one//datastores/0/29/checkpoint
>>>>>> nebula0 29 nebula0
>>>>>> Wed Jun 12 12:56:32 2013 [VMM][E]: restore: Command "sudo /usr/sbin/xm
>>>>>> restore /var/lib/one//datastores/0/29/checkpoint" failed: Error:
>>>>>> /usr/lib/xen-4.1/bin/xc_restore 23 12 1 2 0 0 0 0 failed
>>>>>> Wed Jun 12 12:56:32 2013 [VMM][E]: Could not restore from
>>>>>> /var/lib/one//datastores/0/29/checkpoint
>>>>>> Wed Jun 12 12:56:32 2013 [VMM][I]: ExitCode: 1
>>>>>> Wed Jun 12 12:56:32 2013 [VMM][I]: Failed to execute virtualization
>>>>>> driver
>>>>>> operation: restore.
>>>>>> Wed Jun 12 12:56:32 2013 [VMM][E]: Error restoring VM: Could not
>>>>>> restore
>>>>>> from /var/lib/one//datastores/0/29/checkpoint
>>>>>> Wed Jun 12 12:56:33 2013 [DiM][I]: New VM state is FAILED
>>>>>>
>>>>>> and in xend.log I see:
>>>>>>
>>>>>> [2013-06-12 12:56:32 24698] ERROR (XendCheckpoint:357)
>>>>>> /usr/lib/xen-4.1/bin/xc_restore 23 12 1 2 0 0 0 0 failed
>>>>>> Traceback (most recent call last):
>>>>>>      File
>>>>>> "/usr/lib/xen-4.1/bin/../lib/python/xen/xend/XendCheckpoint.py",
>>>>>> line
>>>>>> 309, in restore
>>>>>>        forkHelper(cmd, fd, handler.handler, True)
>>>>>>      File
>>>>>> "/usr/lib/xen-4.1/bin/../lib/python/xen/xend/XendCheckpoint.py",
>>>>>> line
>>>>>> 411, in forkHelper
>>>>>>        raise XendError("%s failed" % string.join(cmd))
>>>>>> XendError: /usr/lib/xen-4.1/bin/xc_restore 23 12 1 2 0 0 0 0 failed
>>>>>> [2013-06-12 12:56:32 24698] ERROR (XendDomain:1194) Restore failed
>>>>>> Traceback (most recent call last):
>>>>>>      File "/usr/lib/xen-4.1/bin/../lib/python/xen/xend/XendDomain.py",
>>>>>> line
>>>>>> 1178, in domain_restore_fd
>>>>>>        dominfo = XendCheckpoint.restore(self, fd, paused=paused,
>>>>>> relocating=relocating)
>>>>>>      File
>>>>>> "/usr/lib/xen-4.1/bin/../lib/python/xen/xend/XendCheckpoint.py",
>>>>>> line
>>>>>> 358, in restore
>>>>>>        raise exn
>>>>>> XendError: /usr/lib/xen-4.1/bin/xc_restore 23 12 1 2 0 0 0 0 failed
>>>>>>
>>>>>>
>>>>>> ...and with live migration I see:
>>>>>>
>>>>>> Wed Jun 12 12:27:16 2013 [LCM][I]: New VM state is RUNNING
>>>>>> Wed Jun 12 12:27:32 2013 [VMM][I]: ExitCode: 0
>>>>>> Wed Jun 12 13:34:26 2013 [LCM][I]: New VM state is MIGRATE
>>>>>> Wed Jun 12 13:34:26 2013 [VMM][I]: ExitCode: 0
>>>>>> Wed Jun 12 13:34:26 2013 [VMM][I]: Successfully execute transfer
>>>>>> manager
>>>>>> driver operation: tm_premigrate.
>>>>>> Wed Jun 12 13:34:26 2013 [VMM][I]: ExitCode: 0
>>>>>> Wed Jun 12 13:34:26 2013 [VMM][I]: Successfully execute network driver
>>>>>> operation: pre.
>>>>>> Wed Jun 12 13:37:34 2013 [VMM][I]: ExitCode: 0
>>>>>> Wed Jun 12 13:37:34 2013 [VMM][I]: Successfully execute virtualization
>>>>>> driver operation: migrate.
>>>>>> Wed Jun 12 13:37:34 2013 [VMM][I]: ExitCode: 0
>>>>>> Wed Jun 12 13:37:34 2013 [VMM][I]: Successfully execute network driver
>>>>>> operation: clean.
>>>>>> Wed Jun 12 13:37:34 2013 [VMM][I]: ExitCode: 0
>>>>>> Wed Jun 12 13:37:34 2013 [VMM][I]: Successfully execute network driver
>>>>>> operation: post.
>>>>>> Wed Jun 12 13:37:34 2013 [VMM][I]: ExitCode: 0
>>>>>> Wed Jun 12 13:37:34 2013 [VMM][I]: Successfully execute transfer
>>>>>> manager
>>>>>> driver operation: tm_postmigrate.
>>>>>> Wed Jun 12 13:37:35 2013 [LCM][I]: New VM state is RUNNING
>>>>>>
>>>>>> and in xend.log:
>>>>>>
>>>>>> [2013-06-12 13:37:39 9651] ERROR (XendCheckpoint:357) Device 51712
>>>>>> (vbd)
>>>>>> could not be connected. /var/lib/one//datastores/0/28/disk.0 does not
>>>>>> exist.
>>>>>> Traceback (most recent call last):
>>>>>>      File
>>>>>> "/usr/lib/xen-4.1/bin/../lib/python/xen/xend/XendCheckpoint.py",
>>>>>> line
>>>>>> 346, in restore
>>>>>>        dominfo.waitForDevices() # Wait for backends to set up
>>>>>>      File
>>>>>> "/usr/lib/xen-4.1/bin/../lib/python/xen/xend/XendDomainInfo.py",
>>>>>> line
>>>>>> 1237, in waitForDevices
>>>>>>        self.getDeviceController(devclass).waitForDevices()
>>>>>>      File
>>>>>> "/usr/lib/xen-4.1/bin/../lib/python/xen/xend/server/DevController.py",
>>>>>> line
>>>>>> 140, in waitForDevices
>>>>>>        return map(self.waitForDevice, self.deviceIDs())
>>>>>>      File
>>>>>> "/usr/lib/xen-4.1/bin/../lib/python/xen/xend/server/DevController.py",
>>>>>> line
>>>>>> 165, in waitForDevice
>>>>>>        "%s" % (devid, self.deviceClass, err))
>>>>>> VmError: Device 51712 (vbd) could not be connected.
>>>>>> /var/lib/one//datastores/0/28/disk.0 does not exist.
>>>>>> [2013-06-12 13:37:39 9651] ERROR (XendDomain:1194) Restore failed
>>>>>> Traceback (most recent call last):
>>>>>>      File "/usr/lib/xen-4.1/bin/../lib/python/xen/xend/XendDomain.py",
>>>>>> line
>>>>>> 1178, in domain_restore_fd
>>>>>>        dominfo = XendCheckpoint.restore(self, fd, paused=paused,
>>>>>> relocating=relocating)
>>>>>>      File
>>>>>> "/usr/lib/xen-4.1/bin/../lib/python/xen/xend/XendCheckpoint.py",
>>>>>> line
>>>>>> 358, in restore
>>>>>>        raise exn
>>>>>> VmError: Device 51712 (vbd) could not be connected.
>>>>>> /var/lib/one//datastores/0/28/disk.0 does not exist.
>>>>>>
>>>>>> Any help would be appreciated.
>>>>>>
>>>>>> Cheers,
>>>>>> J
>>>>>>
>>>>>> --
>>>>>> Jacek Jarosiewicz
>>>>>> _______________________________________________
>>>>>> Users mailing list
>>>>>> Users at lists.opennebula.org
>>>>>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> Jacek Jarosiewicz
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Jacek Jarosiewicz
>
>
>


-- 
Jacek Jarosiewicz


