Hello Stefan,<br><br>We are having the same issue. However, when I use OpenNebula to suspend and resume, I am able to access the console and log on to the VM.<br><br>Here are our setup details:<br><br>Host information from facter:<br>

---------------------------------------------------<br>kernel =&gt; Linux<br>kernelrelease =&gt; 2.6.31-16-server<br>lsbdistcodename =&gt; karmic<br>lsbdistdescription =&gt; Ubuntu 9.10<br>--------------------------------------------------<br>

Libvirt version<br><br>  libvirt-bin                       0.7.0-1ubuntu13.1                          <br>  libvirt0                          0.7.0-1ubuntu13.1 <br>

<br>OpenNebula 1.2<br><br>VM_DIR =&quot;/nfs/path/to/storage&quot;<br>Transfer Manager=NFS<br><br>This is actually a bug in libvirt for which Red Hat released a fix long ago, but Ubuntu has the fix only in Lucid, which is scheduled for release on April 29, 2010.<br>

<br><a href="https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/448674" target="_blank">https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/448674</a><br>
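<br>To tell whether a node still runs the Karmic libvirt packages listed above, it may help to print the release codename next to the installed package version. This is only a sketch: the function name report_libvirt is made up for illustration, and it assumes a Debian/Ubuntu host where lsb_release and dpkg-query are available.<br>

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenNebula or libvirt): print the Ubuntu
# codename and the installed libvirt-bin package version on one line.
report_libvirt() {
    codename=$(lsb_release -cs)                            # e.g. karmic
    version=$(dpkg-query -W -f='${Version}' libvirt-bin)   # e.g. 0.7.0-1ubuntu13.1
    echo "$codename libvirt-bin $version"
}

# Usage, on each node:
#   report_libvirt
```

A node that reports the same codename and package version as our hosts above is presumably running the unfixed package.<br>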

<br>Since it takes a while to test Lucid completely before using it in production, we are going with the workaround.<br><br>Here are the logs from OpenNebula.<br><br>user@managementnode:~$ onevm list<br>  ID     NAME STAT CPU     MEM        HOSTNAME        TIME<br>

 686 migratev runn   0  262144    10.10.20.159 01 22:25:50<br>user@managementnode:~$ onevm suspend 686<br><br>Fri Apr  2 11:56:53 2010 [LCM][I]: New VM state is SAVE_SUSPEND<br>Fri Apr  2 11:58:35 2010 [VMM][I]: Connecting to uri: qemu:///system<br>

Fri Apr  2 11:58:35 2010 [VMM][I]: ExitCode: 0<br>Fri Apr  2 11:58:35 2010 [DiM][I]: New VM state is SUSPENDED<br><br><br>user@host:/nfs/path/to/storage/686$ ls -al images/<br>total 84556<br>drwxrwxrwx  2 oneadmin nogroup        5 2010-04-02 16:53 .<br>

drwxr-xr-x+ 3 oneadmin nogroup        3 2010-03-31 18:27 ..<br>-rw-------+ 1 root     root    92243033 2010-04-02 16:54 checkpoint<br>-rw-r--r--+ 1 oneadmin nogroup      549 2010-03-31 18:27 deployment.0<br>lrwxrwxrwx  1 oneadmin nogroup       34 2010-03-31 18:27 disk.0 -&gt; /nfs/path/to/storage/images/migratevm0<br>

<br><br>user@managementnode:~$ onevm list<br> 686 migratev susp   0  262144    10.10.20.159 01 22:29:53<br><br>unable to connect to host. connection refused 111<br><br>user@managementnode:~$ onevm resume 686<br><br>Fri Apr  2 12:02:00 2010 [DiM][I]: New VM state is ACTIVE.<br>

Fri Apr  2 12:02:00 2010 [LCM][I]: Restoring VM<br>Fri Apr  2 12:02:00 2010 [LCM][I]: New state is BOOT<br>Fri Apr  2 12:02:01 2010 [VMM][I]: Connecting to uri: qemu:///system<br>Fri Apr  2 12:02:01 2010 [VMM][I]: ExitCode: 0<br>

Fri Apr  2 12:02:01 2010 [LCM][I]: New VM state is RUNNING<br><br><br>Ranga<br><br><div class="gmail_quote">On Wed, Mar 24, 2010 at 7:55 AM, Harder, Stefan <span dir="ltr">&lt;<a href="mailto:Stefan.Harder@fokus.fraunhofer.de" target="_blank">Stefan.Harder@fokus.fraunhofer.de</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

Hi Javier,<br>

<br>

thanks for your answer.<br>

<br>

The state in virsh on the node we live-migrate the VM to is &quot;running&quot;, and on the old node the VM disappears. No logs inside the VM show any unusual behavior.<br>

<br>

If we suspend via OpenNebula, the VM goes into the susp state, but the log shows an error:<br>

<br>

*****BEGIN*****<br>

Wed Mar 24 14:46:30 2010 [DiM][D]: Suspending VM 111<br>

Wed Mar 24 14:46:30 2010 [VMM][D]: Message received: LOG - 111 Command execution fail: &#39;touch /srv/cloud/one/var/111/images/checkpoint;virsh --connect qemu:///system save one-111 /srv/cloud/one/var/111/images/checkpoint&#39;<br>


<br>

Wed Mar 24 14:46:30 2010 [VMM][D]: Message received: LOG - 111 STDERR follows.<br>

<br>

Wed Mar 24 14:46:30 2010 [VMM][D]: Message received: LOG - 111 /usr/lib/ruby/1.8/open3.rb:67: warning: Insecure world writable dir /srv/cloud/one in PATH, mode 040777<br>

<br>

Wed Mar 24 14:46:30 2010 [VMM][D]: Message received: LOG - 111 Connecting to uri: qemu:///system<br>

<br>

Wed Mar 24 14:46:30 2010 [VMM][D]: Message received: LOG - 111 error: Failed to save domain one-111 to /srv/cloud/one/var/111/images/checkpoint<br>

<br>

Wed Mar 24 14:46:30 2010 [VMM][D]: Message received: LOG - 111 error: operation failed: failed to create &#39;/srv/cloud/one/var/111/images/checkpoint&#39;<br>

<br>

Wed Mar 24 14:46:30 2010 [VMM][D]: Message received: LOG - 111 ExitCode: 1<br>

<br>

Wed Mar 24 14:46:30 2010 [VMM][D]: Message received: SAVE FAILURE 111 -<br>

*****END*****<br>

<br>


If we then try to resume the VM, the state changes to fail and the log shows:<br>

<br>


*****BEGIN*****<br>

Wed Mar 24 14:49:43 2010 [DiM][D]: Resuming VM 111<br>

Wed Mar 24 14:49:44 2010 [VMM][D]: Message received: LOG - 111 Command execution fail: virsh --connect qemu:///system restore /srv/cloud/one/var/111/images/checkpoint<br>

<br>

Wed Mar 24 14:49:44 2010 [VMM][D]: Message received: LOG - 111 STDERR follows.<br>

<br>

Wed Mar 24 14:49:44 2010 [VMM][D]: Message received: LOG - 111 /usr/lib/ruby/1.8/open3.rb:67: warning: Insecure world writable dir /srv/cloud/one in PATH, mode 040777<br>

<br>

Wed Mar 24 14:49:44 2010 [VMM][D]: Message received: LOG - 111 Connecting to uri: qemu:///system<br>

<br>

Wed Mar 24 14:49:44 2010 [VMM][D]: Message received: LOG - 111 error: Failed to restore domain from /srv/cloud/one/var/111/images/checkpoint<br>

<br>

Wed Mar 24 14:49:44 2010 [VMM][D]: Message received: LOG - 111 error: operation failed: cannot read domain image<br>

<br>

Wed Mar 24 14:49:44 2010 [VMM][D]: Message received: LOG - 111 ExitCode: 1<br>

<br>

Wed Mar 24 14:49:44 2010 [VMM][D]: Message received: RESTORE FAILURE 111 -<br>

<br>

Wed Mar 24 14:49:44 2010 [TM][D]: Message received: LOG - 111 tm_delete.sh: Deleting /srv/cloud/one/var/111/images<br>

<br>

Wed Mar 24 14:49:44 2010 [TM][D]: Message received: LOG - 111 tm_delete.sh: Executed &quot;rm -rf /srv/cloud/one/var/111/images&quot;.<br>

*****END*****<br>

<br>

<br>

<br>

If we do it directly via virsh, the VM resumes and runs as before. This is not a VNC issue: we ping the machine the whole time, and it does not answer until we suspend and resume it via virsh on the physical node.<br>
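<br>The manual suspend and resume via virsh could be wrapped in a small helper like the one below. It is a minimal sketch, not a tested fix: unfreeze_domain and LIBVIRT_URI are names invented here, the URI default is the one that appears in the logs, and the domain name is whatever virsh lists for the VM (one-111 in the log above).<br>

```shell
#!/bin/sh
# Sketch of the manual workaround: pause and then unpause a guest that froze
# after live migration. unfreeze_domain and LIBVIRT_URI are hypothetical names.
LIBVIRT_URI="${LIBVIRT_URI:-qemu:///system}"

unfreeze_domain() {
    domain="$1"
    # Stop here if the suspend fails, so resume is not issued blindly.
    virsh --connect "$LIBVIRT_URI" suspend "$domain" || return 1
    virsh --connect "$LIBVIRT_URI" resume "$domain"
}

# Usage, on the physical node the VM was migrated to:
#   unfreeze_domain one-111
```

Unlike the save and restore that OpenNebula runs, suspend and resume in virsh only pause and unpause the guest in memory, so no checkpoint file is written.<br>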


<br>

We also ran into problems compiling a newer version of libvirt from source (we thought the Ubuntu-packaged version might be too old). Which system configuration and package versions do you use? We are considering a clean installation of Ubuntu 9.04, since we currently use 9.10.<br>


<br>

Best,<br>

<br>

Stefan<br>

<br>

<br>

&gt; -----Original Message-----<br>

&gt; From: Javier Fontan [mailto:<a href="mailto:jfontan@gmail.com" target="_blank">jfontan@gmail.com</a>]<br>

&gt; Sent: Wednesday, 24 March 2010 12:33<br>

&gt; To: Harder, Stefan<br>

&gt; Cc: <a href="mailto:users@lists.opennebula.org" target="_blank">users@lists.opennebula.org</a><br>

&gt; Subject: Re: [one-users] VMs freezing after livemigrating<br>

<div><div></div><div>&gt;<br>

&gt; Hello,<br>

&gt;<br>

&gt; I never had that problem myself. Can you check that the state in virsh<br>

&gt; is running? I suppose you check that the VM is frozen connecting using<br>

&gt; VNC. Can you also check in your unfrozen machine logs for any strange<br>

&gt; message dealing with cpu or something that can be stopping it from<br>

&gt; awaking again?<br>

&gt;<br>

&gt; Bye<br>

&gt;<br>

&gt;<br>

&gt; On Thu, Mar 18, 2010 at 11:47 AM, Harder, Stefan<br>

&gt; &lt;<a href="mailto:Stefan.Harder@fokus.fraunhofer.de" target="_blank">Stefan.Harder@fokus.fraunhofer.de</a>&gt; wrote:<br>

&gt; &gt; Hi,<br>

&gt; &gt;<br>

&gt; &gt; after solving some issues, live migration works in my test environment (3<br>

&gt; &gt; servers, one of them is the cloud controller and node at the same time,<br>

&gt; &gt; and the other two are only nodes). The problem I have now is that the<br>

&gt; &gt; VMs freeze after live migration. The only way to bring them back is<br>

&gt; &gt; to do a &quot;virsh suspend &lt;name&gt;&quot; and &quot;virsh resume &lt;name&gt;&quot; on the physical<br>

&gt; &gt; node the VM was migrated to. Is this issue, or even a solution,<br>

&gt; &gt; known to you?<br>

&gt; &gt;<br>

&gt; &gt; Best regards,<br>

&gt; &gt;<br>

&gt; &gt; Stefan<br>

&gt; &gt; _______________________________________________<br>

&gt; &gt; Users mailing list<br>

&gt; &gt; <a href="mailto:Users@lists.opennebula.org" target="_blank">Users@lists.opennebula.org</a><br>

&gt; &gt; <a href="http://lists.opennebula.org/listinfo.cgi/users-opennebula.org" target="_blank">http://lists.opennebula.org/listinfo.cgi/users-opennebula.org</a><br>

&gt; &gt;<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt; --<br>

&gt; Javier Fontan, Grid &amp; Virtualization Technology Engineer/Researcher<br>

&gt; DSA Research Group: <a href="http://dsa-research.org" target="_blank">http://dsa-research.org</a><br>

&gt; Globus GridWay Metascheduler: <a href="http://www.GridWay.org" target="_blank">http://www.GridWay.org</a><br>

&gt; OpenNebula Virtual Infrastructure Engine: <a href="http://www.OpenNebula.org" target="_blank">http://www.OpenNebula.org</a><br>


</div></div></blockquote></div><br>