Hi Ranga,

we set up our environment again with Ubuntu Server 9.04 64-bit, since we read that these freezing issues only occur with Ubuntu 9.10 (https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/448674). We then installed the required packages, built OpenNebula from source (one-1.4.0.tar.gz), and now it works. We have two machines: one is the cloud controller and a node at the same time, the other is only a node.

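For reference, the build went roughly like this (a minimal sketch; the exact dependency list and install prefix are from memory and may differ on your system):

  # build dependencies (approximate list)
  sudo apt-get install g++ scons ruby libsqlite3-dev libxmlrpc-c3-dev libssl-dev

  # unpack, build and install OpenNebula 1.4 from source
  tar xzf one-1.4.0.tar.gz
  cd one-1.4.0
  scons
  ./install.sh -d /srv/cloud/one
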
Here are some details:

Kernel: 2.6.28-11-server
libvirt-bin: 0.6.1-0ubuntu5.1
libvirt-ruby1.8: 0.0.7-1

Maybe this will work for you too.

Regards,

Stefan

From: Rangababu Chakravarthula [mailto:rbabu@hexagrid.com]
Sent: Friday, 2 April 2010 19:16
To: Harder, Stefan
Cc: Javier Fontan; users@lists.opennebula.org
Subject: Re: [one-users] VMs freezing after livemigrating

Hello Stefan,

we are also having the same issue, but when I use OpenNebula to suspend and resume, I am able to access the console and log on to the VM.

Here are our setup details.

Host information (using facter):
---------------------------------------------------
kernel => Linux
kernelrelease => 2.6.31-16-server
lsbdistcodename => karmic
lsbdistdescription => Ubuntu 9.10
---------------------------------------------------

libvirt version:
libvirt-bin    0.7.0-1ubuntu13.1
libvirt0       0.7.0-1ubuntu13.1

OpenNebula 1.2

VM_DIR = "/nfs/path/to/storage"
Transfer Manager = NFS

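For context, the corresponding entries in oned.conf look roughly like this (a sketch based on the stock NFS transfer manager configuration in OpenNebula 1.x; exact names and paths should be checked against your own oned.conf):

  # VM directory on the shared NFS export
  VM_DIR = /nfs/path/to/storage

  # NFS transfer manager driver (stock example)
  TM_MAD = [
      name       = "tm_nfs",
      executable = "one_tm",
      arguments  = "tm_nfs/tm_nfs.conf" ]
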
This is actually a bug in libvirt for which Red Hat released a fix long ago, but Ubuntu only ships the fix in Lucid, which is scheduled for release on April 29, 2010:

https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/448674

Since it takes a while to test Lucid completely before using it in production, we are going with the workaround.

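The workaround itself is just a suspend/resume cycle through OpenNebula after the migration, e.g. for VM 686 from the example below:

  onevm suspend 686
  # wait until onevm list shows the VM in the susp state
  onevm resume 686
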
Here are the logs from OpenNebula.

user@managementnode:~$ onevm list
  ID     NAME STAT CPU    MEM     HOSTNAME        TIME
 686 migratev runn   0 262144 10.10.20.159 01 22:25:50

user@managementnode:~$ onevm suspend 686

Fri Apr 2 11:56:53 2010 [LCM][I]: New VM state is SAVE_SUSPEND
Fri Apr 2 11:58:35 2010 [VMM][I]: Connecting to uri: qemu:///system
Fri Apr 2 11:58:35 2010 [VMM][I]: ExitCode: 0
Fri Apr 2 11:58:35 2010 [DiM][I]: New VM state is SUSPENDED

user@host:/nfs/path/to/storage/686$ ls -al images/
total 84556
drwxrwxrwx  2 oneadmin nogroup        5 2010-04-02 16:53 .
drwxr-xr-x+ 3 oneadmin nogroup        3 2010-03-31 18:27 ..
-rw-------+ 1 root     root    92243033 2010-04-02 16:54 checkpoint
-rw-r--r--+ 1 oneadmin nogroup      549 2010-03-31 18:27 deployment.0
lrwxrwxrwx  1 oneadmin nogroup       34 2010-03-31 18:27 disk.0 -> /nfs/path/to/storage/images/migratevm0

user@managementnode:~$ onevm list
 686 migratev susp   0 262144 10.10.20.159 01 22:29:53

unable to connect to host. connection refused 111

user@managementnode:~$ onevm resume 686

Fri Apr 2 12:02:00 2010 [DiM][I]: New VM state is ACTIVE.
Fri Apr 2 12:02:00 2010 [LCM][I]: Restoring VM
Fri Apr 2 12:02:00 2010 [LCM][I]: New state is BOOT
Fri Apr 2 12:02:01 2010 [VMM][I]: Connecting to uri: qemu:///system
Fri Apr 2 12:02:01 2010 [VMM][I]: ExitCode: 0
Fri Apr 2 12:02:01 2010 [LCM][I]: New VM state is RUNNING

Ranga

On Wed, Mar 24, 2010 at 7:55 AM, Harder, Stefan <Stefan.Harder@fokus.fraunhofer.de> wrote:

Hi Javier,

thanks for your answer.

The state in virsh on the node we live-migrate the VM to is "running", and on the old node the VM disappears. There are no logs showing any unusual behavior inside the VM.

If we suspend via OpenNebula, the VM goes into the susp state but the log shows an error:

*****BEGIN*****
Wed Mar 24 14:46:30 2010 [DiM][D]: Suspending VM 111
Wed Mar 24 14:46:30 2010 [VMM][D]: Message received: LOG - 111 Command execution fail: 'touch /srv/cloud/one/var/111/images/checkpoint;virsh --connect qemu:///system save one-111 /srv/cloud/one/var/111/images/checkpoint'
Wed Mar 24 14:46:30 2010 [VMM][D]: Message received: LOG - 111 STDERR follows.
Wed Mar 24 14:46:30 2010 [VMM][D]: Message received: LOG - 111 /usr/lib/ruby/1.8/open3.rb:67: warning: Insecure world writable dir /srv/cloud/one in PATH, mode 040777
Wed Mar 24 14:46:30 2010 [VMM][D]: Message received: LOG - 111 Connecting to uri: qemu:///system
Wed Mar 24 14:46:30 2010 [VMM][D]: Message received: LOG - 111 error: Failed to save domain one-111 to /srv/cloud/one/var/111/images/checkpoint
Wed Mar 24 14:46:30 2010 [VMM][D]: Message received: LOG - 111 error: operation failed: failed to create '/srv/cloud/one/var/111/images/checkpoint'
Wed Mar 24 14:46:30 2010 [VMM][D]: Message received: LOG - 111 ExitCode: 1
Wed Mar 24 14:46:30 2010 [VMM][D]: Message received: SAVE FAILURE 111 -
*****END*****

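What the driver runs on the node here is essentially the following (taken from the command in the log above); running it by hand as the oneadmin user should reproduce the failure:

  touch /srv/cloud/one/var/111/images/checkpoint
  virsh --connect qemu:///system save one-111 /srv/cloud/one/var/111/images/checkpoint
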
If we then try to resume the VM, the state changes to fail and the log shows:

*****BEGIN*****
Wed Mar 24 14:49:43 2010 [DiM][D]: Resuming VM 111
Wed Mar 24 14:49:44 2010 [VMM][D]: Message received: LOG - 111 Command execution fail: virsh --connect qemu:///system restore /srv/cloud/one/var/111/images/checkpoint
Wed Mar 24 14:49:44 2010 [VMM][D]: Message received: LOG - 111 STDERR follows.
Wed Mar 24 14:49:44 2010 [VMM][D]: Message received: LOG - 111 /usr/lib/ruby/1.8/open3.rb:67: warning: Insecure world writable dir /srv/cloud/one in PATH, mode 040777
Wed Mar 24 14:49:44 2010 [VMM][D]: Message received: LOG - 111 Connecting to uri: qemu:///system
Wed Mar 24 14:49:44 2010 [VMM][D]: Message received: LOG - 111 error: Failed to restore domain from /srv/cloud/one/var/111/images/checkpoint
Wed Mar 24 14:49:44 2010 [VMM][D]: Message received: LOG - 111 error: operation failed: cannot read domain image
Wed Mar 24 14:49:44 2010 [VMM][D]: Message received: LOG - 111 ExitCode: 1
Wed Mar 24 14:49:44 2010 [VMM][D]: Message received: RESTORE FAILURE 111 -
Wed Mar 24 14:49:44 2010 [TM][D]: Message received: LOG - 111 tm_delete.sh: Deleting /srv/cloud/one/var/111/images
Wed Mar 24 14:49:44 2010 [TM][D]: Message received: LOG - 111 tm_delete.sh: Executed "rm -rf /srv/cloud/one/var/111/images".
*****END*****

If we do it directly via virsh, the VM resumes and runs like before. This is not a VNC issue: if we ping the machine the whole time, it does not answer until we suspend and resume it via virsh on the physical node.

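That is, on the node the VM was migrated to (domain one-111 in our logs):

  virsh --connect qemu:///system suspend one-111
  virsh --connect qemu:///system resume one-111
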
We faced some other problems compiling a newer libvirt from source (we thought the Ubuntu-packaged version might be too old). Which system configuration and package versions do you use? We are thinking about a clean installation on Ubuntu 9.04, since we currently use 9.10.

Best,

Stefan

> -----Original Message-----
> From: Javier Fontan [mailto:jfontan@gmail.com]
> Sent: Wednesday, 24 March 2010 12:33
> To: Harder, Stefan
> Cc: users@lists.opennebula.org
> Subject: Re: [one-users] VMs freezing after livemigrating
>
> Hello,
>
> I never had that problem myself. Can you check that the state in virsh
> is running? I suppose you check that the VM is frozen by connecting via
> VNC. Can you also check the logs of the unfrozen machine for any strange
> message dealing with the CPU, or anything else that could be stopping it
> from waking up again?
>
> Bye
>
>
> On Thu, Mar 18, 2010 at 11:47 AM, Harder, Stefan
> <Stefan.Harder@fokus.fraunhofer.de> wrote:
> > Hi,
> >
> > after solving some issues, live migration works in my test environment
> > (3 servers: one of them is the cloud controller and a node at the same
> > time, and the other two are only nodes). The problem I have now is that
> > the VMs freeze after live-migrating. The only way to get them back alive
> > is to do a "virsh suspend <name>" and "virsh resume <name>" on the
> > physical node where the VM was migrated to. Is this issue or even a
> > solution known to you?
> >
> > Best regards,
> >
> > Stefan
> > _______________________________________________
> > Users mailing list
> > Users@lists.opennebula.org
> > http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
> >
>
>
>
> --
> Javier Fontan, Grid & Virtualization Technology Engineer/Researcher
> DSA Research Group: http://dsa-research.org
> Globus GridWay Metascheduler: http://www.GridWay.org
> OpenNebula Virtual Infrastructure Engine: http://www.OpenNebula.org
_______________________________________________
Users mailing list
Users@lists.opennebula.org
http://lists.opennebula.org/listinfo.cgi/users-opennebula.org