[one-users] Virtual machine migration failure!
Jhon Masschelein
jhon.masschelein at sara.nl
Wed Jul 4 23:32:55 PDT 2012
Hi,
Sounds like libvirt/qemu have a problem restarting the VM. Are the
config files for those daemons under /etc/libvirt set up identically on
all your hosts? Did you make sure that the user that starts the kvm
process is able to read and write to the files? (In my install I added
the user "qemu" to the "oneadmin" group to make that work.)
You should have a log file for this particular VM in
/var/log/libvirt/qemu/one-[vm-id].log. What does it say?
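For example (with the VM id 383 from your log):

  tail -n 50 /var/log/libvirt/qemu/one-383.log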
You say it happens occasionally, so not always? Does it always fail when
you migrate to this particular host, or only occasionally there? If it is
occasional, you have to look for things that change from time to time.
Do you have config files maintained by chef/puppet/cfengine? Do you have
user accounts maintained by LDAP/NIS? If this host always fails, it is most
likely a bad configuration on that host; compare it to the other hosts that do work.
Where does the "unable to read from monitor" come from? Is it
OpenNebula? That is kind of normal: it tries to read the status of the
VM, but since the VM is not up, it fails. But actually, that should not give
you a "connection reset"...
What you can do to test your setup when it fails:
Go to the directory with the files and, before you change anything, do a
"virsh create deployment.X" where X is the highest number you can find
in that directory. (In your example, it would be 2.) You will need to be root
for this, or you will not be able to use the "system" libvirt space.
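Roughly like this (the directory below is just a guess based on your
/one_images layout; use whichever directory actually holds the deployment.*
files for this VM):

  cd /one_images/383      # hypothetical path: wherever deployment.0, deployment.1, ... live
  ls deployment.*         # pick the highest-numbered file
  virsh --connect qemu:///system create deployment.2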
If that "just works" then you have a real strange problem. If it gives
you errors, try to solve them. :)
(If you do not know "virsh", you should read up a bit on it.)
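A few virsh commands that are handy here (standard usage):

  virsh --connect qemu:///system list --all        # all domains libvirt knows about on this host
  virsh --connect qemu:///system dominfo one-383   # state and details of the failed VM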
Hope this is a bit helpful.
Jhon
On 07/04/2012 12:59 PM, David wrote:
>
> Hi, All
> I am using OpenNebula 3.2.1.
> When I execute a VM migrate operation, the VM occasionally fails to
> migrate, with the following log:
> Thu Jun 28 15:16:24 2012 [LCM][I]: New VM state is RUNNING
> Thu Jun 28 15:17:09 2012 [LCM][I]: New VM state is SAVE_MIGRATE
> Thu Jun 28 15:17:42 2012 [VMM][I]: save: Executed "virsh --connect
> qemu:///system save one-383 /one_images/383/images/checkpoint".
> Thu Jun 28 15:17:42 2012 [VMM][I]: ExitCode: 0
> Thu Jun 28 15:17:42 2012 [VMM][I]: Successfully execute virtualization
> driver operation: save.
> Thu Jun 28 15:17:43 2012 [VMM][I]: ExitCode: 0
> Thu Jun 28 15:17:43 2012 [VMM][I]: Successfully execute network driver
> operation: clean.
> Thu Jun 28 15:17:43 2012 [LCM][I]: New VM state is PROLOG_MIGRATE
> Thu Jun 28 15:56:03 2012 [TM][I]: tm_mv.sh: Moving /one_images/383/images
> Thu Jun 28 15:56:03 2012 [TM][I]: tm_mv.sh: Executed "ssh
> compute-56-5.local mkdir -p /one_images/383".
> Thu Jun 28 15:56:03 2012 [TM][I]: tm_mv.sh: Executed "scp -r
> compute-56-4.local:/one_images/383/images
> compute-56-5.local:/one_images/383/images".
> Thu Jun 28 15:56:03 2012 [TM][I]: tm_mv.sh: Executed "ssh
> compute-56-4.local rm -rf /one_images/383/images".
> Thu Jun 28 15:56:03 2012 [TM][I]: ExitCode: 0
> Thu Jun 28 15:56:03 2012 [LCM][I]: New VM state is BOOT
> Thu Jun 28 15:56:05 2012 [VMM][I]: ExitCode: 0
> Thu Jun 28 15:56:05 2012 [VMM][I]: Successfully execute network driver
> operation: pre.
> Thu Jun 28 15:56:06 2012 [VMM][I]: Command execution fail:
> /var/tmp/one/vmm/kvm/restore /one_images/383/images/checkpoint
> compute-56-5.local 383 compute-56-5.local
> Thu Jun 28 15:56:06 2012 [VMM][E]: restore: Command "virsh --connect
> qemu:///system restore /one_images/383/images/checkpoint" failed.
> Thu Jun 28 15:56:06 2012 [VMM][E]: restore: error: Failed to restore
> domain from /one_images/383/images/checkpoint
> Thu Jun 28 15:56:06 2012 [VMM][I]: error: internal error process
> exited while connecting to monitor: qemu-kvm: -drive
> file=/one_images/383/images/disk.0,if=none,id=drive-virtio-disk0,format=raw:
> could not open disk image /one_images/383/images/disk.0: Permission denied
> Thu Jun 28 15:56:06 2012 [VMM][E]: Could not restore from
> /one_images/383/images/checkpoint
> Thu Jun 28 15:56:06 2012 [VMM][I]: ExitCode: 1
> Thu Jun 28 15:56:06 2012 [VMM][I]: Failed to execute virtualization
> driver operation: restore.
> Thu Jun 28 15:56:06 2012 [VMM][E]: Error restoring VM: Could not
> restore from /one_images/383/images/checkpoint
> Thu Jun 28 15:56:06 2012 [DiM][I]: New VM state is FAILED
>
> I executed the command: chmod +x *
> but I received the following error message:
> error: Unable to read from monitor: Connection reset by peer
>
> What is causing these problems?
> Thanks!
>
> Regards!
>
>
> _______________________________________________
> Users mailing list
> Users at lists.opennebula.org
> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
--
Jhon Masschelein
Senior Systeemprogrammeur
SARA - HPCV
Science Park 140
1098 XG Amsterdam
T +31 (0)20 592 8099
F +31 (0)20 668 3167
M +31 (0)6 4748 9328
E jhon.masschelein at sara.nl
http://www.sara.nl