[one-users] onevm migrate/suspend and checkpoint files

Steven Timm timm at fnal.gov
Thu Jul 24 07:00:49 PDT 2014


When OpenNebula creates a checkpoint file as part of either
an onevm migrate or an onevm suspend, which libvirt function
does it call to do the checkpoint?

We are seeing some issues on our new Ivy Bridge hardware:
sometimes, in the process of a (non-live) migration, the
clock gets confused in such a way that when the virtual
machine starts from the checkpoint file it hangs, and the
kvm process uses 100% of cpu for a day or more before it
usually resolves itself.  In some cases we see the clock
jump very far into the future (2598), which in itself can
confuse a Linux VM enough to hang it.

Any clues on what OpenNebula/libvirt are doing under the covers?
Is there any reason to suspect that on Ivy Bridge hardware,
which has some 60 different cpu frequencies available
for cpu scaling, the rapidly fluctuating clock speeds might
get us into trouble, i.e. suspending the machine at one clock
frequency and bringing it back at a different clock frequency?
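
To make the frequency question concrete, here is a minimal sketch
(Python, illustrative only, not taken from our setup) of how one
might check how many scaling frequencies a host exposes and what it
is running at right now. It assumes the standard Linux cpufreq sysfs
layout under /sys/devices/system/cpu/cpu0/cpufreq; the function
names are hypothetical, and scaling_available_frequencies is not
exposed by every cpufreq driver, so a missing file is reported as
zero entries.

```python
import os

# Assumption: standard Linux cpufreq sysfs directory for the first CPU.
CPUFREQ_DIR = "/sys/devices/system/cpu/cpu0/cpufreq"


def read_khz_list(path):
    """Return the integer kHz values in a cpufreq sysfs file, or [] if unreadable."""
    try:
        with open(path) as f:
            return [int(tok) for tok in f.read().split()]
    except (IOError, OSError, ValueError):
        return []


def cpufreq_summary(cpufreq_dir=CPUFREQ_DIR):
    """Summarize how many scaling frequencies exist and the current one (kHz)."""
    available = read_khz_list(
        os.path.join(cpufreq_dir, "scaling_available_frequencies"))
    current = read_khz_list(os.path.join(cpufreq_dir, "scaling_cur_freq"))
    return {
        "available_count": len(available),
        "current_khz": current[0] if current else None,
    }


if __name__ == "__main__":
    print(cpufreq_summary())
```

Running this on the source and destination hosts just before a
suspend and just after a resume would show whether the two sides
were in fact at different frequencies.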

Does anyone have experience in migrating between hardware
generations... Ivy Bridge -> Westmere and vice versa?

Finally, has anyone run a successful combination of kernel 3.10
or greater and RHEL 6/CentOS 6/Scientific Linux 6?
(In particular, do the stock versions of libvirt and qemu-kvm
play nicely with the 3.10 kernel?)
The 2.6.32 kernel that comes with RHEL 6/CentOS 6/Scientific Linux 6
is just not up to dealing with virtualization on Ivy Bridge machines,
and it has some trouble on Sandy Bridge too.

Thanks

Steve Timm



------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm at fnal.gov  http://home.fnal.gov/~timm/
Fermilab Scientific Computing Division, Scientific Computing Services Quad.
Grid and Cloud Services Dept., Associate Dept. Head for Cloud Computing

