[one-dev] Live migration / recovery - suggestions

Gareth Bult gareth at linux.co.uk
Tue Jan 21 07:12:01 PST 2014


Hi Jaime, 

That sounds like it will work for me, but I'm afraid I don't like it as a solution. 

As long as you make the context something other than the add-on, there is always the potential 
for conflict. In this instance there is the potential for two storage add-on providers to want a 
.fail script, in which case during installation one will overwrite the other. 

Is there a technical issue with using the add-on as the context, or is it design-related? 

.. this is the same issue I found when looking at the libvirt API: the scripts use the hypervisor as 
a context, which means you can effectively only use one "add-on" (or one customisation that 
uses the hooks), because as soon as you use a second, there is a potential conflict. 

Regards, 
Gareth. 


-- 
	
Gareth Bult 
“The odds of hitting your target go up dramatically when you aim at it.” 
See the status of my current project at http://vdc-store.com 


----- Original Message -----

From: "Jaime Melis" <jmelis at opennebula.org> 
To: "Gareth Bult" <gareth at linux.co.uk> 
Cc: "David Macleod" <dmacleod at csir.co.za>, dev at lists.opennebula.org 
Sent: Tuesday, 21 January, 2014 2:33:52 PM 
Subject: Re: [one-dev] Live migration / recovery - suggestions 

Hi, 

as far as we can see there are two issues here: 

(1) how can addons modify and manage the original OpenNebula source code 
(2) how to implement a custom action after a migrate failure, specifically for this addon 

We have been trying to find a good solution for (1), but since we haven't found any proposal that fits in nicely, we've decided to focus only on solving (2) for the moment. We don't want to mix the addon's code into the main repository, so we came up with the following proposal, which sits somewhere between option (b) from Gareth's last email and the last paragraph of David's last email: 

We are going to introduce a new generic action that will be called when an action fails. In the particular case of the 'migrate' action, if 'migrate' fails OpenNebula will check if there's a 'migrate.fail' script for that particular vmm (this file is not supplied by OpenNebula). If it finds this file, it will execute it. OpenNebula will never package an "<action>.fail" file, therefore addon users will be able to upgrade safely without having their content overwritten by the packaging systems. And addon creators simply need to drop that file in the appropriate remotes/vmm/<hypervisor> directory and they're good to go. 
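To make the proposed convention concrete, here is a minimal bash sketch of an "<action>.fail" lookup. run_action is a hypothetical helper, not OpenNebula code, and the email does not say which arguments OpenNebula will pass to the .fail script; the sketch assumes the script receives the same arguments as the failed action, and that the action's own exit status is still what gets reported.

```shell
# Hypothetical sketch of the proposed "<action>.fail" convention.
# run_action is NOT an OpenNebula function; it only illustrates the idea.
#
# Usage: run_action <vmm_dir> <action> [args...]
run_action() {
    local vmm_dir=$1 action=$2
    shift 2

    "$vmm_dir/$action" "$@"
    local rc=$?

    # On failure, run <action>.fail only if an add-on has dropped one in
    # place; OpenNebula itself would never package this file.
    if [ $rc -ne 0 ] && [ -x "$vmm_dir/$action.fail" ]; then
        "$vmm_dir/$action.fail" "$@"
    fi

    # Report the original action's exit status, not the hook's.
    return $rc
}
```

For example, `run_action /var/lib/one/remotes/vmm/kvm migrate "$deploy_id" "$dest_host"` would run migrate and, only if it fails, migrate.fail. The remotes path there is an assumption (the usual system-wide location), not something stated in the thread.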

Gareth, David, do you like this proposal? Do you think it's a fair compromise that solves the issue at hand while not being specific to this addon? 

Thanks for your input, guys. 

Cheers, 
Jaime 


On Tue, Jan 21, 2014 at 11:02 AM, Gareth Bult < gareth at linux.co.uk > wrote: 




>A permanent fix would be for OpenNebula to add a new state MIGRATE_FAILED 

Absolutely! 

If you look at the very first posting in this thread, I explain my issue and ask: 

>How do I do this? I guess effectively I need something like "postmigrate_fail" .. ??? 

And Ruben replies; 

>As the migrate script can be easily updated we do not provide any hook for that. I'd go to kvm/migrate, and do a simple if [ $? ... after the virsh command to kill the cache on the target host. 

So we've sort of come full circle. 

I can see four possible solutions: 

a. Every time someone installs such a plugin, they modify the ON source code (current) 
b. ON passes the driver name to 'migrate', so that 'migrate' can call a plugin-style cleanup script (sample supplied) 
c. ON provide for a script that's activated when a migration fails (which is the same as (b) really) ['real' solution] 
d. I don't use ON to provide this functionality at all but instead use the scripts / hooks in 'libvirt' 
(so it's transparent to ON) 

I will be doing some work with the libvirt API to get proper snapshot integration, as ON currently does not 
provide a snapshot interface beyond what libvirt provides, so I will investigate (d) at the same time; it 
may be a simpler solution, albeit one that means distributing files for libvirt as well as for ON. 



Regards, 
Gareth. 





From: "David Macleod" < dmacleod at csir.co.za > 
To: "Gareth Bult" < gareth at linux.co.uk > 
Cc: "Jaime Melis" < jmelis at opennebula.org >, dev at lists.opennebula.org 
Sent: Tuesday, 21 January, 2014 6:26:07 AM 

Subject: Re: [one-dev] Live migration / recovery - suggestions 

I don't know why I'm offering more solutions to you, but anyway. 

From what I can see, the state trace for a successful migration is RUNNING->MIGRATE->RUNNING, and it is the same for a failed one. You could write a hook that triggers on state RUNNING and checks whether the previous state was MIGRATE. The catch is that you would have to maintain some state information in your add-on: write a "start migration" flag in your pre-migrate script, and clear it in your post-migrate script when the migration succeeds, or in the proposed hook when it fails. Alternatively, you could forgo maintaining your own state and have the hook parse the VM's log for migration-failed messages. 
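A minimal bash sketch of the flag-file variant described above. All names here (STATE_DIR, mark_migration_started, clear_migration_flag, on_running, vdc_cleanup) are hypothetical, and the sketch assumes, as the description implies, that on a successful migration the post-migrate script clears the flag before the RUNNING hook fires.

```shell
# Hypothetical sketch: a per-VM "migration in progress" flag lets a hook
# on RUNNING tell a failed migration apart from a successful one.
STATE_DIR=${STATE_DIR:-/tmp/vdc-migrate-state}

# Called from the add-on's pre-migrate script.
mark_migration_started() {
    mkdir -p "$STATE_DIR"
    : > "$STATE_DIR/$1.migrating"
}

# Called from the add-on's post-migrate script (migration succeeded).
clear_migration_flag() {
    rm -f "$STATE_DIR/$1.migrating"
}

# Called by a hook that fires when the VM enters RUNNING. If the flag is
# still there, post-migrate never ran, so the migration is assumed to have
# failed. For ordinary RUNNING transitions there is no flag: a no-op.
on_running() {
    local vm_id=$1
    if [ -e "$STATE_DIR/$vm_id.migrating" ]; then
        vdc_cleanup "$vm_id"
        clear_migration_flag "$vm_id"
    fi
}

# Placeholder for the add-on's real recovery logic.
vdc_cleanup() {
    echo "cleaning up after failed migration of VM $1"
}
```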

A permanent fix would be for OpenNebula to add a new state, MIGRATE_FAILED, that a developer could use to distinguish between a successful and a failed migration. The state trace for a failed migration would then be RUNNING->MIGRATE->MIGRATE_FAILED->RUNNING. 

Regards, 
David 


On Mon, Jan 20, 2014 at 3:23 PM, Gareth Bult < gareth at linux.co.uk > wrote: 

<blockquote>


>1. OK, so you are quite defensive and rude about this for some reason 

You are suggesting modifying and maintaining (currently) 5 migrate scripts, then redistributing (and keeping up to date) 
90 other scripts, just to work around a deficiency in the mechanism currently available for add-ons to 
hook into the infrastructure. 

That was me being restrained. 

>I know it's not as elegant as the four line solution but point 2. 

To put it mildly. 

I'm *not* trying to insist my solution is included, but there needs to be "a" solution and generating lots more 
duplicate code to install and redistribute simply isn't "it" and I'm more than a little shocked anyone would suggest it! 





From: "David Macleod" < dmacleod at csir.co.za > 
To: "Gareth Bult" < gareth at linux.co.uk > 
Cc: "Jaime Melis" < jmelis at opennebula.org >, dev at lists.opennebula.org 
Sent: Monday, 20 January, 2014 12:50:14 PM 

Subject: Re: [one-dev] Live migration / recovery - suggestions 



    1. OK, so you are quite defensive and rude about this for some reason. 
    2. The OpenNebula project has just said they don't want to change their code base to suit you, so your four line solution is off the table. 
    3. You don't need to write the vmm drivers from scratch. Just copy the three provided with ONE, call them kvm-vdc, xen-vdc and vmware-vdc or something, and then make your customizations to them. I know it's not as elegant as the four line solution but point 2. 
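Point 3 can be sketched in bash as follows, assuming the remotes directory is passed in and the stock drivers live under <remotes>/vmm/<hypervisor>. clone_vmm_drivers is a hypothetical helper; which driver directories actually exist depends on the OpenNebula version, so missing ones are skipped, and registering the copies as drivers in oned.conf is not shown.

```shell
# Hypothetical sketch: clone stock vmm driver directories under a "-vdc"
# name so add-on changes are not overwritten by OpenNebula upgrades.
# Usage: clone_vmm_drivers <remotes_dir> <hypervisor>...
clone_vmm_drivers() {
    local remotes=$1
    shift
    local hv
    for hv in "$@"; do
        if [ ! -d "$remotes/vmm/$hv" ]; then
            echo "skipping $hv: $remotes/vmm/$hv not found" >&2
            continue
        fi
        # Never copy into an existing clone (cp -R would nest a directory).
        if [ -e "$remotes/vmm/$hv-vdc" ]; then
            echo "skipping $hv: $remotes/vmm/$hv-vdc already exists" >&2
            continue
        fi
        # -p preserves modes, so the copied scripts stay executable.
        cp -Rp "$remotes/vmm/$hv" "$remotes/vmm/$hv-vdc"
    done
    return 0
}
```

The customised migrate script would then be edited only in the -vdc copy, which is also why Gareth's follow-up objection applies: every other script is duplicated along with it.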


On Mon, Jan 20, 2014 at 2:23 PM, Gareth Bult < gareth at linux.co.uk > wrote: 

<blockquote>

Once I switch to a different vmm (say "vdc" rather than "kvm"), how will the system know to fall back on 
the hypervisor in use for all the other scripts that I don't want to modify? 
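Nothing in the thread describes such a fallback. Purely to make the question concrete, a lookup of this kind would need something like the bash sketch below; resolve_action and the directory layout are hypothetical. Note that it takes the hypervisor name as an input, which is precisely the information a renamed "vdc" driver would no longer carry.

```shell
# Hypothetical sketch only: OpenNebula does not provide this lookup.
# Prefer the add-on's version of an action script, otherwise fall back
# to the stock hypervisor driver's version.
# Usage: resolve_action <remotes_dir> <addon> <hypervisor> <action>
resolve_action() {
    local remotes=$1 addon=$2 hypervisor=$3 action=$4
    local candidate
    for candidate in "$remotes/vmm/$addon/$action" \
                     "$remotes/vmm/$hypervisor/$action"; do
        if [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    # Neither driver provides this action.
    return 1
}
```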




From: "David Macleod" < dmacleod at csir.co.za > 
To: "Gareth Bult" < gareth at linux.co.uk > 
Cc: "Jaime Melis" < jmelis at opennebula.org >, dev at lists.opennebula.org 
Sent: Monday, 20 January, 2014 11:57:31 AM 

Subject: Re: [one-dev] Live migration / recovery - suggestions 

No, you should write a basic framework around the migrate script that supports the three hypervisors that OpenNebula uses. 

>>> Gareth Bult < gareth at linux.co.uk > 2014/01/20 02:15 PM >>> 

Erm, yeah, Ok, should I include filters for the coffee machine too ?! 




From: "David Macleod" < dmacleod at csir.co.za > 
To: "Gareth Bult" < gareth at linux.co.uk > 
Cc: "Jaime Melis" < jmelis at opennebula.org >, dev at lists.opennebula.org 
Sent: Monday, 20 January, 2014 11:50:26 AM 
Subject: Re: [one-dev] Live migration / recovery - suggestions 

Then distribute a vmm driver per hypervisor... 

>>> Gareth Bult < gareth at linux.co.uk > 2014/01/20 02:01 PM >>> 
.. Erm, because the vmm driver is per-hypervisor .. and this product is technically 
hypervisor agnostic .. ?? 





From: "David Macleod" < dmacleod at csir.co.za > 
To: "Gareth Bult" < gareth at linux.co.uk >, "Jaime Melis" < jmelis at opennebula.org > 
Cc: dev at lists.opennebula.org 
Sent: Monday, 20 January, 2014 11:32:33 AM 
Subject: Re: [one-dev] Live migration / recovery - suggestions 

Hi Gareth, Jaime 
Why can't the recovery be handled by the migrate script? You can distribute your own custom vmm driver so that it doesn't get overwritten by updates. That's how I wrote my add-on; I also need to recover from a failed migration. 
Regards, 
David 

>>> Jaime Melis < jmelis at opennebula.org > 2014/01/20 12:43 PM >>> 
Hi Gareth, 

we've been studying your proposal, and even though we agree with what you say, we 
aren't 100% convinced by this solution. The issues with the proposal are the 
following: 

- As far as possible, we'd like to keep the main OpenNebula code and the addons 
separate. In this case it means that we are not very comfortable with 
point 3: adding addon-specific code to the main repository. 

- The proposed solution only solves the "migrate" issue, but other addons may 
run into problems with other scripts, and not necessarily problems that the 
"CleanUp" parameter of "ssh_exec_and_log" can address. We would like to find a 
more general solution. 

We are still thinking about this and we definitely want to solve this issue, so if 
you (or anyone else) have any ideas, please let us know. 

cheers, 
Jaime 


On Tue, Jan 14, 2014 at 2:23 PM, Gareth Bult < gareth at linux.co.uk > wrote: 

<blockquote>

Hey guys, I've done a little work on the migration script; this is what I've done here. 
It would be nice if something similar could be implemented in the source. 

1. Modify ssh_exec_and_log as follows (a generic change that could be useful elsewhere): 


function ssh_exec_and_log
{
    message=$2
    cleanup=$3                      # ++ optional: function to call on failure

    # Run the command; keep its stderr, discard its stdout.
    EXEC_LOG_ERR=`$1 2>&1 1>/dev/null`
    EXEC_LOG_RC=$?

    if [ $EXEC_LOG_RC -ne 0 ]; then
        log_error "Command \"$1\" failed: $EXEC_LOG_ERR"
        if [ -n "$cleanup" ]; then  # ++
            $cleanup                # ++
        fi                          # ++

        if [ -n "$message" ]; then
            error_message "$message"
        else
            error_message "Error executing $1: $EXEC_LOG_ERR"
        fi
        return $EXEC_LOG_RC
    fi
}
i.e. allow an optional third parameter naming a function to call if the command fails. 

2. In migrate (for my vdc code), add "CleanUp" as the last parameter of the ssh_exec_and_log call on the last line: 


ssh_exec_and_log "virsh --connect $LIBVIRT_URI migrate --live $deploy_id $QEMU_PROTOCOL://$dest_host/system" \
    "Could not migrate $deploy_id to $dest_host" CleanUp

3. Then add the following function to migrate: 


function CleanUp
{
    # The vdc add-on lives three levels above the directory holding this migrate script.
    VDC="$(dirname "$0")/../../../vdc-nebula"
    if [ -d "${VDC}" ]; then
        "${VDC}/remotes/tm/vdc/postmigrate_fail" "${deploy_id}" "${dest_host}"
    fi
}

CleanUp could be extended for other storage options? 
Ideally, I guess, you would pass the driver through, so that CleanUp becomes completely generic and postmigrate_fail 
becomes just another standard routine. 
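One way the generic version could look, as a bash sketch: CleanUp reads the storage driver name from a hypothetical TM_DRIVER variable (how migrate would learn it is exactly the open question here) and runs that driver's postmigrate_fail script if one exists. Like the original, it uses the deploy_id and dest_host variables already set by migrate, so it can still be passed to ssh_exec_and_log by name. REMOTES and its default path are assumptions.

```shell
# Hypothetical sketch of a driver-agnostic CleanUp.
# Assumed globals: deploy_id, dest_host (set by migrate) and TM_DRIVER
# (hypothetical: the storage driver used by the VM).
REMOTES=${REMOTES:-/var/lib/one/remotes}

function CleanUp
{
    local hook="$REMOTES/tm/$TM_DRIVER/postmigrate_fail"

    # Storage drivers that ship no postmigrate_fail are simply skipped.
    if [ -x "$hook" ]; then
        "$hook" "$deploy_id" "$dest_host"
    fi
}
```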







</blockquote>



</blockquote>




</blockquote>




-- 
Jaime Melis 
Project Engineer 
OpenNebula - Flexible Enterprise Cloud Made Simple 
www.OpenNebula.org | jmelis at opennebula.org 


