[one-users] How to use Ceph/RBD for System Datastore

Campbell, Bill bcampbell at axcess-financial.com
Mon Jul 15 06:04:26 PDT 2013


That's a strange one (the delete issue). I've not run across that; I'm hoping it's not related to 13.04. I've done all my testing/deployment on the latest LTS release, Ubuntu 12.04, and everything seems to be working okay. 

I did submit a feature request to the development team for further optimization of the Ceph driver, as newer versions of libvirt handle cephx authentication differently (and OpenNebula does not currently have a way to make use of that method appropriately). I'm wondering if this may be part of the issue. 

As an aside, we are still on 3.8.3, using our internally developed Ceph driver (we've submitted the current version to the OpenNebula team for consideration for inclusion). It takes advantage of Ceph's RBD format 2 images (COW clones, etc.) and no longer uses qemu-img to create images, but rather 'rbd import', since that can specify the image format. The development request with my submitted/updated driver is below; be advised that it would probably be unsupported by the OpenNebula team, so if you run into any issues, you know the drill. ;-) It *should* work on ONE 4.0, but not on Ubuntu 13.04 (due to the libvirt issues mentioned). 
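
For reference, a rough sketch of the kind of workflow format 2 enables with the rbd CLI (the pool and image names are only examples, and depending on the Ceph release the option is spelled --format or --image-format):

# import a local file as a format 2 RBD image (placeholder file/pool/image names)
rbd import --format 2 /var/tmp/myimage.raw one/one-26

# format 2 is what allows COW clones: snapshot, protect, then clone
rbd snap create one/one-26@base
rbd snap protect one/one-26@base
rbd clone one/one-26@base one/one-26-clone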

I still think that, if possible, you should use Ubuntu 12.04 for your virtualization systems until the updated Ceph driver is available (there's a lot of additional information that must be created/included in the deployment file, as referenced in the dev request). 

Below is the dev request I mentioned, but be advised that if you use anything attached to that post, it would be unsupported. 

http://dev.opennebula.org/issues/1796#change-4486 

----- Original Message -----

From: "Jon" <three18ti at gmail.com> 
Cc: "Users OpenNebula" <users at lists.opennebula.org> 
Sent: Sunday, July 14, 2013 5:28:55 PM 
Subject: Re: [one-users] How to use Ceph/RBD for System Datastore 

Hi Bill, 

I'm using Ubuntu 13.04 and libvirtd (libvirt) 1.0.2. 

As Jens recommended, I attempted to run the qemu-img command with "raw" instead of "rbd". 

When I manually run the command: 

>> qemu-img convert -O raw /var/tmp/506f2a15417925478414f1c36f8228f7 rbd:one/one-26 

I'm able to mark the image as active, but when I try to instantiate an image I get the error: 

>> missing DISK mandatory attributes (SOURCE, TM_MAD, CLONE, DATASTORE_ID) for VM 32, DISK 0 

Also, I've noticed that when I remove the image from OpenNebula, it does not remove the image from the Ceph datastore. 

Any ideas? 

Thanks, 
Jon A 

On Thu, Jul 11, 2013 at 5:49 AM, Campbell, Bill < bcampbell at axcess-financial.com > wrote: 



Which distribution are you using for the storage node that OpenNebula accesses? (This is the node where qemu-img would be run, I believe.) This looks like a potential problem with that qemu-img binary. 
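
One way to check whether a given qemu-img binary was built with rbd support (a rough diagnostic sketch; exact output varies by distribution and QEMU version):

qemu-img --help | grep -i 'supported formats'      # 'rbd' should appear in the list if compiled in
ldd $(which qemu-img) | grep -E 'librbd|librados'  # the binary should link against the Ceph libraries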


From: "Jon" < three18ti at gmail.com > 
Cc: "Users OpenNebula" < users at lists.opennebula.org > 
Sent: Wednesday, July 10, 2013 8:03:40 PM 

Subject: Re: [one-users] How to use Ceph/RBD for System Datastore 

Hey Bill, 

So let me ask you this: how do you create rbd images? I'm attempting to create a blank datablock image, but qemu-img convert keeps segfaulting. 
Any suggestions are appreciated. 

root at red6:~# qemu-img convert -O rbd /var/tmp/61e14679af7dd1e0e1e09e230c89f82a rbd:one/one-17 
Segmentation fault (core dumped) 

The oned log indicates the same: 

Wed Jul 10 17:52:30 2013 [ImM][I]: Creating disk at of 5120Mb (type: ext4) 
Wed Jul 10 17:52:31 2013 [ImM][I]: Command execution fail: /var/lib/one/remotes/datastore/ceph/mkfs 
Wed Jul 10 17:52:31 2013 [ImM][E]: mkfs: Command " set -e 
Wed Jul 10 17:52:31 2013 [ImM][I]: 
Wed Jul 10 17:52:31 2013 [ImM][I]: # create and format 
Wed Jul 10 17:52:31 2013 [ImM][I]: dd if=/dev/zero of=/var/tmp/61e14679af7dd1e0e1e09e230c89f82a bs=1 count=1 seek=5120M 
Wed Jul 10 17:52:31 2013 [ImM][I]: mkfs -t ext4 -F /var/tmp/61e14679af7dd1e0e1e09e230c89f82a 
Wed Jul 10 17:52:31 2013 [ImM][I]: 
Wed Jul 10 17:52:31 2013 [ImM][I]: # create rbd 
Wed Jul 10 17:52:31 2013 [ImM][I]: qemu-img convert -O rbd /var/tmp/61e14679af7dd1e0e1e09e230c89f82a rbd:one/one-17 
Wed Jul 10 17:52:31 2013 [ImM][I]: 
Wed Jul 10 17:52:31 2013 [ImM][I]: # remove original 
Wed Jul 10 17:52:31 2013 [ImM][I]: rm -f /var/tmp/61e14679af7dd1e0e1e09e230c89f82a" failed: 1+0 records in 
Wed Jul 10 17:52:31 2013 [ImM][I]: 1+0 records out 
Wed Jul 10 17:52:31 2013 [ImM][I]: 1 byte (1 B) copied, 0.000232576 s, 4.3 kB/s 
Wed Jul 10 17:52:31 2013 [ImM][I]: mke2fs 1.42.5 (29-Jul-2012) 
Wed Jul 10 17:52:31 2013 [ImM][I]: Segmentation fault (core dumped) 
Wed Jul 10 17:52:31 2013 [ImM][E]: Error registering one/one-17 in localhost 
Wed Jul 10 17:52:31 2013 [ImM][I]: ExitCode: 139 
Wed Jul 10 17:52:31 2013 [ImM][E]: Error creating datablock: Error registering one/one-17 in localhost 

My image definition looks like this: 

IMAGE 17 INFORMATION 
ID : 17 
NAME : ubuntu-server-13.04-x86_64 
USER : oneadmin 
GROUP : oneadmin 
DATASTORE : rbd1 
TYPE : DATABLOCK 
REGISTER TIME : 07/10 17:52:30 
PERSISTENT : Yes 
SOURCE : 
FSTYPE : ext4 
SIZE : 5G 
STATE : err 
RUNNING_VMS : 0 

PERMISSIONS 
OWNER : um- 
GROUP : --- 
OTHER : --- 

IMAGE TEMPLATE 
DESCRIPTION="ubuntu-server-13.04-x86_64" 
DEV_PREFIX="hd" 
ERROR="Wed Jul 10 17:52:31 2013 : Error creating datablock: Error registering one/one-17 in localhost" 

Thanks, 
Jon A 


On Wed, Jul 10, 2013 at 5:14 PM, Jon < three18ti at gmail.com > wrote: 

Hey Bill, 

Thanks for this. This works perfectly! 

Thanks, 
Jon A 


On Wed, Jul 10, 2013 at 6:44 AM, Campbell, Bill < bcampbell at axcess-financial.com > wrote: 




Not entirely. You shouldn't need to manually create/mount an RBD for the system datastore. Since the system datastore holds the running VM deployment files (not necessarily an RBD image, just a reference to it in the deployment file), this directory does not necessarily need to be shared. 



Here's what we do: 

* OpenNebula system configured with no special exports/shares. 
* The System datastore is modified to use the SSH transfer manager. 
* We modify the SSH transfer manager pre/post migrate scripts (by default located in /var/lib/one/remotes/tm/ssh/) to copy files from the source host to the destination host prior to migration, and to delete files on the source after a successful migration. 



Don't worry about mapping/unmapping RBD volumes. The RBDs are created when you create/import images into the Ceph datastore. So long as the hypervisor nodes can see/interact with the Ceph cluster, when you deploy the VM it will use the RBD in the cluster for storage (no files copied/mapped locally; it's all handled by QEMU). 
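
To illustrate, the disk section of the generated libvirt deployment file points at the RBD over the network rather than at a local file; roughly something like this (pool/image name, monitor host, and secret UUID are placeholders, and the auth element is only needed when cephx is enabled):

<disk type='network' device='disk'>
  <source protocol='rbd' name='one/one-26'>
    <host name='ceph-mon1' port='6789'/>
  </source>
  <auth username='libvirt'>
    <secret type='ceph' uuid='00000000-0000-0000-0000-000000000000'/>
  </auth>
  <target dev='vda' bus='virtio'/>
</disk>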





Here is the pre-migrate script we use (very simple): 

#!/bin/bash 

SRC=$1 
DST=$2 
REMDIR=$3 
VMID=$4 
DSID=$5 
TEMPLATE=$6 

# create the VM directory on the destination host, then copy the deployment files over
ssh $DST mkdir -p /var/lib/one/datastores/0/$VMID 
ssh $SRC scp /var/lib/one/datastores/0/$VMID/* $DST:/var/lib/one/datastores/0/$VMID/ 

exit 0 



And the post-migrate script: 

#!/bin/bash 

SRC=$1 
DST=$2 
REMDIR=$3 
VMID=$4 
DSID=$5 
TEMPLATE=$6 

# remove the deployment files from the source host once migration has completed
ssh $SRC rm -rf /var/lib/one/datastores/0/$VMID 

exit 0 
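
If you go this route, the scripts just replace the stock pre/post migrate actions of the ssh transfer manager on the front-end and get pushed out to the hosts; roughly (assuming the default remotes layout and an OpenNebula version with 'onehost sync'):

# as oneadmin on the front-end
cp premigrate postmigrate /var/lib/one/remotes/tm/ssh/
chmod +x /var/lib/one/remotes/tm/ssh/premigrate /var/lib/one/remotes/tm/ssh/postmigrate
onehost sync    # push the updated drivers to the hypervisor hosts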





Hope this helps! 



From: Jon [mailto: three18ti at gmail.com ] 
Sent: Wednesday, July 10, 2013 12:01 AM 
To: Campbell, Bill 
Cc: Users OpenNebula 
Subject: Re: [one-users] How to use Ceph/RBD for System Datastore 




Hey Bill, 

Thanks for getting back to me. 

If I'm understanding you correctly, you're basically using the ssh transfer manager to perform live migrations? 
Do you then create/mount one rbd per host? 

E.g., 

host1: 
mount /dev/rbd/rbd/host1-one-system /var/lib/one/datastores/0 

host2: 
mount /dev/rbd/rbd/host2-one-system /var/lib/one/datastores/0 

then use the modified ssh drivers to perform the migrations? 

I would definitely be interested in learning how you accomplished that. 

My other thought was to use CephFS for shared storage. This would eliminate the need for NFS/GlusterFS/CLVM, which is an extra layer of complexity I would like to avoid. As I understand it though, CephFS isn't "ready for prime-time", which gives me pause... 

Thanks again, 
Jon A 


On Tue, Jul 9, 2013 at 7:55 PM, Campbell, Bill < bcampbell at axcess-financial.com > wrote: 



Jon, 

I think I understand what you are trying to do, but I think it doesn't quite work that way. Let me try to explain (and please let me know if I don't explain it well enough ;-)). 

I don't think you can use Ceph directly as a system datastore. For migrations, the Ceph datastore driver leverages whatever transfer method you have configured for the system datastore. For example, if you use the 'shared' system datastore, it will use that transfer manager's pre- and post-migration drivers; for 'ssh', the ssh drivers, and so on. The Ceph datastore is implemented as Ceph block devices, so unfortunately there is no way to use it as a simple shared volume. 

There are two potential solutions for getting live migrations working for your Ceph datastore VMs: 


    * Create a shared NFS volume (or another 'sharable' filesystem such as GFS2 or OCFS2, though those are much more complicated to configure and usually not worth the hassle) and have the shared volume mounted at the same location on each hypervisor node. In a previous test deployment, we just exported the /var/lib/one/vms directory to the hypervisors. At that point, all of the hypervisors can see the deployment files in the same location and you should be able to perform a migration. 
    * Use SSH as the transfer manager for your system datastore, and modify the pre- and post-migrate scripts to copy the deployment files from the current VM host to the target VM host. This is the method we currently use in our deployment, as it is one less configuration step to maintain on each node, and it makes expanding our cluster much quicker and easier. I can share the pre- and post-migrate scripts we use if you like (a sketch of the datastore setup for this option follows the list). 
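
A rough sketch of what the second option looks like at the datastore level (the names and pool values are placeholders, and the exact attributes available vary by OpenNebula version):

# system datastore (applied to datastore 0 with 'onedatastore update'):
# only the deployment files live here, moved between hosts by the ssh drivers
TYPE   = SYSTEM_DS
TM_MAD = ssh

# Ceph image datastore (created with 'onedatastore create <template file>')
NAME        = ceph_images
DS_MAD      = ceph
TM_MAD      = ceph
DISK_TYPE   = RBD
POOL_NAME   = one
BRIDGE_LIST = "storage-bridge-host"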



Let me know if the above makes sense, and of course if you need any additional help please don't hesitate to bug me. I'm very familiar with the Ceph drivers ;-) 






From: "Jon" < three18ti at gmail.com > 
To: "Users OpenNebula" < users at lists.opennebula.org > 
Sent: Tuesday, July 9, 2013 8:05:51 PM 
Subject: [one-users] How to use Ceph/RBD for System Datastore 








Hello All, 

I am using Ceph as my storage back end and would like to know how to configure the system datastore such that I can live migrate VMs. 

Following the directions, I thought I could create a datastore, format it, and mount it at /var/lib/one/datastores/0; however, I discovered that isn't quite how things work. 

>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-May/001913.html 

You can read more about that at the above link, but long story short, to mount a shared filesystem it has to be a "clustered" filesystem (I think CephFS is the "clustered filesystem" in this case). 

I attempted to modify my system datastore config; however, I was unable to change the DS_MAD parameter, and VM creation errors out telling me there's no /var/lib/one/remotes/tm/ceph/mkswap driver (there isn't): 





>> oneadmin at red6:~$ onedatastore show 0 
>> DATASTORE 0 INFORMATION 
>> ID : 0 
>> NAME : system 
>> USER : oneadmin 
>> GROUP : oneadmin 
>> CLUSTER : - 
>> TYPE : SYSTEM 
>> DS_MAD : - 
>> TM_MAD : ceph 
>> BASE PATH : /var/lib/one/datastores/0 
>> DISK_TYPE : FILE 
>> 
>> PERMISSIONS 
>> OWNER : um- 
>> GROUP : u-- 
>> OTHER : --- 
>> 
>> DATASTORE TEMPLATE 
>> DISK_TYPE="rbd" 
>> DS_MAD="-" 
>> TM_MAD="ceph" 
>> TYPE="SYSTEM_DS" 
>> 
>> IMAGES 



Maybe I'm just confused. Can anyone provide some guidance on setting ceph up as the system datastore? 





Thanks, 


Jon A 





_______________________________________________ 
Users mailing list 
Users at lists.opennebula.org 
http://lists.opennebula.org/listinfo.cgi/users-opennebula.org 


NOTICE: Protect the information in this message in accordance with the company's security policies. If you received this message in error, immediately notify the sender and destroy all copies.
