[one-users] How to use Ceph/RBD for System Datastore

Jon three18ti at gmail.com
Sun Jul 14 14:28:55 PDT 2013


Hi Bill,

I'm using Ubuntu 13.04 and libvirtd (libvirt) 1.0.2.

As Jens recommended, I attempted to run the qemu-img command with "raw"
instead of "rbd".

When I manually run the command:

>> qemu-img convert -O raw /var/tmp/506f2a15417925478414f1c36f8228f7 rbd:one/one-26

I'm able to mark the image as active, but when I try to instantiate a VM
using the image, I get this error:

>> missing DISK mandatory attributes (SOURCE, TM_MAD, CLONE, DATASTORE_ID)
>> for VM 32, DISK 0
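My suspicion is that activating the image by hand leaves its SOURCE field
empty, which would explain the missing attributes.  If I'm reading things
right, something like this should show whether SOURCE ever got set
(assuming the image behind rbd:one/one-26 is image ID 26):

>> # check the registered SOURCE/DATASTORE for the image (26 is my guess)
>> oneimage show 26 | grep -E 'SOURCE|DATASTORE|STATE'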

Also, I've noticed that when I remove the image from OpenNebula, it does
not remove the image from the Ceph datastore.
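For now I clean those up by hand; roughly like this (sketch only, assuming
the pool is "one" and the orphaned image is one-26):

>> # list what is left in the pool, then remove the orphaned RBD manually
>> rbd ls one
>> rbd rm one/one-26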

Any ideas?

Thanks,
Jon A

On Thu, Jul 11, 2013 at 5:49 AM, Campbell, Bill <
bcampbell at axcess-financial.com> wrote:

> Which distribution are you using for the storage node that OpenNebula
> accesses?  (This is the node where qemu-img would be run, I believe.)
> This looks like a potential problem with that qemu-img binary.
>
> ------------------------------
> From: "Jon" <three18ti at gmail.com>
> Cc: "Users OpenNebula" <users at lists.opennebula.org>
> Sent: Wednesday, July 10, 2013 8:03:40 PM
>
> Subject: Re: [one-users] How to use Ceph/RBD for System Datastore
>
> Hey Bill,
>
> So let me ask you this: how do you create RBD images?  I'm attempting to
> create a blank datablock image, but qemu-img convert keeps segfaulting.
> Any suggestions are appreciated.
>
> root at red6:~# qemu-img convert -O rbd
> /var/tmp/61e14679af7dd1e0e1e09e230c89f82a rbd:one/one-17
> Segmentation fault (core dumped)
>
> The oned log indicates the same:
>
> Wed Jul 10 17:52:30 2013 [ImM][I]: Creating disk at  of 5120Mb (type: ext4)
> Wed Jul 10 17:52:31 2013 [ImM][I]: Command execution fail:
> /var/lib/one/remotes/datastore/ceph/mkfs
> Wed Jul 10 17:52:31 2013 [ImM][E]: mkfs: Command "    set -e
> Wed Jul 10 17:52:31 2013 [ImM][I]:
> Wed Jul 10 17:52:31 2013 [ImM][I]: # create and format
> Wed Jul 10 17:52:31 2013 [ImM][I]: dd if=/dev/zero
> of=/var/tmp/61e14679af7dd1e0e1e09e230c89f82a bs=1 count=1 seek=5120M
> Wed Jul 10 17:52:31 2013 [ImM][I]: mkfs -t ext4 -F
> /var/tmp/61e14679af7dd1e0e1e09e230c89f82a
> Wed Jul 10 17:52:31 2013 [ImM][I]:
> Wed Jul 10 17:52:31 2013 [ImM][I]: # create rbd
> Wed Jul 10 17:52:31 2013 [ImM][I]: qemu-img convert -O rbd
> /var/tmp/61e14679af7dd1e0e1e09e230c89f82a rbd:one/one-17
> Wed Jul 10 17:52:31 2013 [ImM][I]:
> Wed Jul 10 17:52:31 2013 [ImM][I]: # remove original
> Wed Jul 10 17:52:31 2013 [ImM][I]: rm -f
> /var/tmp/61e14679af7dd1e0e1e09e230c89f82a" failed: 1+0 records in
> Wed Jul 10 17:52:31 2013 [ImM][I]: 1+0 records out
> Wed Jul 10 17:52:31 2013 [ImM][I]: 1 byte (1 B) copied, 0.000232576 s, 4.3
> kB/s
> Wed Jul 10 17:52:31 2013 [ImM][I]: mke2fs 1.42.5 (29-Jul-2012)
> Wed Jul 10 17:52:31 2013 [ImM][I]: Segmentation fault (core dumped)
> Wed Jul 10 17:52:31 2013 [ImM][E]: Error registering one/one-17 in
> localhost
> Wed Jul 10 17:52:31 2013 [ImM][I]: ExitCode: 139
> Wed Jul 10 17:52:31 2013 [ImM][E]: Error creating datablock: Error
> registering one/one-17 in localhost
>
> My image definition looks like this:
>
> IMAGE 17 INFORMATION
>
> ID             : 17
> NAME           : ubuntu-server-13.04-x86_64
> USER           : oneadmin
> GROUP          : oneadmin
> DATASTORE      : rbd1
> TYPE           : DATABLOCK
> REGISTER TIME  : 07/10 17:52:30
> PERSISTENT     : Yes
> SOURCE         :
> FSTYPE         : ext4
> SIZE           : 5G
> STATE          : err
> RUNNING_VMS    : 0
>
> PERMISSIONS
>
> OWNER          : um-
> GROUP          : ---
> OTHER          : ---
>
> IMAGE TEMPLATE
>
> DESCRIPTION="ubuntu-server-13.04-x86_64"
> DEV_PREFIX="hd"
> ERROR="Wed Jul 10 17:52:31 2013 : Error creating datablock: Error
> registering one/one-17 in localhost"
>
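> As a possible workaround, I was also wondering whether I could skip
> qemu-img for a blank datablock entirely and create the RBD directly,
> something along these lines (just a sketch; the pool, image name, and
> size are taken from my failed attempt above):
>
> # create an empty 5 GB RBD, then format it via the kernel rbd client
> rbd create one/one-17 --size 5120
> rbd map one/one-17
> mkfs -t ext4 /dev/rbd/one/one-17
> rbd unmap /dev/rbd/one/one-17
>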
> Thanks,
> Jon A
>
>
> On Wed, Jul 10, 2013 at 5:14 PM, Jon <three18ti at gmail.com> wrote:
>
>> Hey Bill,
>>
>> Thanks for this.  This works perfectly!
>>
>> Thanks,
>> Jon A
>>
>>
>> On Wed, Jul 10, 2013 at 6:44 AM, Campbell, Bill <
>> bcampbell at axcess-financial.com> wrote:
>>
>>> Not entirely.  You shouldn’t need to manually create/mount an RBD for
>>> the system datastore.  Since the system datastore holds the running VM
>>> deployment files (not necessarily an RBD image, just a reference to it
>>> in the deployment file), this directory does not necessarily need to be
>>> shared.
>>>
>>>
>>>
>>> Here’s what we do:
>>>
>>>
>>>
>>> - OpenNebula system configured with no special exports/shares.
>>>
>>> - The System datastore is modified to use the SSH transfer manager
>>>   (a sketch of the datastore update follows this list).
>>>
>>> - We modify the SSH transfer manager pre/post-migrate scripts (by
>>>   default located in /var/lib/one/remotes/tm/ssh/) to copy files from
>>>   the source host to the destination host prior to migration, and to
>>>   delete files on the source after a successful migration.
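>>>
>>> A rough sketch of that datastore change (from memory, so double-check
>>> the attribute names against your OpenNebula version):
>>>
>>> # point the system datastore (ID 0) at the ssh transfer manager
>>> onedatastore update 0      # in the editor, set: TM_MAD="ssh"
>>> onedatastore show 0        # confirm TM_MAD is now "ssh"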
>>>
>>>
>>>
>>> Don’t worry about mapping/unmapping RBD volumes.  When creating or
>>> importing images into the Ceph datastore, the RBDs are created at that
>>> point.  As long as the hypervisor nodes can see/interact with the Ceph
>>> cluster, a deployed VM will use the RBD in the cluster for storage (no
>>> files are copied or mapped locally; it is all handled by QEMU).
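>>>
>>> A quick sanity check I'd run on each hypervisor to confirm it can talk
>>> to the cluster (just a sketch; assumes the default ceph.conf and the
>>> "one" pool used by the Ceph datastore):
>>>
>>> # verify the hypervisor can reach the monitors and see the image pool
>>> ceph -s
>>> rbd ls one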
>>>
>>>
>>>
>>>
>>>
>>> Here is the pre-migrate script we use (very simple):
>>>
>>> #!/bin/bash
>>>
>>> # Arguments passed to the TM pre-migrate hook by OpenNebula
>>> SRC=$1
>>> DST=$2
>>> REMDIR=$3
>>> VMID=$4
>>> DSID=$5
>>> TEMPLATE=$6
>>>
>>> # Create the VM directory on the destination host, then copy the
>>> # deployment files over from the source host
>>> ssh $DST mkdir -p /var/lib/one/datastores/0/$VMID
>>> ssh $SRC scp /var/lib/one/datastores/0/$VMID/* $DST:/var/lib/one/datastores/0/$VMID/
>>>
>>> exit 0
>>>
>>>
>>>
>>> And the post-migrate script:
>>>
>>>
>>>
>>> #!/bin/bash
>>>
>>> # Arguments passed to the TM post-migrate hook by OpenNebula
>>> SRC=$1
>>> DST=$2
>>> REMDIR=$3
>>> VMID=$4
>>> DSID=$5
>>> TEMPLATE=$6
>>>
>>> # Remove the deployment files from the source host after a
>>> # successful migration
>>> ssh $SRC rm -rf /var/lib/one/datastores/0/$VMID
>>>
>>> exit 0
>>>
>>>
>>>
>>>
>>>
>>> Hope this helps!
>>>
>>>
>>>
>>> From: Jon [mailto:three18ti at gmail.com]
>>> Sent: Wednesday, July 10, 2013 12:01 AM
>>> To: Campbell, Bill
>>> Cc: Users OpenNebula
>>> Subject: Re: [one-users] How to use Ceph/RBD for System Datastore
>>>
>>>
>>>
>>> Hey Bill,
>>>
>>>
>>>
>>> Thanks for getting back to me.
>>>
>>>
>>>
>>> If I'm understanding you correctly, you're basically using the ssh
>>> transfer manager to perform live migrations?
>>>
>>> Do you then create/mount one rbd per host?
>>>
>>>
>>>
>>> E.g.,
>>>
>>>
>>>
>>> host1:
>>>
>>> mount /dev/rbd/rbd/host1-one-system /var/lib/one/datastores/0
>>>
>>>
>>>
>>> host2:
>>>
>>> mount /dev/rbd/rbd/host2-one-system /var/lib/one/datastores/0
>>>
>>>
>>>
>>> then use the modified ssh drivers to perform the migrations?
>>>
>>>
>>>
>>> I would definitely be interested in learning how you accomplished that.
>>>
>>>
>>>
>>> My other thought was to use CephFS for shared storage.  This would
>>> eliminate the need for NFS/GlusterFS/CLVM, which is an extra layer of
>>> complexity I would like to avoid.  As I understand it, though, CephFS
>>> isn't "ready for prime time" yet, which gives me pause...
>>>
>>>
>>>
>>> Thanks again,
>>>
>>> Jon A
>>>
>>> On Tue, Jul 9, 2013 at 7:55 PM, Campbell, Bill <
>>> bcampbell at axcess-financial.com> wrote:
>>>
>>> Jon,
>>>
>>> I think I understand what you are trying to do, but I think it doesn't
>>> quite work that way.  Let me try to explain (and please let me know if I
>>> don't explain it well enough ;-))
>>>
>>>
>>>
>>> I don't think you can use Ceph directly as a system datastore.  The way
>>> the Ceph datastore driver handles migrations is by leveraging whatever
>>> transfer method you have configured for the system datastore.  For
>>> example, if you use the 'shared' system datastore, it will use that
>>> transfer manager's pre- and post-migration drivers; for 'ssh', the ssh
>>> drivers, and so on.  The Ceph datastore itself is implemented as Ceph
>>> block devices (RBDs), so unfortunately there is no way to use it as a
>>> simple shared volume.
>>>
>>>
>>>
>>> There are 2 potential solutions for getting live migrations working for
>>> your Ceph datastore VMs:
>>>
>>>    - Create a shared NFS volume (or another shareable filesystem, like
>>>    GFS2 or OCFS2, though these are much more complicated to configure
>>>    and usually not worth the hassle) and have it mounted at the same
>>>    location on each hypervisor node.  In a previous test deployment, we
>>>    just exported the /var/lib/one/vms directory to the hypervisors.  At
>>>    that point, all of the hypervisors can see the deployment files in
>>>    the same location and you should be able to perform a migration (a
>>>    minimal NFS sketch follows this list).
>>>    - Use SSH as the transfer manager for your system datastore, and
>>>    modify the pre and post-migrate scripts to copy the deployment files from
>>>    the current VM host to the target VM host.  This is the method we use
>>>    currently in our deployment, as it is one less configuration step that we
>>>    have to worry about maintaining on each node, and makes expanding our
>>>    cluster much quicker and easier.  I can share with you the pre and
>>>    post-migrate scripts we use if you like.
>>>
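>>> For the first option, the NFS piece is nothing special; a minimal
>>> sketch (server name, export path, and network below are examples only):
>>>
>>> # on the NFS server: export the system datastore directory
>>> echo '/var/lib/one/datastores/0 192.168.0.0/24(rw,no_root_squash)' >> /etc/exports
>>> exportfs -ra
>>>
>>> # on each hypervisor: mount it at the same path
>>> mount -t nfs nfs-server:/var/lib/one/datastores/0 /var/lib/one/datastores/0
>>>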
>>> Let me know if the above makes sense, and of course if you need any
>>> additional help please don't hesitate to bug me.  I'm very familiar with
>>> the Ceph drivers  ;-)
>>>
>>>
>>> ------------------------------
>>>
>>> From: "Jon" <three18ti at gmail.com>
>>> To: "Users OpenNebula" <users at lists.opennebula.org>
>>> Sent: Tuesday, July 9, 2013 8:05:51 PM
>>> Subject: [one-users] How to use Ceph/RBD for System Datastore
>>>
>>>
>>>
>>>
>>>
>>> Hello All,
>>>
>>>
>>>
>>> I am using Ceph as my storage back end and would like to know how to
>>> configure the system datastore so that I can live-migrate VMs.
>>>
>>>
>>>
>>> Following the directions, I thought I could create a volume, format it,
>>> and mount it at /var/lib/one/datastores/0; however, I discovered that
>>> isn't quite how things work.
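>>>
>>> Roughly what I had in mind was something like this (sketch only; the
>>> image name and size are just placeholders):
>>>
>>> # create an RBD, format it, and mount it where datastore 0 lives
>>> rbd create rbd/one-system --size 20480
>>> rbd map rbd/one-system
>>> mkfs -t ext4 /dev/rbd/rbd/one-system
>>> mount /dev/rbd/rbd/one-system /var/lib/one/datastores/0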
>>>
>>>
>>>
>>> >>
>>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-May/001913.html
>>>
>>> You can read more about that at the above link, but long story short:
>>> to mount the same volume on multiple hosts, it has to be a "clustered"
>>> filesystem (I think CephFS is the "clustered filesystem" in this case).
>>>
>>>
>>>
>>> I attempted to modify my system datastore config; however, I was unable
>>> to change the DS_MAD parameter, and VM creation errors out telling me
>>> there's no /var/lib/one/remotes/tm/ceph/mkswap driver (there isn't one).
>>>
>>>
>>>
>>> >> oneadmin at red6:~$ onedatastore show 0
>>>
>>> >> DATASTORE 0 INFORMATION
>>>
>>>
>>> >> ID             : 0
>>>
>>> >> NAME           : system
>>>
>>> >> USER           : oneadmin
>>>
>>> >> GROUP          : oneadmin
>>>
>>> >> CLUSTER        : -
>>>
>>> >> TYPE           : SYSTEM
>>>
>>> >> DS_MAD         : -
>>>
>>> >> TM_MAD         : ceph
>>>
>>> >> BASE PATH      : /var/lib/one/datastores/0
>>>
>>> >> DISK_TYPE      : FILE
>>>
>>> >>
>>>
>>> >> PERMISSIONS
>>>
>>>
>>> >> OWNER          : um-
>>>
>>> >> GROUP          : u--
>>>
>>> >> OTHER          : ---
>>>
>>> >>
>>>
>>> >> DATASTORE TEMPLATE
>>>
>>>
>>> >> DISK_TYPE="rbd"
>>>
>>> >> DS_MAD="-"
>>>
>>> >> TM_MAD="ceph"
>>>
>>> >> TYPE="SYSTEM_DS"
>>>
>>> >>
>>>
>>> >> IMAGES
>>>
>>>
>>>
>>> Maybe I'm just confused.  Can anyone provide some guidance on setting
>>> Ceph up as the system datastore?
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Jon A
>>>
>>>
>>>
>>>
>>>
>>
>
>
>