[one-users] datastore confusion

Ruben S. Montero rsmontero at opennebula.org
Tue Sep 11 01:32:17 PDT 2012


Hi Matthew

First of all, let me thank you for your thorough and valuable feedback to
improve OpenNebula :)

> The documentation is sure plentiful but it really needs a fact and
> consistency checker....

Totally agree, thanks for pointing it out. I've filed an issue to fix this.

http://dev.opennebula.org/issues/1453

> If I may summarize my understandings.
> It looks like ONE started out with every host (front-end included) having
> their own local storage in the guise of 'system, id=0'...

Yes.

>
> When moving to a shared 'system' mountpoint the underlying storage gets
> hammered because every host and every guest that's alive is using it. The
> problem could be mitigated somewhat if the source image could be referenced
> indirectly via symlinks (does that work on VMware VMFS via RDM?). Or by
> using clusters and selectively overriding what datastore was marked as
> 'system'.  The upside was obviously the ability to do warm/hot migration
> between hosts.

Exactly.

About the symlinks, it depends on the underlying storage. A link (LN)
operation may involve different things: for POSIX fs datastores it is a
plain ln command; for iSCSI it makes the device accessible to the host
by logging in to an iSCSI session... On VMware (3.6) we are using
standard FS operations (filesystem links) for this. OpenNebula 3.8 will
have improved support for VMFS volumes.
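
To give a rough idea, here is a minimal Python sketch of what the LN
operation amounts to for each family (illustrative only: the real TM
drivers are shell scripts, and the commands/paths below are my
shorthand, not the actual driver code):

    # Sketch only -- the actual TM drivers are shell scripts shipped
    # under /var/lib/one/remotes/tm/; arguments here are illustrative.
    import subprocess

    def tm_ln(src, dst, tm_mad):
        """Make SRC visible on the host as DST (.../<vm_id>/disk.<i>)."""
        if tm_mad == "shared":
            # POSIX fs datastore: a plain symlink inside the shared mount
            subprocess.check_call(["ln", "-s", src, dst])
        elif tm_mad == "iscsi":
            # iSCSI: no file copy; log the host in to the target so the
            # block device appears (SRC taken as the IQN here)
            subprocess.check_call(
                ["iscsiadm", "-m", "node", "-T", src, "--login"])
        elif tm_mad == "vmfs":
            # VMware (3.6): a standard filesystem link on the volume
            subprocess.check_call(["ln", "-s", src, dst])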

>
> Under the old way presumably the 'system' used "TM_MAD=ssh" and the
> front-end could (must?) be used as the repository of all non-running disk
> images. Yet all image operations are supposed to be carried out at the host,
> so

Yes. The image repositories must be accessible from the front-end to
register images and, as you said, to copy them to the host's system
datastore (id=0, by default).
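
Roughly, the flow is two hops, something like this sketch (ids, paths
and the plain cp/scp commands are illustrative; this assumes the ssh
TM driver):

    # Sketch of the two hops (illustrative ids/paths; ssh TM case)
    import subprocess

    IMAGE_DS  = "/var/lib/one/datastores/1"   # image repository (id=1)
    SYSTEM_DS = "/var/lib/one/datastores/0"   # system datastore (id=0)

    def register(src, image_hash):
        # hop 1: the front-end copies the source into the image repo
        subprocess.check_call(["cp", src, f"{IMAGE_DS}/{image_hash}"])

    def deploy(image_hash, host, vm_id, disk_i):
        # hop 2: the TM driver pushes the image to the host's system
        # datastore (the real driver creates the remote dir first)
        dst = f"{host}:{SYSTEM_DS}/{vm_id}/disk.{disk_i}"
        subprocess.check_call(["scp", f"{IMAGE_DS}/{image_hash}", dst])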

>
> Q0: why was the front-end involved in storing anything under
> '.../datastores/0'? If it was storing "at rest" disk images because there
> was no other provider, then it should have been under datastores with id!=0.

There should be no need for the front-end to access the system
datastore. There are some misleading sentences in the documentation;
we'll update them.

>
> Q1: Under the shared model, the front-end definitely doesn't need access to
> 'system' ever? The drawing and text disagree on
> "http://opennebula.org/documentation:rel3.6:system_ds"

Totally. Picture right, text wrong. Fixed.

>
> BUG: Can we please fix 'onedatastore show <#>' so that "BASE PATH" uses
> the literal string '$DATASTORE_LOCATION'
....

Added to the issue. (http://dev.opennebula.org/issues/1453)

>
> Q2: Why are disk images being "copied" (to mean symlinks I guess) when the
> datastore type is 'shared' or 'vmware' unless the disk type is 'clone'? Just
> hit the source image directly wherever it is.

This is needed to abstract all the transfer mechanisms and make
disparate storage systems fit under the same management model. Every
driver deals with a SRC and a DST in the form of disk.i. This
generalization simplifies the architecture and improves its flexibility.
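
In sketch form, the contract every driver implements is something like
this (the class and method names are illustrative; the real drivers
are shell scripts implementing these actions):

    # Sketch of the common TM contract: every operation, whatever the
    # backend, gets a SRC and a DST that ends in .../<vm_id>/disk.<i>
    class TransferDriver:
        def clone(self, src: str, dst: str): ...  # copy image to disk.i
        def ln(self, src: str, dst: str): ...     # link/attach as disk.i
        def mv(self, src: str, dst: str): ...     # move disk.i across hosts
        def delete(self, path: str): ...          # dispose of disk.i

    # shared, ssh, iscsi, vmfs... each provide the same verbs, so the
    # core never needs to know how the bytes actually move.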

>
> Q3: Can we dispense with this whole 'system' being mandatory let alone being
> at a fixed location? There is no reason why the datastore that contains the
> "at rest" image can't be used when the VM is running and also include the
> volatile and clone images. Of course that doesn't apply if the source is
> only reachable via SSH, can't withstand the IOPs, or is otherwise
> unsuitable. I also find the term 'system' misleading when it should be named
> something more like 'runtime'. May I suggest a datastore attribute
> "ALLOW_RUN=" or "RUNTIME_SAFE=yes|no" with the unspecified behavior being
> that of 'no' and thus do the copying?

I guess the answer is the same as for Q2. IMHO the current layout is
much more convenient for maintenance purposes: you can easily navigate
to the disks a running VM is using, identify them, and work with them.
If we used the "at rest" location you would need to work with hashed
names, and it would be much harder to find out which file is mapped to
a given device. It would also make it difficult to integrate different
drivers.
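
For example, with the current layout, mapping a running VM's disks back
to their sources is a trivial walk (a sketch; paths are illustrative):

    # Sketch: list a VM's disk.i files and what backs each of them
    import os

    def vm_disks(vm_id, system_ds="/var/lib/one/datastores/0"):
        vm_dir = os.path.join(system_ds, str(vm_id))
        for name in sorted(os.listdir(vm_dir)):
            if name.startswith("disk."):
                path = os.path.join(vm_dir, name)
                # symlinked disks point back at the "at rest" image
                target = os.readlink(path) if os.path.islink(path) else path
                print(f"{name} -> {target}")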

>
> Q4: What happens if there are multiple "SYSTEM=yes" datastores in the
> context of a cluster (including the special cluster 'none')? Why shouldn't
> the runtime datastore(s) also be a HOST attribute in addition to a cluster
> one ala "SYSTEM_DS = <id> [id ...]"? If not specified the scheduler would
> revert to the more general scope and pick one that has sufficient space. It
> is perfectly reasonable to have different 'system' datastore sets across
> hosts even in the same cluster; some may have extra disks, broken disks,
> whatever. Deployment shouldn't break and I shouldn't have to side-line a
> host because it isn't strictly identical to its peers.

Yes, you are right. In fact, having multiple system datastores would
present the same benefits as having multiple image datastores. To
provide this, we opted for scaling the system datastore to the cluster
level, letting each cluster have its own system ds. Our initial
thinking was that a system ds (if not ssh) would be shared to allow
live migration, and that extra disks, broken disks and the like would
be handled by the shared/distributed FS. This tradeoff gives you
enough flexibility to architect your resources and plan cluster
networking and the like, while letting the storage perform the
low-level work (replication, hotplugging of additional disks...)
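
Your host-level "SYSTEM_DS = <id> [id ...]" idea would look roughly
like this in the scheduler (purely hypothetical: this is not how
3.6/3.8 behave, and the attribute names come from your proposal):

    # Hypothetical selection logic for the proposed SYSTEM_DS attribute
    def pick_system_ds(host, cluster, datastores, needed_mb):
        # prefer a host-level SYSTEM_DS list, fall back to the
        # cluster's, then to the default system datastore (id=0)
        candidates = host.get("SYSTEM_DS") or cluster.get("SYSTEM_DS") or [0]
        for ds_id in candidates:
            if datastores[ds_id]["free_mb"] >= needed_mb:
                return ds_id
        return None  # no runtime-safe datastore with enough room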

>
> Q5: Are there plans to have a 'system' datastore of type iSCSI or LVM? It
> would only make sense if the source was of like type. Though actually a
> sparse file on a filesystem would work as a block device too so this is more
> about supporting BLOCK devices for 'system' use.

Note that even when used with the FS-based system datastores, the
operations are BLOCK-wise for iSCSI and LVM. The only difference would
be the creation of context and volatile disks as BLOCK devices instead
of files. IMHO, the creation of context devices as BLOCKs is
pointless, but volatile storage as BLOCKs totally makes sense. So yes,
we have plans for this, and it wouldn't be too difficult to cook a
custom mkimage script for it (not in 3.8, though).
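
Such a custom mkimage could be as simple as this sketch (hypothetical:
the LVM backend and the names are assumptions; the stock script
creates sparse files and runs mkfs on them instead):

    # Hypothetical mkimage for volatile disks as LVM BLOCK devices
    import subprocess

    def mkimage_block(vg, vm_id, disk_i, size_mb, fstype="ext3"):
        lv = f"lv-one-{vm_id}-disk-{disk_i}"   # naming is illustrative
        subprocess.check_call(
            ["lvcreate", "-L", f"{size_mb}M", "-n", lv, vg])
        dev = f"/dev/{vg}/{lv}"
        if fstype == "swap":
            subprocess.check_call(["mkswap", dev])          # volatile swap
        elif fstype:
            subprocess.check_call([f"mkfs.{fstype}", dev])  # volatile fs
        return dev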

>
> Q6: when is it safe to override variables like VM_DIR or DS_DIR? Is there an
> accepted methodology?

Those are automatically generated, using DATASTORE_LOCATION as the base.
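
Roughly, the derivation looks like this (a sketch with the default
value; treat the exact rules as internal):

    # Sketch: how the directories derive from DATASTORE_LOCATION
    DATASTORE_LOCATION = "/var/lib/one/datastores"  # the knob you set

    def ds_dir(ds_id):
        return f"{DATASTORE_LOCATION}/{ds_id}"      # DS_DIR equivalent

    def vm_dir(system_ds_id, vm_id):
        return f"{ds_dir(system_ds_id)}/{vm_id}"    # VM_DIR equivalent

So rather than overriding them by hand, set DATASTORE_LOCATION and let
the rest follow.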


Cheers

Ruben

>
> --
> Cloud Services Architect, Senior System Administrator
> InfoRelay Online Systems (www.inforelay.com)



-- 
Ruben S. Montero, PhD
Project co-Lead and Chief Architect
OpenNebula - The Open Source Solution for Data Center Virtualization
www.OpenNebula.org | rsmontero at opennebula.org | @OpenNebula


