[one-users] Fwd: iSCSI multipath

Mihály Héder mihaly.heder at sztaki.mta.hu
Wed Jan 30 13:10:47 PST 2013


Hi,

Well, in general, in a production environment you should use something you
completely trust. If installing clvm is not an issue in your setup, then
why not do it for extra security?

With OpenNebula we have been using this LVM setup for only a couple of
months, so it is fair to say it is still in testing. Therefore, you should
wait a bit before putting critical stuff on it. We will do the same: first
move the less important services onto these LVs, then proceed with the more
important ones. The reason I'm not particularly worried is that we have
been using a similar setup under a simple Xen cluster for 5 years. Although
it was backed by Fibre Channel rather than iSCSI, and there was no
OpenNebula involved, we did not have any issues with LVM itself. Of course,
instead of a single frontend we managed everything with a bunch of simple
bash scripts driven by a simple PHP web interface.


Anyway, I will blog about our experiences with the shared_lvm driver, so
you will be kept informed!

Cheers
Mihály
MTA SZTAKI ITAK


On 30 January 2013 18:47, Miloš Kozák <milos.kozak at lejmr.com> wrote:

>  Hi, it sounds interesting; I think I am going to give it a try. I am still
> undecided whether to use CLVM or not. How long have you been running it
> like that? Have you ever had any serious issues related to LVM?
>
> Thank you, Milos
>
>
> On 30.1.2013 13:59, Marlok Tamás wrote:
>
> Hi,
>
> We are running it without CLVM.
> If you examine the ONE/lvm driver (the tm/clone script, for example), you
> can see that the lvcreate command runs on the destination host. In the
> shared LVM driver, all the LVM commands run on the frontend, so there is
> no possibility of parallel changes (assuming that you are using only one
> frontend), because local locking is in effect on the frontend.
>
> The other difference with the ONE/lvm driver is that it makes a snapshot
> in the clone script, while our driver creates a new clone LV. I tried to
> use the original LVM driver, and every time I deployed a new VM I got this
> error message:
>
> lv-one-50 must be active exclusively to create snapshot
>
> If you (or anyone else) know how to avoid this error, please let me know.
> Besides that, snapshots are much slower for write operations (as far as I
> know).
>
> Hope this helps!
> --
> Cheers,
> tmarlok
>
>
> On Wed, Jan 30, 2013 at 1:37 PM, Miloš Kozák <milos.kozak at lejmr.com> wrote:
>
>>  Hi, thank you. I checked the source code and found it is very similar to
>> the LVM TM/datastore drivers already shipped in ONE, except that you added
>> lvchange -ay DEV. Do you run CLVM alongside it or not?
>>
>> I worry about parallel changes to the LVM metadata, which might destroy
>> it. Given the sequential behaviour it is probably not an issue, but can
>> you confirm that for me? Or is it highly dangerous to run shared_lvm
>> without CLVM?
>>
>> Thanks, Milos
>>
>>
>> On 30.1.2013 10:09, Marlok Tamás wrote:
>>
>> Hi,
>>
>> We have a custom datastore and transfer manager driver, which runs the
>> lvchange command when it is needed.
>> For it to work, you have to enable it in oned.conf.
>>
>> For example:
>>
>> DATASTORE_MAD = [
>>     executable = "one_datastore",
>>     arguments  = "-t 10 -d fs,vmware,iscsi,lvm,shared_lvm"]
>>
>> TM_MAD = [
>>     executable = "one_tm",
>>     arguments  = "-t 10 -d dummy,lvm,shared,qcow2,ssh,vmware,iscsi,shared_lvm" ]
>>
>> After that, you can create a datastore with the shared_lvm TM and
>> datastore drivers.
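>>
>> (Just as an illustration - the attribute names depend on the driver
>> version, so treat this only as a sketch - the datastore template could
>> look roughly like this:
>>
>>     NAME      = lvm_shared
>>     DS_MAD    = shared_lvm
>>     TM_MAD    = shared_lvm
>>     DISK_TYPE = BLOCK
>>     VG_NAME   = vg-one
>>
>> and then you register it with "onedatastore create ds.conf".)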
>>
>> The only limitation is that you can't live-migrate VMs. We have a working
>> solution for that as well, but it is still untested. I can send you that
>> too, if you want to help us test it.
>>
>> Anyway, here are the drivers; feel free to use or modify them.
>> https://dl.dropbox.com/u/140123/shared_lvm.tar.gz
>>
>> --
>> Cheers,
>> Marlok Tamas
>> MTA Sztaki
>>
>>
>>
>> On Thu, Jan 24, 2013 at 11:32 PM, Mihály Héder <mihaly.heder at sztaki.mta.hu> wrote:
>>
>>> Hi,
>>>
>>> Well, if you can run lvs or lvscan successfully on at least one server,
>>> then the metadata is probably fine.
>>> We had similar issues before we learned how to exclude unnecessary
>>> block devices in the LVM config.
>>>
>>> The thing is that lvscan and lvs will, by default, try to check _every_
>>> potential block device for LVM partitions. If you are lucky, this is
>>> only annoying, because it will throw 'can't read /dev/sdX' or similar
>>> messages. However, if you are using dm-multipath, you will have one
>>> device for each path, like /dev/sdr, _plus_ the aggregated device with
>>> the name you have configured in multipath.conf (/dev/mapper/yourname),
>>> which is the one you actually need. LVM did not quite understand this
>>> situation and got stuck on the individual path devices, so we configured
>>> it to look for LVM only in the right place. In the lvm.conf man page,
>>> look for the devices section's scan and filter options; there are quite
>>> good examples in the comments there.
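>>>
>>> For example, an lvm.conf snippet along these lines (the exact regexes
>>> and device names are specific to your setup, so take it only as a
>>> sketch):
>>>
>>>     devices {
>>>         # accept the local system disk and the aggregated multipath
>>>         # device, reject everything else (the per-path /dev/sdX nodes)
>>>         filter = [ "a|^/dev/sda|", "a|^/dev/mapper/yourname|", "r|.*|" ]
>>>     }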
>>>
>>> Also, there could be a much simpler explanation for the issue:
>>> something with the iSCSI connection or multipath, which are one layer
>>> below.
>>>
>>> I hope this helps.
>>>
>>> Cheers
>>> Mihály
>>>
>>> On 24 January 2013 23:18, Miloš Kozák <milos.kozak at lejmr.com> wrote:
>>> > Hi, thank you. I tried to update the TM ln script, which works, but it
>>> > is not a clean solution. So I will try to write the hook code and then
>>> > we can discuss it.
>>> >
>>> > I deployed a few VMs, and now the lvs command freezes on the other
>>> > server. I have not set up clvm; do you think it could be caused by LVM
>>> > metadata corruption? The thing is, I can no longer start a VM on the
>>> > other server.
>>> >
>>> > Miloš
>>> >
>>> > On 24.1.2013 23:10, Mihály Héder wrote:
>>>  >
>>> >> Hi!
>>> >>
>>> >> We solve this problem via hooks that activate the LVs for us
>>> >> when we start/migrate a VM. Unfortunately, I will be out of the office
>>> >> until early next week, but then I will consult with my colleague who
>>> >> did the actual coding of this part and we will share the code.
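>>> >>
>>> >> (To give the rough idea until then - this is only a sketch, not our
>>> >> actual code, and the hook state, script name and argument are guesses -
>>> >> it boils down to a VM hook in oned.conf that activates the LV on the
>>> >> target host:
>>> >>
>>> >>     VM_HOOK = [
>>> >>         name      = "activate_lv",
>>> >>         on        = "PROLOG",
>>> >>         command   = "activate_lv.sh",
>>> >>         arguments = "$ID",
>>> >>         remote    = "yes" ]
>>> >>
>>> >> where activate_lv.sh essentially runs lvchange -ay on the VM's logical
>>> >> volume(s).)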
>>> >>
>>> >> Cheers
>>> >> Mihály
>>> >>
>>> >>> On 24 January 2013 20:15, Miloš Kozák <milos.kozak at lejmr.com> wrote:
>>> >>>
>>> >>> Hi, I have just set it up with two hosts sharing a block device, and
>>> >>> LVM on top of that, as discussed earlier. Triggering lvs I can see
>>> >>> all logical volumes. When I create a new LV on the other server, I
>>> >>> can see the LV being inactive, so I have to run lvchange -ay VG/LV to
>>> >>> enable it; then this LV can be used for a VM.
>>> >>>
>>> >>> Is there any trick to auto-enable a newly created LV on every host?
>>> >>>
>>> >>> Thanks, Milos
>>> >>>
>>> >>> On 22.1.2013 18:22, Mihály Héder wrote:
>>> >>>
>>> >>>> Hi!
>>> >>>>
>>> >>>> You need to look at locking_type in the lvm.conf manual [1]. The
>>> >>>> default - locking in a local directory - is OK for the frontend, and
>>> >>>> type 4 is read-only. However, you should not forget that this only
>>> >>>> prevents damage done by the LVM commands. If you start writing zeros
>>> >>>> to your disk with the dd command, for example, that will kill your
>>> >>>> partition regardless of the LVM setting. So this mainly protects
>>> >>>> against user or middleware errors, not against malicious attacks.
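>>> >>>>
>>> >>>> (Concretely, that means something like this in /etc/lvm/lvm.conf on
>>> >>>> the host servers:
>>> >>>>
>>> >>>>     global {
>>> >>>>         locking_type = 4    # read-only: metadata cannot be changed here
>>> >>>>     }
>>> >>>>
>>> >>>> while the frontend keeps the default locking_type = 1, i.e. local
>>> >>>> file-based locking.)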
>>> >>>>
>>> >>>> Cheers
>>> >>>> Mihály Héder
>>> >>>> MTA SZTAKI
>>> >>>>
>>> >>>> [1] http://linux.die.net/man/5/lvm.conf
>>> >>>>
>>> >>>>> On 21 January 2013 18:58, Miloš Kozák <milos.kozak at lejmr.com> wrote:
>>> >>>>>
>>> >>>>> Oh snap, that sounds great; I didn't know about that. It makes
>>> >>>>> everything easier.
>>> >>>>> In this scenario only the frontend works with LVM, so there are no
>>> >>>>> issues with concurrent changes. Only one last thing to make it
>>> >>>>> really safe against that: is there any way to suppress LVM changes
>>> >>>>> from the hosts, i.e. make it read-only there, and leave it
>>> >>>>> read-write on the frontend?
>>> >>>>>
>>> >>>>> Thanks
>>> >>>>>
>>> >>>>>
>>> >>>>> On 21.1.2013 18:50, Mihály Héder wrote:
>>> >>>>>
>>> >>>>>> Hi,
>>> >>>>>>
>>> >>>>>> No, you don't have to do any of that. Also, Nebula doesn't have to
>>> >>>>>> care about the LVM metadata at all, and therefore there is no
>>> >>>>>> corresponding function in it. In /etc/lvm there is no metadata,
>>> >>>>>> only configuration files.
>>> >>>>>>
>>> >>>>>> The LVM metadata simply sits somewhere at the beginning of your
>>> >>>>>> iSCSI-shared disk, like a partition table. So it is on the storage
>>> >>>>>> that is accessed by all your hosts, and no distribution is
>>> >>>>>> necessary. The Nebula frontend simply issues lvcreate, lvchange,
>>> >>>>>> etc. on this shared disk, and those commands manipulate the
>>> >>>>>> metadata.
>>> >>>>>>
>>> >>>>>> It is really LVM's internal business, many layers below OpenNebula.
>>> >>>>>> All you have to make sure is that you don't run these commands
>>> >>>>>> concurrently from multiple hosts on the same iSCSI-attached disk,
>>> >>>>>> because then they could interfere with each other. This is the
>>> >>>>>> setting you have to configure in /etc/lvm on the server hosts.
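>>> >>>>>>
>>> >>>>>> (For example, when a VM is deployed the frontend ends up running
>>> >>>>>> something like the following against the shared VG - the names here
>>> >>>>>> are only illustrative:
>>> >>>>>>
>>> >>>>>>     lvcreate -L 10G -n lv-one-42 vg-one
>>> >>>>>>     lvremove -f vg-one/lv-one-42    # later, when the VM is deleted
>>> >>>>>>
>>> >>>>>> and that is all the "distribution" there is - the result lands
>>> >>>>>> directly on the shared disk.)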
>>> >>>>>>
>>> >>>>>> Cheers
>>> >>>>>> Mihály
>>> >>>>>>
>>> >>>>>> On 21 January 2013 18:37, Miloš Kozák <milos.kozak at lejmr.com> wrote:
>>> >>>>>>>
>>> >>>>>>> Thank you. Does it mean that I can distribute the metadata files
>>> >>>>>>> located in /etc/lvm on the frontend onto the other hosts, and those
>>> >>>>>>> hosts will see my logical volumes? Is there any code in Nebula
>>> >>>>>>> which would handle that? Or do I need to update the DS scripts to
>>> >>>>>>> update/distribute the LVM metadata among the servers?
>>> >>>>>>>
>>> >>>>>>> Thanks, Milos
>>> >>>>>>>
>>> >>>>>>> On 21.1.2013 18:29, Mihály Héder wrote:
>>> >>>>>>>
>>> >>>>>>>> Hi,
>>> >>>>>>>>
>>> >>>>>>>> The LVM metadata [1] is simply stored on the disk. In the setup
>>> >>>>>>>> we are discussing, this happens to be a shared virtual disk on the
>>> >>>>>>>> storage, so any other host attaching the same virtual disk should
>>> >>>>>>>> see the changes as they happen, provided that it re-reads the
>>> >>>>>>>> disk. This re-reading step is what you can trigger with lvscan,
>>> >>>>>>>> but nowadays that seems to be unnecessary. For us it works with
>>> >>>>>>>> CentOS 6.3, so I guess Scientific Linux should be fine as well.
>>> >>>>>>>>
>>> >>>>>>>> Cheers
>>> >>>>>>>> Mihály
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>> [1] http://www.centos.org/docs/5/html/Cluster_Logical_Volume_Manager/lvm_metadata.html
>>> >>>>>>>>
>>> >>>>>>>> On 21 January 2013 12:53, Miloš Kozák <milos.kozak at lejmr.com> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>> Hi,
>>> >>>>>>>>>
>>> >>>>>>>>> thank you for the great answer. As I wrote, my objective is to
>>> >>>>>>>>> avoid as much clustering software (pacemaker, ...) as possible,
>>> >>>>>>>>> so clvm is one of those things I feel bad about having in my
>>> >>>>>>>>> configuration. Therefore I would rather let Nebula manage the LVM
>>> >>>>>>>>> metadata in the first place, as you wrote. The only thing I still
>>> >>>>>>>>> don't understand is how Nebula distributes the LVM metadata.
>>> >>>>>>>>>
>>> >>>>>>>>> Is the kernel in Scientific Linux 6.3 new enough for the LVM
>>> >>>>>>>>> issue you mentioned?
>>> >>>>>>>>>
>>> >>>>>>>>> Thanks, Milos
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> On 21.1.2013 12:34, Mihály Héder wrote:
>>> >>>>>>>>>
>>> >>>>>>>>>> Hi!
>>> >>>>>>>>>>
>>> >>>>>>>>>> The last time we could test an Equalogic, it did not have an
>>> >>>>>>>>>> option to create/configure virtual disks inside it via an API,
>>> >>>>>>>>>> so I think the iSCSI driver is not an alternative, as it would
>>> >>>>>>>>>> require a configuration step per virtual machine on the storage.
>>> >>>>>>>>>>
>>> >>>>>>>>>> However, you can use your storage just fine in a shared LVM
>>> >>>>>>>>>> scenario. You need to consider two different things:
>>> >>>>>>>>>> - The LVM metadata and the actual VM data on the partitions. It
>>> >>>>>>>>>> is true that concurrent modification of the metadata should be
>>> >>>>>>>>>> avoided, as in theory it can damage the whole volume group. You
>>> >>>>>>>>>> could use clvm, which avoids that by clustered locking, so that
>>> >>>>>>>>>> every participating machine can safely create/modify/delete LVs.
>>> >>>>>>>>>> However, in a Nebula setup this is not necessary in every case:
>>> >>>>>>>>>> you can make the LVM metadata read-only on your host servers and
>>> >>>>>>>>>> let only the frontend modify it. Then it can use local locking,
>>> >>>>>>>>>> which does not require clvm.
>>> >>>>>>>>>> - Of course, the host servers can still write the data inside
>>> >>>>>>>>>> the partitions even though the metadata is read-only for them.
>>> >>>>>>>>>> It should work just fine as long as you don't start two VMs on
>>> >>>>>>>>>> one partition.
>>> >>>>>>>>>>
>>> >>>>>>>>>> We are running this setup with a dual-controller Dell MD3600
>>> >>>>>>>>>> storage without issues so far. Before that, we used to do the
>>> >>>>>>>>>> same with Xen machines for years on an older EMC (that was
>>> >>>>>>>>>> before Nebula). Now with Nebula we have been using a home-grown
>>> >>>>>>>>>> module for doing this, which I can send you any time - we plan
>>> >>>>>>>>>> to submit it as a feature enhancement anyway. Also, there seems
>>> >>>>>>>>>> to be a similar shared LVM module in the Nebula upstream which
>>> >>>>>>>>>> we could not get to work yet, but we did not try much.
>>> >>>>>>>>>>
>>> >>>>>>>>>> The plus side of this setup is that you can make live
>>> >>>>>>>>>> migration work nicely. There are two points to consider,
>>> >>>>>>>>>> however: once you set the LVM metadata read-only, you won't be
>>> >>>>>>>>>> able to modify the local LVM volumes on your servers, if there
>>> >>>>>>>>>> are any. Also, on older kernels, when you modified the LVM on
>>> >>>>>>>>>> one machine the others did not get notified about the changes,
>>> >>>>>>>>>> so you had to issue an lvs command. On newer kernels this issue
>>> >>>>>>>>>> seems to be solved and the LVs get updated instantly; I don't
>>> >>>>>>>>>> know when and what exactly changed, though.
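>>> >>>>>>>>>>
>>> >>>>>>>>>> (On those older kernels the workaround is simply to run, on the
>>> >>>>>>>>>> host, something like:
>>> >>>>>>>>>>
>>> >>>>>>>>>>     lvscan                          # re-read the metadata from the shared disk
>>> >>>>>>>>>>     lvchange -ay vg-one/lv-one-42   # activate the new LV locally (example name)
>>> >>>>>>>>>>
>>> >>>>>>>>>> before starting the VM there.)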
>>> >>>>>>>>>>
>>> >>>>>>>>>> Cheers
>>> >>>>>>>>>> Mihály Héder
>>> >>>>>>>>>> MTA SZTAKI ITAK
>>> >>>>>>>>>>
>>> >>>>>>>>>> On 18 January 2013 08:57, Miloš Kozák <milos.kozak at lejmr.com> wrote:
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Hi, I am setting up a small installation of OpenNebula with
>>> >>>>>>>>>>> shared storage using iSCSI. The storage is an Equilogic EMC
>>> >>>>>>>>>>> with two controllers. Nowadays we have only two host servers,
>>> >>>>>>>>>>> so we use a direct connection between the storage and each
>>> >>>>>>>>>>> server, see the attachment. For this purpose we set up
>>> >>>>>>>>>>> dm-multipath. In the future we want to add other servers, and
>>> >>>>>>>>>>> some other technology will be necessary in the network segment;
>>> >>>>>>>>>>> these days we try to keep it as close as possible to the future
>>> >>>>>>>>>>> topology from a protocol point of view.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> My question is about how to define the datastore: which
>>> >>>>>>>>>>> datastore driver and which TM are best?
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> My primary objective is to avoid GFS2 or any other cluster
>>> >>>>>>>>>>> filesystem; I would prefer to keep the datastore as block
>>> >>>>>>>>>>> devices. The only option I see is to use LVM, but I worry about
>>> >>>>>>>>>>> concurrent writes - isn't that a problem? I was googling a bit
>>> >>>>>>>>>>> and found I would need to set up clvm - is it really necessary?
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Or is it better to use the iSCSI driver, drop dm-multipath and
>>> >>>>>>>>>>> hope?
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Thanks, Milos
>>> >>>>>>>>>>>