[one-users] iSCSI multipath

Miloš Kozák milos.kozak at lejmr.com
Wed Jan 30 04:37:15 PST 2013


Hi, thank you. I checked the source code and found that it is very similar to
the LVM TM/datastore drivers already shipped with ONE, except that you
added lvchange -ay DEV. Do you run CLVM alongside that or not?

I worry about parallel changes to the LVM metadata, which might corrupt it.
Judging from the sequential behaviour it is probably not an issue, but can
you confirm that for me? Or is it highly dangerous to run shared_lvm without CLVM?

Thanks, Milos


On 30.1.2013 10:09, Marlok Tamás wrote:
> Hi,
>
> We have a custom datastore and transfer manager driver which runs
> the lvchange command when needed.
> In order for it to work, you have to enable it in oned.conf.
>
> for example:
>
> DATASTORE_MAD = [
>     executable = "one_datastore",
>     arguments  = "-t 10 -d fs,vmware,iscsi,lvm,shared_lvm"]
>
> TM_MAD = [
>     executable = "one_tm",
>     arguments  = "-t 10 -d dummy,lvm,shared,qcow2,ssh,vmware,iscsi,shared_lvm" ]
>
> After that, you can create a datastore with the shared_lvm TM and
> datastore driver.
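>
> A minimal sketch of what such a datastore template could look like (the
> name, VG_NAME, and the values below are only examples, not necessarily
> what our driver expects):
>
>     NAME    = shared_lvm_ds
>     DS_MAD  = shared_lvm
>     TM_MAD  = shared_lvm
>     VG_NAME = vg-one
>
> and register it with something like: onedatastore create shared_lvm_ds.conf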
>
> The only limitation is that you can't live migrate VMs. We have a
> working solution for that as well, but it is still untested. I can send
> it to you too, if you want to help us test it.
>
> Anyway, here are the drivers; feel free to use or modify them.
> https://dl.dropbox.com/u/140123/shared_lvm.tar.gz
>
> --
> Cheers,
> Marlok Tamas
> MTA Sztaki
>
>
>
> On Thu, Jan 24, 2013 at 11:32 PM, Mihály Héder
> <mihaly.heder at sztaki.mta.hu> wrote:
>
>     Hi,
>
>     Well, if you can run lvs or lvscan successfully on at least one server,
>     then the metadata is probably fine.
>     We had similar issues before we learned how to exclude unnecessary
>     block devices in the LVM config.
>
>     The thing is that lvscan and lvs will, by default, try to check _every_
>     potential block device for LVM partitions. If you are lucky, this is
>     only annoying, because it will throw 'can't read /dev/sdX' or similar
>     messages. However, if you are using dm-multipath, you will have one
>     device for each path, like /dev/sdr, _plus_ the aggregated device with
>     the name you have configured in multipath.conf (/dev/mapper/yourname),
>     which is what you actually need. LVM did not quite understand this
>     situation and got stuck on the individual path devices, so we configured
>     it to look for LVM only in the right place. In the lvm.conf man page,
>     look for the devices / scan and filter options. There are also quite
>     good examples in the comments there.
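>
>     As an illustration only (the device name is just an example, not our
>     actual config), a filter like this in the devices section of
>     /etc/lvm/lvm.conf accepts the aggregated multipath device and rejects
>     the individual /dev/sd* paths:
>
>         devices {
>             # accept the multipath alias, reject every other block device
>             filter = [ "a|^/dev/mapper/yourname$|", "r|.*|" ]
>         }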
>
>     Also, there could be a much simpler explanation for the issue:
>     something with the iSCSI connection or multipath, which are one layer
>     below.
>
>     I hope this helps.
>
>     Cheers
>     Mihály
>
>     On 24 January 2013 23:18, Miloš Kozák <milos.kozak at lejmr.com> wrote:
>     > Hi, thank you. I tried to update the TM ln script, which works, but
>     > it is not a clean solution. So I will try to write the hook code and
>     > then we can discuss it.
>     >
>     > I deployed a few VMs, and now the lvs command freezes on the other
>     > server. I have not set up clvm; do you think it could be caused by
>     > LVM metadata corruption? The thing is, I can no longer start a VM on
>     > the other server.
>     >
>     > Miloš
>     >
>     > On 24.1.2013 23:10, Mihály Héder wrote:
>     >
>     >> Hi!
>     >>
>     >> We solve this problem via hooks that activate the LVs for us
>     >> when we start/migrate a VM. Unfortunately I will be out of the office
>     >> until early next week, but then I will consult with my colleague who
>     >> did the actual coding of this part and we will share the code.
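>     >>
>     >> Just to illustrate the idea (this is not our actual code, and the
>     >> argument handling is only an assumption), the core of such a hook
>     >> boils down to something like:
>     >>
>     >>     #!/bin/bash
>     >>     # hypothetical activation hook: activate the VM's logical volume
>     >>     # on the host that is about to run it
>     >>     VG_NAME="$1"   # volume group holding the VM images
>     >>     LV_NAME="$2"   # this VM's disk volume
>     >>     HOST="$3"      # host the VM is deployed/migrated to
>     >>
>     >>     ssh "$HOST" "sudo lvchange -ay $VG_NAME/$LV_NAME"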
>     >>
>     >> Cheers
>     >> Mihály
>     >>
>     >> On 24 January 2013 20:15, Miloš Kozák <milos.kozak at lejmr.com> wrote:
>     >>>
>     >>> Hi, I have just set it up with two hosts sharing a block device and
>     >>> LVM on top of it, as discussed earlier. Running lvs I can see all
>     >>> logical volumes. When I create a new LV on the other server, I can
>     >>> see that the LV is inactive, so I have to run lvchange -ay VG/LV to
>     >>> enable it; only then can this LV be used for a VM.
>     >>>
>     >>> Is there any trick to auto-enable a newly created LV on every host?
>     >>>
>     >>> Thanks Milos
>     >>>
>     >>> On 22.1.2013 18:22, Mihály Héder wrote:
>     >>>
>     >>>> Hi!
>     >>>>
>     >>>> You need to look at locking_type in the lvm.conf manual [1]. The
>     >>>> default - locking in a local directory - is fine for the frontend,
>     >>>> and type 4 is read-only. However, you should not forget that this
>     >>>> only prevents the lvm commands from doing damage. If you start
>     >>>> writing zeros to your disk with the dd command, for example, that
>     >>>> will kill your partition regardless of the LVM setting. So this
>     >>>> mainly protects against user or middleware errors, not against
>     >>>> malicious attacks.
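>     >>>>
>     >>>> In practice that means something like this in /etc/lvm/lvm.conf on
>     >>>> the host servers (a sketch only - check your distribution's
>     >>>> defaults before changing it):
>     >>>>
>     >>>>     global {
>     >>>>         # hosts: read-only metadata access, no LVM writes allowed
>     >>>>         locking_type = 4
>     >>>>     }
>     >>>>
>     >>>>     # on the frontend keep the default locking_type = 1
>     >>>>     # (local, directory-based locking)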
>     >>>>
>     >>>> Cheers
>     >>>> Mihály Héder
>     >>>> MTA SZTAKI
>     >>>>
>     >>>> [1] http://linux.die.net/man/5/lvm.conf
>     >>>>
>     >>>> On 21 January 2013 18:58, Miloš Kozák <milos.kozak at lejmr.com> wrote:
>     >>>>>
>     >>>>> Oh snap, that sounds great, I didn't know about that.. it makes
>     >>>>> everything easier. In this scenario only the frontend works with
>     >>>>> LVM, so there are no issues with concurrent changes. Just one last
>     >>>>> thing to make it really safe against that: is there any way to
>     >>>>> suppress LVM changes from the hosts, i.e. make it read-only there
>     >>>>> and leave it RW on the frontend?
>     >>>>>
>     >>>>> Thanks
>     >>>>>
>     >>>>>
>     >>>>> On 21.1.2013 18:50, Mihály Héder wrote:
>     >>>>>
>     >>>>>> Hi,
>     >>>>>>
>     >>>>>> no, you don't have to do any of that. Also, nebula doesn't have
>     >>>>>> to care about LVM metadata at all, and therefore there is no
>     >>>>>> corresponding function in it. There is no metadata in /etc/lvm,
>     >>>>>> only configuration files.
>     >>>>>>
>     >>>>>> LVM metadata simply sits somewhere at the beginning of your
>     >>>>>> iSCSI-shared disk, like a partition table. So it is on the
>     >>>>>> storage that is accessed by all your hosts, and no distribution
>     >>>>>> is necessary. The nebula frontend simply issues lvcreate,
>     >>>>>> lvchange, etc. on this shared disk, and those commands manipulate
>     >>>>>> the metadata.
>     >>>>>>
>     >>>>>> It is really LVM's internal business, many layers below
>     >>>>>> opennebula. All you have to make sure of is that you don't run
>     >>>>>> these commands concurrently from multiple hosts on the same
>     >>>>>> iSCSI-attached disk, because then they could interfere with each
>     >>>>>> other. This setting is what you have to indicate in /etc/lvm on
>     >>>>>> the server hosts.
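>     >>>>>>
>     >>>>>> For instance (the volume group and LV names below are made up,
>     >>>>>> just to show the kind of commands involved), the frontend alone
>     >>>>>> runs things like:
>     >>>>>>
>     >>>>>>     lvcreate -L 10G -n lv-one-42 vg-one   # allocate a VM disk
>     >>>>>>     lvchange -ay vg-one/lv-one-42         # activate it
>     >>>>>>     lvremove -f vg-one/lv-one-42          # remove it afterwards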
>     >>>>>>
>     >>>>>> Cheers
>     >>>>>> Mihály
>     >>>>>>
>     >>>>>> On 21 January 2013 18:37, Miloš Kozák <milos.kozak at lejmr.com> wrote:
>     >>>>>>>
>     >>>>>>> Thank you. Does it mean that I can distribute the metadata files
>     >>>>>>> located in /etc/lvm on the frontend to the other hosts, and
>     >>>>>>> these hosts will then see my logical volumes? Is there any code
>     >>>>>>> in nebula which would provide that? Or do I need to update the
>     >>>>>>> DS scripts to update/distribute the LVM metadata among the
>     >>>>>>> servers?
>     >>>>>>>
>     >>>>>>> Thanks, Milos
>     >>>>>>>
>     >>>>>>> On 21.1.2013 18:29, Mihály Héder wrote:
>     >>>>>>>
>     >>>>>>>> Hi,
>     >>>>>>>>
>     >>>>>>>> LVM metadata [1] is simply stored on the disk. In the setup we
>     >>>>>>>> are discussing, this happens to be a shared virtual disk on the
>     >>>>>>>> storage, so any other host that attaches the same virtual disk
>     >>>>>>>> should see the changes as they happen, provided that it
>     >>>>>>>> re-reads the disk. This re-reading step is what you can trigger
>     >>>>>>>> with lvscan, but nowadays that seems to be unnecessary. For us
>     >>>>>>>> it works with CentOS 6.3, so I guess Scientific Linux should be
>     >>>>>>>> fine as well.
>     >>>>>>>>
>     >>>>>>>> Cheers
>     >>>>>>>> Mihály
>     >>>>>>>>
>     >>>>>>>>
>     >>>>>>>> [1]
>     >>>>>>>> http://www.centos.org/docs/5/html/Cluster_Logical_Volume_Manager/lvm_metadata.html
>     >>>>>>>>
>     >>>>>>>> On 21 January 2013 12:53, Miloš Kozák <milos.kozak at lejmr.com> wrote:
>     >>>>>>>>>
>     >>>>>>>>> Hi,
>     >>>>>>>>> thank you for the great answer. As I wrote, my objective is to
>     >>>>>>>>> avoid as much clustering software (pacemaker, ..) as possible,
>     >>>>>>>>> so clvm is one of those things I would feel bad about having
>     >>>>>>>>> in my configuration. Therefore I would rather let nebula
>     >>>>>>>>> manage the LVM metadata in the first place, as you wrote. The
>     >>>>>>>>> only thing I don't understand yet is how nebula distributes
>     >>>>>>>>> the LVM metadata.
>     >>>>>>>>>
>     >>>>>>>>> Is the kernel in Scientific Linux 6.3 new enough for the LVM
>     >>>>>>>>> issue you mentioned?
>     >>>>>>>>>
>     >>>>>>>>> Thanks Milos
>     >>>>>>>>>
>     >>>>>>>>>
>     >>>>>>>>>
>     >>>>>>>>>
>     >>>>>>>>> On 21.1.2013 12:34, Mihály Héder wrote:
>     >>>>>>>>>
>     >>>>>>>>>> Hi!
>     >>>>>>>>>>
>     >>>>>>>>>> The last time we could test an EqualLogic, it did not offer
>     >>>>>>>>>> an API for creating/configuring virtual disks inside it, so I
>     >>>>>>>>>> think the iSCSI driver is not an alternative, as it would
>     >>>>>>>>>> require a configuration step on the storage per virtual
>     >>>>>>>>>> machine.
>     >>>>>>>>>>
>     >>>>>>>>>> However, you can use your storage just fine in a shared LVM
>     >>>>>>>>>> scenario. You need to consider two different things:
>     >>>>>>>>>> - the LVM metadata, and the actual VM data on the partitions.
>     >>>>>>>>>> It is true that concurrent modification of the metadata
>     >>>>>>>>>> should be avoided, as in theory it can damage the whole
>     >>>>>>>>>> volume group. You could use clvm, which avoids that with
>     >>>>>>>>>> clustered locking, and then every participating machine can
>     >>>>>>>>>> safely create/modify/delete LVs. However, in a nebula setup
>     >>>>>>>>>> this is not necessary in every case: you can make the LVM
>     >>>>>>>>>> metadata read-only on your host servers and let only the
>     >>>>>>>>>> frontend modify it. Then it can use local locking, which does
>     >>>>>>>>>> not require clvm.
>     >>>>>>>>>> - of course the host servers can write the data inside the
>     >>>>>>>>>> partitions regardless of the metadata being read-only for
>     >>>>>>>>>> them. It should work just fine as long as you don't start two
>     >>>>>>>>>> VMs for one partition.
>     >>>>>>>>>>
>     >>>>>>>>>> We are running this setup with a dual-controller Dell MD3600
>     >>>>>>>>>> storage without issues so far. Before that, we used to do the
>     >>>>>>>>>> same with XEN machines for years on an older EMC (that was
>     >>>>>>>>>> before nebula). Now with nebula we have been using a
>     >>>>>>>>>> home-grown module for doing this, which I can send you any
>     >>>>>>>>>> time - we plan to submit it as a feature enhancement anyway.
>     >>>>>>>>>> Also, there seems to be a similar shared LVM module in the
>     >>>>>>>>>> nebula upstream which we could not get to work yet, but we
>     >>>>>>>>>> did not try much.
>     >>>>>>>>>>
>     >>>>>>>>>> The plus side of this setup is that you can make live
>     >>>>>>>>>> migration work nicely. There are two points to consider,
>     >>>>>>>>>> however: once you set the LVM metadata read-only, you won't
>     >>>>>>>>>> be able to modify any local LVM volumes on your servers, if
>     >>>>>>>>>> there are any. Also, in older kernels, when you modified the
>     >>>>>>>>>> LVM on one machine the others did not get notified about the
>     >>>>>>>>>> changes, so you had to issue an lvs command. In newer kernels
>     >>>>>>>>>> this issue seems to be solved and the LVs get updated
>     >>>>>>>>>> instantly. I don't know when and what exactly changed,
>     >>>>>>>>>> though.
>     >>>>>>>>>>
>     >>>>>>>>>> Cheers
>     >>>>>>>>>> Mihály Héder
>     >>>>>>>>>> MTA SZTAKI ITAK
>     >>>>>>>>>>
>     >>>>>>>>>> On 18 January 2013 08:57, Miloš Kozák <milos.kozak at lejmr.com> wrote:
>     >>>>>>>>>>>
>     >>>>>>>>>>> Hi, I am setting up a small installation of opennebula with
>     >>>>>>>>>>> shared storage using iSCSI. The storage is an EqualLogic
>     >>>>>>>>>>> with two controllers. At the moment we have only two host
>     >>>>>>>>>>> servers, so we use direct connections between the storage
>     >>>>>>>>>>> and each server (see attachment); for this purpose we set up
>     >>>>>>>>>>> dm-multipath. In the future we want to add more servers, and
>     >>>>>>>>>>> some other technology will be necessary in the network
>     >>>>>>>>>>> segment, so these days we try to keep the setup as close as
>     >>>>>>>>>>> possible to the future topology from the protocols' point of
>     >>>>>>>>>>> view.
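>     >>>>>>>>>>>
>     >>>>>>>>>>> (For context, the multipath part is just an aliased entry in
>     >>>>>>>>>>> multipath.conf along these lines - the WWID and alias below
>     >>>>>>>>>>> are placeholders, not the real values:)
>     >>>>>>>>>>>
>     >>>>>>>>>>>     multipaths {
>     >>>>>>>>>>>         multipath {
>     >>>>>>>>>>>             wwid  <wwid-of-the-lun>   # from 'multipath -ll'
>     >>>>>>>>>>>             alias storage_lun
>     >>>>>>>>>>>         }
>     >>>>>>>>>>>     }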
>     >>>>>>>>>>>
>     >>>>>>>>>>> My question is about how to define the datastore: which
>     >>>>>>>>>>> driver and which TM would be best?
>     >>>>>>>>>>>
>     >>>>>>>>>>> My primary objective is to avoid GFS2 or any other cluster
>     >>>>>>>>>>> filesystem; I would prefer to keep the datastore as block
>     >>>>>>>>>>> devices. The only option I see is to use LVM, but I worry
>     >>>>>>>>>>> about concurrent writes - aren't they a problem? I was
>     >>>>>>>>>>> googling a bit and found that I would need to set up clvm -
>     >>>>>>>>>>> is that really necessary?
>     >>>>>>>>>>>
>     >>>>>>>>>>> Or is it better to use the iSCSI driver, drop dm-multipath,
>     >>>>>>>>>>> and hope?
>     >>>>>>>>>>>
>     >>>>>>>>>>> Thanks, Milos
>     >>>>>>>>>>>
>     >>>>>
>     >>>>>
>     >
>
>
