[one-users] Possible race condition in iSCSI datastore.

Thu Dec 6 12:24:52 PST 2012

Dear Mark, Ruben,

From reading the code in those patches, I think they do indeed greatly
improve the iSCSI driver and solve my problem.

Tonight I also ran into a related issue: in some cases more than two
seconds elapse between 'iscsiadm_login "$NEW_IQN" "$TARGET_HOST"' and
the appearance of "/dev/disk/by-path/*$NEW_IQN-lun-1" in the
DISCOVERY_CMD inside iscsi/clone, so the "sleep 2" inserted there is
not enough. This is also fixed by the proposed patches.
Nice work!
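
For anyone hitting the same delay before the patches are applied, the fixed
sleep could be replaced by a bounded polling loop. This is only a minimal
sketch and is not taken from the proposed patches: wait_for_path and its
timeout argument are hypothetical names, and the glob is assumed to contain
no whitespace.

```shell
#!/bin/sh
# Hypothetical helper: poll until a path matching a glob exists, or give up.
# Usage: wait_for_path PATTERN [TIMEOUT_SECONDS]
wait_for_path() {
    pattern=$1
    timeout=${2:-30}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # $pattern is left unquoted on purpose so the shell expands the glob.
        for candidate in $pattern; do
            if [ -e "$candidate" ]; then
                # Print the first match and report success.
                printf '%s\n' "$candidate"
                return 0
            fi
        done
        sleep 1
        elapsed=$((elapsed + 1))
    done
    # Nothing appeared within the timeout.
    return 1
}
```

In iscsi/clone this would be called with the by-path glob, for example
wait_for_path "/dev/disk/by-path/*$NEW_IQN-lun-1" 30, with whatever extra
escaping is needed because DISCOVERY_CMD runs on the remote host.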

All the best,

Alain

On Thu, Dec 6, 2012 at 9:09 PM, Mark Gergely <gergely.mark at sztaki.mta.hu> wrote:
> Dear Ruben, Alain,
>
> Our improved iSCSI driver set, which we proposed earlier, should solve this issue. As mentioned in the ticket, it is possible to start hundreds of non-persistent virtual machines simultaneously.
> The TM concurrency level is 15.
> You can check the details at: http://dev.opennebula.org/issues/1592
>
> All the best,
> Mark Gergely
> MTA-SZTAKI LPDS
>
> On 2012.12.06., at 20:01, "Ruben S. Montero" <rsmontero at opennebula.org> wrote:
>
>> Hi Alain,
>>
>> You are totally right, this may be a problem when instantiating
>> multiple VMs at the same time.  I've filed an issue to look for the
>> best way to generate the TID [1].
>>
>> We'd be interested in updating the tgtadm_next_tid function in
>> scripts_common.sh. Also, if the tgt server is getting overloaded by
>> these simultaneous deployments, there are several ways to limit the
>> concurrency of the TM (e.g. the -t option in oned.conf).
>>
>> THANKS for the feedback!
>>
>> Ruben
>>
>> [1]  http://dev.opennebula.org/issues/1682
>>
>> On Thu, Dec 6, 2012 at 1:52 PM, Alain Pannetrat
>> <apannetrat at cloudsecurityalliance.org> wrote:
>>> Hi all,
>>>
>>> I'm new to OpenNebula and this mailing list, so forgive me if I
>>> stumble over a topic that may have already been discussed.
>>>
>>> I'm currently exploring OpenNebula 3.8.1 with a simple 3-node
>>> system: a control node, a compute node and a datastore node
>>> (iscsi+lvm).
>>>
>>> I have been testing the bulk instantiation of virtual machines in
>>> Sunstone, where I initiate the bulk creation of 8 virtual machines in
>>> parallel. I have noticed that between 2 and 4 machines fail to
>>> instantiate correctly, with the following typical error message:
>>>
>>> 08 2012 [TM][I]: Command execution fail:
>>> /var/lib/one/remotes/tm/iscsi/clone
>>> iqn.2012-02.org.opennebula:san.vg-one.lv-one-26
>>> compute.admin.lan:/var/lib/one//datastores/0/111/disk.0 111 101
>>> Thu Dec  6 14:40:08 2012 [TM][E]: clone: Command "    set -e
>>> Thu Dec  6 14:40:08 2012 [TM][I]: set -x
>>> Thu Dec  6 14:40:08 2012 [TM][I]:
>>> Thu Dec  6 14:40:08 2012 [TM][I]: # get size
>>> Thu Dec  6 14:40:08 2012 [TM][I]: SIZE=$(sudo lvs --noheadings -o
>>> lv_size "/dev/vg-one/lv-one-26")
>>> Thu Dec  6 14:40:08 2012 [TM][I]:
>>> Thu Dec  6 14:40:08 2012 [TM][I]: # create lv
>>> Thu Dec  6 14:40:08 2012 [TM][I]: sudo lvcreate -L${SIZE} vg-one -n
>>> lv-one-26-111
>>> Thu Dec  6 14:40:08 2012 [TM][I]:
>>> Thu Dec  6 14:40:08 2012 [TM][I]: # clone lv with dd
>>> Thu Dec  6 14:40:08 2012 [TM][I]: sudo dd if=/dev/vg-one/lv-one-26
>>> of=/dev/vg-one/lv-one-26-111 bs=64k
>>> Thu Dec  6 14:40:08 2012 [TM][I]:
>>> Thu Dec  6 14:40:08 2012 [TM][I]: # new iscsi target
>>> Thu Dec  6 14:40:08 2012 [TM][I]: TID=$(sudo tgtadm --lld iscsi --op
>>> show --mode target |             grep "Target" | tail -n 1 |
>>>  awk '{split($2,tmp,":"); print tmp[1]+1;}')
>>> Thu Dec  6 14:40:08 2012 [TM][I]:
>>> Thu Dec  6 14:40:08 2012 [TM][I]: sudo tgtadm --lld iscsi --op new
>>> --mode target --tid $TID  --targetname
>>> iqn.2012-02.org.opennebula:san.vg-one.lv-one-26-111
>>> Thu Dec  6 14:40:08 2012 [TM][I]: sudo tgtadm --lld iscsi --op bind
>>> --mode target --tid $TID -I ALL
>>> Thu Dec  6 14:40:08 2012 [TM][I]: sudo tgtadm --lld iscsi --op new
>>> --mode logicalunit --tid $TID  --lun 1 --backing-store
>>> /dev/vg-one/lv-one-26-111
>>> Thu Dec  6 14:40:08 2012 [TM][I]: sudo tgt-admin --dump |sudo tee
>>> /etc/tgt/targets.conf > /dev/null 2>&1" failed: + sudo lvs
>>> --noheadings -o lv_size /dev/vg-one/lv-one-26
>>> Thu Dec  6 14:40:08 2012 [TM][I]: 131072+0 records in
>>> Thu Dec  6 14:40:08 2012 [TM][I]: 131072+0 records out
>>> Thu Dec  6 14:40:08 2012 [TM][I]: 8589934592 bytes (8.6 GB) copied,
>>> 898.903 s, 9.6 MB/s
>>> Thu Dec  6 14:40:08 2012 [TM][I]: tgtadm: this target already exists
>>> Thu Dec  6 14:40:08 2012 [TM][E]: Error cloning
>>> compute.admin.lan:/dev/vg-one/lv-one-26-111
>>> Thu Dec  6 14:40:08 2012 [TM][I]: ExitCode: 22
>>> Thu Dec  6 14:40:08 2012 [TM][E]: Error executing image transfer
>>> script: Error cloning compute.admin.lan:/dev/vg-one/lv-one-26-111
>>> Thu Dec  6 14:40:09 2012 [DiM][I]: New VM state is FAILED
>>>
>>> After adding traces in the code, I found that there seems to be a race
>>> condition in /var/lib/one/remotes/tm/iscsi/clone, where the following
>>> commands get executed:
>>>
>>> TID=\$($SUDO $(tgtadm_next_tid))
>>> $SUDO $(tgtadm_target_new "\$TID" "$NEW_IQN")
>>>
>>> These commands are typically expanded to something like this:
>>>
>>> TID=$(sudo tgtadm --lld iscsi --op show --mode target | grep "Target"
>>> | tail -n 1 | awk '{split($2,tmp,":"); print tmp[1]+1;}')
>>> sudo tgtadm --lld iscsi --op new --mode target --tid $TID
>>> --targetname iqn.2012-02.org.opennebula:san.vg-one.lv-one-26-111
>>>
>>> What seems to happen is that two (or more) calls to the first command,
>>> tgtadm_next_tid, run simultaneously before the second command gets a
>>> chance to execute, so TID has the same value for two (or more) VMs.
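
The effect can be reproduced without a tgt server by feeding the same target
listing to the pipeline twice. A minimal sketch: next_tid_from_listing is a
hypothetical wrapper around the grep/tail/awk pipeline shown in the log above,
and the sample listing is made up in the "Target N: <iqn>" shape that the
pipeline expects.

```shell
#!/bin/sh
# Hypothetical wrapper: read a target listing on stdin and print the last
# TID plus one, using the same grep/tail/awk pipeline as the expanded command.
next_tid_from_listing() {
    grep "Target" | tail -n 1 | awk '{split($2,tmp,":"); print tmp[1]+1;}'
}

# Made-up listing standing in for "tgtadm --lld iscsi --op show --mode target".
LISTING='Target 1: iqn.2012-02.org.opennebula:san.vg-one.lv-one-26
Target 2: iqn.2012-02.org.opennebula:san.vg-one.lv-one-26-110'

# Two clone operations that both read the listing before either of them
# has created its new target see the same state and compute the same TID.
TID_A=$(printf '%s\n' "$LISTING" | next_tid_from_listing)
TID_B=$(printf '%s\n' "$LISTING" | next_tid_from_listing)

echo "clone A would use tid $TID_A, clone B would use tid $TID_B"
```

Both values come out as 3 here, so the second "--op new --mode target" call
fails with "this target already exists", as in the log.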
>>>
>>> The workaround I found is to replace the line:
>>> TID=\$($SUDO $(tgtadm_next_tid))
>>> with
>>> TID=$VMID
>>> in /var/lib/one/remotes/tm/iscsi/clone
>>>
>>> Since $VMID is globally unique, no race condition can happen here.
>>> I've tested this and the failures no longer occur in my setup.
>>> Of course, I'm not sure this is the ideal fix, since VMID can perhaps
>>> take values that are out of range for tgtadm, so further testing would
>>> be needed.
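
An alternative that keeps sequential TIDs would be to run the "look up next
TID, then create target" pair under an exclusive lock on the host where tgtd
runs. This is a minimal sketch, assuming flock(1) from util-linux is available
on that host; with_tid_lock is a hypothetical helper and is not taken from
the proposed patches.

```shell
#!/bin/sh
# Hypothetical helper: run a command while holding an exclusive lock.
# Usage: with_tid_lock LOCKFILE COMMAND [ARGS...]
with_tid_lock() {
    lockfile=$1
    shift
    (
        # Block until file descriptor 9 (opened on the lock file) is locked.
        flock -x 9 || exit 1
        # Run the protected command; its exit status becomes ours.
        "$@"
    ) 9>"$lockfile"   # the lock is released when the subshell exits
}
```

The TID lookup and the "--op new --mode target" call would then go into a
single function invoked through with_tid_lock, so that no other clone can
compute a TID between the two steps. The lock only helps if every clone on
that host uses the same lock file.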
>>>
>>> I'd be happy to get your thoughts/feedback on this issue.
>>>
>>> Best,
>>>
>>> Alain
>>> _______________________________________________
>>> Users mailing list
>>> Users at lists.opennebula.org
>>> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>>
>>
>>
>> --
>> Ruben S. Montero, PhD
>> Project co-Lead and Chief Architect
>> OpenNebula - The Open Source Solution for Data Center Virtualization
>> www.OpenNebula.org | rsmontero at opennebula.org | @OpenNebula