Hi Mark,

Yes, I think your patch should fix this issue; in fact, that was one of the reasons I considered it a big improvement over our current drivers. I haven't had the time to merge it into master yet, but I'll work on that pretty soon :)

cheers,
Jaime

On Thu, Dec 6, 2012 at 8:09 PM, Mark Gergely <gergely.mark@sztaki.mta.hu> wrote:
Dear Ruben, Alain,

Our improved iSCSI driver set, which we proposed earlier, should solve this issue. As mentioned in the ticket, it can start hundreds of non-persistent virtual machines simultaneously, with the TM concurrency level set to 15.
You can check the details at: http://dev.opennebula.org/issues/1592

All the best,
Mark Gergely
MTA-SZTAKI LPDS

On 2012.12.06., at 20:01, "Ruben S. Montero" <rsmontero@opennebula.org> wrote:

> Hi Alain,
>
> You are totally right, this can be a problem when instantiating
> multiple VMs at the same time. I've filed an issue to look for the
> best way to generate the TID [1].
>
> We'd be interested in updating the tgtadm_next_tid function in
> scripts_common.sh. Also, if the tgt server is getting overloaded by
> these simultaneous deployments, there are several ways to limit the
> concurrency of the TM (e.g. the -t option in oned.conf).
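>
> For reference, the TM concurrency is controlled by the arguments of the
> TM_MAD section in oned.conf; a rough sketch of what that looks like (the
> exact -d driver list depends on which TM drivers you have enabled):
>
>     TM_MAD = [
>         executable = "one_tm",
>         arguments  = "-t 15 -d dummy,lvm,shared,qcow2,ssh,vmfs,iscsi" ]
>
> Lowering the value passed to -t reduces how many transfer operations
> (and therefore how many concurrent tgtadm invocations) run at once.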
>
> THANKS for the feedback!
>
> Ruben
>
> [1] http://dev.opennebula.org/issues/1682
>
> On Thu, Dec 6, 2012 at 1:52 PM, Alain Pannetrat
> <apannetrat@cloudsecurityalliance.org> wrote:
>> Hi all,
>>
>> I'm new to OpenNebula and this mailing list, so forgive me if I
>> stumble over a topic that may have already been discussed.
>>
>> I'm currently discovering OpenNebula 3.8.1 with a simple 3-node
>> system: a control node, a compute node and a datastore node
>> (iSCSI+LVM).
>>
>> I have been testing the bulk instantiation of virtual machines in
>> Sunstone, where I initiate the creation of 8 virtual machines in
>> parallel. I have noticed that between 2 and 4 of them fail to
>> instantiate, typically with the following error message:
>>
>> 08 2012 [TM][I]: Command execution fail:
>> /var/lib/one/remotes/tm/iscsi/clone
>> iqn.2012-02.org.opennebula:san.vg-one.lv-one-26
>> compute.admin.lan:/var/lib/one//datastores/0/111/disk.0 111 101
>> Thu Dec 6 14:40:08 2012 [TM][E]: clone: Command " set -e
>> Thu Dec 6 14:40:08 2012 [TM][I]: set -x
>> Thu Dec 6 14:40:08 2012 [TM][I]:
>> Thu Dec 6 14:40:08 2012 [TM][I]: # get size
>> Thu Dec 6 14:40:08 2012 [TM][I]: SIZE=$(sudo lvs --noheadings -o
>> lv_size "/dev/vg-one/lv-one-26")
>> Thu Dec 6 14:40:08 2012 [TM][I]:
>> Thu Dec 6 14:40:08 2012 [TM][I]: # create lv
>> Thu Dec 6 14:40:08 2012 [TM][I]: sudo lvcreate -L${SIZE} vg-one -n
>> lv-one-26-111
>> Thu Dec 6 14:40:08 2012 [TM][I]:
>> Thu Dec 6 14:40:08 2012 [TM][I]: # clone lv with dd
>> Thu Dec 6 14:40:08 2012 [TM][I]: sudo dd if=/dev/vg-one/lv-one-26
>> of=/dev/vg-one/lv-one-26-111 bs=64k
>> Thu Dec 6 14:40:08 2012 [TM][I]:
>> Thu Dec 6 14:40:08 2012 [TM][I]: # new iscsi target
>> Thu Dec 6 14:40:08 2012 [TM][I]: TID=$(sudo tgtadm --lld iscsi --op
>> show --mode target | grep "Target" | tail -n 1 |
>> awk '{split($2,tmp,":"); print tmp[1]+1;}')
>> Thu Dec 6 14:40:08 2012 [TM][I]:
>> Thu Dec 6 14:40:08 2012 [TM][I]: sudo tgtadm --lld iscsi --op new
>> --mode target --tid $TID --targetname
>> iqn.2012-02.org.opennebula:san.vg-one.lv-one-26-111
>> Thu Dec 6 14:40:08 2012 [TM][I]: sudo tgtadm --lld iscsi --op bind
>> --mode target --tid $TID -I ALL
>> Thu Dec 6 14:40:08 2012 [TM][I]: sudo tgtadm --lld iscsi --op new
>> --mode logicalunit --tid $TID --lun 1 --backing-store
>> /dev/vg-one/lv-one-26-111
>> Thu Dec 6 14:40:08 2012 [TM][I]: sudo tgt-admin --dump |sudo tee
>> /etc/tgt/targets.conf > /dev/null 2>&1" failed: + sudo lvs
>> --noheadings -o lv_size /dev/vg-one/lv-one-26
>> Thu Dec 6 14:40:08 2012 [TM][I]: 131072+0 records in
>> Thu Dec 6 14:40:08 2012 [TM][I]: 131072+0 records out
>> Thu Dec 6 14:40:08 2012 [TM][I]: 8589934592 bytes (8.6 GB) copied,
>> 898.903 s, 9.6 MB/s
>> Thu Dec 6 14:40:08 2012 [TM][I]: tgtadm: this target already exists
>> Thu Dec 6 14:40:08 2012 [TM][E]: Error cloning
>> compute.admin.lan:/dev/vg-one/lv-one-26-111
>> Thu Dec 6 14:40:08 2012 [TM][I]: ExitCode: 22
>> Thu Dec 6 14:40:08 2012 [TM][E]: Error executing image transfer
>> script: Error cloning compute.admin.lan:/dev/vg-one/lv-one-26-111
>> Thu Dec 6 14:40:09 2012 [DiM][I]: New VM state is FAILED
>>
>> After adding traces in the code, I found that there seems to be a race
>> condition in /var/lib/one/remotes/tm/iscsi/clone where the following
>> commands get executed:
>>
>> TID=\$($SUDO $(tgtadm_next_tid))
>> $SUDO $(tgtadm_target_new "\$TID" "$NEW_IQN")
>>
>> These commands are typically expanded to something like this:
>>
>> TID=$(sudo tgtadm --lld iscsi --op show --mode target | grep "Target"
>> | tail -n 1 | awk '{split($2,tmp,":"); print tmp[1]+1;}')
>> sudo tgtadm --lld iscsi --op new --mode target --tid $TID
>> --targetname iqn.2012-02.org.opennebula:san.vg-one.lv-one-26-111
>>
>> What seems to happen is that two (or more) calls to the first command
>> (tgtadm_next_tid) run simultaneously before the second command gets a
>> chance to execute, and then TID has the same value for two (or
>> more) VMs.
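>>
>> As a concrete (purely hypothetical) interleaving, suppose the clones
>> for VMs 111 and 112 run at once and 26 is the highest existing TID:
>>
>>   clone for VM 111: tgtadm --op show ...      -> computes TID=27
>>   clone for VM 112: tgtadm --op show ...      -> also computes TID=27
>>   clone for VM 111: tgtadm --op new --tid 27  -> succeeds
>>   clone for VM 112: tgtadm --op new --tid 27  -> "this target already exists"
>>
>> which is exactly the error in the log above.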
>>
>> The workaround I found is to replace the line:
>> TID=\$($SUDO $(tgtadm_next_tid))
>> with
>> TID=$VMID
>> in /var/lib/one/remotes/tm/iscsi/clone
>>
>> Since $VMID is globally unique, no race condition can happen here.
>> I've tested this and the failures no longer happen in my setup.
>> Of course I'm not sure this is the ideal fix, since perhaps VMID can
>> take values that are out of range for tgtadm, so further testing would
>> be needed.
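>>
>> Another option, and this is only a rough, untested sketch that ignores
>> the ssh/escaping details of the real driver, would be to keep the
>> current TID computation but serialize the "read highest TID + create
>> target" steps on the iSCSI host with flock, so concurrent clones cannot
>> read the same value:
>>
>> (
>>     # only one clone at a time may allocate a TID and create its target
>>     flock -x 200
>>     TID=$(sudo tgtadm --lld iscsi --op show --mode target | \
>>           grep "Target" | tail -n 1 | awk '{split($2,tmp,":"); print tmp[1]+1;}')
>>     sudo tgtadm --lld iscsi --op new --mode target --tid "$TID" \
>>         --targetname "$NEW_IQN"
>> ) 200>/var/lock/one-tgtadm.lock
>>
>> The lock file name is arbitrary; the point is that the TID lookup and
>> the target creation become atomic with respect to each other.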
>>
>> I'd be happy to get your thoughts/feedback on this issue.
>>
>> Best,
>>
>> Alain
>
> --
> Ruben S. Montero, PhD
> Project co-Lead and Chief Architect
> OpenNebula - The Open Source Solution for Data Center Virtualization
> www.OpenNebula.org | rsmontero@opennebula.org | @OpenNebula

--
Jaime Melis
Project Engineer
OpenNebula - The Open Source Toolkit for Cloud Computing
www.OpenNebula.org | jmelis@opennebula.org