[one-users] VMware VMs stay in boot state

Tino Vazquez tinova at fdi.ucm.es
Mon Feb 22 06:52:25 PST 2010


Hi Stefan,

Sorry for the delay, we are really busy right now.

The machines stuck in the "boot" state are probably the result of missing
"DEPLOY SUCCESS" messages that "oned" should receive from the drivers
but does not get. This may happen for a variety of reasons; one of them
could be that the driver crashes (do you see anything in the oned.log?).

One possible experiment would be to log all incoming and outgoing
messages in the MAD and try to reproduce the problem. Then, for the
machines stuck in the "boot" state, we can check the logged messages to
see whether a DEPLOY SUCCESS was ever sent for them.
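A minimal sketch of that logging experiment, in Java because the VMware driver (OneVmmVmware) is written in Java. It assumes the driver reads one request per line from standard input and writes one reply per line to standard output, as described elsewhere in this thread. The class name MadMessageLogger and the log format are hypothetical and not part of OpenNebula.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.Writer;
import java.time.LocalDateTime;

// Hypothetical tee for a MAD's line-based protocol: every request read
// from oned and every reply sent back is also appended to a log, tagged
// "<<" (incoming) or ">>" (outgoing).
class MadMessageLogger {
    private final BufferedReader in;
    private final PrintWriter out;
    private final PrintWriter log;

    MadMessageLogger(Reader in, Writer out, Writer log) {
        this.in = new BufferedReader(in);
        this.out = new PrintWriter(out, true); // autoflush on println
        this.log = new PrintWriter(log, true);
    }

    // Reads one request line; returns null once oned closes the stream.
    String readMessage() throws IOException {
        String line = in.readLine();
        if (line != null) {
            record("<<", line);
        }
        return line;
    }

    // Logs the reply first, then sends it to oned as a single line.
    synchronized void sendMessage(String line) {
        record(">>", line);
        out.println(line);
    }

    private synchronized void record(String direction, String line) {
        log.println("[" + LocalDateTime.now() + "] " + direction + " " + line);
    }
}
```

In a driver one would pass wrappers around System.in, System.out and a FileWriter for the log. Comparing the "<<" DEPLOY lines with the ">>" DEPLOY replies then shows which VM IDs never got an answer.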

Hope this helps, and thanks again for the feedback,

-Tino

--
Constantino Vázquez, Grid & Virtualization Technology
Engineer/Researcher: http://www.dsa-research.org/tinova
DSA Research Group: http://dsa-research.org
Globus GridWay Metascheduler: http://www.GridWay.org
OpenNebula Virtual Infrastructure Engine: http://www.OpenNebula.org



On Thu, Feb 11, 2010 at 8:06 PM, Stefan Reichel
<Stefan.Reichel at student.hpi.uni-potsdam.de> wrote:
> Hi Tino,
> I am glad to hear that you are still investigating our problem, thanks.
> What we do is easy to sum up. We make sure that the VM list is empty.
> Afterwards we run "onevm create" with our virtual machine templates. The
> machines are therefore "new" to OpenNebula and get new IDs. Then we can see
> that VMware Server starts to power on the VM. Nevertheless, "onevm list"
> often, but not always, shows the wrong, stale "boot" state.
> We also know that the Java VMware driver seems to work correctly in 100% of
> the cases. We verified this by adding extra System.out.println calls to
> the code and redirecting its standard output, which is normally discarded
> to /dev/null, to a text file.
> Could this whole issue be related to a performance problem? During the
> deployment phase the CPU is under heavy load, because our frontend and
> VMware Server are on the same test machine. Could this confuse Ruby or
> oned? I don't know what else to test, so I can't provide you with further
> information.
> Best regards
> Stefan
> On 11.02.2010 at 16:22, Tino Vazquez wrote:
>
> Hi Stefan,
>
> I think I am a bit lost now; let's see if you can get me back on
> track. From what I gather, the VMware drivers are working as expected,
> and the VMs seen in the boot state are the ones that were previously
> registered and never unregistered, because ONE was shut down before this
> could happen.
>
> If this is not correct, could you please elaborate a bit more on this?
>
> Regards,
>
> -Tino
>
>
>
>
> On Sat, Feb 6, 2010 at 2:59 PM, Stefan Reichel
> <stefan.reichel at student.hpi.uni-potsdam.de> wrote:
>
> Hello Tino,
>
> Is there any new information available about the described bug? Or should I
> run some further tests?
>
> Regards and have a nice weekend
>
> Stefan
>
> On 02.02.2010 at 01:04, Stefan Reichel wrote:
>
> Hi Tino,
>
> You are right, the output was collected while the machines were still
> running. The behavior of the vmm_vmware driver was correct. I also sent you
> a mail on Sunday with additional output, which also shows that the Java
> part of the driver works as expected. Therefore I assume the problem must be
> somewhere in the other parts of the driver.
>
> Best regards
>
> Stefan
>
>
> On 01.02.2010 at 12:55, Tino Vazquez wrote:
>
> Hi Stefan,
>
> From what I read in the Java stack trace, the machine is already
> powered on; that is why the deployment fails. This may happen when you
> kill OpenNebula (without letting it shut down the VMs), clear its DB and
> try to submit VMs again: the names will clash (if one-1 is still
> running, a new deployment of one-1 will fail).
>
> If this is not the case, please let me know and we will look at something
> else.
>
> Regards,
>
> -Tino
>
>
>
>
> On Fri, Jan 29, 2010 at 2:18 AM, Stefan Reichel
> <Stefan.Reichel at student.hpi.uni-potsdam.de> wrote:
>
> Hi Tino,
>
> I tried your command with 5 parameters; I think you missed the checkpoint.
> The result is easy to describe: there is nothing. The script itself hangs
> and the log files don't contain any failure or success.
>
> Therefore I tried the Java class directly by calling:
>
> java -Ddebug=1 OneVmmVmware --username oneadmin --password xxxx --ignorecert
>
> and pasting:
>
> DEPLOY 1 fqdn /usr/share/one/var/1/deployment.0 CP1
>
> The script itself seems to work, judging by the error I get (at the end of
> this mail). Once I also got another error, which was related to the
> network, but I think it was caused by a network misconfiguration.
> Nevertheless, I included that log and the network file as well. The
> oned.log is also quite useless: after the "prolog success" message the
> monitoring begins, with no deploy success at all. Once I also saw a line
> "failure:     " after it, without any reason. Perhaps this is connected to
> the Java output below, because there is no reason there either. In that
> case it would probably be caused by a race condition, which would also
> explain why it only happens sometimes.
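If the race-condition theory holds, one place it could bite is the driver's standard output: when several threads (for example a deploy and a poll) write a reply in pieces, fragments can interleave and oned may not recognise a DEPLOY SUCCESS line. This is an unconfirmed hypothesis, not a diagnosis. The sketch below uses a hypothetical ProtocolWriter class that builds each reply as one complete line and then prints and flushes it under a lock. The "ACTION RESULT ID INFO" layout is inferred from log lines such as "POLL SUCCESS 76 STATE=a" in this thread.

```java
import java.io.PrintStream;

// Hypothetical reply writer for a MAD: each reply is built as one complete
// line, then printed and flushed while holding a lock on the stream, so
// concurrent threads cannot interleave fragments of their replies.
class ProtocolWriter {
    private final PrintStream out;

    ProtocolWriter(PrintStream out) {
        this.out = out;
    }

    // Sends "ACTION RESULT ID INFO", e.g. "POLL SUCCESS 76 STATE=a".
    void send(String action, String result, int id, String info) {
        String line = action + " " + result + " " + id + " " + info;
        synchronized (out) {
            out.println(line);
            out.flush(); // do not leave the reply sitting in a buffer
        }
    }
}
```

The important part is that no reply is ever written with several separate print calls; the lock and the explicit flush only add safety on top of that.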
>
> I hope the output and descriptions give you an indication of how to find the
> reason for our problem.
>
> Best regards
>
> Stefan
>
>
>
>
>
>
> Output of java -Ddebug=1 OneVmmVmware --username oneadmin --password xxxx --ignorecert:
>
> DEPLOY 1 fqdn /usr/share/one/var/1/deployment.0 CP1
> DEPLOY FAILURE 1 Failed deploying VM in host fqdn.
> [29.01.2010 01:52:17] Failed deploying VM 1 into fqdn.Reason: null
> ---- Debug stack trace ----
> AxisFault
>  faultCode: ServerFaultCode
>  faultSubcode:
>  faultString: The attempted operation cannot be performed in the current state (Powered On).
>  faultActor:
>  faultNode:
>  faultDetail:
>     {urn:vim2}InvalidPowerStateFault:<requestedState>poweredOn</requestedState><existingState>poweredOn</existingState>
>
> The attempted operation cannot be performed in the current state (Powered On).
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
>     at java.lang.Class.newInstance0(Class.java:372)
>     at java.lang.Class.newInstance(Class.java:325)
>     at org.apache.axis.encoding.ser.BeanDeserializer.<init>(BeanDeserializer.java:104)
>     at org.apache.axis.encoding.ser.BeanDeserializer.<init>(BeanDeserializer.java:90)
>     at com.vmware.vim.InvalidPowerState.getDeserializer(InvalidPowerState.java:156)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:616)
>     ......
>     at org.apache.axis.client.AxisClient.invoke(AxisClient.java:206)
>     at org.apache.axis.client.Call.invokeEngine(Call.java:2784)
>     at org.apache.axis.client.Call.invoke(Call.java:2767)
>     at org.apache.axis.client.Call.invoke(Call.java:2443)
>     at org.apache.axis.client.Call.invoke(Call.java:2366)
>     at org.apache.axis.client.Call.invoke(Call.java:1812)
>     at com.vmware.vim.VimBindingStub.powerOnVM_Task(VimBindingStub.java:24320)
>     at OperationsOverVM.powerOn(OperationsOverVM.java:82)
>     at OneVmmVmware.loop(OneVmmVmware.java:204)
>     at OneVmmVmware.main(OneVmmVmware.java:57)
> [29.01.2010 01:52:17] ---------------------------
>
>
>
>
>
> Output of java -Ddebug=1 OneVmmVmware .... (caused by the network misconfiguration?):
>
> DEPLOY FAILURE 0 Failed deploying VM in host fqdn.
> [29.01.2010 01:10:43] Failed deploying VM 0 into fqdn.Reason: null
> ---- Debug stack trace ----
> java.lang.NullPointerException
>     at DeployVM.configureNetwork(DeployVM.java:268)
>     at DeployVM.shapeVM(DeployVM.java:220)
>     at OneVmmVmware.loop(OneVmmVmware.java:168)
>     at OneVmmVmware.main(OneVmmVmware.java:57)
> [29.01.2010 01:10:43] ---------------------------
>
>
>
> Old network config:
>
> NAME   = "VMWareNet"
> TYPE   = RANGED
> BRIDGE = NAT
> NETWORK_ADDRESS = 192.168.189.200
> NETWORK_SIZE = 254
>
>
>
> oned.log (extract):
>
> Fri Jan 29 00:55:28 2010 [TM][D]: Message received: TRANSFER SUCCESS 0 -
> Fri Jan 29 00:55:28 2010 [LCM][I]: prolog success:
> Fri Jan 29 00:55:39 2010 [VMM][I]: Recovering VMM drivers
> Fri Jan 29 00:56:03 2010 [ReM][D]: VirtualMachinePoolInfo method invoked
> Fri Jan 29 00:56:06 2010 [ReM][D]: VirtualMachinePoolInfo method invoked
> Fri Jan 29 00:56:51 2010 [VMM][I]: Monitoring VM 86.
> Fri Jan 29 00:56:54 2010 [InM][I]: Monitoring host fqdn (0)
> Fri Jan 29 00:56:57 2010 [InM][D]: Host 0 successfully monitored.
> Fri Jan 29 00:56:57 2010 [ReM][D]: VirtualMachinePoolInfo method invoked
> Fri Jan 29 00:57:01 2010 [VMM][D]: Message received: POLL SUCCESS 0 STATE=a
>
>
> On 28.01.2010 at 19:37, Tino Vazquez wrote:
>
> Hi Stefan,
>
> Let's try executing the driver by hand. The VMM driver talks to the
> OpenNebula core using an ASCII protocol. So, if you execute the
> driver:
>
> $ONE_LOCATION/lib/mads/one_vmm_vmware
>
> and hit enter, it should wait for input on standard input, and you
> will need to type:
>
> ---8<----
> DEPLOY 0 fqdn var/77/images/deployment.0
> --->8----
>
> assuming that $ONE_LOCATION/var/77 exists (i.e. a previous attempt to
> run a VM with OpenNebula ID 77 has been made; it need not have
> succeeded).
>
> The answer we are waiting for is a DEPLOY SUCCESS 0; this is what the
> OpenNebula core does not seem to be getting.
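The request and reply lines used in this thread can be modelled as below. This is a sketch under assumptions: the request layout DEPLOY <vm_id> <host> <deployment_file> [<checkpoint>] is taken from the DEPLOY commands quoted in this thread, the reply layout from log lines such as "DEPLOY FAILURE 1 Failed deploying VM in host fqdn.", and the deploy id appended to a success reply (e.g. one-77) is an assumption. DeployRequest is a hypothetical name, not an OpenNebula class.

```java
// Hypothetical model of the DEPLOY exchange; field layout inferred from
// the commands and log lines quoted in this thread.
class DeployRequest {
    final int vmId;
    final String host;
    final String deploymentFile;
    final String checkpoint; // null when the optional fifth field is absent

    private DeployRequest(int vmId, String host, String deploymentFile, String checkpoint) {
        this.vmId = vmId;
        this.host = host;
        this.deploymentFile = deploymentFile;
        this.checkpoint = checkpoint;
    }

    // Parses "DEPLOY <vm_id> <host> <deployment_file> [<checkpoint>]".
    static DeployRequest parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length < 4 || f.length > 5 || !f[0].equals("DEPLOY")) {
            throw new IllegalArgumentException("not a DEPLOY request: " + line);
        }
        String checkpoint = (f.length == 5) ? f[4] : null;
        // Integer.parseInt throws NumberFormatException (an
        // IllegalArgumentException) for a non-numeric VM id.
        return new DeployRequest(Integer.parseInt(f[1]), f[2], f[3], checkpoint);
    }

    // Reply oned waits for: "DEPLOY SUCCESS <vm_id> <deploy_id>".
    String successReply(String deployId) {
        return "DEPLOY SUCCESS " + vmId + " " + deployId;
    }

    // Reply on error: "DEPLOY FAILURE <vm_id> <reason>".
    String failureReply(String reason) {
        return "DEPLOY FAILURE " + vmId + " " + reason;
    }
}
```

Note that the VM id comes right after the result word in the replies, not right after the action as it does in the request.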
>
> Your use case is very interesting; we are happy to help. OpenNebula
> doesn't feature a web GUI per se, but we offer a REST interface using
> EC2 or OCCI, on top of which an AJAX application can be built.
>
> Best regards,
>
> -Tino
>
>
>
>
> On Thu, Jan 28, 2010 at 1:05 AM, Stefan Reichel
> <Stefan.Reichel at student.hpi.uni-potsdam.de> wrote:
>
> Hi Tino,
>
> I just wanted to write that everything is fine, but it isn't. The problem
> only occurs sometimes. At the end of this mail you will find some logs.
> VM 77 is up and running in VMware Server, but at the same time it is in the
> "boot" state. By the way, we currently use VMware-server-2.0.2-203138.i386.
> As you can see in the logs, the "DEPLOY SUCCESS" message is sent, or at
> least OneVmmVmware.java believes so. But it seems that it is never
> received. Sometimes it works, but not deterministically; when it does, the
> message also appears in oned.log.
>
>
> The main goal of our project is to set up a network environment in which
> students can test and investigate several security weaknesses on live
> systems. For every such scenario we need different computers, which are
> simulated via VMs. In effect, when a new scenario is loaded, OpenNebula will
> be responsible for the VM setup and management.
>
> We will primarily use VMware images (with VMware Server) but also KVM. To
> control our scenarios we will implement a web interface, which will be used
> for management but also for monitoring purposes. As far as I know,
> OpenNebula only has a command-line frontend?
>
> Best Regards
>
> Stefan
>
>
> VM.LOG:
>
> Wed Jan 27 23:57:41 2010 [DiM][I]: New VM state is ACTIVE.
> Wed Jan 27 23:57:42 2010 [LCM][I]: New VM state is PROLOG.
> Wed Jan 27 23:57:42 2010 [VM][I]: Virtual Machine has no context
> Wed Jan 27 23:58:46 2010 [TM][I]: tm_clone.sh: fqdn:/srv/seclab/images-src/vmware/XP2 fqdn:/srv/seclab/vms/77/images/disk.0
> Wed Jan 27 23:58:46 2010 [TM][I]: tm_clone.sh: Cloning fqdn:/srv/seclab/images-src/vmware/XP2
> Wed Jan 27 23:58:46 2010 [LCM][I]: New VM state is BOOT
> Wed Jan 27 23:58:46 2010 [VMM][I]: Generating deployment file: /usr/share/one/var/77/deployment.0
>
> ONED.LOG:
>
> Wed Jan 27 23:57:41 2010 [DiM][D]: Deploying VM 77
> Wed Jan 27 23:57:44 2010 [InM][I]: Monitoring host fqdn (0)
> Wed Jan 27 23:57:48 2010 [InM][D]: Host 0 successfully monitored.
> Wed Jan 27 23:57:52 2010 [VMM][I]: Monitoring VM 76.
> Wed Jan 27 23:57:53 2010 [VMM][D]: Message received: POLL SUCCESS 76 STATE=a USEDMEMORY=25 USEDCPU=0
> Wed Jan 27 23:58:46 2010 [TM][D]: Message received: LOG - 77 tm_clone.sh: fqdn:/srv/seclab/images-src/vmware/XP2 fqdn:/srv/seclab/vms/77/images/disk.0
> Wed Jan 27 23:58:46 2010 [TM][D]: Message received: LOG - 77 tm_clone.sh: Cloning fqdn:/srv/seclab/images-src/vmware/XP2
> Wed Jan 27 23:58:46 2010 [TM][D]: Message received: TRANSFER SUCCESS 77 -
> Wed Jan 27 23:58:46 2010 [LCM][I]: prolog success:
>
> VMM_VMWARE.LOG (output added to OneVmmVmware.java):
>
> [27.01.2010 23:41:40] TRY TO POWER ON
> [27.01.2010 23:41:45] DEPLOY SUCCESS
> [27.01.2010 23:58:50] TRY TO POWER ON
> [27.01.2010 23:58:54] DEPLOY SUCCESS
>
> On 27.01.2010 at 13:04, Tino Vazquez wrote:
>
> Hi Stefan,
>
> comments inline,
>
> On Wed, Jan 27, 2010 at 10:27 AM, Stefan Reichel
> <stefan.reichel at student.hpi.uni-potsdam.de> wrote:
>
> Hi,
>
> I tried to analyze the bug and finally solve this problem. For now these
> are my results; please correct me if I am wrong.
> First of all, the VM is running in VMware, but in "onevm list" it is
> still booting. VirtualMachineManager::deploy_action had finished. Those
> were the facts; now my theory:
>
> Normally the MadManager receives a message and forwards it to the
> corresponding driver by calling its protocol method. In my case this
> would be the VirtualMachineManagerDriver. However, its protocol method
> is not called, and therefore it can't call the LifeCycleManager, which
> would in turn set the "running" state after reacting to a
> "DEPLOY_SUCCESS". Therefore I assume that the corresponding message is
> never sent. But who should send it???
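Stefan's theory can be summarised as a small state model: a VM leaves BOOT only when a DEPLOY reply carrying its ID is dispatched to the core, so a reply that never arrives leaves it in BOOT indefinitely. The Java below is a simplified, hypothetical model of that reasoning. The real oned is C++, and VmState and DeployReplyHandler are illustrative names, not OpenNebula classes.

```java
// Simplified, hypothetical model of how the core reacts to driver replies.
enum VmState { BOOT, RUNNING, FAILED }

class DeployReplyHandler {
    // Returns the VM's next state after one reply line from the VMM driver.
    static VmState next(VmState current, int vmId, String reply) {
        String[] f = reply.trim().split("\\s+", 4); // ACTION RESULT ID [INFO]
        if (current != VmState.BOOT || f.length < 3 || !f[0].equals("DEPLOY")) {
            return current; // not a DEPLOY reply: the VM stays where it is
        }
        int id;
        try {
            id = Integer.parseInt(f[2]);
        } catch (NumberFormatException e) {
            return current; // malformed line: ignored
        }
        if (id != vmId) {
            return current; // reply belongs to another VM
        }
        switch (f[1]) {
            case "SUCCESS":
                return VmState.RUNNING;
            case "FAILURE":
                return VmState.FAILED;
            default:
                return current;
        }
    }
}
```

In this model only a matching DEPLOY SUCCESS produces RUNNING; a missing, garbled or mis-addressed reply keeps the VM in BOOT, which is what "onevm list" shows in the reports above.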
>
> The VMware VMM MAD is responsible for sending back the DEPLOY SUCCESS, so
> it is probably failing to do so. You mentioned in a previous email
> that the VM is already running, so I guess the driver is crashing
> badly after performing the powerOn (otherwise it would send a
> "DEPLOY FAILURE" and you would get "fail" instead of "boot" as the VM
> state). Do you see anything in the one_vmm_vmware log file?
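Following this reasoning, a driver that dies after powerOn sends neither DEPLOY SUCCESS nor DEPLOY FAILURE, and the VM stays in boot. One defensive pattern, sketched here with hypothetical names (SafeDeploy, DeployStep) and not taken from the actual OneVmmVmware code, is to make sure every DEPLOY request ends with exactly one reply line. It also falls back to the exception class name when getMessage() is null, which may be what produced the "Reason: null" output quoted in this thread.

```java
import java.util.function.Consumer;

// Hypothetical guard around a driver's deploy step (not OneVmmVmware's
// actual code): every DEPLOY request ends with exactly one reply line.
class SafeDeploy {
    // The real work, e.g. register + powerOn; returns the deploy id.
    interface DeployStep {
        String run() throws Exception;
    }

    static void deploy(int vmId, DeployStep step, Consumer<String> send) {
        String reply;
        try {
            reply = "DEPLOY SUCCESS " + vmId + " " + step.run();
        } catch (Exception e) {
            reply = "DEPLOY FAILURE " + vmId + " " + describe(e);
        }
        send.accept(reply);
    }

    // Non-empty, single-line reason: collapses line breaks and falls back
    // to the exception class name when getMessage() is null or blank.
    static String describe(Throwable t) {
        String msg = t.getMessage();
        if (msg != null) {
            msg = msg.replaceAll("\\s+", " ").trim();
        }
        return (msg == null || msg.isEmpty()) ? t.getClass().getName() : msg;
    }
}
```

It catches Exception rather than Throwable, so JVM errors such as OutOfMemoryError still terminate the driver, and in that case oned would again receive no reply at all.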
>
>
> I temporarily fixed the problem by setting the running state manually in the
> mentioned deploy_action. I hope that someone will finally answer one of my
> messages; indeed, this is my third unanswered message to this community.
>
> Sorry for the delay in my answers; please take into account that this
> is a best-effort support mailing list.
>
>
> We are trying to use OpenNebula in our current project, but the focus of our
> project is not on getting software to do what it is supposed to do.
> Nevertheless, we are software developers, and therefore we could also fix
> and extend the OpenNebula project if we had some support from you.
>
> That is great news! Could you please elaborate a bit on what it is that
> you intend to do with the software? We are happy to provide best-effort
> support.
>
> Best regards,
>
> -Tino
>
>
> Kind regards,
>
> Stefan
>
>
>
>
> On 26.01.2010 at 21:41, Stefan Reichel wrote:
>
> Hi OpenNebula team,
>
> our developer team tried to use OpenNebula  and now we are able to start vms
>
> in VMWare. But we have a serious problem. Every VM we start stays in the
>
> "boot" state. What is the reason for that, where ca we gather more
>
> information about the problem? We use OpenNebula 1.4 / SVN version on Ubuntu
>
> 9.10 in combination with VMWare Server 2. Any help would be appreciated.
>
> Sincerely
>
> Stefan
>
>
>
>
> _______________________________________________
>
> Users mailing list
>
> Users at lists.opennebula.org
>
> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>
>
> _____________________________________
>
> Stefan Reichel,  M.Sc. Candidate
>
> Hasso-Plattner-Institut für Softwaresystemtechnik GmbH
>
> Postfach 900460, D-14440 Potsdam, Germany
>
> http://www.hpi.uni-potsdam.de
>
> Telefon: 03322/206306  Mobile: 0178/5495023
>
> Email: stefan.reichel at student.hpi.uni-potsdam.de
>
> _____________________________________
>


