[one-users] VMware VMs stay in boot state
Tino Vazquez
tinova at fdi.ucm.es
Wed Feb 24 03:44:38 PST 2010
Hi Stefan,
This is interesting. I'm unable to reproduce it. Which Linux flavor
are you using?
Cheers,
-Tino
--
Constantino Vázquez, Grid & Virtualization Technology
Engineer/Researcher: http://www.dsa-research.org/tinova
DSA Research Group: http://dsa-research.org
Globus GridWay Metascheduler: http://www.GridWay.org
OpenNebula Virtual Infrastructure Engine: http://www.OpenNebula.org
On Tue, Feb 23, 2010 at 1:27 AM, Stefan Reichel
<Stefan.Reichel at student.hpi.uni-potsdam.de> wrote:
> Hi Tino,
> I think it is done: I found the bug!
> After debugging the MAD it turned out that it didn't receive the "DEPLOY
> SUCCESS" message. I know that you are "sending" this message via a
> System.out.println in the VMware driver. Normally a println is terminated by
> a "\n", so it was clear that you are buffering the message until you
> receive this character. The big question was whether the MAD didn't
> receive the "DEPLOY SUCCESS" or just the terminating "\n". It was the second
> case.
> To be honest, I don't understand why it isn't received. Every println is
> terminated not necessarily by a "\n" but by the value of the system property
> "line.separator". On Linux this should be the newline character, so
> everything should be fine.
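> If you want to double-check that on the affected system, a plain Java
> one-off like this (independent of the driver) prints the separator with
> the escapes made visible:
>
>     public class ShowSeparator {
>         public static void main(String[] args) {
>             String sep = System.getProperty("line.separator");
>             // Shows e.g. "\n" on Linux or "\r\n" on Windows
>             System.out.println(sep.replace("\r", "\\r").replace("\n", "\\n"));
>         }
>     }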
> Nevertheless I encourage you to use my bug fix: append a "\n" to every
> message sent by the "OneVmmVmware" Java class (method send_message). That way
> your driver will be OS-independent and will also work, e.g., under Mac OS.
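> A minimal sketch of the fix (illustrative only; the real send_message in
> the driver may look different):
>
>     // Terminate every outgoing driver message with an explicit "\n" and
>     // flush, instead of relying on println() and the platform separator.
>     private void send_message(String message) {
>         System.out.print(message + "\n"); // explicit newline on every OS
>         System.out.flush();               // make sure oned sees it immediately
>     }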
> Thank you for all your help, you are doing a great job ;-)
> Best regards
> Stefan
> On 22.02.2010 at 15:52, Tino Vazquez wrote:
>
> Hi Stefan,
>
> Sorry for the delay, we are really busy right now.
>
> The machines in the "boot" state are probably due to some missing
> "DEPLOY SUCCESS" messages that "oned" should receive from the drivers
> and is not getting. This may happen for a variety of reasons, one can
> be that the drivers crashes (do you see anything in the oned.log?).
>
> One possible experiment could be to log all incoming and outgoing
> messages in the MAD, and try to reproduce the problem. Then, for the
> machines in pending state, we can look to the DEPLOY SUCCESS messages
> to see if they got one.
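> A sketch of what that logging could look like in the Java MAD (the class,
> path, and call sites below are assumptions, not existing driver code); it
> would be called, e.g., as MessageLog.log("IN", line) after every read and
> MessageLog.log("OUT", line) before every write:
>
>     import java.io.FileWriter;
>     import java.io.IOException;
>
>     // Append every message crossing the driver's stdin/stdout to a file so
>     // a missing DEPLOY SUCCESS (or a missing "\n") can be spotted afterwards.
>     class MessageLog {
>         static final String LOG_PATH = "/tmp/one_vmm_vmware_messages.log";
>
>         static synchronized void log(String direction, String line) {
>             try {
>                 FileWriter fw = new FileWriter(LOG_PATH, true); // append mode
>                 fw.write("[" + direction + "] " + line + "\n");
>                 fw.close();
>             } catch (IOException e) {
>                 // best effort: never let logging break the protocol loop
>             }
>         }
>     }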
>
> Hope it helps, thanks again for the feedback,
>
> -Tino
>
> --
> Constantino Vázquez, Grid & Virtualization Technology
> Engineer/Researcher: http://www.dsa-research.org/tinova
> DSA Research Group: http://dsa-research.org
> Globus GridWay Metascheduler: http://www.GridWay.org
> OpenNebula Virtual Infrastructure Engine: http://www.OpenNebula.org
>
>
>
> On Thu, Feb 11, 2010 at 8:06 PM, Stefan Reichel
> <Stefan.Reichel at student.hpi.uni-potsdam.de> wrote:
>
> Hi Tino,
>
> I am glad to hear that you are still investigating our problem, thanks.
>
> What we try is easy to sum up. We make sure that the VM list is empty.
> Afterwards we do a "onevm create" with our virtual machine templates. The
> machines are therefore "new" for OpenNebula and get new IDs. Then we can see
> that VMware Server starts to power on the VM. Nevertheless, "onevm list"
> often, though not always, shows the wrong old "booting" state.
>
> We also know that the Java VMware driver seems to work correctly in 100% of
> the cases. This was verified by putting additional System.out.println calls in
> the code and redirecting the "disabled" standard output from /dev/null to a
> text file.
>
> Could this whole issue be related to a performance problem? I mean, during
> the deployment phase the CPU is under heavy use, because our frontend and
> VMware Server are on the same test machine. Can this confuse Ruby or
> oned? I don't know what to test, so I can't provide you further information.
>
> Best regards
>
> Stefan
>
> On 11.02.2010 at 16:22, Tino Vazquez wrote:
>
> Hi Stefan,
>
> I think I am a bit lost now; let's see if you can get me back on
> track. From what I gather, the VMware drivers are working as expected,
> and the VMs seen in boot state are the ones that were previously
> registered and not unregistered, due to a ONE shutdown before this
> could happen.
>
> If this is not correct, could you please elaborate a bit more on this?
>
> Regards,
>
> -Tino
>
> --
> Constantino Vázquez, Grid & Virtualization Technology
> Engineer/Researcher: http://www.dsa-research.org/tinova
> DSA Research Group: http://dsa-research.org
> Globus GridWay Metascheduler: http://www.GridWay.org
> OpenNebula Virtual Infrastructure Engine: http://www.OpenNebula.org
>
>
>
> On Sat, Feb 6, 2010 at 2:59 PM, Stefan Reichel
> <stefan.reichel at student.hpi.uni-potsdam.de> wrote:
>
> Hello Tino,
>
> is there any new information available about the described bug? Or should I
> run some further tests?
>
> Regards and have a nice weekend
>
> Stefan
>
> On 02.02.2010 at 01:04, Stefan Reichel wrote:
>
> Hi Tino,
>
> you are right, the output was collected while the machines were still
> running. The behavior of the vmm_vmware driver was correct. I also sent you
> a mail on Sunday with additional output, which also underlines that the Java
> part of the driver works as expected. Therefore I assume the problem must be
> somewhere in the other parts of the driver.
>
> Best regards
>
> Stefan
>
>
> On 01.02.2010 at 12:55, Tino Vazquez wrote:
>
> Hi Stefan,
>
> From what I read in the Java stack trace, the machine is already
> powered on; that is why it is failing. This may happen when you kill
> OpenNebula (without letting it shut down the VMs), clear its DB and
> try to submit VMs again: the names will clash (if one-1 is still
> running, a new deployment of a one-1 will fail).
>
> If this is not the case, please let me know and we will look at something
> else.
>
> Regards,
>
> -Tino
>
> --
> Constantino Vázquez, Grid & Virtualization Technology
> Engineer/Researcher: http://www.dsa-research.org/tinova
> DSA Research Group: http://dsa-research.org
> Globus GridWay Metascheduler: http://www.GridWay.org
> OpenNebula Virtual Infrastructure Engine: http://www.OpenNebula.org
>
>
>
> On Fri, Jan 29, 2010 at 2:18 AM, Stefan Reichel
> <Stefan.Reichel at student.hpi.uni-potsdam.de> wrote:
>
> Hi Tino,
>
> I tried your command with 5 parameters; I think you missed the checkpoint.
> The result is easy to describe: there is nothing. The script itself hangs
> and the log files don't contain any failure or success.
>
> Therefore I tried the Java class directly by calling:
>
> java -Ddebug=1 OneVmmVmware --username oneadmin --password xxxx --ignorecert
>
> and pasting:
>
> DEPLOY 1 fqdn /usr/share/one/var/1/deployment.0 CP1
>
> The script itself seems to work because of the error I get (at the end of
> this document). I also once got another error which was connected to the
> network, but I think this was caused by a network misconfiguration. Nevertheless
> I also included that log and the network file. The oned.log is also quite
> useless: after the "prolog success" message the monitoring begins, no deploy
> success at all. I also once saw a line "failure: " after it, without any
> reason. Perhaps this is connected to the Java output below, because there is
> also no reason given. In that case it would probably be caused by a race
> condition, which would also explain why it only happens sometimes.
>
> I hope the output and descriptions give you an indication of how to find the
> reason for our problem.
>
> Best regards
>
> Stefan
>
>
>
>
>
>
> Output of java -Ddebug=1 OneVmmVmware --username oneadmin --password xxxx --ignorecert:
>
> DEPLOY 1 fqdn /usr/share/one/var/1/deployment.0 CP1
> DEPLOY FAILURE 1 Failed deploying VM in host fqdn.
> [29.01.2010 01:52:17] Failed deploying VM 1 into fqdn.Reason: null
> ---- Debug stack trace ----
> AxisFault
> faultCode: ServerFaultCode
> faultSubcode:
> faultString: The attempted operation cannot be performed in the current state (Powered On).
> faultActor:
> faultNode:
> faultDetail:
> {urn:vim2}InvalidPowerStateFault:<requestedState>poweredOn</requestedState><existingState>poweredOn</existingState>
>
> The attempted operation cannot be performed in the current state (Powered On).
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
> at java.lang.Class.newInstance0(Class.java:372)
> at java.lang.Class.newInstance(Class.java:325)
> at org.apache.axis.encoding.ser.BeanDeserializer.<init>(BeanDeserializer.java:104)
> at org.apache.axis.encoding.ser.BeanDeserializer.<init>(BeanDeserializer.java:90)
> at com.vmware.vim.InvalidPowerState.getDeserializer(InvalidPowerState.java:156)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:616)
> ......
> at org.apache.axis.client.AxisClient.invoke(AxisClient.java:206)
> at org.apache.axis.client.Call.invokeEngine(Call.java:2784)
> at org.apache.axis.client.Call.invoke(Call.java:2767)
> at org.apache.axis.client.Call.invoke(Call.java:2443)
> at org.apache.axis.client.Call.invoke(Call.java:2366)
> at org.apache.axis.client.Call.invoke(Call.java:1812)
> at com.vmware.vim.VimBindingStub.powerOnVM_Task(VimBindingStub.java:24320)
> at OperationsOverVM.powerOn(OperationsOverVM.java:82)
> at OneVmmVmware.loop(OneVmmVmware.java:204)
> at OneVmmVmware.main(OneVmmVmware.java:57)
> [29.01.2010 01:52:17] ---------------------------
>
>
>
>
>
> Output of java -Ddebug=1 OneVmmVmware .... based on network misconfiguration?
>
> DEPLOY FAILURE 0 Failed deploying VM in host fqdn.
> [29.01.2010 01:10:43] Failed deploying VM 0 into fqdn.Reason: null
> ---- Debug stack trace ----
> java.lang.NullPointerException
> at DeployVM.configureNetwork(DeployVM.java:268)
> at DeployVM.shapeVM(DeployVM.java:220)
> at OneVmmVmware.loop(OneVmmVmware.java:168)
> at OneVmmVmware.main(OneVmmVmware.java:57)
> [29.01.2010 01:10:43] ---------------------------
>
>
>
> Old network config:
>
> NAME = "VMWareNet"
> TYPE = RANGED
> BRIDGE = NAT
> NETWORK_ADDRESS = 192.168.189.200
> NETWORK_SIZE = 254
>
>
>
> Oned.log (extract):
>
> Fri Jan 29 00:55:28 2010 [TM][D]: Message received: TRANSFER SUCCESS 0 -
> Fri Jan 29 00:55:28 2010 [LCM][I]: prolog success:
> Fri Jan 29 00:55:39 2010 [VMM][I]: Recovering VMM drivers
> Fri Jan 29 00:56:03 2010 [ReM][D]: VirtualMachinePoolInfo method invoked
> Fri Jan 29 00:56:06 2010 [ReM][D]: VirtualMachinePoolInfo method invoked
> Fri Jan 29 00:56:51 2010 [VMM][I]: Monitoring VM 86.
> Fri Jan 29 00:56:54 2010 [InM][I]: Monitoring host fqdn (0)
> Fri Jan 29 00:56:57 2010 [InM][D]: Host 0 successfully monitored.
> Fri Jan 29 00:56:57 2010 [ReM][D]: VirtualMachinePoolInfo method invoked
> Fri Jan 29 00:57:01 2010 [VMM][D]: Message received: POLL SUCCESS 0 STATE=a
>
> On 28.01.2010 at 19:37, Tino Vazquez wrote:
>
> Hi Stefan,
>
> Let's try executing the driver by hand. The VMM driver talks with the
> OpenNebula core using an ASCII protocol. So, if you execute the
> driver:
>
> $ONE_LOCATION/lib/mads/one_vmm_vmware
>
> and hit enter, it should wait for input on the standard input, and you
> will need to type:
>
> ---8<----
> DEPLOY 0 fqdn var/77/images/deployment.0
> --->8----
>
> assuming that $ONE_LOCATION/var/77 exists (i.e. a previous attempt to
> run a VM with OpenNebula ID 77 has been made; it didn't have to
> succeed).
>
> The answer we are waiting for is a DEPLOY 0 SUCCESS; this is what the
> OpenNebula core seems to be not getting.
>
> Your use case is very interesting, we are happy to help. OpenNebula
> doesn't feature a web GUI per se, but we offer a REST interface using
> EC2 or OCCI, over which an AJAX application can be built.
>
> Best regards,
>
> -Tino
>
> --
> Constantino Vázquez, Grid & Virtualization Technology
> Engineer/Researcher: http://www.dsa-research.org/tinova
> DSA Research Group: http://dsa-research.org
> Globus GridWay Metascheduler: http://www.GridWay.org
> OpenNebula Virtual Infrastructure Engine: http://www.OpenNebula.org
>
>
>
> On Thu, Jan 28, 2010 at 1:05 AM, Stefan Reichel
> <Stefan.Reichel at student.hpi.uni-potsdam.de> wrote:
>
> Hi Tino,
>
> I just wanted to write that everything is fine, but it isn't. The problem
> only occurs sometimes. At the end of this mail you will find some logs.
> VM(77) is up and running in the VM Server, but at the same time in "boot"
> state. By the way, we currently use VMware-server-2.0.2-203138.i386.
>
> As you can see in the logs, the "DEPLOY SUCCESS" message is sent, at least
> according to OneVmmVmware.java. But it seems that it is never received.
> Sometimes it works, but this is not deterministic; in that case it would
> also appear in oned.log.
>
> The main goal of our project is to set up a network environment in which
> students can test and investigate several security weaknesses on live
> systems. For every such scenario we need different computers, which are
> simulated via VMs. In effect, when a new scenario is loaded, OpenNebula will
> be responsible for the VM setup and management.
> We will primarily use VMware images (with VM Server) but also KVM. To control
> our scenarios we will implement a web interface, which will be used for
> management but also monitoring purposes. As far as I know, OpenNebula has
> only a command line frontend?
>
> Best regards
>
> Stefan
>
>
> VM.LOG:
>
> Wed Jan 27 23:57:41 2010 [DiM][I]: New VM state is ACTIVE.
> Wed Jan 27 23:57:42 2010 [LCM][I]: New VM state is PROLOG.
> Wed Jan 27 23:57:42 2010 [VM][I]: Virtual Machine has no context
> Wed Jan 27 23:58:46 2010 [TM][I]: tm_clone.sh: fqdn:/srv/seclab/images-src/vmware/XP2 fqdn:/srv/seclab/vms/77/images/disk.0
> Wed Jan 27 23:58:46 2010 [TM][I]: tm_clone.sh: Cloning fqdn:/srv/seclab/images-src/vmware/XP2
> Wed Jan 27 23:58:46 2010 [LCM][I]: New VM state is BOOT
> Wed Jan 27 23:58:46 2010 [VMM][I]: Generating deployment file: /usr/share/one/var/77/deployment.0
>
> ONED.LOG:
>
> Wed Jan 27 23:57:41 2010 [DiM][D]: Deploying VM 77
> Wed Jan 27 23:57:44 2010 [InM][I]: Monitoring host fqdn (0)
> Wed Jan 27 23:57:48 2010 [InM][D]: Host 0 successfully monitored.
> Wed Jan 27 23:57:52 2010 [VMM][I]: Monitoring VM 76.
> Wed Jan 27 23:57:53 2010 [VMM][D]: Message received: POLL SUCCESS 76 STATE=a USEDMEMORY=25 USEDCPU=0
> Wed Jan 27 23:58:46 2010 [TM][D]: Message received: LOG - 77 tm_clone.sh: fqdn:/srv/seclab/images-src/vmware/XP2 fqdn:/srv/seclab/vms/77/images/disk.0
> Wed Jan 27 23:58:46 2010 [TM][D]: Message received: LOG - 77 tm_clone.sh: Cloning fqdn:/srv/seclab/images-src/vmware/XP2
> Wed Jan 27 23:58:46 2010 [TM][D]: Message received: TRANSFER SUCCESS 77 -
> Wed Jan 27 23:58:46 2010 [LCM][I]: prolog success:
>
> VMM_VMWARE.LOG (output added to OneVmmVmware.java):
>
> [27.01.2010 23:41:40] TRY TO POWER ON
> [27.01.2010 23:41:45] DEPLOY SUCCESS
> [27.01.2010 23:58:50] TRY TO POWER ON
> [27.01.2010 23:58:54] DEPLOY SUCCESS
>
> On 27.01.2010 at 13:04, Tino Vazquez wrote:
>
> Hi Stefan,
>
> Comments inline.
>
> On Wed, Jan 27, 2010 at 10:27 AM, Stefan Reichel
> <stefan.reichel at student.hpi.uni-potsdam.de> wrote:
>
> Hi,
>
> I tried to analyze the bug and finally solve this problem. For now these are
> my results; please correct me if I am wrong.
>
> First of all, the VM is running in VMware but in the "onevm list" it is
> still booting. The VirtualMachineManager::deploy_action was finished. Those
> were the facts; now my theory:
>
> Normally the MadManager will receive a message and forward it to the
> corresponding VM driver by calling the protocol method. In my case this
> would be the VirtualMachineManagerDriver. Nevertheless its protocol method
> is not called and therefore it can't call the LifeCycleManager, which would
> in effect set the "running" state after reacting to a "DEPLOY_SUCCESS".
>
> Therefore I assume that the corresponding message is never sent. But who
> should send it?
>
> The VMware VMM MAD is responsible for sending back the DEPLOY SUCCESS, so
> it is probably failing to do so. You mentioned in a previous email
> that the VM is already running, so I guess the driver is crashing
> badly after performing the powerOn (otherwise it would send the "DEPLOY
> FAILED" and you would get a "fail" instead of a "boot" in the VM
> state). Do you see anything in the one_vmm_vmware log file?
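>
> In other words, whatever happens around the powerOn call, the driver should
> always answer oned with either a success or a failure line. A rough sketch
> (the names follow the stack traces above, but the signatures and variables
> here are assumptions, not the actual driver code):
>
>     try {
>         operations.powerOn(vmName);   // may throw, e.g. on InvalidPowerState
>         send_message("DEPLOY SUCCESS " + vmId);
>     } catch (Exception e) {
>         // Never die silently: report the failure so the VM ends in "fail"
>         // instead of staying stuck in "boot".
>         send_message("DEPLOY FAILURE " + vmId + " Failed deploying VM in host " + host + ".");
>     }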
>
>
> I temporarily fixed the problem by setting the running state manually in the
> mentioned deploy_action. I hope that someone will finally answer one of my
> messages. Indeed, it's my third unanswered(?) message to this community.
>
> Sorry for the delay in my answers; please take into account that this
> is a best-effort support mailing list.
>
>
> We try to use OpenNebula in our current project, but the focus of our
> project is not to get software to do what it is supposed to do. Nevertheless we
> are software developers, and therefore we could also fix and extend the
> OpenNebula project if there were some support from you.
>
> That is great news! Could you please elaborate a bit on what it is that
> you intend to do with the software? We are happy to provide best-effort
> support.
>
> Best regards,
>
> -Tino
>
>
> Kind regards,
>
> Stefan
>
>
>
>
> On 26.01.2010 at 21:41, Stefan Reichel wrote:
>
> Hi OpenNebula team,
>
> our developer team tried to use OpenNebula and now we are able to start VMs
> in VMware. But we have a serious problem: every VM we start stays in the
> "boot" state. What is the reason for that, and where can we gather more
> information about the problem? We use OpenNebula 1.4 / the SVN version on
> Ubuntu 9.10 in combination with VMware Server 2. Any help would be appreciated.
>
> Sincerely
>
> Stefan
>
>
>
>
> _______________________________________________
> Users mailing list
> Users at lists.opennebula.org
> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>
>
>
>
>
> --
> Constantino Vázquez, Grid & Virtualization Technology
> Engineer/Researcher: http://www.dsa-research.org/tinova
> DSA Research Group: http://dsa-research.org
> Globus GridWay Metascheduler: http://www.GridWay.org
> OpenNebula Virtual Infrastructure Engine: http://www.OpenNebula.org
>
> _____________________________________
> Stefan Reichel, M.Sc. Candidate
> Hasso-Plattner-Institut für Softwaresystemtechnik GmbH
> Postfach 900460, D-14440 Potsdam, Germany
> http://www.hpi.uni-potsdam.de
> Telefon: 03322/206306 Mobile: 0178/5495023
> Email: stefan.reichel at student.hpi.uni-potsdam.de
> _____________________________________
>
>
> _______________________________________________
> Users mailing list
> Users at lists.opennebula.org
> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>
>
More information about the Users mailing list