[one-users] Hadoop on OpenNebula

Tino Vazquez tinova at fdi.ucm.es
Wed Nov 11 10:02:13 PST 2009


Hi Evert,

We are not Hadoop experts, so we would appreciate it if you could redirect
the Hadoop file-copying problem to the relevant mailing list.

Regarding running Hadoop in a cloud: OpenNebula can provide your Hadoop VMs
with contextualization information. This means each VM can have its hostname
and IP address set at boot, and other VMs can read that information. So, for
instance, your master node can be given an IP address (or you can fix this
IP in advance), and the worker nodes can then read the master's IP and/or
hostname from an ISO attached to each of them. More information on
contextualization in OpenNebula can be found in [1]. I think you have a very
interesting use case, and we are happy to provide support for any
contextualization-related issue.
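
As a rough sketch (attribute names, paths and the init script below are only
illustrative, not something fixed by OpenNebula), the VM template could pass
the master's address through its CONTEXT section:

    CONTEXT = [
      hostname  = "$NAME",
      master_ip = "XXX.XXX.X.XXX",
      files     = "/srv/cloud/context/init.sh" ]

Inside the VM, OpenNebula attaches this data as an ISO containing a
context.sh with those variables, so a boot script along these lines could
configure each Hadoop node (device name, variable names and Hadoop paths are
assumptions about your images):

    #!/bin/bash
    # Mount the context ISO (device may be /dev/hdc, /dev/sr0, ... in your image)
    mount -t iso9660 /dev/hdc /mnt
    # Load the variables defined in the CONTEXT section of the VM template
    . /mnt/context.sh
    # Set the hostname handed in by OpenNebula
    hostname "$HOSTNAME"
    echo "$HOSTNAME" > /etc/hostname
    # Point the Hadoop configuration at the master, e.g. by replacing a
    # placeholder in core-site.xml and mapred-site.xml
    sed -i "s/MASTER_IP/$MASTER_IP/g" /home/cloud/hadoop/conf/core-site.xml
    sed -i "s/MASTER_IP/$MASTER_IP/g" /home/cloud/hadoop/conf/mapred-site.xml
    umount /mnt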

Hope it helps,

-Tino

[1] http://opennebula.org/doku.php?id=documentation:rel1.4:cong


--
Constantino Vázquez, Grid Technology Engineer/Researcher:
http://www.dsa-research.org/tinova
DSA Research Group: http://dsa-research.org
Globus GridWay Metascheduler: http://www.GridWay.org
OpenNebula Virtual Infrastructure Engine: http://www.OpenNebula.org


On Wed, Nov 11, 2009 at 11:42 AM, Evert Lammerts <Evert.Lammerts at sara.nl> wrote:

>  Hi list,
>
>
>
> We have a cluster with 44 cores on which we are evaluating OpenNebula. All
> nodes run 64-bit Ubuntu Server. I’m trying to get Hadoop 0.20.1 to work, so
> I’m following the cluster setup steps on hadoop.apache.org.
>
>
>
> As a start I created three VMs running Ubuntu 9.10 32-bit desktop edition.
> After installing Sun’s 1.6 JRE I put Hadoop into my homedir. I configured
> the three Hadoop installations as follows:
>
>
>
> == conf/hadoop-env.sh ==
>
> Set JAVA_HOME to the appropriate directory
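>
> (With the Sun JRE on Ubuntu this is usually a single line; the exact path
> below is an assumption and may differ on your images:)
>
>   export JAVA_HOME=/usr/lib/jvm/java-6-sun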
>
>
>
> == conf/core-site.xml ==
>
> Set fs.default.name to the IP address of the designated namenode, on port
> 9000:
>
> <property>
>   <name>fs.default.name</name>
>   <value>hdfs://XXX.XXX.X.XXX:9000/</value>
> </property>
>
>
>
> == conf/hdfs-site.xml ==
>
> Set dfs.name.dir to a directory in my homedir:
>
> <property>
>   <name>dfs.name.dir</name>
>   <value>/home/cloud/var/log/hadoop/</value>
> </property>
>
>
>
> == conf/mapred-site.xml ==
>
> Set mapred.job.tracker to the IP address of the designated jobtracker, on
> port 9001:
>
> <property>
>   <name>mapred.job.tracker</name>
>   <value>XXX.XXX.X.XXX:9001</value>
> </property>
>
>
>
> Apart from the Hadoop configuration, I manually set the hostnames of the
> namenode, jobtracker and slave (datanode & tasktracker) to hadoop-namenode,
> hadoop-jobtracker and hadoop-slave01 respectively.
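>
> (Concretely, that is something like the following on each VM; the /etc/hosts
> entries are illustrative placeholders:)
>
>   hostname hadoop-namenode
>   echo hadoop-namenode > /etc/hostname
>   # all nodes should be able to resolve each other, e.g. via /etc/hosts:
>   #   XXX.XXX.X.XXX  hadoop-namenode
>   #   XXX.XXX.X.XXX  hadoop-jobtracker
>   #   XXX.XXX.X.XXX  hadoop-slave01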
>
>
>
> I’m able to start HDFS with bin/start-dfs.sh and MapReduce with
> bin/start-mapred.sh without any exceptions. However, when I then try to copy
> files onto HDFS, I get the following exception:
>
>
>
> $ bin/hadoop fs -put conf input
>
> 09/11/11 05:24:47 WARN hdfs.DFSClient: DataStreamer Exception:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /user/cloud/input/capacity-scheduler.xml could only be replicated to 0
> nodes, instead of 1
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1267)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
>     at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>
>     at org.apache.hadoop.ipc.Client.call(Client.java:739)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>     at $Proxy0.addBlock(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>     at $Proxy0.addBlock(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2904)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2786)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
>
> 09/11/11 05:24:47 WARN hdfs.DFSClient: Error Recovery for block null bad
> datanode[0] nodes == null
> 09/11/11 05:24:47 WARN hdfs.DFSClient: Could not get block locations.
> Source file "/user/cloud/input/capacity-scheduler.xml" - Aborting...
>
> put: java.io.IOException: File /user/cloud/input/capacity-scheduler.xml
> could only be replicated to 0 nodes, instead of 1
>
>
>
> Can anybody shed light on this? I’m guessing it’s a configuration issue, so
> that’s the direction I’m looking at.
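>
> (A quick sanity check for this particular error, which generally indicates
> that the namenode sees no live datanodes, is:)
>
>   bin/hadoop dfsadmin -report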
>
>
>
> Another question I have is more general, about getting Hadoop to work on a
> cloud. The issue I foresee is with the IP addresses of the masters and
> slaves. How do I dynamically configure the Hadoop instances during start-up
> of the images so that I end up with a namenode, a jobtracker and a number of
> slaves? I’ll need the IP addresses of all machines, and all machines need a
> unique hostname… Does anybody have any experience with this?
>
>
>
> Thanks in advance!
>
>
>
> Evert Lammerts
>
> Adviseur
>
> SARA Computing & Network Services
>
> High Performance Computing & Visualization
>
> eScience Support Group
>
>
>
> Phone: +31 20 888 4101
>
> Email: evert.lammerts at sara.nl
>
>
>
> _______________________________________________
> Users mailing list
> Users at lists.opennebula.org
> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
>
>