Thursday, February 7, 2013

Working with Hadoop on a Client Machine

After getting the VMs configured and running, I started mucking around with running my very own piece of code, which was just a copy of the WordCount example. My goal was to run it from a client machine rather than on the name node server itself.

So how the heck do I do that??

Well, the first thing I did was take a copy of the name node server's installation of Hadoop. That way I was sure I had the same version. Then I modified the contents of the hadoop-env.sh file (in conf) so that it reflected my local machine's settings.

I'm using a MacBook Air, and I have Java installed in the directory /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/
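For reference, the change to hadoop-env.sh boils down to something like the line below. This is a minimal sketch that assumes JAVA_HOME is the only setting that differs between the name node and the client; the path is the one from my machine, so yours will probably be different.

```shell
# conf/hadoop-env.sh (the client's copy of the name node's installation)
# Assumption: JAVA_HOME is the only machine-specific setting that needs changing.
# Point Hadoop at the local JDK; this path is specific to my MacBook Air.
export JAVA_HOME=/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/
```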


After carefully configuring everything, I verified it installed correctly by typing:
bin/hadoop fs -ls / 
This returned a listing of my HDFS file system, which was pretty sweet. I did have one minor hiccup: on each VM I had used /etc/hosts to hard-code the IP address of every machine, but my Mac didn't know how to resolve those host names. Adding the same entries to the Mac's /etc/hosts file was good enough. So far I have only added the name node to the list.
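The /etc/hosts fix looks roughly like this. The IP address and host name below are made-up placeholders (I'm not listing my real ones here), so swap in whatever your name node actually uses.

```shell
# Hypothetical values: replace with your name node's real IP address and host name.
NAMENODE_IP="192.168.56.101"
NAMENODE_HOST="namenode"

# Build the line that has to exist in /etc/hosts on the client machine.
HOSTS_LINE="$NAMENODE_IP    $NAMENODE_HOST"
echo "$HOSTS_LINE"

# To actually append it (needs admin rights), something like:
# echo "$HOSTS_LINE" | sudo tee -a /etc/hosts
```

The host name has to match the one the cluster's configuration files use for the name node, otherwise the client still won't find it.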


Next I tried running my Java application to count words (a simple start, much like a hello world app), and all I got was a pile of exceptions:
bin/hadoop jar Hadoop-WordCount.jar net.victa.hadoop.example.WordCount /experiments /experiments-output
Exception in thread "main" org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=<user>, access=WRITE, inode="mapred":hadoop-user:supergroup:rwxr-xr-x
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
The easiest solution I found is to disable HDFS permission checking altogether (see this post). Then you can upload whatever files you want and access whatever you want.

I modified the file hdfs-site.xml on the name node server, adding the property dfs.permissions and setting it to false.

hdfs-site.xml:
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
Obviously this is a bad thing to do in production. But since this is just a test and I really wanted to run my little program, I took the path of least resistance :)
