Hadoop can't recognize Mahout library
I am trying to run the example at
http://chimpler.wordpress.com/2013/06/24/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages-part-2-distribute-classification-with-hadoop/comment-page-1/#comment-693
but it looks like my Hadoop installation isn't picking up external
libraries, in particular Mahout, which the example needs in order to run.
This is the error message I am seeing and have not been able to fix:
13/09/07 20:59:07 INFO mapred.JobClient: Task Id : attempt_201309071836_0006_m_000000_0, Status : FAILED
Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at MapReduceClassifier$ClassifierMap.initClassifier(MapReduceClassifier.java:39)
at MapReduceClassifier$ClassifierMap.setup(MapReduceClassifier.java:31)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
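The trace above fails inside the map task (MapReduceClassifier$ClassifierMap.setup, reached from org.apache.hadoop.mapred.Child.main), so the class is missing from the classpath of the JVM that runs the task. One way to narrow this down is to check which classes a given JVM can actually load. The following is only a minimal sketch: ClassVisibilityCheck and its methods are hypothetical helper names, not part of Hadoop or Mahout, and the two class names are the ones mentioned in this post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical diagnostic helper (not part of Hadoop or Mahout).
public class ClassVisibilityCheck {

    // Returns true if the named class can be found through the given loader.
    // initialize=false means static initializers are not run by the check.
    public static boolean isVisible(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Returns the subset of classNames that the loader cannot find.
    public static List<String> missing(List<String> classNames, ClassLoader loader) {
        List<String> result = new ArrayList<>();
        for (String name : classNames) {
            if (!isVisible(name, loader)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ClassLoader loader = ClassVisibilityCheck.class.getClassLoader();
        // Class names taken from the errors described in this post.
        List<String> wanted = Arrays.asList(
                "org.apache.mahout.math.Vector",
                "com.google.common.collect.Multiset");
        System.out.println("Not visible to this JVM: " + missing(wanted, loader));
    }
}
```

Running main with the same classpath as the 'hadoop jar' command shows what the launching JVM sees. Calling missing(...) from the mapper's setup method and logging the result would show what the task JVM sees, which is where the exception is raised.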
Here are a few things I have tried so far, none of which helped:
- Added all the .jar files inside my 'mahout-distribution-0.7' folder to
HADOOP_CLASSPATH, following
http://mail-archives.apache.org/mod_mbox/mahout-user/201103.mbox/%3C2658E54B540D284981EA57E6A549EA70A3A977EE30@INBLRK77M1MSX.in002.siemens.net%3E
- Ran 'mvn package' in the 'mahout-distribution-0.7' folder, as suggested by
someone on "Error while clustering data with kmeans". It finished cleanly
(it took about an hour, but the final result showed 'BUILD SUCCESSFUL').
- Looked up "How do I build/run this simple Mahout program without getting
exceptions?", but Mahout's math libraries are already present in the
pom.xml under the 'mahout-distribution-0.7' folder.
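For reference, the first attempt above (putting every Mahout jar on HADOOP_CLASSPATH) can be sketched as a small program that collects the jars and prints them in two forms: joined with the platform path separator for HADOOP_CLASSPATH, and comma-separated as expected by Hadoop's generic -libjars option. This is a sketch under assumptions: JarListBuilder is a hypothetical helper name, the default folder name is the one from this post, and -libjars only takes effect when the job driver parses generic options (for example through ToolRunner), which may or may not be the case for MapReduceClassifier.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of Hadoop or Mahout).
public class JarListBuilder {

    // Collects every *.jar file under root, recursively, in sorted order.
    public static List<String> findJars(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .map(Path::toString)
                    .filter(p -> p.endsWith(".jar"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Joins the jar paths with the given separator.
    public static String join(List<String> jars, String separator) {
        return String.join(separator, jars);
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: the distribution folder named in this post.
        Path root = Paths.get(args.length > 0 ? args[0] : "mahout-distribution-0.7");
        List<String> jars = findJars(root);
        // Form used for the HADOOP_CLASSPATH environment variable.
        System.out.println("HADOOP_CLASSPATH=" + join(jars, File.pathSeparator));
        // Form used for the -libjars generic option (comma-separated).
        System.out.println("-libjars " + join(jars, ","));
    }
}
```

HADOOP_CLASSPATH is read by the bin/hadoop script for the JVM it launches. Whether the task JVMs on the cluster also see those jars depends on how the cluster is set up, which is why the second form is printed as well.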
One thing to note: initially my 'hadoop jar xxx' command failed because
it couldn't find Multiset (com.google.common.collect.Multiset). I
tweaked the Classifier.java code to use a HashMap instead of Multiset, which
bypassed that error. But now, looking at the code, I need Hadoop to
recognize the Vector class for the program to run successfully.
Can someone please explain how I can make Hadoop recognize the Mahout
library and fix the above error?
I am using 'Hadoop 0.20.2' and 'mahout-distribution-0.7'.
Thanks in advance