Big Data Drupal with Cloudera, Hadoop, MapReduce, Nutch and Solr
Posted by niccolox on March 6, 2013 at 5:59pm
thanks to the recent work of the Solr Nutch sandbox project I've managed to get Nutch 1.6 jobs to run on a Cloudera CDH3 4 node cluster sending results to Solr 3.6.2 (hosted within Tomcat on Aegir BOA) and then integrated into the Apache Solr 7.1.1 module (not the dev) into search results and Apache Solr Views
I must say, I am pretty excited about Hadoop / Cloudera running Nutch and Solr and integrating with Drupal
for anyone interested in setting up a Cloudera cluster I recommend masterschema (centos) and Gregory Grubbs on YouTube (debian)
I'll post some notes etc ASAP
Read more