Rakuten tech conf

A Distributed Parallel Processing Framework
Built with JRuby
Hadoop Papyrus
and more...

2010/10/16
Rakuten Technology Conference 2010

Japan JRuby User Group / Hapyrus Inc.
藤川幸一 FUJIKAWA Koichi @fujibee

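JRuby User Group
・Founded in May 2010
・A place for JRuby users to meet, holding study sessions and similar events
・Meeting 0: founding preparation meeting
・Meeting 1: Google AppEngine with JRuby
・Meeting 2: JRuby User Group in RubyKaigi2010
・Meeting 3: <you are here>
・To take part, sign up for the mailing list (Google Group)!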
　http://groups.google.com/group/jruby-users-jp

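What is Hadoop?

A framework for parallel, distributed processing of large-scale data
An open-source clone of Google MapReduce

Needed for processing terabyte-scale data

If a standard HDD reads at 50 MB/s,
just reading 400 TB (web scale) takes about 2,000 hours

A distributed file system and a distributed processing framework are needed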
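
The read-time estimate above can be checked with a short calculation. This is a minimal sketch that assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes); the `read_hours` helper and the 1,000-disk cluster size are illustrative assumptions, not figures from the talk.

```ruby
# Back-of-the-envelope read time for one disk versus many disks in parallel.
# Decimal units are assumed: 1 TB = 10**12 bytes, 1 MB = 10**6 bytes.
BYTES_PER_TB = 10**12
BYTES_PER_MB = 10**6

# Hours needed to read total_tb terabytes at mb_per_sec per disk,
# with the data spread evenly over `disks` drives reading in parallel.
def read_hours(total_tb, mb_per_sec, disks = 1)
  seconds = (total_tb * BYTES_PER_TB).fdiv(mb_per_sec * BYTES_PER_MB * disks)
  seconds / 3600.0
end

single  = read_hours(400, 50)        # one disk, as on the slide
cluster = read_hours(400, 50, 1000)  # hypothetical 1,000-disk cluster

puts format('1 disk: %.0f hours', single)
puts format('1000 disks: %.1f hours', cluster)
```

A single disk needs roughly 2,200 hours, which matches the slide's rounded figure of 2,000 hours, while spreading the same read over many disks brings it down to a few hours. That gap is the motivation for a distributed file system.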

Hadoop Papyrus

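An open-source framework for running Hadoop jobs with a Ruby DSL

Hadoop jobs are normally written in Java

Code that is complex in Java can be written in just a few lines

Supported by the IPA Mitoh program (main track), first half of 2009

Jobs can be written and run on Hudson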

Step.1
Write jobs in Ruby instead of Java

Step.2
Simplify MapReduce with a Ruby DSL

Map Reduce Job
Description

Log Analysis
DSL

Step.3
Make Hadoop server configurations easy to use

Equivalent processing takes 70 lines in Java,
but only 10 lines with Hadoop Papyrus!

Java

    package org.apache.hadoop.examples;

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;

    public class WordCount {

      public static class TokenizerMapper extends
          Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
          }
        }
      }

      public static class IntSumReducer extends
          Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args)
            .getRemainingArgs();
        if (otherArgs.length != 2) {
          System.err.println("Usage: wordcount <in> <out>");
          System.exit(2);
        }
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Hadoop Papyrus

    dsl 'LogAnalysis'

    from 'test/in'
    to 'test/out'

    pattern /\[\[([^|\]:]+)[^\]:]*\]\]/
    column_name :link

    topic "link num", :label => 'n' do
      count_uniq column[:link]
    end
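
To illustrate what the `pattern` line in the DSL is likely to extract, here is a plain-Ruby sketch that runs without Hadoop or Papyrus. It assumes the regular expression is meant to match wiki-style links such as `[[Hadoop]]`, with the backslash escapes written out. The `count_links` helper and the sample lines are hypothetical, and the simple tally is not claimed to reproduce the exact semantics of `count_uniq`.

```ruby
# Assumed form of the slide's pattern, with regex escapes written out:
# captures the target of a wiki-style link such as [[Ruby|the Ruby language]].
LINK_PATTERN = /\[\[([^|\]:]+)[^\]:]*\]\]/

# Hypothetical helper: tally how often each link target appears in the lines.
def count_links(lines)
  counts = Hash.new(0)
  lines.each do |line|
    line.scan(LINK_PATTERN) { |(link)| counts[link] += 1 }
  end
  counts
end

sample = [
  'See [[Hadoop]] and [[Ruby|the Ruby language]].',
  '[[Hadoop]] runs MapReduce jobs; [[Category:Software]] is skipped.'
]

count_links(sample).each { |link, n| puts "#{link}\t#{n}" }
```

Links containing a colon, such as `[[Category:Software]]`, do not match, so namespace-style links are left out of the count.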

Hadoop Papyrus details
Inside the Map/Reduce processing that has to be written in Java,
Ruby scripts are called through JRuby
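
The division of labour described here can be sketched in plain Ruby. In this sketch the Ruby functions play the role of the scripts that a Java Mapper/Reducer wrapper would call through JRuby, and `run_local` is a stand-in for the Java and Hadoop side. The names `map_line`, `reduce_counts`, and `run_local` are hypothetical and are not Papyrus's actual API.

```ruby
# Ruby side: functions that a Java Mapper/Reducer wrapper would call via JRuby.
# Hypothetical names; this is not the actual Papyrus interface.
def map_line(_key, value, output)
  # Emit (word, 1) for every whitespace-separated word in the line.
  value.split.each { |word| output << [word, 1] }
end

def reduce_counts(key, values, output)
  # Sum the partial counts collected for one key.
  output << [key, values.sum]
end

# Stand-in for the Java/Hadoop side: run map, group by key (shuffle), run reduce.
def run_local(lines)
  mapped = []
  lines.each_with_index { |line, i| map_line(i, line, mapped) }

  grouped = mapped.group_by(&:first)
                  .transform_values { |pairs| pairs.map(&:last) }

  reduced = []
  grouped.sort.each { |key, values| reduce_counts(key, values, reduced) }
  reduced
end

run_local(['hello hadoop', 'hello jruby']).each { |word, n| puts "#{word}\t#{n}" }
```

In a real job the grouping step is done by Hadoop between the map and reduce phases; only the two small Ruby functions would be user-written.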

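Hadoop Papyrus details (continued)
In addition, a DSL describing the desired processing (log analysis, for
example) is prepared and made to behave differently in the Map phase and
the Reduce phase, so a single DSL description is enough to carry out the
whole MapReduce job.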
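
The idea of one DSL description behaving differently per phase can be sketched as follows. This is an illustrative toy, not the Papyrus implementation: the `PhaseDsl` class, the `count_each` DSL word, and `run_job` are all hypothetical names, and the shuffle step is simulated locally.

```ruby
# Hypothetical mini-DSL: one script, evaluated once per phase.
class PhaseDsl
  attr_reader :emitted

  def initialize(phase, key: nil, values: [], record: nil)
    @phase = phase
    @key = key
    @values = values
    @record = record
    @emitted = []
  end

  # The same DSL word does different work depending on the phase.
  def count_each(field)
    case @phase
    when :map
      # Map phase: emit (field value, 1) for the current record.
      @emitted << [@record.fetch(field), 1]
    when :reduce
      # Reduce phase: sum the partial counts collected for one key.
      @emitted << [@key, @values.sum]
    end
  end
end

# The single DSL description shared by both phases.
SCRIPT = proc { count_each :link }

def run_job(records)
  # Map: evaluate the script against every record.
  pairs = records.flat_map do |record|
    dsl = PhaseDsl.new(:map, record: record)
    dsl.instance_eval(&SCRIPT)
    dsl.emitted
  end

  # Shuffle (simulated): group by key, then evaluate the same script per key.
  pairs.group_by(&:first).sort.flat_map do |key, kvs|
    dsl = PhaseDsl.new(:reduce, key: key, values: kvs.map(&:last))
    dsl.instance_eval(&SCRIPT)
    dsl.emitted
  end
end

records = [{ link: 'Hadoop' }, { link: 'Ruby' }, { link: 'Hadoop' }]
run_job(records).each { |link, n| puts "#{link}\t#{n}" }
```

The user writes only `SCRIPT`; the surrounding framework decides which phase is active and therefore which branch of each DSL word runs.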

Hapyrus (ハピルス)
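・Hapyrus is a service for sharing and running best practices for
large-scale parallel distributed processing such as Hadoop
・Built on Amazon EC2, it lets you use Hadoop as a service
・Uses JRuby internally
– as the connection between Hadoop and Ruby (using RoR)
・Development started in October 2010 as Hapyrus Inc. and is in full swing!
・An alpha release is planned for the end of the year
Stay tuned!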

Accessing Hadoop from JRuby

[Diagram: Hadoop Client <JRuby> connects to Hadoop JobTracker <Java> over Hadoop IPC]

Object data inside Hadoop can be accessed directly!

Thank you

Twitter ID: @fujibee
