Ruby for
Distributed Storage Systems
RubyKaigi 2017: Sep 20, 2017
Satoshi Tagomori (@tagomoris)
Treasure Data, Inc.
Satoshi Tagomori (@tagomoris)
Fluentd, MessagePack-Ruby, Norikra, Woothee, ...
Treasure Data, Inc.
Ruby and Distributed Storage Systems
Ruby "for" → "and"
Distributed Storage Systems
Ruby and Performance
• Web? or Not?
• Disk & Network I/O
• "Servers spend most of their time on I/O"... is that really true?
• Storage is getting faster and faster (SSD, NVMe, ...)
• Networks too (10GbE, fast networks in the cloud, ...)
Storage Systems
• Disk I/O
• Network I/O
• Serialization / Deserialization (json, msgpack, ...)
• read/write data from/to disk
• parse/generate HTTP request/response
• Indexing (update, search)
• Timer
• Threads + Locks
Distributed Storage Systems
• Data replication
• Checksum
• Asynchronous network I/O
• Quorum
• More Threads + Locks
Replication w/ 3 replicas
• Create 3 replicas of data, including the one in local storage
Figure (write path): accept request to write data → write the data into local storage (1 replica) → send requests to replicate data → receive responses to replicate data (3 replicas) → send response to write data
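A minimal sketch of the write path in the figure, assuming in-process stand-ins for the nodes (`Node`, `replicate`, and `write_with_full_replication` are hypothetical names; real nodes would be reached over the network). The client gets its response only after every replication request has answered.

```ruby
# Hypothetical in-process stand-in for a storage node.
class Node
  attr_reader :name, :store

  def initialize(name, failing: false)
    @name = name
    @store = {}
    @failing = failing
    @mutex = Mutex.new
  end

  # Stand-in for a "replicate data" request; returns true on success.
  def replicate(key, value)
    return false if @failing
    @mutex.synchronize { @store[key] = value }
    true
  end
end

# Write locally, send replication requests to all peers in parallel, and
# respond only after every replication response has arrived.
def write_with_full_replication(local, peers, key, value)
  local_ok = local.replicate(key, value)           # (1) replica in local storage
  threads = peers.map { |peer| Thread.new { peer.replicate(key, value) } }
  results = threads.map(&:value)                   # blocks until all peers answer
  replicas = (local_ok ? 1 : 0) + results.count(true)  # (3) when all succeed
  { ok: replicas == peers.size + 1, replicas: replicas }
end

result = write_with_full_replication(Node.new("a"), [Node.new("b"), Node.new("c")], "k1", "v1")
puts result.inspect
```

The latency of every write is bounded by the slowest peer, which is what the quorum variant on the next slide tries to avoid.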
Replication in Quorum Systems: In Action
• Create at least 2 replicas of data (max 3), including the one in local storage
Figure (write path): accept request to write data, and write it locally (1 replica) → create 2 threads to send requests to replicate data → receive a successful response to replicate data (2 replicas) → send response to write data → the other request is still in flight (?) → discard the thread for the other node
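A minimal sketch of the quorum write above, assuming simulated peers (`Peer` and `quorum_write` are hypothetical). One thread per peer pushes its result into a `Queue`; the caller returns as soon as the quorum is reached and simply stops waiting for the remaining thread, which is one possible reading of "discard a thread".

```ruby
# Hypothetical peer: answers after `delay` seconds; fails when `broken` is set.
Peer = Struct.new(:name, :delay, :broken) do
  def replicate(_key, _value)
    sleep delay
    !broken
  end
end

# Quorum write: respond as soon as `quorum` replicas exist (the local one
# included), instead of waiting for every peer.
def quorum_write(peers, key, value, quorum: 2)
  replicas = 1                     # (1) assume the local write already succeeded
  results = Queue.new
  threads = peers.map do |peer|
    Thread.new do
      ok = begin
        peer.replicate(key, value)
      rescue StandardError
        false
      end
      results << ok
    end
  end

  pending = threads.size
  while replicas < quorum && pending > 0
    replicas += 1 if results.pop   # blocks until the next peer answers
    pending -= 1
  end

  # Threads that have not answered yet are "discarded": nobody waits for them
  # and they finish in the background. Thread#kill would stop them, but could
  # abort a request half-way, so it is not used here.
  { ok: replicas >= quorum, replicas: replicas }
end
```

Leaving the slow thread running keeps the sketch safe, but a real server would need to bound how many such orphaned threads can pile up.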
Bigdam
• Brand new data ingestion pipeline in Treasure Data
• Huge data
• Extraordinarily large number of connections/requests
• Many edge endpoints on the planet
Bigdam: Edge locations around the world + the central location
Bigdam components (figure: @narittan, @tagomoris, @nalsh, @k0kubun, @komamitsu_tw)
Bigdam-pool
• OSS (planned, not released yet)
• Distributed key-value storage
• for the buffer pool in Bigdam
• to build an S3-free data ingestion pipeline
Bigdam-pool: Small Buffers
• Small buffers (MBs)
• Write: append support for many small chunks (KBs)
• Read: secondary index to query/read many buffers at once
• Short buffer lifetime: minutes (create - append - read - delete)
• Buffers store ids of chunks (for deduplication)
Figure: buffers, each containing many chunks; buffers are looked up by (account_id, database, table)
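A minimal in-memory sketch of the buffer layout above, assuming hypothetical `Buffer` and `Pool` classes (this is not the bigdam-pool API). Each buffer remembers the ids of its chunks so a retried append is ignored, and a secondary index keyed by `[account_id, database, table]` lets a reader fetch all matching buffers at once.

```ruby
require "set"

class Buffer
  attr_reader :id, :key, :chunks

  def initialize(id, key)
    @id = id
    @key = key              # [account_id, database, table]
    @chunks = []            # appended chunk payloads
    @chunk_ids = Set.new    # ids already appended, for deduplication
  end

  # Append a chunk unless one with the same id is already stored
  # (e.g. a client retry). Returns true if the chunk was added.
  def append(chunk_id, data)
    return false unless @chunk_ids.add?(chunk_id)
    @chunks << data
    true
  end

  def bytesize
    @chunks.sum(&:bytesize)
  end
end

class Pool
  def initialize
    @buffers = {}                             # buffer id => Buffer
    @index = Hash.new { |h, k| h[k] = [] }    # secondary index: key => buffer ids
  end

  def create(id, account_id, database, table)
    key = [account_id, database, table]
    @buffers[id] = Buffer.new(id, key)
    @index[key] << id
    @buffers[id]
  end

  # Read every buffer for one (account_id, database, table) at once.
  def query(account_id, database, table)
    @index.fetch([account_id, database, table], []).map { |id| @buffers[id] }
  end

  # Buffers are short-lived: delete removes both the buffer and its index entry.
  def delete(id)
    buf = @buffers.delete(id)
    return unless buf
    @index[buf.key].delete(id)
  end
end
```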
Bigdam-pool: Replication
• Replication in a cluster
• without maintaining a replication factor
• Clients send requests to all living nodes
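A minimal sketch of client-side replication as described above, assuming a hypothetical `ClusterNode` whose `alive` flag would come from health checks. The client writes to every living node in parallel; since nothing re-replicates afterwards, the number of copies is simply the number of nodes that acknowledged.

```ruby
# Hypothetical cluster member.
ClusterNode = Struct.new(:name, :alive, :store) do
  def write(key, value)
    store[key] = value
    true
  end
end

# Send the same write to every living node in parallel and return the names
# of the nodes that acknowledged it. Dead nodes are skipped, not replaced.
def write_to_all_living(nodes, key, value)
  living = nodes.select(&:alive)
  threads = living.map do |node|
    Thread.new do
      begin
        node.write(key, value) ? node.name : nil
      rescue StandardError
        nil
      end
    end
  end
  threads.map(&:value).compact
end
```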
Bigdam-pool: Buffer Transferring over Clusters
Figure: edge location → central location over the Internet, using HTTPS or HTTP/2; a buffer is transferred once it is committed (by size or timeout). (written in Java)
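A minimal sketch of the "committed by size or timeout" rule, assuming a hypothetical `CommitPolicy` class with thresholds of your choosing. The clock is injectable so the rule can be tested without sleeping.

```ruby
# Hypothetical commit rule: a buffer is committed (and becomes eligible for
# transfer to the central location) when it reaches max_bytes or has been
# open for at least timeout seconds, whichever comes first.
class CommitPolicy
  def initialize(max_bytes:, timeout:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @max_bytes = max_bytes
    @timeout = timeout
    @clock = clock
  end

  # bytesize: current buffer size; opened_at: clock value at buffer creation
  def commit?(bytesize, opened_at)
    bytesize >= @max_bytes || (@clock.call - opened_at) >= @timeout
  end
end
```

A periodic timer thread could call `commit?` on every open buffer; the size check could also run on each append so large buffers do not wait for the next tick.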
Designing Bigdam
• Architecture Design - split the system into 5 microservices
• consistency, availability
• performance (how to scale it out?)
• deployment, cost
• API Design
• Mocking
• Interface Test
• Integration Test
Mocking Bigdam using Ruby
• Mocking
• build mock servers of all components
• implement all public APIs between components
• Find and add missing required parameters
• Prepare to develop components in parallel
• Mocked using Ruby and Sinatra
• public APIs - it's just a web app
• fast and easy to do :D
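A minimal sketch of a mock component, assuming a hypothetical buffer API (`POST /v1/buffers`, `GET /v1/buffers/:id` and the field names are invented). To stay dependency-free it is written as a bare Rack-style object; Sinatra apps expose the same `call(env) -> [status, headers, body]` interface, so the shape carries over.

```ruby
require "json"

class MockPoolAPI
  REQUIRED = %w[account_id database table].freeze

  def initialize
    @buffers = {}
  end

  def call(env)
    verb, path = env["REQUEST_METHOD"], env["PATH_INFO"]
    if verb == "POST" && path == "/v1/buffers"
      create(env["rack.input"].read)
    elsif verb == "GET" && (m = %r{\A/v1/buffers/([^/]+)\z}.match(path))
      buf = @buffers[m[1]]
      buf ? json(200, buf) : json(404, { "error" => "not found" })
    else
      json(404, { "error" => "no route" })
    end
  end

  private

  def create(raw)
    req = JSON.parse(raw)
    return json(400, { "error" => "body must be a JSON object" }) unless req.is_a?(Hash)
    missing = REQUIRED - req.keys   # mocks are where missing parameters show up
    return json(400, { "error" => "missing: #{missing.join(', ')}" }) unless missing.empty?
    id = "buf-#{@buffers.size + 1}"
    @buffers[id] = req
    json(201, { "id" => id })
  rescue JSON::ParserError
    json(400, { "error" => "invalid JSON" })
  end

  def json(status, body)
    [status, { "content-type" => "application/json" }, [JSON.generate(body)]]
  end
end
```

With the rack gem installed, `run MockPoolAPI.new` in a `config.ru` serves it over HTTP; the interface tests can also call `call(env)` directly.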
Interface/Integration Tests of Bigdam using Ruby
• Interface tests:
• verify all public APIs are implemented correctly
• Integration tests
• verify the whole pipeline can import data correctly
• Written in Ruby, test-unit
• less code to serialize/deserialize various req/res
• readable test cases
• fast and easy to do :D
And,
Bigdam-pool-ruby
• Port bigdam-pool from Java to Ruby
• An experiment to find out whether Ruby is good enough or not
Bigdam-pool-ruby
• Perfectly compatible with Java implementation
• Public API, Private API
• Data formats of local storage and of the secondary index
• Under development
• only supports standalone mode, for now
Studies: Serialization / Deserialization
• Every network API call requires it
• parsing HTTP request
• parsing request content body (json/msgpack)
• building response content body (json/msgpack)
• building HTTP response
• Should be parallelized on CPU cores
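A minimal sketch of spreading deserialization over CPU cores. In CRuby, threads share one global VM lock, so CPU-bound JSON parsing in threads does not run in parallel; this sketch forks worker processes and returns results through pipes (`parallel_parse` is a hypothetical helper, and `fork` is unavailable on Windows). Ractors (Ruby 3.0+, experimental) are another option.

```ruby
require "json"

# Parse JSON payloads in `workers` forked processes, preserving input order.
def parallel_parse(payloads, workers: 2)
  return [] if payloads.empty?
  slices = payloads.each_slice((payloads.size / workers.to_f).ceil).to_a
  children = slices.map do |slice|
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      writer.write(Marshal.dump(slice.map { |s| JSON.parse(s) }))
      writer.close
      exit!(0)            # skip at_exit handlers inherited from the parent
    end
    writer.close          # parent keeps only the read end
    [pid, reader]
  end
  children.flat_map do |pid, reader|
    data = reader.read    # read before waiting so a full pipe cannot block the child
    reader.close
    Process.wait(pid)
    Marshal.load(data)
  end
end
```

Copying parsed objects back through `Marshal` has its own cost, so this only pays off when parsing dominates; long-lived worker processes would avoid forking per batch.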
Studies: Asynchronous Network I/O
• EventMachine? Cool.io? Celluloid::IO?
• 🤔
• I want a library for async network I/O only! (not disk I/O, not timers)
• Event driven I/O library?
• Thread pools + callback?
• or any idea?
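A minimal sketch of the "event-driven I/O" option, restricted to readable network-style I/O as the slide asks (no disk, no timers). `TinyLoop` is hypothetical; it dispatches callbacks with `IO.select` and `IO#read_nonblock`, and pipes stand in for sockets.

```ruby
# Register IOs with callbacks and dispatch them when they become readable.
class TinyLoop
  def initialize
    @readers = {}   # io => callback
  end

  def on_readable(io, &callback)
    @readers[io] = callback
  end

  def run
    until @readers.empty?
      ready, = IO.select(@readers.keys)
      ready.each do |io|
        data = io.read_nonblock(4096, exception: false)
        if data == :wait_readable
          next                  # spurious wakeup; select again
        elsif data.nil?         # EOF: the peer closed the connection
          @readers.delete(io)
          io.close
        else
          @readers[io].call(data)
        end
      end
    end
  end
end
```

A callback that blocks stalls every other connection, which is why the "thread pools + callback" option is often combined with a loop like this.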
Threading / Timers
• ExecutorService in Java is very useful...
• Fixed / non-fixed thread pools with Queue
• (and some other executor models)
• Runner of Runnable tasks
• "Runnable task" is just like a lambda w/o args
• To be implemented as a gem?
• Queue and SizedQueue look useful for it
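A minimal sketch of a fixed thread pool in the spirit of Java's `Executors.newFixedThreadPool`, built on `SizedQueue` as the slide suggests (`FixedThreadPool` is hypothetical, not an existing gem). Tasks are lambdas without arguments, like `Runnable`; the bounded queue makes `submit` block when workers fall behind.

```ruby
class FixedThreadPool
  def initialize(size, queue_capacity: 100)
    @queue = SizedQueue.new(queue_capacity)
    @workers = Array.new(size) do
      Thread.new do
        # pop returns nil once the queue is closed and drained
        while (task = @queue.pop)
          begin
            task.call
          rescue StandardError => e
            warn "task failed: #{e.class}: #{e.message}"
          end
        end
      end
    end
  end

  # Blocks when the queue is full; raises ClosedQueueError after shutdown.
  def submit(&task)
    raise ArgumentError, "block required" unless task
    @queue.push(task)
  end

  # Stop accepting tasks, let queued ones finish, then wait for the workers.
  def shutdown
    @queue.close
    @workers.each(&:join)
  end
end
```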
Queue#peek

Get the head object w/o removing it from queue
https://github.com/ruby/ruby/pull/1698
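As far as I know, `Thread::Queue` itself has no peek method, which is what the pull request above proposes. A minimal sketch of the requested semantics, using a hypothetical `PeekableQueue` built from a `Mutex` and a `ConditionVariable`: `pop` blocks until an item arrives, while `peek` returns the head (or `nil`) without removing it.

```ruby
class PeekableQueue
  def initialize
    @items = []
    @mutex = Mutex.new
    @cond = ConditionVariable.new
  end

  def push(obj)
    @mutex.synchronize do
      @items.push(obj)
      @cond.signal          # wake one waiting pop
    end
    self
  end
  alias_method :<<, :push

  # Remove and return the head, waiting until an item is available.
  def pop
    @mutex.synchronize do
      @cond.wait(@mutex) while @items.empty?
      @items.shift
    end
  end

  # Return the head without removing it; nil when empty (never blocks).
  def peek
    @mutex.synchronize { @items.first }
  end

  def size
    @mutex.synchronize { @items.size }
  end
end
```

A typical use is a timer thread peeking at the oldest buffer to see whether its deadline has passed before deciding to pop it.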
MonitorMixin#mon_locked? and #mon_owned?
• Mutex#owned? exists
https://github.com/ruby/ruby/pull/1699
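A minimal sketch of why ownership checks are useful, using only `Mutex#owned?`, which the slide notes already exists (`Counter` and `increment_locked` are hypothetical). A helper that must run under the lock fails fast when called without it, which is the kind of check `mon_owned?` would enable for `MonitorMixin`.

```ruby
class Counter
  def initialize
    @mutex = Mutex.new
    @count = 0
  end

  def increment_twice
    @mutex.synchronize do
      increment_locked
      increment_locked
    end
  end

  def count
    @mutex.synchronize { @count }
  end

  private

  # Must be called with @mutex held; raise instead of racing silently.
  def increment_locked
    raise ThreadError, "lock not held" unless @mutex.owned?
    @count += 1
  end
end
```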
Resource Control
Make sure to release resources: try-with-resources in Java
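Ruby's closest idiom to try-with-resources is a method that yields the resource and releases it in `ensure`; `File.open` with a block already works this way. A minimal sketch with a hypothetical `Connection` resource:

```ruby
class Connection   # hypothetical resource
  attr_reader :closed

  def initialize(name)
    @name = name
    @closed = false
  end

  def close
    @closed = true
  end

  # With a block: yield the connection and always close it afterwards.
  # Without a block: return it, and the caller must close it.
  def self.open(name)
    conn = new(name)
    return conn unless block_given?
    begin
      yield conn
    ensure
      conn.close    # runs on normal return, on exceptions, and on break
    end
  end
end
```

Usage: `Connection.open("node-b") { |conn| ... }` releases the connection even if the block raises.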
Typing?
• Defining APIs
• Rubyists (including me) MAY be using: [string, integer, boolean, string, ...]
• Rubyists (including me) MAY be using: {"time": unix_time (but sometimes float)}
• Explicit type definitions do no harm when designing APIs
• JSON Schema or something similar may help us...
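A minimal sketch of an explicit field definition, assuming a hypothetical `EVENT_SCHEMA` and a hand-rolled `validate` helper (this is not a JSON Schema library). Declaring that `time` must be an Integer catches the "sometimes float" case at the API boundary.

```ruby
require "json"

# field name => allowed classes
EVENT_SCHEMA = {
  "time"    => [Integer],               # unix time in seconds; a Float is rejected
  "account" => [String],
  "enabled" => [TrueClass, FalseClass],
}.freeze

# Returns a list of error messages; empty means the payload is valid.
def validate(payload, schema)
  errors = []
  schema.each do |field, types|
    if !payload.key?(field)
      errors << "#{field}: missing"
    elsif types.none? { |t| payload[field].is_a?(t) }
      errors << "#{field}: expected #{types.join(' or ')}, got #{payload[field].class}"
    end
  end
  errors
end

payload = JSON.parse('{"time": 1505865600.5, "account": "a", "enabled": true}')
p validate(payload, EVENT_SCHEMA)
```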
Typing: in logging and others
https://bugs.ruby-lang.org/issues/13913
Process Built-in Application Servers
• Distributed Storage Systems:
• Background worker threads
• Timers
• Communication workers to other nodes
• Various async operation workers
• Public API request handlers
• Private API request handlers (inter-nodes)
• Startup/Shutdown hooks
• It's NOT just a web application, even though it handles HTTP requests
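A minimal sketch of a process that owns its own workers, timers, request handling, and startup/shutdown hooks. `StorageServer` and all its methods are hypothetical; `handle_request` is a plain method standing in for what an HTTP listener would call.

```ruby
class StorageServer
  attr_reader :flushes

  def initialize(flush_interval: 1.0)
    @flush_interval = flush_interval
    @jobs = Queue.new
    @stop = false
    @threads = []
    @on_start = []
    @on_stop = []
    @flushes = 0
  end

  def on_start(&hook)
    @on_start << hook
  end

  def on_stop(&hook)
    @on_stop << hook
  end

  def start
    @on_start.each(&:call)
    # background worker: async operations (e.g. replication) off the request path
    @threads << Thread.new do
      while (job = @jobs.pop)
        job.call
      end
    end
    # timer: periodic task, e.g. committing buffers by timeout
    @threads << Thread.new do
      until @stop
        sleep @flush_interval
        @flushes += 1 unless @stop
      end
    end
    self
  end

  # Request handler stand-in: answer now, defer slow work to the worker.
  def handle_request(payload, &deferred)
    @jobs << deferred if deferred
    { status: 202, accepted: payload }
  end

  def stop
    @stop = true
    @jobs.close             # worker exits once queued jobs are drained
    @threads.each(&:join)
    @on_stop.each(&:call)
  end
end
```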
https://github.com/tagomoris/bigdam-pool-ruby (NOT YET)
"Why Do You Want to Write Such Code in Ruby?"
"Why Do You Want to Write Such Code in Ruby?"
"Because I WANT TO DO IT!"
"Why Do You Want to Write Such Code in Ruby?"
"Because I WANT TO DO IT!"
"... And we already have Fluentd :P"
Thank you.
@tagomoris
