Distributed
Ruby and Rails
      @ihower
   http://ihower.tw
        2010/1
About Me
•           a.k.a. ihower
    • http://ihower.tw
    • http://twitter.com/ihower
    • http://github.com/ihower
• Ruby on Rails Developer since 2006
• Ruby Taiwan Community
 • http://ruby.tw
Agenda
• Distributed Ruby
• Distributed Message Queues
• Background-processing in Rails
• Message Queues for Rails
• SOA for Rails
• Distributed Filesystem
• Distributed database
1. Distributed Ruby

• DRb
• Rinda
• Starfish
• MapReduce
• MagLev VM
DRb

• Ruby's RMI (remote method invocation) system


• an object in one Ruby process can invoke
  methods on an object in another Ruby
  process on the same or a different machine
DRb (cont.)

• no defined interface, so development is faster
• tightly couples applications, because there is no
  defined API, only methods on objects
• unreliable in large-scale, heavy-load
  production environments
server example 1
require 'drb'

class HelloWorldServer

      def say_hello
          'Hello, world!'
      end

end

DRb.start_service("druby://127.0.0.1:61676", HelloWorldServer.new)
DRb.thread.join
client example 1
require 'drb'

server = DRbObject.new_with_uri("druby://127.0.0.1:61676")

puts server.say_hello
puts server.inspect

# Hello, world!
# <DRb::DRbObject:0x1003c04c8 @ref=nil, @uri="druby://127.0.0.1:61676">
example 2
# user.rb
class User

  attr_accessor :username

end
server example 2
require 'drb'
require 'user'

class UserServer

  attr_accessor :users

  def find(id)
    self.users[id-1]
  end

end

user_server = UserServer.new
user_server.users = []
5.times do |i|
  user = User.new
  user.username = i + 1
  user_server.users << user
end

DRb.start_service("druby://127.0.0.1:61676", user_server)
DRb.thread.join
client example 2
require 'drb'

user_server = DRbObject.new_with_uri("druby://127.0.0.1:61676")

user = user_server.find(2)

puts user.inspect
puts "Username: #{user.username}"
user.username = "ihower"
puts "Username: #{user.username}"
Err...

# <DRb::DRbUnknown:0x1003b8318 @name="User", @buf="\004\bo:\tUser\006:\016@usernamei\a">
# client2.rb:8: undefined method `username' for #<DRb::DRbUnknown:0x1003b8318> (NoMethodError)
Why? DRbUndumped
•   Default DRb operation

    • Pass by value
    • Must share code
• With DRbUndumped
 • Pass by reference
 • No need to share code
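The difference can be seen without a network. This is a minimal sketch, assuming two hypothetical classes (PlainUser and ReferencedUser are not from the slides): pass by value is a Marshal round trip, while DRbUndumped makes marshalling fail on purpose, which is DRb's cue to send a DRbObject proxy instead of a copy.

```ruby
require 'drb'

# Hypothetical classes, for illustration only.
class PlainUser
  attr_accessor :username
end

class ReferencedUser
  include DRb::DRbUndumped # marks instances as "do not marshal"
  attr_accessor :username
end

plain = PlainUser.new
plain.username = 'ihower'

# Pass by value: the receiver rebuilds an independent copy from bytes,
# which is why it needs the class definition on its side.
copy = Marshal.load(Marshal.dump(plain))
copy.username = 'changed'
puts plain.username # the original object is untouched

# Pass by reference: dumping a DRbUndumped object raises TypeError,
# so DRb falls back to sending a remote reference.
dump_error = begin
  Marshal.dump(ReferencedUser.new)
  nil
rescue TypeError => e
  e
end
puts dump_error.class
```

Because the copy is rebuilt on the receiving side, a client without user.rb ends up with DRbUnknown, as in the error shown earlier.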
Example 2 Fixed
# user.rb
class User

  include DRbUndumped

  attr_accessor :username

end

# <DRb::DRbObject:0x1003b84f8 @ref=2149433940, @uri="druby://127.0.0.1:61676">
# Username: 2
# Username: ihower
Why use DRbUndumped?

 • Big objects
 • Singleton objects
 • Lightweight clients
 • Rapidly changing software
ID conversion
• Converts reference into DRb object on server
 • DRbIdConv (Default)
 • TimerIdConv
 • NamedIdConv
 • GWIdConv
Beware of garbage collection
•   referenced objects may be collected on
    the server (usually doesn't matter)
•   Build your own ID converter if you want
    to control persistent state.
DRb security
require 'drb'

ro = DRbObject.new_with_uri("druby://127.0.0.1:61676")
# Undefine the local instance_eval so the call is forwarded to the server
class << ro
    undef :instance_eval
end

# !!!!!!!! WARNING !!!!!!!!! DO NOT RUN
ro.instance_eval("`rm -rf *`")

# With $SAFE=1 set on the server, the call is rejected:
# instance_eval': Insecure operation - instance_eval (SecurityError)
DRb security (cont.)

• Access Control Lists (ACLs)
 • via IP address array
 • still can run denial-of-service attack
• DRb over SSL
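An ACL can be sketched as follows, using the ACL class from drb/acl. The addresses are placeholders, not taken from the slides. Exact IPs and CIDR ranges are used here because wildcard patterns are matched against host names rather than IP addresses.

```ruby
require 'drb'
require 'drb/acl'

# Deny everyone, then allow the loopback address and one subnet.
# These addresses are placeholders; substitute your own network.
acl = ACL.new(%w[
  deny all
  allow 127.0.0.1
  allow 192.168.1.0/24
], ACL::DENY_ALLOW)

# Install before DRb.start_service so new connections are checked.
DRb.install_acl(acl)

# allow_addr? takes an address array in Socket#peeraddr form.
inside  = acl.allow_addr?(['AF_INET', 0, 'client', '192.168.1.100'])
outside = acl.allow_addr?(['AF_INET', 0, 'stranger', '10.0.0.5'])
puts "inside allowed: #{inside}, outside allowed: #{outside}"
```

The ACL only filters who may connect; an allowed client can still flood the server with requests.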
Rinda

• Rinda is a Ruby port of the Linda distributed
    computing paradigm.
•   Linda is a model of coordination and communication among several parallel processes
    operating upon objects stored in and retrieved from shared, virtual, associative memory. This
    model is implemented as a "coordination language" in which several primitives operating on
    ordered sequence of typed data objects, "tuples," are added to a sequential language, such
    as C, and a logically global associative memory, called a tuplespace, in which processes
    store and retrieve tuples. (Wikipedia)
Rinda (cont.)

• Rinda consists of:
 • a TupleSpace implementation
 • a RingServer that allows DRb services to
    automatically discover each other.
RingServer

• Hardcoding IP addresses in a DRb
  program couples applications tightly
  and makes fault tolerance difficult.
• With RingServer, a program can detect and interact with
  other services on the network without
  knowing their IP addresses.
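RingServer discovery (diagram):
1. Client @ 192.168.1.100 asks the RingServer via the UDP broadcast address: "Where is Service X?"
2. RingServer answers the client: "Service X: 192.168.1.12"
3. Client to Service X @ 192.168.1.12: "Hi, Service X @ 192.168.1.12"
4. Service X to client: "Hi There 192.168.1.100"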
ring server example
require 'rinda/ring'
require 'rinda/tuplespace'

DRb.start_service
Rinda::RingServer.new(Rinda::TupleSpace.new)
DRb.thread.join
service example
require 'rinda/ring'

class HelloWorldServer
    include DRbUndumped # Need for RingServer

      def say_hello
          'Hello, world!'
      end

end

DRb.start_service
ring_server = Rinda::RingFinger.primary
ring_server.write([:hello_world_service, :HelloWorldServer, HelloWorldServer.new,
                   'I like to say hi!'], Rinda::SimpleRenewer.new)

DRb.thread.join
client example
require 'rinda/ring'

DRb.start_service
ring_server = Rinda::RingFinger.primary

service = ring_server.read([:hello_world_service, nil,nil,nil])
server = service[2]

puts server.say_hello
puts service.inspect

# Hello, world!
# [:hello_world_service, :HelloWorldServer, #<DRb::DRbObject:0x10039b650 @uri="druby://fe80::21b:63ff:fec9:335f%en1:57416", @ref=2149388540>, "I like to say hi!"]
TupleSpaces

• Shared object space
• Atomic access
• Just like a bulletin board
• Tuple template is
  [:name, :Class, object, 'description']
5 Basic Operations

• write
• read
• take (Atomic Read+Delete)
• read_all
• notify (Callback for write/take/delete)
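The first four operations can be tried in a single process, with no DRb service or RingServer running. This is a minimal sketch; the :task tuples are invented for illustration.

```ruby
require 'rinda/tuplespace'

space = Rinda::TupleSpace.new

# write: post two tuples on the "bulletin board".
space.write([:task, 1, 'resize photo'])
space.write([:task, 2, 'send mail'])

# read: copy a matching tuple without removing it (nil is a wildcard).
peeked = space.read([:task, nil, nil])

# read_all: copy every matching tuple.
pending = space.read_all([:task, nil, nil])

# take: atomically read and delete, so only one worker gets each tuple.
taken = space.take([:task, 2, nil])
remaining = space.read_all([:task, nil, nil])

puts "peeked=#{peeked.inspect}"
puts "pending=#{pending.size} taken=#{taken.inspect} remaining=#{remaining.size}"
```

Note that read and take block when nothing matches, so workers can simply wait on a template.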
Starfish

• Starfish is a utility to make distributed
  programming ridiculously easy
• It runs both the server and the client in
  infinite loops
• MapReduce with ActiveRecord or Files
starfish foo.rb
# foo.rb

class Foo
  attr_reader :i

  def initialize
    @i = 0
  end

  def inc
    logger.info "YAY it incremented by 1 up to #{@i}"
    @i += 1
  end
end

server :log => "foo.log" do |object|
  object = Foo.new
end

client do |object|
  object.inc
end
starfish server example
   ARGV.unshift('server.rb')

   require 'rubygems'
   require 'starfish'

   class HelloWorld
     def say_hi
       'Hi There'
     end
   end

   Starfish.server = lambda do |object|
       object = HelloWorld.new
   end

   Starfish.new('hello_world').server
starfish client example
   ARGV.unshift('client.rb')

   require 'rubygems'
   require 'starfish'

   Starfish.client = lambda do |object|
     puts object.say_hi
     exit(0) # exit program immediately
   end

   Starfish.new('hello_world').client
starfish client example (another way)

       ARGV.unshift('server.rb')

       require 'rubygems'
       require 'starfish'

       catch(:halt) do
         Starfish.client = lambda do |object|
           puts object.say_hi
           throw :halt
         end

         Starfish.new('hello_world').client
       end

       puts "bye bye"
MapReduce

• introduced by Google to support
  distributed computing on large data sets on
  clusters of computers.
• inspired by map and reduce functions
  commonly used in functional programming.
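The idea fits in a few lines of plain Ruby on one machine. This is only a sketch of the map, group and reduce phases; the log lines are made up, and a real cluster would spread each phase across many workers.

```ruby
# Hypothetical input: three log lines standing in for a large data set.
lines = [
  'GET /index.html',
  'GET /about.html',
  'POST /login'
]

# Map: each line independently emits a [key, 1] pair (here, the HTTP verb).
mapped = lines.flat_map { |line| [[line.split.first, 1]] }

# Shuffle: group the emitted pairs by key.
grouped = mapped.group_by { |verb, _count| verb }

# Reduce: fold each group down to one value per key.
counts = grouped.transform_values { |pairs| pairs.sum { |_verb, count| count } }

puts counts.inspect
```

Because the map step looks at one line at a time, the input can be split among any number of clients.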
starfish server example
ARGV.unshift('server.rb')

require 'rubygems'
require 'starfish'

Starfish.server = lambda{ |map_reduce|
  map_reduce.type = File
  map_reduce.input = "/var/log/apache2/access.log"
  map_reduce.queue_size = 10
  map_reduce.lines_per_client = 5
  map_reduce.rescan_when_complete = false
}

Starfish.new('log_server').server
starfish client example
   ARGV.unshift('client.rb')

   require 'rubygems'
   require 'starfish'

   Starfish.client = lambda { |logs|
     logs.each do |log|
       puts "Processing #{log}"
       sleep(1)
     end
   }

   Starfish.new("log_server").client
Other implementations
• Skynet
 • Use TupleSpace or MySQL as message queue
 • Include an extension for ActiveRecord
 • http://skynet.rubyforge.org/
• MRToolkit based on Hadoop
 • http://code.google.com/p/mrtoolkit/
MagLev VM

• a fast, stable, Ruby implementation with
  integrated object persistence and
  distributed shared cache.
• http://maglev.gemstone.com/
• currently in public alpha
2. Distributed Message Queues

• Starling
• AMQP/RabbitMQ
• Stomp/ActiveMQ
• beanstalkd
What's a message queue?
• Client puts Message X on the Queue
• Processor checks the Queue and processes the message
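The client, queue and processor roles can be sketched inside one Ruby process with the built-in thread-safe Queue. This only illustrates the flow; the servers discussed next put the queue in a separate process so that client and processor can live on different machines. The :done marker is an assumption of this sketch.

```ruby
# Queue is Ruby's built-in thread-safe FIFO queue.
queue = Queue.new
results = []

# Processor: blocks on pop until a message arrives, stops on :done.
processor = Thread.new do
  while (message = queue.pop) != :done
    results << "processed #{message}"
  end
end

# Client: enqueues messages and moves on without waiting for the work.
3.times { |i| queue << "message #{i}" }
queue << :done

processor.join
puts results
```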
Why not DRb?

• DRb has security risks and a poorly designed API
• a distributed message queue is a great way to do
  distributed programming: reliable and scalable.
Starling
• a light-weight persistent queue server that
  speaks the Memcache protocol (mimics its
  API)
• Fast, effective, quick setup and ease of use
• Powered by EventMachine
  http://eventmachine.rubyforge.org/EventMachine.html
• Twitter's open source project; they used it
  before 2009 (they have since switched to Kestrel, a port of Starling from Ruby
  to Scala)
Starling command

• sudo gem install starling-starling
 • http://github.com/starling/starling
• sudo starling -h 192.168.1.100
• sudo starling_top -h 192.168.1.100
Starling set example
require 'rubygems'
require 'starling'

starling = Starling.new('192.168.1.4:22122')

100.times do |i|
  starling.set('my_queue', i)
end

# set appends to the queue; it does not overwrite the value as in Memcached
Starling get example
require 'rubygems'
require 'starling'

starling = Starling.new('192.168.2.4:22122')

loop do
  puts starling.get("my_queue")
end
get method
• FIFO
• After get, the object is no longer in the
  queue. You will lose the message if a processing
  error happens.
• The get method blocks until something is
  returned. It is an infinite loop.
Handle processing error exception
 require 'rubygems'
 require 'starling'

 starling = Starling.new('192.168.2.4:22122')
 results = starling.get("my_queue")

 begin
     puts results.flatten
 rescue NoMethodError => e
     puts e.message
     # results was not an array: wrap it and put it back on the queue
     starling.set("my_queue", [results])
 rescue Exception => e
     # any other failure: requeue the message, then re-raise
     starling.set("my_queue", results)
     raise e
 end
Starling cons

• Clients must poll the queue constantly
• With RabbitMQ, by contrast, you can subscribe to a queue and
  be notified when a message is available for
  processing.
AMQP/RabbitMQ
• RabbitMQ is a complete and highly reliable enterprise
  messaging system based on the emerging
  AMQP standard.
  • Written in Erlang
• http://github.com/tmm1/amqp
 • Powered by EventMachine
Stomp/ActiveMQ

• Apache ActiveMQ is the most popular and
  powerful open source messaging and
  Integration Patterns provider.
• sudo gem install stomp
• ActiveMessaging plugin for Rails
beanstalkd
• Beanstalk is a simple, fast workqueue
  service. Its interface is generic, but was
  originally designed for reducing the latency
  of page views in high-volume web
  applications by running time-consuming tasks
  asynchronously.
• http://kr.github.com/beanstalkd/
• http://beanstalk.rubyforge.org/
• Originally developed for the Causes application on Facebook
Why do we need asynchronous/
 background-processing in Rails?

• cron-like processing (computing daily statistics, creating reports,
  updating the full-text search index, etc.)
• long-running tasks (sending mail, resizing photos, encoding videos,
  generating PDFs, uploading images to S3, posting to Twitter, etc.)
 • Server traffic jam: expensive requests block
     server resources (i.e. your Rails app)
 • Bad user experience: users may reload
     again and again! (responsiveness matters)
3. Background-processing for Rails
• script/runner
• rake
• cron
• daemon
• run_later plugin
• spawn plugin
script/runner


• In Your Rails App root:
• script/runner "Worker.process"
rake

• In RAILS_ROOT/lib/tasks/dev.rake
• rake dev:process
  namespace :dev do
    task :process do
          #...
    end
  end
cron

• Cron is a time-based job scheduler in Unix-
  like computer operating systems.
• crontab -e
Whenever
          http://github.com/javan/whenever

•   A Ruby DSL for Defining Cron Jobs

• http://asciicasts.com/episodes/164-cron-in-ruby
• or http://cronedit.rubyforge.org/
          every 3.hours do
            runner "MyModel.some_process"
            rake "my:rake:task"
            command "/usr/bin/my_great_command"
          end
Daemon

• http://daemons.rubyforge.org/
• http://github.com/dougal/daemon_generator/
rufus-scheduler
   http://github.com/jmettraux/rufus-scheduler


• scheduling pieces of code (jobs)
• Not a replacement for cron/at since it runs
  inside of Ruby.
           require 'rubygems'
           require 'rufus/scheduler'

           scheduler = Rufus::Scheduler.start_new

           scheduler.every '5s' do
               puts 'check blood pressure'
           end

           scheduler.join
Daemon Kit
   http://github.com/kennethkalmer/daemon-kit



• Creating Ruby daemons by providing a
  sound application skeleton (through a
  generator), task specific generators (jabber
  bot, etc) and robust environment
  management code.
Monitor your daemon

• http://mmonit.com/monit/
• http://github.com/arya/bluepill
• http://god.rubyforge.org/
daemon_controller
http://github.com/FooBarWidget/daemon_controller




• A library for robust daemon management
• Make daemon-dependent applications Just
  Work without having to start the daemons
  manually.
off-load task via system
       command
# mailings_controller.rb
def deliver
  call_rake :send_mailing, :mailing_id => params[:id].to_i
  flash[:notice] = "Delivering mailing"
  redirect_to mailings_url
end

# controllers/application.rb
def call_rake(task, options = {})
  options[:rails_env] ||= Rails.env
  args = options.map { |n, v| "#{n.to_s.upcase}='#{v}'" }
  # redirect stdout to the log first, then point stderr at the same place
  system "/usr/bin/rake #{task} #{args.join(' ')} --trace >> #{Rails.root}/log/rake.log 2>&1 &"
end

# lib/tasks/mailer.rake
desc "Send mailing"
task :send_mailing => :environment do
  mailing = Mailing.find(ENV["MAILING_ID"])
  mailing.deliver
end

# models/mailing.rb
def deliver
  sleep 10 # placeholder for sending email
  update_attribute(:delivered_at, Time.now)
end
Simple Thread

after_filter do
    Thread.new do
        AccountMailer.deliver_signup(@user)
    end
end
run_later plugin
      http://github.com/mattmatt/run_later


• Borrowed from Merb
• Uses worker thread and a queue
• Simple solution for simple tasks
  run_later do
      AccountMailer.deliver_signup(@user)
  end
spawn plugin
http://github.com/tra/spawn


  spawn do
    logger.info("I feel sleepy...")
    sleep 11
    logger.info("Time to wake up!")
  end
spawn (cont.)
• By default, spawn uses fork to create
  child processes. You can configure it to use
  threads instead.
• Works by creating new database
  connections in ActiveRecord::Base for the
  spawned block.
• Fork needs to copy the Rails process every time
threading vs. forking
•   Forking advantages:
    •   more reliable? - the ActiveRecord code is not thread-safe.
    •   keep running - subprocess can live longer than its parent.
    •   easier - just works with Rails default settings. Threading
        requires you to set allow_concurrency=true. Also,
        beware of automatic reloading of classes in development
        mode (config.cache_classes = false).
•   Threading advantages:
    •   less filling - threads take less resources... how much less?
        it depends.
    •   debugging - you can set breakpoints in your threads
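The memory difference behind these trade-offs can be sketched in plain Ruby. This assumes a Unix-like system, since fork is not available on Windows. A thread shares the caller's variables, while a forked child works on its own copy.

```ruby
counter = 0

# Thread: shares memory with the caller, so the change is visible.
Thread.new { counter += 1 }.join
after_thread = counter

# Fork: the child gets its own copy of memory; changes stay in the child.
reader, writer = IO.pipe
pid = fork do
  reader.close
  counter += 100
  writer.puts counter # report the child's private value to the parent
  writer.close
end
writer.close
child_value = reader.read.to_i
reader.close
Process.wait(pid)

puts "after thread: #{after_thread}"
puts "inside child: #{child_value}"
puts "parent after fork: #{counter}"
```

This isolation is why a forked block needs its own database connection, and also why each fork pays for a copy of the Rails process.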
Okay, we need a reliable messaging system:
•   Persistent
•   Scheduling: not necessarily all at the same time
•   Scalability: just throw in more instances of your
    program to speed up processing
•   Loosely coupled components that merely ‘talk’
    to each other
•   Ability to easily replace Ruby with something
    else for specific tasks
•   Easy to debug and monitor
4. Message Queues (for Rails only)
• ar_mailer
• BackgrounDRb
• workling
• delayed_job
• resque
Rails only?

• Easy to use/write code
• Jobs are Ruby classes or objects
• But need to load Rails environment
ar_mailer
       http://seattlerb.rubyforge.org/ar_mailer/



• a two-phase delivery agent for ActionMailer.
 • Store messages into the database
 • Delivery by a separate process, ar_sendmail
    later.
BackgrounDRb
            http://backgroundrb.rubyforge.org/

• BackgrounDRb is a Ruby job server and
  scheduler.
• Has scalability problems (~20 servers for Mark Bates)
• Hard to know whether processing failed
• Uses the database to persist tasks
• Uses memcached to report processing results
workling
     http://github.com/purzelrakete/workling




• Gives your Rails app a simple API that you
  can use to make code run in the
  background, outside of your request.
• Supports Starling (default), BackgroundJob,
  Spawn and AMQP/RabbitMQ runners.
Workling/Starling setup
• script/plugin install git://github.com/purzelrakete/workling.git
• sudo starling -p 15151
• RAILS_ENV=production script/workling_client start
Workling example
 class EmailWorker < Workling::Base
   def deliver(options)
     user = User.find(options[:id])
     user.deliver_activation_email
   end
 end


 # in your controller
 def create
     EmailWorker.asynch_deliver( :id => 1)
 end
delayed_job
• Database backed asynchronous priority
  queue
• Extracted from Shopify
• you can place any Ruby object on its queue
  as arguments
• Loads the Rails environment only once
delayed_job setup
                (use fork version)




• script/plugin install git://github.com/collectiveidea/delayed_job.git
• script/generate delayed_job
• rake db:migrate
delayed_job example
     send_later
def deliver
  mailing = Mailing.find(params[:id])
  mailing.send_later(:deliver)
  flash[:notice] = "Mailing is being delivered."
  redirect_to mailings_url
end
delayed_job example
  custom workers
class MailingJob < Struct.new(:mailing_id)

  def perform
    mailing = Mailing.find(mailing_id)
    mailing.deliver
  end

end

# in your controller
def deliver
  Delayed::Job.enqueue(MailingJob.new(params[:id]))
  flash[:notice] = "Mailing is being delivered."
  redirect_to mailings_url
end
delayed_job example
       always asynchronously


   class Device
     def deliver
       # long running method
     end
     handle_asynchronously :deliver
   end

   device = Device.new
   device.deliver
Running jobs

• rake jobs:work
  (Don’t use in production; it will exit if the database has any network connectivity
  problems.)


• RAILS_ENV=production script/delayed_job start
• RAILS_ENV=production script/delayed_job stop
Priority
                  just an Integer, default is 0

• you can run multiple workers to handle jobs of different
  priorities
• RAILS_ENV=production script/delayed_job --min-priority 3 start

  Delayed::Job.enqueue(MailingJob.new(params[:id]), 3)

  Delayed::Job.enqueue(MailingJob.new(params[:id]), -3)
Scheduled
        no guarantee of a precise time; a job just runs some time after its run_at



Delayed::Job.enqueue(MailingJob.new(params[:id]), 3, 3.days.from_now)

Delayed::Job.enqueue(MailingJob.new(params[:id]),
                                    3, 1.month.from_now.beginning_of_month)
Configuring Delayed Job
# config/initializers/delayed_job_config.rb
Delayed::Worker.destroy_failed_jobs = false
Delayed::Worker.sleep_delay = 5 # seconds to sleep when the queue is empty
Delayed::Worker.max_attempts = 25
Delayed::Worker.max_run_time = 4.hours # set to the time the longest task will take
Automatic retry on failure
 • If a method raises an exception, it is
   caught and the method is rerun later.
 • The method is retried up to 25
   (default) times at increasingly longer
   intervals until it passes.
   • the longest interval is about 108 hours
     Job.db_time_now + (job.attempts ** 4) + 5
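The retry formula can be tabulated in plain Ruby without loading delayed_job. This is a sketch under one assumption: with 25 attempts there are 24 waits, one after each of the first 24 failures. The 108-hour figure corresponds to 25 ** 4 seconds.

```ruby
MAX_ATTEMPTS = 25 # the default mentioned above

# Seconds added to the current time after the given number of failed attempts.
def retry_delay(attempts)
  attempts**4 + 5
end

# One wait after each failed attempt that is still followed by a retry.
delays = (1...MAX_ATTEMPTS).map { |attempts| retry_delay(attempts) }

puts format('wait after attempt 1:  %d seconds', retry_delay(1))
puts format('wait after attempt 24: %.1f hours', retry_delay(24) / 3600.0)
puts format('25 ** 4 seconds:       %.1f hours', 25**4 / 3600.0)
puts format('total time spent waiting: %.1f days', delays.sum / 86_400.0)
```

Since the delay grows with the fourth power of the attempt count, early retries happen within seconds while late ones are days apart.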
Capistrano Recipes
• Remember to restart delayed_job after
  deployment
• Check out lib/delayed_job/recipes.rb
   after "deploy:stop",    "delayed_job:stop"
   after "deploy:start",   "delayed_job:start"
   after "deploy:restart", "delayed_job:restart"
Resque
             http://github.com/defunkt/resque

•   a Redis-backed library for creating background jobs,
    placing those jobs on multiple queues, and processing
    them later.
•   GitHub’s open source project
•   you can only enqueue Ruby objects that serialize to JSON
•   includes a Sinatra app for monitoring what's going on
•   supports multiple queues
•   built for when you expect a lot of failure/chaos
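The JSON restriction follows from how a job is stored: as a class name plus arguments. Below is a minimal sketch of that round trip using only the json standard library. ArchiveJob and its arguments are hypothetical, and no Redis or Resque call is made here.

```ruby
require 'json'

# Hypothetical job class; Resque-style jobs expose a class-level perform.
class ArchiveJob
  def self.perform(repo_id, branch)
    "archived repo #{repo_id} on #{branch}"
  end
end

# Enqueue side: only the class name and JSON-friendly arguments are stored.
payload = JSON.generate('class' => ArchiveJob.name, 'args' => [42, 'master'])

# Worker side: decode the payload, look up the class, and call perform.
job = JSON.parse(payload)
result = Object.const_get(job['class']).perform(*job['args'])

puts payload
puts result
```

This is why you pass a record id rather than the record itself: the worker reloads the object from the id.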
My recommendations:

• General purpose: delayed_job
  (GitHub highly recommends DelayedJob to anyone whose site is not 50% background work.)



• Time-scheduled: cron + rake
5. SOA for Rails

• What’s SOA
• Why SOA
• Considerations
• The tool set
What’s SOA
           Service-oriented architecture

• The “monolithic” approach is not enough
• SOA is a way to design complex applications
  by splitting out major components into
  individual services and communicating via
  APIs.
• a service is a vertical slice of functionality:
  database, application code and caching layer
a monolithic web app example
request -> Load Balancer -> WebApps -> Database
a SOA example
request -> Load Balancer -> WebApps for User
request -> WebApp for Administration
Both web apps call Services A and Services B
Services A and Services B each have their own Database
Why SOA? Isolation
• Shared Resources
• Encapsulation
• Scalability
• Interoperability
• Reuse
• Testability
• Reduce Local Complexity
Shared Resources
• Different front-end websites use the same
  resources.
• SOA helps you avoid duplicating databases
  and code.
• Why not just share the database?
 • code is not DRY
 • caching will be problematic
 (diagram: WebApp for Administration and WebApps for User share one Database)
Encapsulation

• you can change the underlying implementation of
  a service without affecting other parts of the system
 • upgrade a library
 • upgrade to Ruby 1.9
• you can provide API versioning
Scalability 1: Partitioned
     Data Provides
•   The database is the first bottleneck; a single DB
    server cannot scale. SOA helps you reduce
    database load
•   Anti-pattern: only splitting the database
    (diagram: WebApps talk directly to Database A and Database B)
    •   model relationships are broken
    •   referential integrity is lost
•   Myth: database replication cannot help you with
    speed and consistency
Scalability 2: Caching

• SOA helps you design a caching system more easily
 • Cache data at the right times and expire it
    at the right times
 • Cache the logical model, not the physical one
 • You do not need to cache views everywhere
Scalability 3: Efficient
• Different components carry different
  loads; SOA lets you scale each service separately.
  (diagram: WebApps call two load balancers; one fronts 2 instances of Services A, the other fronts 4 instances of Services B)
Security

• Different services can sit behind different
  firewalls
  • You can expose only the public web apps and
    services; the others stay inside the firewall.
Interoperability
• HTTP is the common interface; SOA helps
  you integrate:
 • Multiple languages
 • Internal systems, e.g. a full-text search engine
 • Legacy databases and systems
 • External vendors
Reuse

• Reuse across multiple applications
• Reuse for public APIs
• Example: Amazon Web Services (AWS)
Testability

• Isolate problem
• Mocking API calls
 • Reduce the time to run test suite
Reduce Local
         Complexity
• Team modularity along the same module
  splits as your software
• Understandability: The amount of code is
  minimized to a quantity understandable by
  a small team
• Source code control
Considerations

• Partition into Separate Services
• API Design
• Which Protocol
How to partition into
 Separate Services
• Partitioning on Logical Function
• Partitioning on Read/Write Frequencies
• Partitioning by Minimizing Joins
• Partitioning by Iteration Speed
API Design

• Send Everything you need
• Parallel HTTP requests
• Send as Little as Possible
• Use Logical Models
Physical Models &
     Logical Models
• Physical models are mapped to database
  tables through ORM. (It’s 3NF)
• Logical models are mapped to your
  business problem. (External API use it)
• Logical models are mapped to physical
  models by you.
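The mapping can be sketched in plain Ruby. The three "tables" and the customer profile below are a hypothetical schema, not taken from the slides. Structs stand in for ORM-backed physical models, and a hand-written method builds the logical model that the API returns.

```ruby
require 'json'

# Physical models: one Struct per normalized table (hypothetical schema).
UserRow    = Struct.new(:id, :username)
AddressRow = Struct.new(:user_id, :city)
OrderRow   = Struct.new(:user_id, :total)

users     = [UserRow.new(1, 'ihower')]
addresses = [AddressRow.new(1, 'Taipei')]
orders    = [OrderRow.new(1, 30), OrderRow.new(1, 12)]

# Logical model: the denormalized shape the API consumer actually wants.
def customer_profile(user, addresses, orders)
  own_orders = orders.select { |order| order.user_id == user.id }
  {
    'username'    => user.username,
    'city'        => addresses.find { |address| address.user_id == user.id }&.city,
    'order_count' => own_orders.size,
    'order_total' => own_orders.sum(&:total)
  }
end

profile = customer_profile(users.first, addresses, orders)
puts JSON.generate(profile)
```

The tables can later be split or renamed without changing the profile, as long as customer_profile is updated to match.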
Logical Models
• Not relational or normalized
• Maintainability
  • can change with no change to data store
  • can stay the same while the data store
    changes
• Better fit for REST interfaces
• Better caching
Which Protocol?

• SOAP
• XML-RPC
• REST
RESTful Web services

• Rails way
• REST is about resources
 • URL
 • Verbs: GET/PUT/POST/DELETE
The tool set

• Web framework
• XML Parser
• JSON Parser
• HTTP Client
Web framework

• We do not need much of the controller and view layers
• Rails is a bit more than we need; how about Sinatra?
• Rails Metal
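Sinatra and Rails Metal both sit on Rack, where a service endpoint is just an object with call(env). Below is a minimal sketch of such an endpoint, called directly with a hand-built env so that no gem or server is needed. The /users/:id route and the in-memory data are assumptions for illustration.

```ruby
require 'json'

# Hypothetical in-memory data standing in for the service's database.
users = { '1' => { 'id' => 1, 'username' => 'ihower' } }

# A Rack application responds to call(env) and returns
# [status, headers, body], where body responds to each.
user_service = lambda do |env|
  match = env['PATH_INFO'].match(%r{\A/users/(\d+)\z})
  user = match && users[match[1]]
  headers = { 'content-type' => 'application/json' }

  if env['REQUEST_METHOD'] == 'GET' && user
    [200, headers, [JSON.generate(user)]]
  else
    [404, headers, [JSON.generate('error' => 'not found')]]
  end
end

# Call it directly with a minimal env hash; a real server would build env.
status, _headers, body = user_service.call({ 'REQUEST_METHOD' => 'GET', 'PATH_INFO' => '/users/1' })
puts status
puts body.first
```

The same lambda could be mounted under any Rack-compatible server, which keeps the service free of controller and view machinery.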
ActiveResource

• Mapping RESTful resources as models in a
  Rails application.
• But not useful in practice, why?
XML parser

• http://nokogiri.org/
• Nokogiri (鋸) is an HTML, XML, SAX, and
  Reader parser. Among Nokogiri’s many
  features is the ability to search documents
  via XPath or CSS3 selectors.
JSON Parser

• http://github.com/brianmario/yajl-ruby/
• An extremely efficient streaming JSON
  parsing and encoding library. Ruby C
  bindings to Yajl
HTTP Client


• http://github.com/pauldix/typhoeus/
• Typhoeus runs HTTP requests in parallel
  while cleanly encapsulating handling logic
Tips

• Define your logical model (i.e. your service
  request result) first.

• model.to_json and model.to_xml are easy to
  use, but not useful in practice.
6. Distributed File System
 •   NFS does not scale
     •   we can use rsync to duplicate files
 •   MogileFS
     •   http://www.danga.com/mogilefs/
     •   http://seattlerb.rubyforge.org/mogilefs-client/
 •   Amazon S3
 •   HDFS (Hadoop Distributed File System)
 •   GlusterFS
7. Distributed Database

• NoSQL
• CAP theorem
 • Eventually consistent
• HBase/Cassandra/Voldemort
The End
References
•   Books&Articles:
    •    Distributed Programming with Ruby, Mark Bates (Addison Wesley)
    •    Enterprise Rails, Dan Chak (O’Reilly)
    •    Service-Oriented Design with Ruby and Rails, Paul Dix (Addison Wesley)
    •    RESTful Web Services, Richardson&Ruby (O’Reilly)
    •    RESTful Web Services Cookbook, Allamaraju&Amundsen (O’Reilly)
    •    Enterprise Recipes with Ruby on Rails, Maik Schmidt (The Pragmatic Programmers)
    •    Ruby in Practice, McAnally&Arkin (Manning)

    •    Building Scalable Web Sites, Cal Henderson (O’Reilly)
    •    Background Processing in Rails, Erik Andrejko (Rails Magazine)
    •    Background Processing with Delayed_Job, James Harrison (Rails Magazine)
    •                 Web   点          (                 )
•   Slides:
    •    Background Processing (Rob Mack) Austin on Rails - April 2009
    •    The Current State of Asynchronous Processing in Ruby (Mathias Meyer, Peritor GmbH)
    •    Asynchronous Processing (Jonathan Dahl)
    •    Long-Running Tasks In Rails Without Much Effort (Andy Stewart) - April 2008
    •    Starling + Workling: simple distributed background jobs with Twitter’s queuing system, Rany Keddo 2008
    •    Physical Models & Logical Models in Rails, dan chak
References
•   Links:
    •   http://segment7.net/projects/ruby/drb/
    •   http://www.slideshare.net/luccastera/concurrent-programming-with-ruby-and-tuple-spaces
    •   http://github.com/blog/542-introducing-resque
    •   http://www.engineyard.com/blog/2009/5-tips-for-deploying-background-jobs/
    •   http://www.opensourcery.co.za/2008/07/07/messaging-and-ruby-part-1-the-big-picture/
    •   http://leemoonsoo.blogspot.com/2009/04/simple-comparison-open-source.html
    •   http://blog.gslin.org/archives/2009/07/25/2065/
    •   http://www.javaeye.com/topic/524977
    •   http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
Todo (maybe next time)
•   AMQP/RabbitMQ example code
    •   How about Nanite?
•   XMPP
•   MagLev VM
•   More MapReduce example code
    •   How about Amazon Elastic MapReduce?
•   Resque example code
•   More SOA example and code
•   MogileFS example code

Distributed Ruby and Rails

  • 1. Distributed Ruby and Rails @ihower http://ihower.tw 2010/1
  • 2. About Me • a.k.a. ihower • http://ihower.tw • http://twitter.com/ihower • http://github.com/ihower • Ruby on Rails Developer since 2006 • Ruby Taiwan Community • http://ruby.tw
  • 3. Agenda • Distributed Ruby • Distributed Message Queues • Background-processing in Rails • Message Queues for Rails • SOA for Rails • Distributed Filesystem • Distributed database
  • 4. 1.Distributed Ruby • DRb • Rinda • Starfish • MapReduce • MagLev VM
  • 5. DRb • Ruby's RMI system (remote method invocation) • an object in one Ruby process can invoke methods on an object in another Ruby process on the same or a different machine
  • 6. DRb (cont.) • no defined interface, faster development time • tightly couple applications, because no defined API, but rather method on objects • unreliable under large-scale, heavy loads production environments
  • 7. server example 1 require 'drb' class HelloWorldServer def say_hello 'Hello, world!' end end DRb.start_service("druby://127.0.0.1:61676", HelloWorldServer.new) DRb.thread.join
  • 8. client example 1 require 'drb' server = DRbObject.new_with_uri("druby://127.0.0.1:61676") puts server.say_hello puts server.inspect # Hello, world! # <DRb::DRbObject:0x1003c04c8 @ref=nil, @uri="druby:// 127.0.0.1:61676">
  • 9. example 2 # user.rb class User attr_accessor :username end
  • 10. server example 2 require 'drb' require 'user' class UserServer attr_accessor :users def find(id) self.users[id-1] end end user_server = UserServer.new user_server.users = [] 5.times do |i| user = User.new user.username = i + 1 user_server.users << user end DRb.start_service("druby://127.0.0.1:61676", user_server) DRb.thread.join
  • 11. client example 2 require 'drb' user_server = DRbObject.new_with_uri("druby://127.0.0.1:61676") user = user_server.find(2) puts user.inspect puts "Username: #{user.username}" user.name = "ihower" puts "Username: #{user.username}"
  • 12. Err... # <DRb::DRbUnknown:0x1003b8318 @name="User", @buf="004bo: tUser006:016@usernameia"> # client2.rb:8: undefined method `username' for #<DRb::DRbUnknown:0x1003b8318> (NoMethodError)
  • 13. Why? DRbUndumped • Default DRb operation • Pass by value • Must share code • With DRbUndumped • Pass by reference • No need to share code
  • 14. Example 2 Fixed # user.rb class User include DRbUndumped attr_accessor :username end # <DRb::DRbObject:0x1003b84f8 @ref=2149433940, @uri="druby://127.0.0.1:61676"> # Username: 2 # Username: ihower
  • 15. Why use DRbUndumped? • Big objects • Singleton objects • Lightweight clients • Rapidly changing software
  • 16. ID conversion • Converts reference into DRb object on server • DRbIdConv (Default) • TimerIdConv • NamedIdConv • GWIdConv
  • 17. Beware of garbage collection • referenced objects may be collected on server (usually doesn't matter) • Building Your own ID Converter if you want to control persistent state.
  • 18. DRb security require 'drb' ro = DRbObject.new_with_uri("druby://127.0.0.1:61676") class << ro undef :instance_eval end # !!!!!!!! WARNING !!!!!!!!! DO NOT RUN ro.instance_eval("`rm -rf *`")
  • 19. $SAFE=1 instance_eval': Insecure operation - instance_eval (SecurityError)
  • 20. DRb security (cont.) • Access Control Lists (ACLs) • via IP address array • still can run denial-of-service attack • DRb over SSL
  • 21. Rinda • Rinda is a Ruby port of Linda distributed computing paradigm. • Linda is a model of coordination and communication among several parallel processes operating upon objects stored in and retrieved from shared, virtual, associative memory. This model is implemented as a "coordination language" in which several primitives operating on ordered sequence of typed data objects, "tuples," are added to a sequential language, such as C, and a logically global associative memory, called a tuplespace, in which processes store and retrieve tuples. (WikiPedia)
  • 22. Rinda (cont.) • Rinda consists of: • a TupleSpace implementation • a RingServer that allows DRb services to automatically discover each other.
  • 23. RingServer • We hardcoded IP addresses in DRb program, it’s tight coupling of applications and make fault tolerance difficult. • RingServer can detect and interact with other services on the network without knowing IP addresses.
  • 24. 1. Where Service X? RingServer via broadcast UDP address 2. Service X: 192.168.1.12 Client @192.1681.100 3. Hi, Service X @ 192.168.1.12 Service X @ 192.168.1.12 4. Hi There 192.168.1.100
  • 25. ring server example require 'rinda/ring' require 'rinda/tuplespace' DRb.start_service Rinda::RingServer.new(Rinda::TupleSpace.new) DRb.thread.join
  • 26. service example require 'rinda/ring' class HelloWorldServer include DRbUndumped # Need for RingServer def say_hello 'Hello, world!' end end DRb.start_service ring_server = Rinda::RingFinger.primary ring_server.write([:hello_world_service, :HelloWorldServer, HelloWorldServer.new, 'I like to say hi!'], Rinda::SimpleRenewer.new) DRb.thread.join
  • 27. client example require 'rinda/ring' DRb.start_service ring_server = Rinda::RingFinger.primary service = ring_server.read([:hello_world_service, nil,nil,nil]) server = service[2] puts server.say_hello puts service.inspect # Hello, world! # [:hello_world_service, :HelloWorldServer, #<DRb::DRbObject:0x10039b650 @uri="druby://fe80::21b:63ff:fec9:335f%en1:57416", @ref=2149388540>, "I like to say hi!"]
  • 28. TupleSpaces • Shared object space • Atomic access • Just like bulletin board • Tuple template is [:name, :Class, object, ‘description’ ]
  • 29. 5 Basic Operations • write • read • take (Atomic Read+Delete) • read_all • notify (Callback for write/take/delete)
  • 30. Starfish • Starfish is a utility to make distributed programming ridiculously easy • It runs both the server and the client in infinite loops • MapReduce with ActiveRecode or Files
  • 31. starfish foo.rb # foo.rb class Foo attr_reader :i def initialize @i = 0 end def inc logger.info "YAY it incremented by 1 up to #{@i}" @i += 1 end end server :log => "foo.log" do |object| object = Foo.new end client do |object| object.inc end
  • 32. starfish server example ARGV.unshift('server.rb') require 'rubygems' require 'starfish' class HelloWorld def say_hi 'Hi There' end end Starfish.server = lambda do |object| object = HelloWorld.new end Starfish.new('hello_world').server
  • 33. starfish client example ARGV.unshift('client.rb') require 'rubygems' require 'starfish' Starfish.client = lambda do |object| puts object.say_hi exit(0) # exit program immediately end Starfish.new('hello_world').client
  • 34. starfish client example (another way) ARGV.unshift('server.rb') require 'rubygems' require 'starfish' catch(:halt) do Starfish.client = lambda do |object| puts object.say_hi throw :halt end Starfish.new ('hello_world').client end puts "bye bye"
  • 35. MapReduce • introduced by Google to support distributed computing on large data sets on clusters of computers. • inspired by map and reduce functions commonly used in functional programming.
  • 36. starfish server example ARGV.unshift('server.rb') require 'rubygems' require 'starfish' Starfish.server = lambda{ |map_reduce| map_reduce.type = File map_reduce.input = "/var/log/apache2/access.log" map_reduce.queue_size = 10 map_reduce.lines_per_client = 5 map_reduce.rescan_when_complete = false } Starfish.new('log_server').server
  • 37. starfish client example ARGV.unshift('client.rb') require 'rubygems' require 'starfish' Starfish.client = lambda { |logs| logs.each do |log| puts "Processing #{log}" sleep(1) end } Starfish.new("log_server").client
  • 38. Other implementations • Skynet • Use TupleSpace or MySQL as message queue • Include an extension for ActiveRecord • http://skynet.rubyforge.org/ • MRToolkit based on Hadoop • http://code.google.com/p/mrtoolkit/
  • 39. MagLev VM • a fast, stable, Ruby implementation with integrated object persistence and distributed shared cache. • http://maglev.gemstone.com/ • public Alpha currently
  • 40. 2.Distributed Message Queues • Starling • AMQP/RabbitMQ • Stomp/ActiveMQ • beanstalkd
  • 41. what’s message queue? Message X Client Queue Check and processing Processor
  • 42. Why not DRb? • DRb has security risk and poorly designed APIs • distributed message queue is a great way to do distributed programming: reliable and scalable.
  • 43. Starling • a light-weight persistent queue server that speaks the Memcache protocol (mimics its API) • Fast, effective, quick setup and ease of use • Powered by EventMachine http://eventmachine.rubyforge.org/EventMachine.html • Twitter’s open source project, they use it before 2009. (now switch to Kestrel, a port of Starling from Ruby to Scala)
  • 44. Starling command • sudo gem install starling-starling • http://github.com/starling/starling • sudo starling -h 192.168.1.100 • sudo starling_top -h 192.168.1.100
  • 45. Starling set example require 'rubygems' require 'starling' starling = Starling.new('192.168.1.4:22122') 100.times do |i| starling.set('my_queue', i) end append to the queue, not overwrite in Memcached
  • 46. Starling get example require 'rubygems' require 'starling' starling = Starling.new('192.168.2.4:22122') loop do puts starling.get("my_queue") end
  • 47. get method • FIFO • After get, the object is no longer in the queue. You will lost message if processing error happened. • The get method blocks until something is returned. It’s infinite loop.
  • 48. Handle processing error exception require 'rubygems' require 'starling' starling = Starling.new('192.168.2.4:22122') results = starling.get("my_queue") begin puts results.flatten rescue NoMethodError => e puts e.message Starling.set("my_queue", [results]) rescue Exception => e Starling.set("my_queue", results) raise e end
  • 49. Starling cons • Poll queue constantly • RabbitMQ can subscribe to a queue that notify you when a message is available for processing.
  • 50. AMQP/RabbitMQ • a complete and highly reliable enterprise messaging system based on the emerging AMQP standard. • Erlang • http://github.com/tmm1/amqp • Powered by EventMachine
  • 51. Stomp/ActiveMQ • Apache ActiveMQ is the most popular and powerful open source messaging and Integration Patterns provider. • sudo gem install stomp • ActiveMessaging plugin for Rails
  • 52. beanstalkd • Beanstalk is a simple, fast workqueue service. Its interface is generic, but was originally designed for reducing the latency of page views in high-volume web applications by running time-consuming tasks asynchronously. • http://kr.github.com/beanstalkd/ • http://beanstalk.rubyforge.org/ • Facebook’s open source project
  • 53. Why we need asynchronous/ background-processing in Rails? • cron-like processing text search index update etc) (compute daily statistics data, create reports, Full- • long-running tasks (sending mail, resizing photo’s, encoding videos, generate PDF, image upload to S3, posting something to twitter etc) • Server traffic jam: expensive request will block server resources(i.e. your Rails app) • Bad user experience: they maybe try to reload and reload again! (responsive matters)
  • 54. 3.Background- processing for Rails • script/runner • rake • cron • daemon • run_later plugin • spawn plugin
  • 55. script/runner • In Your Rails App root: • script/runner “Worker.process”
  • 56. rake • In RAILS_ROOT/lib/tasks/dev.rake • rake dev:process namespace :dev do task :process do #... end end
  • 57. cron • Cron is a time-based job scheduler in Unix- like computer operating systems. • crontab -e
  • 58. Whenever http://github.com/javan/whenever • A Ruby DSL for Defining Cron Jobs • http://asciicasts.com/episodes/164-cron-in-ruby • or http://cronedit.rubyforge.org/ every 3.hours do runner "MyModel.some_process" rake "my:rake:task" command "/usr/bin/my_great_command" end
  • 60. rufus-scheduler http://github.com/jmettraux/rufus-scheduler • scheduling pieces of code (jobs) • Not replacement for cron/at since it runs inside of Ruby. require 'rubygems' require 'rufus/scheduler' scheduler = Rufus::Scheduler.start_new scheduler.every '5s' do puts 'check blood pressure' end scheduler.join
  • 61. Daemon Kit http://github.com/kennethkalmer/daemon-kit • Creating Ruby daemons by providing a sound application skeleton (through a generator), task specific generators (jabber bot, etc) and robust environment management code.
  • 62. Monitor your daemon • http://mmonit.com/monit/ • http://github.com/arya/bluepill • http://god.rubyforge.org/
  • 63. daemon_controller http://github.com/FooBarWidget/daemon_controller • A library for robust daemon management • Make daemon-dependent applications Just Work without having to start the daemons manually.
  • 64. off-load task via system command # mailings_controller.rb def deliver call_rake :send_mailing, :mailing_id => params[:id].to_i flash[:notice] = "Delivering mailing" redirect_to mailings_url end # controllers/application.rb def call_rake(task, options = {}) options[:rails_env] ||= Rails.env args = options.map { |n, v| "#{n.to_s.upcase}='#{v}'" } system "/usr/bin/rake #{task} #{args.join(' ')} --trace 2>&1 >> #{Rails.root}/log/rake.log &" end # lib/tasks/mailer.rake desc "Send mailing" task :send_mailing => :environment do mailing = Mailing.find(ENV["MAILING_ID"]) mailing.deliver end # models/mailing.rb def deliver sleep 10 # placeholder for sending email update_attribute(:delivered_at, Time.now) end
  • 65. Simple Thread after_filter do Thread.new do AccountMailer.deliver_signup(@user) end end
  • 66. run_later plugin http://github.com/mattmatt/run_later • Borrowed from Merb • Uses worker thread and a queue • Simple solution for simple tasks run_later do AccountMailer.deliver_signup(@user) end
  • 67. spawn plugin http://github.com/tra/spawn spawn do logger.info("I feel sleepy...") sleep 11 logger.info("Time to wake up!") end
  • 68. spawn (cont.) • By default, spawn will use the fork to spawn child processes.You can configure it to do threading. • Works by creating new database connections in ActiveRecord::Base for the spawned block. • Fock need copy Rails every time
  • 69. threading vs. forking • Forking advantages: • more reliable? - the ActiveRecord code is not thread-safe. • keep running - subprocess can live longer than its parent. • easier - just works with Rails default settings. Threading requires you set allow_concurrency=true and. Also, beware of automatic reloading of classes in development mode (config.cache_classes = false). • Threading advantages: • less filling - threads take less resources... how much less? it depends. • debugging - you can set breakpoints in your threads
  • 70. Okay, we need reliable messaging system: • Persistent • Scheduling: not necessarily all at the same time • Scalability: just throw in more instances of your program to speed up processing • Loosely coupled components that merely ‘talk’ to each other • Ability to easily replace Ruby with something else for specific tasks • Easy to debug and monitor
  • 71. 4.Message Queues (for Rails only) • ar_mailer • BackgroundDRb • workling • delayed_job • resque
  • 72. Rails only? • Easy to use/write code • Jobs are Ruby classes or objects • But need to load Rails environment
  • 73. ar_mailer http://seattlerb.rubyforge.org/ar_mailer/ • a two-phase delivery agent for ActionMailer. • Store messages into the database • Delivery by a separate process, ar_sendmail later.
  • 74. BackgroundDRb http://backgroundrb.rubyforge.org/ • BackgrounDRb is a Ruby job server and scheduler. • Have scalability problem due to Mark Bates) (~20 servers for • Hard to know if processing error • Use database to persist tasks • Use memcached to know processing result
  • 75. workling http://github.com/purzelrakete/workling • Gives your Rails App a simple API that you can use to make code run in the background, outside of the your request. • Supports Starling(default), BackgroundJob, Spawn and AMQP/RabbitMQ Runners.
  • 76. Workling/Starling setup • script/plugin install git://github.com/purzelrae/ workling.git • sudo starling -p 15151 • RAILS_ENV=production script/ workling_client start
  • 77. Workling example class EmailWorker < Workling::Base def deliver(options) user = User.find(options[:id]) user.deliver_activation_email end end # in your controller def create EmailWorker.asynch_deliver( :id => 1) end
  • 78. delayed_job • Database backed asynchronous priority queue • Extracted from Shopify • you can place any Ruby object on its queue as arguments • Only load the Rails environment only once
  • 79. delayed_job setup (use fork version) • script/plugin install git://github.com/ collectiveidea/delayed_job.git • script/generate delayed_job • rake db:migrate
  • 80. delayed_job example send_later def deliver mailing = Mailing.find(params[:id]) mailing.send_later(:deliver) flash[:notice] = "Mailing is being delivered." redirect_to mailings_url end
  • 81. delayed_job example custom workers class MailingJob < Struct.new(:mailing_id) def perform mailing = Mailing.find(mailing_id) mailing.deliver end end # in your controller def deliver Delayed::Job.enqueue(MailingJob.new(params[:id])) flash[:notice] = "Mailing is being delivered." redirect_to mailings_url end
  • 82. delayed_job example always asynchronously class Device def deliver # long running method end handle_asynchronously :deliver end device = Device.new device.deliver
  • 83. Running jobs • rake jobs:works (Don’t use in production, it will exit if the database has any network connectivity problems.) • RAILS_ENV=production script/delayed_job start • RAILS_ENV=production script/delayed_job stop
  • 84. Priority just Integer, default is 0 • you can run multipie workers to handle different priority jobs • RAILS_ENV=production script/delayed_job -min- priority 3 start Delayed::Job.enqueue(MailingJob.new(params[:id]), 3) Delayed::Job.enqueue(MailingJob.new(params[:id]), -3)
  • 85. Scheduled no guarantees at precise time, just run_after_at Delayed::Job.enqueue(MailingJob.new(params[:id]), 3, 3.days.from_now) Delayed::Job.enqueue(MailingJob.new(params[:id]), 3, 1.month.from_now.beginning_of_month)
  • 86. Configuring Dealyed Job # config/initializers/delayed_job_config.rb Delayed::Worker.destroy_failed_jobs = false Delayed::Worker.sleep_delay = 5 # sleep if empty queue Delayed::Worker.max_attempts = 25 Delayed::Worker.max_run_time = 4.hours # set to the amount of time of longest task will take
  • 87. Automatic retry on failure • If a method throws an exception it will be caught and the method rerun later. • The method will be retried up to 25 (default) times at increasingly longer intervals until it passes. • 108 hours at most Job.db_time_now + (job.attempts ** 4) + 5
  • 88. Capistrano Recipes • Remember to restart delayed_job after deployment • Check out lib/delayed_job/recipes.rb after "deploy:stop", "delayed_job:stop" after "deploy:start", "delayed_job:start" after "deploy:restart", "delayed_job:restart"
  • 89. Resque http://github.com/defunkt/resque • a Redis-backed library for creating background jobs, placing those jobs on multiple queues, and processing them later. • Github’s open source project • you can only place JSONable Ruby objects • includes a Sinatra app for monitoring what's going on • support multiple queues • you expect a lot of failure/chaos
  • 90. My recommendations: • General purpose: delayed_job (Github highly recommend DelayedJob to anyone whose site is not 50% background work.) • Time-scheduled: cron + rake
  • 91. 5. SOA for Rails • What’s SOA • Why SOA • Considerations • The tool set
  • 92. What’s SOA Service oriented architectures • “monolithic” approach is not enough • SOA is a way to design complex applications by splitting out major components into individual services and communicating via APIs. • a service is a vertical slice of functionality: database, application code and caching layer
  • 93. a monolithic web app example request Load Balancer WebApps Database
  • 94. a SOA example request Load request Balancer WebApp WebApps for Administration for User Services A Services B Database Database
  • 95. Why SOA? Isolation • Shared Resources • Encapsulation • Scalability • Interoperability • Reuse • Testability • Reduce Local Complexity
  • 96. Shared Resources • Different front-end websites use the same resources. • SOA helps you avoid duplicating databases and code. • Why not just share the database? • Code is not DRY • Caching will be problematic (diagram labels: WebApp for Administration, WebApps for User, Database)
  • 97. Encapsulation • You can change the underlying implementation of a service without affecting other parts of the system • Upgrade a library • Upgrade to Ruby 1.9 • You can provide API versioning
  • 98. Scalability 1: Partitioned Data Provides • The database is the first bottleneck; a single DB server cannot scale. SOA helps you reduce database load • Anti-pattern: only splitting the database (diagram labels: WebApps, Database A, Database B) • Model relationships are broken • Referential integrity is lost • Myth: database replication will fix it; replication cannot give you both speed and consistency
  • 99. Scalability 2: Caching • SOA makes it easier to design a caching system • Cache data at the right times and expire it at the right times • Cache the logical model, not the physical one • You do not need to cache views everywhere
  • 100. Scalability 3: Efficient • Different components carry different loads; SOA lets you scale service by service. (diagram labels: WebApps, Load Balancer ×2, Services A ×2, Services B ×4)
  • 101. Security • Different services can sit behind different firewalls • You can expose only the public web apps and services; everything else stays behind the firewall.
  • 102. Interoperability • HTTP is the common interface; SOA helps you integrate: • Multiple languages • Internal systems, e.g. a full-text search engine • Legacy databases and systems • External vendors
  • 103. Reuse • Reuse across multiple applications • Reuse for public APIs • Example: Amazon Web Services (AWS)
  • 104. Testability • Isolate problems • Mock API calls • Reduce the time needed to run the test suite
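One way to mock API calls is to inject the HTTP layer so a test can swap in a fake. This is a minimal sketch under that assumption: UserClient, FakeHttp, and the /users/1 path are all hypothetical names, and no real HTTP library is involved.

```ruby
require 'json'

# Hypothetical client for a user service; `http` is any object that
# responds to get(path) and returns a JSON string.
class UserClient
  def initialize(http)
    @http = http
  end

  def username(id)
    JSON.parse(@http.get("/users/#{id}"))['username']
  end
end

# Test double standing in for the real HTTP layer: no network needed.
class FakeHttp
  def initialize(responses)
    @responses = responses
  end

  def get(path)
    @responses.fetch(path)
  end
end

fake = FakeHttp.new('/users/1' => JSON.generate('username' => 'ihower'))
client = UserClient.new(fake)
puts client.username(1)
```

Because the fake answers from memory, the front-end test suite runs without starting the service, which is where the shorter test time comes from.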
  • 105. Reduce Local Complexity • Team modularity along the same module splits as your software • Understandability: The amount of code is minimized to a quantity understandable by a small team • Source code control
  • 106. Considerations • Partition into Separate Services • API Design • Which Protocol
  • 107. How to partition into Separate Services • Partitioning on Logical Function • Partitioning on Read/Write Frequencies • Partitioning by Minimizing Joins • Partitioning by Iteration Speed
  • 108. API Design • Send Everything you need • Parallel HTTP requests • Send as Little as Possible • Use Logical Models
  • 109. Physical Models & Logical Models • Physical models are mapped to database tables through the ORM. (They are in 3NF.) • Logical models are mapped to your business problem. (The external API uses them.) • You map logical models to physical models yourself.
  • 110. Logical Models • Not relational or normalized • Maintainability • can change with no change to data store • can stay the same while the data store changes • Better fit for REST interfaces • Better caching
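The difference between the two kinds of model can be sketched in plain Ruby. Here the Structs stand in for ORM-backed physical models, and user_profile builds the denormalized logical model that a service would return. All names and fields are hypothetical.

```ruby
require 'json'

# Physical models: one Struct per (hypothetical) normalized table.
UserRow    = Struct.new(:id, :username)
AddressRow = Struct.new(:user_id, :city)

# Logical model: the shape the service exposes, assembled by hand
# from the physical rows rather than mirroring the tables.
def user_profile(user, addresses)
  {
    'username' => user.username,
    'cities'   => addresses.select { |a| a.user_id == user.id }.map(&:city)
  }
end

user = UserRow.new(1, 'ihower')
addresses = [AddressRow.new(1, 'Taipei'), AddressRow.new(2, 'Hsinchu')]

profile = user_profile(user, addresses)
puts JSON.generate(profile)
```

Callers only ever see the profile hash, so the tables behind it can be split or renamed without changing the API, and the whole hash can be cached as one unit.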
  • 111. Which Protocol? • SOAP • XML-RPC • REST
  • 112. RESTful Web services • Rails way • REST is about resources • URL • Verbs: GET/PUT/POST/DELETE
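As a small illustration of "resources plus verbs", the sketch below maps an HTTP verb and URL to the conventional Rails action name for a users resource. The action_for helper is hypothetical and is not how Rails routing is implemented; it only spells out the convention.

```ruby
# Hypothetical dispatcher: returns the conventional Rails action name
# for a `users` resource, or nil when nothing matches.
def action_for(verb, path)
  routes = {
    ['GET',    :collection] => 'index',
    ['POST',   :collection] => 'create',
    ['GET',    :member]     => 'show',
    ['PUT',    :member]     => 'update',
    ['DELETE', :member]     => 'destroy'
  }

  # /users is the collection; /users/<id> is a single member.
  kind = case path
         when %r{\A/users\z}     then :collection
         when %r{\A/users/\d+\z} then :member
         end

  routes[[verb, kind]]
end

puts action_for('GET', '/users')
puts action_for('PUT', '/users/1')
```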
  • 113. The tool set • Web framework • XML Parser • JSON Parser • HTTP Client
  • 114. Web framework • A service does not need much of the controller and view layers • Rails is more than we need; how about Sinatra? • Rails Metal
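Sinatra and Rails Metal both sit on Rack, so a service endpoint can be as small as an object that responds to call(env). The sketch below is plain Ruby with no gems; the /users/1 route and its hard-coded response are hypothetical, and a real deployment would mount the lambda in a Rack server.

```ruby
require 'json'

# A Rack-style endpoint: any object responding to call(env) and returning
# [status, headers, body]. No controller or view layer is involved.
user_service = lambda do |env|
  if env['REQUEST_METHOD'] == 'GET' && env['PATH_INFO'] == '/users/1'
    body = JSON.generate('username' => 'ihower')
    [200, { 'Content-Type' => 'application/json' }, [body]]
  else
    [404, { 'Content-Type' => 'text/plain' }, ['Not Found']]
  end
end

# Call the endpoint directly with a minimal env hash, as a test would.
status, _headers, body = user_service.call(
  'REQUEST_METHOD' => 'GET',
  'PATH_INFO'      => '/users/1'
)
puts status
puts body.first
```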
  • 115. ActiveResource • Mapping RESTful resources as models in a Rails application. • But not useful in practice, why?
  • 116. XML parser • http://nokogiri.org/ • Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser. Among Nokogiri’s many features is the ability to search documents via XPath or CSS3 selectors.
  • 117. JSON Parser • http://github.com/brianmario/yajl-ruby/ • An extremely efficient streaming JSON parsing and encoding library. Ruby C bindings to Yajl.
  • 118. HTTP Client • http://github.com/pauldix/typhoeus/ • Typhoeus runs HTTP requests in parallel while cleanly encapsulating handling logic.
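The benefit of parallel requests can be shown without any gem. The sketch below does not use Typhoeus; it swaps in standard-library threads and a hypothetical fetch lambda that sleeps to simulate network latency. The paths are made up for illustration.

```ruby
# Hypothetical stand-in for an HTTP call; sleeps to simulate latency.
fetch = lambda do |path|
  sleep 0.1
  "response for #{path}"
end

paths = ['/users/1', '/users/1/orders', '/users/1/friends']

started = Time.now
# Issue every request in its own thread, then collect results in order.
responses = paths.map { |path| Thread.new { fetch.call(path) } }.map(&:value)
elapsed = Time.now - started

puts responses
puts format('elapsed: %.2fs', elapsed)
```

Because the three simulated calls overlap, the elapsed time should be close to one call's latency rather than the sum of all three, which is the effect a parallel HTTP client aims for when a page needs data from several services.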
  • 119. Tips • Define your logical model (i.e. your service request result) first. • model.to_json and model.to_xml are easy to use, but not useful in practice.
  • 120. 6.Distributed File System • NFS does not scale • We can use rsync to replicate files • MogileFS • http://www.danga.com/mogilefs/ • http://seattlerb.rubyforge.org/mogilefs-client/ • Amazon S3 • HDFS (Hadoop Distributed File System) • GlusterFS
  • 121. 7.Distributed Database • NoSQL • CAP theorem • Eventually consistent • HBase/Cassandra/Voldemort
  • 123. References • Books & Articles: • Distributed Programming with Ruby, Mark Bates (Addison Wesley) • Enterprise Rails, Dan Chak (O’Reilly) • Service-Oriented Design with Ruby and Rails, Paul Dix (Addison Wesley) • RESTful Web Services, Richardson & Ruby (O’Reilly) • RESTful Web Services Cookbook, Allamaraju & Amundsen (O’Reilly) • Enterprise Recipes with Ruby on Rails, Maik Schmidt (The Pragmatic Programmers) • Ruby in Practice, McAnally & Arkin (Manning) • Building Scalable Web Sites, Cal Henderson (O’Reilly) • Background Processing in Rails, Erik Andrejko (Rails Magazine) • Background Processing with Delayed_Job, James Harrison (Rails Magazine) • Web 点 ( ) • Slides: • Background Processing (Rob Mack) Austin on Rails - April 2009 • The Current State of Asynchronous Processing in Ruby (Mathias Meyer, Peritor GmbH) • Asynchronous Processing (Jonathan Dahl) • Long-Running Tasks In Rails Without Much Effort (Andy Stewart) - April 2008 • Starling + Workling: simple distributed background jobs with Twitter’s queuing system, Rany Keddo 2008 • Physical Models & Logical Models in Rails, Dan Chak
  • 124. References • Links: • http://segment7.net/projects/ruby/drb/ • http://www.slideshare.net/luccastera/concurrent-programming-with-ruby-and-tuple-spaces • http://github.com/blog/542-introducing-resque • http://www.engineyard.com/blog/2009/5-tips-for-deploying-background-jobs/ • http://www.opensourcery.co.za/2008/07/07/messaging-and-ruby-part-1-the-big-picture/ • http://leemoonsoo.blogspot.com/2009/04/simple-comparison-open-source.html • http://blog.gslin.org/archives/2009/07/25/2065/ • http://www.javaeye.com/topic/524977 • http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
  • 125. Todo (maybe next time) • AMQP/RabbitMQ example code • How about Nanite? • XMPP • MagLev VM • More MapReduce example code • How about Amazon Elastic MapReduce? • Resque example code • More SOA examples and code • MogileFS example code