Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Taro L. Saito, Ph.D.
GitHub: @xerial
Arm Treasure Data
Airframe
Lightweight Building Blocks for Scala
Technologies Behind Treasure Data: Airframe Edition
October 17th, 2018
Plazma - TD Tech Talk
Airframe
● Lightweight Building Blocks for Scala
● Essential for building any application
● Used in production for 2+ years
● Based on my code collection since 2009
● Initially written in Java
● Gradually migrated to Scala
● Repackaged into wvlet.airframe in 2016
● For maintainability
● 18 Modules
● Simplifying your daily programming in Scala
Airframe
● Named After A Novel By Michael Crichton (1942-2008)
● The author of Jurassic Park
About Me: Taro L. Saito (Leo)
● An Engineer with Research Background
● Ph.D., University of Tokyo
● DBMS & Genome Science
● Developing Query Engines in TD
● Living in the US for 3+ years
● Bay Area, Silicon Valley
● Active OSS Developer
● airframe
● sqlite-jdbc
■ More than 1000 GitHub stars
● snappy-java
■ Compression library used in Spark, Parquet
● sbt-sonatype
■ Used in 2000+ Scala projects
● ...
Personal Goal of Today
● Collect 200 GitHub Stars
● keyword: Airframe + Scala
● https://github.com/wvlet/airframe
Major Goals
● Providing A Standard Toolkit For Building Reliable Services
● Removing Complexities In Application Development
● Providing Simplicity By Design
Simplicity By Design
● “Simplicity” by Philippe Dufour
● A watch made by a legendary watchmaker in Switzerland
● He builds every part of the watch himself
● Airframe
● Provides simplicity for application developers
Application Development with Airframe
● Bootstrap
● Parsing command-line options
● Reading configuration files
● Reading databases
● Object - Data Mapping
● Mapping data to objects (object mapping)
● Saving objects to files (serialization)
● Debugging
● Logging
● Collecting metrics
● Monitoring
● Building Services
● Creating service objects using dependency injection (DI)
18 Airframe Modules
● Bootstrap
● airframe-config Configuration loader
● airframe-opts Command-line option parser
● Object Serialization
● airframe-codec encoder/decoder SPI + standard codecs
● airframe-msgpack pure-Scala MessagePack implementation
● airframe-tablet CSV/TSV/JSON/JDBC ResultSet <-> Object
● Monitoring & Debugging
● airframe-log Logging
● airframe-metrics Human-readable metrics for time, date, data size, etc
● airframe-jmx Object metrics provider through JMX
● Building Service Objects
● airframe Dependency injection
● airframe-surface Object type inspector
● Misc:
● airframe-control, airframe-jdbc, airframe-json, airframe-http, etc.
Configuring Applications (airframe-config)
● Embedding static configurations for all environments into a docker image
● Merging YAML + external configurations + object default parameters
YAML
development:
  addr: api-dev.com
production:
  addr: api.com
Config Object
case class ServerConfig(
  addr: String,
  port: Int = 8080,
  password: String
)
[Diagram: select env:production from the YAML (addr: api.com), merge it with credentials and local configurations, fill the rest from object default parameters (e.g., port = 8080), and map the result to an immutable config object.]
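The merge order on this slide can be sketched in plain Scala. This is not the airframe-config API: `loadConfig`, the `yaml` map, and the `overrides` map are hypothetical stand-ins for the parsed YAML file and the external credentials, assuming that external settings win over the YAML and that default parameters apply last.

```scala
object ConfigMergeSketch {
  // Config object from the slide; port falls back to its default parameter
  case class ServerConfig(addr: String, port: Int = 8080, password: String)

  // Hypothetical stand-in for the parsed YAML file: env -> (key -> value)
  val yaml: Map[String, Map[String, String]] = Map(
    "development" -> Map("addr" -> "api-dev.com"),
    "production"  -> Map("addr" -> "api.com")
  )

  // Select one environment, then let external settings override it
  def loadConfig(env: String, overrides: Map[String, String]): ServerConfig = {
    val merged = yaml.getOrElse(env, Map.empty[String, String]) ++ overrides
    val base   = ServerConfig(addr = merged("addr"), password = merged("password"))
    // Set the port only when some source defines it; otherwise keep the default
    merged.get("port").map(p => base.copy(port = p.toInt)).getOrElse(base)
  }

  def main(args: Array[String]): Unit =
    println(loadConfig("production", Map("password" -> "secret")))
}
```

The resulting case class is immutable, so the merged configuration cannot drift after startup.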
Reading And Saving Query Results
● Can we standardize this pattern?
[Diagram: RDBMS -> JDBC ResultSet -> (object mapping) -> Seq[A] -> (object serialization) -> external storage (cache) -> (object deserialization, reload from cache) -> Seq[A]]
Problem: Data Mapping is Everywhere
● How many data readers and object mappers do we need?
● How can we simplify this?
[Diagram: YAML -> YAML parser + object mapper -> config object; JDBC ResultSet -> object-relation mapper -> table object; JSON -> JSON parser + object mapper -> object]
Using MessagePack As An Intermediate Data Format
● Why MessagePack?
● Flexible enough to support conversions from various data formats
● Compact and efficient compared to JSON
● Easy to create schema-on-read object mapper
[Diagram: JDBC ResultSet, YAML, and JSON on one side and objects on the other, all converted through MessagePack by pack/unpack.]
Airframe Codec: Pack/Unpack Interface
● MessageCodec[A]
● pack: Convert object A into MessagePack
● unpack: Convert MessagePack into object A
[Diagram: Input -> pack -> MessagePack -> unpack -> Output]
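A pack/unpack interface of this shape can be sketched without any library. The `Value` type below is a simplified stand-in for MessagePack values rather than the real binary format, and `Codec`, `IntCodec`, `StringCodec`, and `roundTrip` are hypothetical names modeled on the slide, not the actual airframe-codec signatures.

```scala
object CodecSketch {
  // Simplified stand-in for MessagePack values (not the real binary format)
  sealed trait Value
  final case class IntValue(v: Long) extends Value
  final case class StringValue(v: String) extends Value

  // Hypothetical interface modeled on the slide's MessageCodec[A]
  trait Codec[A] {
    def pack(a: A): Value            // object A -> intermediate value
    def unpack(v: Value): Option[A]  // intermediate value -> object A, None on mismatch
  }

  object IntCodec extends Codec[Int] {
    def pack(a: Int): Value = IntValue(a.toLong)
    def unpack(v: Value): Option[Int] = v match {
      case IntValue(x) if x.isValidInt => Some(x.toInt)
      case _                           => None
    }
  }

  object StringCodec extends Codec[String] {
    def pack(a: String): Value = StringValue(a)
    def unpack(v: Value): Option[String] = v match {
      case StringValue(s) => Some(s)
      case _              => None
    }
  }

  // Packing and then unpacking should give back the original object
  def roundTrip[A](codec: Codec[A], a: A): Option[A] = codec.unpack(codec.pack(a))
}
```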
Pre-defined Codecs in airframe-codec
● Primitive Codecs
● ByteCodec, CharCodec, ShortCodec, IntCodec, LongCodec
● FloatCodec, DoubleCodec
● StringCodec
● BooleanCodec
● TimeStampCodec
● Collection Codec
● ArrayCodec, SeqCodec, ListCodec, IndexSeqCodec, MapCodec, etc.
● OptionCodec
● JsonCodec
● Java-specific Codec
● FileCodec, ZonedDateTimeCodec, ResultSetCodec, etc.
● Adding Custom Codecs
● Implement MessageCodec[X] interface
ObjectCodec[A]: Combination of Codecs
● Generate Complex Codecs From The Parameter List of Objects
● MessagePack based serializer/deserializer
class A(
  port: Int,
  name: String,
  timeoutSec: Double
)
[Diagram: ObjectCodec[A] combines IntCodec, StringCodec, and DoubleCodec to pack A into, and unpack A from, a MessagePack array or map.]
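To make the combination concrete, here is a hand-written sketch for the class on this slide. It assumes a simplified `Value` type in place of MessagePack, and its `pack`/`unpack` functions are hypothetical: they are written out by hand here, whereas the slide describes generating the equivalent logic from the parameter list.

```scala
object ObjectCodecSketch {
  // Simplified stand-in for MessagePack values
  sealed trait Value
  final case class IntValue(v: Long) extends Value
  final case class FloatValue(v: Double) extends Value
  final case class StringValue(v: String) extends Value
  final case class ArrayValue(items: Seq[Value]) extends Value
  final case class MapValue(entries: Map[String, Value]) extends Value

  case class A(port: Int, name: String, timeoutSec: Double)

  // One field codec per constructor parameter, emitted as a map keyed by name
  def pack(a: A): Value = MapValue(Map(
    "port"       -> IntValue(a.port.toLong),
    "name"       -> StringValue(a.name),
    "timeoutSec" -> FloatValue(a.timeoutSec)
  ))

  // Accept either a map (by parameter name) or an array (by parameter position)
  def unpack(v: Value): Option[A] = v match {
    case MapValue(m) =>
      (m.get("port"), m.get("name"), m.get("timeoutSec")) match {
        case (Some(IntValue(p)), Some(StringValue(n)), Some(FloatValue(t))) =>
          Some(A(p.toInt, n, t))
        case _ => None
      }
    case ArrayValue(Seq(IntValue(p), StringValue(n), FloatValue(t))) =>
      Some(A(p.toInt, n, t))
    case _ => None
  }
}
```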
Power of Schema-On-Read Codec
● MessagePack values describe their own data types (self-describing)
● How to deserialize the data can be determined based on MessagePack types
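[Diagram: CSV and JDBC ResultSet columns are packed into MessagePack values (Int, Float, Boolean, String, Array, Map, Binary). IntCodec unpacks them into Scala.Int according to the MessagePack type, using conversions such as parseInt, toInt, or 0 or 1, and otherwise yields an error or Zero.of[Int]. Example inputs: “100” (string), 100 (bigint), 100 (int).]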
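The type-driven unpacking can be sketched as a single pattern match. The `Value` type is again a simplified stand-in for MessagePack, and `unpackInt` and `unpackIntOrZero` are hypothetical helpers; which conversion applies to which input type is an assumption read off the diagram, with `None` playing the role of the error case and `0` the role of `Zero.of[Int]`.

```scala
object SchemaOnReadSketch {
  // Simplified stand-in for self-describing MessagePack values
  sealed trait Value
  final case class IntValue(v: Long) extends Value
  final case class FloatValue(v: Double) extends Value
  final case class BooleanValue(v: Boolean) extends Value
  final case class StringValue(v: String) extends Value
  case object NilValue extends Value

  // Decide how to read an Int by looking at the type of the incoming value
  def unpackInt(v: Value): Option[Int] = v match {
    case IntValue(x) if x.isValidInt => Some(x.toInt)
    case FloatValue(x)               => Some(x.toInt)                        // toInt truncates
    case BooleanValue(b)             => Some(if (b) 1 else 0)                // 0 or 1
    case StringValue(s)              => scala.util.Try(s.trim.toInt).toOption // parseInt
    case _                           => None                                 // error case
  }

  // Lenient variant: fall back to the zero value instead of reporting an error
  def unpackIntOrZero(v: Value): Int = unpackInt(v).getOrElse(0)
}
```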
Application 1/5: Treasure Data = MessagePack DBMS
● Fluentd -> MessagePack -> Treasure Data
● Automatically Generating Schema from Data
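● Apply schema-on-read for providing table data to Presto/Hive/Spark, etc.
[Diagram: schema-free data from Fluentd and the Mobile SDK is stored as MessagePack; a table schema is generated from the data, codecs (IntCodec, StringCodec) are generated from the schema, and a table reader serves Presto, Hive, and Spark.]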
Application 2/5: Data Transformation
● Airframe Codec Works As A Lightweight Embulk
[Diagram: List[A] <-> (pack/unpack) MessagePack <-> (pack/unpack) TSV, JSON, SQLite3]
Application 3/5: Taking Snapshots of Workflow Tasks
● Frequently Used for Data Analytics Pipelines
● Save Task Results As MessagePack (binary)
● Save the cost of re-computation
[Diagram: on the first run, the task computes Result: Seq[A] (e.g., 10 min), packs it as MessagePack, and saves it to storage as a snapshot. Later runs load the snapshot and unpack it instead of recomputing.]
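The snapshot pattern can be sketched with the JDK file API alone. To stay dependency-free, this sketch swaps MessagePack for newline-separated text and fixes the result type to `Seq[Int]`; `runWithSnapshot`, `expensiveTask`, and `computeCount` are hypothetical names introduced only for illustration.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object SnapshotSketch {
  // Counts how often the expensive task body actually runs
  var computeCount = 0

  // Stand-in for a task that would take minutes to compute
  def expensiveTask(): Seq[Int] = {
    computeCount += 1
    Seq(1, 2, 3)
  }

  // Load the snapshot if it exists; otherwise compute the result and save it
  def runWithSnapshot(snapshot: Path)(task: => Seq[Int]): Seq[Int] = {
    if (Files.exists(snapshot)) {
      val text = new String(Files.readAllBytes(snapshot), StandardCharsets.UTF_8)
      text.split("\n").toSeq.filter(_.nonEmpty).map(_.toInt)
    } else {
      val result = task // the by-name argument is evaluated only on this branch
      Files.write(snapshot, result.mkString("\n").getBytes(StandardCharsets.UTF_8))
      result
    }
  }
}
```

Because `task` is a by-name parameter, the second call with the same snapshot path never runs the computation.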
Application 4/5: Scala.js RPC
● Scala.js
● Compiling Scala code into JavaScript for Web Browsers
● Model classes can be shared between Scala and Scala.js
● airframe-msgpack
● Added pure-Scala MessagePack implementation for supporting Scala.js
[Diagram: UserInfo on the Scala server side <-> (pack/unpack) MessagePack over XML RPC <-> (pack/unpack) UserInfo on the Scala.js client side]
Application 5/5: Airframe HTTP Web Service
● Mapping HTTP Requests and Responses to Method Call Arguments and Return Values
● Airframe HTTP: Building Low-Friction Web Services Over Finagle (Medium Blog)
[Diagram: Http Request -> pack -> MessagePack -> unpack to function arguments -> request handler method -> return value -> MessagePack -> Http Response]
A Challenge: Type Erasure in Java Compiler (javac)
● Java class files (bytecode) do not keep generic type information (type erasure)
[Diagram: class A (data:List[B]) -> javac compiler -> class A with data: List[java.lang.Object] (type erasure). With only the erased class parameter type list, how can ObjectCodec be generated?]
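The effect of erasure is easy to observe from plain Scala. In this sketch, `A` and `B` mirror the classes on the slide, while `sameRuntimeClass` and `constructorParamType` are hypothetical helpers that only inspect raw runtime classes through standard JVM reflection.

```scala
object TypeErasureSketch {
  class B
  class A(val data: List[B])

  // Two lists with different element types share one runtime class,
  // because the element type is not part of the runtime class
  def sameRuntimeClass: Boolean =
    List(1, 2).getClass == List("a", "b").getClass

  // The raw constructor parameter type is just List; B is not visible here
  def constructorParamType: Class[_] =
    classOf[A].getConstructors.head.getParameterTypes.head

  def main(args: Array[String]): Unit = {
    println(sameRuntimeClass)
    println(constructorParamType.getName)
  }
}
```

A codec generator that looks only at these raw classes cannot tell which element codec to use for `data`.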
Airframe Surface
● Reading Type Signatures From ScalaSig
● Scala compiler embeds Scala Type Signatures (ScalaSig) into class files
● Airframe Surface
● A library for inspecting object shapes
[Diagram: class A (data:List[B]) compiled by javac becomes class A with data: List[java.lang.Object] (type erasure). Compiled by scalac, the class file also carries ScalaSig: data:List[B]. Surface.of[A] recovers data: List[B] (scala.reflect.runtime.universe.TypeTag).]
What is Dependency Injection (DI)?
● Many Confusing Articles
● Inversion of Control Containers and the Dependency Injection pattern. Martin Fowler (2004)
● StackOverflow, Wikipedia, …
● Many Frameworks
● Spring, Google Guice, Scaldi, Macwire, Grafter, Weld, etc.
● No-framework approaches also exist (Pure-Scala DI)
● Recent Definition:
● Dependency Injection is the process of creating the static, stateless graph of service objects, where each service is parameterised by its dependencies.
■ What is Dependency Injection? by Adam Warski (2018)
● However, it’s still difficult to understand what DI is
Simplifying DI with Airframe
● Airframe Usage
● import wvlet.airframe._
● Simple 3 Step DI
● bind
● design
● build
● To Fully Understand DI …
● Think about what you can simplify with DI
● Thinking about DI itself doesn’t make much sense
■ e.g., comparing Guice, Airframe, etc.
3 Things You Can Forget with Airframe DI
● 1. How to Build Service Objects
● 2. How to Manage Resource Lifecycle
● 3. How to Use DI Itself (!!)
Airframe Lets You Focus On Application Development
1: Forget How to Build Service Objects
● When coding A and B
● You can focus on only direct dependencies
● You can forget about indirect dependencies
● Airframe DI builds A, B, and direct/indirect dependencies on your behalf.
[Diagram: dependency graph of A and B over DB, Connection Pool, DB Client, DB Monitor, Fluentd Logger, and HttpClient. Caption: You can forget this part]
Replacing Modules For Testing The Service
● In Airframe Design
● You can replace DB and FluentdLogger with in-memory implementations
● How to build A and B differs, but the same code can be used
[Diagram: the same dependency graph of A and B, with DB replaced by Memory DB and Fluentd Logger replaced by In-memory Logger. Caption: Overriding Design for Testing]
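The idea of keeping the service code unchanged while swapping how it is built can be sketched in plain Scala. This is not Airframe's API: `Design`, `withLogger`, `buildA`, and every service class below are hypothetical stand-ins, and the `FluentdLogger` here only prints instead of talking to Fluentd.

```scala
object DesignOverrideSketch {
  trait Logger { def log(msg: String): Unit }

  // Hypothetical production logger; a real one would send records to Fluentd
  class FluentdLogger extends Logger {
    def log(msg: String): Unit = println(s"[fluentd] $msg")
  }

  // Test double that keeps the messages in memory for later inspection
  class InMemoryLogger extends Logger {
    val messages = scala.collection.mutable.ArrayBuffer.empty[String]
    def log(msg: String): Unit = { messages += msg; () }
  }

  class DBClient(logger: Logger) {
    def query(sql: String): String = { logger.log(sql); s"rows for: $sql" }
  }

  // A knows only its direct dependency; it never mentions the logger
  class ServiceA(db: DBClient) {
    def run(): String = db.query("select 1")
  }

  // A "design" records how each replaceable component is created
  final case class Design(newLogger: () => Logger) {
    def withLogger(f: () => Logger): Design = copy(newLogger = f)
    def buildA(): ServiceA = new ServiceA(new DBClient(newLogger()))
  }

  val productionDesign: Design = Design(() => new FluentdLogger)
}
```

A test would call `productionDesign.withLogger(() => new InMemoryLogger).buildA()` and exercise `ServiceA` exactly as production code does.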
2: Forget How to Manage Resource Lifecycle
● FILO := First-In Last-Out
● Airframe can add onStart and onShutdown lifecycle hooks when creating instances
● When closing sessions, onShutdown will be called in the reverse order
● Dependencies form a DAG
● Dependencies will be generated when creating new service objects
[Diagram: dependency graph of A and B over DB, Connection Pool, DB Client, DB Monitor, Fluentd Logger, and HttpClient. The nodes are numbered 1 to 8, and one node is marked as a shared resource.]
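FILO shutdown can be sketched with a list used as a stack. The `Session` class below is a hypothetical illustration rather than Airframe's session: it records a shutdown hook each time a resource starts and runs the hooks in reverse order on `close()`.

```scala
object LifecycleSketch {
  final class Session {
    private var shutdownHooks: List[() => Unit] = Nil
    val events = scala.collection.mutable.ArrayBuffer.empty[String]

    // Start a resource now and remember how to shut it down later
    def start(name: String): Unit = {
      events += s"start $name"
      // Prepending means the most recently started resource is shut down first
      shutdownHooks = (() => { events += s"shutdown $name"; () }) :: shutdownHooks
    }

    // Run the hooks from the head of the list: last started, first closed
    def close(): Unit = {
      shutdownHooks.foreach(hook => hook())
      shutdownHooks = Nil
    }
  }

  def demo(): List[String] = {
    val session = new Session
    session.start("ConnectionPool") // dependencies start before their users
    session.start("DBClient")
    session.start("A")
    session.close()
    session.events.toList
  }
}
```

Shutting down in reverse order ensures that a resource such as the connection pool outlives every object that still depends on it.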
3: Forget How to Use DI
Summary: Reducing Code Complexity with Airframe DI
● You can effectively forget about:
● How to build service objects
● How to manage resources in FILO order
● How to use DI itself
[Diagram: dependency graph of A and B over DB, Connection Pool, DB Client, DB Monitor, Fluentd Logger, and HttpClient, with the nodes numbered 1 to 8.]
Implementation Details
Debugging Applications: Airframe Log
● Airframe Log: A Modern Logging Library for Scala (Medium Blog)
● ANSI color, source code location support
Airframe Metrics
● Human Readable Data Format (Duration, DataSize, etc.)
● Handy Time Window String Support (Used in TD_INTERVAL)
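As an illustration of what a human-readable data size means, here is a small formatter in plain Scala. `dataSize` is a hypothetical helper, and its unit labels and 1024-based steps are assumptions; the notation that airframe-metrics itself prints may differ.

```scala
object HumanReadableSketch {
  private val units = Seq("B", "KB", "MB", "GB", "TB")

  // Divide by 1024 until the value fits the largest suitable unit
  def dataSize(bytes: Long): String = {
    var value = bytes.toDouble
    var i     = 0
    while (value >= 1024 && i < units.size - 1) {
      value /= 1024
      i += 1
    }
    if (i == 0) s"${bytes}B"
    // Locale.ROOT keeps the decimal separator stable across machines
    else "%.1f%s".formatLocal(java.util.Locale.ROOT, value, units(i))
  }

  def main(args: Array[String]): Unit =
    Seq(512L, 1536L, 1048576L).foreach(b => println(dataSize(b)))
}
```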
Airframe JMX
● Checking the internal states of remote JVM processes
● JMX clients
● jconsole has a JMX metric monitor
● Airframe JMX -> DataDog -> Dashboard
Future Work
● Airframe Stream
● Stream Query Processing Engine for MessagePack
● Can be a query engine for various types of data through MessagePack
● Airframe Fluentd
● Metric objects -> MessagePack -> Fluentd
● and more ...
[Diagram: Input -> pack -> MessagePack -> Stream SQL query processing (filter/aggregation/join, etc.) -> MessagePack -> unpack]
Current State of Airframe
● Version 0.69 (As of October 2018)
● We already had 35+ releases in 2018
● Automated Release
● Cross building libraries for Scala 2.11, 2.12, 2.13, and Scala.js
● The ‘sbt release’ command used to take 3 hours
■ Sequential steps:
○ compile -> test -> package -> upload x 18 modules x 4 Scala versions
● Now a new version can be released in 10 minutes on Travis CI
● Blog
● 3 Tips for Maintaining Scala Projects
Summary
● Airframe
● Simplicity By Design
● 18 modules for simplifying application development
● Key Technologies
● MessagePack-based codec
● airframe-surface to inspect object shapes
● Dependency injection (DI)
● Think What Can Be Simplified
● How MessagePack can be used to simplify data transformation
● Airframe DI: 3 things you can forget
Don’t Forget to Add a GitHub Star!
wvlet/airframe
Thank You!
Danke!
Merci!
谢谢!
ありがとう!
Gracias!
Kiitos!

Airframe: Lightweight Building Blocks for Scala @ TD Tech Talk 2018-10-17

  • 1. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Taro L. Saito, Ph.D. GitHub: @xerial Arm Treasure Data Airframe Lightweight Building Blocks for Scala Treasure Dataを支える技術: Airframe編 October 17th, 2018 Plazma - TD Tech Talk 1
  • 2. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Airframe ● Lightweight Building Blocks for Scala ● Essential for building any applications ● Used in production for 2+ years ● Based on my code collection since 2009 ● Initially written in Java ● Gradually migrated to Scala ● Repackaged into wvlet.airframe in 2016 ● For maintainability ● 18 Modules ● Simplifying your daily programming in Scala 2
  • 3. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Airframe ● Named From A Novel By Michael Crichton (1942-2008) ● The author of Jurassic Park 3
  • 4. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. About Me: Taro L. Saito (Leo) ● An Engineer with Research Background ● Ph.D., University of Tokyo ● DBMS & Genome Science ● Developing Query Engines in TD ● Living in US for 3+ years ● Bay Area, Silicon Valley ● Active OSS Developer ● airframe ● sqlite-jdbc ■ More than 1000 GitHub stars ● snappy-java ■ Compression library used in Spark, Parquet ● sbt-sonatype ■ Used in 2000+ Scala projects ● ... 4
  • 5. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Personal Goal of Today ● Collect 200 GitHub Stars ● keyword: Airframe + Scala ● https://github.com/wvlet/airframe 5
  • 6. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Major Goals ● Providing A Standard Toolkit For Building Reliable Services ● Removing Complexities In Application Development ● Providing Simplicity By Design 6
  • 7. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Simplicity By Design ● “Simplicity” by Philippe Dufour ● A clock made by a legendary watchmaker in Switzerland ● Every part of the clock is built by himself ● Airframe ● Provides simplicity for application developers 7
  • 8. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Application Development with Airframe ● Bootstrap ● Parsing command-line options ● Reading configuration files ● Reading databases ● Object - Data Mapping ● Mapping data to objects (object mapping) ● Saving objects to files (serialization) ● Debugging ● Logging ● Collecting metrics ● Monitoring ● Building Services ● Creating service objects using dependency injection (DI) 8
  • 9. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. 18 Airframe Modules ● Bootstrap ● airframe-config Configuration loader ● airframe-opts Command-line option parser ● Object Serialization ● airframe-codec encoder/decoder SPI + standard codecs ● airframe-msgpack pure-Scala MessagePack implementation ● airframe-tablet CSV/TSV/JSON/JDBC ResultSet <-> Object ● Monitoring & Debugging ● airframe-log Logging ● airframe-metrics Human-readable metrics for time, date, data size, etc ● airframe-jmx Object metrics provider through JMX ● Building Service Objects ● airframe Dependency injection ● airframe-surface Object type inspector ● Misc: ● airframe-control, airframe-jdbc, airframe-json, airframe-http, etc. 9
  • 10. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
  • 11. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Configuring Applications (airframe-config) ● Embedding static configurations for all environments into a docker image ● Merging YAML + external configurations + object default parameters YAML development: addr: api-dev.com production: addr: api.com Config Object case class ServerConfig( addr: String, port: Int = 8080, password: String ) production: addr: api.com Select env:production Credentials and Local Configurations Merge Immutable Object Default Parameters (e.g., port = 8080) Object Mapping 11
  • 12. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Reading And Saving Query Results ● Can we standardize this pattern? RDBMS JDBC ResultSet Seq[A] Object Mapping External Storage (Cache) Object Serialization Object Deserialization (Reload from cache) 12
  • 13. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Problem: Data Mapping is Everywhere ● How many data readers and object mappers do we need? ● How can we simplify this? YAML JDBC ResultSet YAML Parser + Object Mapper Config Object Table Object Object-Relation Mapper JSON JSON Parser + Object Mapper Object 13
  • 14. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Using MessagePack As An Intermediate Data Format ● Why MessagePack? ● Flexible to support conversions from various types of data format ● Compact and efficient compared to JSON ● Easy to create schema-on-read object mapper JDBC ResultSet MessagePack Object Pack/Unpack Unpack Pack YAML JSON 14
  • 15. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Airframe Codec: Pack/Unpack Interface ● MessageCodec[A] ● pack: Convert object A into MessagePack ● unpack: Convert MessagePack into object A Input MessagePack Output Pack Unpack PackUnpack 15
  • 16. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Pre-defined Codecs in airframe-codec ● Primitive Codecs ● ByteCodec, CharCodec, ShortCodec, IntCodec, LongCodec ● FloatCodec, DoubleCodec ● StringCodec ● BooleanCodec ● TimeStampCodec ● Collection Codec ● ArrayCodec, SeqCodec, ListCodec, IndexSeqCodec, MapCodec, etc. ● OptionCodec ● JsonCodec ● Java-specific Codec ● FileCodec, ZonedDateTimeCodec, ResultSetCodec, etc. ● Adding Custom Codecs ● Implement MessageCodec[X] interface 16
  • 17. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. ObjectCodec[A]: Combination of Codecs ● Generate Complex Codecs From The Parameter List of Objects ● MessagePack based serializer/deserializer class A( port:Int, name:String, timeoutSec:Double ) Unpack Pack IntCodec StringCodec DoubleCodec MessagePack Array Map ObjectCodec[A] 17
  • 18. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Power of Schema-On-Read Codec ● MessagePack values describe their own data types (self-describing) ● How to deserialize the data can be determined based on MessagePack types Int Float Boolean String Array Map Binary CSV MessagePack JDBC ResultSet Column Scala.Int parseInt toInt 0 or 1 IntCodec Pack Unpack Error or Zero.of[Int] “100” (string) 100 (bigint) 100 (int) 18
  • 19. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Application 1/5: Treasure Data = MessagePack DBMS ● Fluentd -> MessagePack -> Treasure Data ● Automatically Generating Schema from Data ● Apply schema–on-read for providing table data to Presto/Hive/Spark, etc. MessagePack Fluentd Mobile SDK Table Schema IntCodec StringCodec Generate Generate Table Reader Presto Hive Spark Schema-free Data 19
  • 20. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Application 2/5: Data Transformation ● Airframe Codec Works As A Lightweight Embulk List[A] MessagePack Pack Unpack TSV Pack/Unpack JSON SQLite3 20
  • 21. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Application 3/5: Taking Snapshots of Workflow Tasks ● Frequently Used for Data Analytic Pipelines ● Save Task Results As MessagePack (binary) ● Save the cost of re-computation Result: Seq[A] MessagePack Storage Pack Save Unpack Task Run Load Load Compute (e.g., 10 min) First run Snapshot 21
  • 22. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Application 4/5: Scala.js RPC ● Scala.js ● Compiling Scala code into JavaScript for Web Browsers ● Model classes can be shared between Scala and Scala.js ● airframe-msgpack ● Added pure-Scala MessagePack implementation for supporting Scala.js UserInfo MessagePack UserInfo Pack Unpack PackUnpack Scala Server Side Scala.js Client Side XML RPC 22
  • 23. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Application 5/5: Airframe HTTP Web Service ● Mapping HTTP Responses and Requests to Method Call Argument and Return Values ● Airframe HTTP: Building Low-Friction Web Services Over Finagle (Medium Blog) Http Request MessagePack Pack Request Handler Method Unpack to Function Arguments Http Response MessagePack Unpack Return value 23
  • 24. A Challenge: Type Erasure in Java Compiler (javac) ● Compiling to Java class files (bytecode) removes generic type information (type erasure) [Diagram: the compiler turns class A(data: List[B]) into class A with data: List[java.lang.Object]; from the erased class parameter type, only a List of ObjectCodec could be generated (???)]
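Type erasure can be observed directly in plain Scala. The sketch below uses only the standard library; `ErasureDemo` and its members are illustrative names, not part of Airframe.

```scala
object ErasureDemo {
  final case class B(id: Int)

  val bs: List[B]         = List(B(1), B(2))
  val names: List[String] = List("a", "b")

  // At runtime both values are plain List cells; the element type is gone
  def sameRuntimeClass: Boolean = bs.getClass == names.getClass

  // A runtime check can only ask "is this a List?", not "is this a List[B]?"
  def isSomeList(x: Any): Boolean = x.isInstanceOf[List[_]]
}
```

Because the runtime class carries no element type, a codec generator that relies on it alone cannot tell which element codec to use for `data: List[B]`.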
  • 25. Airframe Surface ● Reading Type Signatures From ScalaSig ● The Scala compiler embeds Scala type signatures (ScalaSig) in class files ● Airframe Surface ● A library for inspecting object shapes [Diagram: javac erases class A(data: List[B]) to class A with data: List[java.lang.Object]; scalac emits the same erased class plus ScalaSig data: List[B]; Surface.of[A] reads data: List[B] via scala.reflect.runtime.universe.TypeTag]
  • 27. What is Dependency Injection (DI)? ● Many Confusing Articles ● Inversion of Control Containers and the Dependency Injection pattern. Martin Fowler (2004) ● StackOverflow, Wikipedia, … ● Many Frameworks ● Spring, Google Guice, Scaldi, Macwire, Grafter, Weld, etc. ● No-framework approaches also exist (Pure-Scala DI) ● Recent Definition: ● Dependency Injection is the process of creating the static, stateless graph of service objects, where each service is parameterised by its dependencies. ■ What is Dependency Injection? by Adam Warski (2018) ● However, it’s still difficult to understand what DI is
  • 28. Simplifying DI with Airframe ● Airframe Usage ● import wvlet.airframe._ ● Simple 3-Step DI ● bind ● design ● build ● To Fully Understand DI … ● Think about what you can simplify with DI ● Thinking about DI itself doesn’t make much sense ■ e.g., comparing Guice, Airframe, etc.
  • 29. 3 Things You Can Forget with Airframe DI ● 1. How to Build Service Objects ● 2. How to Manage Resource Lifecycle ● 3. How to Use DI Itself (!!) Airframe Lets You Focus On Application Development
  • 30. 1: Forget How to Build Service Objects ● When coding A and B ● You can focus only on direct dependencies ● You can forget about indirect dependencies ● Airframe DI builds A, B, and their direct and indirect dependencies on your behalf. [Diagram: dependency graph of A and B over DB, Connection Pool, DB Client, DB Monitor, Fluentd Logger, and HttpClient; caption: You can forget this part]
  • 31. Replacing Modules For Testing The Service ● In Airframe Design ● You can replace DB and FluentdLogger with in-memory implementations ● How A and B are built differs, but the same application code can be used [Diagram: the same dependency graph with DB replaced by Memory DB and Fluentd Logger replaced by In-memory Logger; caption: Overriding Design for Testing]
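What overriding a design buys you can be shown without any framework. The following is a hand-rolled sketch in plain Scala, not the Airframe API: `Design`, `withLogger`, `buildA`, and the service classes are all hypothetical names chosen for illustration. The service code stays the same while the binding for its dependency is swapped.

```scala
import scala.collection.mutable

object DesignOverrideSketch {
  trait Logger { def log(msg: String): Unit }

  final class ConsoleLogger extends Logger {
    def log(msg: String): Unit = println(msg)
  }

  final class InMemoryLogger extends Logger {
    val messages = mutable.ArrayBuffer.empty[String]
    def log(msg: String): Unit = { messages += msg; () }
  }

  // Service A names only its direct dependency
  final class ServiceA(logger: Logger) {
    def run(): Unit = logger.log("A started")
  }

  // A "design" here is just a description of how to create each dependency
  final case class Design(newLogger: () => Logger) {
    def withLogger(f: () => Logger): Design = copy(newLogger = f)
    def buildA(): ServiceA                  = new ServiceA(newLogger())
  }

  val production: Design = Design(() => new ConsoleLogger)
}
```

A test can derive its own design with `production.withLogger(() => new InMemoryLogger)` and build the same `ServiceA`; a DI library automates this wiring for the whole dependency graph.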
  • 32. 2: Forget How to Manage Resource Lifecycle ● FILO := First-In Last-Out ● Airframe can add onStart and onShutdown lifecycle hooks when creating instances ● When closing sessions, onShutdown hooks will be called in the reverse order ● Dependencies form a DAG ● Dependencies will be generated when creating new service objects [Diagram: the dependency graph of A and B with nodes numbered 1 to 8 and a Shared Resource marked]
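The FILO shutdown order can be sketched with a small session object that remembers shutdown hooks as resources are started. This is an assumption-laden illustration in plain Scala, not Airframe's session implementation; `LifecycleSketch`, `Session`, and `register` are hypothetical names.

```scala
import scala.collection.mutable

object LifecycleSketch {
  final class Session {
    // Most recently started resource sits at the head of the list
    private var shutdownHooks: List[() => Unit] = Nil

    // Start a resource now and remember how to shut it down later
    def register(name: String, log: mutable.Buffer[String]): Unit = {
      log += s"start $name"
      shutdownHooks = (() => { log += s"shutdown $name"; () }) :: shutdownHooks
    }

    // Run the hooks head-first, i.e. in the reverse of the start order
    def close(): Unit = {
      shutdownHooks.foreach(hook => hook())
      shutdownHooks = Nil
    }
  }
}
```

Starting a connection pool, then a DB client, then service A, and closing the session shuts them down as A, DB client, connection pool, so nothing is closed while a dependent object still uses it.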
  • 33. 3: Forget How to Use DI
  • 34. Summary: Reducing Code Complexity with Airframe DI ● You can effectively forget about: ● How to build service objects ● How to manage resources in FILO order ● How to use DI itself [Diagram: the dependency graph of A and B with nodes numbered 1 to 8; caption: Implementation Details]
  • 36. Debugging Applications: Airframe Log ● Airframe Log: A Modern Logging Library for Scala (Medium Blog) ● ANSI color, source code location support
  • 37. Airframe Metrics ● Human Readable Data Format (Duration, DataSize, etc.) ● Handy Time Window String Support (Used in TD_INTERVAL)
  • 38. Airframe JMX ● Checking the internal states of remote JVM processes ● JMX clients ● jconsole has a JMX metric monitor ● Airframe JMX -> DataDog -> Dashboard
  • 40. Future Work ● Airframe Stream ● Stream Query Processing Engine for MessagePack ● Can be a query engine for various types of data through MessagePack ● Airframe Fluentd ● Metric objects -> MessagePack -> Fluentd ● and more ... [Diagram: input is packed to MessagePack, processed by Stream SQL query processing (filter, aggregation, join, etc.), and the resulting MessagePack is unpacked]
  • 41. Current State of Airframe ● Version 0.69 (As of October 2018) ● We already had 35+ releases in 2018 ● Automated Release ● Cross building libraries for Scala 2.11, 2.12, 2.13, and Scala.js ● The ‘sbt release’ command used to take 3 hours ■ Sequential steps: ○ compile -> test -> package -> upload x 18 modules x 4 Scala versions ● Now a new version can be released in 10 minutes on Travis CI ● Blog ● 3 Tips for Maintaining Scala Projects
  • 42. Summary ● Airframe ● Simplicity By Design ● 18 modules for simplifying application development ● Key Technologies ● MessagePack-based codec ● airframe-surface to inspect object shapes ● Dependency injection (DI) ● Think What Can Be Simplified ● How MessagePack can be used to simplify data transformation ● Airframe DI: 3 things you can forget ● Don’t Forget to Add a GitHub Star! wvlet/airframe
  • 43. Thank You! Danke! Merci! 谢谢! ありがとう! Gracias! Kiitos!