Wes McKinney @wesmckinn
SHARED INFRASTRUCTURE FOR
DATA SCIENCE
WES MCKINNEY @WESMCKINN
Rice Data Science Conference | October 2017
ME
IMPORTANT LEGAL INFORMATION
• The information presented here is offered for informational purposes only and should not be used for any other purpose
(including, without limitation, the making of investment decisions). Examples provided herein are for illustrative purposes
only and are not necessarily based on actual data. Nothing herein constitutes: an offer to sell or the solicitation of any
offer to buy any security or other interest; tax advice; or investment advice. This presentation shall remain the property of
Two Sigma Investments, LP (“Two Sigma”) and Two Sigma reserves the right to require the return of this presentation at
any time.
• Some of the images, logos or other material used herein may be protected by copyright and/or trademark. If so, such
copyrights and/or trademarks are most likely owned by the entity that created the material and are used purely for
identification and comment as fair use under international copyright and/or trademark laws. Use of such image, copyright
or trademark does not imply any association with such organization (or endorsement of such organization) by Two Sigma,
nor vice versa.
• Copyright © 2017 TWO SIGMA INVESTMENTS, LP. All rights reserved
THINKING ON THE LAST 10 YEARS
2007 2017
CLOSED SOURCE OPEN SOURCE
Shared front-ends
for data science
THE NEXT 10 YEARS AND BEYOND
2017 2027 …
THE AI ARMS RACE
CHANGING HARDWARE LANDSCAPE
DISK
PROCESSING
MEMORY
DATA SCIENCE LANGUAGE “SILOS”
FRONT-END
PYTHON R JVM JULIA …
WHAT’S IN A SILO?
• STORAGE / DATA ACCESS
• DATA STRUCTURES / IN-MEMORY FORMATS
• GENERAL COMPUTE ENGINE(S)
• ADVANCED ANALYTICS
Python libraries shown against these layers: pandas, NumPy, scikit-learn
RENOVATING PANDAS
MAKING THE SILOS “SMALLER”
FRONT-END
PYTHON R JVM JULIA
?
…
PROGRAMMING LANGUAGES
AS USER INTERFACES
GRAPHIC: Iceberg under sea (only top part visible to naked eye)
R
df <- read_csv(…)
df %>% group_by(…) %>% summarise(…)
PYTHON
df = read_csv(…)
df.groupby(…).aggregate(…)
SAME ANALYSIS, DIFFERENT IMPLEMENTATION
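Both snippets request the same logical operation: split rows by a key, then reduce each group. A minimal sketch of that operation in plain Python follows. The function name `group_mean` and the `city` / `temp` columns are hypothetical stand-ins for the elided arguments, and the mean is only one possible aggregate.

```python
from collections import defaultdict

def group_mean(keys, values):
    """Group `values` by the matching entry in `keys`; return the mean per group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for k, v in zip(keys, values):
        sums[k] += v
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}

# Hypothetical columns standing in for the result of read_csv(…).
city = ["NYC", "SF", "NYC", "SF", "NYC"]
temp = [10.0, 16.0, 14.0, 18.0, 12.0]

result = group_mean(city, temp)
print(result)
```

Each language silo carries its own implementation of a routine like this one, which is the duplication the slide points at.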
A SHARED RUNTIME FOR DATA SCIENCE
FRONT-END
PYTHON R JVM JULIA
SHARED DATA SCIENCE RUNTIME
…
FROM IDEA TO ACTION
PART 1: STANDARD IN-MEMORY FORMAT
R
PYTHON
JVM
PORTABLE DATA FRAME
Non-Portable Data Frames
…
PART 2: ZERO COPY INTERCHANGE
R PYTHON JVM
SHARED MEMORY + STANDARD MEMORY FORMATS
…
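To make "zero copy" concrete, here is a small single-process sketch using Python's buffer protocol. It is an analogy only: two consumers read the same bytes through `memoryview` objects instead of receiving copies. Sharing across languages and processes additionally needs the agreed memory format that the slide names.

```python
from array import array

# One producer allocates a contiguous buffer of 64-bit floats.
column = array("d", [1.0, 2.0, 3.0, 4.0])

# Two "consumers" get views over the same memory; no bytes are copied.
view_a = memoryview(column)
view_b = memoryview(column)[1:3]   # a zero-copy slice

# A write through one view is visible through the other, which shows
# that both refer to the same underlying buffer.
view_a[1] = 20.0

print(view_b.tolist())
print(column.tolist())
```

Because every consumer agrees on the element type and layout, handing over the data costs the same no matter how large the buffer is.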
PART 3: HIGH PERFORMANCE DATA ACCESS
BINARY
COLUMNAR
CSV
SQL
PORTABLE DATA FRAME
Storage Formats / Databases
…
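As a rough illustration of a data connector, the sketch below parses a CSV payload straight into one typed, contiguous buffer per column. The payload, the column names and the chosen types are all assumptions made for the example. A production connector would infer types and handle missing values.

```python
import csv
import io
from array import array

# Hypothetical CSV payload; in practice this would be a file or a query result.
raw = io.StringIO("id,price\n1,9.5\n2,3.25\n3,7.0\n")

reader = csv.reader(raw)
header = next(reader)

# One typed, contiguous buffer per column instead of one Python object per row.
columns = {"id": array("q"), "price": array("d")}
for row in reader:
    columns["id"].append(int(row[0]))
    columns["price"].append(float(row[1]))

print(header)
print(columns["id"].tolist(), columns["price"].tolist())
```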
PART 4: FLEXIBLE COMPUTATION ENGINE
• Zero-overhead User-defined Functions
• Portable Operator “Graphs”
• “Embeddable” in Larger Systems
APACHE ARROW
Language-agnostic Data Frame Format
Zero-Copy Interchange
Simple, fast data interchange
GRAPHIC: Without Arrow vs. With Arrow
Big picture Arrow goals
• Cache-efficient columnar memory: optimized for CPU affinity and SIMD / parallel processing, O(1) random value access
• Zero-copy messaging / IPC: language-agnostic metadata, batch/file-based and streaming binary formats
• Complex schema support: flat and nested data types
• Main implementations in C++ and Java: with integration tests
• Bindings / implementations for C, Python, Ruby, JavaScript in various stages of development
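The columnar-memory goal can be sketched with a simplified nullable integer column: a contiguous values buffer plus a validity bitmap with one bit per slot. The class `Int64Column` is hypothetical and only loosely modeled on the idea of Arrow's layout. It is not the Arrow API, and it ignores details such as padding and alignment.

```python
from array import array

class Int64Column:
    """Simplified nullable int64 column: a values buffer plus a validity bitmap."""

    def __init__(self, items):
        self.length = len(items)
        # Contiguous 64-bit values; null slots hold a placeholder 0.
        self.values = array("q", [0 if x is None else x for x in items])
        # One bit per slot, least-significant bit first; 1 means "valid".
        self.validity = bytearray((self.length + 7) // 8)
        for i, x in enumerate(items):
            if x is not None:
                self.validity[i // 8] |= 1 << (i % 8)

    def __getitem__(self, i):
        # O(1): one bit test and one fixed-offset read, regardless of length.
        if not (self.validity[i // 8] >> (i % 8)) & 1:
            return None
        return self.values[i]

col = Int64Column([7, None, 42, 5])
print(col[0], col[1], col[2])
print(bin(col.validity[0]))
```

Keeping values of one column next to each other is what makes sequential scans cache-friendly and lets a consumer locate any slot by arithmetic alone.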
BUILDING THE ARROW FORMAT
• “Superset” of representations supported by R, pandas, SQL engines
• Optimized for CPU cache affinity
• ASF Governance: Open + Transparent Community Project
FEATHER: MINIMALIST ARROW ON DISK
Some Arrow OSS Users
Feather Format
Ray Project
FROM ARROW TO PANDAS2
Logical Operator Graphs
(a + b).log()
GRAPHIC: operator tree with Log at the root, Add beneath it, and inputs a and b as leaves
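One way to picture a logical operator graph is as a small tree that is built when the expression is written and evaluated only later. The sketch below is an illustration under that assumption. `Node`, `column` and `evaluate` are hypothetical names, not part of pandas or Arrow.

```python
import math

class Node:
    """A node in a logical operator graph; nothing is computed at build time."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op
        self.inputs = inputs
        self.name = name

    def __add__(self, other):
        return Node("add", (self, other))

    def log(self):
        return Node("log", (self,))

    def __repr__(self):
        if self.op == "column":
            return self.name
        return f"{self.op}({', '.join(map(repr, self.inputs))})"

def column(name):
    return Node("column", name=name)

def evaluate(node, data):
    """Walk the graph bottom-up and apply a kernel at each operator node."""
    if node.op == "column":
        return data[node.name]
    args = [evaluate(child, data) for child in node.inputs]
    if node.op == "add":
        return [x + y for x, y in zip(*args)]
    if node.op == "log":
        return [math.log(x) for x in args[0]]
    raise ValueError(f"unknown operator: {node.op}")

a, b = column("a"), column("b")
expr = (a + b).log()
print(expr)
result = evaluate(expr, {"a": [1.0, 2.0], "b": [0.0, -1.0]})
print(result)
```

Separating the description of the computation from its execution is what gives an engine room to reorder, parallelize or fuse operators before any data is touched.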
Terminology
• Kernel functions: atomic units of computation
• Operator nodes: input/output types, operator parallelism properties
Parallel Execution of Operator Graphs
GRAPHIC: a, b → ADD → tmp → LOG → out
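The ADD then LOG pipeline can be run chunk by chunk, with each chunk scheduled independently. The sketch below shows only the scheduling structure, using a thread pool from the standard library. The kernel names and the chunk size are assumptions, and pure-Python kernels will not speed up under CPython threads because of the global interpreter lock.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def add_kernel(xs, ys):
    return [x + y for x, y in zip(xs, ys)]

def log_kernel(xs):
    return [math.log(x) for x in xs]

def run_chunk(chunk):
    a_part, b_part = chunk
    tmp = add_kernel(a_part, b_part)   # ADD produces the intermediate `tmp`
    return log_kernel(tmp)             # LOG consumes `tmp` and produces `out`

def chunked(a, b, size):
    """Yield aligned slices of both inputs, `size` elements at a time."""
    for start in range(0, len(a), size):
        yield a[start:start + size], b[start:start + size]

a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [0.0, 0.0, 0.0, 0.0, 0.0]

# map() returns results in submission order, so chunks reassemble correctly.
with ThreadPoolExecutor(max_workers=2) as pool:
    parts = list(pool.map(run_chunk, chunked(a, b, size=2)))

out = [x for part in parts for x in part]
print(out)
```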
Some Optimization Strategies
• Multicore scheduling
• Elimination of temporaries
• Operator fusion / pipelining
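Elimination of temporaries and operator fusion can be seen by comparing two ways of computing the same result. This is a sketch in plain Python with hypothetical function names. A real engine would fuse compiled kernels, not list comprehensions.

```python
import math

def unfused(a, b):
    # Two passes: ADD materializes a full-size temporary before LOG runs.
    tmp = [x + y for x, y in zip(a, b)]
    return [math.log(t) for t in tmp]

def fused(a, b):
    # One pass: ADD and LOG are applied per element, with no intermediate list.
    return [math.log(x + y) for x, y in zip(a, b)]

a = [1.0, 2.0, 3.0]
b = [1.0, 2.0, 3.0]

assert unfused(a, b) == fused(a, b)
print(fused(a, b))
```

The fused version touches each input element once and allocates only the output, which is the saving the last two bullets describe.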
Arrow-optimized data connectors
Arrow in-memory format
Logical Data Frame Expression Graphs
Parallel Dataflow Execution Engine
Python user API, DataFrame semantics, User-defined functions
pandas2
Apache Arrow
BUILDING THE FUTURE
THANK YOU
WES MCKINNEY @WESMCKINN
Apache Arrow: http://arrow.apache.org
