Python VS GO

INTEGRATION DAY 2018 - Python vs Go

Agenda
• Background
• Go Intro
• Our experiments
• Python optimizations
• Go optimizations
• Conclusions

Our Stack
• Our stack relies heavily on Python:
– Celery as our task distribution infrastructure
– Django as our web framework and ORM layer
– All of our data pipeline tasks are written in Python
– We use a lot of great Python libraries to parse third parties' data
– We all love Python :)

Our Analytics Stack
[Diagram: S3, File Preload, Processing, Data Source Integration, Periodic Druid Loader]

The problem
• We need to ingest (almost) user-level data into our aggregated analytics data pipeline
[Diagram: S3, File Preload, Processing, Data Source Integration, Periodic Druid Loader, plus User-Level Stack and Daily Export]

The problem
Building a new data integration for our user-level stack -
• Download the last S3 files
• Read each one of the CSV files and parse the data
• Group by set of dimensions and aggregate all data in memory
• Dump the aggregated data to S3
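The parse, group and dump steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code that was benchmarked: the column names in `DIMENSIONS` and `METRICS` are hypothetical, and the S3 download and upload are left out so that the sketch works on any open file handles.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names; the real dimensions and metrics are not given in the slides.
DIMENSIONS = ("date", "country", "campaign")
METRICS = ("installs", "revenue")


def aggregate(csv_files, dimensions=DIMENSIONS, metrics=METRICS):
    """Group rows by the dimension columns and sum the metric columns in memory."""
    totals = defaultdict(lambda: [0.0] * len(metrics))
    for handle in csv_files:
        for row in csv.DictReader(handle):
            key = tuple(row[d] for d in dimensions)
            bucket = totals[key]
            for i, metric in enumerate(metrics):
                bucket[i] += float(row[metric])
    return totals


def dump(totals, out, dimensions=DIMENSIONS, metrics=METRICS):
    """Write a header plus one aggregated row per dimension combination."""
    writer = csv.writer(out)
    writer.writerow(list(dimensions) + list(metrics))
    for key in sorted(totals):
        writer.writerow(list(key) + totals[key])


# Small in-memory input standing in for one downloaded CSV file.
sample = io.StringIO(
    "date,country,campaign,installs,revenue\n"
    "2018-01-01,US,a,1,0.5\n"
    "2018-01-01,US,a,2,1.5\n"
    "2018-01-01,DE,b,4,2.0\n"
)
totals = aggregate([sample])
out = io.StringIO()
dump(totals, out)
```

Keying the dictionary on a tuple of dimension values keeps one small list of running sums per group, so memory grows with the number of distinct groups rather than with the number of input rows.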

The Questions
• Should we just optimize our Python code, or can we rewrite it in Go and gain performance improvements for "free"?
• We have multi-core beasts running in AWS. Should we try to parallelize better? If so, how does Go's performance compare with Python's then?
• Suppose Go is faster. The huge downside is that most people in the company don't know Go (yet). Should we still consider it?

Golang
• 8 years of Go
• Open source project
• Created at Google
• Compiled language
• Statically typed
• Garbage collection
• GoLand
• Designed by Robert Griesemer, Rob Pike & Ken Thompson

Golang
• Concurrency is easy
– Goroutines
• Come with built-in primitives for communicating with each other (channels)
• Start faster than threads and let you avoid mutexes when sharing data structures
• Are multiplexed onto a small number of OS threads (there is no 1:1 mapping to OS threads)

Golang
• Goroutine example -
f("hello", "world") // f runs; we wait
go f("hello", "world") // f starts running
g() // does not wait for f to return

Golang
• Channels example -
timerChan := make(chan time.Time)
go func() {
    time.Sleep(deltaT)
    timerChan <- time.Now() // send time on timerChan
}()
// Do something else; when ready, receive.
// Receive will block until timerChan delivers.
completedAt := <-timerChan
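For comparison, the closest standard-library analogue on the Python side is a thread that hands its result over through a `queue.Queue`. The sketch below mirrors the channel example; `DELTA_T` and the function name `timer` are chosen for illustration only.

```python
import queue
import threading
import time

DELTA_T = 0.05  # seconds; arbitrary value for the example

timer_queue = queue.Queue()


def timer():
    time.sleep(DELTA_T)
    timer_queue.put(time.monotonic())  # send the time on the queue


started_at = time.monotonic()
threading.Thread(target=timer).start()
# Do something else; when ready, receive.
completed_at = timer_queue.get()  # blocks until the worker thread delivers
```

Unlike a goroutine, each Python thread is a full OS thread, which is one reason the slides treat threads as a concurrency tool rather than a cheap unit of work.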

Golang
• Go intentionally leaves out many features of modern OOP languages -
– No Classes (and obviously doesn't support inheritance)
– No constructors
– No annotations
– No generics (as of 2018; added later, in Go 1.18)
– No exceptions

Go VS Python - Our Experiments
• A light-weight version of our user-level integration code.
• The data set -
– 2 months of data for 1 customer (60 files)
– ~1GB compressed / ~18GB uncompressed
– Output file should be 971MB with 5,836,455 lines
• Code Structure -
– Read each one of the CSV files and parse it
– Group by set of dimensions and aggregate all data in memory
– Dump aggregated results to disk

Implementation                             Run Time        Memory
Python Naive Implementation                12:57 minutes   6.96 GiB
Go Naive Implementation
Python with Pypy                           08:43 minutes   10.0 GiB
Go with Goroutines                         09:01 minutes   7.45 GiB
Go with Goroutines - minimum allocations
Python3.6 with multiprocessing pool
Python with Pandas                         09:23 minutes   8.67 GiB

• Conclusions from our benchmark -
– Naive implementations
• Go was ~25% faster
• Python's memory usage was 15% better
– PyPy performs quite impressively
• Not fun in production
– Go with minimum allocations performs the best

Go VS Python - Other Benchmarks
• The benchmark game - https://benchmarksgame.alioth.debian.org/u64q/go.html

Data Structures in Python (2.7) - Comparison
object                creation time  set item  get item  empty size (bytes)  size at len=30 (bytes)
tuple                 20.1 ns        -         45.9 ns   56                  296
namedtuple            1.09 µs        -         121 ns    56                  296
list                  267 ns         49.4 ns   37.4 ns   72                  352
dict                  1.3 µs         54.7 ns   46.3 ns   280                 3352
dict (Py3.6)          812 ns         42.7 ns   36.5 ns   240                 1184
class                 2.87 µs        72.1 ns   56.4 ns   344                 4643
class with __slots__  1.92 µs        63.1 ns   55.3 ns   16                  325

Comparison - Results
- In Python 2.7, dicts are very memory-inefficient
  - In Python 3 they take about ⅓ of the memory
- Classes are less efficient
- Lists, tuples and dicts have similar access times
- Tuples are the most efficient
- namedtuple beats a class in creation time and memory
  - Its access times are slower than those of a list or dict
- Classes with __slots__ are memory-efficient, but they are slow to create
  - Try to optimize by re-using instances
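The memory gap between a regular class and a class with `__slots__` can be checked with `sys.getsizeof`. The sketch below is illustrative only: the class names are made up, and the absolute numbers depend on the interpreter version, so they will not match the Python 2.7 figures in the table.

```python
import sys
from collections import namedtuple


class Plain(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b


class Slotted(object):
    __slots__ = ("a", "b")  # no per-instance __dict__ is allocated

    def __init__(self, a, b):
        self.a = a
        self.b = b


Pair = namedtuple("Pair", ["a", "b"])

plain, slotted, pair = Plain(1, 2), Slotted(1, 2), Pair(1, 2)

# sys.getsizeof does not follow references, so the instance __dict__ is added by hand.
plain_size = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
slotted_size = sys.getsizeof(slotted)
pair_size = sys.getsizeof(pair)
print(plain_size, slotted_size, pair_size)
```

The trade-off is flexibility: a slotted instance rejects any attribute that is not listed in `__slots__`.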

Garbage Collection
- In Python, objects are deallocated when no references to them remain
- Object references can be stored in many places in the code
import sys
import objgraph  # third-party package
print(sys.getrefcount(obj))  # number of references to obj
objgraph.show_refs(obj, filename='pasten.png')  # draw the objects that obj refers to

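Garbage Collection
- To avoid holding too many references you can use the weakref module -
import sys
import weakref
class Dict(dict):  # a plain dict cannot be weakly referenced; a subclass can
    pass
my_fat_dict = Dict(tons_of_data='pasten')
print(sys.getrefcount(my_fat_dict))  # count before creating the weak reference
dict_ref = weakref.ref(my_fat_dict)
print(sys.getrefcount(my_fat_dict))  # same count: a weak reference does not add to it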
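Because a weak reference does not count towards the reference total, it also does not keep its target alive. The following sketch, written for Python 3 and reusing the names from the slide, shows the referent disappearing once the last strong reference is dropped.

```python
import gc
import weakref


class Dict(dict):
    """Plain dict instances cannot be weakly referenced; a subclass can."""


my_fat_dict = Dict(tons_of_data="pasten")
dict_ref = weakref.ref(my_fat_dict)

alive_before = dict_ref() is my_fat_dict  # the referent is still reachable

del my_fat_dict  # drop the only strong reference
gc.collect()     # not needed on CPython, but harmless on other interpreters

gone_after = dict_ref() is None  # the weak reference did not keep it alive
print(alive_before, gone_after)
```

Calling the weak reference returns the object while it exists and `None` afterwards, so callers need to handle both cases.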

Garbage Collection
• The GC is mainly responsible for resolving cyclic references -
• The GC does not run in real time; it runs periodically. You can also run it manually to release memory at a certain point -
– gc.collect()
import gc
gc.disable()     # turn off automatic collection
lst = []
lst.append(lst)  # the list now references itself
del lst          # the cycle keeps it alive until gc.collect() runs
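The effect of a manual collection can be observed with a weak reference used as a probe. This is a minimal sketch for CPython; the `Node` class is hypothetical and exists only to build a self-referencing object that supports weak references.

```python
import gc
import weakref


class Node(object):
    def __init__(self):
        self.me = self  # reference cycle: the object points at itself


gc.disable()  # stop automatic collection for the demonstration
node = Node()
probe = weakref.ref(node)
del node  # the reference count stays above zero because of the cycle

leaked = probe() is not None  # still in memory: only the cycle collector can free it
found = gc.collect()          # manual run; returns the number of unreachable objects found
freed = probe() is None
gc.enable()
```

Reference counting alone never frees `node` here, which is why long-running jobs that disable the GC for speed have to call `gc.collect()` themselves at suitable points.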

Plan your Cache!
This cache implementation is useless!
• This code will have lots of cache misses

Fastcache Vs Cachetools
• We process a lot of rows!
– We have a lot of DB queries per row!
– We need to have a good caching mechanism.
• fastcache - https://github.com/pbrady/fastcache
– 10x faster when using the untyped version
– 24x faster when using the typed one
– Implemented in C using the Python headers (compiles simply, with no dependencies on external libraries)
– Improved our run time by 20%!
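The per-row caching idea can be sketched with the standard library's `functools.lru_cache`. fastcache describes itself as a C implementation of `lru_cache`, so swapping the decorator should be a small change, but treat that as an assumption to verify against its README. The functions `query_db` and `get_app` and the row layout are hypothetical.

```python
from functools import lru_cache

calls = []  # records every simulated DB round trip


def query_db(app_id):
    """Hypothetical stand-in for an expensive per-row DB lookup."""
    calls.append(app_id)
    return {"app_id": app_id, "name": "app-%s" % app_id}


@lru_cache(maxsize=1024)
def get_app(app_id):
    # Arguments must be hashable, and the cached value is shared between callers.
    return query_db(app_id)


rows = [{"app_id": i % 3} for i in range(9)]  # 9 rows, only 3 distinct keys
names = [get_app(row["app_id"])["name"] for row in rows]
info = get_app.cache_info()
```

`cache_info()` reports hits and misses, which makes it easy to check whether the cache key and `maxsize` actually fit the data before relying on the cache in production.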

Plan Your Concurrency
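var numCPU = runtime.NumCPU() // number of CPU cores; a var, because a function call is not a constant

// Vector and DoSome are defined elsewhere; DoSome signals on c when its slice is done.
func (v Vector) DoAll(u Vector) {
    c := make(chan int, numCPU)
    for i := 0; i < numCPU; i++ {
        go v.DoSome(i*len(v)/numCPU, (i+1)*len(v)/numCPU, u, c)
    }
    // Drain the channel.
    for i := 0; i < numCPU; i++ {
        <-c // wait for one task to complete
    }
    // All done.
}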

func Query(conns []Conn, query string) Result {
    ch := make(chan Result, len(conns)) // buffered, so slower goroutines never block
    for _, conn := range conns {
        go func(c Conn) {
            ch <- c.DoQuery(query)
        }(conn)
    }
    return <-ch // return the first result to arrive
}

• There are also some patterns out there -
– Future
– Generator
– Fan-in, Fan-out
– Many more
– ...

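import "net/http"

// data carries the outcome of the request (its fields are an assumption;
// the slide does not define the type).
type data struct {
    resp *http.Response
    err  error
}

func futureData(url string) <-chan data {
    c := make(chan data, 1)
    go func() {
        resp, err := http.Get(url)
        c <- data{resp, err}
    }()
    return c
}

func main() {
    future := futureData("http://test.future.com")
    // do many other things
    result := <-future // blocks until the request has finished
    // do something with result.resp after checking result.err
    _ = result
}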

[]byte Vs string in Go
• A []byte is essentially just this -
type slice struct {
    data uintptr
    len  int
    cap  int
}
• A string is essentially just this -
type string struct {
    data uintptr
    len  int
}

• So, if you have a []byte and you need a string -
– Would the Go compiler just copy the data pointer and length from the []byte into the string?
• No!
• Each time you do []byte("pasten") or string([]byte{0x40,0x40}) there's an allocation and a copy

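import (
    "unsafe"
)

// BytesSliceToString reinterprets the slice header as a string header, so no
// allocation or copy takes place. The string shares ba's memory: ba must not
// be modified afterwards.
func BytesSliceToString(ba []byte) string {
    return *(*string)(unsafe.Pointer(&ba))
}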

Concurrency Vs Parallelism
Concurrency is about dealing with lots of things at once. Parallelism is
about doing lots of things at once.

Concurrency Vs Parallelism
Python -
• Threads
– Concurrency
• Processes
– Parallelism
• Gevent (/greenlets)
– Concurrency
Go -
• Goroutines
– Parallelizable, but mainly concurrent.
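On the Python side, the "Processes" route to parallelism might look like the following minimal sketch with a `multiprocessing` pool. It is not the benchmarked implementation; `aggregate_chunk`, `aggregate_parallel` and the `(key, value)` row shape are hypothetical.

```python
from collections import Counter
from multiprocessing import Pool


def aggregate_chunk(rows):
    """CPU-bound work for one chunk: sum the values per key."""
    totals = Counter()
    for key, value in rows:
        totals[key] += value
    return totals


def aggregate_parallel(chunks, processes=2):
    """Fan the chunks out to worker processes, then merge the partial results."""
    with Pool(processes=processes) as pool:
        partials = pool.map(aggregate_chunk, chunks)
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds the counts together
    return merged
```

A call such as `aggregate_parallel([[("US", 1), ("DE", 2)], [("US", 3)]])` belongs under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Every chunk and every partial result is pickled across the process boundary, which is part of the memory and run-time cost that the slides attribute to processes.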

Python Vs Go
Python -
• Threads
– Not so good for CPU-bound tasks
• Processes
– Less efficient in memory
• Gevent (/greenlets)
– Not so good for CPU-bound tasks (though less so than threads)
• Many efficient libraries
– Load C libraries (not a lot of room for further optimizations)
• Development is easier and faster
Go -
• Goroutines
– Faster startup time than threads
– Memory is used only when needed - stacks start small but grow and shrink as required
– Cheap and lightweight
– You can run many of them (more than threads)
• Easy to learn!

Example of Relevant Use Cases
• A tool for importing user-level data
– Sends millions of events
• A micro-service for handling incoming events
– Needs to handle thousands of events per second
• Parsing and processing big CSV files
– processes tens of millions of rows
• Writing to a DB at high throughput

Summary
• We introduced our background to the problem
• We demonstrated our benchmarks
• We talked (a bit) about Python & Go optimizations
• We explained and concluded our way of thinking about it
