INTEGRATION DAY 2018 -
Python Vs Go
Agenda
• Background
• Go Intro
• Our experiments
• Python optimizations
• Go optimizations
• Conclusions
Background
Our Stack
• Our stack relies heavily on Python -
– Celery as our task distribution infra
– Django as our web framework and ORM layer
– All of our data pipeline tasks are written in Python
– We use a lot of great Python libs to parse 3rd parties' data
– We all love Python :)
Our Analytics Stack
[Diagram: S3, File Preload, Processing, Data Source Integration, Periodic Druid Loader]
The problem
• We need to ingest (almost) user-level data into our aggregated analytics data pipeline
[Diagram: S3, File Preload, Processing, Data Source Integration, Periodic Druid Loader, User-Level Stack, Daily Export]
The problem
Building a new data integration for our user-level stack -
• Download the last S3 files
• Read each one of the CSV files and parse the data
• Group by set of dimensions and aggregate all data in memory
• Dump the aggregated data to S3
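The parse and group-by steps above could be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the column names (date, country, installs, revenue) and the helper aggregate_csv are hypothetical, and an in-memory string stands in for a downloaded S3 file.

```python
import csv
import io
from collections import defaultdict


def aggregate_csv(lines, dimensions, metrics):
    """Group CSV rows by the dimension columns and sum the metric columns."""
    totals = defaultdict(lambda: [0.0] * len(metrics))
    for row in csv.DictReader(lines):
        key = tuple(row[d] for d in dimensions)   # one bucket per dimension combination
        bucket = totals[key]
        for i, metric in enumerate(metrics):
            bucket[i] += float(row[metric])
    return dict(totals)


# Hypothetical sample data standing in for one downloaded file.
sample = io.StringIO(
    "date,country,installs,revenue\n"
    "2018-01-01,US,1,0.5\n"
    "2018-01-01,US,2,1.5\n"
    "2018-01-01,IL,4,2.0\n"
)
result = aggregate_csv(sample, ["date", "country"], ["installs", "revenue"])
for key, sums in sorted(result.items()):
    print(key, sums)
```

Everything is held in one dict keyed by the dimension tuple, which is why memory use matters as much as run time for this kind of job.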
The Questions
• Should we just optimize our Python code, or can we rewrite it in Go and gain performance improvements for "free"?
• We have multi-core beasts working in AWS - should we try to parallelize better? If so, how does performance look with Go vs. Python?
• Let's say Go is faster. A huge downside is that most people in the company don't know Go (yet) - should we still consider it?
Go
Golang
• 8 years of Go
• Open source project
• Created at Google
• Compiled language
• Statically typed
• Garbage collection
• GoLand
• Designed by Robert Griesemer, Rob Pike & Ken Thompson
Golang
• Concurrency is easy
– Goroutines
• Come with built-in primitives for communicating with each other (channels)
• Start up faster than threads, and channels let you avoid mutexes when sharing data structures
• Goroutines are multiplexed onto a small number of OS threads (there is no 1:1 mapping to OS threads)
Golang
• Goroutine example -
f("hello", "world") // f runs; we wait
go f("hello", "world") // f starts running
g() // does not wait for f to return
Golang
• Channels example -
timerChan := make(chan time.Time)
go func() {
    time.Sleep(deltaT)
    timerChan <- time.Now() // send time on timerChan
}()
// Do something else; when ready, receive.
// Receive will block until timerChan delivers.
completedAt := <-timerChan
Golang
• Go intentionally leaves out many features of modern OOP languages -
– No Classes (and obviously doesn't support inheritance)
– No constructors
– No annotations
– No generics (as of 2018; type parameters were added later, in Go 1.18)
– No exceptions
Go VS Python - Our Experiments
• A light-weight version of our user-level integration code.
• The data set -
– 2 months of data for 1 customer (60 files)
– ~1GB compressed / ~18GB uncompressed
– Output file should be 971MB with 5,836,455 lines
• Code structure -
– Read each one of the CSV files and parse it
– Group by set of dimensions and aggregate all data in memory
– Dump aggregated results to disk
Go VS Python - Our Experiments
Implementation                             Run Time         Memory
Python Naive Implementation                12:57 minutes    6.96 GiB
Go Naive Implementation                    10:03 minutes    8.15 GiB
Python with PyPy                           08:43 minutes    10.0 GiB
Go with Goroutines                         09:01 minutes    7.45 GiB
Go with Goroutines - minimum allocations   03:33 minutes    3.90 GiB
Python3.6 with multiprocessing pool        11:01 minutes    7.02 GiB
Python with Pandas                         09:23 minutes    8.67 GiB
Go VS Python - Our Experiments
• Conclusions from our benchmark -
– Naive implementations
  • Go was ~25% faster
  • Python used ~15% less memory
– PyPy performs quite impressively
  • Not fun in production
– Go with minimum allocations performs best
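As a quick check, the naive-implementation percentages can be recomputed from the benchmark table. The snippet below is only arithmetic on the published numbers; the helper to_seconds is a hypothetical name introduced for the illustration.

```python
def to_seconds(mmss):
    """Convert a 'MM:SS' run time from the table into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


python_naive_s = to_seconds("12:57")
go_naive_s = to_seconds("10:03")
# Fraction of the Python run time that the naive Go version saves.
time_saved = (python_naive_s - go_naive_s) / python_naive_s

python_mem_gib, go_mem_gib = 6.96, 8.15
# Fraction of the Go memory footprint that the naive Python version saves.
mem_saved = (go_mem_gib - python_mem_gib) / go_mem_gib

print("Go naive run time: %.1f%% shorter" % (100 * time_saved))
print("Python naive memory: %.1f%% smaller" % (100 * mem_saved))
```

This gives roughly 22% and 15%, in line with the rounded figures on the slide.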
Aerospike Write Benchmark
Go VS Python - Other Benchmarks
The benchmark game -
• https://benchmarksgame.alioth.debian.org/u64q/go.html
Python Optimizations
What is __slots__?
Data Structures in Python (2.7) - Comparison
object                 creation time   set item   get item   empty size   len=30
tuple                  20.1 ns         -          45.9 ns    56           296
namedtuple             1.09 µs         -          121 ns     56           296
list                   267 ns          49.4 ns    37.4 ns    72           352
dict                   1.3 µs          54.7 ns    46.3 ns    280          3352
dict (Py3.6)           812 ns          42.7 ns    36.5 ns    240          1184
class                  2.87 µs         72.1 ns    56.4 ns    344          4643
class with __slots__   1.92 µs         63.1 ns    55.3 ns    16           325
Comparison - Results
- In Python 2.7, dicts are very memory-inefficient
  - In Python 3.6 they take roughly ⅓ of the memory (at len=30)
- Classes are even less efficient
- lists, tuples and dicts have similar access times
  - tuples are the most efficient
- namedtuple beats a class in creation time and memory
  - but its access time is slower than a list's or dict's
- Classes with __slots__ are memory-efficient, but they are slow to create
  - Try to optimize by reusing objects instead of re-creating them
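To make the __slots__ row concrete, here is a minimal sketch of what declaring __slots__ changes. The class names PointDict and PointSlots are hypothetical; the point is only that a slotted instance has no per-instance __dict__, which is where the memory saving comes from.

```python
class PointDict:
    """Regular class: attributes live in a per-instance __dict__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointSlots:
    """Slotted class: attributes live in fixed slots, with no __dict__."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


plain = PointDict(1, 2)
slotted = PointSlots(1, 2)

has_dict_plain = hasattr(plain, "__dict__")
has_dict_slotted = hasattr(slotted, "__dict__")

# The trade-off: attributes not listed in __slots__ cannot be added later.
try:
    slotted.z = 3
    rejected = False
except AttributeError:
    rejected = True

print(has_dict_plain, has_dict_slotted, rejected)
```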
Garbage Collection
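- In Python (CPython), objects are deallocated when no references to them remain
- Object references can be stored in many places in the code
import sys
import objgraph  # third-party package
print(sys.getrefcount(obj))  # number of references currently held to obj
objgraph.show_refs(obj, filename='pasten.png')  # render obj's reference graph to an image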
Garbage Collection
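- To avoid holding extra (strong) references, you can use the weakref module -
import sys
import weakref
class Dict(dict):  # plain dict instances cannot be weakly referenced; a subclass can
    pass
my_fat_dict = Dict(tons_of_data='pasten')
print(sys.getrefcount(my_fat_dict))  # count before
dict_ref = weakref.ref(my_fat_dict)
print(sys.getrefcount(my_fat_dict))  # same count: the weak reference does not add to it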
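The practical consequence can be sketched as follows: a weak reference lets you reach the object while it is alive, but does not keep it alive. This assumes CPython, where dropping the last strong reference frees the object immediately; other interpreters may delay it.

```python
import weakref


class Dict(dict):
    """Subclass of dict, needed because plain dicts cannot be weakly referenced."""


my_fat_dict = Dict(tons_of_data="pasten")
dict_ref = weakref.ref(my_fat_dict)

# While a strong reference exists, calling the weakref returns the object.
alive = dict_ref() is my_fat_dict

# Drop the only strong reference; the weak reference does not keep it alive.
del my_fat_dict
gone = dict_ref() is None

print(alive, gone)
```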
Garbage Collection
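• The GC is mainly responsible for resolving cyclic references -
import gc
gc.disable()     # with the collector switched off...
lst = []
lst.append(lst)  # ...this self-referencing list...
del lst          # ...is not freed here: its reference count never drops to zero
• The GC does not run continuously; it is triggered periodically (based on allocation thresholds), and you can also run it manually to release memory at a certain point -
– gc.collect()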
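A small sketch of the effect of a manual collection, assuming CPython. The Node class is hypothetical and exists only to build a reference cycle; a weak reference is used as a probe to observe whether the object is still alive.

```python
import gc
import weakref


class Node:
    """Hypothetical object that references itself, forming a cycle."""

    def __init__(self):
        self.me = self


gc.disable()   # switch off automatic cyclic collection
gc.collect()   # start from a clean state

node = Node()
probe = weakref.ref(node)   # lets us check liveness without a strong reference

del node                    # the cycle keeps the reference count above zero
alive_before = probe() is not None

collected = gc.collect()    # a manual run finds and frees the cycle
alive_after = probe() is not None

gc.enable()                 # restore the default behaviour
print(alive_before, alive_after, collected)
```

Reference counting alone cannot reclaim the object; only the cyclic collector does, which is why long-running jobs that build cycles can benefit from an explicit gc.collect() at known points.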
Plan your Cache!
This cache implementation is useless!
• This code will have lots of cache misses
Fastcache Vs Cachetools
• We process a lot of rows!
– We have a lot of DB queries per row!
– We need to have a good caching mechanism.
• fastcache - https://github.com/pbrady/fastcache
– 10x faster when using the untyped version
– 24x faster when using the typed one
– Implemented in C using the Python headers (compiles simply, with no dependencies on external libs)
– Improved our run-time by 20%!
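The per-row caching idea can be sketched with the standard library's functools.lru_cache, swapped in here so the example runs without extra packages. fastcache describes its clru_cache decorator as a drop-in replacement, so the same shape should apply, though that is an assumption not verified here. The names query_db and app_name, and the row layout, are hypothetical.

```python
from functools import lru_cache

calls = {"db": 0}


def query_db(app_id):
    """Stand-in for an expensive DB query (hypothetical)."""
    calls["db"] += 1
    return {"app_id": app_id, "name": "app-%d" % app_id}


@lru_cache(maxsize=1024)
def app_name(app_id):
    # Cache on the small, hashable key only. Keying the cache on the whole
    # row would make nearly every call a miss.
    return query_db(app_id)["name"]


# 1000 rows that reference only 3 distinct apps.
rows = [{"app_id": i % 3, "event": i} for i in range(1000)]
names = [app_name(row["app_id"]) for row in rows]

info = app_name.cache_info()
print(calls["db"], info.hits, info.misses)
```

Choosing the cache key is the planning step: the fewer distinct keys the rows map to, the higher the hit rate.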
Go Optimizations
Plan Your Concurrency
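import "runtime"

// A function call cannot initialize a const, so numCPU is a var.
var numCPU = runtime.NumCPU() // number of CPU cores

// Vector and its DoSome method are assumed to be defined elsewhere
// (this example follows Effective Go).
func (v Vector) DoAll(u Vector) {
    c := make(chan int, numCPU) // buffered so workers never block on send
    for i := 0; i < numCPU; i++ {
        // Each goroutine handles one contiguous slice of v.
        go v.DoSome(i*len(v)/numCPU, (i+1)*len(v)/numCPU, u, c)
    }
    // Drain the channel.
    for i := 0; i < numCPU; i++ {
        <-c // wait for one task to complete
    }
    // All done.
}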
Plan Your Concurrency
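func Query(conns []Conn, query string) Result {
    // Buffered so the slower goroutines can still send and exit.
    ch := make(chan Result, len(conns))
    for _, conn := range conns {
        go func(c Conn) {
            ch <- c.DoQuery(query)
        }(conn)
    }
    return <-ch // return whichever connection answers first
}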
Plan Your Concurrency
• There are also some patterns out there -
– Future
– Generator
– Fan-in, Fan-out
– Many more
– ...
Plan Your Concurrency
import "net/http"

func futureData(url string) <-chan *http.Response {
    c := make(chan *http.Response, 1)
    go func() {
        resp, err := http.Get(url)
        if err != nil {
            c <- nil // signal failure to the receiver
            return
        }
        c <- resp
    }()
    return c
}

func main() {
    future := futureData("http://test.future.com")
    // do many other things
    resp := <-future // blocks until the request has finished
    if resp != nil {
        defer resp.Body.Close()
        // do something with resp
    }
}
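For comparison, a rough Python counterpart of the future pattern can be sketched with concurrent.futures from the standard library. This is only an illustration of the same idea; slow_fetch is a hypothetical stand-in for the network call, and a thread pool is used, so it gives concurrency rather than parallelism.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_fetch(name):
    """Hypothetical stand-in for a slow network request."""
    time.sleep(0.05)
    return "payload for " + name


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_fetch, "report")  # starts running in the background
    other_work = sum(range(1000))               # do other things in the meantime
    resp = future.result()                      # blocks until the value is ready

print(resp, other_work)
```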
[]byte Vs string in Go
• A []byte is essentially just this -
type slice struct {
    data uintptr
    len  int
    cap  int
}
• A string is essentially just this -
type stringStruct struct { // same layout as slice, minus cap
    data uintptr
    len  int
}
[]byte Vs string in Go
• So, if you have a []byte and need a string, does the Go compiler just copy the data pointer and length from the []byte into the string?
• No!
• Each time you do []byte("pasten") or string([]byte{0x40,0x40}) there's an allocation and a copy
[]byte Vs string in Go
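import "unsafe"

// BytesSliceToString reinterprets ba as a string without allocating or copying.
// The string shares ba's memory, so ba must not be modified afterwards.
func BytesSliceToString(ba []byte) string {
    return *(*string)(unsafe.Pointer(&ba))
}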
Conclusions
Concurrency Vs Parallelism
Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once.
Concurrency Vs Parallelism
Python -
• Threads
– Concurrency
• Processes
– Parallelism
• Gevent (/greenlets)
– Concurrency
Go -
• Goroutines
– Parallelizable, but mainly concurrent.
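On the Python side, the thread/process split can be sketched with concurrent.futures, where the executor class decides which of the two you get. The sketch below uses threads so that it runs anywhere. Passing ProcessPoolExecutor as executor_cls is the usual way to get parallelism for CPU-bound work, at the cost of the extra memory of separate processes. The chunk layout and function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_events(chunk):
    """Aggregate one chunk (for example, one file) into a partial result."""
    partial = Counter()
    for country, installs in chunk:
        partial[country] += installs
    return partial


def aggregate(chunks, executor_cls=ThreadPoolExecutor, workers=4):
    """Fan the chunks out to workers, then merge the partial results."""
    total = Counter()
    with executor_cls(max_workers=workers) as pool:
        for partial in pool.map(count_events, chunks):
            total.update(partial)   # Counter.update adds the counts together
    return total


# Hypothetical data: three chunks of (country, installs) pairs.
chunks = [
    [("US", 1), ("IL", 2)],
    [("US", 3)],
    [("IL", 4), ("DE", 5)],
]
totals = aggregate(chunks)
print(dict(totals))
```

When switching to ProcessPoolExecutor, the call to aggregate should sit under an `if __name__ == "__main__":` guard, since worker processes may re-import the module on some platforms.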
Python Vs Go
Python -
• Threads
– Not so good for CPU-bound tasks
• Processes
– Less efficient in memory
• Gevent (/greenlets)
– Not so good for CPU-bound tasks (but less so than threads)
• Many efficient libraries
– They load C libraries (not a lot of room for further optimization)
• Development is easier and faster
Go -
• Goroutines
– Faster startup time than threads
– Memory is used only when needed - stacks start small but grow and shrink as required
– Cheap and lightweight
– You can run many of them (more than threads)
• Easy to learn!
Example of Relevant Use Cases
• A tool for importing user-level data
– Sends millions of events
• A micro-service for handling incoming events
– Needs to handle thousands of events per second
• Parsing and processing big CSV files
– processes tens of millions of rows
• Writing to a DB in a high throughput
Summary
• We introduced our background to the problem
• We demonstrated our benchmarks
• We talked (a bit) about Python & Go optimizations
• We explained and concluded our way of thinking about it