“Software is eating the world”
(Figure: codebase sizes growing from 128k LoC through 4-5M, 9M, 18M and 45M to 150M LoC)
Machine Learning on Code - SF meetup
Machine Learning on Source Code
Francesc Campoy
VP of Product & DevRel
source{d}
Machine Learning for Large Scale Code Analysis
@francesc | #MLonCode
Agenda
● Machine Learning on Source Code
● Research
● Use Cases
● The Future
Machine Learning on Source Code
Field of Machine Learning where the input data is source code.
MLonCode
Machine Learning on Source Code
Requires:
● Lots of data
● Really, lots and lots of data
● Fancy ML Algorithms
● A little bit of luck
Related Fields:
● Data Mining
● Natural Language Processing
● Graph Based Machine Learning
Challenge #1
Data Retrieval
The datasets of ML on Code
● GH Archive: https://www.gharchive.org
● Public Git Archive: https://pga.sourced.tech
Announcement: blog.sourced.tech/post/announcing-pga
Public Git Archive
Rooted repositories
blog.sourced.tech/post/pga_history/
Challenge #2
Data Analysis
'112', '97', '99', '107', '97', '103',
'101', '32', '109', '97', '105', '110',
'10', '10', '105', '109', '112', '111',
'114', '116', '32', '40', '10', '9',
'34', '102', '109', '116', '34', '10',
'41', '10', '10', '102', '117', '110',
'99', '32', '109', '97', '105', '110',
'40', '41', '32', '123', '10', '9',
'102', '109', '116', '46', '80', '114',
'105', '110', '116', '108', '110', '40',
'34', '72', '101', '108', '108', '111',
'44', '32', '112', '108', '97', '121',
'103', '114', '111', '117', '110', '100',
'34', '41', '10', '125', '10'
package main

import "fmt"

func main() {
	fmt.Println("Hello, Copenhagen")
}
What is Source Code
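At the lowest level, a source file is just bytes. This sketch converts the Hello program to its integer byte values; its first seven values, 112 97 99 107 97 103 101, are the same ones that open the listing above, because both programs start with the keyword package. ByteValues is a hypothetical helper name.

```go
package main

import "fmt"

// The Hello program from the slides, as a Go string.
const hello = "package main\n\nimport \"fmt\"\n\nfunc main() {\n\tfmt.Println(\"Hello, Copenhagen\")\n}\n"

// ByteValues returns the bytes of src as integers: the lowest-level view a
// model can take of a source file.
func ByteValues(src string) []int {
	b := []byte(src)
	out := make([]int, len(b))
	for i, c := range b {
		out[i] = int(c)
	}
	return out
}

func main() {
	v := ByteValues(hello)
	fmt.Println(len(v), v[:7]) // the first seven values spell "package"
}
```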
package package
IDENT main
;
import import
STRING "fmt"
;
func func
IDENT main
(
)
{
IDENT fmt
.
IDENT Println
(
STRING "Hello, Copenhagen"
)
;
}
;
What is Source Code
● A sequence of bytes
● A sequence of tokens
● An abstract syntax tree
● A graph (e.g. Control Flow Graph)
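For the tree view, Go ships a parser in its standard library. This sketch uses go/parser and go/ast (not babelfish's language-independent UAST) to parse the Hello program and walk its abstract syntax tree, collecting every call written as ident.Name, which covers both package functions and methods on variables. For the program above it should find only fmt.Println. The name Calls is illustrative.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"log"
)

const hello = "package main\n\nimport \"fmt\"\n\nfunc main() {\n\tfmt.Println(\"Hello, Copenhagen\")\n}\n"

// Calls parses src into an abstract syntax tree and returns, in source
// order, every call whose function is written as ident.Name.
func Calls(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "hello.go", src, 0)
	if err != nil {
		return nil, err
	}
	var calls []string
	ast.Inspect(f, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true // not a call: keep walking
		}
		if sel, ok := call.Fun.(*ast.SelectorExpr); ok {
			if x, ok := sel.X.(*ast.Ident); ok {
				calls = append(calls, x.Name+"."+sel.Sel.Name)
			}
		}
		return true
	})
	return calls, nil
}

func main() {
	calls, err := Calls(hello)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(calls) // [fmt.Println]
}
```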
Tasks
● Language Classification
● File Parsing
● Token Extraction
● History Analysis
● Reference Resolution
Tools
● enry
● babelfish
● libuast & XPath selectors
● go-git
● kythe.io
Analyzing Code
source{d} engine
github.com/src-d/engine
babelfish
gitbase
jupiter
Demo time!
Challenge #3
Learning from Source Code
Neural Networks
Basically fancy regression machines: stacked linear layers with non-linear activations in between.
Given an input of a fixed length,
they predict an output of a fixed length.
Example:
MNIST:
Input: 28x28 px images
Output: a digit from 0 to 9
MNIST
(Figure: the network's ten outputs, one per digit, all ~0 except a single ~1 marking the predicted digit)
MLonCode: Predict the next token
for i := 0 ; i < 10 ; i ++
Recurrent Neural Networks
Can process sequences of variable length.
Feeds its own output back in as input at the next step.
Example:
Natural Language Translation:
Input: “bonjour, les gaufres”
Output: “hi, waffles”
MLonCode: Code Generation
charRNN: given n characters, predict the next one.
Trained on the Go standard library.
Reached 61% accuracy when predicting the next character.
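The charRNN setup can be made concrete: slide a window of n characters over the training text to get (context, next character) pairs, and measure accuracy as the share of positions where the predicted character equals the real one. This sketch covers only the data preparation and the metric, not the network; Example, Windows and Accuracy are hypothetical names, and whether the 61% figure was computed exactly this way is an assumption.

```go
package main

import "fmt"

// Example is one training pair for a character-level model: the previous n
// characters and the character that follows them.
type Example struct {
	Context string
	Next    rune
}

// Windows slides an n-character window over text.
func Windows(text string, n int) []Example {
	rs := []rune(text)
	var out []Example
	for i := n; i < len(rs); i++ {
		out = append(out, Example{Context: string(rs[i-n : i]), Next: rs[i]})
	}
	return out
}

// Accuracy is the fraction of positions where the predicted next character
// matches the real one.
func Accuracy(predicted, actual []rune) float64 {
	if len(actual) == 0 || len(predicted) != len(actual) {
		return 0
	}
	hits := 0
	for i := range actual {
		if predicted[i] == actual[i] {
			hits++
		}
	}
	return float64(hits) / float64(len(actual))
}

func main() {
	for _, ex := range Windows("func main", 4) {
		fmt.Printf("%q -> %q\n", ex.Context, ex.Next)
	}
	fmt.Println(Accuracy([]rune("abxd"), []rune("abcd"))) // 0.75
}
```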
Before training
r t,
kp0t@pp kpktp 0p000 xS%%%?ttk?^@p0rk^@%ppp@ta#p^@ #pp}}%p^@?P%^@@k#%@P}}ta S?@}^@t%@% %%aNt i
^@SSt@@ pyikkp?%y ?t k L P0L t% ^@i%yy ^@p i? %L%LL tyLP?a ?L@Ly?tkk^@ @^@ykk^@i#P^@iL@??@%1tt%^@tPTta L
^@LL%% %i1::yyy^@^@t tP @?@a#Patt 1^@@ k^@k ? yt%L1^@tP%k1?k? % ^@i ^@ta1?1taktt1P?a^@^@Pkt?#^@t^@##1?##
#^@t11#:^@%??t%1^@a 1?a at1P ^@^@Pt #%^@^@ ^@aaak^@#a#?P1Pa^@tt%?^@kt?#akP ?#^@i%%aa ^@1%t tt?a?%
t^@k^@^@k^@ a : ^@1 P# % ^@^@#t% :% kkP ^@#?P: t^@a
?%##?kkPaP^@ #a k?t?? ai?i%PPk taP% P^@ k^@iiai#?^@# #t ?# P?P^@ i^@ttPt #
1%11 ti a^@k P^@k ^@kt %^@%y?#a a#% @? kt ^@t%k? ^@PtttkL tkLa1 ^@iaay?p1% Pta tt ik?ty
k^@kpt%^@tktpkryyp^@?pP# %kt?ki? i @t^@k^@%#P} ?at}akP##Pa11%^@i% ^@?ia ia%##%tki %
}i%%%}} a ay^@%yt }%t ^@tU%a% t}yi^@ ^@ @t yt%? aP @% ^@??^@%? ^@??k#%
kk#%t?a: P}^@t :#^@#1t^@#: w^@P#%w:Pt t # t%aa%i@ak@@^@ka@^@a # y}^@# ^@? % tP i?
?tk ktPPt a tpprrpt? a^@ pP pt %p ? k? ^@^@ kP^@%%?tk a Pt^@#
tP? P kkP1L1tP a%? t1P%PPti^@?%ytk %#%%t?@?^@ty^@iyk%1#^@@^@1#t a t@P^@^@ P@^@1P^@%%#@P:^@%^@ t
1:#P#@LtL#@L L1 %%dt??^@L ^@iBt yTk%p ^@i
After one epoch (dataset seen once)
if testingValuesIntering() {
t.SetCaterCleen(time.SewsallSetrive(true)
if weq := nil {
t.Errorf("eshould: wont %v", touts anverals, prc.Strnared, error
}
t, err := ntr.Soare(cueper(err, err)
if err != nil {
t.Errorf("preveth dime il resetests:%d; want %#',", tr.test.Into
}
if err != nil {
return
}
if err == nel {
t.Errorf("LoconserrSathe foot %q::%q: %s;%want %d", d, err)
},
defarenContateFule(temt.Canses)
}
if err != nil {
return err
}
// Treters and restives of the sesconse stmpeletatareservet
// This no to the result digares wheckader. Constate bytes alleal
After two epochs
if !ok {
t.Errorf("%d: %v not %v", i, err)
}
if !ot.Close()
if enr != nil {
t.Fatal(err)
}
if !ers != nil {
t.Fatal(err)
}
if err != nil {
t.Fatal(err)
}
if err != nil {
t.Errorf("error %q: %s not %v", i, err)
}
return nil
}
After many epochs
if got := t.struct(); !ok {
t.Fatalf("Got %q: %q, %v, want %q", test, true
}
if !strings.Connig(t) {
t.Fatalf("Got %q: %q", want %q", t, err)
}
if !ot {
t.Errorf("%s < %v", x, y)
}
if !ok {
t.Errorf("%d <= %d", err)
}
if !stricgs(); !ot {
t.Errorf("!(%d <= %v", x, e)
}
}
if !ot != nil {
return ""
}
Learning to Represent Programs with Graphs
from, err := os.Open("a.txt")
if err != nil {
	log.Fatal(err)
}
defer from.Close()
to, err := os.Create("b.txt")
if err != nil {
	log.Fatal(err)
}
defer ???.Close()
io.Copy(to, from)
Miltiadis Allamanis, Marc Brockschmidt, Mahmoud Khademi
https://arxiv.org/abs/1711.00740
The VARMISUSE Task:
Given a program and a gap in it,
predict what variable is missing.
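A VARMISUSE model does not invent a name for the gap; it chooses among the variables in scope there. A sketch of that first step, assuming the gap is marked with a placeholder identifier HOLE (hypothetical, since ??? is not valid Go) and approximating "in scope" as "declared with := earlier in the file". Scoring the candidates is the learned part and is not shown; Candidates is an illustrative name.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"log"
)

const snippet = `package main

import (
	"io"
	"log"
	"os"
)

func copyFile() {
	from, err := os.Open("a.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer from.Close()
	to, err := os.Create("b.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer HOLE.Close()
	io.Copy(to, from)
}
`

// Candidates returns the variables defined with := before the first
// identifier named hole, in source order and without duplicates.
func Candidates(src, hole string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "gap.go", src, 0)
	if err != nil {
		return nil, err
	}
	var holePos token.Pos
	ast.Inspect(f, func(n ast.Node) bool {
		if id, ok := n.(*ast.Ident); ok && id.Name == hole && holePos == token.NoPos {
			holePos = id.Pos()
		}
		return true
	})
	seen := map[string]bool{}
	var out []string
	ast.Inspect(f, func(n ast.Node) bool {
		as, ok := n.(*ast.AssignStmt)
		if !ok || as.Tok != token.DEFINE || as.Pos() > holePos {
			return true
		}
		for _, lhs := range as.Lhs {
			if id, ok := lhs.(*ast.Ident); ok && !seen[id.Name] {
				seen[id.Name] = true
				out = append(out, id.Name)
			}
		}
		return true
	})
	return out, nil
}

func main() {
	cands, err := Candidates(snippet, "HOLE")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(cands) // [from err to]
}
```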
code2vec: Learning Distributed Representations of Code
Uri Alon, Meital Zilberstein, Omer Levy, Eran Yahav
https://arxiv.org/abs/1803.09473 | https://code2vec.org/
Much more research
github.com/src-d/awesome-machine-learning-on-source-code
Challenge #4
What can we build?
Predictable vs Predicted
(Figure: an output vector of ~0 values with a single ~1, as in the MNIST example)
A
G
o
PR
An attention model for code reviews.
Can you see the mistake?
Prediction vs Expectation
for i := 0; i < 10; i-- {
	if i%2 == 0 {
		fmt.Println("where's the mistake?")
	}
}
Can you see the mistake?
Prediction vs Expectation
for i := 0; i < 10; i-- { ← s/i--/i++/
	if i%2 == 0 {
		fmt.Println("where's the mistake?")
	}
}
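A learned model presumably flags i-- because, in loops shaped like this one, it has overwhelmingly seen i++. For comparison, a hand-written rule can catch this one pattern: a condition that counts up (x < n or x <= n) paired with a decrement of the same variable. The sketch uses go/parser and go/ast; SuspiciousLoops is a hypothetical name, and the rule covers far fewer cases than a model would.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"log"
)

const loopSrc = `package main

import "fmt"

func main() {
	for i := 0; i < 10; i-- {
		if i%2 == 0 {
			fmt.Println("where's the mistake?")
		}
	}
}
`

// SuspiciousLoops reports for loops whose condition counts up (x < n or
// x <= n) while the post statement decrements the same variable (x--).
func SuspiciousLoops(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "loop.go", src, 0)
	if err != nil {
		return nil, err
	}
	var found []string
	ast.Inspect(f, func(n ast.Node) bool {
		loop, ok := n.(*ast.ForStmt)
		if !ok {
			return true
		}
		cond, ok1 := loop.Cond.(*ast.BinaryExpr)
		post, ok2 := loop.Post.(*ast.IncDecStmt)
		if !ok1 || !ok2 {
			return true
		}
		cv, ok1 := cond.X.(*ast.Ident)
		pv, ok2 := post.X.(*ast.Ident)
		countsUp := cond.Op == token.LSS || cond.Op == token.LEQ
		if ok1 && ok2 && cv.Name == pv.Name && countsUp && post.Tok == token.DEC {
			found = append(found, fset.Position(loop.Pos()).String())
		}
		return true
	})
	return found, nil
}

func main() {
	found, err := SuspiciousLoops(loopSrc)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(found)
}
```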
Can you see the mistake?
VARMISUSE
from, err := os.Open("a.txt")
if err != nil {
	log.Fatal(err)
}
defer from.Close()
to, err := os.Create("b.txt")
if err != nil {
	log.Fatal(err)
}
defer from.Close()
io.Copy(to, from)
Can you see the mistake?
VARMISUSE
from, err := os.Open("a.txt")
if err != nil {
	log.Fatal(err)
}
defer from.Close()
to, err := os.Create("b.txt")
if err != nil {
	log.Fatal(err)
}
defer from.Close() ← s/from/to/
io.Copy(to, from)
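This particular misuse also leaves a syntactic trace: the same variable has its Close deferred twice. A hand-written check for exactly that, sketched with go/parser and go/ast, looks only at a function body's top-level statements; DuplicateCloses is a hypothetical name. Unlike a VARMISUSE model, it can say that from is suspicious but not that to is the right replacement.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"log"
)

const buggy = `package main

import (
	"io"
	"log"
	"os"
)

func copyFile() {
	from, err := os.Open("a.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer from.Close()
	to, err := os.Create("b.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer from.Close()
	io.Copy(to, from)
}
`

// DuplicateCloses returns the receivers whose Close is deferred more than
// once among a function body's top-level statements.
func DuplicateCloses(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "copy.go", src, 0)
	if err != nil {
		return nil, err
	}
	var dups []string
	for _, decl := range f.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Body == nil {
			continue
		}
		seen := map[string]bool{}
		for _, stmt := range fn.Body.List {
			d, ok := stmt.(*ast.DeferStmt)
			if !ok {
				continue
			}
			sel, ok := d.Call.Fun.(*ast.SelectorExpr)
			if !ok || sel.Sel.Name != "Close" {
				continue
			}
			recv, ok := sel.X.(*ast.Ident)
			if !ok {
				continue
			}
			if seen[recv.Name] {
				dups = append(dups, recv.Name)
			}
			seen[recv.Name] = true
		}
	}
	return dups, nil
}

func main() {
	dups, err := DuplicateCloses(buggy)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(dups) // [from]
}
```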
Is this a good name?
func XXX(list []string, text string) bool {
	for _, s := range list {
		if s == text {
			return true
		}
	}
	return false
}
Suggestions:
● Contains
● Has
func XXX(list []string, text string) int {
	for i, s := range list {
		if s == text {
			return i
		}
	}
	return -1
}
Suggestions:
● Find
● Index
code2vec: Learning Distributed Representations of Code
Splitting millions of identifiers with Deep Learning
https://blog.sourced.tech/post/idsplit/
isthisCorrect? → is this correct? → isThisCorrect?
Demo time!
learning Go
code2vec.org
neural splitter
source: WOCinTech
Assisted code review! src-d/lookout
putting everything together
lookout
Coming up soon:
● Automated Style Guide Enforcing
● Bug Prediction
Coming … later:
● Automated Code Review
● Code Generation: from unit tests, specification, natural language description.
● Natural Analysis: code description and conversational analysis.
● Education
And so much more
Will developers be replaced?
Developers will be empowered.
Want to know more?
● sourced.tech (pssh, we’re hiring)
● bit.ly/awesome-mloncode
● francesc@sourced.tech
● come say hi, I have stickers
Thanks
francesc
