Mining the Technical Skills of Open
Source Developers
João Eduardo Montandon
Advisor: Marco Túlio Valente
“Our civilization runs on software.”
- Bjarne Stroustrup
2
Banking Traveling
Transport
Telecoms
Media & Social
Shopping
3
“Software development is a human
centric activity, which makes
developers the most important asset
of software companies.”
DeMarco, T. and Lister, T. (1999). Peopleware:
Productive Projects and Teams.
5
“Our [Facebook’s] policy is literally to hire as
many talented engineers as we can find.”
- Mark Zuckerberg
“Hiring well is the most important thing in
the universe.”
- Valve Corporation
6
Where do we find the
right people?
7
Where do we find them? … GitHub!
60M new repositories*
56M new users*
* In 2020 alone!
8
How do we choose the
right people?
10
The Literature
Perspective
Right people: The ones that
better align with the project
The Industry
Perspective
Right people: The ones that
better align with the company
project
general
specific
company
11
The Industry
Perspective
Right people: The ones that
better align with the company
general
specific
company
OUR
FOCUS
12
Our proposal is…
To investigate methods and techniques to effectively
identify technical skills of software developers
13
Our goals
➢ Understand which skills and abilities IT companies require most when
looking for new professionals.
➢ Investigate methods and techniques to identify expertise elements from a
deep perspective (3rd party libraries).
➢ Investigate methods and techniques to identify expertise elements from a
broad perspective (technical roles).
14
What Skills do IT
Companies look for in
New Developers?
???
???
15
What kinds of skills is the
industry interested in?
16
What we expect
from you
17
Soft Skills
18
Soft Skills
Technical Skills
19
Our study: Technical Skills
20,968 jobs posted in 2019
14 pre-defined roles
67K technical skill occurrences
6 high-level technical skills (282)
20
21
22
23
Our study: Soft Skills
376 opportunities randomly selected
3 researchers manually annotated sentences
1,530 sentences annotated
24
25
Takeaway
➢ Developer-based positions do require expertise in 3rd
party components
➢ Programming language skills are needed across all roles
➢ Soft skills: teamwork and communication do matter!
26
Identifying Experts in Software
Libraries and Frameworks
among GitHub Users
Libs &
frameworks
???
27
Software development
increasingly relies on
3rd party software
components
28
Search interest for
“npm” over time
29
Investigate data-driven methods to identify developers’
expertise level on 3rd party libraries
30
H = When maintaining a piece of code, developers gain
expertise on frameworks and libraries
31
o RQ.1: How accurate are [supervised] machine learning
classifiers when used to identify library experts?
o RQ.2: Which features best distinguish library experts?
32
Our Ground-Truth
Library    Description                                     #Dev.
ReactJS    Frontend library used to build user interfaces  8,742
MongoDB    Official JavaScript driver for MongoDB          454
socket.io  Real-time communication library                 608
33
Our Ground-Truth
Could you please rank your expertise on [target library]
on a scale from 1 (novice) to 5 (expert)?
34
Library    Mails  Answers  Response rate
ReactJS    2,185  418      19%
MongoDB    454    68       15%
socket.io  608    89       15%
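The percentages above are consistent with per-library response rates (answers divided by mails sent). A minimal Python sketch of that arithmetic, using only the figures from the slide; the variable and function names are illustrative assumptions:

```python
# Survey figures copied from the slide: (mails sent, answers received).
survey = {
    "ReactJS": (2185, 418),
    "MongoDB": (454, 68),
    "socket.io": (608, 89),
}


def response_rate(mails: int, answers: int) -> float:
    """Return the share of mailed developers who answered, as a percentage."""
    return 100.0 * answers / mails


# Round to whole percentages, as shown on the slide.
rates = {lib: round(response_rate(m, a)) for lib, (m, a) in survey.items()}
total_answers = sum(answers for _, answers in survey.values())

for lib, pct in rates.items():
    print(f"{lib}: {pct}%")
print(f"Total answers: {total_answers}")
```

The total number of answers equals the 575 developers reported for the ground truth in the contributions slide.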
35
Dimension  Description
Volume     Amount of code written by the developer using the library
Frequency  Amount of time a developer has used the library
Breadth    Number of situations in which the library was used
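One way to picture the three dimensions is to derive one number per dimension from a developer's commit history. The sketch below is only an illustration: the `LibraryCommit` record, its fields, and the three concrete measures (lines changed, days between first and last commit, distinct projects) are assumptions, not the features used in the study.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LibraryCommit:
    """Hypothetical record: one commit touching a file that imports the library."""
    project: str
    day: date
    lines_changed: int


def expertise_features(commits: list[LibraryCommit]) -> dict[str, int]:
    """Summarize a commit history along the volume, frequency, and breadth dimensions."""
    if not commits:
        return {"volume": 0, "frequency_days": 0, "breadth": 0}
    days = [c.day for c in commits]
    return {
        # Volume: how much code was written using the library.
        "volume": sum(c.lines_changed for c in commits),
        # Frequency: time span over which the library was used.
        "frequency_days": (max(days) - min(days)).days,
        # Breadth: number of distinct projects where the library was used.
        "breadth": len({c.project for c in commits}),
    }


# Made-up history for a single developer.
history = [
    LibraryCommit("shop-ui", date(2018, 1, 10), 120),
    LibraryCommit("shop-ui", date(2018, 3, 1), 45),
    LibraryCommit("admin-panel", date(2018, 6, 9), 200),
]
features = expertise_features(history)
print(features)
```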
37
RQ.1: Random Forest & SVM
❖ 3 classes: All three libraries
❖ 5 classes: ReactJS
RQ.2: KMeans
❖ 3 classes: ReactJS & MongoDB
❖ 5 classes: socket.io
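KMeans groups developers by feature similarity without looking at their self-reported expertise. The following is a minimal pure-Python sketch of Lloyd's algorithm on made-up two-dimensional feature vectors; the data, the hand-picked initial centroids, and the function names are assumptions for illustration and do not reproduce the study's setup.

```python
import math

Point = tuple[float, ...]


def mean(points: list[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def kmeans(points: list[Point], centroids: list[Point], max_iter: int = 100) -> list[int]:
    """Lloyd's algorithm; returns the cluster index assigned to each point."""
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            updated.append(mean(members) if members else old)
        centroids = updated
    return labels


# Made-up (volume, breadth) vectors: three light users and three heavy users.
developers = [(2.0, 1.0), (3.0, 2.0), (2.5, 1.5), (40.0, 30.0), (42.0, 28.0), (38.0, 33.0)]
labels = kmeans(developers, centroids=[developers[0], developers[3]])
print(labels)
```

In practice features on different scales would usually be normalized first, and the initial centroids would be chosen by a seeding strategy rather than by hand.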
38
In this presentation, we will focus on RQ.2.
39
RQ.2
40
RQ.2
41
RQ.2
42
LinkedIn Triangulation
2,129 ReactJS developers
263 experts
160 LinkedIn profiles
72% explicitly mentioned ReactJS
45
Takeaway
➢ Exploratory analysis can identify groups of experts based
on the selected features
➢ We found clusters dominated by experts
(Experts > Novices)
46
Mining The Technical
Roles of GitHub Users
Tech roles
Libs &
frameworks
47
The Surgical
Team
“Mills proposes that each segment of a large job be
tackled by a team, but that the team be
organized like a surgical team […]”
48
Technical roles are among the first pieces of information
companies use when hiring new developers
49
Identify the technical roles played by developers using
information available in OSS platforms.
50
H = The technologies that developers master define
their technical roles
51
o RQ.1: How accurate are ML classifiers at identifying developers’
technical roles?
o RQ.2: What are the most relevant features to distinguish
technical roles?
o RQ.3: Do technical roles influence each other during
classification?
o RQ.4: How effectively can we identify full-stack developers?
52
Our Ground Truth
1. Selected developers from Stack
Overflow with GitHub profiles
2. Filtered the ones who self-identified with the
following roles: Backend, Frontend,
DevOps, DataScience, Mobile, and
FullStack*.
* RQ.4, only
54
Our Feature Set
Programming Language
Short Bio
Projects’ names, topics &
descriptions
3rd party dependencies
56
(RQ.1 & RQ.4) How accurate are ML classifiers in identifying technical
roles?
Role Precision Recall F1
FullStack* 0.99 0.71 0.83
Backend 0.87 0.63 0.73
Frontend 0.86 0.89 0.87
Mobile 0.80 0.34 0.47
DevOps 0.75 0.06 0.11
DataScience 0.86 0.62 0.71
Overall 0.88 0.69 0.77
AUC: 0.89
* RQ.4, only
58
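The F1 column is the harmonic mean of precision and recall. As a quick check, the sketch below recomputes it from a few of the rounded table values; the function name and the selection of rows are illustrative choices.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above.
reported = {
    "FullStack": (0.99, 0.71),
    "Backend": (0.87, 0.63),
    "Frontend": (0.86, 0.89),
    "DevOps": (0.75, 0.06),
}
for role, (p, r) in reported.items():
    print(f"{role}: F1 = {f1_score(p, r):.2f}")
```

Because the precision and recall shown in the table are already rounded, recomputed values can differ from the reported F1 in the last digit; Mobile and DataScience, for instance, come out 0.01 higher. The DevOps row also shows how a high precision (0.75) cannot compensate for a very low recall (0.06).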
(RQ.2) What are the
most relevant
features?
59
Takeaway
➢ We show that it is possible to identify the major technical roles
of developers
➢ Features associated with programming languages (PLs) are relevant for all roles
61
Conclusions &
Future Work
Tech roles
Libs &
frameworks
62
We extensively analyzed data-driven methods and
techniques to leverage software developers’ profiles
In this thesis…
63
Our Contributions
✓ A map of the technical and soft skills most required by the IT industry
✓ Ground truth with expertise information of 575 developers
✓ A method based on low-level data to identify specific expertise
information (libs & frameworks)
✓ A method based on coarse-grained features to detect general expertise
elements (technical roles)
67
Future Work
1. Expand soft skills analysis
2. Study technical expertise in PLs
3. 3rd party libs in other ecosystems
68
Publications
1. J. E. Montandon, L. L. Silva, M. T. Valente. Identifying Experts in Software Libraries and Frameworks among GitHub Users. MSR, 2019.
2. J. E. Montandon, C. Politowski, L. L. Silva, M. T. Valente, F. Petrillo, Y. Guéhéneuc. What Skills do IT Companies look for in New Developers? A Study with Stack Overflow Jobs. IST, 2021.
3. J. E. Montandon, M. T. Valente, L. L. Silva. Mining the Technical Roles of GitHub Users. IST, 2021.
69
The Importance of Expertise
“Expertise then refers to the characteristics, skills, and
knowledge that distinguish experts from novices and
less experienced people.”
The Cambridge Handbook of Expertise and Expert Performance, K. A. Ericsson, 2012
71
Towards a Theory of Software Development Expertise, Baltes and Diehl, 2018
72
215
jobs
372
jobs
135
jobs
Out of 1,839 job offers…
722 (39%)
75
[Figure: ReactJS and MongoDB panels]
76
(RQ.1) How accurate are ML classifiers in identifying technical roles?
Role Precision Recall F1
Backend 0.62 0.12 0.18
Frontend 0.77 0.78 0.77
Mobile 0.78 0.38 0.51
DevOps 0.70 0.13 0.20
DataScience 0.86 0.66 0.74
Overall 0.77 0.49 0.59
Random Forest AUC: 0.71
77
(RQ.4) The FullStack Dataset
“Being a Full-Stack developer […]
means that you are able to
work on both sides”
2,284
FullStack dataset
+783
Backend + Frontend
developers
78
(RQ.4) How effectively can we identify full-stack developers?
Role Precision Recall F1
FullStack 0.99 0.71 0.83
Backend 0.87 (+0.25) 0.63 (+0.51) 0.73 (+0.55)
Frontend 0.86 (+0.09) 0.89 (+0.11) 0.87 (+0.10)
Mobile 0.80 (+0.03) 0.34 (-0.04) 0.47 (-0.04)
DevOps 0.75 (+0.05) 0.06 (-0.07) 0.11 (-0.09)
DataScience 0.86 (0.00) 0.62 (-0.04) 0.71 (-0.03)
Overall 0.88 (+0.11) 0.69 (+0.20) 0.77 (+0.18)
AUC: 0.89 (+0.18)
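The values in parentheses appear to be differences from the RQ.1 results table (the run without the FullStack role). A small sketch of that comparison for three roles, with the numbers copied from both tables; the dictionary and function names are illustrative assumptions:

```python
Scores = tuple[float, float, float]  # (precision, recall, F1)

# RQ.1 table: classification without the FullStack role.
without_fullstack: dict[str, Scores] = {
    "Backend": (0.62, 0.12, 0.18),
    "Frontend": (0.77, 0.78, 0.77),
    "DevOps": (0.70, 0.13, 0.20),
}
# RQ.4 table: classification with the FullStack role added.
with_fullstack: dict[str, Scores] = {
    "Backend": (0.87, 0.63, 0.73),
    "Frontend": (0.86, 0.89, 0.87),
    "DevOps": (0.75, 0.06, 0.11),
}


def score_changes(before: dict[str, Scores], after: dict[str, Scores]) -> dict[str, Scores]:
    """Per-role change in each metric, rounded to two decimals."""
    return {
        role: tuple(round(a - b, 2) for a, b in zip(after[role], before[role]))
        for role in after
        if role in before
    }


changes = score_changes(without_fullstack, with_fullstack)
for role, (dp, dr, df1) in changes.items():
    print(f"{role}: precision {dp:+.2f}, recall {dr:+.2f}, F1 {df1:+.2f}")
```

Since both tables show rounded numbers, a recomputed difference can be off by 0.01 from the slide (Mobile precision gives 0.80 - 0.78 = 0.02 against the reported +0.03). The pattern is still clear: Backend and Frontend improve on every metric, while DevOps recall and F1 drop.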
79
