NETWORKED MACHINE LEARNING
JOAQUIN VANSCHOREN (TU/E), 2014
#OpenML
Research different.
1610
GALILEO GALILEI
DISCOVERS SATURN’S RINGS
‘SMAISMRMILMEPOETALEUMIBUNENUGTTAUIRAS’
Research different.
Royal Society: Take nobody’s word for it
Scientific Journal: Reputation-based culture
300 YEARS LATER
JOURNALS SHOW LIMITS
• Complex code not included
• Large data sets not included
• Experiment details scant
• Results hard to reproduce
• Papers not updatable
• Slow, incomplete tracking of paper impact
• Publication bias
• No online public discussion
• Open access?
JOURNALS: LONG-TERM MEMORY
INTERNET: SHORT-TERM WORKING MEMORY
NETWORKED SCIENCE
ONLINE DATABASES
OPEN SOURCE CODE
WEB SERVICES, APIS
COLLABORATIVE TOOLS
OPEN, SCALABLE COLLABORATION
REAL-TIME DISCUSSION
COMBINE, REUSE SCIENTIFIC RESULTS
CITIZEN SCIENCE
Research different.
Polymaths: Solve math problems through massive collaboration (not competition)
Broadcast question, combine many minds to solve it
Solved hard problems in weeks
Many (joint) publications
Research different.
SDSS: Robotic telescope, data publicly online (SkyServer)
Over 1 million distinct users vs. 10,000 astronomers
Broadcast data, allow many minds to ask the right questions
Thousands of papers
Research different.
Galaxy Zoo: citizen scientists classify a million galaxies
Offer right tools so that anybody can be a scientist
Many novel discoveries by scientists and citizens
Research different.
Sharing data sparks discovery
Designed serendipity:
- What’s hard for one scientist is easy for another
- Surprising ideas, observations can spark new discoveries
Share, organise data for easy, large-scale collaboration
Data exploding in all sciences: collaborative data analysis needed
Building reputation
Authorship: easy to contribute + contributions stored, visible online
Collaboration: build trust, work with new people
Citation: more people see, build upon, and cite your work. Tell people how to cite data and code.
Altmetrics: track reuse/interest online (arXiv)
NETWORKED MACHINE LEARNING
Machine learning
Complex code, large-scale data, experiments (impossible to print)
Experiments not shared online: impossible to build on prior work: inhibits deeper analysis (e.g. meta-learning)
Low reproducibility, generalisability (studies contradict)
What if we could all connect with each other, and with other scientists, to explore and apply machine learning?
Few collaborative tools to speed up research
OpenML
Place to share data, code, experiments in full detail
All results organised, linked together for further (meta)analysis, reuse, discussion, study, education
Links to (open-source) code, open data anywhere online.
Anyone can post data to analyse, anyone can share code and results (models, predictions, evaluations)
Integrated in ML platforms (R, Weka, RapidMiner, …) to automatically load data, upload results
Scientists can work in teams, but results only publicly visible if data, code shared
OpenML: benefits for scientists
More time: automates routinizable work:
- find data and/or code
- set up and run large-scale experiments
- results compared to state-of-the-art
- log experiment details for future reference
More control:
- state how others should cite your work
- track reuse
- share results more easily
More knowledge:
- more time for actual research
- build directly on prior work
- easier, large-scale collaboration + interaction
OpenML Tutorial: Networked Science in Machine Learning
Plugins: WEKA
Plugins: MOA
Plugins: RapidMiner
1. OPERATOR TO DOWNLOAD TASK (TASK TYPE SPECIFIC)
2. SUBWORKFLOW THAT SOLVES THE TASK, GENERATES RESULTS
3. OPERATOR FOR UPLOADING RESULTS
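The three plugin steps above (download a task, solve it in a sub-workflow, upload the results) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Task`, `InMemoryServer`, `download_task`, `upload_run` and `majority_class_flow` are hypothetical names invented for this sketch, the "server" is an in-memory object, and none of it is the actual RapidMiner plugin or OpenML API.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical stand-in for a task downloaded from the server."""
    task_id: int
    task_type: str
    train: list  # (feature, label) pairs
    test: list   # feature values to predict


@dataclass
class InMemoryServer:
    """Toy replacement for a results server; nothing here touches the network."""
    tasks: dict
    runs: list = field(default_factory=list)

    def download_task(self, task_id):
        # Step 1: fetch the task description, including its data.
        return self.tasks[task_id]

    def upload_run(self, task_id, flow_name, predictions):
        # Step 3: store the results so that others can find and compare them.
        self.runs.append(
            {"task_id": task_id, "flow": flow_name, "predictions": predictions}
        )
        return len(self.runs)  # hypothetical run id


def majority_class_flow(task):
    # Step 2: the sub-workflow that solves the task. Here it simply predicts
    # the most frequent training label for every test instance.
    labels = [label for _, label in task.train]
    majority = max(set(labels), key=labels.count)
    return [majority for _ in task.test]


# Illustrative data only; the ids and values are made up.
server = InMemoryServer(
    tasks={1: Task(1, "classification",
                   [(0.1, "a"), (0.9, "b"), (0.2, "a")], [0.3, 0.8])}
)
task = server.download_task(1)
predictions = majority_class_flow(task)
run_id = server.upload_run(task.task_id, "majority_class", predictions)
```

Because the task fixes the data and the expected output format, any sub-workflow can be swapped in at step 2 without changing steps 1 and 3, which is what makes results from different tools comparable.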
OpenML: under development
OpenML studies
- collection of datasets, flows, runs, results in a study
- online counterpart of paper (with url)
- construct by simply tagging resources
- easily include (build on) data of others
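One way to picture "construct by simply tagging resources" is a tag index in which a study is just the set of resources that carry a given tag. The sketch below is a guess at the idea, not OpenML's implementation: `TagIndex`, its methods, the tag names and all ids are hypothetical.

```python
from collections import defaultdict


class TagIndex:
    """Toy in-memory index from tags to resources (hypothetical)."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def tag(self, kind, resource_id, tag):
        # kind is "dataset", "flow" or "run"; ids are illustrative only.
        self._by_tag[tag].add((kind, resource_id))

    def study(self, tag):
        # A study is everything carrying the tag, grouped by resource kind.
        grouped = defaultdict(list)
        for kind, resource_id in sorted(self._by_tag.get(tag, ())):
            grouped[kind].append(resource_id)
        return dict(grouped)


index = TagIndex()
index.tag("dataset", 61, "study_demo")
index.tag("flow", 7, "study_demo")
index.tag("run", 1001, "study_demo")
index.tag("run", 1002, "study_demo")
index.tag("dataset", 99, "other_study")

demo = index.study("study_demo")
```

Under this reading, including someone else's data in a study needs no copying: tagging their dataset adds it to the collection.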
Reputation building
- Profile page: statistics of activity and impact on OpenML
- Collaborative leaderboards: best contributors to solving a task
Teams
- Add scientists in teams (circles)
- Share resources, results within team only
- Make public at any time (e.g. after publication)
Meta-learning support
- Data/Flow qualities: easy adding, better overviews
- Algorithm selection techniques running on website (vs humans?)
JOIN THE CLUB

