© 2019 KNIME AG. All Rights Reserved.
Google BigQuery for analysis of
scientific datasets: Interactive
exploration and analysis of the data
using KNIME Analytics Platform
Greg Landrum
Martyna Pawletta
Jeanette Prinz
greg.landrum@knime.com
@dr_greg_landrum
Acknowledgements
• Steve Boyer (Collabra)
• Lutz Weber (OntoChem)
• Ian Wetherbee (Google)
Google BigQuery?
• A giant collection of tables that I can query with SQL
• If the tables share common keys, I can do interesting
things
Might be an oversimplification. ☺
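To make the "shared keys" point concrete, here is a minimal sketch of joining two tables on a common key. It uses Python's built-in sqlite3 module as a local stand-in, not BigQuery itself, and the table names, column names, and rows (documents, annotations, doc_id, term) are invented for illustration; they are not the schema of the datasets discussed in this talk.

```python
import sqlite3


def build_toy_db():
    """Create an in-memory database with two hypothetical tables sharing doc_id."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE documents (doc_id TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE annotations (doc_id TEXT, term TEXT);
        INSERT INTO documents VALUES
            ('D1', 'Kinase inhibitors'),
            ('D2', 'Polymer coatings'),
            ('D3', 'Antimalarial agents');
        INSERT INTO annotations VALUES
            ('D1', 'cancer'),
            ('D3', 'malaria'),
            ('D3', 'fever');
        """
    )
    return conn


def titles_for_term(conn, term):
    """Return titles of documents annotated with `term`, joined on the shared key."""
    rows = conn.execute(
        """
        SELECT d.title
        FROM documents AS d
        JOIN annotations AS a ON a.doc_id = d.doc_id
        WHERE a.term = ?
        ORDER BY d.title
        """,
        (term,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_toy_db()
    print(titles_for_term(conn, "malaria"))
```

The same join pattern should carry over to any SQL engine where two tables expose a common identifier; only the connection and the table names would change.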
An aside: searching vs exploring
a.k.a. why I’m enthusiastic about this project
An aside: searching vs exploring
• There are definitely arguments for specialized
interfaces that are tailored to make answering a
particular question super efficient and easy
• But! There are times when I’m still trying to figure
out exactly what the question is
• For this it’s nice to have a giant pile of data and a
general-purpose tool for exploring it
What we’re going to do here
• Do some exploration of the scientific data that’s
now in BigQuery…
• … with KNIME
Workflow part 1
Workflow part 2
The first database queries
Picking the disease/condition
Results
Compound classes

