Big Data and AI in Fighting Against COVID-19
Andrew Zhang
zhangan@amazon.com
7/8/2020
Agenda
1. Introduction
2. Supercomputers for Scientific Research
3. Covid-19 Open Data Lake
4. NLP and BERT to answer scientific questions
Speaker: Andrew Zhang
Andrew is a Senior Technical Account Manager at AWS whose specialties are big data, machine learning, and HPC. Before joining Amazon, Andrew was a data science engineer with IBM. His interest is scaling machine learning in a hybrid multi-cloud enterprise environment. Previously, Andrew was an enterprise architect with Novartis Pharmaceuticals.
Motivation
Supercomputers for Scientific Research
Covid-19 High Performance Computing Consortium
Extensive research in bioinformatics, epidemiology, and molecular modeling to understand the treatment and develop strategies.
Bringing together leaders to provide access to the world’s most powerful high-performance computing resources.
https://mit-satori.github.io/
Covid-19 Active Research Projects
“We have identified two target proteins that generate novel molecules to inhibit the relevant proteins. The compute capacity will enable us to run and optimize our neural networks to generate better molecules and estimate their binding affinity to the target proteins, drug-likeness and ADMET properties. Our work will evolve to use 3D SMILES (currently at 2D) and other improvements.”
“We are working with a team who have developed a device to allow safe ventilator splitting between two or more patients. We made the software to guide device selection based on the patient's respiratory states, but we want the app to allow for just lookup into pre-computed values from the …”
“We have designed a mobile app and a technological platform, compliant with European legislation, which enables unidentified contact/exposure information of users to be efficiently collected in a fully anonymous way.”
https://covid19-hpc-consortium.org/
Covid-19 Open Data Lake
Covid-19 Tracking and Prediction
• COVID-19 confirmed cases and deaths: a visual representation of the number of confirmed cases (counties) and deaths (circles). Data source: the 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE.
• Genomic epidemiological tracking: genomic epidemiology of the novel coronavirus, which provides real-time tracking of pathogen evolution (an interactive view of transmissions and phylogeny).
• Hospital resource utilization modeling. Data source: University of Washington’s Institute for Health Metrics and Evaluation (IHME) COVID-19 projections.
Source: Databricks
Covid-19 Research and Diagnosis
Answer Key Questions from Scientific Literature
• What is known about transmission, incubation, and environmental stability?
• What do we know about COVID-19 risk factors?
• What do we know about virus genetics, origin, and evolution?
• What has been published about medical care?
Read COVID-19 X-ray or CT image
While PCR tests offer many advantages, they are physical things that require shipping the test or the sample. X-ray machines can be plugged in to screen patients as long as they have electricity.
AI tools can help general practitioners triage and treat patients. Companies are developing AI tools and deploying them at hospitals (Wired, 2020).
Source: IEEE
Source: Kaggle
Open Data Lake: Query and Visualization (Amazon)
• Global Coronavirus (COVID-19) Data – Tracks confirmed COVID-19 cases in
provinces, states, and countries across the world with a breakdown to the
county level in the US.
• Coronavirus (COVID-19) Data in the United States – Tracks confirmed
cases and deaths in the US by state and county.
• Coronavirus Disease (COVID-19) Testing Data – Tracks the number of
people tested, pending tests, and positive and negative tests for COVID-19.
• USA Hospital Beds – COVID-19 – Data on hospital beds and their
utilization in the US.
• COVID-19 Open Research Dataset (CORD-19) – A collection of over 45,000
research articles (over 33,000 with full text) about COVID-19, SARS-CoV-2,
and related coronaviruses. AWS has preprocessed and enriched these
with annotations extracted from Amazon Comprehend Medical.
• Amazon: S3 Explorer https://dj2taa9i652rf.cloudfront.net/
• Amazon: Glue, simple and cost-effective ETL https://aws.amazon.com/glue/
• Amazon: Athena, a serverless SQL query engine https://aws.amazon.com/athena/
• Amazon: QuickSight https://aws.amazon.com/quicksight/
A public data lake for analysis of COVID-19 data | AWS Big ...
QuickSight Dashboard
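The kind of aggregate query one might run in Athena over the case-tracking tables can be tried locally first. The sketch below uses Python's built-in sqlite3 module as a stand-in for Athena; the table name us_cases, its columns, and every row are made-up placeholders, not the data lake's actual schema or real counts.

```python
import sqlite3

# Toy stand-in for a county-level case table. Names and numbers are
# illustrative placeholders, not the data lake's real schema or data.
ROWS = [
    ("2020-07-01", "WA", "King", 100, 5),
    ("2020-07-01", "WA", "Pierce", 40, 1),
    ("2020-07-01", "NY", "Kings", 300, 20),
    ("2020-07-01", "NY", "Queens", 250, 15),
]


def state_totals(rows):
    """Roll county rows up to state level with plain SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE us_cases (date TEXT, state TEXT, county TEXT, "
        "confirmed INTEGER, deaths INTEGER)"
    )
    conn.executemany("INSERT INTO us_cases VALUES (?, ?, ?, ?, ?)", rows)
    query = (
        "SELECT state, SUM(confirmed) AS confirmed, SUM(deaths) AS deaths "
        "FROM us_cases GROUP BY state ORDER BY confirmed DESC"
    )
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for state, confirmed, deaths in state_totals(ROWS):
        print(state, confirmed, deaths)
```

The same SELECT ... GROUP BY shape should carry over to a serverless SQL engine once the real database and table names are substituted.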
Open Data Lake: Query and Visualization (Google)
• COVID-19 data from the Johns Hopkins Center for Systems Science and Engineering.
• OpenStreetMap Public Dataset: world map including healthcare provider locations.
• Global Health Dataset from The World Bank: global health and population trends.
• New York Times COVID-19 database: The New York Times' COVID-19 database based on US health agency reports.
• ECDC COVID-19 Cases by Country: COVID-19 cases by country as reported by the European Centre for Disease Prevention and Control.
• USAFacts COVID-19 Cases by US County: COVID-19 cases by county aggregated by USAFacts from US health agencies.
BigQuery
NLP and BERT to answer scientific questions
NLP and BERT
• BERT, as a contextual model, captures the relationships between words in a bidirectional way.
• In the sentence "I made a bank deposit", the unidirectional representation of "bank" is based only on "I made a" but not on "deposit".
• Because the model is pre-trained on massive datasets, anyone building natural language processing applications can use this free powerhouse.
• BERT theoretically allows us to smash multiple benchmarks with minimal task-specific fine-tuning.
• Corporate data can be used to create different applications.
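The "bank deposit" example can be made concrete with a few lines of Python. This is only a sketch of which words each kind of model gets to look at; it does not run BERT, and the function name contexts is a hypothetical helper.

```python
def contexts(sentence, target):
    """Return (left-only context, two-sided context) for the target word."""
    tokens = sentence.split()
    i = tokens.index(target)
    left_only = tokens[:i]                    # what a left-to-right model sees
    both_sides = tokens[:i] + tokens[i + 1:]  # what a bidirectional model sees
    return left_only, both_sides


left, both = contexts("I made a bank deposit", "bank")
print(left)  # ['I', 'made', 'a']
print(both)  # ['I', 'made', 'a', 'deposit']
```

Only the two-sided context contains "deposit", the word that tells a reader this "bank" is a financial one.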
COVID-19 Open Research Dataset Challenge
https://www.whitehouse.gov/briefings-statements/call-action-tech-community-new-machine-readable-covid-19-dataset
Every scientist working on a cure or vaccine must understand this prior research.
158,000 coronavirus scholarly articles, including 75,000 with full text.
• What is known about transmission, incubation, and environmental stability?
• What do we know about COVID-19 risk factors?
• What do we know about vaccines and therapeutics?
• What do we know about virus genetics, origin, and evolution?
• What has been published about medical care?
• What has been published about ethical and social science considerations?
• What do we know about non-pharmaceutical interventions?
• What do we know about diagnostics and surveillance?
Explore Covid-19 Scientific Literature (1)
Generate Summaries from Abstracts by training Summarizer Model
Databricks
Generate a WordCloud from all the titles
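A word cloud sizes each word by how often it occurs, so the counting step can be sketched with the standard library alone. The titles and the stop-word list below are hypothetical placeholders, not entries from the dataset, and the rendering step is left out.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def title_word_counts(titles):
    """Count how often each non-stop-word appears across all titles."""
    counts = Counter()
    for title in titles:
        for word in re.findall(r"[a-z0-9-]+", title.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts


# Hypothetical titles, used only to exercise the function.
TITLES = [
    "Transmission of the novel coronavirus",
    "Coronavirus risk factors in hospital patients",
    "Modeling transmission in hospital settings",
]
counts = title_word_counts(TITLES)
print(counts.most_common(3))
```

The resulting counts are what a plotting library would take as the word weights.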
Explore Covid-19 Scientific Literature (2)
Answering all tasks/challenges using NLP:
We will use different libraries to get answers from these papers.
• BERT QA model (pre-trained on the SQuAD dataset)
• BERT summary model
• Python Google Translate package
• HTML to visualize the results
Overall flow:
• Using the QA model, read every paper's abstract, then find answers for all tasks
• Concatenate the top 50 most confident answers into an article, and use the summary model to write a summary of the answers
• Translate into multiple languages with Google Translate
• Write HTML to show the summary of all papers' answers for all tasks
Kaggle
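The first two steps of this flow (score an answer per abstract, keep the most confident ones, join them for the summarizer) can be sketched as follows. The toy_qa_model function is a hypothetical word-overlap stand-in for the BERT QA model, and the summary, translation, and HTML steps are omitted.

```python
from typing import Callable, List, Tuple

Answer = Tuple[str, float]  # (answer text, confidence score)


def top_answers_article(
    question: str,
    abstracts: List[str],
    qa_model: Callable[[str, str], Answer],
    top_k: int = 50,
) -> str:
    """Run the QA model over every abstract, keep the top_k most confident
    answers, and join them into one text for a summary model to condense."""
    answers = [qa_model(question, abstract) for abstract in abstracts]
    ranked = sorted(answers, key=lambda a: a[1], reverse=True)
    return " ".join(text for text, _ in ranked[:top_k])


def toy_qa_model(question: str, abstract: str) -> Answer:
    """Hypothetical stand-in for a BERT QA model: answers with the first
    sentence and scores it by word overlap with the question."""
    q_words = set(question.lower().split())
    a_words = set(abstract.lower().split())
    score = len(q_words & a_words) / max(len(q_words), 1)
    return abstract.split(".")[0].strip() + ".", score


# Made-up abstracts, used only to exercise the pipeline.
ABSTRACTS = [
    "Unrelated topic here.",
    "Incubation period is about five days. More text.",
]
article = top_answers_article(
    "what is the incubation period", ABSTRACTS, toy_qa_model, top_k=1
)
print(article)
```

Passing the QA model in as a callable keeps the ranking logic independent of whichever model actually produces the answers and scores.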
Explore Covid-19 Scientific Literature (3)
Google
1. When the user asks an initial question, the tool not only returns a set of papers (like in a traditional search) but also highlights snippets from the paper that are possible answers to the question.
2. The user can review the snippets and quickly make a decision on whether or not that paper is worth further reading.
3. If the user is satisfied with the initial set of papers and snippets, we have added functionality to pose follow-up questions, which act as new queries for the original set of retrieved articles.
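The three steps above can be sketched with plain keyword matching. This is not how the Google tool works internally; the function names, the matching rule, and the papers are all assumptions made for illustration.

```python
import re
from typing import Dict, List, Tuple


def search(papers: Dict[str, str], query: str) -> List[Tuple[str, str]]:
    """Return (title, snippet) pairs, where the snippet is the first
    sentence of a paper that contains every query term."""
    terms = query.lower().split()
    hits = []
    for title, text in papers.items():
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            if all(term in sentence.lower() for term in terms):
                hits.append((title, sentence))
                break
    return hits


def follow_up(papers, hits, query):
    """Pose a follow-up question against the already retrieved papers only."""
    retrieved = {title: papers[title] for title, _ in hits}
    return search(retrieved, query)


# Hypothetical papers, used only to exercise the functions.
PAPERS = {
    "Paper A": "The incubation period was five days. Masks reduced transmission.",
    "Paper B": "Incubation varied by age. No data on masks were collected.",
    "Paper C": "Masks were studied in schools.",
}
first = search(PAPERS, "incubation")
second = follow_up(PAPERS, first, "masks")
print(first)
print(second)
```

Paper C mentions masks but never appears in the follow-up results, because the follow-up only looks inside the set retrieved by the first question.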
Explore Covid-19 Scientific Literature (4)
Amazon
Build the AWS COVID-19 knowledge graph (CKG) using AWS CloudFormation and Amazon Neptune, and query the graph using Jupyter notebooks hosted on Amazon SageMaker in your AWS account.
The CKG aids in the exploration and analysis of the COVID-19 Open Research Dataset (CORD-19), hosted in the AWS COVID-19 data lake.
The strength of the graph comes from the connections between scholarly articles, authors, scientific concepts, and institutions. The CKG also helps power the CORD-19 search page.
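Why those connections matter can be shown with a very small in-memory graph. The sketch below is plain Python rather than Amazon Neptune or its query languages; the TinyGraph class, the "paper:" / "author:" / "concept:" node prefixes, and the nodes themselves are hypothetical.

```python
from collections import defaultdict


class TinyGraph:
    """Undirected graph stored as an adjacency map of node name to neighbors."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add_edge(self, a, b):
        self.edges[a].add(b)
        self.edges[b].add(a)

    def related_papers(self, paper):
        """Papers that share an author, concept, or institution with `paper`."""
        related = set()
        for neighbor in self.edges[paper]:
            for other in self.edges[neighbor]:
                if other.startswith("paper:") and other != paper:
                    related.add(other)
        return related


# Made-up nodes, used only to exercise the traversal.
graph = TinyGraph()
graph.add_edge("paper:P1", "concept:incubation")
graph.add_edge("paper:P2", "concept:incubation")
graph.add_edge("paper:P1", "author:A")
graph.add_edge("paper:P3", "author:A")
graph.add_edge("paper:P4", "concept:masks")
print(sorted(graph.related_papers("paper:P1")))
```

A two-hop walk like this is the simplest form of the "related articles" question a knowledge graph answers; a graph database would express the same traversal as a query.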
Questions?
Twitter: @a9zhang
Email the speaker: zhangan@amazon.com
