Workflow-Driven Geoinformatics
Applications and Training in the Big Data Era
İlkay ALTINTAŞ, Ph.D.
Chief Data Science Officer, San Diego Supercomputer Center
Founder and Director, Workflows for Data Science Center of Excellence
Data Science Today is Both a Big Data and a Big Compute Discipline
BIG DATA
COMPUTING AT SCALE
Enables dynamic data-driven applications
Smart Manufacturing
Computer-Aided Drug Discovery
Personalized Precision Medicine
Smart Cities
Smart Grid and Energy Management
Disaster Resilience and Response
Requires:
• Data management
• Data-driven methods
• Scalable tools for dynamic coordination and resource optimization
• Skilled interdisciplinary workforce
New era of
data science!
New era of data
science!
Needs and Trends for the New Era Data Science
-- the Big Data Era Goals --
• data-driven
• dynamic
• process-driven
• collaborative
• accountable
• reproducible
• interactive
• heterogeneous
Big Data
Insight
Action
Data Science
How does successful data science happen?
[Diagram labels: Big Data, Question, Exploratory Analysis and Modeling, Insight, Data Product]
Geospatial Big Data
• Flood of new data sources and types
• Needs new data management, storage and analysis methods
• Too big for a single server, fast growing data volume
• Requires special database structures that can handle data variety
• Too continuous for analysis at a later time, with increasing streaming rate, i.e., velocity
• Varying degrees of uncertainty in measurements, and other veracity issues
• Provides opportunities for scientific understanding at different scales more than ever, i.e., potential high value
Real-time sensors
Weather forecast
Satellite imagery
Sea Surface Temperature Measurements
Drone imagery
Insights amplify the value of data…
…, but there are many ways to get to insights.
What are some challenges to generate
value from geospatial big data?
The ‘scalability’ bottleneck
• Resources needed for geospatial big data (e.g., satellite imagery) analysis exceed current capabilities, especially in an on-demand fashion
• Cloud computing is an attractive on-demand decentralized model
• Need new scheduling capabilities
• on-demand access to shared configurable resources
• networks, servers, storage, applications, and services
• Need ability to easily combine users' environments and community tools in a scalable way
• Various tools with different computing scalability needs
• Cost!!!
The ‘sensor data’ bottleneck
• Data streaming in at various rates
• “Big Data” by definition in its volume, variety, velocity and viscosity
• Need to improve veracity and add value by providing provenance- and standards-aware on-the-fly archival capabilities
• QA/QC and automated (real-time) analysis of streaming data before it is even archived
• Often low signal-to-noise ratio requiring new methods
• Need for integration of new streaming data technologies
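The QA/QC bullet above can be illustrated with a small sketch of checking readings as they stream in, before archival. This is only an illustration under stated assumptions: the function name `qc_stream`, the flag names, the window size, the z-score threshold and the valid range are all hypothetical and are not taken from the talk.

```python
from collections import deque
from statistics import mean, stdev


def qc_stream(readings, window=5, z_max=3.0, valid_range=(-40.0, 60.0)):
    """Yield (value, flag) pairs for a stream of sensor readings.

    Flags (hypothetical names):
      "out_of_range"  the value fails a simple physical-range check
      "spike"         the value deviates strongly from recently accepted values
      "ok"            the value passed both checks
    """
    recent = deque(maxlen=window)  # sliding window of accepted values
    lo, hi = valid_range
    for value in readings:
        if not lo <= value <= hi:
            yield value, "out_of_range"
            continue
        if len(recent) == window:
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(value - mu) / sigma > z_max:
                yield value, "spike"
                continue
        recent.append(value)  # only accepted values shape the baseline
        yield value, "ok"


# Example: air temperature in degrees Celsius (made-up numbers)
stream = [20.0, 20.5, 21.0, 20.8, 20.2, 35.0, 20.6, 99.0]
flags = [flag for _, flag in qc_stream(stream)]
print(flags)
```

Because flagged values never enter the sliding window, a single bad reading does not distort the baseline used to judge the readings that follow it.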
The “workforce” bottleneck
• Geospatial data processing requires a lot of expertise
• GIS, domain expertise, data engineering, scalable computing, machine learning, …
• No open geospatially enabled big data science education platform
• Teach not just technical knowledge, but collaborative work culture and ethics
Some P’s of Data Science Practice
Platforms
Process
People
Purpose?
Programmability
How can I get smart people to collaborate and communicate to analyze data and computing to generate insight and solve a question?
Focus on the question, not the technology!
Workflows for Data Science
Center of Excellence at SDSC
Goal: Methodology and tool
development to build automated
and operational workflow-driven
solution architectures on big data
and HPC platforms.
Focus on the question, not the technology!
• Access and query data
• Support exploratory design
• Scale computational analysis
• Increase reuse
• Save time, energy and money
• Formalize and standardize
Real-Time Hazards Management
wifire.ucsd.edu
Data-Parallel Bioinformatics
bioKepler.org
Scalable Automated Molecular Dynamics and Drug Discovery
nbcr.ucsd.edu
WorDS.sdsc.edu
So what is a workflow?
• Access and query data
• Support exploratory design
• Scale computational analysis
• Increase reuse
• Save time, energy and money
• Formalize and standardize
Scientific workflows emerged as an answer to the need to combine multiple Cyberinfrastructure components in automated process networks.
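As a rough illustration of an "automated process network", the sketch below runs named steps in dependency order and records which data products each step used. It is a minimal sketch using only the Python standard library, not the API of any particular workflow system. The function `run_workflow`, the step names and the provenance record layout are assumptions made for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(steps, dependencies, inputs):
    """Run named steps in dependency order and record simple provenance.

    steps:        dict of step name -> function(dict of upstream results) -> result
    dependencies: dict of step name -> set of upstream names (steps or inputs)
    inputs:       dict of initial data products, addressed like step results
    """
    results = dict(inputs)
    provenance = []
    for name in TopologicalSorter(dependencies).static_order():
        if name in results:  # an initial input, nothing to run
            continue
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = steps[name](upstream)
        provenance.append({"step": name, "used": sorted(upstream)})
    return results, provenance


# Example with two hypothetical steps
steps = {
    "clean": lambda up: [x for x in up["raw"] if x is not None],
    "average": lambda up: sum(up["clean"]) / len(up["clean"]),
}
dependencies = {"clean": {"raw"}, "average": {"clean"}}

results, provenance = run_workflow(steps, dependencies, {"raw": [1.0, None, 3.0]})
print(results["average"])
print(provenance)
```

Keeping the dependency graph separate from the step functions is what lets the same steps be re-run, re-ordered or reused in another workflow, and the provenance list is the minimum needed to say how a result was produced.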
Support for end-to-end computational scientific process
Workflows are a Core CI Technology
[Diagram labels: Workflow Design; Reporting; Workflow Monitoring; Workflow Execution; Workflow Scheduling and Execution Planning; Run Review; Provenance Analysis; Deploy and Publish]
Programmability
Ease of use, re-use, re-purpose
Scalability
From local experiments to large-scale runs
Reproducibility
Ability to validate, re-run, re-play
BUILD SHARE RUN LEARN
In addition, workflows today are…
• Key integrator for (big and small) data science
• Encapsulations of scientific knowledge
• Easy-to-share bits of scientific process
• e.g., as research objects
• Mostly portable
• Facilitate and encourage reproducible science
• Track provenance at each step of science…
• A means to standardize scientific data products
• A means of training students on domain tools and other technical methods
Example: Using geospatial big data for
wildfire predictions
Big Data Fire Modeling
Visualization
Monitoring
WIFIRE: A Scalable Data-Driven Monitoring, Dynamic
Prediction and Resilience Cyberinfrastructure for Wildfires
Fire Modeling Workflows in WIFIRE
Real-time sensors
Weather forecast
Fire perimeter
Landscape data
Monitoring &
fire mapping
Many Integrated Workflows and
Processing Tools
• Components for modeling and data tools
• Understanding of geo data formats
• Transparent interaction with data and computing resources
• Open source and extensible
Union, Filter, Rasterize, Find Polygons
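To give a feel for what a component such as Rasterize does, here is a small standard-library sketch that burns one polygon into a grid. It is not the WIFIRE implementation. The function names, the unit cell size and the cell-centre rule are assumptions chosen for illustration.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is the point (x, y) inside the polygon?

    polygon is a list of (x, y) vertices in order, without repeating the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, width, height):
    """Return a height x width grid of 0/1 values.

    A cell is 1 when its centre lies inside the polygon (cells are 1 x 1 units).
    """
    return [
        [1 if point_in_polygon(col + 0.5, row + 0.5, polygon) else 0
         for col in range(width)]
        for row in range(height)
    ]


# Example: a made-up square perimeter on a 4 x 4 grid
perimeter = [(1, 1), (3, 1), (3, 3), (1, 3)]
for row in rasterize(perimeter, width=4, height=4):
    print(row)
```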
Closing the Loop using Big Data
-- Wildfire Behavior Modeling and Data Assimilation --
• Computational costs for existing models too high for real-time analysis
• a priori -> a posteriori
• Parameter estimation to make adjustments to the (input) parameters
• State estimation to adjust the simulated fire front location with an a posteriori update/measurement of the actual fire front location
Conceptual Data Assimilation Workflow with Prediction and Update Steps using Sensor Data
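The prediction and update steps can be sketched with a one-dimensional Kalman filter that tracks the fire front position along a single transect. This is a textbook technique swapped in for illustration and is not necessarily the method used in WIFIRE. The function names, units and all numbers below are assumptions.

```python
def predict(position, variance, rate_of_spread, dt, process_noise):
    """Prediction step: advance the simulated fire front (a priori estimate)."""
    return position + rate_of_spread * dt, variance + process_noise


def update(position, variance, observed, observation_noise):
    """Update step: blend the forecast with a measurement (a posteriori estimate)."""
    gain = variance / (variance + observation_noise)  # weight given to the observation
    new_position = position + gain * (observed - position)
    new_variance = (1.0 - gain) * variance
    return new_position, new_variance


# Example with made-up numbers: position in metres, time in minutes
position, variance = 0.0, 4.0
position, variance = predict(position, variance,
                             rate_of_spread=2.0, dt=5.0, process_noise=1.0)
print(position, variance)  # a priori forecast

position, variance = update(position, variance,
                            observed=12.0, observation_noise=5.0)
print(position, variance)  # a posteriori estimate after assimilating the observation
```

The update moves the simulated front toward the observed front in proportion to how uncertain the forecast is relative to the measurement, and the variance shrinks after each assimilated observation.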
Some Machine Learning Case Studies
• Smoke and fire perimeter detection based on imagery
• Prediction of Santa Ana and fire conditions specific to location
• Prediction of fuel buildup based on fire and weather history
• NLP for understanding local conditions based on radio communications
• Deep learning on multispectral imagery for high-resolution fuel maps
• Classification project to generate more accurate fuel maps (using Planet Labs data)
Classification project to generate more
accurate fuel maps
• Accurate and up-to-date fuel maps are critical for modeling wildfire rate of spread and potential burn areas.
• Challenge:
• USGS LANDFIRE provides the best available fuel maps every two years.
• The WIFIRE system is limited by these inputs, which may be up to two years old. Fuel maps created at a higher temporal frequency are desired.
• Approach:
• Using high-resolution satellite imagery and deep learning methods, produce surface fuel maps of San Diego County and other regions in Southern California.
• Using LANDFIRE fuel maps as the target variable, the objective is to create a classification model that will provide fuel maps at greater frequency with a measure of uncertainty.
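One common way to attach a measure of uncertainty to a per-pixel classification is the normalized entropy of the predicted class probabilities. The sketch below assumes a model that already outputs such probabilities. The class list and the function name are hypothetical, and the talk does not say which uncertainty measure the project uses.

```python
import math

# Hypothetical fuel classes, for illustration only
FUEL_CLASSES = ["short grass", "brush", "timber litter"]


def classify_with_uncertainty(probabilities):
    """Return (label, uncertainty) for one pixel's class probabilities.

    Uncertainty is the Shannon entropy normalized to the range 0..1:
    0 means one class holds all the probability,
    1 means every class is equally likely.
    """
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    entropy = -sum(p * math.log(p) for p in probabilities if p > 0)
    return FUEL_CLASSES[best], entropy / math.log(len(probabilities))


# Example: a confident pixel and an ambiguous pixel (made-up probabilities)
print(classify_with_uncertainty([0.9, 0.05, 0.05]))
print(classify_with_uncertainty([0.4, 0.35, 0.25]))
```

Mapping the uncertainty value next to the predicted class makes it possible to flag areas where the fuel map should be treated with caution or checked against other data.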
Cluster 1: Short Grass
Summary
• Geospatial big data has all the typical big data challenges
• Lessons learned from other disciplines to deal with these challenges should be applied
• Workflows can be used both for managing scalable coordination and training students and workforce
• Dynamic data-driven integration of machine learning, data assimilation and modeling is of potential use to many geo applications
WIFIRE Team: It takes a village!
• PhD-level researchers
• Professional software developers
• 24 undergraduate students
• UC San Diego
• UC Merced
• MURPA University
• University of Queensland
• 1 high school student
• 4 MSc and 5 MAS students
• 2 PhD students (UMD)
• 1 postdoctoral researcher
UMD - Fire modeling
UCSD MAE - Data assimilation
SDSC - Cyberinfrastructure, Workflows, Data engineering, Machine Learning, Information Visualization, HPWREN
Calit2/QI - Cyberinfrastructure, GIS, Advanced Visualization, Machine Learning, Urban Sustainability, HPWREN
SIO - HPWREN
WorDS Director: Ilkay Altintas, Ph.D.
Email: altintas@sdsc.edu
Questions?
Part of the presented work is funded by NSF, DOE, NIH, UC San Diego and various industry partners.
