Dr. ILKAY ALTINTAS
ialtintas@ucsd.edu
Creating a Data Science Ecosystem for Scientific, Societal and Educational Impact
İlkay ALTINTAŞ, Ph.D.
Chief Data Science Officer, San Diego Supercomputer Center
Division Director, Cyberinfrastructure Research, Education and Development
Founder and Director, Workflows for Data Science Center of Excellence
SAN DIEGO SUPERCOMPUTER CENTER at UC San Diego
Providing Cyberinfrastructure for Research and Education
• Established as a national supercomputer resource center in 1985 by NSF
• A world leader in HPC, data-intensive computing, and scientific data management
• Current strategic focus on “Big Data”, “versatile computing”, and “life sciences applications”
Recent Innovative Architectures
• Gordon: First Flash-based Supercomputer for Data-intensive Apps
• Comet: Serving the Long Tail of Science
Data Science Today is Both a Big Data and a Big Compute Discipline
BIG DATA
COMPUTING AT SCALE
Enables dynamic data-driven applications
Smart Manufacturing
Computer-Aided Drug Discovery
Personalized Precision Medicine
Smart Cities
Smart Grid and Energy Management
Disaster Resilience and Response
Requires:
• Data management
• Data-driven methods
• Scalable & dynamic process coordination
• Resource optimization
• Skilled interdisciplinary workforce
New era of data science!
What is Data Science?
Ultimate Goal
Big Data
Insight
Action
Data Science
How does successful data science happen?
Insight Data Product
“Big” Data
Question
Exploratory Analysis and Modeling
Insight
Customer demographics
Previous purchases
Book reviews
What kind of books does this customer like?
Book recommendations
Example: Book Recommendations
Model of customer’s book preferences
New book information
Who is likely to like this book?
Find Potential Audience for a New Book
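To make the recommendation and audience-finding idea concrete, here is a minimal sketch under stated assumptions: each customer's "model of book preferences" is simply a count of genres from previous purchases, a new book is described by its genres, and cosine similarity (a common stand-in, not the method from this talk) scores who is likely to like it. The function names, the purchase data and the 0.3 threshold are all hypothetical.

```python
from collections import Counter
from math import sqrt


def preference_profile(purchased_genres: list[str]) -> Counter:
    """Hypothetical preference model: genre counts from previous purchases."""
    return Counter(purchased_genres)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse genre-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def likely_audience(purchases: dict[str, list[str]], new_book_genres: list[str],
                    threshold: float = 0.3) -> list[str]:
    """Customers whose profile resembles the new book, most similar first."""
    book = Counter(new_book_genres)
    scores = {c: cosine(preference_profile(g), book) for c, g in purchases.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [c for c in ranked if scores[c] >= threshold]


if __name__ == "__main__":
    history = {  # made-up purchase histories
        "ana": ["mystery", "mystery", "thriller"],
        "ben": ["romance", "poetry"],
        "cho": ["thriller", "science"],
    }
    print(likely_audience(history, ["mystery", "thriller"]))  # hypothetical new book
```

Running it prints `['ana', 'cho']`: ana's mystery-heavy history scores highest, and ben, with no overlapping genres, is filtered out. A real recommender would also draw on demographics and review text, as listed above.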
Action to market the book to the right audience
Who is likely to like this book?
Market a New Book
Action to market the book to the right audience
Who is likely to like this book?
Insight → Action
Market a New Book
Historical data
Near real-time data
Prediction
Creating Actionable Information
Prediction
Action
Why the Increased Interest in Data Science?
Big Data + Scalable Computing
Anywhere, Anytime
What Is Big Data, and How Much Data Is Big Data?
Every minute…
204 Million emails
200,000 photos
1.8 Million likes
2.78 Million video views
72 hours of video uploads
Big Data Characteristics
Volume: scalable batch processing
Velocity: stream processing
Variety: extensible data storage, access and integration
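The volume/velocity distinction maps onto two processing styles. The sketch below, with hypothetical function names and made-up readings, contrasts a batch mean computed once over a complete dataset with a streaming mean that is updated as each new reading arrives while keeping only a fixed-size window in memory.

```python
from collections import deque
from collections.abc import Iterable, Iterator


def batch_mean(readings: list[float]) -> float:
    """Batch style: the complete dataset is available and processed in one pass."""
    return sum(readings) / len(readings)


def sliding_mean(stream: Iterable[float], window: int = 3) -> Iterator[float]:
    """Stream style: emit a mean over the latest `window` readings as each one
    arrives, holding at most `window` values in memory."""
    buf = deque(maxlen=window)
    total = 0.0
    for x in stream:
        if len(buf) == window:
            total -= buf[0]  # this oldest value is evicted by the append below
        buf.append(x)
        total += x
        yield total / len(buf)


if __name__ == "__main__":
    readings = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up sensor values
    print(batch_mean(readings))
    for m in sliding_mean(readings, window=3):
        print(m)
```

On the list `[1, 2, 3, 4, 5]`, `sliding_mean` with `window=3` yields `1.0, 1.5, 2.0, 3.0, 4.0` one value at a time, while `batch_mean` returns `3.0` only after all five values are available.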
Nearly every problem today is transformed by big data.
Example: Geospatial Big Data
• Flood of new data sources and types
• Needs new data management, storage and analysis methods
• Too big for a single server, fast-growing data volume
• Requires special database structures that can handle data variety
• Too continuous for analysis at a later time, with increasing streaming rate, i.e., velocity
• Varying degrees of uncertainty in measurements, and other veracity issues
• Provides more opportunities than ever for scientific understanding at different scales, i.e., potentially high value
Real-time sensors
Weather forecast
Satellite imagery
Sea Surface Temperature Measurements
Drone imagery
Example: Biomedical Big Data http://nbcr.ucsd.edu
Scientific Big Data By the Numbers…
• HPWREN: hpwren.ucsd.edu
• 30 TB of data annually
• MODIS: modis.gsfc.nasa.gov
• 219 TB of data annually
• Precision Medicine: Genome sequence
• 4 EB (1 EB = 10^18 bytes) of data in 2016 (Ref: www.fastcompany.com)
• LIGO, Deep Space Network, Protein Data Bank, …
100 MBs ~= a couple of volumes of encyclopedias
A DVD ~= 5 GBs
1 TB ~= 300 hours of good-quality video
LHC ~= 15 PBs a year
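As a back-of-envelope check on these comparisons, the snippet below converts two of them into average data rates. It assumes decimal (SI) units and averages over a full year of continuous operation; real instruments produce data in bursts, so these are rough averages only.

```python
# Decimal (SI) units, as used for storage capacities (an assumption).
TB = 10**12  # bytes
PB = 10**15  # bytes
SECONDS_PER_HOUR = 3600
SECONDS_PER_YEAR = 365 * 24 * SECONDS_PER_HOUR

# "1 TB ~= 300 hours of good quality video" implies this average bitrate:
video_bits_per_second = TB * 8 / (300 * SECONDS_PER_HOUR)
print(f"video: {video_bits_per_second / 1e6:.1f} Mbit/s")

# "LHC ~= 15 PBs a year" implies this average rate if spread over a whole year:
lhc_bytes_per_second = 15 * PB / SECONDS_PER_YEAR
print(f"LHC: {lhc_bytes_per_second / 1e6:.0f} MB/s")
```

It prints `video: 7.4 Mbit/s` and `LHC: 476 MB/s`, i.e., the LHC figure corresponds to sustaining roughly half a gigabyte per second around the clock.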
Exponential data growth!
10^21
How do we find the connections and answer questions that benefit society?
“We are drowning in information and starving for knowledge”
– John Naisbitt
Source: Megatrends, 1982
How do we amplify the value of Big Data?
Create an Ecosystem that Enables
Needs and Best Practices
• data-driven
• scalable
• dynamic
• process-driven
• collaborative
• accountable
• reproducible
• interactive
• heterogeneous
• includes many different kinds of expertise
A Typical Collaborative Data Science Ecosystem
ACQUIRE PREPARE ANALYZE REPORT ACT
Approach: Focus on the Process and Teamwork to Answer a Question
…
ACQUIRE PREPARE ANALYZE REPORT ACT
Basic Steps in a Data Science Process
• Import raw dataset into your analytics platform
• Explore & Visualize
• Perform Data Cleaning
• Feature Selection
• Model Selection
• Analyze the results
• Present your findings
• Use them
ACQUIRE
PREPARE
ANALYZE
REPORT
ACT
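The five steps can be read as a chain of functions, each consuming the previous step's output. This is a toy sketch with hypothetical function names and invented sensor records, not a description of any particular platform: data cleaning drops a missing value, the "model" is only summary statistics, and the action is a simple threshold alert.

```python
from statistics import mean


def acquire() -> list[dict]:
    # ACQUIRE: stand-in for importing a raw dataset (made-up records).
    return [{"sensor": "a", "temp_c": 31.0}, {"sensor": "b", "temp_c": None},
            {"sensor": "c", "temp_c": 36.0}]


def prepare(rows: list[dict]) -> list[float]:
    # PREPARE: data cleaning (drop missing values) and feature selection.
    return [r["temp_c"] for r in rows if r["temp_c"] is not None]


def analyze(temps: list[float]) -> dict:
    # ANALYZE: a deliberately trivial "model" of summary statistics.
    return {"n": len(temps), "mean_temp_c": mean(temps), "max_temp_c": max(temps)}


def report(result: dict) -> str:
    # REPORT: present the findings.
    return (f"{result['n']} valid readings, mean {result['mean_temp_c']:.1f} C, "
            f"max {result['max_temp_c']:.1f} C")


def act(result: dict, alert_threshold_c: float = 35.0) -> bool:
    # ACT: use the findings, here deciding whether to raise a hypothetical alert.
    return result["max_temp_c"] >= alert_threshold_c


if __name__ == "__main__":
    result = analyze(prepare(acquire()))
    print(report(result))
    print("alert:", act(result))
```

Running it prints `2 valid readings, mean 33.5 C, max 36.0 C` followed by `alert: True`. Keeping each step as a separate function makes the process explicit and repeatable, which is the property the workflow-driven approach in this talk builds on.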
COORDINATION AND WORKFLOW MANAGEMENT
DATA INTEGRATION AND PROCESSING
DATA MANAGEMENT AND STORAGE
Process-driven Solution Architectures and the Role of Workflows
…
COORDINATION AND WORKFLOW MANAGEMENT
DATA INTEGRATION AND PROCESSING
DATA MANAGEMENT AND STORAGE
COMMUNICATION AND FEEDBACK
EXPLORATION
SCALABILITY
PROVENANCE
SECURITY
ACQUIRE PREPARE ANALYZE REPORT ACT
WORKFLOW MANAGEMENT
Application Integration, Coordination, Optimization, Communication, Reporting
COMPOSABLE DATA SERVICES
Deep Learning, Analytics, HPC, Training, Notebooks
COMPOSABLE SYSTEMS
GPU, CPU, Big Data, Neuromorphic, Networks, Storage, …
PROVENANCE
SECURITY
RESOURCE MANAGEMENT
Kubernetes Container Cloud
SOLUTION ARCHITECTURE
DOMAIN KNOWLEDGE
Using dynamic workflows for data science…
… requires methodology, research and tool development.
Workflows for Data Science Center of Excellence at SDSC
Goal: Methodology and tool development to build automated and operational workflow-driven solution architectures on big data and HPC platforms.
Focus on the question, not the technology!
Real-Time Hazards Management
wifire.ucsd.edu
Data-Parallel Bioinformatics
bioKepler.org
Scalable Automated Molecular Dynamics and Drug Discovery
nbcr.ucsd.edu
WorDS.sdsc.edu
• Access and query data
• Support exploratory design
• Scale computational analysis
• Increase reuse and reproducibility
• Save time, energy and money
• Formalize and standardize
• Train
Balance of:
• team building
• process management
• performance optimization
• provenance tracking
• training and education
While working with experts on…
• data modeling and integration
• data management services
• analytical methods
• communication and visualization
• domain expertise
How can I get smart people to collaborate and communicate?
…to utilize data and computing to generate insights and answer a question.
Focus on the question, not the technology!
Team Building
Process Management
Process for Practice of Data Science
Workflow Design
Reporting
Workflow Monitoring
Workflow Execution
Workflow Scheduling and Execution Planning
Execution Review
Provenance Analysis
Deploy and Publish
Programmability
Ease of use, iteration, interaction, re-use, re-purpose
Scalability
From local experiments to large-scale runs
Reproducibility
Ability to validate, re-run, re-play
BUILD and EXPLORE
SHARE
SCALE and ITERATE
LEARN and REPORT
Some P’s in PPoDS
Platforms
Process
People
Problem or Purpose?
Programmability
Metrics for accountability should be built into the process.
Timeline
Purpose
Expectations
Planning of deliverables
Cost
Treat Each Step in the Solution Process as a Conceptual Pod
Pod → sub-process
Defined by:
• Purpose and goal
• Stakeholders
• Expectations
• Key questions to be answered, production/consumption relationships, needs, dependencies, limits, …
• Contracts
• Performance, economic, accuracy, policy, privacy, reproducibility, political, …
• Knowns
• Known unknowns
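One way to make the pod idea operational is to record each pod's definition as a small data structure that the team can review before a step is executed. The class below is a hypothetical sketch: the field names mirror the list above, and the `open_issues` check is an assumption added for illustration, not part of the original framework.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """One step (sub-process) in the solution process, per the list above."""
    name: str
    purpose: str
    stakeholders: list[str]
    expectations: list[str] = field(default_factory=list)
    # Contracts, e.g. performance, accuracy, privacy, reproducibility constraints.
    contracts: dict[str, str] = field(default_factory=dict)
    knowns: list[str] = field(default_factory=list)
    known_unknowns: list[str] = field(default_factory=list)

    def open_issues(self) -> list[str]:
        """Hypothetical check: gaps to resolve before the pod is executed."""
        issues = []
        if not self.stakeholders:
            issues.append("no stakeholders")
        if not self.contracts:
            issues.append("no contracts")
        issues.extend(f"unknown: {u}" for u in self.known_unknowns)
        return issues


if __name__ == "__main__":
    prepare_pod = Pod(
        name="PREPARE",
        purpose="Clean sensor data for modeling",
        stakeholders=["data engineer", "fire modeler"],
        contracts={"latency": "< 5 min"},  # hypothetical performance contract
        known_unknowns=["sensor drift"],
    )
    print(prepare_pod.open_issues())
```

Here the PREPARE pod reports `['unknown: sensor drift']`, making the remaining known unknown explicit to every stakeholder before execution.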
Zooming into a simple example…
PREPARE ANALYZE
Data Exploration
Schema Integration
Query Processing
Machine Learning
…
The insights need to be evaluated to turn them into action.
Platforms
Process
People
Purpose?
Programmability
Metrics
Product
Insight
Action
?
Implementation of the actions needs many things working together.
Process
Stakeholders
Automation
Action
GIS
Files
Sensor
NoSQL
Social
Database
Action
The impact of the actions should be monitored, measured and evaluated.
Evaluation
Measure
Monitor
Evaluation will determine the next steps.
Favorable Results?
Revisit?
Further Opportunities?
Action
Evaluation
Real-time Action?
COORDINATION AND WORKFLOW MANAGEMENT
…
http://kepler-project.org
National Resources
(Gordon) (Comet) (Stampede) (Lonestar)
Cloud Resources
Execution Platforms
Local Cluster Resources
ACQUIRE PREPARE ANALYZE REPORT ACT
Dynamic data-driven coordination & resource optimization
Requires:
Ability to explore and scale on multiple platforms
Workflows are increasingly becoming the dynamic operations research tool for science.
Where do we make use of such capabilities?
Data Science for Social Good
Smart City and Hazards IoT Applications
• Many sensed and organizational open datasets
• Potential to improve public safety and quality of life
How Do We Better Predict Wildfire Behavior?
• Wildfires are critical for ecology, but volatile
• Fuel load is high due to fire suppression over the last century
• Drought, higher temperatures
• Better prevention, prediction and maintenance of wildfires is needed
Photo of Harris Fire (2007) by former Fire Captain Bill Clayton
Disaster management of (ongoing) wildfires relies heavily on understanding their Direction and Rate of Spread (RoS).
Fire is Part of the Natural Ecology…
… but requires Monitoring, Prediction and Resilience
What was lacking is…
a dynamic system integration of real-time sensor networks, satellite imagery, near-real-time data management tools, wildfire simulation tools, and connectivity to emergency command centers
… before, during and after a firestorm.
Big Data
Fire Modeling
Visualization
Monitoring
WIFIRE: A Scalable Data-Driven Monitoring, Dynamic Prediction and Resilience Cyberinfrastructure for Wildfires
High Performance Wireless Research and Education Network
FARSITE
http://hpwren.ucsd.edu/cameras
>160 Meteorological Sensors and Growing
Major success in bringing internet to incident command in the field. Used in over 20 fires over time.
Most popular operational fire behavior modeling system.
Closing the Loop using Big Data
-- Wildfire Behavior Modeling and Data Assimilation --
• Computational costs for existing models are too high for real-time analysis
• a priori -> a posteriori
• Parameter estimation to make adjustments to the (input) parameters
• State estimation to adjust the simulated fire front location with an a posteriori update/measurement of the actual fire front location
Conceptual Data Assimilation Workflow with Prediction and Update Steps using Sensor Data
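The predict/update cycle can be illustrated in one dimension. The sketch below uses a generic scalar Kalman filter as a stand-in (it is not claimed to be the assimilation method used in WIFIRE): the prediction step advances a simulated fire-front position with an assumed constant rate of spread (the a priori estimate), and the update step performs state estimation by blending that forecast with an observed front position (the a posteriori estimate). Parameter estimation, i.e., adjusting the rate of spread itself, is not shown. All names and numbers are hypothetical.

```python
def predict(x: float, p: float, rate_of_spread: float, dt: float,
            q: float) -> tuple[float, float]:
    """Prediction step: advance the front with a constant-rate model (a priori).
    x is front position (m), p its variance (m^2), q the added model noise."""
    return x + rate_of_spread * dt, p + q


def update(x_prior: float, p_prior: float, z: float,
           r: float) -> tuple[float, float]:
    """Update step: blend the forecast with an observed position z whose
    measurement variance is r (a posteriori)."""
    k = p_prior / (p_prior + r)           # Kalman gain: weight on the observation
    x_post = x_prior + k * (z - x_prior)  # state estimation: correct simulated front
    p_post = (1 - k) * p_prior            # uncertainty shrinks after the update
    return x_post, p_post


if __name__ == "__main__":
    x, p = 0.0, 100.0                    # initial front position and variance
    observations = [55.0, 98.0, 160.0]   # hypothetical sensor-derived positions
    for z in observations:
        x, p = predict(x, p, rate_of_spread=1.0, dt=60.0, q=25.0)
        x, p = update(x, p, z, r=100.0)
        print(f"front ~ {x:.1f} m, variance {p:.1f} m^2")
```

Each update pulls the simulated front toward the observation in proportion to the gain `k`, and the post-update variance is always smaller than the predicted variance, reflecting the information contributed by the sensor data.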
Fire Modeling Workflows in WIFIRE
Real-time sensors
Weather forecast
Fire perimeter
Landscape data
Monitoring & fire mapping
Firemap Tool
• A web-based GIS environment:
• access information related to fire behavior
• analyze what-if scenarios
• model real-time fire behavior
• generate reports
• Powered by WIFIRE
Firemap Web Interface
WIFIRE Data Interfaces
WIFIRE Workflows
Computing Infrastructure
http://firemap.sdsc.edu
Data-Driven Fire Progression Prediction Over Three Hours
Collaboration with LA and SD Fire Departments
http://firemap.sdsc.edu
August 2016 – Blue Cut Fire
Tahoe and Nevada Bureau of Land Management Cameras: 20 cameras added with field-of-view
Northern CA Fires 10/09/17 through now…
300K+ unique visitors and ~3M hits in 5 days
http://firemap.sdsc.edu
Some Machine Learning Case Studies
• Smoke and fire perimeter detection based on imagery
• Prediction of Santa Ana and fire conditions specific to location
• Prediction of fuel buildup based on fire and weather history
• NLP for understanding local conditions based on radio communications
• Deep learning on multispectral imagery for high-resolution fuel maps
• Classification project to generate more accurate fuel maps (using Planet Labs satellite data)
Classification project to generate more accurate fuel maps
• Accurate and up-to-date fuel maps are critical for modeling wildfire rate of spread and potential burn areas.
• Challenge:
• USGS LANDFIRE provides the best available fuel maps every two years.
• The WIFIRE system is limited by these potentially 2-year-old inputs. Fuel maps created at a higher temporal frequency are desired.
• Approach:
• Using high-resolution satellite imagery and deep learning methods, produce surface fuel maps of San Diego County and other regions in Southern California.
• Using LANDFIRE fuel maps as the target variable, the objective is to create a classification model that will provide fuel maps at greater frequency with a measure of uncertainty.
Cluster 1: Short Grass
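The "measure of uncertainty" in the approach above could take several forms. A minimal sketch, assuming the classifier outputs a probability per fuel class for each pixel: take the most likely class as the map label and use normalized entropy as the uncertainty (0 for a fully confident pixel, 1 when all classes are equally likely). The entropy choice, the function name and the class names are illustrative assumptions.

```python
from math import log


def label_with_uncertainty(probs: dict[str, float]) -> tuple[str, float]:
    """Return (most likely fuel class, normalized entropy in [0, 1]) for one pixel.
    `probs` maps class name -> predicted probability (assumed to sum to 1)."""
    label = max(probs, key=probs.get)
    entropy = -sum(p * log(p) for p in probs.values() if p > 0)
    max_entropy = log(len(probs)) if len(probs) > 1 else 1.0
    return label, entropy / max_entropy


if __name__ == "__main__":
    pixel = {"short grass": 0.8, "chaparral": 0.15, "timber litter": 0.05}  # made up
    print(label_with_uncertainty(pixel))
```

For the made-up pixel above the label is `short grass` with an uncertainty of roughly 0.56, so a map product could flag such pixels for review while still publishing the most likely class.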
WIFIRE Team: It takes a village!
• PhD-level researchers
• Professional software developers
• 27 undergraduate students
• UC San Diego
• UC Merced
• MURPA University
• University of Queensland
• 1 high school student
• 5 MSc and 5 MAS students
• 2 PhD students (UMD)
• 1 postdoctoral researcher
UMD - Fire modeling
UCSD MAE - Data assimilation
SDSC - Cyberinfrastructure, Workflows, Data engineering, Machine Learning, Information Visualization, HPWREN
Calit2/QI - Cyberinfrastructure, GIS, Advanced Visualization, Machine Learning, Urban Sustainability, HPWREN
SIO - HPWREN
Process for Precision Education
Some Questions
• How are the students performing?
• When does a dropout process really start? What are the early signs?
• How many students do we expect for a subject next year? What are the trends?
• When will a student graduate?
• What are personalized learning paths?
• When is the best time to take a course to graduate on time?
• How does the curriculum serve the local economy and workforce?
Parts of the Solution
• Stakeholders
• Datasets
• Compliance requirements
• Defined actions
• Analytical methods
• Technical infrastructure
Bias
Transparency
Verification
Accuracy
Ethics
Reproducibility
Contact: Ilkay Altintas, Ph.D.
Email: ialtintas@ucsd.edu
Questions?
Parts of the presented work are funded by NSF, DOE, NIH, UC San Diego and various industry partners.
