Understanding
Big Data Analytics -
solutions for growing businesses
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
■ 13+ yrs in IT
■ IT Service Management, Project Management,
Business development
■ Cloud Native, DevOps, Data Science, Big Data,
Genomics
■ Involved in:
● PyData Warsaw
● Data Science Summit
● DevOps Days Warsaw
● Cloud Native Warsaw
Rafał Małanij
rafal.malanij@getindata.com
Founded in 2014 by
ex-Spotify engineers.
Focus only on Big Data and
Cloud (from day 1)
Community builders (Big Data
Tech Warsaw organizers)
60+ Big Data engineers
● Volume
● Variety
● Velocity
● Veracity
● Value
Big Data
Source: Wikipedia
60% - 85%
of Big Data projects fail
(Gartner 2016/2017)
“Big data isn't a one-off project: It's a culture
of collecting, analyzing, and using data.”
Matt Asay, Infoworld.com
“Technology is the engine of digital
transformation, data is the fuel, process is the
guidance system, and organizational change
capability is the landing gear.”
https://hbr.org/2020/05/digital-transformation-comes-down-to-talent-in-4-key-areas
Data literacy
Data literacy is the ability to read, understand, create, and
communicate data as information.
Data Collection → Data Storage → Processing → Delivery
● Clickstream
● Mobile apps
● Product systems
● Transaction system
● CRM
● Call center
● Workforce mgmt
Data Lake
● Repository for raw data
● Various types of data
○ Structured
○ Semi-structured
○ Unstructured
○ Binary
● Historical data
vs.
● Continuous Data Collection
● Automation, Security, Monitoring, Orchestration
● Data Lake
● Big Data Processing
● Data Governance
● Event Processing
● Feature engineering
● Interactive BI & Analytics
● Data Discovery
● Data Science
● Machine Learning
Data lineage
● Where data comes from
● What happened / How it was transformed
● Where data is used
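The three lineage questions above can be captured as a small metadata record per dataset. The sketch below is only an illustration: the `LineageRecord` structure, the `upstream` helper and all dataset names are hypothetical and do not come from any specific lineage tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LineageRecord:
    """Lineage metadata for one dataset (hypothetical structure)."""
    dataset: str
    sources: List[str]        # where the data comes from
    transformation: str       # what happened / how it was transformed
    consumers: List[str] = field(default_factory=list)  # where the data is used


def upstream(dataset: str, catalog: Dict[str, LineageRecord]) -> List[str]:
    """Return every dataset that `dataset` depends on, directly or indirectly."""
    seen: List[str] = []
    stack = list(catalog[dataset].sources) if dataset in catalog else []
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.append(name)
        # Datasets without a record are treated as raw sources.
        if name in catalog:
            stack.extend(catalog[name].sources)
    return sorted(seen)


# Illustrative catalog; all names are made up for this example.
catalog = {
    "daily_revenue": LineageRecord(
        dataset="daily_revenue",
        sources=["clean_transactions"],
        transformation="sum of amount grouped by day",
        consumers=["revenue_dashboard"],
    ),
    "clean_transactions": LineageRecord(
        dataset="clean_transactions",
        sources=["raw_transactions", "crm_customers"],
        transformation="deduplicate and join customer attributes",
        consumers=["daily_revenue"],
    ),
}

print(upstream("daily_revenue", catalog))
```

Walking the `sources` links answers "where data comes from" for any report, while `consumers` shows what is affected when a dataset changes.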
Degrees of intelligence
Competing on Analytics: The New Science of Winning
by Thomas H. Davenport, Jeanne G. Harris
Ordered from highest to lowest competitive advantage:
Analytics
🔴 Optimization: What’s the best that can happen?
🔴 Predictive modeling: What will happen next?
🔴 Forecasting/extrapolation: What if these trends continue?
🔴 Statistical analysis: Why is this happening?
Reporting
🔴 Alerts: What actions are needed?
🔴 Query/drill-down: Where exactly is the problem?
🔴 Ad-hoc reports: How many, how often, where?
🔴 Standard reports: What happened?
Data Science vs Machine Learning
Machine Learning
ML Lifecycle
Machine Learning vs. A.I.
“Artificial intelligence is the science and engineering of making
computers behave in ways that, until recently, we thought required
human intelligence.”
Andrew Moore, Carnegie Mellon University
DevOps vs DataOps
● Culture
● Automation
● Lean
● Measurement
● Sharing
+ Data quality
+ Manufacturing process
https://www.dataopsmanifesto.org/
Degrees of intelligence
Competing on Analytics: The New Science of Winning
by Thomas H. Davenport, Jeanne G. Harris
Diagram labels: Technical competences, Possibilities, Competitive advantage
Analytics
🔴 Optimization
🔴 Predictive modeling
🔴 Forecasting/extrapolation
🔴 Statistical analysis
Reporting
🔴 Alerts
🔴 Query/drill-down
🔴 Ad-hoc reports
🔴 Standard reports
Interactive BI
● Reports
● Dashboards
● Drill-down reports
● SQL-queries
● Tools: Excel, Power BI,
QlikView, Tableau,
Superset, Hive, Presto
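To make the difference between a standard report and a drill-down concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for a SQL engine such as Hive or Presto. The `orders` table, its columns and the sample rows are assumptions made up for this example.

```python
import sqlite3

# In-memory database standing in for the warehouse; table and rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, city TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("North", "Gdansk", 120.0),
        ("North", "Olsztyn", 80.0),
        ("South", "Krakow", 200.0),
        ("South", "Katowice", 50.0),
    ],
)


def report_by_region(conn):
    """Standard report: total order amount per region."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()


def drill_down(conn, region):
    """Drill-down: break one region's total into its cities."""
    return conn.execute(
        "SELECT city, SUM(amount) FROM orders "
        "WHERE region = ? GROUP BY city ORDER BY city",
        (region,),
    ).fetchall()


print(report_by_region(conn))
print(drill_down(conn, "South"))
```

A dashboard tool typically generates queries of this shape when a user clicks on one bar of a chart to see what it is made of.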
Data Science
● Transformed and Raw data
● Machine Learning
● Tools: Jupyter, Spark, Scala/Java, R, Python, TensorFlow, etc.
Data Discovery
● Search tool for data
● What, where, who?
● Metadata
● Popularity score
● Quality and profiling
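A data discovery tool of the kind outlined above can be thought of as search over dataset metadata, ranked by popularity. The sketch below is a simplified illustration only: the `DatasetEntry` fields, the use of a query count as the popularity score, and all sample entries are assumptions, not a description of any real catalog.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetEntry:
    """Metadata describing one dataset (hypothetical fields)."""
    name: str          # what
    description: str
    location: str      # where
    owner: str         # who
    query_count: int   # used here as a simple popularity score


def search(entries: List[DatasetEntry], term: str) -> List[DatasetEntry]:
    """Return entries whose name or description mentions `term`, most popular first."""
    needle = term.lower()
    hits = [
        e for e in entries
        if needle in e.name.lower() or needle in e.description.lower()
    ]
    return sorted(hits, key=lambda e: e.query_count, reverse=True)


# Illustrative catalog content.
entries = [
    DatasetEntry("user_sessions", "Clickstream sessions per user",
                 "lake/web/sessions", "web-team", 340),
    DatasetEntry("orders", "Transaction system orders",
                 "lake/sales/orders", "sales-team", 120),
    DatasetEntry("user_profiles", "CRM customer attributes",
                 "lake/crm/profiles", "crm-team", 510),
]

for entry in search(entries, "user"):
    print(entry.name, entry.owner, entry.query_count)
```

Ranking by usage is one simple way to surface the datasets colleagues already rely on; quality and profiling statistics could be added as further fields.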
Lexikon @ Spotify
● Library for data and insights
● Knowledge Mgmt tool
○ People
○ Description, stats
○ Tables, Queries
https://engineering.atspotify.com/2020/02/27/how-we-improved-data-discovery-for-data-scientists-at-spotify/
Source: “Continuous Analytics: Stream Query Processing in Practice”,
Michael J. Franklin, Professor, UC Berkeley, Dec 2009
https://www.slideshare.net/JoshBaer/shortening-the-feedback-loop-big-data-spain-external
Hidden Technical Debt in Machine Learning Systems -
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
Dataism
“Dataism declares that the
universe consists of data flows,
and the value of any
phenomenon or entity is
determined by its contribution
to data processing.”
Yuval Noah Harari
“Homo Deus”.
Rafał Małanij
rafal.malanij@getindata.com

Understanding Big Data Analytics - solutions for growing businesses - Rafał Małanij, GetInData