Data Transformation Workflow
(from GSheet to Jupyter)
July 2019, @DinisCruz
OSBOT (OWASP Security Bot)
Start Jupyter Server
Jupyter Lab Dev Environment
Step 1
Loading Data from GSheet
Original Data Set
Viewing Data in Slack
Viewing data in Jupyter
Pandas DataFrame
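A minimal sketch of how sheet rows could become a Pandas DataFrame. The slides do not show the loading code, so the `rows` layout (header row first, then values), the sample values, and the `rows_to_dataframe` helper are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical rows, shaped like a typical sheet export:
# the first row holds the column names, the rest hold values.
rows = [
    ["Name", "Email"],
    ["Alan Lee", "alan.lee@example.com"],
    ["Bruno Lyon", "bruno.lyon@example.com"],
]

def rows_to_dataframe(rows):
    """Use the first row as column names and the remaining rows as data."""
    header, *values = rows
    return pd.DataFrame(values, columns=header)

df = rows_to_dataframe(rows)
print(df)
```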
Jupyter QGrid
All code and data is auto-saved in Git
Step 2
Parsing dataset
Explaining what we are doing in a Jupyter Notebook
Powerful development environment
Starting refactoring code into methods
Get 3 datasets as separate DataFrames
Helper function to transform raw data into objects
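One possible shape for such a helper, as a sketch only: the original function is not reproduced in this transcript, so the name `rows_to_objects` and the padding and whitespace-stripping behaviour are assumptions.

```python
def rows_to_objects(rows):
    """Turn raw rows (header first) into a list of dicts, one per data row."""
    header, *values = rows
    keys = [str(cell).strip() for cell in header]
    objects = []
    for row in values:
        # Pad short rows so every key gets a value (assumed behaviour).
        padded = list(row) + [""] * (len(keys) - len(row))
        objects.append({key: str(value).strip() for key, value in zip(keys, padded)})
    return objects

# Hypothetical raw data with stray whitespace and a short row.
raw = [["Name", "Email"], [" Alan Lee ", "alan.lee@example.com"], ["Bruno Lyon"]]
objects = rows_to_objects(raw)
print(objects)
```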
Refactored code that extracts data into 3 DataFrames
3 DataFrames with data
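A sketch of one way to extract three DataFrames from a single sheet. It assumes the three datasets sit in one list of rows and are separated by empty rows; the transcript does not confirm that layout. The helper names and sample values are hypothetical.

```python
import pandas as pd

def split_blocks(rows):
    """Split sheet rows into blocks separated by fully empty rows."""
    blocks, current = [], []
    for row in rows:
        if any(str(cell).strip() for cell in row):
            current.append(row)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks

def blocks_to_dataframes(rows):
    """Build one DataFrame per block, using each block's first row as header."""
    return [pd.DataFrame(block[1:], columns=block[0]) for block in split_blocks(rows)]

# Hypothetical sheet content: three small tables separated by empty rows.
raw = [
    ["Name", "Email"],
    ["Alan Lee", "alan.lee@example.com"],
    [],
    ["Email", "Team"],
    ["alan.lee@example.com", "Security"],
    ["", ""],
    ["Email", "Location"],
    ["alan.lee@example.org", "London"],
]
df_1, df_2, df_3 = blocks_to_dataframes(raw)
```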
Step 3
Merging data
Example 1 - Merge 3 datasets
All fields from all 3 data sets
(in this case there was only one exact match on email)
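The three-way merge can be sketched with `DataFrame.merge`, which performs an inner join by default. The column names and values below are hypothetical; they are chosen so that only one email matches exactly across all three datasets, mirroring the situation described above.

```python
import pandas as pd

# Hypothetical datasets sharing an "Email" column.
df_1 = pd.DataFrame({
    "Name": ["Alan Lee", "Bruno Lyon"],
    "Email": ["alan.lee@example.com", "bruno.lyon@example.com"],
})
df_2 = pd.DataFrame({
    "Email": ["alan.lee@example.com", "bruno.lyon@example.com"],
    "Team": ["Security", "Data"],
})
df_3 = pd.DataFrame({
    "Email": ["alan.lee@example.org", "bruno.lyon@example.com"],
    "Location": ["London", "Lisbon"],
})

# Chain two inner merges; only rows whose Email appears in all three survive.
merged = df_1.merge(df_2, on="Email").merge(df_3, on="Email")
print(merged)
```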
Example 2 - Merging data loses user with different email
When merging DF_1 with DF_3 we lose `Alan Lee` due to a different email
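A sketch of how the lost row can be made visible, assuming hypothetical `Name`, `Email` and `Location` columns. The inner merge silently drops `Alan Lee`; a left merge with `indicator=True` adds a `_merge` column that marks rows with no match.

```python
import pandas as pd

# Hypothetical data: Alan Lee has a different email in each dataset.
df_1 = pd.DataFrame({
    "Name": ["Alan Lee", "Bruno Lyon"],
    "Email": ["alan.lee@example.com", "bruno.lyon@example.com"],
})
df_3 = pd.DataFrame({
    "Email": ["alan.lee@example.org", "bruno.lyon@example.com"],
    "Location": ["London", "Lisbon"],
})

# Default inner merge: rows without an exact Email match disappear.
inner = df_1.merge(df_3, on="Email")

# Left merge with an indicator column to list what the inner merge lost.
audit = df_1.merge(df_3, on="Email", how="left", indicator=True)
lost = audit[audit["_merge"] == "left_only"]
print(lost[["Name", "Email"]])
```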
Example 3 - Fixing data before merge (version 1)
Lost user due to bad data in Name
Lost user `Bruno Lyon` because the `Name` value is wrong
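The transcript does not show what version 1 of the fix does. One plausible reading, sketched here purely as an assumption, is to merge on a normalised `Name` instead of `Email`: that recovers a user whose emails differ, but drops a user whose `Name` is entered incorrectly in one dataset. The `name_key` helper and sample values are hypothetical.

```python
import pandas as pd

# Hypothetical data: Alan's emails differ, Bruno's Name is wrong in df_3.
df_1 = pd.DataFrame({
    "Name": ["Alan Lee", "Bruno Lyon"],
    "Email": ["alan.lee@example.com", "bruno.lyon@example.com"],
})
df_3 = pd.DataFrame({
    "Name": [" alan lee", "B. Lyon"],
    "Email": ["alan.lee@example.org", "bruno.lyon@example.com"],
    "Location": ["London", "Lisbon"],
})

def name_key(series):
    """Normalise names so case and surrounding whitespace do not block a match."""
    return series.str.strip().str.lower()

df_1["name_key"] = name_key(df_1["Name"])
df_3["name_key"] = name_key(df_3["Name"])

# Merge on the normalised name; overlapping columns get _1 / _3 suffixes.
merged = df_1.merge(df_3, on="name_key", suffixes=("_1", "_3"))
print(merged[["Name_1", "Email_1", "Email_3", "Location"]])
```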
Example 4 - Fixing data before merge (version 2)
The merge of the two DataFrames now successfully finds all 4 users (including Alan, who has two emails) by using email to find the first and last name values
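A sketch of this idea, assuming every email follows a `first.last@domain` pattern (the transcript does not state the format). The `names_from_email` helper, the `first_name` and `last_name` columns, and all sample values other than `Alan Lee` and `Bruno Lyon` are hypothetical.

```python
import pandas as pd

# Hypothetical data: 4 users, one with two emails, one with a bad Name.
df_1 = pd.DataFrame({
    "Name": ["Alan Lee", "Bruno L.", "Carla Diaz", "Dev Patel"],
    "Email": ["alan.lee@example.com", "bruno.lyon@example.com",
              "carla.diaz@example.com", "dev.patel@example.com"],
})
df_3 = pd.DataFrame({
    "Email": ["alan.lee@example.org", "bruno.lyon@example.com",
              "carla.diaz@example.com", "dev.patel@example.com"],
    "Location": ["London", "Lisbon", "Madrid", "Leeds"],
})

def names_from_email(df, email_col="Email"):
    """Derive first_name / last_name from emails shaped like first.last@domain."""
    parts = df[email_col].str.lower().str.extract(r"^([^.@]+)\.([^@]+)@")
    out = df.copy()
    out["first_name"] = parts[0]
    out["last_name"] = parts[1]
    return out

left = names_from_email(df_1)
right = names_from_email(df_3)

# Merge on the derived names, so neither the domain nor the Name column matters.
merged = left.merge(right, on=["first_name", "last_name"], suffixes=("_1", "_3"))
print(merged[["first_name", "last_name", "Email_1", "Email_3", "Location"]])
```

Emails that do not fit the assumed pattern yield missing values for both derived columns, so it is worth checking for them before relying on the merge.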
