Launching a Data Platform on Snowflake
Using “old skills” in a new world
Simon Sleight
Data Guy
• People
• Technology
• Pricing
A Seasoned Professional – My Data Journey
Old World | Data Rock Star | Me
Evolution
rikkyal © Creative Market
What skills and attributes are required in a data team?
Evolution
David Benes © 123RF.com
Running a platform requires people with the right skills
1. There is complexity to manage
2. Agile working environment
3. Data Rock Stars are rare
4. Data modelling and SQL invaluable
What technology enabled us to launch a data platform?
A step change in the evolution of cloud computing.
1. Decoupling storage from compute
Pay for compute only when needed (scale up, down, out)
Pay for storage separately (very cheap)
2. Low barrier to entry
Extremely easy to set up
Very low price (no CAPEX)
3. Same data used by everybody – no impact, each to their own compute
Snowflake Architecture Overview
doyouevendata.com
Let me explain…
[Diagram: Data is Extracted, Loaded, Transformed]
[Diagram label: How your business refers to terms that you report on. Called a Semantic Layer]
Our platform is a wrapper service based on Snowflake
Price of Entry
• KETL can now offer cloud services with little risk
• No software licensing costs
• No hardware costs
• Time to deployment rapid
• We can use existing team skills to carry out Extract Load Transform
• We can iterate quickly and let designs evolve
• Time to value for clients massively reduced
https://www.snowflake.com/zero-to-snowflake/zero-to-snowflake-in-90-minutes-bristol/
KETL
30 Queen Charlotte St
Bristol
BS1 4JH
+44 (0)117 251 0064
www.ketl.co.uk
info@ketl.co.uk
@KETL_BI
For more information on what we do please contact Helen Woodcock.
We host regular workshops


Editor's Notes

  • #2: This talk is about how KETL were able to launch a data platform. There are three key ingredients: People, Technology, and Pricing.
  • #3: People. This talk is about KETL’s experience with developing a cloud platform. Three key factors in its development are people, technology, and price.
    I’d like to share my experiences to date in order to give you some context. We are all products of our experiences, good or bad. It’s true to say that we can only improve going forward. As our Marketing team would say, “I am a seasoned professional”. These grey hairs come from experience. I wish I had had cloud when I started.
    When I started work we had to physically build the servers, compile operating systems, fine-tune software and run business processes as best we could. The environment was fragile because budgetary and technical constraints meant that true resilience was expensive and difficult to obtain. The financial and emotional investment in systems stymied free-form creative development. Changes to any code often came with downtime, arduous deployments and, on occasion, hardware upgrades too.
    In the old world, as sharp and keen as I think I was, I was still perceived as a bottleneck by the business. I was also the custodian of and gatekeeper to the data.
    We all talk cloud and new world, but many of the people and businesses that I meet still carry lots of technical debt, frustrations, and fears, and I completely understand where they are coming from.
    CALL CENTRE: I had to replace three spreadsheets and an Access database for a call centre. When I looked at the data it was mostly complete and superficially fit for purpose. When I sat with the team taking calls I began to understand the stress of using Excel and Access whilst on the phone to the client. When you start to realise that technology should enable and is a service, and you see the power of the right data at the right time, you can better understand my personal goal of being an enabler. That is what we want for our platform too.
    I’m not a data rock star yet.
  • #4: The data platform is a service that has to integrate with different organisations and data sources. Interactions with the outside world require dealing with people 70% of the time. Good service requires good people: people are key.
    For me, good data people display some key personal qualities:
      • Accuracy (don’t send me a CV with missing full stops and typos)
      • Consistency and reliability
      • Evidence of working with data problems (the number of rows does not always determine the complexity of the problem)
    The “old skills” are still valuable today:
      • Environments are built from scripts. We can deploy identical client server environments using a script called with a different variable: a vanilla configuration of cloud servers and data services.
      • Coding of key business functions into reusable working patterns. Systematic thought processes (we still have to deliver a chain of events even if they now last seconds rather than hours).
      • SQL skills. It matters little which database you learnt SQL on; the fundamentals of SQL should give an employee an understanding of data warehousing concepts. I look for experience of Kimball or Data Vault schema design (even if not directly stated).
    It still requires humans to interpret business processes. You can’t build a data service without a data team. Why is this piece of data here? Why is there missing data? What does this piece of data relate to in the business? What does people-friendly mean? How do we get to the business of what a customer wants?
    Part of our platform has to deliver a semantic layer to the client. This is where we describe and codify data values in business terms. The outcome we are trying to achieve is consistency in business reporting and a centralisation of business logic. We are only able to code this layer if we understand and interact with the business users and match code to meaning.
    An example: What is a customer? Is it a credit card? An email list member? A loyalty card? A gift recipient? Do they expire? How are customers counted? How are they uniquely identifiable?
    Other typical scenarios are summarising business activity markers into sales stages or grouping products into categories.
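The "What is a customer?" question in the note above can be made concrete with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for Snowflake (an assumption, chosen only so the example runs anywhere). The table `raw_orders`, its columns, and the rule "a customer is a distinct email address, ignoring case" are all hypothetical and not taken from the talk.

```python
import sqlite3

# sqlite3 stands in for Snowflake here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_orders (order_id INTEGER, email TEXT, card_last4 TEXT);
    INSERT INTO raw_orders VALUES
        (1, 'ann@example.com', '1111'),
        (2, 'ANN@example.com', '2222'),
        (3, 'bob@example.com', '3333');

    -- Semantic layer: one agreed business definition of "customer",
    -- held in a view so every report counts customers the same way.
    CREATE VIEW customer AS
    SELECT LOWER(email) AS customer_key, COUNT(*) AS order_count
    FROM raw_orders
    GROUP BY LOWER(email);
""")

# Counting via the view and counting cards give different answers,
# which is exactly why the definition has to be agreed with the business.
customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
card_count = conn.execute(
    "SELECT COUNT(DISTINCT card_last4) FROM raw_orders"
).fetchone()[0]
print(customer_count, card_count)
```

Keeping the rule in a view means a change of definition is made once, in one place, rather than in every report.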
  • #5: IN MY EXPERIENCE:
    There is complexity still to manage:
      • Client technical debt
      • Access to information and third-party systems
      • Multiple data sources, multiple vendors
      • Provision of data ingestion access points and EC2 servers
      • Reporting requirements
    We use agile methodologies: de-risk the complexity, make tasks manageable.
    Time to value for our clients has improved dramatically with Snowflake. We recently implemented a proof of concept (end-to-end) from extract to load in four days. Previously it would have taken about four months (elapsed time): hardware, VPCs, software, licences, schema pre-design, etc. The proof of concept uses production data and replicates production functionality; the only limitation was the scope of required outputs. We used to have a long design stage prior to data load; now we spend the time exploring the real data and adapting the design as we go. Typically we can add fields and new sources within a sprint (two weeks).
    New skills can be taught and learned (many skills are transferable). There are so many tutorials online and so many great e-learning courses. We are running a Zero to Snowflake course. Even for those of us coders who feel we can do pretty much anything, there is no substitute for hands-on experience. Necessity is the mother of invention. Our platform was designed to make the steps we do for all clients repeatable and automated.
    Data rock stars are rare. Teams with a combination of skills, differences of opinion, different backgrounds and domain experience provide the best results. No single point of failure, or anything too esoteric. Domain experience helps speed up insight.
    Data modelling and SQL are key to the product:
      • Kimball star schemas / Data Vault
      • Understanding join logic
      • SQL load scripts
    Many of you will have these attributes, so starting a journey in Snowflake will not be as hard as you may think.
  • #6: TECHNOLOGY is now meeting our expectations.
    Legacy data pipelines with long batch windows on maxed-out, limited hardware: fragile and high maintenance, changes difficult.
    Initial cloud offerings reduced capital expenditure on equipment but still required lots of system administrators.
    Recent improvements in shared services and containerisation (Docker).
    Key elements of the Snowflake technology de-risked the investment. The benefits of the technology are easily replicated and shared for different client types:
      • enables fail-fast (development and querying)
      • lots of different teams can work on the same production data set
      • the same data can be split between different server groups (no impact across teams)
      • ability to process data loading/unloading without impacting running queries
      • zero-copy clones and separate compute
      • time travel
    We are running a hands-on Zero to Snowflake session on November 6th. Details at the end.
    There are other MPP 2016 productive use
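As a rough illustration of the zero-copy clone and time travel features named in the note above, the sketch below builds the corresponding Snowflake SQL statements as strings. The helper names and the object names (`prod_db`, `dev_db`, `orders`) are hypothetical; the statements use Snowflake's `CLONE` keyword and `AT(OFFSET => ...)` clause, and how far back a query can reach depends on the account's retention settings. Nothing here connects to a database.

```python
def clone_database_sql(source: str, target: str) -> str:
    """Build a Snowflake zero-copy clone statement for a whole database."""
    return f"CREATE DATABASE {target} CLONE {source};"


def time_travel_sql(table: str, seconds_ago: int) -> str:
    """Build a query that reads a table as it was `seconds_ago` seconds back."""
    return f"SELECT * FROM {table} AT(OFFSET => -{seconds_ago});"


# Hypothetical object names, for illustration only.
# Identifiers are interpolated without validation, so only pass trusted names.
print(clone_database_sql("prod_db", "dev_db"))
print(time_travel_sql("prod_db.public.orders", 3600))
```

A development team could run the first statement to get its own copy of production to work on, then point its own virtual warehouse at the clone.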
  • #7: TECHNOLOGY: This architecture diagram illustrates the core concepts.
    Every user has access to the same data (subject to permissions). Data is stored once. Teams can use “clones” of production data to carry out development on. Different teams can use their own virtual warehouses (compute resources).
    Loading warehouse: generally about parallel file ingestion (particularly for legacy).
      • CSV is the quickest
      • File sizes should be about the same, 10MB-100MB compressed maximum
      • The NUMBER OF FILES is key: 4 cores / 8 threads, so eight files in parallel, one file per thread
    Ad hoc analytics warehouse:
      • Scaled for query-time responsiveness
      • Multiple users, more clusters: resolves concurrency, prevents queued queries
    Development can scale the warehouse up and down to test different functions.
    Proof of concept: a single small server is adequate for view development.
    We are running a hands-on Zero to Snowflake session on November 6th. Bring a laptop.
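The file-sizing guidance in the note above (similar-sized files, 10MB-100MB compressed, one file per load thread) can be turned into a small planning calculation. This is a sketch under stated assumptions: the function name `plan_files` and the rounding rules are mine, not part of the talk or of Snowflake.

```python
import math


def plan_files(total_mb: float, threads: int = 8,
               min_mb: float = 10, max_mb: float = 100) -> tuple[int, float]:
    """Return (file_count, size_per_file_mb) for a compressed extract.

    For large extracts the count is a multiple of `threads`, so every load
    thread has a file in each wave. Small extracts are split into fewer
    files rather than dropping below `min_mb` per file.
    """
    # Fewest files that respects the upper size bound.
    needed = max(1, math.ceil(total_mb / max_mb))
    # Round up to whole waves of one file per thread.
    count = math.ceil(needed / threads) * threads
    # Small extracts: do not split below the lower size bound.
    while count > 1 and total_mb / count < min_mb:
        count -= 1
    return count, total_mb / count


# An 800MB compressed extract on a 4-core / 8-thread warehouse.
print(plan_files(800))
```

With the default of eight threads, an extract of 2,000MB is planned as 24 files, three waves of eight, each a little over 83MB.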
  • #8: TECHNOLOGY: Each virtual warehouse can have from 1 server per cluster (X-Small) to 128 servers per cluster (4X-Large). Each virtual warehouse cluster can be scaled out identically (up to 10 clusters).
      • Automated cluster scale-out
      • Single command-line scale-up/down
      • Result cache persisted for 24 hours, reset each time the results are accessed, for up to 31 days
      • Pay only for the compute used
    As a company we do not have to predict demand but are able to respond to it. We are able to set limits and alerts around usage so that we can be proactive with running costs.
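To make the scale-up and scale-out figures in the note above easier to reason about, here is a minimal sketch of the arithmetic. The note gives only the two endpoints (1 server at X-Small, 128 at 4X-Large) and the 10-cluster limit; the doubling at each size step is filled in to connect them, and "server-hours" is my own illustrative unit, not a billing figure.

```python
SIZES = ["X-Small", "Small", "Medium", "Large",
         "X-Large", "2X-Large", "3X-Large", "4X-Large"]


def servers_per_cluster(size: str) -> int:
    """Each size step doubles the servers: X-Small = 1 ... 4X-Large = 128."""
    return 2 ** SIZES.index(size)


def server_hours(size: str, clusters: int, hours: float) -> float:
    """Total server-hours consumed while the warehouse is running."""
    if not 1 <= clusters <= 10:
        raise ValueError("the notes give a limit of 10 clusters per warehouse")
    return servers_per_cluster(size) * clusters * hours


# A Medium warehouse scaled out to 2 clusters for half an hour.
print(server_hours("Medium", 2, 0.5))
```

Because compute is only paid for while it runs, a short burst on a large warehouse and a long run on a small one can come to the same server-hours.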
  • #9: TECHNOLOGY: The ETL processes of old, where data took ages to load in series and had to be manipulated outside of the database, are over.
    Load (and reload) all the data and transform in situ: the Extract Load Transform process. Transformation in situ does not require tooling, just a good understanding of SQL.
    Snowflake allows the parallel loading of data files (streams and other feeds). The more nodes you have in the virtual server, the more threads you have to load files.
    Query result cache: the cache is part of the Snowflake service and returns previously calculated results.
    Disk is slow but cheap. The SSD cache is proportional to the virtual warehouse size (nodes) and disappears on ramp-down.
    Tuning (ramp up / ramp down): ramp up for parallel ingestion; ramp out for concurrency (many BI report users).
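The Extract Load Transform pattern described in the note above can be sketched in a few lines. Python's built-in sqlite3 module stands in for Snowflake here (an assumption, so the example is runnable without an account), and the `raw_sales` and `daily_sales` tables and their columns are hypothetical.

```python
import sqlite3

# sqlite3 stands in for Snowflake; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")

# Load: land the extracted rows untouched, everything as text.
conn.execute("CREATE TABLE raw_sales (sale_date TEXT, product TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_sales VALUES (?, ?, ?)",
    [("2019-01-05", " widget ", "10.50"),
     ("2019-01-05", "Widget", "4.50"),
     ("2019-01-06", "gadget", "7.00")],
)

# Transform: clean and aggregate inside the database with plain SQL,
# with no external transformation tool involved.
conn.execute("""
    CREATE TABLE daily_sales AS
    SELECT sale_date,
           LOWER(TRIM(product)) AS product,
           SUM(CAST(amount AS REAL)) AS total_amount
    FROM raw_sales
    GROUP BY sale_date, LOWER(TRIM(product))
""")

rows = conn.execute(
    "SELECT sale_date, product, total_amount FROM daily_sales "
    "ORDER BY sale_date, product"
).fetchall()
print(rows)
```

Because the raw table is kept as loaded, the transform can be dropped and rebuilt whenever the design changes, which is what makes the iterate-as-you-go approach practical.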
  • #10: Snowflake is an enabler.
    We can ingest data very easily: Bash, Python, connectors, SQL scripts and S3.
    We can rapidly prototype and deploy data models: develop views in the design stage; share data with customers.
    As mentioned earlier, we are able to implement proof-of-concept warehouses (end-to-end): from four months to four days.
    Data tables can be refined through continual iterations. The scalability and speed allow experimentation and measurement before the outcome has to be fixed. The investigation of data becomes a doing thing rather than a thinking thing: find issues quicker.
    Flexibility in modelling: we can use our “old skills” in designing the data model and generating the semantic layer for the client.
    Notes: AWS Lambda limit 15 minutes. EC2 orchestrated scripts: CRON / Python / Bash. Apache Airflow.
  • #12: Come and meet our Rock Star. You learn from doing.
    Other notes: Forecasting using auto.arima: possible ARIMA models are searched through to find the best fit. ARIMA is Auto-Regressive Integrated Moving Average.