Managing Terabytes of Data with Amazon S3
Dhaval Nagar
AWS Hero, AWS SME
TRACK 01 - MODERN APPS - SERVERLESS
Introduction
Dhaval Nagar
● Founder @ APPGAMBIT, AWS Consulting Partner
● 12x AWS Certified
● AWS Hero (since 2020)
● AWS Certification SME
● AWS Surat User Group Lead
● Practicing Barista
Agenda
● Amazon S3 - Storage Powerhouse
● Use Case
● Storage Optimisation
● Storage Cost Breakdown
● Key Learnings
Amazon S3 - Storage Powerhouse
● One of the earliest AWS services, launched in 2006
● Managed Object Storage Service
● Designed to be used over HTTPS APIs to do simple file operations (Simple Storage Service)
● Works as a backbone for many AWS Services
● Dozens of features, including different Storage Tiers, Versioning, Web Hosting, In-Place Data Queries, Event Notifications, etc.
● Holds over 100 trillion objects and serves millions of requests per second
● Famously known for 11 nines (99.999999999%) of Durability
Use Case
● Application captures and processes Large 3D Video Files
● Requires Multi-tenant Storage with current data of over 150TB
● Amazon S3 is used to save the raw data files before processing
● Raw files are fewer in quantity but quite large in size
● Raw files are then processed to generate output files
● Output files are smaller in size but millions in quantity
● Key Objectives:
○ Achieve optimal storage organization based on the processing workflow
○ Generate Per-Tenant Storage cost breakdown
Use Case - Status
Video Files Uploaded → Job Scheduled (Auto / Manual) → Processed Files Validated (Auto / Manual) → Original Video is ready for Archival
Storage Optimisation for Raw Files
● Amazon S3 offers many storage classes, such as Standard, Standard Infrequent Access, One Zone Infrequent Access, and several Cold Storage classes under Glacier
● Each Tier maps to a real-world access pattern, so data can be kept in the storage type that suits it
● Standard is the most common tier and has the highest per-GB storage price
● A good starting point is to enable the Intelligent Tier. However, that is not the optimal choice in all conditions.
Storage classes:
○ S3 Standard
○ S3 Intelligent-Tiering
○ S3 Standard-IA
○ S3 One Zone-IA
○ S3 Glacier Instant Retrieval, Flexible Retrieval, and Deep Archive
Intelligent Tier
● Intelligent Tier is perfect for unpredictable usage patterns
● Internally uses multiple access tiers and automatically moves objects between them based on how recently each object was accessed
Intelligent Tier Internal Transitions
Access tier | Equivalent storage class | When it applies
Frequent Access | Same as Standard | Default tier when the object's storage class is set to Intelligent Tier
Infrequent Access | Same as Standard Infrequent Access | Applied when an object is not accessed for 30 consecutive days
Archive Instant Access | Same as Glacier Instant Retrieval | Applied when an object is not accessed for 90 consecutive days
Archive Access | Same as Glacier Flexible Retrieval | Optional transition
Deep Archive Access | Same as Glacier Deep Archive | Optional transition
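The automatic transitions listed for the Intelligent Tier can be sketched as a small lookup. This is only an illustration: the function name is hypothetical, and the two optional archive tiers are left out because they apply only when explicitly enabled.

```python
def intelligent_tiering_access_tier(days_since_last_access: int) -> str:
    """Return the automatic Intelligent Tier access tier for an object.

    Thresholds follow the slide: 30 and 90 consecutive days without access.
    The optional Archive Access and Deep Archive tiers are not modelled.
    """
    if days_since_last_access < 0:
        raise ValueError("days_since_last_access must be non-negative")
    if days_since_last_access >= 90:
        return "Archive Instant Access"
    if days_since_last_access >= 30:
        return "Infrequent Access"
    return "Frequent Access"


if __name__ == "__main__":
    for days in (0, 29, 30, 89, 90):
        print(days, intelligent_tiering_access_tier(days))
```

Any access resets the counter, so an object that is read again is treated as having zero days since last access.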
Reconsider Intelligent Tier for Predictable Flow
● With Intelligent Tier, an object stays at the Standard rate until it has gone 30 consecutive days without access
● With a manual Storage Class update, the object can be switched to Infrequent Access directly after processing, avoiding most of those 30 days at the Standard rate and saving roughly 50% on storage for that period
● With an average of 10 TB of monthly uploads, the Standard tier costs around $230 per month for the new data
Intelligent Tier: File Uploaded (Day 1) → File Processed (Day 3-5) → File Unused → Move to IA (Day N+30)
Manual update: File Uploaded (Day 1) → File Processed (Day 3-5) → Move to IA
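The saving from moving raw files to Infrequent Access right after processing can be estimated with simple arithmetic. The sketch below is a rough model, not a billing calculator: the per-GB prices are assumed list prices (they vary by region and over time), 10 TB is counted as 10,000 GB, and request charges, Intelligent Tier monitoring fees and Infrequent Access retrieval fees are ignored.

```python
# Assumed prices in USD per GB-month; check current S3 pricing for your region.
STANDARD_RATE = 0.023
STANDARD_IA_RATE = 0.0125
DAYS_PER_MONTH = 30


def first_month_cost(size_gb: float, days_in_standard: int) -> float:
    """Storage cost for the first 30 days of data that spends
    `days_in_standard` days in Standard and the remaining days in Standard-IA."""
    if not 0 <= days_in_standard <= DAYS_PER_MONTH:
        raise ValueError("days_in_standard must be between 0 and 30")
    days_in_ia = DAYS_PER_MONTH - days_in_standard
    standard_part = size_gb * STANDARD_RATE * days_in_standard / DAYS_PER_MONTH
    ia_part = size_gb * STANDARD_IA_RATE * days_in_ia / DAYS_PER_MONTH
    return standard_part + ia_part


if __name__ == "__main__":
    monthly_upload_gb = 10_000  # 10 TB of new raw files per month
    wait_30_days = first_month_cost(monthly_upload_gb, days_in_standard=30)
    move_on_day_5 = first_month_cost(monthly_upload_gb, days_in_standard=5)
    print(f"Standard rate for all 30 days: ${wait_30_days:.2f}")
    print(f"Moved to IA after day 5:       ${move_on_day_5:.2f}")
```

With these assumed prices, the 30-day figure reproduces the roughly $230 quoted for 10 TB of new data.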
● Output files are configured to use the S3 Intelligent-Tiering storage class
● Consumption of the processed images is infrequent and unpredictable
● Intelligent Tier gives the best of both worlds
Storage Tier Decision Tree
(CloudHealth by VMware)
● Standard Infrequent Access is another possible storage option
● Its storage price is about 50% lower than the Standard tier
● However, Standard Infrequent Access has a 30-day minimum storage charge: objects that are moved to another tier or deleted within the first 30 days are still billed for the remaining days
● Again, for us the access pattern was not entirely infrequent
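The 30-day minimum charge for Infrequent Access can be illustrated with a short calculation. This is a sketch under assumptions: the function names are hypothetical and the default price per GB-month is an assumed list price, not a figure from the slides.

```python
STANDARD_IA_MIN_DAYS = 30  # minimum billable storage duration for Standard-IA


def billable_ia_days(days_stored: int) -> int:
    """Days charged for an object that left Standard-IA after `days_stored` days."""
    if days_stored < 0:
        raise ValueError("days_stored must be non-negative")
    return max(days_stored, STANDARD_IA_MIN_DAYS)


def ia_storage_charge(size_gb: float, days_stored: int,
                      rate_per_gb_month: float = 0.0125) -> float:
    """Storage charge in USD, using a 30-day month and an assumed price."""
    return size_gb * rate_per_gb_month * billable_ia_days(days_stored) / 30


if __name__ == "__main__":
    # An object deleted after 10 days is still billed for 30 days.
    print(billable_ia_days(10), billable_ia_days(45))
    print(f"${ia_storage_charge(1_000, 10):.2f}")
```

Data that is likely to be deleted or moved again within a few weeks therefore gains little from an early switch to Infrequent Access.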
Current State
● By applying multiple transition and calculation strategies, we were able to reduce storage cost by around 40% per month
● Current strategies are aligned with the business and operation flow, but they can change from time to time
● Schedule Periodic Optimisation Exercises - Not Too Early, Not Too Late
Per-Tenant Storage Breakdown
● One bucket per customer is not a scalable solution (not practical for a SaaS company)
● S3 limits the number of buckets per account (a hard limit of 1,000 when this deck was written)
Storage Bucket
  /tenant1/**
  /tenant2/**
  /tenantN/**
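With one shared bucket and a prefix per tenant, per-tenant usage can be derived from object keys alone. A minimal sketch, assuming keys start with the tenant name as in the layout above; the function names and the sample keys are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tenant_of(key: str) -> str:
    """Return the first path segment of an object key, e.g. 'tenant1'."""
    return key.lstrip("/").split("/", 1)[0]


def usage_by_tenant(objects: Iterable[Tuple[str, int]]) -> Dict[str, Dict[str, int]]:
    """Aggregate object count and total bytes per tenant.

    `objects` is an iterable of (key, size_in_bytes) pairs.
    """
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: {"objects": 0, "bytes": 0})
    for key, size in objects:
        entry = totals[tenant_of(key)]
        entry["objects"] += 1
        entry["bytes"] += size
    return dict(totals)


if __name__ == "__main__":
    sample = [
        ("tenant1/raw/scan-001.bin", 5_000_000_000),
        ("tenant1/output/frame-0001.png", 250_000),
        ("tenant2/raw/scan-002.bin", 7_000_000_000),
    ]
    for tenant, totals in usage_by_tenant(sample).items():
        print(tenant, totals)
```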
Cost Analysis for Cost Analysis
● AWS provides tags to organize account resources, and cost allocation tags to track AWS costs at a detailed level
● S3 lets you set tags on buckets as well as on individual objects inside a bucket
● Cost Allocation Tags are the most efficient way to create logical partitioning within resources
● However, object tags have an additional cost
Cost Analysis for Cost Analysis
● S3 lets you set tags on individual objects
● For millions of objects, setting multiple tags on each was not a desirable solution
● While the cost is not huge, we needed to run additional automation on the objects
● So we decided to build and store Object Metadata in CSV files and DynamoDB
Object tags cost $0.01 per 10,000 tags, and we have millions of objects
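Building the object metadata as a CSV file could look roughly like the sketch below. It is an illustration only, not the speaker's implementation: the column names and function names are hypothetical, the client is assumed to behave like a boto3 S3 client (for example `boto3.client("s3")`), and the DynamoDB side is not shown.

```python
import csv
from typing import Any, Dict, Iterable, Iterator, TextIO

# Hypothetical column layout for the metadata file.
FIELDS = ["key", "size_bytes", "storage_class", "last_modified"]


def iter_object_rows(s3_client: Any, bucket: str, prefix: str = "") -> Iterator[Dict[str, Any]]:
    """Yield one metadata row per object under `prefix`.

    Uses the list_objects_v2 paginator, so each underlying LIST call
    returns at most 1,000 entries.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield {
                "key": obj["Key"],
                "size_bytes": obj["Size"],
                "storage_class": obj.get("StorageClass", "STANDARD"),
                "last_modified": obj["LastModified"].isoformat(),
            }


def write_metadata_csv(rows: Iterable[Dict[str, Any]], out: TextIO) -> int:
    """Write rows as CSV with a header line and return the number of rows written."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count
```

Passing the client in as a parameter keeps the listing logic easy to exercise with a stub, without touching a real bucket.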
Cost Analysis for Cost Analysis
● As a large number of objects are located in the Intelligent Tier, using the LIST API is very cheap
● Metadata storage helps to build a Cost Breakdown Per Tenant and Per Storage Category
○ Count of Raw Files, Size and Cost
○ Count of Output Files, Size and Cost
● Helped to create granular cost reports, such as Per Month or Per Project
Object tags: $0.01 per 10,000 tags per month
LIST Objects API: $0.0055 per 1,000 requests; each call returns at most 1,000 object entries
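The gap between tagging and listing costs becomes clearer with numbers. The sketch below uses the prices quoted on the slide; the object count and the number of tags per object are hypothetical values chosen only for illustration.

```python
import math

TAG_PRICE_PER_10K_PER_MONTH = 0.01   # USD, from the slide
LIST_PRICE_PER_1K_REQUESTS = 0.0055  # USD, from the slide
MAX_KEYS_PER_LIST_CALL = 1000        # entries returned per LIST call


def monthly_tagging_cost(object_count: int, tags_per_object: int) -> float:
    """Recurring monthly cost of keeping tags on every object."""
    total_tags = object_count * tags_per_object
    return total_tags / 10_000 * TAG_PRICE_PER_10K_PER_MONTH


def full_listing_cost(object_count: int) -> float:
    """One-off cost of listing every object once to rebuild the metadata."""
    calls = math.ceil(object_count / MAX_KEYS_PER_LIST_CALL)
    return calls / 1000 * LIST_PRICE_PER_1K_REQUESTS


if __name__ == "__main__":
    objects = 5_000_000  # hypothetical object count
    print(f"Tagging (3 tags each): ${monthly_tagging_cost(objects, 3):.2f} per month")
    print(f"One full listing:      ${full_listing_cost(objects):.4f}")
```

Tagging is a recurring monthly charge, while a listing is paid only when the metadata has to be rebuilt.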
● Listing Objects is not a frequent operation; it is only required when we need to rebuild the Metadata
● This is a very small optimisation, but the Metadata output makes other analyses easier
● This covers Storage cost only; Data Transfer cost is not calculated
Key Learnings
● Understand the flow and state transitions of your data
● Do periodic cost and operational analysis
● At scale, small optimisations result in big savings
● There will always be an opportunity to optimise, in cost or operations
● S3 is by far one of the most reliable AWS services - it was designed for the ultimate Performance, Scale and Reliability
Thank You
Request to share feedback and join AWS User Groups
