Managing Terabytes of Data with Amazon S3
Dhaval Nagar
AWS Hero, AWS SME
TRACK 01 - MODERN APPS - SERVERLESS
Introduction
Dhaval Nagar
● Founder @ APPGAMBIT, AWS Consulting Partner
● 12x AWS Certified
● AWS Hero (since 2020)
● AWS Certification SME
● AWS Surat User Group Lead
● Practicing Barista
Agenda
● Amazon S3 - Storage Powerhouse
● Use Case
● Storage Optimisation
● Storage Cost Breakdown
● Key Learnings
Amazon S3 - Storage Powerhouse
● One of the earliest AWS services, released in 2006
● Managed Object Storage Service
● Designed to be used over HTTPS APIs to do simple file operations (Simple Storage Service)
● Works as a backbone for many AWS Services
● Dozens of features, including different storage tiers, versioning, web hosting, in-place data queries, event notifications, etc.
● Stores over 100 trillion objects and serves millions of requests per second
● Famously known for 11 nines (99.999999999%) of durability
Use Case
● Application captures and processes Large 3D Video Files
● Requires Multi-tenant Storage with current data of over 150TB
● Amazon S3 is used to save the raw data files before processing
● Raw files are fewer in quantity but quite large in size
● Raw files are then processed to generate output files
● Output files are smaller in size but millions in quantity
● Key Objectives:
○ Achieve optimal storage organization based on the processing workflow
○ Generate Per-Tenant Storage cost breakdown
Use Case - Status
Job Scheduled (Auto / Manual) → Video Files Uploaded → Processed Files Validated (Auto / Manual) → Original Video is ready for Archival
Storage Optimisation for Raw Files
● Amazon S3 offers many storage classes, such as Standard, Standard-Infrequent Access (Standard-IA), One Zone-IA, and several cold storage classes in Glacier
● Each tier maps to a real-world use case for keeping data in a specific storage type
● Standard is the most common and the most expensive tier
● The starting point is to enable the Intelligent Tier! However, that is not the optimal choice in all conditions.
Storage classes:
S3 Standard
S3 Intelligent-Tiering
S3 Standard-IA
S3 One Zone-IA
S3 Glacier Instant Retrieval, Flexible Retrieval, and Deep Archive
Intelligent Tier
● Intelligent Tier is perfect for unpredictable usage patterns
● Internally, it uses multiple access tiers and automatically moves objects between them based on how recently each object was accessed
Intelligent Tier Internal Transitions
● Frequent Access - same as the Standard tier. This is the default tier when an object's storage class is set to Intelligent Tier.
● Infrequent Access - same as the Standard Infrequent Access tier. Applied when an object is not accessed for 30 consecutive days.
● Archive Instant Access - same as Glacier Instant Retrieval. The object is archived if it is not accessed for 90 consecutive days.
● Archive Access - same as Glacier Flexible Retrieval. This is an optional transition.
● Deep Archive Access - same as Glacier Deep Archive. This is an optional transition.
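The automatic transitions listed above can be sketched as a small lookup. This is only an illustration of the 30-day and 90-day thresholds from the slide; the function name is hypothetical, and the optional Archive Access and Deep Archive Access tiers are left out because the slide gives no thresholds for them.

```python
# Thresholds taken from the slide: consecutive days without access
# before an object moves to the next automatic access tier.
INFREQUENT_AFTER_DAYS = 30
ARCHIVE_INSTANT_AFTER_DAYS = 90


def intelligent_tier_for(days_since_last_access: int) -> str:
    """Return the automatic access tier an object would sit in (hypothetical helper)."""
    if days_since_last_access < 0:
        raise ValueError("days_since_last_access must be non-negative")
    if days_since_last_access >= ARCHIVE_INSTANT_AFTER_DAYS:
        return "Archive Instant Access"
    if days_since_last_access >= INFREQUENT_AFTER_DAYS:
        return "Infrequent Access"
    return "Frequent Access"
```

A file processed on day 5 and never read again would, under this model, stay in Frequent Access until day 30 without access has passed.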
Reconsider Intelligent Tier for Predictable Flow
● With Intelligent Tier, an object must go 30 days without access before it moves to the Infrequent Access tier
● With a manual storage class update, an object can be switched to Infrequent Access directly, avoiding 30 days of Standard pricing, which is a saving of about 50% for that period
● With an average of 10 TB of monthly uploads, the Standard tier costs around $230 per month for the new data
Intelligent Tier: File Uploaded (Day 1) → File Processed (Day 3-5) → File Unused (Day N+30) → Move to IA
Manual update: File Uploaded (Day 1) → File Processed (Day 3-5) → Move to IA
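The arithmetic behind these figures can be sketched as follows. The per-GB prices are assumptions (illustrative list prices that vary by region and over time), chosen because they reproduce the $230 figure for 10 TB; request, retrieval, and monitoring fees are ignored, and the function names are hypothetical.

```python
# Assumed prices in USD per GB-month; check current pricing for your region.
STANDARD_PER_GB = 0.023
STANDARD_IA_PER_GB = 0.0125
GB_PER_TB = 1000  # decimal terabytes, which matches the $230 figure


def monthly_storage_cost(size_tb: float, price_per_gb: float) -> float:
    """Storage cost for one full month at a single price."""
    return size_tb * GB_PER_TB * price_per_gb


def first_month_cost(size_tb: float, days_in_standard: int, days_in_month: int = 30) -> float:
    """Cost of the first month when data stays in Standard for
    `days_in_standard` days and in Standard-IA for the rest."""
    if not 0 <= days_in_standard <= days_in_month:
        raise ValueError("days_in_standard must be between 0 and days_in_month")
    standard_share = days_in_standard / days_in_month
    return (
        monthly_storage_cost(size_tb, STANDARD_PER_GB) * standard_share
        + monthly_storage_cost(size_tb, STANDARD_IA_PER_GB) * (1 - standard_share)
    )
```

Comparing `first_month_cost(10, 30)` (wait the full 30 days) with `first_month_cost(10, 5)` (move right after processing) shows where the first-month saving for each batch of uploads comes from.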
● Output files are configured to use the Intelligent Tier
● The consumption of the processed images is infrequent and unpredictable
● Intelligent Tier gives the best of both worlds
Storage Tier Decision Tree
(CloudHealth by VMware)
● Standard Infrequent Access is another possible storage option
● It is about 50% cheaper than the Standard tier
● However, it has a 30-day minimum storage duration: if objects in the Infrequent Access tier are moved to another tier or deleted within the first 30 days, AWS still charges for the remainder of those 30 days
● Again, for us the access pattern was not entirely infrequent
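The effect of the 30-day minimum can be illustrated with a short calculation. This is a sketch under stated assumptions: the per-GB price is an illustrative value, and the helper names are hypothetical.

```python
STANDARD_IA_PER_GB = 0.0125  # assumed USD per GB-month, varies by region
MIN_STORAGE_DAYS = 30        # minimum billable duration described above


def billable_days(days_stored: int, minimum_days: int = MIN_STORAGE_DAYS) -> int:
    """Days that are charged: never fewer than the minimum duration."""
    return max(days_stored, minimum_days)


def standard_ia_storage_charge(size_gb: float, days_stored: int, days_in_month: int = 30) -> float:
    """Storage charge for an object kept in Standard-IA for `days_stored` days."""
    return size_gb * STANDARD_IA_PER_GB * billable_days(days_stored) / days_in_month
```

Under this model, deleting a 1,000 GB object after 10 days costs the same as keeping it for the full 30 days, which is why short-lived or frequently rewritten data is a poor fit for this tier.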
Current State
● By applying multiple transition and calculation strategies, we were able to reduce the monthly storage cost by around 40%
● Current strategies are aligned with the business and operation flow, but they can change from time to time
● Schedule Periodic Optimisation Exercises - Not Too Early, Not Too Late
Per-Tenant Storage Breakdown
● One bucket per customer is not a scalable solution (not practical for a SaaS company)
● S3 limits the number of buckets per account (a hard limit of 1,000 at the time of this talk)
Storage Bucket
/tenant1/**
/tenant2/**
/tenantN/**
Cost Analysis for Cost Analysis
● AWS provides tags to organize account resources, and cost allocation tags to track your AWS costs at a detailed level
● S3 allows tags to be set on buckets as well as on individual objects inside a bucket
● Cost allocation tags are the most efficient way to create logical partitioning within resources
● However, tags have an additional cost
Cost Analysis for Cost Analysis
● S3 allows tags to be set on individual objects
● For millions of objects, setting multiple tags was not a desirable solution
● While the cost is not huge, we needed to run additional automation on objects
● So we decided to build and store object metadata in CSV files and DynamoDB
$0.01 per 10,000 tags, and we have millions of objects
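To put the tag price into perspective, here is a rough calculation. The $0.01 per 10,000 tags rate comes from the slide; the object count and tags-per-object values used in the usage note are hypothetical.

```python
TAG_PRICE_PER_10K = 0.01  # USD per 10,000 tags per month, as quoted on the slide


def monthly_tag_cost(object_count: int, tags_per_object: int) -> float:
    """Monthly charge for storing `tags_per_object` tags on every object."""
    total_tags = object_count * tags_per_object
    return total_tags / 10_000 * TAG_PRICE_PER_10K
```

For example, `monthly_tag_cost(5_000_000, 3)` models five million objects with three tags each. The amount is small, but it recurs every month and excludes the automation needed to apply and maintain the tags.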
Cost Analysis for Cost Analysis
● As a large number of objects are located in the Intelligent Tier, using the LIST API is very cheap
● Metadata storage helps to build a cost breakdown per tenant and per storage category
○ Count of Raw Files, Size and Cost
○ Count of Output Files, Size and Cost
● Helped to create granular cost reports, such as per month or per project
Tags: $0.01 per 10,000 tags per month
LIST Object API: $0.0055 per 1,000 requests; each API call can return at most 1,000 object entries
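A per-tenant breakdown of this kind could be built from stored metadata roughly as follows. This is a minimal sketch, not the implementation from the talk: it assumes a hypothetical key layout of `<tenant>/<category>/...` (for example `tenant1/raw/video.mp4`), records shaped like the `Key`, `Size`, and `StorageClass` fields of an S3 object listing, and a caller-supplied price table.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple


def split_key(key: str) -> Tuple[str, str]:
    """Return (tenant, category) from a key such as 'tenant1/raw/a.mp4'.

    The '<tenant>/<category>/...' layout is an assumption for illustration.
    """
    parts = key.lstrip("/").split("/")
    tenant = parts[0]
    category = parts[1] if len(parts) > 2 else "unknown"
    return tenant, category


def cost_breakdown(
    records: Iterable[Mapping],
    price_per_gb: Mapping[str, float],
) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Aggregate object count, total bytes, and monthly storage cost
    per (tenant, category) pair."""
    totals = defaultdict(lambda: {"count": 0, "bytes": 0, "cost": 0.0})
    for record in records:
        entry = totals[split_key(record["Key"])]
        entry["count"] += 1
        entry["bytes"] += record["Size"]
        # Decimal GB; the price is looked up by the object's storage class.
        entry["cost"] += record["Size"] / 1e9 * price_per_gb[record["StorageClass"]]
    return dict(totals)
```

Keeping the aggregation independent of where the records come from (CSV, DynamoDB, or a fresh listing) means the same function can serve monthly and per-project reports.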
● Listing objects is not a frequent operation; it is only required when we need to rebuild the metadata
● This is a very tiny optimisation, but the resulting metadata makes different kinds of analysis easier
● This only includes the Storage cost and there is no Data Transfer cost calculated
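The request cost of a full metadata rebuild can be estimated with a few lines. The sketch uses the figures quoted in the talk ($0.0055 per 1,000 LIST requests, at most 1,000 entries per call); the function names and the object count in the usage note are hypothetical.

```python
import math

LIST_PRICE_PER_1000_REQUESTS = 0.0055  # USD, as quoted in the talk
MAX_KEYS_PER_REQUEST = 1000            # maximum entries returned per LIST call


def list_requests_needed(object_count: int) -> int:
    """Number of LIST calls needed to page through all objects.

    Even an empty listing takes one request.
    """
    return max(1, math.ceil(object_count / MAX_KEYS_PER_REQUEST))


def full_listing_cost(object_count: int) -> float:
    """Request cost of listing every object once."""
    return list_requests_needed(object_count) / 1000 * LIST_PRICE_PER_1000_REQUESTS
```

Calling `full_listing_cost(5_000_000)` models one complete rebuild over five million objects; unlike tags, this is a one-off cost paid only when a rebuild is needed.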
Key Learnings
● Understand the flow and state transition of your data
● Do periodic cost and operational analysis
● At scale, small optimisations result in big savings
● There will always be an opportunity to optimise, in cost or operations
● S3 is by far one of the most reliable AWS services - it was designed for the ultimate performance, scale and reliability
Thank You
Please share your feedback and join AWS User Groups
