© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Deep Dive on Amazon S3
Julien Simon, Principal Technical Evangelist, AWS
julsimon@amazon.fr - @julsimon
Loke Dupont, Head of Services, Xstream A/S
loke@xstream.net
Agenda
•  Introduction
•  Case study: Xstream A/S
•  Amazon S3 Standard-Infrequent Access
•  Amazon S3 Lifecycle Policies
•  Amazon S3 Versioning
•  Amazon S3 Performance & Transfer Acceleration
Happy birthday, S3
S3: our customer promise
Durable
99.999999999%
Available
Designed for 99.99%
Scalable
Gigabytes → Exabytes
Using Amazon S3 for video ingestion
Loke Dupont, Head of Services, Xstream A/S
loke@xstream.net
What does Xstream do?
Xstream is an online video platform provider. We sell
OVPs to broadcasters, ISPs, cable companies, etc.
What we are trying to provide is a “white-label Netflix” that
our customers can use to provide video services to end
users.
As part of that delivery, we ingest large amounts of video.
Challenges of ingesting video
Challenges of premium video ingestion
•  Very large files (upwards of several hundred GB)
•  Content security is extremely important
•  Content integrity is very important (no video corruption)
•  Content often arrives in batches of 1,000+ videos
•  Content needs to be available to all ingest processes
Ingest workflow
Decrypt Transcode Packaging DRM Upload
Ingest architecture
Components: Ingest API, Workflow Manager, Ingest Database, Queue, Workers
•  Amazon RDS MySQL instance for data
•  Running 100% on Amazon EC2 instances
•  Planning to replace EC2 with AWS Lambda and Amazon SQS
How does Amazon S3 help?
Amazon S3 real world usage
In April, in just one region, we had:
•  300 TB/month of short term storage in S3
•  62 million PUT/COPY/POST/LIST requests
•  55 million GET requests
In the same region we had 848 TB of Amazon Glacier long
term archive storage
Previous workflow vs. Amazon S3
Previous workflow
•  Large files moved between machines
•  Access had to be managed per machine
•  Disk space had to be managed carefully
•  Encryption at rest was tricky
•  Constant file integrity checks
Amazon S3
•  Files always accessible on Amazon S3
•  Bucket access managed with policies and Amazon IAM
•  Running out of space is practically impossible
•  Encryption is easy
•  S3 checks integrity for us, using checksums
What else do we get for free?
Versioning, which allows us to retrieve deleted and modified
objects.
Easy Amazon Glacier integration for long term content
archiving of “mezzanine” assets. Alternatively Amazon S3-IA
could be used.
Event notifications using Amazon SNS, Amazon SQS
and AWS Lambda
Demo
Amazon S3 events
& AWS Lambda
Sample code: http://cloudvideo.link/lambda.zip
Lesser known Amazon S3 features – Bucket tagging
Bucket tagging is a great feature for cost allocation.
Assign custom tags to your bucket and they can be used
to separate cost per customer or per project.
Getting cost with tags
Set up cost allocation tags in the billing preferences.
Use AWS Cost Explorer to create a new report.
Filter by “tags” and select the tag you want to filter by.
Lesser known Amazon S3 features – Lifecycle
Use the lifecycle feature to automatically transition objects
to the Amazon S3 Standard-Infrequent Access storage class,
or even to Amazon Glacier.
Be careful about retrieval costs, especially when using
Amazon Glacier-backed storage.
Lessons learned
Lessons learned from Amazon Glacier
Verify archive creation before deleting data.
Retrieval is priced by “peak rate” – spread it out.
Retrieval has a latency of several hours.
AWS Storage cost comparison
Things we wish we knew earlier
•  Don’t use Amazon S3 filesystem wrappers
•  Use Amazon IAM roles whenever possible
•  If there is an AWS service for it, use that!
•  Auto scaling, auto scaling, auto scaling
Continuous Innovation for Amazon S3
•  September 2015: Amazon S3 Standard-IA
•  16/3: Lifecycle policy & versioning – expired object delete marker, incomplete multipart upload expiration
•  16/3: Performance – object naming, multipart operations
•  19/4: Transfer Acceleration
S3 Infrequent Access
Choice of storage classes on Amazon S3
•  Standard: active data
•  Standard-Infrequent Access: infrequently accessed data
•  Amazon Glacier: archive data
Standard-Infrequent Access storage
•  Durable: 11 9s of durability
•  Available: designed for 99.9% availability
•  High performance: same throughput as Amazon S3 Standard storage
•  Secure: server-side encryption, with your own keys or KMS-managed keys
•  Integrated: lifecycle management, versioning, event notifications, metrics
•  Easy to use: no impact on user experience, simple REST API, single bucket
Management policies
Lifecycle policies
•  Automatic tiering and cost controls
•  Includes two possible actions:
•  Transition: archives to Standard-IA
or Amazon Glacier after specified time
•  Expiration: deletes objects after specified time
•  Allows for actions to be combined
•  Set policies at the prefix level
aws s3api put-bucket-lifecycle-configuration \
    --bucket BUCKET_NAME \
    --lifecycle-configuration file://LIFECYCLE_JSON_FILE
Standard → Standard-IA → Amazon Glacier
"Rules": [
  {
    "ID": "lifecycle_rule",
    "Status": "Enabled",
    "Prefix": "old_files",
    "Transitions": [
      {
        "Days": 30,
        "StorageClass": "STANDARD_IA"
      },
      {
        "Days": 365,
        "StorageClass": "GLACIER"
      }
    ]
  }
]
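For scripting, the same rule can be generated and written to the file that the CLI command above consumes. A minimal sketch (the rule ID, prefix, and `lifecycle.json` file name are illustrative choices, not from the talk):

```python
import json

# Sketch: generate the lifecycle configuration file passed to
# `aws s3api put-bucket-lifecycle-configuration`.
lifecycle = {
    "Rules": [
        {
            "ID": "lifecycle_rule",          # arbitrary example ID
            "Status": "Enabled",
            "Prefix": "old_files",           # arbitrary example prefix
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 365, "StorageClass": "GLACIER"},
            ],
        }
    ]
}

# Write the file referenced as file://LIFECYCLE_JSON_FILE
with open("lifecycle.json", "w") as f:
    json.dump(lifecycle, f, indent=2)
```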
Versioning S3 buckets
•  Protects from accidental overwrites and deletes
•  New version with every upload
•  Easy retrieval and rollback of deleted objects
•  Three states of an Amazon S3 bucket
•  No versioning (default)
•  Versioning enabled
•  Versioning suspended
{
"Status": "Enabled",
"MFADelete": "Disabled"
}
aws s3api put-bucket-versioning \
    --bucket BUCKET_NAME \
    --versioning-configuration file://VERSIONING_JSON_FILE
Restricting deletes
•  For additional security, enable MFA (multi-factor
authentication) in order to require additional
authentication to:
•  Change the versioning state of your bucket
•  Permanently delete an object version
•  MFA delete requires both your security credentials and a
code from an approved authentication device
"Rules": [
  {
    …
    "Expiration": {
      "Days": 60
    },
    "NoncurrentVersionExpiration": {
      "NoncurrentDays": 30
    }
  }
]
Lifecycle policy to expire versioned objects
Current version will expire after 60
days.
Older versions will be permanently
deleted after 30 days.
Delete markers
•  Deleting a versioned object puts a delete
marker on the current version of the object
•  No storage charge for delete marker
•  No need to keep delete markers when all
versions have expired (they slow down LIST
operations)
•  Use a lifecycle policy to automatically remove
the delete marker when previous versions of
the object no longer exist
"Rules": [
  {
    …
    "Expiration": {
      "Days": 60,
      "ExpiredObjectDeleteMarker": true
    },
    "NoncurrentVersionExpiration": {
      "NoncurrentDays": 30
    }
  }
]
Lifecycle policy to expire delete markers
The current version expires after 60 days, leaving a delete
marker. Once the older versions are permanently deleted (after
30 days), the delete marker itself is removed.
Performance optimization
<my_bucket>/2013_11_13-164533125.jpg
<my_bucket>/2013_11_13-164533126.jpg
<my_bucket>/2013_11_13-164533127.jpg
<my_bucket>/2013_11_13-164533128.jpg
<my_bucket>/2013_11_12-164533129.jpg
<my_bucket>/2013_11_12-164533130.jpg
<my_bucket>/2013_11_12-164533131.jpg
<my_bucket>/2013_11_12-164533132.jpg
<my_bucket>/2013_11_11-164533133.jpg
<my_bucket>/2013_11_11-164533134.jpg
<my_bucket>/2013_11_11-164533135.jpg
<my_bucket>/2013_11_11-164533136.jpg
Use a key-naming scheme with randomness at the beginning for high TPS
•  Most important if you regularly exceed 100 TPS on a bucket
•  Avoid starting with a date
•  Avoid starting with sequential numbers
Don’t do this…
Distributing key names
…because this is going to happen
(diagram: sequential key names all land on the same index partition)
Distributing key names
Add randomness to the beginning of the key name…
<my_bucket>/521335461-2013_11_13.jpg
<my_bucket>/465330151-2013_11_13.jpg
<my_bucket>/987331160-2013_11_13.jpg
<my_bucket>/465765461-2013_11_13.jpg
<my_bucket>/125631151-2013_11_13.jpg
<my_bucket>/934563160-2013_11_13.jpg
<my_bucket>/532132341-2013_11_13.jpg
<my_bucket>/565437681-2013_11_13.jpg
<my_bucket>/234567460-2013_11_13.jpg
<my_bucket>/456767561-2013_11_13.jpg
<my_bucket>/345565651-2013_11_13.jpg
<my_bucket>/431345660-2013_11_13.jpg
Other ideas
•  Store objects under a hash of their name and add the original name as metadata: “deadbeef_mix.mp3” → 0aa316fb000eae52921aab1b4697424958a53ad9
•  Reverse the key name to break sequences
Distributing key names
…so your transactions can be distributed across the partitions
(diagram: randomized key names spread across partitions 1…N)
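The naming schemes above can be sketched as follows. This is a hedged illustration: the function names, the MD5/SHA-1 choices, and the 8-character prefix length are my own assumptions, not prescribed by the talk.

```python
import hashlib

def randomized_key(name: str, date: str) -> str:
    """Prefix the key with a short hash of the file name so that
    bulk uploads spread across index partitions instead of
    hammering one date-ordered key range."""
    prefix = hashlib.md5(name.encode()).hexdigest()[:8]  # 8 hex chars, arbitrary length
    return f"{prefix}-{date}-{name}"

def hashed_key(name: str) -> str:
    """Store the object under a hash of its name; the original name
    would go into object metadata (e.g. x-amz-meta-filename)."""
    return hashlib.sha1(name.encode()).hexdigest()

print(randomized_key("164533125.jpg", "2013_11_13"))
print(hashed_key("deadbeef_mix.mp3"))
```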
Parallelizing PUTs with multipart uploads
•  Increase aggregate throughput by
parallelizing PUTs on high-bandwidth
networks
•  Move the bottleneck to the network
where it belongs
•  Increase resiliency to network errors;
fewer large restarts on error-prone
networks
https://aws.amazon.com/fr/premiumsupport/knowledge-center/s3-multipart-upload-cli/
Choose the right part size
•  Maximum number of parts: 10,000
•  Part size: from 5MB to 5GB
•  Strike a balance between part size
and number of parts
•  Too many small parts → connection overhead (TCP handshake & slow start)
•  Too few large parts → not enough benefit from multipart
Incomplete multipart upload expiration policy
•  Multipart upload feature improves PUT
performance
•  Partial upload does not appear in bucket list
•  Partial upload does incur storage charges
•  Set a lifecycle policy to automatically expire
incomplete multipart uploads after a
predefined number of days
Incomplete multipart uploads will expire seven days after initiation
"Rules": [
  {
    …
    "AbortIncompleteMultipartUpload": {
      "DaysAfterInitiation": 7
    }
  }
]
Lifecycle policy to expire multipart uploads
Parallelize your GETs
•  Use Amazon CloudFront to offload Amazon S3
and benefit from range-based GETs
•  Use range-based GETs to get multithreaded
performance when downloading objects
•  Compensates for unreliable networks
•  Benefits of multithreaded parallelism
•  Align your ranges with your parts!
(diagram: CloudFront and EC2 fetching from S3)
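The "align your ranges with your parts" advice can be sketched like this: compute inclusive HTTP `Range` header values whose boundaries match the multipart part size (the function name and sizes are illustrative).

```python
def byte_ranges(object_size: int, part_size: int) -> list[str]:
    """Split an object into HTTP Range header values, aligned with
    the part size used at upload time so that each parallel GET
    maps onto one stored part."""
    ranges = []
    for start in range(0, object_size, part_size):
        end = min(start + part_size, object_size) - 1  # Range is inclusive
        ranges.append(f"bytes={start}-{end}")
    return ranges

# Each range is then fetched by its own thread with a "Range: bytes=..."
# header, and the pieces are written back at their offsets.
print(byte_ranges(25, 10))  # → ['bytes=0-9', 'bytes=10-19', 'bytes=20-24']
```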
Parallelizing LIST
•  Parallelize LIST when you need a
sequential list of your keys
•  Secondary index to get a faster
alternative to LIST
•  Sorting by metadata
•  Search ability
•  Objects by timestamp
“Building and Maintaining an Amazon S3 Metadata Index without Servers”
AWS blog post by Mike Deck on using Amazon DynamoDB and AWS Lambda
Amazon S3
Transfer Acceleration
Amazon S3 Transfer Acceleration
•  Designed for long distance transfers
•  Send data to Amazon S3 using the 54 AWS Edge Locations
•  Up to 6 times faster thanks to the internal AWS network
•  No change required (software, firewalls, etc.)
•  Must be explicitly enabled by customers, on a per-bucket basis
•  Pay according to volume: from $0.04 / GB
•  You’re only charged if the transfer is faster than using regular Amazon S3
endpoints
Amazon S3 Transfer Acceleration
{
"Status": "Enabled"
}
aws s3api put-bucket-accelerate-configuration \
    --bucket BUCKET_NAME \
    --accelerate-configuration file://ACCELERATE_JSON_FILE
AWS Snowball
•  New version: 80 Terabytes (+60%)
•  Available in Europe (eu-west-1)
•  All regions available by the end of 2016
•  $250 per operation
•  25 Snowballs → 2 petabytes in a week, for $6,250
Recap
•  Case study: Xstream A/S
•  Amazon S3 Standard-Infrequent Access
•  Amazon S3 Lifecycle Policies
•  Amazon S3 Versioning
•  Amazon S3 Performance & Transfer Acceleration
Thank You!
