S3 101
- Massive topic for exam
What is S3
- Simple Storage Service
- Provides developers and IT teams with secure, durable, highly-scalable object storage
- Files, images, web pages
- Not for operating systems or databases
- Easy to use with a simple web services interface to store and retrieve any amount of data from anywhere on the web
- Safe place to store files
- Object based storage only. Not block storage
- Spread across multiple devices and facilities
- High availability and disaster recovery
- Amazon could lose a device or facility and the service is still available
S3 - the basics
- Object based - allows you to upload files as objects
- Files can be from 0 bytes to 5 TB
- Unlimited storage
- Don't have to worry about allocating space or predicting space
- Files are stored in Buckets (similar to a folder)
- Name of bucket is user defined
- S3 is a universal namespace
- Names must be unique globally
- https://s3-eu-west-1.amazonaws.com/acloudguru
- Internet address, so it must be unique globally
- Name of bucket is on the end
- When you upload a file to S3, you will receive an HTTP 200 code if the upload was successful
- Won’t see the code using the console
- Only seen using the API or CLI
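For reference, this is what checking that status code looks like with boto3; a minimal sketch, assuming a placeholder bucket name:

```python
import boto3

s3 = boto3.client("s3")

# Upload a small object; "my-example-bucket" is a placeholder name.
response = s3.put_object(
    Bucket="my-example-bucket",
    Key="hello.txt",
    Body=b"Hello, S3!",
)

# A successful upload returns HTTP 200 in the response metadata,
# which is visible here but never in the console.
print(response["ResponseMetadata"]["HTTPStatusCode"])  # 200
```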
Data consistency model for S3
- Read after Write consistency for PUTS of new objects
- PUTS means initially uploading an object for the first time
- As soon as you add your object into S3, the file is available to read
- Access as soon as it is uploaded
- Eventual Consistency for overwrite PUTS and DELETES (can take some time to propagate)
- Can read right away when uploading a new file, but making changes to it can take a while to propagate
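A quick boto3 sketch of what this model implies in practice (bucket and key names are placeholders):

```python
import boto3

s3 = boto3.client("s3")
bucket, key = "my-example-bucket", "report.txt"  # placeholder names

# First PUT of a brand-new key: read-after-write consistency,
# so this GET is guaranteed to see the object immediately.
s3.put_object(Bucket=bucket, Key=key, Body=b"v1")
print(s3.get_object(Bucket=bucket, Key=key)["Body"].read())  # b"v1"

# Overwrite PUT: eventual consistency, so an immediate GET may
# still return the stale b"v1" for a short window.
s3.put_object(Bucket=bucket, Key=key, Body=b"v2")
print(s3.get_object(Bucket=bucket, Key=key)["Body"].read())  # b"v1" or b"v2"
```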
S3 is a simple key-value store
- S3 is object based. Objects consist of the following:
- Key (name of object)
- Value (data, made up of a sequence of bytes)
- Version ID (important for versioning)
- Good for version control in S3
- MetaData (data about data you are storing)
- Can add your own data to metadata (see the sketch after this list)
- Name of team that owns the file
- Subresources - bucket specific configuration
- Bucket policies, access control lists
- Cross origin resource sharing (CORS)
- Transfer acceleration
- Accelerate file transfer speeds when uploading lots of files into S3
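A sketch of these pieces in boto3, attaching user-defined metadata at upload and reading it back (all names are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Key = object name, Body = value; Metadata is your own data about data.
s3.put_object(
    Bucket="my-example-bucket",
    Key="report.csv",                       # the object's key
    Body=b"col1,col2\n1,2\n",               # the object's value
    Metadata={"owning-team": "analytics"},  # e.g. team that owns the file
)

# head_object returns the metadata (plus a VersionId on versioned
# buckets) without downloading the value itself.
head = s3.head_object(Bucket="my-example-bucket", Key="report.csv")
print(head["Metadata"])  # {'owning-team': 'analytics'}
```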
S3 - the basics
- Built for 99.99% availability for the S3 platform
- Availability is the amount of time the service is up and accessible
- The SLA actually guarantees 99.9% availability
- Amazon guarantees 99.999999999% durability for S3 information (11 9s)
- Durability measures how likely you are to lose data over a given year
- We want this as close to 100% as possible so we don’t lose our files
- If you store 10 million objects in S3, you can expect to lose a single object once every 10,000 years
- Always have a backup of data
- Enable version control
- Keep multiple versions of the same file
- Rollback if we lost the file
- Prevent certain users from deleting
- Replicate data
- Tiered storage available
- Lifecycle management
- Set rules around moving data between storage tiers (see the sketch after this list)
- Versioning
- Version control
- Encryption
- Secure your data
- Access control lists and bucket policies
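A minimal boto3 sketch of turning these on; the bucket name is a placeholder and the 30/90-day transitions are just example values:

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket"  # placeholder name

# Versioning keeps prior versions of overwritten or deleted objects.
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Lifecycle rule: move objects to IA after 30 days, Glacier after 90.
s3.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tiering-example",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # apply to every object
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
        }],
    },
)
```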
S3 storage tiers/classes
- S3: 99.99% availability, 99.999999999% durability, stored redundantly across multiple devices in multiple facilities and is designed to sustain the loss of 2 facilities concurrently
- S3 IA (infrequently accessed): accessed less frequently, but requires rapid access when needed
- May be data you need to access once a year
- Lower fee than S3, but you are charged a retrieval fee every time it is retrieved
- S3 One Zone IA: Same as IA, but the data is stored in a single Availability Zone only
- Still 99.999999999% durability, but only 99.5% availability.
- Cost is 20% less than regular S3 - IA
- If one AZ goes down, lose access to data until it comes back online
- Reduced redundancy storage: designed to provide 99.99% durability and 99.99% availability of objects over a given year
- Used for data that can be recreated if lost, e.g. thumbnails
- Not used for data that cannot be lost or mission critical data
- Amazon recommends that you do not use this storage class
- Use standard S3
- AWS has adjusted standard S3 pricing so it is now more cost-effective than reduced redundancy
- Glacier - very cheap, but used for archival
- Optimised for data that is very infrequently accessed
- Takes 3-5 hours to restore from Glacier
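The class is chosen per object at upload time. A boto3 sketch (placeholder names; the restore window is just an example):

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket"  # placeholder name

# Upload straight into the infrequent-access class instead of standard.
s3.put_object(
    Bucket=bucket,
    Key="yearly-report.pdf",
    Body=b"...",
    StorageClass="STANDARD_IA",  # or ONEZONE_IA, REDUCED_REDUNDANCY, etc.
)

# Objects in Glacier must be restored before they can be read,
# and the restore itself takes hours to complete.
s3.restore_object(
    Bucket=bucket,
    Key="old-archive.zip",       # hypothetical object already in Glacier
    RestoreRequest={"Days": 7},  # keep the restored copy for 7 days
)
```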
S3 - Intelligent Tiering
- Announced at re:Invent 2018
- For data with unknown or unpredictable access patterns
- 2 Tiers
- Frequent and infrequent access
- Automatically moves your data to the most cost-effective tier based on how frequently you access each object
- If an object is not accessed for 30 consecutive days, it is moved to the infrequent access tier
- When the object is accessed again, it is moved back to the frequent access tier
- 99.999999999% durability
- 99.9% availability over a given year
- Optimizes cost
- No fees for accessing data but a small monthly fee for monitoring/automation
- $0.0025 per 1,000 objects
S3 charges
- Charged for:
- Amount of storage per GB (gigabytes)
- Requests (Get, Put, Copy, etc)
- Storage management pricing
- Inventory, analytics, and object tags
- Tags are used to know which team or department is using the storage
- Data transfer pricing
- Data transferred out of S3
- Free to transfer data into S3
- Downloading a file from S3 incurs a charge
- Transfer acceleration
- Use of CloudFront to optimize transfers
Exam tips for S3
- Object based
- Allows you to upload files
- Object storage only
- Not suitable for installing an operating system on or running a database
- Files can be from 0 Bytes to 5 TB
- Unlimited Storage
- Files are stored in Buckets (folders)
- S3 is a universal namespace
- Bucket names must be unique globally
- Read after Write consistency for PUTS of new objects
- Eventual Consistency for overwrite PUTS and DELETES
- Can take some time to propagate
- Storage tiers
- S3
- Durable
- Immediately available
- Frequently accessed
- S3 - IA
- Durable
- Immediately available
- Infrequently accessed
- S3 - One Zone IA
- Same as IA, HOWEVER
- Data is stored in ONE Availability Zone only
- S3 - Reduced Redundancy Storage
- Data that is easily reproducible
- Thumbnails, etc
- Glacier
- Archived data
- 3-5 hour wait before accessing
- Fundamentals of an S3 object
- Key (name)
- Value (data)
- Version ID
- MetaData (Data about data)
- User defined tags
- Subresources - bucket specific configuration
- Bucket policies, access control lists
- Cross Origin Resource Sharing (CORS)
- Transfer acceleration
- Successful uploads will generate an HTTP 200 status code
- From CLI or API only
- Read the S3 FAQ
- https://aws.amazon.com/s3/faqs
Extras
- S3 bucket names may only contain lower case letters, periods, numbers, and dashes. Bucket names must not be formatted as an IP address, and they may not begin with a period.
- ReadObject is not an S3 API Call
- get familiar with S3 API Calls
- By Default, accounts can have a maximum of 100 S3 Buckets
- To increase, contact AWS
- S3-Standard provides 11-nines durability
- S3-Standard provides 99.99% availability
- S3-RRS provides 99.99% durability
- S3-IA is 99.99% Available
- Using IPv6 support for Amazon S3, applications can connect to Amazon S3 without needing any IPv6 to IPv4 translation software or systems.
- Multi-part upload is recommended for objects over 100MB
- REQUIRED for objects over 5GB
- x-amz-delete-marker, x-amz-id-2, and x-amz-request-id are all common S3 response headers.
- x-amz-set-delete-marker is not
- Familiarize yourself with the S3 API Reference document prior to the exam.
- S3 Select enables applications to retrieve only a subset of data from an object by using simple SQL expressions. By using S3 Select to retrieve only the data needed by your application, you can achieve drastic performance increases; in many cases you can get as much as a 400% improvement (see the sketch at the end of this section).
- Amazon S3 inventory is one of the tools Amazon S3 provides to help manage your storage. You can use it to audit and report on the replication and encryption status of your objects for business, compliance, and regulatory needs.
- By using Amazon S3 analytics storage class analysis you can analyze storage access patterns to help you decide when to transition the right data to the right storage class. This new Amazon S3 analytics feature observes data access patterns to help you determine when to transition less frequently accessed STANDARD storage to the STANDARD_IA (IA, for infrequent access) storage class.
- If you get questions about policies or controlling access, make sure you read carefully
- Is the question about controlling access to a bucket?
- Is the question about controlling access to specific objects inside a bucket?
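A sketch of the S3 Select call mentioned above; the bucket, object, and columns are hypothetical:

```python
import boto3

s3 = boto3.client("s3")

# Pull only matching rows from a CSV object instead of the whole file.
resp = s3.select_object_content(
    Bucket="my-example-bucket",  # placeholder name
    Key="sales.csv",             # hypothetical CSV object
    ExpressionType="SQL",
    Expression="SELECT s.product FROM S3Object s WHERE CAST(s.total AS FLOAT) > 100",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

# The response is an event stream; Records events carry the result rows.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode())
```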
Amazon S3 Transfer Acceleration
Amazon S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and your Amazon S3 bucket. Transfer Acceleration leverages Amazon CloudFront's globally distributed AWS Edge Locations. As data arrives at an AWS Edge Location, it is routed to your Amazon S3 bucket over an optimized network path.
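Acceleration is switched on per bucket. A boto3 sketch, assuming a placeholder bucket and file:

```python
import boto3
from botocore.config import Config

s3 = boto3.client("s3")

# Enable Transfer Acceleration on the bucket (placeholder name).
s3.put_bucket_accelerate_configuration(
    Bucket="my-example-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# Subsequent transfers can then use the accelerate endpoint,
# which routes through the nearest Edge Location.
s3_accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_accel.upload_file("big-file.bin", "my-example-bucket", "big-file.bin")
```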
Amazon S3 Transfer Acceleration Examples
Large File Uploads
If you get an Access Denied error when trying to upload a large file to your S3 bucket with an upload request that includes an AWS KMS key, confirm that you have permission to perform kms:Decrypt actions on the AWS KMS key you're using to encrypt the object.
Take note that kms:Decrypt is only one of the actions you must have permission to perform when you upload or download an Amazon S3 object encrypted with an AWS KMS key. You must also have permission to perform the kms:Encrypt, kms:ReEncrypt, kms:GenerateDataKey, and kms:DescribeKey actions.
The AWS CLI, AWS SDKs, and many third-party programs automatically perform a multipart upload when the file is large. To perform a multipart upload with encryption using an AWS KMS key, the requester must have permission to the kms:Decrypt action on the key. This permission is required because Amazon S3 must decrypt and read data from the encrypted file parts before it completes the multipart upload.
Each piece of the multipart upload has to be decrypted and pieced together.
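A boto3 sketch of such an upload, forcing multipart via a low threshold (the file, bucket, and KMS key alias are placeholders):

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Force multipart by setting the threshold below the file size.
config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,  # 8 MB
    multipart_chunksize=8 * 1024 * 1024,
)

# With SSE-KMS, the caller needs kms:Encrypt, kms:Decrypt, kms:ReEncrypt,
# kms:GenerateDataKey, and kms:DescribeKey on the key for this to succeed.
s3.upload_file(
    "big-file.bin",        # hypothetical local file
    "my-example-bucket",   # placeholder bucket
    "big-file.bin",
    ExtraArgs={
        "ServerSideEncryption": "aws:kms",
        "SSEKMSKeyId": "alias/my-key",  # placeholder key alias
    },
    Config=config,
)
```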
Cross Region Replication
Cross Region Replication (CRR) enables automatic, asynchronous copying of objects across buckets in different AWS Regions. Buckets configured for cross-region replication can be owned by the same AWS account or by different accounts. Cross-region replication is enabled with a bucket-level configuration. You add the replication configuration to your source bucket.
To enable the cross-region replication feature in S3, the following items should be met:
- The source and destination buckets must have versioning enabled
- The source and destination buckets must be in different AWS Regions
- Amazon S3 must have permission to replicate objects from that source bucket to the destination bucket on your behalf
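Putting those requirements together in a boto3 sketch (bucket names, account ID, and role are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# 1. Both buckets must have versioning enabled first.
for bucket in ("source-bucket-example", "dest-bucket-example"):
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )

# 2. The replication configuration goes on the source bucket; the IAM
#    role is what grants S3 permission to replicate on your behalf.
s3.put_bucket_replication(
    Bucket="source-bucket-example",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/replication-role",  # placeholder
        "Rules": [{
            "ID": "replicate-everything",
            "Status": "Enabled",
            "Prefix": "",  # replicate every object
            "Destination": {"Bucket": "arn:aws:s3:::dest-bucket-example"},
        }],
    },
)
```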
S3 Object Lock
S3 Object Lock enables you to store objects using a write-once-read-many (WORM) model. You can use it to prevent an object from being deleted or overwritten for a fixed amount of time or indefinitely, but it will not affect your cross-region replication configuration.
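A boto3 sketch of the WORM model (placeholder bucket name and retention date; assumes the default us-east-1 region, since create_bucket in other regions also needs a LocationConstraint):

```python
import boto3
from datetime import datetime, timezone

s3 = boto3.client("s3")

# Object Lock must be enabled at bucket creation time
# (this also turns on versioning automatically).
s3.create_bucket(
    Bucket="worm-bucket-example",  # placeholder name
    ObjectLockEnabledForBucket=True,
)

# Store an object that cannot be overwritten or deleted
# until the retain-until date passes.
s3.put_object(
    Bucket="worm-bucket-example",
    Key="audit-log.txt",
    Body=b"immutable record",
    ObjectLockMode="COMPLIANCE",
    ObjectLockRetainUntilDate=datetime(2030, 1, 1, tzinfo=timezone.utc),
)
```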