S3 101
- Massive topic for exam
What is S3
- Simple Storage Service
- Provides developers and IT teams with secure, durable, highly-scalable object storage
- Files, images, web pages
- Not for operating systems or databases
- Easy to use with a simple web services interface to store and retrieve any amount of data from anywhere on the web
- Safe place to store files
- Object based storage only. Not block storage
- Spread across multiple devices and facilities
- High availability and disaster recovery
- Amazon could lose a device or facility and the service is still available
S3 - the basics
- Object based - allows you to upload files as objects
- Files can be from 0 bytes to 5 TB
- Unlimited storage
- Don't have to worry about allocating space or predicting space
- Files are stored in Buckets (similar to a folder)
- Name of bucket is user defined
- S3 is a universal namespace
- Names must be unique globally
- https://s3-eu-west-1.amazonaws.com/acloudguru
- Internet address, so it must be unique globally
- Name of bucket is on the end
- When you upload a file to S3, you will receive an HTTP 200 code if the upload was successful
- Won’t see the code using the console
- Only seen using the API or CLI
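For reference, this is what checking that status code looks like with boto3; a minimal sketch, assuming a placeholder bucket name:

```python
import boto3

s3 = boto3.client("s3")

# Upload a small object; "my-example-bucket" is a placeholder name.
response = s3.put_object(
    Bucket="my-example-bucket",
    Key="hello.txt",
    Body=b"Hello, S3!",
)

# A successful upload returns HTTP 200 in the response metadata,
# which is visible here but never in the console.
print(response["ResponseMetadata"]["HTTPStatusCode"])  # 200
```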
Data consistency model for S3
- Read after Write consistency for PUTS of new objects
- PUTS means initially uploading an object for the first time
- As soon as you add your object into S3, the file is available to read
- Access as soon as it is uploaded
- Eventual Consistency for overwrite PUTS and DELETES (can take some time to propagate)
- Can read right away when uploading a new file, but making changes to it can take a while to propagate
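A quick boto3 sketch of what this model implies in practice (bucket and key names are placeholders):

```python
import boto3

s3 = boto3.client("s3")
bucket, key = "my-example-bucket", "report.txt"  # placeholder names

# First PUT of a brand-new key: read-after-write consistency,
# so this GET is guaranteed to see the object immediately.
s3.put_object(Bucket=bucket, Key=key, Body=b"v1")
print(s3.get_object(Bucket=bucket, Key=key)["Body"].read())  # b"v1"

# Overwrite PUT: eventual consistency, so an immediate GET may
# still return the stale b"v1" for a short window.
s3.put_object(Bucket=bucket, Key=key, Body=b"v2")
print(s3.get_object(Bucket=bucket, Key=key)["Body"].read())  # b"v1" or b"v2"
```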
S3 is a simple key-value store
- S3 is object based. Objects consist of the following:
- Key (name of object)
- Value (data, made up of a sequence of bytes)
- Version ID (important for versioning)
- Good for version control in S3
- MetaData (data about data you are storing)
- Can add your own data to metadata (see the sketch after this list)
- Name of team that owns the file
- Subresources - bucket specific configuration
- Bucket policies, access control lists
- Cross origin resource sharing (CORS)
- Transfer acceleration
- Accelerate file transfer speeds when uploading lots of files into S3
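A sketch of these pieces in boto3, attaching user-defined metadata at upload and reading it back (all names are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Key = object name, Body = value; Metadata is your own data about data.
s3.put_object(
    Bucket="my-example-bucket",
    Key="report.csv",                       # the object's key
    Body=b"col1,col2\n1,2\n",               # the object's value
    Metadata={"owning-team": "analytics"},  # e.g. team that owns the file
)

# head_object returns the metadata (plus a VersionId on versioned
# buckets) without downloading the value itself.
head = s3.head_object(Bucket="my-example-bucket", Key="report.csv")
print(head["Metadata"])  # {'owning-team': 'analytics'}
```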
S3 - the basics
- Built for 99.99% availability for the S3 platform
- Availability is the amount of time the service is up and accessible
- The SLA actually guarantees 99.9% availability
- Amazon guarantees 99.999999999% durability for S3 information (11 9s)
- Durability measures how likely you are to lose data over a given year
- We want this as close to 100% as possible so we don’t lose our files
- If you store 10 million objects in S3, you can expect to lose a single object once every 10,000 years
- Always have a backup of data
- Enable version control
- Keep multiple versions of the same file
- Rollback if we lost the file
- Prevent certain users from deleting
- Replicate data
- Tiered storage available
- Lifecycle management
- Set rules around moving data between storage tiers (see the sketch after this list)
- Versioning
- Version control
- Encryption
- Secure your data
- Access control lists and bucket policies
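A minimal boto3 sketch of turning these on; the bucket name is a placeholder and the 30/90-day transitions are just example values:

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket"  # placeholder name

# Versioning keeps prior versions of overwritten or deleted objects.
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Lifecycle rule: move objects to IA after 30 days, Glacier after 90.
s3.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tiering-example",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # apply to every object
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
        }],
    },
)
```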
S3 storage tiers/classes
- S3: 99.99% availability, 99.999999999% durability, stored redundantly across multiple devices in multiple facilities and is designed to sustain the loss of 2 facilities concurrently
- S3 IA (infrequently accessed): accessed less frequently, but requires rapid access when needed
- May be data you need to access once a year
- Lower fee than S3, but you are charged a retrieval fee every time it is retrieved
- S3 One Zone IA: Same as IA, but the data is stored in a single Availability Zone only
- Still 99.999999999% durability, but only 99.5% availability.
- Cost is 20% less than regular S3 - IA
- If one AZ goes down, lose access to data until it comes back online
- Reduced redundancy storage: designed to provide 99.99% durability and 99.99% availability of objects over a given year
- Used for data that can be recreated if lost, e.g. thumbnails
- Not used for data that cannot be lost or mission critical data
- Amazon recommends that you do not use this storage class
- Use standard S3
- AWS has adjusted standard S3 pricing so it is now more cost-effective than reduced redundancy
- Glacier - very cheap, but used for archival
- Optimised for data that is very infrequently accessed
- Takes 3-5 hours to restore from Glacier
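The class is chosen per object at upload time. A boto3 sketch (placeholder names; the restore window is just an example):

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket"  # placeholder name

# Upload straight into the infrequent-access class instead of standard.
s3.put_object(
    Bucket=bucket,
    Key="yearly-report.pdf",
    Body=b"...",
    StorageClass="STANDARD_IA",  # or ONEZONE_IA, REDUCED_REDUNDANCY, etc.
)

# Objects in Glacier must be restored before they can be read,
# and the restore itself takes hours to complete.
s3.restore_object(
    Bucket=bucket,
    Key="old-archive.zip",       # hypothetical object already in Glacier
    RestoreRequest={"Days": 7},  # keep the restored copy for 7 days
)
```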
S3 - Intelligent Tiering
- Announced at re:Invent 2018
- For data with unknown or unpredictable access patterns
- 2 Tiers
- Frequent and infrequent access
- Automatically moves your data to the most cost-effective tier based on how frequently you access each object
- If an object is not accessed for 30 consecutive days, it is moved to the infrequent access tier
- When the object is accessed again, it is moved back to the frequent access tier
- 99.999999999% durability
- 99.9% availability over a given year
- Optimizes cost
- No fees for accessing data but a small monthly fee for monitoring/automation
- $0.0025 per 1,000 objects
S3 charges
- Charged for:
- Amount of storage per GB (gigabytes)
- Requests (Get, Put, Copy, etc)
- Storage management pricing
- Inventory, analytics, and object tags
- Tags are used to know which team or department is using the storage
- Data transfer pricing
- Data transferred out of S3
- Free to transfer data into S3
- Downloading a file from S3 incurs a charge
- Transfer acceleration
- Use of CloudFront to optimize transfers
Exam tips for S3
- Object based
- Allows you to upload files
- Object storage only
- Not suitable for installing an operating system on or running a database
- Files can be from 0 Bytes to 5 TB
- Unlimited Storage
- Files are stored in Buckets (folders)
- S3 is a universal namespace
- Bucket names must be unique globally
- Read after Write consistency for PUTS of new objects
- Eventual Consistency for overwrite PUTS and DELETES
- Can take some time to propagate
- Storage tiers
- S3
- Durable
- Immediately available
- Frequently accessed
- S3 - IA
- Durable
- Immediately available
- Infrequently accessed
- S3 - One Zone IA
- Same as IA, HOWEVER
- Data is stored in ONE Availability Zone only
- S3 - Reduced Redundancy Storage
- Data that is easily reproducible
- Thumbnails, etc
- Glacier
- Archived data
- 3-5 hour wait before accessing
- Fundamentals of an S3 object
- Key (name)
- Value (data)
- Version ID
- MetaData (Data about data)
- User defined tags
- Subresources - bucket specific configuration
- Bucket policies, access control lists
- Cross Origin Resource Sharing (CORS)
- Transfer acceleration
- Successful uploads will generate an HTTP 200 status code
- From CLI or API only
- Read the S3 FAQ
- https://aws.amazon.com/s3/faqs
Extras
- S3 bucket names may only contain lower case letters, periods, numbers, and dashes. Bucket names must not be formatted as an IP address, and they may not begin with a period.
- ReadObject is not an S3 API Call
- get familiar with S3 API Calls
- By Default, accounts can have a maximum of 100 S3 Buckets
- To increase, contact AWS
- S3-Standard provides 11-nines durability
- S3-Standard provides 99.99% availability
- S3-RRS provides 99.99% durability
- S3-IA is 99.99% Available
- Using IPv6 support for Amazon S3, applications can connect to Amazon S3 without needing any IPv6 to IPv4 translation software or systems.
- Multi-part upload is recommended for objects over 100MB
- REQUIRED for objects over 5GB
- x-amz-delete-marker, x-amz-id-2, and x-amz-request-id are all common S3 response headers.
- x-amz-set-delete-marker is not
- Familiarize yourself with the S3 API Reference document prior to the exam.
- S3 Select enables applications to retrieve only a subset of data from an object by using simple SQL expressions. By using S3 Select to retrieve only the data needed by your application, you can achieve drastic performance increases; in many cases you can get as much as a 400% improvement (see the sketch at the end of this section).
- Amazon S3 inventory is one of the tools Amazon S3 provides to help manage your storage. You can use it to audit and report on the replication and encryption status of your objects for business, compliance, and regulatory needs.
- By using Amazon S3 analytics storage class analysis you can analyze storage access patterns to help you decide when to transition the right data to the right storage class. This new Amazon S3 analytics feature observes data access patterns to help you determine when to transition less frequently accessed STANDARD storage to the STANDARD_IA (IA, for infrequent access) storage class.
- If you get questions about policies or controlling access, make sure you read carefully
- Is the question about controlling access to a bucket?
- Is the question about controlling access to specific objects inside a bucket?
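A sketch of the S3 Select call mentioned above; the bucket, object, and columns are hypothetical:

```python
import boto3

s3 = boto3.client("s3")

# Pull only matching rows from a CSV object instead of the whole file.
resp = s3.select_object_content(
    Bucket="my-example-bucket",  # placeholder name
    Key="sales.csv",             # hypothetical CSV object
    ExpressionType="SQL",
    Expression="SELECT s.product FROM S3Object s WHERE CAST(s.total AS FLOAT) > 100",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

# The response is an event stream; Records events carry the result rows.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode())
```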
Amazon S3 Transfer Acceleration
Amazon S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and your Amazon S3 bucket. Transfer Acceleration leverages Amazon CloudFront's globally distributed AWS Edge Locations. As data arrives at an AWS Edge Location, it is routed to your Amazon S3 bucket over an optimized network path.
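Acceleration is switched on per bucket. A boto3 sketch, assuming a placeholder bucket and file:

```python
import boto3
from botocore.config import Config

s3 = boto3.client("s3")

# Enable Transfer Acceleration on the bucket (placeholder name).
s3.put_bucket_accelerate_configuration(
    Bucket="my-example-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# Subsequent transfers can then use the accelerate endpoint,
# which routes through the nearest Edge Location.
s3_accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_accel.upload_file("big-file.bin", "my-example-bucket", "big-file.bin")
```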
Amazon S3 Transfer Acceleration Examples
Large File Uploads
If you get an Access Denied error when trying to upload a large file to your S3 bucket with an upload request that includes an AWS KMS key, confirm that you have permission to perform kms:Decrypt actions on the AWS KMS key you're using to encrypt the object.
Take note that kms:Decrypt is only one of the actions you must have permission to perform when you upload or download an Amazon S3 object encrypted with an AWS KMS key. You must also have permission to perform the kms:Encrypt, kms:ReEncrypt, kms:GenerateDataKey, and kms:DescribeKey actions.
The AWS CLI, AWS SDKs, and many third-party programs automatically perform a multipart upload when the file is large. To perform a multipart upload with encryption using an AWS KMS key, the requester must have permission to the kms:Decrypt action on the key. This permission is required because Amazon S3 must decrypt and read data from the encrypted file parts before it completes the multipart upload.
Each piece of the multipart upload has to be decrypted and pieced together.
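A boto3 sketch of such an upload, forcing multipart via a low threshold (the file, bucket, and KMS key alias are placeholders):

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Force multipart by setting the threshold below the file size.
config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,  # 8 MB
    multipart_chunksize=8 * 1024 * 1024,
)

# With SSE-KMS, the caller needs kms:Encrypt, kms:Decrypt, kms:ReEncrypt,
# kms:GenerateDataKey, and kms:DescribeKey on the key for this to succeed.
s3.upload_file(
    "big-file.bin",        # hypothetical local file
    "my-example-bucket",   # placeholder bucket
    "big-file.bin",
    ExtraArgs={
        "ServerSideEncryption": "aws:kms",
        "SSEKMSKeyId": "alias/my-key",  # placeholder key alias
    },
    Config=config,
)
```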
Cross Region Replication
Cross Region Replication (CRR) enables automatic, asynchronous copying of objects across buckets in different AWS Regions. Buckets configured for cross-region replication can be owned by the same AWS account or by different accounts. Cross-region replication is enabled with a bucket-level configuration. You add the replication configuration to your source bucket.
To enable the cross-region replication feature in S3, the following items should be met:
- The source and destination buckets must have versioning enabled
- The source and destination buckets must be in different AWS Regions
- Amazon S3 must have permission to replicate objects from that source bucket to the destination bucket on your behalf
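Putting those requirements together in a boto3 sketch (bucket names, account ID, and role are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# 1. Both buckets must have versioning enabled first.
for bucket in ("source-bucket-example", "dest-bucket-example"):
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )

# 2. The replication configuration goes on the source bucket; the IAM
#    role is what grants S3 permission to replicate on your behalf.
s3.put_bucket_replication(
    Bucket="source-bucket-example",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/replication-role",  # placeholder
        "Rules": [{
            "ID": "replicate-everything",
            "Status": "Enabled",
            "Prefix": "",  # replicate every object
            "Destination": {"Bucket": "arn:aws:s3:::dest-bucket-example"},
        }],
    },
)
```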
S3 Object Lock
S3 Object Lock enables you to store objects using a write-once-read-many (WORM) model. You can use it to prevent an object from being deleted or overwritten for a fixed amount of time or indefinitely, but it will not affect your cross-region replication configuration.
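A boto3 sketch of the WORM model (placeholder bucket name and retention date; assumes the default us-east-1 region, since create_bucket in other regions also needs a LocationConstraint):

```python
import boto3
from datetime import datetime, timezone

s3 = boto3.client("s3")

# Object Lock must be enabled at bucket creation time
# (this also turns on versioning automatically).
s3.create_bucket(
    Bucket="worm-bucket-example",  # placeholder name
    ObjectLockEnabledForBucket=True,
)

# Store an object that cannot be overwritten or deleted
# until the retain-until date passes.
s3.put_object(
    Bucket="worm-bucket-example",
    Key="audit-log.txt",
    Body=b"immutable record",
    ObjectLockMode="COMPLIANCE",
    ObjectLockRetainUntilDate=datetime(2030, 1, 1, tzinfo=timezone.utc),
)
```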