S3 Summary

  • S3 is object-based, i.e. it allows you to upload files
  • Files can be from 0 Bytes to 5 TB
  • There is unlimited storage
  • Files are stored in Buckets
  • S3 uses a universal namespace, i.e. bucket names must be globally unique
  • Consistency Model
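    • Read-after-write consistency for PUTs of new objects
    • Eventual consistency for overwrite PUTs and DELETEs (can take some time to propagate)
    • Note: since December 2020, S3 provides strong read-after-write consistency for all PUTs and DELETEs, so the two points above describe the older model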
  • S3 storage Classes/Tiers
    • S3
      • Durable, immediately available, frequently accessed
    • S3-IA
      • Durable, immediately available, infrequently accessed
    • S3-One Zone IA
      • Same as IA, however data is stored in a single AZ only
    • S3 - Reduced Redundancy Storage
      • Data that is easily reproducible such as thumbnails, etc
    • Glacier
      • Archived data, where you can wait 3-5 hours before accessing
  • Core Fundamentals of an S3 Object:
    • Key (name)
    • Value (data)
    • Version ID
    • Metadata
    • Subresources (used to manage bucket-specific configuration)
      • Bucket Policies, ACLs
        • Control access to buckets and content
      • CORS
        • Cross Origin Resource Sharing
        • Allow one resource to access another resource
      • Transfer Acceleration
  • Object storage only (for files)
  • Not suitable to install an OS
  • Successful uploads will generate an HTTP 200 status code
  • By default, all newly created buckets are private
  • You can set up access control to your buckets using:
    • Bucket policies - applied at bucket level
    • ACLs - applied at an object level
  • S3 buckets can be configured to create access logs, which log all requests made to the S3 bucket
    • These logs can be written to another bucket
  • S3 Encryption
    • Encryption In-Transit
      • SSL/TLS
      • Requests made to S3 over HTTPS are encrypted in transit on the network
      • Make requests to the bucket using HTTPS
    • Encryption At Rest
      • Server Side Encryption
        • SSE-S3
          • Each object is encrypted using a unique key with strong multifactor encryption
          • Keys are managed within S3
          • Managed by AWS end to end
          • AES-256 (Advanced Encryption Standard, 256-bit)
        • SSE-KMS
          • Separate permissions for the use of an envelope key
          • The envelope key protects your data’s encryption key
          • Audit trail telling you when keys were used and by whom
        • SSE-C
          • Customer provided key
          • You manage your own key
          • AWS encrypts and decrypts
      • Client Side Encryption
        • Encrypt it before you upload to S3
    • Remember that a bucket policy can prevent unencrypted files from being uploaded: create a policy that only allows requests which include the x-amz-server-side-encryption parameter in the request header
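
The bucket policy described in the last encryption point can be sketched as follows. This is a minimal illustration, not a definitive policy: the function name `require_sse_policy` and the bucket name `example-bucket` are hypothetical, and the policy uses a Deny statement on requests that lack the header, which is one common way to get the "only allow encrypted uploads" behaviour.

```python
import json


def require_sse_policy(bucket_name: str) -> dict:
    """Build a policy that denies PutObject requests without the SSE header.

    bucket_name is a placeholder supplied by the caller (assumption).
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedObjectUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                # "Null": "true" matches when the header is absent from the request.
                "Condition": {
                    "Null": {"s3:x-amz-server-side-encryption": "true"}
                },
            }
        ],
    }


# Serialize to the JSON text that a bucket policy is expressed in.
policy_json = json.dumps(require_sse_policy("example-bucket"), indent=2)
print(policy_json)
```

If boto3 is available, the resulting string could be attached with `put_bucket_policy(Bucket=..., Policy=policy_json)`; that step is left out here so the sketch runs without AWS credentials.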
  • CORS Summary
    • Cross Origin Resource Sharing (CORS)
      • Used to enable cross origin access for your AWS resources
      • E.g. an S3-hosted website accessing JavaScript or image files located in another S3 bucket
      • By default resources in one bucket cannot access resources located in another
      • To allow this we need to configure CORS on the bucket being accessed and enable access for the origin (bucket) attempting to access
      • Always use the s3 website URL, not the regular bucket URL
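
As a rough sketch of the CORS points above, the configuration placed on the bucket being accessed might look like this. The helper name `cors_configuration` and the website URL are hypothetical, and the choice of GET only and a 3000-second max age are illustrative assumptions rather than required values.

```python
def cors_configuration(allowed_origin: str) -> dict:
    """Build a CORS configuration that lets one origin read from the bucket.

    allowed_origin should be the S3 website URL of the requesting site,
    not the regular bucket URL.
    """
    return {
        "CORSRules": [
            {
                "AllowedOrigins": [allowed_origin],
                "AllowedMethods": ["GET"],   # read-only access (assumption)
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": 3000,       # how long browsers may cache the preflight
            }
        ]
    }


# Hypothetical website endpoint of the bucket that hosts the site.
config = cors_configuration(
    "http://example-site-bucket.s3-website-us-east-1.amazonaws.com"
)
print(config)
```

With boto3, a dictionary of this shape is what `put_bucket_cors(Bucket=..., CORSConfiguration=config)` expects; the call is omitted so the example stays runnable offline.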
  • CloudFront
    • Edge Location
      • Location where content will be cached
      • Separate from an AWS Region/AZ
    • Origin
      • Origin of all the files that the CDN will distribute
      • Origins can be an S3 bucket, EC2 instance, ELB or Route53
    • Distribution
      • Name given to the CDN
      • Consists of a collection of Edge Locations
      • Access using distribution domain name
      • Web Distribution - typically used for websites
      • RTMP - Flash or media streaming
    • Edge locations are not just READ only, you can WRITE to them too (i.e. put)
    • Objects are cached for the life of the TTL (time to live)
    • You can clear cached objects, but you will be charged (invalidation)
  • Performance optimization
    • GET-intensive workloads - use CloudFront
    • Mixed workloads
      • Avoid sequential key names for your S3 objects
      • Instead, add a random prefix like a hex hash to the key name to prevent multiple objects from being stored on the same partition
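
The hashed-prefix idea from the mixed-workload points can be sketched like this. The function name `prefixed_key`, the 4-character prefix length, and the use of SHA-256 are all assumptions made for illustration; the notes only call for "a random prefix like a hex hash".

```python
import hashlib


def prefixed_key(key: str, prefix_len: int = 4) -> str:
    """Prepend a short hex hash of the key so sequential names spread out.

    The prefix is derived from the key itself, so it is reproducible:
    the same original key always maps to the same stored key.
    """
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{key}"


# Sequential names end up with unrelated leading characters.
for name in ["2018-01-01-log.txt", "2018-01-02-log.txt", "2018-01-03-log.txt"]:
    print(prefixed_key(name))
```

Deriving the prefix from the key, rather than generating it randomly, means a reader can recompute the stored key without keeping a lookup table.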
  • https://aws.amazon.com/s3/faqs/
  • Extras:
    • Multipart uploads can be stopped and resumed later
    • A multipart upload CAN be executed while the file is being created
    • AWS recommends using multipart uploads for files larger than 100 MB
      • Required for files larger than 5 GB
    • Multipart uploads will not be reassembled until the CompleteMultipartUpload operation is called
    • ReadObject is not an S3 API Call
    • Access Denied HTTP code is 403 (404 means Not Found)
    • Minimum size is 0 bytes (empty or untouched files are permitted)
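
To make the multipart upload notes above more concrete, here is a minimal sketch of how a file might be split into parts before upload. The function name `plan_parts` and the 100 MiB default part size are assumptions; the 5 MiB minimum part size and the 10,000-part limit are S3's documented multipart limits. It only plans byte ranges and does not talk to S3.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # every part except the last must be at least this big
MAX_PARTS = 10_000        # maximum number of parts in one multipart upload


def plan_parts(total_size: int, part_size: int = 100 * MIB) -> list:
    """Return (part_number, offset, length) tuples covering total_size bytes."""
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")

    # Grow the part size if the file would otherwise need more than MAX_PARTS.
    smallest_allowed = -(-total_size // MAX_PARTS)  # ceiling division
    part_size = max(part_size, smallest_allowed)

    parts = []
    offset = 0
    number = 1  # part numbers start at 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


# A 250 MiB file splits into two full parts and one shorter final part.
for part in plan_parts(250 * MIB):
    print(part)
```

Each planned range would be sent as one part, and the object only comes into existence once CompleteMultipartUpload is called with the full list of parts.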