S3 (Simple Storage Service)

[S3 FAQ — read this before going to the exam]

    • S3 is a safe place to store files.
    • S3 is object-based storage.
    • Files can be from 0 bytes to 5 TB.
    • Storage is unlimited.
    • Files are stored in buckets.
    • S3 buckets live in a specific region, but bucket names must be globally unique (universal namespace).
    • Not suitable for installing an OS or a database (those need block storage).
    • You can turn on MFA Delete (an MFA code is required to permanently delete an object version, protecting files from being deleted by someone who gains access).
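
To make the basics concrete, here is a minimal boto3 sketch of creating a bucket and uploading a file. The bucket name, region, and file names are placeholders; a real bucket name must be globally unique.

    import boto3

    s3 = boto3.client('s3', region_name='eu-west-1')

    # Bucket names are globally unique; 'my-example-bucket' is a placeholder.
    s3.create_bucket(
        Bucket='my-example-bucket',
        CreateBucketConfiguration={'LocationConstraint': 'eu-west-1'},
    )

    # Upload a local file as an object (key) in the bucket.
    s3.upload_file('invoice.pdf', 'my-example-bucket', 'invoices/invoice.pdf')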

An S3 object consists of:

    • Key – the name of the object
    • Value – the data itself, a sequence of bytes
    • Version ID – used when versioning is turned on
    • Metadata – data about the object, such as the date uploaded
    • Subresources:
        • ACLs (Access Control Lists i.e. who can access the file)
        • Torrent
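
A quick boto3 sketch showing the key, value, and metadata parts of an object in practice (bucket and key names are placeholders):

    import boto3

    s3 = boto3.client('s3')

    # Key = object name, Body = value (sequence of bytes),
    # Metadata = data about the object.
    s3.put_object(
        Bucket='my-example-bucket',
        Key='reports/2019-q1.txt',
        Body=b'quarterly report contents',
        Metadata={'department': 'finance'},
    )

    # head_object returns the metadata, plus a VersionId when versioning is on.
    resp = s3.head_object(Bucket='my-example-bucket', Key='reports/2019-q1.txt')
    print(resp['Metadata'])        # {'department': 'finance'}
    print(resp.get('VersionId'))   # populated only when versioning is enabled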

S3 data consistency

    • Read-after-write consistency for PUTs of new objects (you can read an object immediately after uploading it).
    • Eventual consistency for overwrite PUTs and DELETEs (changes take time to propagate). If you try to access an object immediately after it's been updated, you may get the old version; it takes a few seconds for an update or delete to propagate.

S3 Features:

    • Tiered storage 
    • Lifecycle management
    • Versioning
    • Encryption
    • MFA
    • Secure data using ACLs and bucket policies
    • Static website hosting (see the sketch after this list)
    • Access logs – server access logging tracks requests made to your bucket and can be used for internal security and access audits
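
As an illustration of the static website hosting feature above, here is a minimal boto3 sketch. The bucket name and document names are placeholders, and the bucket would also need a policy allowing public reads.

    import boto3

    s3 = boto3.client('s3')

    # Serve index.html as the default page and error.html for errors.
    s3.put_bucket_website(
        Bucket='my-example-bucket',
        WebsiteConfiguration={
            'IndexDocument': {'Suffix': 'index.html'},
            'ErrorDocument': {'Key': 'error.html'},
        },
    )
    # The site is then served from the bucket's website endpoint, e.g.
    # http://my-example-bucket.s3-website-<region>.amazonaws.com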

S3 Storage Classes

1- S3 Standard:

    • 99.99% availability
    • 99.999999999% durability (11 9's; you won't lose a file due to an S3 failure)
    • Designed to sustain the loss of 2 facilities concurrently

2- S3 IA (Infrequent Access):

    • 99% availability
    • 99.999999999% durability (11 9’s, you won’t lose a file due to S3 failure)
    • Best for data that is accessed less frequently but must be available immediately when needed, where you want lower costs than S3 Standard and it's critical that the file is not lost.

3- S3 One Zone-IA:

  • Lower cost than IA, for data that doesn't require the resilience of storage across multiple AZs.

4- S3 Intelligent-Tiering:

  • Designed to optimize costs by automatically moving data to the most cost-effective access tier, without performance impact or operational overhead.

5- S3 Glacier:

    • For data archiving
    • Retrieval times configurable from minutes to hours

6- S3 Glacier Deep Archive:

  • Lowest-cost storage class, for data where a retrieval time of 12 hours is acceptable.
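
To make the tiers concrete, here is a boto3 sketch of choosing a storage class at upload time (bucket and key names are placeholders):

    import boto3

    s3 = boto3.client('s3')

    # StorageClass selects the tier; valid values include 'STANDARD',
    # 'STANDARD_IA', 'ONEZONE_IA', 'INTELLIGENT_TIERING', 'GLACIER'
    # and 'DEEP_ARCHIVE'.
    s3.put_object(
        Bucket='my-example-bucket',
        Key='archive/old-report.csv',
        Body=b'rarely accessed data',
        StorageClass='STANDARD_IA',
    )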

S3 charges for:

    • The volume of data you have stored
    • The number of requests
    • Data transfer out (including transfers to buckets in other regions)
    • Transfer Acceleration (which routes transfers through CloudFront edge locations to speed up long-distance uploads)
    • Cross-Region Replication pricing

S3 Security  & Encryption

    • By default, all newly created buckets are private.
    • Bucket access can be controlled using bucket policies and Access Control Lists.
    • S3 buckets can be configured to create access logs, which record all requests made to the bucket.
    • Encryption in transit is achieved with SSL/TLS.
    • Data at rest can be encrypted by:
  1. Server-side encryption

A- SSE-S3 – Amazon S3-Managed Keys, where S3 manages the keys, encrypting each object with a unique key using AES-256, and even encrypts the key itself with a master key that is regularly rotated.

B- SSE-KMS – AWS KMS-Managed Keys – Similar to SSE-S3, but with an option to provide an audit trail of when your key is used, and by whom, and also the option to create and manage keys yourself.

C- SSE-C – Customer-Provided Keys – Where you manage the encryption keys, and AWS manages encryption and decryption as it reads from and writes to disk.

   2. Client-side encryption: you encrypt the object yourself before uploading it to S3.
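
A minimal boto3 sketch of the two server-side options; the bucket, keys, and KMS key alias are placeholders:

    import boto3

    s3 = boto3.client('s3')

    # SSE-S3: S3 manages the keys, encrypting with AES-256.
    s3.put_object(
        Bucket='my-example-bucket',
        Key='secret-sse-s3.txt',
        Body=b'data',
        ServerSideEncryption='AES256',
    )

    # SSE-KMS: encrypt with a KMS key to get an audit trail of key usage.
    # 'alias/my-example-key' is a placeholder for a key you'd create in KMS.
    s3.put_object(
        Bucket='my-example-bucket',
        Key='secret-sse-kms.txt',
        Body=b'data',
        ServerSideEncryption='aws:kms',
        SSEKMSKeyId='alias/my-example-key',
    )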

S3 Versioning

    • Stores all versions of an object (including all writes; even deleting an object just adds a delete marker)
    • Great backup tool
    • Once enabled, versioning can't be disabled, only suspended
    • Integrates with lifecycle rules
    • Versioning's MFA Delete capability can be used to provide an additional layer of security: an MFA code is required to permanently delete a version.
    • Versioning can never be fully removed from a bucket that has had it enabled. If you want to get rid of versioning, you'll need to copy the files to a new bucket that has never had versioning enabled, and update any references pointing to the old bucket to point to the new bucket instead.
    • If you enable versioning on an existing bucket, it is not applied retroactively: existing objects only get a new version when they are next updated, while all new objects are versioned from the start.
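
A boto3 sketch of enabling versioning and listing versions (bucket name is a placeholder; MFA Delete additionally requires the bucket owner's MFA device and is omitted here):

    import boto3

    s3 = boto3.client('s3')

    # Once enabled, versioning can only be suspended ('Suspended'), never removed.
    s3.put_bucket_versioning(
        Bucket='my-example-bucket',
        VersioningConfiguration={'Status': 'Enabled'},
    )

    # Every write now creates a new version; deletes add a delete marker.
    for v in s3.list_object_versions(Bucket='my-example-bucket').get('Versions', []):
        print(v['Key'], v['VersionId'], v['IsLatest'])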

S3 Lifecycle management (Lifecycle Policies)

    • Automates moving your objects between the different storage tiers.
    • Can be used in conjunction with versioning.
    • There is tiered storage available, and you can use lifecycle management to transition objects through the tiers. For example, there might be a requirement that invoices from the last 24 months are immediately available, and that older invoices don't need to be immediately available but must be stored for compliance reasons for 7 years. For this scenario, you might keep invoices younger than 24 months in S3 for immediate access, and use lifecycle management to move older invoices to Glacier (where storage is extremely cheap, with the tradeoff that restoring an object takes 3-5 hours) for long-term storage.
    • Lifecycle management also supports permanently deleting files after a configurable amount of time, e.g. after the file has been migrated to Glacier.
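
The invoice scenario above could be expressed as a lifecycle rule roughly like the following boto3 sketch; the bucket name and prefix are placeholders, with 24 months taken as 730 days and 7 years as 2555 days:

    import boto3

    s3 = boto3.client('s3')

    s3.put_bucket_lifecycle_configuration(
        Bucket='my-example-bucket',
        LifecycleConfiguration={'Rules': [{
            'ID': 'archive-then-expire-invoices',
            'Filter': {'Prefix': 'invoices/'},
            'Status': 'Enabled',
            # Move to Glacier after 24 months...
            'Transitions': [{'Days': 730, 'StorageClass': 'GLACIER'}],
            # ...and permanently delete after 7 years.
            'Expiration': {'Days': 2555},
        }]},
    )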

S3 Cross-Region Replication

– Requires that versioning is enabled on both the source and destination buckets.

– The source and destination buckets must be in different regions.

– Existing files in the source bucket are not replicated automatically; only subsequent uploads and updates are replicated.

– If a delete marker is added to the source bucket, it is not replicated to the destination bucket. Amazon does this to prevent deletes from being replicated accidentally.

– Deleting individual versions or delete markers is likewise not replicated.
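
A hedged boto3 sketch of turning on cross-region replication. The bucket names, and especially the IAM role ARN, are placeholders you would create beforehand, and versioning must already be enabled on both buckets:

    import boto3

    s3 = boto3.client('s3')

    s3.put_bucket_replication(
        Bucket='my-source-bucket',
        ReplicationConfiguration={
            # Hypothetical role granting S3 permission to replicate objects.
            'Role': 'arn:aws:iam::123456789012:role/my-replication-role',
            'Rules': [{
                'ID': 'replicate-everything',
                'Prefix': '',   # empty prefix = replicate all new objects
                'Status': 'Enabled',
                'Destination': {'Bucket': 'arn:aws:s3:::my-destination-bucket'},
            }],
        },
    )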

S3 Transfer Acceleration

    • Speeds up transfers between users and S3 over long distances by routing uploads through the nearest CloudFront edge location, from where the data travels on to the bucket over Amazon's optimized internal network.
    • Users upload via a distinct accelerate endpoint (bucketname.s3-accelerate.amazonaws.com) rather than the standard bucket endpoint.
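
A boto3 sketch of enabling acceleration and uploading through the accelerate endpoint (bucket and file names are placeholders):

    import boto3
    from botocore.config import Config

    s3 = boto3.client('s3')
    s3.put_bucket_accelerate_configuration(
        Bucket='my-example-bucket',
        AccelerateConfiguration={'Status': 'Enabled'},
    )

    # A client configured to use the s3-accelerate endpoint for transfers.
    fast_s3 = boto3.client('s3', config=Config(s3={'use_accelerate_endpoint': True}))
    fast_s3.upload_file('big-file.zip', 'my-example-bucket', 'uploads/big-file.zip')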

CloudFront

    • CloudFront is AWS's CDN, which delivers content from edge locations close to the user's geographic location.
    • Can be used to deliver an entire website, including dynamic, static, streaming and interactive content, using a global network of edge locations.
    • Requests for content are automatically routed to the nearest edge location, so content is delivered with the best possible performance.
    • Two different types of distributions: [Web distribution (typically used for websites) and RTMP (for media streaming)]
    • Objects are cached for the life of the TTL (Time To Live)
    • Cached objects can be cleared (invalidated) before the TTL expires, but you will be charged for it.

Origin:

  • This is the origin of all the files that the CDN will distribute. It can be an S3 bucket, an EC2 instance, an Elastic Load Balancer, or Route 53.

Edge Locations

    • Separate from, and different from, AWS Availability Zones and regions
    • Location where content will be cached.
    • Not just read only – we can write to them too.
    • Enabling writes means that customers can upload files to their local edge location, which can speed up data transfer for them

Setting up CloudFront Distributions

  • A distribution is a collection of edge locations.
  • Go to CloudFront > Create Distribution > copy the domain name. Once the status of the distribution changes to Deployed, browse to domain_name/file_path, where file_path is the path of the file in the S3 bucket you want to access. The file is then served from the nearest edge location, so it returns a faster result.
  • To edit a distribution, click its distribution ID in the list of distributions. For example, click Invalidations > Create Invalidation; here you can invalidate individual files or entire folders.
  • Two types of distributions:
    a. Web distribution: used for websites.
    b. RTMP: used for media streaming (i.e. video), and only supported if the origin is S3; other origins such as EC2 do not support RTMP.
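
The invalidation step can also be done with boto3. The distribution ID below is a placeholder, and note (per the notes above) that invalidations are charged for:

    import boto3, time

    cf = boto3.client('cloudfront')

    cf.create_invalidation(
        DistributionId='E1EXAMPLE12345',  # placeholder distribution ID
        InvalidationBatch={
            # Paths support wildcards, so '/images/*' would clear a whole folder.
            'Paths': {'Quantity': 1, 'Items': ['/index.html']},
            # CallerReference must be unique per invalidation request.
            'CallerReference': str(time.time()),
        },
    )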

Snowball

    • Snowball is a petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data into and out of AWS. Using Snowball addresses common challenges with large-scale data transfers, including high network costs, long transfer times, and security concerns. Transferring data with Snowball is simple, fast and secure, and can cost as little as one-fifth of transferring over high-speed internet.

    • Snowball comes in either a 50TB or 80TB size. Snowball uses multiple layers of security designed to protect data including tamper-resistant enclosures, 256-bit encryption and an industry standard Trusted Platform Module (TPM) designed to ensure both security and full chain-of-custody of data. Once the data transfer job has been processed and verified, AWS performs a software erasure of the Snowball appliance.
    • Snowball can import to S3 or export from S3

Snowball Edge

    • AWS Snowball Edge is a 100 TB data transfer device with on-board storage and compute capabilities.
    • Can be used to move large amounts of data into and out of AWS, as a temporary storage tier for large local datasets or to support local workloads in remote or offline locations.
    • Connects to an existing application and infrastructure using standard storage interfaces, streamlining the data transfer process and minimizing setup and integration.
    • Can cluster together to form a local storage tier and process data on-premises, helping ensure apps continue to run even when they aren’t able to access the cloud.

Snowmobile

    • This is an exabyte-scale data transfer service used to move extremely large amounts of data to AWS.
    • Can transfer up to 100 PB per Snowmobile, a 45-foot long shipping container pulled by a semi-trailer truck.
    • Makes it easy to move massive volumes of data to the cloud, including video libraries, image repositories or even a complete data center migration.
    • Secure, fast and cost-effective.

Storage Gateway

    • A Storage Gateway is a software appliance that sits in your data center and securely connects your on-premises IT environment to AWS's storage infrastructure. The Storage Gateway software appliance is available for download as a virtual machine (VM) image that you install on a host in your data center; it supports either VMware ESXi or Hyper-V. Once you've installed your gateway and associated it with your AWS account through the activation process, you can use the AWS Management Console to choose the storage gateway option that is right for you.

There are three types of Storage Gateway:

  • File Gateway

– uses NFS to store files in S3

– files are stored as objects

  • Volume Gateway – a virtual iSCSI disk. Block based, not object based like S3.
    • Cached volumes – the entire dataset is stored in the cloud, with recently-read data kept on site for quick retrieval of frequently accessed data. 1 GB - 32 TB in size.
    • Stored volumes – the entire dataset is stored on-premises, with data being incrementally backed up to S3. 1 GB - 16 TB in size.
  • Tape Gateway (VTL) – virtual tapes, backed up to Glacier. It provides a VTL interface, which lets you leverage your existing tape-based backup infrastructure to store data on virtual tape cartridges that you create on your tape gateway. Each tape gateway is preconfigured with a media changer and tape drives, which are available to your existing client backup applications as iSCSI devices. You add tape cartridges as you need to archive your data.

All data transferred between the Storage Gateway and S3 is encrypted using SSL. By default, all data the gateway stores in S3 is encrypted server-side with SSE-S3, so your data is automatically encrypted at rest.