Storage Types on AWS
AWS storage services are grouped into three different categories: block storage, file storage, and object storage.
File Storage
You place your files in a tree-like hierarchy that consists of folders and subfolders. Each file has metadata such as file name, file size, and the date the file was created. The file also has a direct path.
File storage is ideal when you require centralized access to files that need to be easily shared and managed by multiple host computers. Typically, this storage is mounted onto multiple hosts and requires file locking and integration with existing file system communication protocols. Common use cases for file storage include:
- Large content repositories
- Development environments
- User home directories
Block Storage
Block storage splits files into fixed-size chunks of data called blocks that have their own addresses. Since each block is addressable, blocks can be retrieved efficiently. When data is requested, these addresses are used by the storage system to organize the blocks in the correct order to form a complete file to present back to the requestor. Outside of the address, there is no additional metadata associated with each block. So, when you want to change a character in a file, you just change the block, or the piece of the file, that contains the character. This ease of access is why block storage solutions are fast and use less bandwidth.
Since block storage is optimized for low-latency operations, it is a typical storage choice for high-performance enterprise workloads, such as databases or enterprise resource planning (ERP) systems.
Object Storage
Objects, much like files, are also treated as a single unit of data when stored. However, unlike file storage, these objects are stored in a flat structure instead of a hierarchy. Each object is a file with a unique identifier. This identifier, along with any additional metadata, is bundled with the data and stored.
With object storage, you can store almost any type of data, and there is no limit to the number of objects stored, making it easy to scale. Object storage is generally useful when storing large data sets, unstructured files like media assets, and static assets, such as photos.
Storage Available with EC2 Instances
EC2 Instance Store
Amazon EC2 Instance Store provides temporary block-level storage for an EC2 instance. This storage is located on disks that are physically attached to the host computer, which ties the lifecycle of your data to the lifecycle of the EC2 instance. If you stop or terminate the instance, the data in the instance store is lost as well.
It’s ideal for temporary storage of information that changes frequently, such as buffers, caches, scratch data, and other temporary content.
Elastic Block Store (EBS)
Amazon EBS is a block-level storage device that you can attach to an Amazon EC2 instance. These storage devices are called Amazon EBS volumes. EBS volumes are essentially drives of a user-configured size attached to an EC2 instance, similar to an external drive to a laptop.
- Most Amazon EBS volumes can be attached to only one EC2 instance at a time.
- You can detach an EBS volume from one EC2 instance and attach it to another EC2 instance in the same Availability Zone to access the data on it (a sketch follows this list).
- If the EC2 instance goes down, you still have your data on the EBS volume.
- EBS volumes have a maximum limit on how much data you can store on them.
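Moving a volume between instances boils down to a detach/attach pair of API calls. Here's a minimal boto3 sketch; the volume and instance IDs and the device name are hypothetical placeholders:

```python
import boto3

ec2 = boto3.client("ec2")

# Hypothetical IDs; both instances must be in the volume's Availability Zone.
VOLUME_ID = "vol-0123456789abcdef0"
OLD_INSTANCE = "i-0aaaaaaaaaaaaaaaa"
NEW_INSTANCE = "i-0bbbbbbbbbbbbbbbb"

# Detach from the first instance and wait until the volume is free.
ec2.detach_volume(VolumeId=VOLUME_ID, InstanceId=OLD_INSTANCE)
ec2.get_waiter("volume_available").wait(VolumeIds=[VOLUME_ID])

# Attach to the second instance under a device name of our choosing.
ec2.attach_volume(VolumeId=VOLUME_ID, InstanceId=NEW_INSTANCE, Device="/dev/sdf")
```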
You can scale Amazon EBS volumes in two ways:
- Increase the volume size, as long as it doesn't exceed the maximum size limit of 16 TB (see the resize sketch after this list).
- Attach multiple volumes to a single Amazon EC2 instance.
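Growing a volume is a single API call. A sketch with a hypothetical volume ID; note that after the resize you still have to extend the file system from inside the instance:

```python
import boto3

ec2 = boto3.client("ec2")

# Grow a (hypothetical) volume to 200 GiB; you can also change
# VolumeType or Iops in the same call.
ec2.modify_volume(VolumeId="vol-0123456789abcdef0", Size=200)

# EBS applies the change while the volume stays in use; the OS-level
# file system still needs to be extended afterward (e.g. growpart +
# resize2fs on Linux).
```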
EBS Use Cases
Amazon EBS is useful when you need to retrieve data quickly and have data persist long-term. Volumes are commonly used in the following scenarios:
- Operating systems: Boot/root volumes to store an operating system. The root device for an instance launched from an Amazon Machine Image (AMI) is typically an Amazon EBS volume. These are commonly referred to as EBS-backed AMIs.
- Databases: A storage layer for databases running on Amazon EC2 that rely on transactional reads and writes.
- Enterprise applications: Amazon EBS provides reliable block storage to run business-critical applications.
- Throughput-intensive applications: Applications that perform long, continuous reads and writes.
EBS Volume Types
There are two main categories of Amazon EBS volumes: solid-state drives (SSDs) and hard-disk drives (HDDs). SSDs provide strong performance for random input/output (I/O), while HDDs provide strong performance for sequential I/O. AWS offers two types of each: General Purpose SSD and Provisioned IOPS SSD on the SSD side, and Throughput Optimized HDD and Cold HDD on the HDD side. Match the volume type to your workload's I/O pattern: random-I/O-heavy workloads like databases favor SSDs, while long sequential workloads favor HDDs.
Benefits of Amazon EBS
- High availability: When you create an EBS volume, it is automatically replicated within its Availability Zone to prevent data loss from single points of failure.
- Data persistence: The storage persists even when your instance doesn’t.
- Data encryption: All EBS volumes support encryption.
- Flexibility: EBS volumes support on-the-fly changes. You can modify volume type, volume size, and input/output operations per second (IOPS) capacity without stopping your instance.
- Backups: Amazon EBS provides you the ability to create backups of any EBS volume.
EBS Snapshots
EBS snapshots are incremental backups that only save the blocks on the volume that have changed after your most recent snapshot.
When you take a snapshot of any of your EBS volumes, these backups are stored redundantly in multiple Availability Zones using Amazon S3. AWS handles storing the backup in Amazon S3, so you won't need to interact with Amazon S3 to work with your EBS snapshots; you simply manage them in the EBS console (which is part of the EC2 console).
EBS snapshots can be used to create multiple new volumes, whether they’re in the same Availability Zone or a different one. When you create a new volume from a snapshot, it’s an exact copy of the original volume at the time the snapshot was taken.
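As a rough sketch of that workflow through the API (volume ID and Availability Zone are hypothetical): snapshot a volume, wait for the snapshot to complete, then restore it as a new volume in a different AZ:

```python
import boto3

ec2 = boto3.client("ec2")

# Snapshot a hypothetical volume; AWS stores the snapshot in S3 for us.
snap = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",
    Description="nightly backup",
)
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# Restore the snapshot as a new volume in a different Availability Zone.
ec2.create_volume(
    SnapshotId=snap["SnapshotId"],
    AvailabilityZone="us-east-1b",
)
```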
Amazon S3
Amazon S3 is a standalone object storage service. It stores data in a flat structure, using unique identifiers to look up objects when requested. An object is simply a file combined with metadata, and you can store as many of these objects as you'd like. Objects are stored in containers called buckets. A bucket has a name, which must be unique across all AWS accounts, and a Region that determines where it resides.
You can organize objects in a bucket with folders. However, there’s no actual file hierarchy on the back end. It is instead a flat structure where all files and folders live at the same level.
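The "folders" are really just prefixes in each object's key. A small boto3 sketch (the bucket name is hypothetical) that creates what the console displays as a folder hierarchy:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "example-employee-photos"  # hypothetical; bucket names are globally unique

# Both objects live at the same level; the "/" in the key only
# makes the console render a folder-like view.
s3.put_object(Bucket=BUCKET, Key="2024/january/report.csv", Body=b"...")
s3.put_object(Bucket=BUCKET, Key="readme.txt", Body=b"top-level object")

# Listing by prefix is how you emulate browsing a folder.
resp = s3.list_objects_v2(Bucket=BUCKET, Prefix="2024/january/")
for obj in resp.get("Contents", []):
    print(obj["Key"])
```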
S3 Use Cases
Amazon S3 is one of the most widely used storage services. The following list summarizes some of the most common ways you can use Amazon S3.
- Backup and storage: S3 is a natural place to back up files because it is highly redundant.
- Media hosting: Because you can store unlimited objects, and each individual object can be up to 5 TB, S3 is an ideal location to host video, photo, or music uploads.
- Software delivery: You can use S3 to host your software applications that customers can download.
- Data lakes: S3 is an optimal foundation for a data lake because of its virtually unlimited scalability.
- Static websites: You can configure your bucket to host a static website of HTML, CSS, and client-side scripts (see the sketch after this list).
- Static content: Because of the limitless scaling, the support for large files, and the fact that you can access any object over the web at any time, S3 is the perfect place to store static content.
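For the static-website case, hosting boils down to one configuration call. A minimal sketch, assuming a hypothetical bucket whose objects are already publicly readable:

```python
import boto3

s3 = boto3.client("s3")

# Turn a (hypothetical) bucket into a static website endpoint.
s3.put_bucket_website(
    Bucket="example-static-site",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)
```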
Access Management for S3
Everything in Amazon S3 is private by default. This means that all S3 resources, such as buckets, folders, and objects, can only be viewed by the user or AWS account that created them.
Typically, you want your objects to be accessible and also want to be more granular about the way you provide access to your resources. To be more specific about who can do what with your S3 resources, Amazon S3 provides two main access management features: IAM policies and S3 bucket policies.
IAM Policies for S3
You should use IAM policies for private buckets when:
- You have many buckets with different permission requirements. Instead of defining many different S3 bucket policies, you can use IAM policies instead.
- You want all policies to be in a centralized location. Using IAM policies allows you to manage all policy information in one location (a sketch follows this list).
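As a sketch of the centralized approach, one identity-based policy can grant read access across several buckets and be attached to a user with boto3; the user name and bucket names here are hypothetical:

```python
import json

import boto3

iam = boto3.client("iam")

# One identity-based policy spanning multiple (hypothetical) buckets.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::reports-bucket", "arn:aws:s3:::reports-bucket/*",
            "arn:aws:s3:::logs-bucket", "arn:aws:s3:::logs-bucket/*",
        ],
    }],
}

# Attach it inline to a hypothetical IAM user.
iam.put_user_policy(
    UserName="analyst",
    PolicyName="s3-read-access",
    PolicyDocument=json.dumps(policy),
)
```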
S3 Bucket Policies
S3 bucket policies are similar to IAM policies, in that they are both defined using the same policy language in a JSON format. The difference is IAM policies are attached to users, groups, and roles, whereas S3 bucket policies are only attached to buckets. S3 bucket policies specify what actions are allowed or denied on the bucket.
For example, if you have a bucket called employeebucket, you can attach an S3 bucket policy to it that allows anonymous viewers to read the objects in that bucket:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicRead",
      "Effect": "Allow",
      "Principal": "*",
      "Action": ["s3:GetObject"],
      "Resource": ["arn:aws:s3:::employeebucket/*"]
    }
  ]
}
```
S3 Bucket policies can only be placed on buckets, and cannot be used for folders or objects. However, the policy that is placed on the bucket applies to every object in that bucket. You should use S3 bucket policies when:
- You need a simple way to do cross-account access to S3, without using IAM roles.
- Your IAM policies bump up against the defined size limit. S3 bucket policies have a larger size limit.
S3 Encryption
Amazon S3 supports encryption in transit (as data travels to and from Amazon S3) and at rest. To protect data at rest, you can use:
- Server-side encryption: This allows Amazon S3 to encrypt your object before saving it on disks in its data centers and then decrypt it when you download the objects.
- Client-side encryption: Encrypt your data client-side and upload the encrypted data to Amazon S3. In this case, you manage the encryption process, the encryption keys, and all related tools.
To encrypt in transit, you can use client-side encryption or Secure Sockets Layer/Transport Layer Security (SSL/TLS).
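As a sketch of the server-side option, you can ask S3 to encrypt an object as it's written; the bucket and key names are hypothetical:

```python
import boto3

s3 = boto3.client("s3")

# Server-side encryption with S3-managed keys (SSE-S3): S3 encrypts the
# object before writing it to disk and decrypts it on download.
s3.put_object(
    Bucket="example-employee-photos",
    Key="confidential/review.pdf",
    Body=b"...",
    ServerSideEncryption="AES256",
)
```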
S3 Versioning
If you enable versioning for a bucket, Amazon S3 automatically generates a unique version ID for the object being stored. Versioning-enabled buckets let you recover objects from accidental deletion or overwrite.
- Deleting an object does not remove the object permanently. Instead, Amazon S3 puts a delete marker on the object. If you want to restore the object, you can remove this marker, which reinstates the object.
- If you overwrite an object, it results in a new object version in the bucket. You still have access to previous versions of the object.
Buckets can be in one of three states.
- Unversioned (the default): No new or existing objects in the bucket have a version.
- Versioning-enabled: This enables versioning for all objects in the bucket.
- Versioning-suspended: This suspends versioning for new objects. All new objects in the bucket will not have a version. However, all existing objects keep their object versions.
The versioning state applies to all of the objects in that bucket. Keep in mind that storage costs are incurred for all objects in your bucket and all versions of those objects. To reduce your S3 bill, you may want to delete previous versions of your objects that are no longer in use.
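Enabling (or suspending) versioning is a single bucket-level call. A minimal boto3 sketch with a hypothetical bucket:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "example-employee-photos"  # hypothetical

# Flip the bucket into the versioning-enabled state
# (use "Suspended" for the versioning-suspended state).
s3.put_bucket_versioning(
    Bucket=BUCKET,
    VersioningConfiguration={"Status": "Enabled"},
)

# Overwrites now create new versions instead of replacing data.
v1 = s3.put_object(Bucket=BUCKET, Key="report.txt", Body=b"draft")
v2 = s3.put_object(Bucket=BUCKET, Key="report.txt", Body=b"final")
print(v1["VersionId"], v2["VersionId"])  # two distinct version IDs
```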
S3 Storage Classes
S3 storage classes let you change your storage tier as your data characteristics change. For example, if you are now accessing old objects infrequently, you may want to change the storage class those files are stored in to save on costs. There are eight S3 storage classes.
- Amazon S3 Standard: This is considered general purpose storage for cloud applications, dynamic websites, content distribution, mobile and gaming applications, and big data analytics.
- Amazon S3 Intelligent-Tiering: This tier is useful if your data has unknown or changing access patterns. S3 Intelligent-Tiering stores objects in two tiers, a frequent access tier and an infrequent access tier. Amazon S3 monitors access patterns of your data, and automatically moves your data to the most cost-effective storage tier based on frequency of access.
- Amazon S3 Standard-Infrequent Access (S3 Standard-IA): S3 Standard-IA is for data that is accessed less frequently, but requires rapid access when needed. S3 Standard-IA offers the high durability, high throughput, and low latency of S3 Standard, with a low per-GB storage price and per-GB retrieval fee. This storage tier is ideal if you want to store long-term backups, disaster recovery files, and so on.
- Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA): Unlike other S3 storage classes which store data in a minimum of three Availability Zones (AZs), S3 One Zone-IA stores data in a single AZ and costs 20% less than S3 Standard-IA. S3 One Zone-IA is ideal for customers who want a lower-cost option for infrequently accessed data but do not require the availability and resilience of S3 Standard or S3 Standard-IA. It’s a good choice for storing secondary backup copies of on-premises data or easily re-creatable data.
- Amazon S3 Glacier Instant Retrieval: Amazon S3 Glacier Instant Retrieval is an archive storage class that delivers the lowest-cost storage for long-lived data that is rarely accessed and requires retrieval in milliseconds.
- Amazon S3 Glacier Flexible Retrieval: S3 Glacier Flexible Retrieval delivers low-cost storage, up to 10% lower cost than S3 Glacier Instant Retrieval, for archive data that is accessed one to two times per year and is retrieved asynchronously.
- Amazon S3 Glacier Deep Archive: S3 Glacier Deep Archive is Amazon S3's lowest-cost storage class and supports long-term retention and digital preservation for data that may be accessed once or twice in a year. It is designed for customers, particularly those in highly regulated industries such as financial services, healthcare, and the public sector, that retain data sets for 7 to 10 years or longer to meet regulatory compliance requirements.
- Amazon S3 Outposts: Amazon S3 on Outposts delivers object storage to your on-premises AWS Outposts environment.
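You pick a storage class per object at write time (S3 Standard is the default). A sketch with hypothetical bucket and key names:

```python
import boto3

s3 = boto3.client("s3")

# Write an object straight into the infrequent-access tier.
s3.put_object(
    Bucket="example-backups",
    Key="2024/db-dump.sql.gz",
    Body=b"...",
    StorageClass="STANDARD_IA",  # or e.g. "INTELLIGENT_TIERING", "GLACIER", "DEEP_ARCHIVE"
)
```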
S3 Object Lifecycle Management
When you define a lifecycle configuration for an object or group of objects, you can automate two kinds of actions: transition actions, which move objects to another storage class after a set time, and expiration actions, which delete objects once they expire (a sketch follows).
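A minimal lifecycle rule combining both actions, sketched with boto3; the bucket name and prefix are hypothetical:

```python
import boto3

s3 = boto3.client("s3")

# One rule: move objects under logs/ to Standard-IA after 30 days,
# then delete them after a year.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-backups",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-then-expire-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
            "Expiration": {"Days": 365},
        }]
    },
)
```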