Amazon Simple Storage Service, widely known as Amazon S3 or AWS S3, provides object storage built to store and retrieve any amount of data from anywhere over the Internet, through a web services interface. It lets corporations store, analyze, and retrieve data from platforms such as websites, mobile apps, and custom company applications. While designed to make web-scale computing easier for developers, it offers industry-leading durability, availability, performance, security, and virtually unlimited scalability at very low cost: S3 is designed for 99.999999999 percent (eleven nines) durability and 99.99 percent availability of objects, and a single object can be up to 5 terabytes in size.
S3 stores objects in resources known as buckets. A single bucket can hold numerous objects, and you can write, read, or delete objects in a bucket at any time.
Unlike the file systems of the operating systems we are all used to, Amazon S3 does not store files in a hierarchy; it stores them as objects. Object storage lets users upload files, videos, and documents much as they would to popular cloud storage products like Dropbox or Google Drive, which makes Amazon S3 very flexible and platform agnostic. S3 is one of the main building blocks of AWS, advertised as "infinitely scaling" storage; it is widely popular, many websites use it as a backbone, and it deserves its own section.
Benefits of using AWS S3:
Here are some of the benefits of using Amazon S3.
- Simple to Use:
S3 is designed and built to be simple to use. It offers a web management console, mobile apps, and APIs that let firms quickly integrate the platform with the other technologies they use. It is also a cost-effective platform for storing data and transferring it in or out to third-party networks.
- Enhances Security:
S3 supports data transfer over SSL/TLS and automatic encryption of data once uploading completes. Importantly, it lets you configure policies that manage the permissions granted on an object, and you can control access to valuable information by using AWS Identity and Access Management (IAM).
- Reliable and Durable Data Storage:
Amazon S3 offers robust infrastructure for storing crucial information. It is designed to provide 99.999999999 percent (eleven nines) durability for objects. In addition to durability, S3 is reliable because it stores data redundantly across different devices and facilities to improve data accessibility.
S3 is available in regions across the globe and provides geographical redundancy within each region. You can also replicate buckets across regions.
- Integration:
S3 is easy to integrate with other services available on AWS. You can connect S3 directly with security services such as KMS and IAM, alerting services such as Event Notifications, analytics services such as Redshift and EMR, and compute platforms such as Lambda.
- Scalability:
Firms around the world use S3 to store and secure millions of objects. Costs grow or shrink with demand, and S3 can be set up in just a few minutes. Industries such as financial services, entertainment, and healthcare use S3 for big data, transcoding, and archiving workloads.
Amazon S3 Buckets:
- Amazon S3 allows people to store objects (files) in “buckets” (directories)
- Buckets must have a globally unique name
- Buckets are defined at the region level
- Naming convention:
- No uppercase letters
- No underscores
- 3-63 characters long
- Must not be formatted as an IP address
- Must start with a lowercase letter or number
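As a sketch of how this looks in practice, here is a minimal example of creating a bucket with the boto3 SDK for Python; the region and bucket name are placeholders, and because bucket names are globally unique, this exact name may already be taken:

```python
import boto3

# Region and bucket name are hypothetical placeholders.
s3 = boto3.client("s3", region_name="eu-west-1")

s3.create_bucket(
    Bucket="my-example-bucket-12345",
    # Required for every region except us-east-1, where it must be omitted.
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
)
```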
AWS S3 Objects:
An object consists of data, a key (its assigned name), and metadata. A bucket is used to store objects. When an object is added to a versioning-enabled bucket, Amazon S3 generates a unique version ID and assigns it to the object.
- Objects (files) have a Key
- The key is the FULL path:
s3://my-bucket/my_file.txt
s3://my-bucket/my_folder1/another_folder/my_file.txt
- The key is composed of prefix + object name
- s3://my-bucket/my_folder1/another_folder/my_file.txt
- There’s no concept of “directories” within buckets (although the UI will trick you into thinking otherwise)
- Just keys with very long names that contain slashes (“/”)
Object values are the content of the body. The max object size is 5 TB (5,000 GB); if uploading more than 5 GB, you must use “multi-part upload”.
Objects can have:
- Metadata (list of text key / value pairs – system or user metadata)
- Tags (Unicode key / value pair – up to 10) – useful for security / lifecycle
- Version ID (if versioning is enabled)
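To make the key and prefix idea concrete, here is a minimal boto3 sketch (bucket and key names are placeholders) that writes and then reads an object under a “folder-like” key:

```python
import boto3

s3 = boto3.client("s3")

# The key is the full path: prefix ("my_folder1/another_folder/") + object name.
key = "my_folder1/another_folder/my_file.txt"

s3.put_object(Bucket="my-bucket", Key=key, Body=b"hello world")

obj = s3.get_object(Bucket="my-bucket", Key=key)
print(obj["Body"].read())  # b'hello world'
```

For large files, the higher-level helper `s3.upload_file("local.bin", "my-bucket", key)` transparently switches to multipart upload when needed.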
Amazon S3 – Versioning
Amazon S3 is a great way to host files. It is similar to Google Drive, Apple iCloud, and Microsoft OneDrive, but for developers. Files are uploaded into buckets under specific keys and can then be downloaded from around the world. There is a little more to it, but that is the gist.
Each new upload brings a risk. A new file or a new version of an existing file could be incompatible with its consumers. Depending on the coupling, this could cause outages.
Versioning is a great way to mitigate this.
Amazon S3 has a built-in versioning solution. It can be enabled in the bucket’s properties tab.
- You can version your files in Amazon S3
- It is enabled at the bucket level
- Same key overwrite will increment the “version”: 1, 2, 3….
- It is best practice to version your buckets
- Protect against unintended deletes (ability to restore a version)
- Easy roll back to previous version
- Any file that is not versioned prior to enabling versioning will have version “null”
- Suspending versioning does not delete the previous versions.
Once enabled, objects are never overwritten. Uploading multiple files to the same Bucket and Key will create new versions. Amazon S3 will return the latest one if none is explicitly requested.
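As a rough sketch (the bucket name is a placeholder), versioning can also be enabled programmatically with boto3, and the stored versions listed afterwards:

```python
import boto3

s3 = boto3.client("s3")

# Enable versioning at the bucket level.
s3.put_bucket_versioning(
    Bucket="my-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)

# Every overwrite of the same key now creates a new version.
paginator = s3.get_paginator("list_object_versions")
for page in paginator.paginate(Bucket="my-bucket"):
    for version in page.get("Versions", []):
        print(version["Key"], version["VersionId"], version["IsLatest"])
```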
S3 Bucket Policies:
JSON-based policies contain:
- Resources: buckets and objects
- Actions: the set of API calls to Allow or Deny
- Effect: Allow / Deny
- Principal: the account or user the policy applies to
Use S3 bucket policies to:
- Grant public access to the bucket
- Force objects to be encrypted at upload
- Grant access to another account (cross-account)
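A minimal sketch of such a policy, applied with boto3 (the bucket name is a placeholder); this one grants public read access to every object in the bucket and shows all four policy elements:

```python
import json
import boto3

s3 = boto3.client("s3")

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",                       # Effect: Allow / Deny
        "Principal": "*",                        # Principal: who the policy applies to
        "Action": "s3:GetObject",                # Action: the API calls allowed or denied
        "Resource": "arn:aws:s3:::my-bucket/*",  # Resource: the bucket's objects
    }],
}

s3.put_bucket_policy(Bucket="my-bucket", Policy=json.dumps(policy))
```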
Bucket settings for Block Public Access
1) Block public access to buckets and objects granted through:
- new access control lists (ACLs)
- any access control lists (ACLs)
- new public bucket or access point policies
2) Block public and cross-account access to buckets and objects through any public bucket or access point policies
These settings were created to prevent company data leaks.
If you know your bucket should never be public, leave these on.
Can be set at the account level
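A sketch of turning all four settings on for one bucket with boto3 (the bucket name is a placeholder):

```python
import boto3

s3 = boto3.client("s3")

s3.put_public_access_block(
    Bucket="my-bucket",
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,        # block new public ACLs
        "IgnorePublicAcls": True,       # ignore any existing public ACLs
        "BlockPublicPolicy": True,      # block new public bucket/access point policies
        "RestrictPublicBuckets": True,  # block public and cross-account access via policies
    },
)
```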
S3 Websites:
S3 can host static websites and have them accessible on the www
The website URL will be:
<bucket-name>.s3-website-<AWS-region>.amazonaws.com
OR
<bucket-name>.s3-website.<AWS-region>.amazonaws.com
If you get a 403 (Forbidden) error, make sure the bucket policy allows public reads!
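A minimal sketch of enabling website hosting with boto3 (the bucket name and document names are placeholders); note that the bucket policy must still allow public reads:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_website(
    Bucket="my-bucket",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},  # served for directory-style requests
        "ErrorDocument": {"Key": "error.html"},     # served on 4xx errors
    },
)
```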
S3 CORS:
CORS means Cross-Origin Resource Sharing
A web-browser-based mechanism that allows requests to other origins while visiting the main origin
Same origin: http://example.com/app1 & http://example.com/app2
Different origins: http://www.example.com & http://other.example.com
The requests won’t be fulfilled unless the other origin allows for the requests, using CORS Headers (ex: Access-Control-Allow-Origin)
If a client does a cross-origin request on our S3 bucket, we need to enable the correct CORS headers
You can allow for a specific origin or for * (all origins).
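A sketch of a CORS configuration applied with boto3 (the bucket name and origin are placeholders), allowing GET requests from one specific origin:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_cors(
    Bucket="my-bucket",
    CORSConfiguration={
        "CORSRules": [{
            "AllowedOrigins": ["http://www.example.com"],  # or ["*"] for all origins
            "AllowedMethods": ["GET"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,  # how long browsers may cache the preflight response
        }],
    },
)
```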
Amazon S3 – Consistency Model:
- Read-after-write consistency for PUTs of new objects
- As soon as a new object is written, we can retrieve it
ex: (PUT 200 => GET 200)
- This is true, except if we did a GET before to see if the object existed
ex: (GET 404 => PUT 200 => GET 404) – eventually consistent
- Eventual consistency for DELETEs and PUTs of existing objects
- If we read an object after updating, we might get the older version
ex: (PUT 200 => PUT 200 => GET 200 (might be older version))
- If we delete an object, we might still be able to retrieve it for a short time
ex: (DELETE 200 => GET 200)
S3 Pre-Signed URLs:
Can generate pre-signed URLs using SDK or CLI
- For downloads (easy, can use the CLI)
- For uploads (harder, must use the SDK)
Valid for a default of 3600 seconds; you can change the timeout with the --expires-in [TIME_BY_SECONDS] argument
Users given a pre-signed URL inherit the permissions of the person who generated the URL for GET / PUT
Examples:
- Allow only logged-in users to download a premium video from your S3 bucket
- Allow an ever-changing list of users to download files by generating URLs dynamically
- Temporarily allow a user to upload a file to a precise location in your bucket
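A sketch with boto3 (bucket and keys are placeholders) generating one download URL and one upload URL:

```python
import boto3

s3 = boto3.client("s3")

# Download URL, valid for one hour (3600 seconds is also the default).
download_url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "premium/video.mp4"},
    ExpiresIn=3600,
)

# Upload URL: the holder may PUT a file to exactly this key for 5 minutes.
upload_url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": "my-bucket", "Key": "uploads/user-123/report.txt"},
    ExpiresIn=300,
)
```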
AWS S3 MFA-Delete:
MFA (multi-factor authentication) forces users to generate a code on a device (usually a mobile phone or hardware token) before doing important operations on S3.
To use MFA-Delete, enable Versioning on the S3 bucket
You will need MFA to:
- permanently delete an object version
- suspend versioning on the bucket.
You won’t need MFA for:
- enabling versioning
- listing deleted versions
Only the bucket owner (root account) can enable/disable MFA-Delete
MFA-Delete cannot currently be enabled from the console; you must use the CLI (it is exposed through the underlying PutBucketVersioning API)
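Since MFA-Delete rides on the PutBucketVersioning API, a rough SDK sketch looks like the following; the bucket name, MFA device ARN, and code are placeholders, and the call must be made with the root account's credentials:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_versioning(
    Bucket="my-bucket",
    # Serial number (or ARN) of the MFA device, a space, then the current code.
    MFA="arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456",
    VersioningConfiguration={"Status": "Enabled", "MFADelete": "Enabled"},
)
```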
S3 Access Logs: Warning
- Do not set your logging bucket to be the monitored bucket
- It will create a logging loop, and your bucket will grow in size exponentially
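A sketch of a safe logging setup with boto3 (bucket names are placeholders), where access logs for one bucket are delivered to a different bucket:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_logging(
    Bucket="my-app-bucket",  # the monitored bucket
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "my-logging-bucket",   # must NOT be the monitored bucket
            "TargetPrefix": "logs/my-app-bucket/",
        },
    },
)
```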
AWS S3 Replication (CRR & SRR):
Must enable versioning in source and destination
1) Cross Region Replication (CRR)
2) Same Region Replication (SRR)
- Buckets can be in different accounts.
- Copying is asynchronous
- Must give proper IAM permissions to S3
CRR – Use cases: compliance, lower latency access, replication across accounts
SRR – Use cases: log aggregation, live replication between production and test accounts
After activating, only new objects are replicated (not retroactive).
For DELETE operations:
- If you delete without a version ID, it adds a delete marker, not replicated
- If you delete with a version ID, it deletes in the source, not replicated
There is no “chaining” of replication:
- If bucket 1 has replication into bucket 2, which has replication into bucket 3
- Then objects created in bucket 1 are not replicated to bucket 3
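A sketch of a replication rule applied with boto3 (bucket names and the IAM role ARN are placeholders; versioning is assumed to already be enabled on both buckets):

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="source-bucket",
    ReplicationConfiguration={
        # IAM role that grants S3 permission to replicate on your behalf.
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
        "Rules": [{
            "ID": "replicate-all-new-objects",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {"Prefix": ""},  # empty prefix = replicate all new objects
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": "arn:aws:s3:::destination-bucket"},
        }],
    },
)
```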