Amazon S3 Glacier is an online data storage service provided by AWS. Just like AWS’s popular S3 service, Glacier provides users with simple, secure, cloud-based data storage that can quickly be scaled up or down as needed. But unlike S3, which is designed to provide users with quick access to their data, Amazon Glacier is designed for the long-term storage of inactive data that will not need to be quickly retrieved.
A standard retrieval from Glacier commonly takes between three and five hours. This long-term, slow-access approach is known as cold storage, which is why AWS named the service Glacier.
Following are some interesting facts about Amazon Glacier:
- An unlimited amount of data can be stored in Amazon Glacier.
- Amazon Glacier is designed for an average annual durability of 99.999999999%.
- When data is uploaded, Amazon Glacier stores it across multiple Availability Zones before it confirms that the upload succeeded.
- Storage costs about $0.007 per GB per month at the rate quoted here; actual prices vary by region and change over time.
- Data stored in Amazon Glacier is immutable and is encrypted at rest by default.
Benefits of AWS S3 Glacier:
1. Fast Expedited Retrievals (1 to 5 minutes)
Although Glacier is built for cold storage, it offers a fast path when you need one. An Expedited retrieval typically makes data available within 1 to 5 minutes, while a Standard retrieval takes 3 to 5 hours.
2. Scalability & Durability
S3 Glacier is designed for 99.999999999% durability and scales to hold as much data as you need. Uploaded data is automatically distributed across a minimum of three physical Availability Zones that are geographically separated.
3. Low Cost
Glacier lets you store large amounts of data at a very low cost while still offering a fast retrieval option when you need it. Storage costs about $0.007 per GB per month, the same rate quoted above, although actual prices vary by region and change over time. The free tier also lets you retrieve up to 10 GB of data every month at no charge; usage beyond that limit is billed.
4. Highly supported by Partners, Vendors, and other AWS services
Glacier is supported by more than just other AWS services. Third-party software vendors and large enterprises that need backup and recovery, archiving, or disaster recovery also build on Amazon S3 Glacier.
5. Easy Querying
Amazon S3 Glacier lets a user run queries from the management console itself and retrieve only the matching subset of data from a huge archive, rather than restoring the whole archive first.
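To make the "Low Cost" point concrete, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the price constant is the per-GB figure quoted in this article, not a current AWS rate, and the function names are made up for the example.

```python
# Assumed figures taken from the article text; check the AWS pricing page
# for the current rate in your region before relying on them.
PRICE_PER_GB_MONTH = 0.007      # USD per GB per month (assumption)
FREE_RETRIEVAL_GB = 10.0        # free-tier retrieval allowance per month


def monthly_storage_cost(stored_gb: float,
                         price_per_gb: float = PRICE_PER_GB_MONTH) -> float:
    """Return the estimated monthly storage charge in USD."""
    if stored_gb < 0:
        raise ValueError("stored_gb must be non-negative")
    return stored_gb * price_per_gb


def billable_retrieval_gb(retrieved_gb: float,
                          free_gb: float = FREE_RETRIEVAL_GB) -> float:
    """Return how many retrieved GB fall outside the free allowance."""
    return max(0.0, retrieved_gb - free_gb)


if __name__ == "__main__":
    # 5,000 GB (about 5 TB) of archived data at the assumed rate.
    print(f"Storage: ${monthly_storage_cost(5_000):.2f} per month")
    print(f"Billable retrieval: {billable_retrieval_gb(25):.1f} GB")
```

At the assumed rate, archiving roughly 5 TB comes to about 35 USD per month, which is the kind of estimate worth redoing with current regional prices.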
Features in Amazon S3 Glacier:
The key features of Glacier are:
- Data Retrieval features: Amazon Glacier offers three retrieval methods, Expedited, Standard, and Bulk, so you can trade retrieval speed against cost.
- Amazon Glacier Select: AWS lets you run queries directly on archives rather than extracting the entire archive, which reduces access time.
- Vault lock: Glacier lets you create locks on individual vaults by applying policies. For instance, WORM (Write Once Read Many) policies can be used to prevent further edits after uploading.
- Access control: AWS IAM can be used to securely access the management console and also secure the S3 Glacier data.
- Vault inventory: Amazon S3 Glacier always has an inventory of all the archives in every vault. The inventory will contain the name, creation date, and description of the archives.
- AWS Software Development Kits (SDKs): Upload and retrieval operations are performed through the AWS SDKs or the underlying APIs (Application Programming Interfaces). The SDKs are available for multiple programming languages and frameworks, including Java, .NET, Python and PHP, and they wrap the raw API calls so that less code is needed.
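As an illustration of the Vault lock feature listed above, the following sketch builds a WORM-style lock policy that denies deletion of archives younger than a given age. The policy shape follows the pattern used in AWS's Vault Lock documentation; the function name, the `Sid` value, and the example ARN are assumptions made for this example.

```python
import json


def build_worm_policy(vault_arn: str, min_age_days: int) -> str:
    """Return a Vault Lock policy (as JSON text) that denies deleting
    any archive younger than `min_age_days`."""
    if min_age_days <= 0:
        raise ValueError("min_age_days must be positive")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-delete-of-young-archives",  # illustrative name
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                "Condition": {
                    "NumericLessThan": {
                        "glacier:ArchiveAgeInDays": str(min_age_days)
                    }
                },
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # Placeholder ARN; substitute your own region, account and vault.
    arn = "arn:aws:glacier:us-west-2:123456789012:vaults/examplevault"
    policy_json = build_worm_policy(arn, 365)
    print(policy_json)
    # With boto3 installed and credentials configured, the policy could
    # then be attached roughly like this (not executed here):
    #   glacier = boto3.client("glacier")
    #   lock = glacier.initiate_vault_lock(
    #       vaultName="examplevault", policy={"Policy": policy_json})
    #   glacier.complete_vault_lock(
    #       vaultName="examplevault", lockId=lock["lockId"])
```

Building the policy as a dictionary and serializing it keeps the JSON valid by construction. Note that a completed vault lock cannot be changed afterwards, so it is worth testing the policy while the lock is still in its in-progress state.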
Archives and Vaults in AWS S3 Glacier:
Archives and Vaults are at the core of the Glacier data model. Glacier is a REST-based web service, so vaults and archives are exposed as resources that are addressed by URIs. The data model also includes job and notification-configuration resources.
Glacier uses Archives and Vaults to store data.
The actual data is stored in archives. You can store any type of data in an archive, such as images, video, audio and documents. There are two ways to create an archive. You can upload a single file directly as an archive, or you can “TAR” or “ZIP” multiple files and upload the bundle as a single archive.
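The second approach, bundling several files and uploading them as one archive, might look like the sketch below. It assumes a boto3 Glacier client (created with `boto3.client("glacier")`) is passed in as `glacier_client`; the helper names `bundle_files` and `upload_bundle` are hypothetical, and the whole bundle is held in memory, so this suits small uploads only.

```python
import io
import tarfile
from pathlib import Path


def bundle_files(paths, compress: bool = True) -> bytes:
    """Pack the given files into one in-memory TAR (optionally gzipped)."""
    buffer = io.BytesIO()
    mode = "w:gz" if compress else "w"
    with tarfile.open(fileobj=buffer, mode=mode) as tar:
        for path in paths:
            path = Path(path)
            # Store each file under its bare name, without directories.
            tar.add(str(path), arcname=path.name)
    return buffer.getvalue()


def upload_bundle(glacier_client, vault_name: str, payload: bytes,
                  description: str = "") -> str:
    """Upload the bundle as a single archive and return its archive ID.

    `glacier_client` is assumed to be a boto3 Glacier client; boto3 fills
    in the account ID and checksum when they are not supplied.
    """
    response = glacier_client.upload_archive(
        vaultName=vault_name,
        archiveDescription=description,
        body=payload,
    )
    return response["archiveId"]
```

Because an archive cannot be modified after creation, it helps to record the returned archive ID together with a listing of the bundled file names; Glacier itself only stores the description string. For very large bundles, a multipart upload would be the more appropriate route.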
Following are some important pointers about Archives:
- Maximum size of a single archive is 40 terabytes
- Unlimited number of archives can be created
- Virtually unlimited data can be stored in S3 Glacier
- An archive cannot be updated after creation
We have stored the data in the archives, but where will the archives be stored and grouped? That is where Vaults come into play. Vaults are containers where you can store multiple archives.
Following are some important pointers about Vaults:
- Up to 1,000 vaults are allowed per region for a single AWS account.
- You can attach an access policy to each vault to allow or deny access for specific users.
- You can use the AWS SDKs to perform a variety of vault operations:
  - create vault
  - delete vault
  - lock vault
  - list vault metadata
  - tag vaults
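Several of the vault operations listed above can be sketched with the Python SDK. The functions below assume a boto3 Glacier client is passed in as `glacier_client`; the wrapper names are invented for this example, and error handling is left out for brevity.

```python
def create_tagged_vault(glacier_client, vault_name: str, tags: dict) -> dict:
    """Create a vault, tag it, and return a short summary of its metadata."""
    glacier_client.create_vault(vaultName=vault_name)
    if tags:
        glacier_client.add_tags_to_vault(vaultName=vault_name, Tags=tags)
    metadata = glacier_client.describe_vault(vaultName=vault_name)
    return {
        "name": metadata["VaultName"],
        "arn": metadata["VaultARN"],
        "archives": metadata["NumberOfArchives"],
    }


def list_vault_names(glacier_client) -> list:
    """Return the vault names from the first page of results only."""
    response = glacier_client.list_vaults()
    return [vault["VaultName"] for vault in response.get("VaultList", [])]


def remove_vault(glacier_client, vault_name: str) -> None:
    """Delete a vault. Glacier only allows this once the vault is empty."""
    glacier_client.delete_vault(vaultName=vault_name)
```

A typical call would be `create_tagged_vault(boto3.client("glacier"), "examplevault", {"project": "archive"})`. Passing the client in, rather than creating it inside each function, makes the helpers easy to exercise against a stub without touching a real account.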
S3 Glacier Storage Classes:
1. Classic S3 Glacier class: This is the original storage class for S3 Glacier. It was announced in August 2012 and is intended for long-term storage of data that you don’t need to access quickly.
2. S3 Glacier Deep Archive class: This storage class was announced at AWS re:Invent 2018 and is intended for extremely long-term archival of data that is rarely accessed. It fits well in regulated industries, such as healthcare or financial services, where there are compliance requirements around data retention.
Data retrieval and retrieval policies in S3 Glacier:
A. Data Retrieval:
- Expedited: These retrievals typically complete within 1 to 5 minutes. Use this option in urgent circumstances, when you need quick access to a subset of your archives.
- Standard: This retrieval takes 3 to 5 hours to make the data accessible. It is the most commonly used retrieval method.
- Bulk: This is the most cost-effective option for retrieving large portions of your data, at the price of a longer wait than a Standard retrieval.
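A retrieval is an asynchronous job: you request it with a chosen speed tier, then collect the output once the job has completed. The sketch below shows that flow under the assumption that `glacier_client` is a boto3 Glacier client; the function names are hypothetical.

```python
VALID_TIERS = ("Expedited", "Standard", "Bulk")


def start_retrieval(glacier_client, vault_name: str, archive_id: str,
                    tier: str = "Standard") -> str:
    """Start an archive-retrieval job and return its job ID."""
    if tier not in VALID_TIERS:
        raise ValueError(f"tier must be one of {VALID_TIERS}, got {tier!r}")
    response = glacier_client.initiate_job(
        vaultName=vault_name,
        jobParameters={
            "Type": "archive-retrieval",
            "ArchiveId": archive_id,
            "Tier": tier,
        },
    )
    return response["jobId"]


def fetch_if_ready(glacier_client, vault_name: str, job_id: str):
    """Return the archive bytes if the job has finished, otherwise None."""
    status = glacier_client.describe_job(vaultName=vault_name, jobId=job_id)
    if not status["Completed"]:
        return None
    output = glacier_client.get_job_output(vaultName=vault_name, jobId=job_id)
    # The body is a stream; reading it all at once suits small archives only.
    return output["body"].read()
```

In practice you would call `fetch_if_ready` on a schedule that matches the tier (minutes for Expedited, hours for Standard or Bulk), or configure a vault notification instead of polling.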
B. Retrieval policies in Glacier
Amazon S3 Glacier data retrieval policies are user-defined, and you can set one with just a few clicks. You can restrict retrievals to “Free Tier Only”, or set a “Max Retrieval Rate” to cap how much data is retrieved per hour and keep costs under control.
Glacier will not exceed the limits that you specify in the policy. If a retrieval request would go beyond the limit you have set, the policy will not allow it. There are two ways to set up data retrieval policies: the first is the AWS Management Console, and the second is the Amazon Glacier APIs.
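For the API route, a retrieval policy is a small rule document. The following sketch builds one and applies it, assuming `glacier_client` is a boto3 Glacier client; the helper names are invented for this example, while the strategy strings (`FreeTier`, `BytesPerHour`, `None`) are the values the API accepts.

```python
def build_retrieval_policy(strategy: str, bytes_per_hour=None) -> dict:
    """Build a data retrieval policy document.

    "FreeTier" matches the console's "Free Tier Only" option,
    "BytesPerHour" matches "Max Retrieval Rate", and the string "None"
    removes the limit.
    """
    if strategy == "BytesPerHour":
        if not isinstance(bytes_per_hour, int) or bytes_per_hour <= 0:
            raise ValueError("BytesPerHour needs a positive integer limit")
        rule = {"Strategy": "BytesPerHour", "BytesPerHour": bytes_per_hour}
    elif strategy in ("FreeTier", "None"):
        rule = {"Strategy": strategy}
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return {"Rules": [rule]}


def apply_retrieval_policy(glacier_client, policy: dict) -> None:
    """Apply the policy to the current account in the client's region."""
    glacier_client.set_data_retrieval_policy(Policy=policy)


if __name__ == "__main__":
    # Cap retrievals at 1 GiB per hour (an arbitrary illustrative limit).
    print(build_retrieval_policy("BytesPerHour", 1024 ** 3))
```

Validating the rule before sending it keeps obvious mistakes, such as a missing rate for `BytesPerHour`, from reaching the service.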