For AWS users migrating data into Glacier without truly understanding their storage costs and usage, know that recent changes to S3 and Glacier add new complexity to data retrieval. Read on to learn how to avoid these challenges (or dragons) and optimize your cloud storage costs.
[EDIT: 11/23/16] Now that S3 and Glacier pricing changed during late 2016 (effective December 2016), this tale is a bit outdated. What was once a cost-spiking dragon to avoid is now simply a reminder to be mindful of data retrieval nuances. Nevertheless, we must keep an eye on our storage costs and usage: without a sound information governance policy or strategy, storing and retrieving data can add both AWS charges and operational costs.
While AWS Glacier is an inexpensive way to store archives and data that does not need to be accessed frequently, it can lead to unexpected charges when improperly used. Let’s cover some pricing nuances involved with storing and retrieving data from Glacier.
Some data are continuously sent, received, stored, and accessed on a daily basis. Other data eventually become less frequently needed and require long-term, secure storage. This is where AWS Glacier comes into play: it gives users an inexpensive means to satisfy their data retention policies.
Here are the data transfer prices for U.S. West, to give context on how AWS users are charged for moving data to and from Glacier.
| Data transfer | Price |
|---|---|
| **Data IN to AWS Glacier** | |
| All data transfer in | $0.000 per GB |
| **Data OUT from AWS Glacier** | |
| Amazon EC2 in the same Region | $0.000 per GB |
| Another AWS Region | $0.020 per GB |
| **Data OUT from AWS Glacier to Internet** | |
| First 1 GB / month | $0.000 per GB |
| Up to next 10 TB / month | $0.090 per GB |
| Next 40 TB / month | $0.085 per GB |
| Next 100 TB / month | $0.070 per GB |
| Next 350 TB / month | $0.050 per GB |
| Next 524 TB / month | Contact AWS |
| Next 4 PB / month | Contact AWS |
| Greater than 5 PB / month | Contact AWS |
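To make the tiered data-out pricing concrete, here's a minimal sketch of how a monthly Glacier-to-Internet transfer bill accumulates across those tiers. The function name and structure are illustrative, not an AWS tool; tiers above the published 500 TB of per-GB pricing require contacting AWS, so the sketch stops there.

```python
def glacier_internet_egress_cost(gb_out):
    """Estimate the monthly Glacier -> Internet data-out cost (USD) using
    the U.S. West tiers quoted above. Illustrative sketch only; tiers
    beyond ~500 TB/month require contacting AWS for pricing."""
    # (tier size in GB, price per GB), applied in order
    tiers = [
        (1, 0.000),            # first 1 GB free
        (10 * 1024, 0.090),    # up to next 10 TB
        (40 * 1024, 0.085),    # next 40 TB
        (100 * 1024, 0.070),   # next 100 TB
        (350 * 1024, 0.050),   # next 350 TB
    ]
    cost, remaining = 0.0, gb_out
    for size, price in tiers:
        if remaining <= 0:
            break
        billed = min(remaining, size)
        cost += billed * price
        remaining -= billed
    if remaining > 0:
        raise ValueError("More than ~500 TB/month: contact AWS for pricing")
    return round(cost, 2)
```

For example, pulling 1,025 GB in a month bills the first GB free and the remaining 1,024 GB at $0.090, so the marginal per-GB rate matters far more than the free tier at any real scale.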
Users gain cloud storage savings when IT and operations teams migrate data they know they won't need to access, except in specific circumstances, into long-term storage. This is especially true for data stored in Glacier beyond three months. But when large sets of data need to be restored from Glacier back to S3, costs can add up, far outweighing any original savings from the intended long-term storage.
AWS introduced new tiers of data retrieval out of Glacier to accommodate different types of usage. These new methods are far more user-friendly than the previous iteration, which included punitive fees for early retrieval (the “dragon” of the previous version of this article). Get more details on these changes from the AWS blog.
Standard Retrieval is a renaming of the basic Glacier data retrieval function. It’s the default method for all API-driven retrieval requests. Retrieval takes a matter of hours (typically three to five, according to AWS). Users pay $0.01 per GB along with $0.05 for every 1,000 requests.
Use Expedited Retrieval to get data back quickly within minutes. According to AWS, this type of retrieval is optimal for users who plan to store more than 100 TB of data in Glacier and need to make infrequent, yet urgent requests for subsets of data. Expedited Retrievals cost $0.03 per GB and $0.01 per request.
Note: if users plan on storing less than 100 TB of data under this condition, it looks to be more cost-efficient to use S3 Infrequent Access.
Expedited Retrieval generally takes between one and five minutes, depending on how busy the AWS data center is. AWS provides a purchasable option to get data back even quicker.
Users can provision capacity to expedite retrieval even further. Each unit of provisioned capacity costs $100 per month and includes three Expedited Retrievals every five minutes, at up to 150 MB/second of retrieval throughput.
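The throughput figure makes for some quick back-of-the-envelope sizing. Here's a hypothetical helper (not an AWS tool) that estimates the wall-clock time to pull a given amount of data through provisioned capacity, plus the flat monthly fee, assuming sustained peak throughput, which real workloads won't always reach:

```python
def provisioned_retrieval_estimate(gb_to_retrieve, units=1):
    """Back-of-the-envelope sketch: hours to retrieve `gb_to_retrieve` GB
    through `units` units of provisioned capacity (up to 150 MB/s each),
    and the flat monthly provisioning cost. Assumes sustained peak
    throughput, so treat the time as a lower bound."""
    throughput_mb_s = 150 * units          # up to 150 MB/s per unit
    seconds = (gb_to_retrieve * 1024) / throughput_mb_s
    monthly_cost = 100 * units             # $100 per unit per month
    return round(seconds / 3600, 2), monthly_cost
```

At a single unit's 150 MB/s ceiling, retrieving 1 TB takes roughly two hours; whether that $100/month is worth it depends entirely on how often those urgent, large pulls actually happen.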
For users with massive amounts of data and the need to retrieve it quickly, this pay-for-speed feature might be something to keep an eye on to optimize retrieval costs. Using a tool like Cloudability to break down storage costs by data transfer can help shed a lot of light on the cost efficiency of this tactic.
The last method is called Bulk Retrieval and is ideal for planned or non-urgent use cases. Data retrieval typically takes “five to 12 hours at a cost of $0.0025 per GB (75% less than for Standard Retrieval) along with $0.025 for every 1,000 requests” according to AWS.
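Taken together, the per-GB and per-request fees quoted above for the three tiers can be compared with a quick sketch. The dictionary and function below are illustrative, using only the prices cited in this article; an actual bill will also include data transfer and other charges.

```python
# Per-GB and per-request retrieval fees quoted above (late 2016).
RETRIEVAL_PRICING = {
    "standard":  {"per_gb": 0.01,   "per_request": 0.05 / 1000},
    "expedited": {"per_gb": 0.03,   "per_request": 0.01},
    "bulk":      {"per_gb": 0.0025, "per_request": 0.025 / 1000},
}

def retrieval_cost(tier, gb, requests):
    """Estimate the retrieval fee (USD) for pulling `gb` of data across
    `requests` retrieval requests under the given Glacier tier."""
    p = RETRIEVAL_PRICING[tier]
    return round(gb * p["per_gb"] + requests * p["per_request"], 2)
```

Running the same 1,000 GB / 1,000-request job through each tier makes the spread obvious: Expedited costs several times Standard, while Bulk is a quarter of it.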
The big tradeoff for waiting a bit longer to retrieve data is attaining significant savings. Users who need to retrieve massive amounts of archival or compliance data from Glacier, and aren’t on a tight schedule, can save big here. AWS covers more frequently asked questions in their documentation.
Building the means (such as creating reporting widgets or storage-focused cost and usage dashboards) to keep an eye on storage costs and usage can give IT and operations teams a better sense of how their data is accessed and stored. It can’t wholly replace a well-built information governance strategy, but these visualizations and reports can lend some cost-management insight and inform the best means to store and archive data.
The right cloud storage cost management setup can help teams keep granular tabs on AWS storage products. Here’s an example from Cloudability.
But sometimes data restorations are required. There’s no way around certain business challenges or circumstances that require pulling data out of long-term storage. By understanding S3 and Glacier pricing and learning how to avoid a cloud storage cost spike, IT and operations teams can mitigate heavy Glacier data retrieval costs.
Interested in seeing storage usage and costs beyond the usual bill? Feel free to get in touch for a free Cloudability trial.