Notes on Navigating an AWS S3 Glacier Restore


Yesterday marked a first for me: I had to restore a few objects from a large S3 bucket that was backed up to Glacier. Along the way I learned a few things:

  1. Restoring an object from Glacier does not change its GLACIER storage class
  2. If your S3 objects were replicated across an AWS account boundary, you might not have 'full control' of your objects (but AWS will gladly let you pay them to store them)
  3. The AWS CLI is unhelpful when it comes to recursively copying objects that have been restored from Glacier

The objects can be restored and downloaded; it just takes some specific knowledge.

My Assigned Task

I needed to restore about 400 files from a bucket containing more than 100,000 files. When I examined the bucket I found that all of the objects had been sent to Glacier, so they had to be restored before I could download them.
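
If you want to confirm which objects under a prefix are actually in Glacier before restoring anything, `list-objects-v2` reports a `StorageClass` for every key. Here's a minimal Bash sketch, assuming credentials come from `AWS_PROFILE` (or the default profile). The function name `list_glacier_keys` is hypothetical:

```shell
# Bash sketch: print (one per line) the keys under a prefix whose storage class is GLACIER.
# Credentials/profile come from AWS_PROFILE or the default profile (an assumption).
list_glacier_keys() {
  local bucket="$1" prefix="$2"
  # The CLI paginates list-objects-v2 automatically; --query filters with JMESPath
  # and --output text prints the matching keys tab-separated.
  aws s3api list-objects-v2 \
    --bucket "$bucket" \
    --prefix "$prefix" \
    --query "Contents[?StorageClass=='GLACIER'].Key" \
    --output text |
    tr '\t' '\n' |
    grep -v -x -e '' -e 'None'   # drop blank lines and the "None" printed for no matches
}
```

For example, `AWS_PROFILE=aws-profile-here-if-you-need-it list_glacier_keys my-bucket parentkey/childkey/ > keys.txt` writes a one-key-per-line list that a restore loop can read.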

Unfreezing S3 Objects from Glacier

This was easy enough: I had a list of the objects I wanted to recover, so I wrote a script that looped over the list and ran this s3api command for each key (it unfreezes the object for 7 days using the 'Standard' retrieval tier):

aws s3api restore-object --profile aws-profile-here-if-you-need-it --bucket my-bucket --key "$key" --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Standard"}}'
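
The loop can be sketched in Bash like this. It assumes a hypothetical `keys.txt` with one key per line and credentials supplied through `AWS_PROFILE`. `restore_keys` is an illustrative name, not part of the CLI:

```shell
# Bash sketch: request a Glacier restore for every key in a file (one key per line).
# keys.txt, the bucket name and the defaults (7 days, Standard tier) are examples.
restore_keys() {
  local bucket="$1" keys_file="$2" days="${3:-7}" tier="${4:-Standard}"
  local request key failed=0
  request="{\"Days\":${days},\"GlacierJobParameters\":{\"Tier\":\"${tier}\"}}"
  # The "-n" check also processes a last line that has no trailing newline.
  while IFS= read -r key || [ -n "$key" ]; do
    [ -z "$key" ] && continue   # skip blank lines
    # </dev/null keeps aws from consuming the rest of the key list on stdin.
    if ! aws s3api restore-object \
        --bucket "$bucket" \
        --key "$key" \
        --restore-request "$request" </dev/null; then
      echo "restore request failed: $key" >&2
      failed=$((failed + 1))
    fi
  done < "$keys_file"
  # Succeed only if every request was accepted.
  [ "$failed" -eq 0 ]
}
```

Usage: `AWS_PROFILE=aws-profile-here-if-you-need-it restore_keys my-bucket keys.txt 7 Standard`. A key whose restore is already under way comes back with a `RestoreAlreadyInProgress` error, which this sketch simply reports as a failure.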

As this was my first time using Glacier, I did not realize that a restore would not switch the object's storage class back to S3 Standard. A restore only makes a temporary copy available for the number of days requested; the object itself stays in the GLACIER storage class. To get the data out of Glacier for good you have to copy the restored object somewhere else (which is what I was going to do anyway) or copy it over itself with a new storage class. Still, it's a little unsettling that as long as an object stays in Glacier, getting it back again means paying another retrieval fee.
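
If you would rather have an object back in S3 Standard permanently, one option is to wait until its restore has finished and then copy the object over itself with a new storage class. Here's a minimal sketch that assumes the restore is complete and the object is no larger than 5 GB, the limit for a single `copy-object` call. `make_standard` is a hypothetical helper name:

```shell
# Bash sketch: once a restore has finished, overwrite the object with a copy of itself
# in the STANDARD storage class. Assumes the object is at most 5 GB, the limit for a
# single copy-object call; larger objects need a multipart copy instead.
make_standard() {
  local bucket="$1" key="$2"
  aws s3api copy-object \
    --bucket "$bucket" \
    --key "$key" \
    --copy-source "${bucket}/${key}" \
    --storage-class STANDARD \
    --metadata-directive COPY   # keep the object's existing metadata
}
```

In a versioned bucket the copy becomes a new current version, and the GLACIER version stays behind as a noncurrent version. It keeps costing storage until it is deleted or expired.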

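You can check the status of a Glacier restore with the command below. While the restore is running, the Restore field in the output shows ongoing-request="true"; once it reads ongoing-request="false", the object is ready to download.

aws s3api head-object --profile aws-profile-here-if-you-need-it --bucket my-bucket --key "$key"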
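
To check a whole batch rather than reading the JSON by eye, you can pull the `Restore` field out with `--query`. Here's a sketch under the same assumptions (credentials via `AWS_PROFILE`). `restore_status` is a hypothetical name:

```shell
# Bash sketch: classify the restore state of one object from head-object's Restore field.
restore_status() {
  local bucket="$1" key="$2" restore
  # Restore is absent (printed as "None") until a restore is requested; it reads
  # ongoing-request="true" while running, then ongoing-request="false", expiry-date="..."
  restore="$(aws s3api head-object --bucket "$bucket" --key "$key" \
    --query Restore --output text)" || return 2
  case "$restore" in
    *'ongoing-request="true"'*)  echo "in-progress" ;;
    *'ongoing-request="false"'*) echo "restored" ;;
    *)                           echo "not-requested" ;;
  esac
}
```

To report on every key in a hypothetical `keys.txt`: `while IFS= read -r key; do echo "$(restore_status my-bucket "$key" </dev/null) $key"; done < keys.txt`.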

Dealing With Cross-Account Object Ownership

At this point I noticed that the S3 web console would not show me information about the objects that were restored from Glacier (I just kept seeing 'Access Denied'), so I contacted AWS Support. They identified the source of the access-denied messages: the objects in the bucket had all been replicated from another AWS account, and during replication the new account was not granted FULL_CONTROL over those objects. The solution is to run a command like this against every object in the bucket:

aws s3api put-object-acl --bucket my-bucket --grant-full-control emailaddress=rootAccountOldAwsAccount@emaildomain.tld,emailaddress=rootAccountNewAwsAccount@emaildomain.tld --profile aws-profile-FROMTHEOLDACCOUNT-here-if-you-need-it --key "$key"

This sets up the S3 ACLs so that the new account can work with the objects. It only works if you still have access to the OLD account: the command must run in that account's context using an IAM role with the right permissions, and the appropriate cross-account access has to be configured.
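
Running that across an entire bucket is another loop. Here's a minimal sketch. It assumes the loop runs with the OLD account's credentials (for example `AWS_PROFILE=old-account`) and that the grantees are passed as one comma-separated string such as `emailaddress=old@example.com,emailaddress=new@example.com`. `grant_full_control_all` is a hypothetical name. Keep in mind that `put-object-acl` replaces the object's entire ACL, so any grant you don't list (such as a public READ grant) is dropped.

```shell
# Bash sketch: grant FULL_CONTROL on every object in a bucket.
# Run with the OLD account's credentials, e.g. AWS_PROFILE=old-account (assumption).
# $2 is one comma-separated grantee list, e.g.
#   "emailaddress=old@example.com,emailaddress=new@example.com"
# put-object-acl REPLACES the whole ACL: grants not listed here are removed.
grant_full_control_all() {
  local bucket="$1" grantees="$2" key
  aws s3api list-objects-v2 --bucket "$bucket" \
    --query 'Contents[].Key' --output text |
    tr '\t' '\n' |
    while IFS= read -r key; do
      case "$key" in ''|None) continue ;; esac   # skip blanks and the empty-result marker
      aws s3api put-object-acl \
        --bucket "$bucket" \
        --key "$key" \
        --grant-full-control "$grantees" </dev/null ||
        echo "put-object-acl failed: $key" >&2
    done
}
```

Usage: `AWS_PROFILE=old-account grant_full_control_all my-bucket 'emailaddress=old@example.com,emailaddress=new@example.com'`.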

AWS Support made one thing clear to me: if the source account is no longer accessible to you with the right permissions, you will not be able to download, view, or do anything else with the S3 objects aside from deleting them. So watch out!

You can check an object's ACL like this:

aws s3api get-object-acl --profile aws-profile-here-if-you-need-it --bucket my-bucket --key "$key"

Look for FULL_CONTROL in the "Grants" section:

{
    "Owner": {
        "DisplayName": "my-username",
        "ID": "7009a8971cd538e11f6b6606438875e7c86c5b672f46db45460ddcd087d36c32"
    },
    "Grants": [
        {
            "Grantee": {
                "DisplayName": "my-username",
                "ID": "7009a8971cd538e11f6b6606438875e7c86c5b672f46db45460ddcd087d36c32"
            },
            "Permission": "FULL_CONTROL"
        },
        {
            "Grantee": {
                "URI": "http://acs.amazonaws.com/groups/global/AllUsers"
            },
            "Permission": "READ"
        }
    ]
}
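
You can script the same check with a JMESPath filter on `Grants`. Here's a sketch that assumes you know the new account's canonical user ID. `has_full_control` is a hypothetical name:

```shell
# Bash sketch: succeed only if the given canonical user ID holds FULL_CONTROL on an object.
has_full_control() {
  local bucket="$1" key="$2" canonical_id="$3"
  # JMESPath filter: IDs of grantees whose permission is FULL_CONTROL
  # (group grantees such as AllUsers have no ID and drop out of the projection).
  aws s3api get-object-acl --bucket "$bucket" --key "$key" \
    --query "Grants[?Permission=='FULL_CONTROL'].Grantee.ID" --output text |
    tr '\t' '\n' |
    grep -q -x -F -- "$canonical_id"
}
```

For example: `has_full_control my-bucket "$key" "$NEW_ACCOUNT_ID" || echo "missing FULL_CONTROL: $key"`. To get the canonical ID of the account you are signed in to, run `aws s3api list-buckets --query Owner.ID --output text`.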

How to Prevent the Permissions Issue

AWS Support recommended a couple of options:

Downloading the Unfrozen Objects

Now that the objects were unfrozen and the ACLs were in place, I should have been able to download them. This is where I ran into the AWS CLI's messages not being specific enough. I tried to recursively download the prefix (directory) holding the 400-ish objects I needed with this command:

aws s3 cp s3://my-bucket/parentkey/childkey/ . --profile aws-profile-here-if-you-need-it --recursive

But all I saw were messages like these:

warning: Skipping file s3://my-bucket/parentkey/childkey/index.html. Object is of storage class GLACIER. Unable to perform download operations on GLACIER objects. You must restore the object to be able to perform the operation. See aws s3 download help for additional parameter options to ignore or force these transfers.
warning: Skipping file s3://my-bucket/parentkey/childkey/app.js. Object is of storage class GLACIER. Unable to perform download operations on GLACIER objects. You must restore the object to be able to perform the operation. See aws s3 download help for additional parameter options to ignore or force these transfers.

It turned out I needed to add another flag to the AWS CLI command to get it to work: --force-glacier-transfer. The complete command looks like this (I wish the warning message had listed the flag explicitly; instead I had to hunt for a solution):

aws s3 cp s3://my-bucket/parentkey/childkey/ . --profile aws-profile-here-if-you-need-it --recursive --force-glacier-transfer

And that's how I got the objects downloaded.
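
Because `aws s3 cp` only prints a warning and moves on when it skips an object, it may be worth confirming afterwards that every expected file actually arrived. Here's a minimal sketch that makes no AWS calls. It assumes a hypothetical `keys.txt` list of full keys and the same prefix that was passed to `aws s3 cp`. `missing_downloads` is an illustrative name:

```shell
# Bash sketch: after a recursive `aws s3 cp`, list expected keys that have no local file.
# $1: file of full S3 keys, one per line; $2: the prefix that was copied (it is
# stripped from each key, as `aws s3 cp --recursive` does); $3: download directory.
missing_downloads() {
  local keys_file="$1" prefix="$2" dest="${3:-.}" key rel
  while IFS= read -r key || [ -n "$key" ]; do
    [ -z "$key" ] && continue
    rel="${key#"$prefix"}"             # quoted so the prefix is matched literally
    [ -f "$dest/$rel" ] || printf '%s\n' "$key"
  done < "$keys_file"
}
```

Running `missing_downloads keys.txt parentkey/childkey/ .` prints nothing when every listed object was downloaded.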