Rclone backups to AWS

Set up Rclone 1.62.2 for AWS backups from HPC

Set up the IAM user permissions in AWS

Perform the following before configuring rclone.


Log into the AWS console


Navigate to IAM


Click Add User

name: rclone-backup-from-hpc


Leave "Provide user access to the AWS Management Console" unchecked. This account is only used for command-line access via rclone.


Click Next > Select "Attach policies directly"


Search for S3 and select "AmazonS3FullAccess". This policy grants full access to every bucket in the AWS account. If the account already holds other buckets, it is up to you to attach a more granular S3 policy that limits this new IAM user to the backup bucket.
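As an example of a more granular alternative to AmazonS3FullAccess, the sketch below writes an IAM policy that limits the user to a single bucket. It is adapted from the minimal policy example in the rclone S3 documentation, with s3:CreateBucket added so that rclone mkdir can create the bucket and s3:AbortMultipartUpload so rclone can abort failed multipart uploads. The output file name is hypothetical; review the action list against your own requirements, then create a customer managed policy from the generated JSON in the IAM console and attach it in place of AmazonS3FullAccess.

```shell
#!/usr/bin/env bash
# Sketch: write a bucket-scoped IAM policy for the rclone-backup-from-hpc user.
# BUCKET defaults to the bucket created later in this guide; OUT is a hypothetical file name.
BUCKET=${BUCKET:-resnick-hpc-backups}
OUT=${OUT:-rclone-backup-policy.json}

# Unquoted EOF so that ${BUCKET} is expanded inside the JSON
cat > "$OUT" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BucketLevel",
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:CreateBucket"],
      "Resource": "arn:aws:s3:::${BUCKET}"
    },
    {
      "Sid": "ObjectLevel",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:PutObjectAcl", "s3:DeleteObject", "s3:AbortMultipartUpload"],
      "Resource": "arn:aws:s3:::${BUCKET}/*"
    },
    {
      "Sid": "ListBuckets",
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "arn:aws:s3:::*"
    }
  ]
}
EOF
echo "Wrote $OUT"
```

Splitting bucket-level and object-level actions matters: ListBucket and CreateBucket apply to the bucket ARN itself, while the object actions only match the bucket/* ARN.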


Click Next > Create User


Once the user is created, click the user name in the IAM console.


Navigate to Security Credentials and locate the Access Keys section.


Click "Create access key" and select "Applications running outside of AWS"


Click Next


For Description enter "rclone-backups-resnick-hpc" > Create Access Key


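Leave this window open, with the access keys visible, while you configure rclone on the cluster. The secret access key is only shown at creation time; if you lose it, you will need to create a new access key.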


Configure Rclone


module load rclone/1.62.2


rclone config


Enter name for new remote.

name> my-aws-account



Select the S3-compatible object storage backend. (The same backend can also be used for other S3-compatible providers such as Backblaze B2 and Wasabi.)


Storage > 5


Now select the provider of this S3-compatible storage (1 is Amazon Web Services (AWS) S3).


Provider > 1


Option env_auth. Enter 1 (false) to type your AWS credentials in the next two prompts, or 2 (true) to have rclone read them from your shell environment instead.


env_auth > 1 
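If you choose option 2 (env_auth = true) instead, leave access_key_id and secret_access_key blank at the next prompts; rclone's S3 backend then reads the standard AWS environment variables. A minimal sketch, for example appended to ~/.bashrc; the values are placeholders. Note that cron jobs do not read ~/.bashrc, so for the scheduled backups later in this guide, entering the keys directly (option 1) is simpler.

```shell
# Sketch for env_auth = true: rclone reads these standard AWS variables.
# Placeholder values; substitute the access key created in the IAM console.
export AWS_ACCESS_KEY_ID="AKIAXXXXXXXXXXXXXXXX"
export AWS_SECRET_ACCESS_KEY="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
```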


Enter the access key ID shown in the AWS console when you created the access key.

access_key_id > AKIAX4EICHL66XXXXXXX


Now enter the secret access key shown in the AWS console.

secret_access-key > aa8jXXXXXXXXXXXXXXXXXXXXXXXXX


Select a region for the storage bucket. We recommend us-west-2 (Oregon) (option 4): it is seismically distinct from California and provides fast network access via CENIC/Internet2.


region > 4


Option endpoint (leave blank to use the default; just press Enter)


endpoint > (enter)


Option location_constraint (used only when rclone creates buckets). Select 4 (us-west-2, Oregon) so that it matches the region chosen above.


location_constraint > 4


Option acl (press Enter to leave blank)


acl > (enter)


Option server_side_encryption. It is left blank (disabled) below; set it according to your requirements.


server_side_encryption > (enter)


Option sse_kms_key_id


sse_kms_key_id > (enter)


Option storage_class


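Option 4 is STANDARD_IA (S3 Standard-Infrequent Access), which suits backups you rarely read but may need to restore quickly. Data stays immediately accessible and storage costs less than STANDARD, but AWS charges a per-GB retrieval fee, a 30-day minimum storage duration and a 128 KB minimum billable object size. Choose the class based on your requirements and a full understanding of these trade-offs.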


storage_class > 4


Edit advanced config? No

y/n > n


-------------------------------

Configuration complete.

Options:

- type: s3

- provider: AWS

- access_key_id: AKIAX4EICHL66XXXXXXX

- secret_access_key: aa8jXXXXXXXXXXXXXXXXXXXXXXXXX

- region: us-west-2

- location_constraint: us-west-2

- storage_class: STANDARD_IA

Keep this "my-aws-account" remote?

y) Yes this is OK (default)

e) Edit this remote

d) Delete this remote

y/e/d>

-------------------------------


Enter y to keep the remote and save the config.


y/e/d> y


Quit the config by entering q


e/n/d/r/c/s/q> q
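For reference, the remote saved by this session corresponds to a plain INI section in rclone.conf. The sketch below writes an equivalent section by hand to a separate, hypothetical file named rclone-example.conf, using the placeholder keys from the configuration summary above. This can be handy for checking what was saved or for recreating the remote elsewhere; rclone can use such a file through its global --config flag, e.g. rclone --config rclone-example.conf lsd my-aws-account:

```shell
#!/usr/bin/env bash
# Sketch: the my-aws-account remote written by hand as an INI section.
# CONF is a hypothetical file name; the keys are placeholders.
CONF=${CONF:-rclone-example.conf}

# Quoted 'EOF' so nothing inside the section is expanded by the shell
cat > "$CONF" <<'EOF'
[my-aws-account]
type = s3
provider = AWS
access_key_id = AKIAX4EICHL66XXXXXXX
secret_access_key = aa8jXXXXXXXXXXXXXXXXXXXXXXXXX
region = us-west-2
location_constraint = us-west-2
storage_class = STANDARD_IA
EOF

# The file holds the secret key in plain text, so make it readable by its owner only
chmod 600 "$CONF"
```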


Now create the backup bucket in the AWS account. This example creates the resnick-hpc-backups bucket; bucket names must be globally unique across all AWS accounts, so choose your own name.


rclone mkdir my-aws-account:/resnick-hpc-backups/



Notes


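* Rclone stores its configuration, including the AWS keys, in config/rclone/rclone.conf; run rclone config file to print the full path. The keys are stored unencrypted unless you set a configuration password (option s in the rclone config menu).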


* If you use one of the colder storage classes such as GLACIER, DEEP_ARCHIVE or GLACIER_IR, familiarize yourself with their limitations, minimum storage durations and retrieval fees.


* When specifying a remote, always include the trailing colon, e.g.

my-aws-account:


* Backups are self-managed: check on the backup process regularly to make sure everything is being backed up as intended.
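One way to check on backups is rclone check, which compares a source directory with its copy on the remote. A minimal sketch as a shell function; the name verify_backup is hypothetical, and it calls rclone from PATH (after module load rclone/1.62.2) unless the RCLONE variable points at another binary, such as the full path used in the crontab entries.

```shell
# Sketch: compare a local directory with its backup in S3.
# RCLONE overrides the rclone binary (defaults to `rclone` on PATH).
verify_backup() {
  local src=$1 dst=$2
  # --one-way: only report files from the source that are missing or different
  # in the destination; extra objects already in the bucket are not flagged
  "${RCLONE:-rclone}" check --one-way "$src" "$dst"
}
```

Usage: verify_backup ~/Google-GCP my-aws-account:/resnick-hpc-backups/ . rclone check exits with a non-zero status when it finds missing or differing files, so the same call also works inside scripts.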


Useful Links


[Amazon S3 Glacier Storage Classes | AWS](https://aws.amazon.com/s3/storage-classes/glacier/)


[AWS Pricing Calculator](https://calculator.aws/#/)


[Rclone S3 backend documentation](https://rclone.org/s3/)


[Rclone command reference](https://rclone.org/commands/)


Common commands


Basic ls

rclone ls my-aws-account:/resnick-hpc-backups/


Basic copy command

rclone copy ~/my-source-directory my-aws-account:/resnick-hpc-backups/


Add the -P (--progress) flag to view real-time transfer statistics.
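Before a large first upload it can help to preview the transfer. The hypothetical helper below runs rclone copy with --dry-run (report what would be copied, upload nothing) when given --preview, and with -P otherwise. The RCLONE variable, if set, overrides the rclone binary; it is an assumption of this sketch, not an rclone setting.

```shell
# Sketch: hypothetical helper for previewing, then running, a copy.
backup_copy() {
  local src=$1 dst=$2
  if [ "${3:-}" = "--preview" ]; then
    # --dry-run lists what would be transferred without uploading anything
    "${RCLONE:-rclone}" copy --dry-run "$src" "$dst"
  else
    # -P (--progress) prints live transfer statistics
    "${RCLONE:-rclone}" copy -P "$src" "$dst"
  fi
}
```

Usage: backup_copy ~/my-source-directory my-aws-account:/resnick-hpc-backups/ --preview, then the same call without --preview. rclone copy skips files that are already identical at the destination and never deletes anything there (unlike rclone sync), which makes it the safer choice for backups.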


Scheduling recurring backups with crontab


To schedule a job that runs repeatedly, add it to your crontab. We suggest first scheduling the job every 5 minutes as a test to confirm that the backups capture new data on the cluster side (i.e. let it run for a few cycles while adding data to the source directories, then check S3 to verify that the new files show up). Cron does not run module load, so the entries below call rclone by its full path.


crontab -e


*/5 * * * * /central/software/rclone/1.62.2/rclone copy ~/Google-GCP my-aws-account:/resnick-hpc-backups/


After verifying that the data is being backed up successfully, switch to a daily or less frequent schedule. For example, once a day at 4 AM:


0 4 * * * /central/software/rclone/1.62.2/rclone copy ~/Google-GCP my-aws-account:/resnick-hpc-backups/
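With a frequent schedule, a slow run can still be going when cron starts the next one, and cron output is easy to lose. A sketch of the daily entry that addresses both, assuming flock (from util-linux) is available on the node where cron runs; the lock and log file paths are hypothetical:

```shell
# Hypothetical daily entry (lock and log paths are assumptions):
# flock -n exits immediately if a previous run still holds the lock, and
# --log-file/--log-level INFO record each transferred file in ~/rclone-backup.log
0 4 * * * flock -n /tmp/rclone-backup-$LOGNAME.lock /central/software/rclone/1.62.2/rclone copy ~/Google-GCP my-aws-account:/resnick-hpc-backups/ --log-file ~/rclone-backup.log --log-level INFO
```

Reviewing ~/rclone-backup.log from time to time is a simple way to check that the scheduled backups are running.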