Copying large files between S3 buckets

There are many scenarios where you might need to migrate data between S3 buckets and folders. Depending on the use case, you have several options for the tooling and language:

  • Lambdas – these could be written in Python, Java, JavaScript or C#
  • Bespoke code – again, this could be any language you like. To keep things different from the above, let's add PowerShell to the mix

Behind the scenes, most of these SDKs call into the same endpoints that Amazon hosts. As a user you rarely need to delve into the specific endpoints unless you have a good reason to.

Back to the issue at hand – copying large files
Amazon imposes a 5 GB limit on single copy operations. If you use the standard copy operation (CopyObject / copy_object) on larger files, you will hit exceptions once the object size exceeds that limit.

The solution is to use multipart copy. It sounds complex, but all the code is provided for you – there is a rough sketch of what happens under the hood below, followed by pointers for each language:
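To see what the SDKs are doing on your behalf, here is a minimal sketch of a low-level multipart copy using Python and boto3. The bucket names, keys and part size are placeholders, and real code would want retries and error handling around the part copies.

```python
import boto3

s3 = boto3.client("s3")

# Placeholder source and destination names for illustration
src_bucket, src_key = "my-source-bucket", "backups/huge-file.bin"
dst_bucket, dst_key = "my-destination-bucket", "backups/huge-file.bin"

# Find the object size so it can be split into parts
size = s3.head_object(Bucket=src_bucket, Key=src_key)["ContentLength"]
part_size = 500 * 1024 * 1024  # 500 MB per part, purely illustrative

# 1. Start the multipart upload on the destination
mpu = s3.create_multipart_upload(Bucket=dst_bucket, Key=dst_key)
upload_id = mpu["UploadId"]

# 2. Copy each byte range from the source as a separate part
parts = []
part_number = 1
offset = 0
while offset < size:
    last_byte = min(offset + part_size, size) - 1
    resp = s3.upload_part_copy(
        Bucket=dst_bucket,
        Key=dst_key,
        UploadId=upload_id,
        PartNumber=part_number,
        CopySource={"Bucket": src_bucket, "Key": src_key},
        CopySourceRange=f"bytes={offset}-{last_byte}",
    )
    parts.append({"ETag": resp["CopyPartResult"]["ETag"], "PartNumber": part_number})
    offset += part_size
    part_number += 1

# 3. Complete the upload so S3 assembles the parts into one object
s3.complete_multipart_upload(
    Bucket=dst_bucket,
    Key=dst_key,
    UploadId=upload_id,
    MultipartUpload={"Parts": parts},
)
```

Each part (other than the last) must be at least 5 MB, and S3 stitches the parts back into a single object when the upload completes. The language-specific options below wrap this same sequence for you.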

Python
This is probably the easiest, as the boto3 library already handles it. Rather than using copy_object, use the managed copy function, which performs a multipart copy automatically for large objects: http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.copy
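As an example, here is a minimal sketch of that high-level approach. The bucket and key names are made up, and the TransferConfig values are just illustrative:

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Placeholder source object for illustration
copy_source = {"Bucket": "my-source-bucket", "Key": "backups/huge-file.bin"}

# Above multipart_threshold, boto3 switches to a multipart copy automatically
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,  # switch to multipart above 100 MB
    multipart_chunksize=100 * 1024 * 1024,  # copy in 100 MB parts
    max_concurrency=10,                     # parts are copied in parallel threads
)

# Unlike copy_object, this call is not subject to the 5 GB single-operation limit
s3.copy(copy_source, "my-destination-bucket", "backups/huge-file.bin", Config=config)
```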

C#
The C# implementation is slightly more complex; however, Amazon provides a good worked example: https://docs.aws.amazon.com/AmazonS3/latest/dev/CopyingObjctsUsingLLNetMPUapi.html

Powershell
A close mimic of the C# implementation – someone has ported the C# example to PowerShell: https://stackoverflow.com/a/32635525/1065332

Happy copying!
