Scan a large S3 bucket with Node.js and AWS Lambda

AWS Lambdas are a really cool way to remove the need for dedicated hardware when running things like scheduled operations. The challenge we had was to find the latest 2 files in a large bucket which matched specific key prefixes. This is easy enough on smaller buckets, but a single listObjectsV2 call returns at most 1,000 items. What do you do if you need to scan more?

The following example shows how you can achieve this. You need to fill in a couple parts:

  • the bucket name
  • the filename / folder prefix
  • the file suffixes

What’s really neat with Lambdas is that you can pass in parameters from the test event.
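As a rough illustration, a test event and the code that reads it might look like the sketch below. The field names (`bucket`, `prefix`, `suffixes`, `maxAgeHours`), the sample values and the defaults are all hypothetical, chosen only to match the three parts listed above.

```javascript
// Hypothetical test event, as it might be pasted into the Lambda console.
const sampleEvent = {
  bucket: 'my-example-bucket',
  prefix: 'exports/daily_',
  suffixes: ['.csv', '.json'],
};

// Merge event parameters over defaults, so an invocation that carries
// none of these fields (e.g. a scheduled run) still has a full config.
function readConfig(event) {
  const defaults = {
    bucket: 'my-example-bucket',
    prefix: '',
    suffixes: [],
    maxAgeHours: 24, // assumed threshold for "out of date"
  };
  return Object.assign({}, defaults, event || {});
}

// Lambda entry point: the event argument is whatever JSON the caller sent.
const handler = async (event) => {
  const config = readConfig(event);
  // ... scan config.bucket for keys starting with config.prefix ...
  return config;
};

exports.handler = handler;
```

Keeping the defaults in one place means the same function can be run from the console with overrides or from a schedule with no extra input.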

When this runs it will fire off SNS alerts if it finds the files to be out of date.
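The alerting step could be sketched roughly as follows. This assumes the aws-sdk v2 SNS client (`publish(params).promise()`); the helper names `isStale` and `alertIfStale`, the age threshold, the message text, and the choice to alert when no file is found at all are assumptions for illustration.

```javascript
// Returns true when the file is older than maxAgeHours relative to `now`.
function isStale(lastModified, maxAgeHours, now = new Date()) {
  const ageMs = now.getTime() - new Date(lastModified).getTime();
  return ageMs > maxAgeHours * 60 * 60 * 1000;
}

// `sns` is an SNS client such as `new AWS.SNS()` from aws-sdk v2. It is
// passed in as an argument so the function can be exercised with a stub.
// `file` is an entry from a listObjectsV2 response (Key, LastModified).
function alertIfStale(sns, topicArn, file, maxAgeHours, now = new Date()) {
  if (file && !isStale(file.LastModified, maxAgeHours, now)) {
    return Promise.resolve(false); // file is fresh, nothing to send
  }
  // Assumption: a missing file is treated as out of date as well.
  const message = file
    ? `Latest file ${file.Key} was last modified ${new Date(file.LastModified).toISOString()}`
    : 'No matching file was found';
  return sns
    .publish({ TopicArn: topicArn, Subject: 'S3 file out of date', Message: message })
    .promise()
    .then(() => true); // resolves to true once the alert has been published
}
```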

The key bit is the recursive calls in GetLatestFiles, which finally trigger the callback from the parent function (i.e. the promise in GetLatestFileForType).
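A minimal sketch of that recursive pattern is shown here. It borrows the function names mentioned above, but the bodies, the argument lists and the "newest object per suffix" bookkeeping are assumptions, and it tracks one latest file per suffix in a single scan. It relies on the aws-sdk v2 callback form of `listObjectsV2` and its `IsTruncated` / `NextContinuationToken` response fields.

```javascript
// Recursively page through the bucket, keeping the newest object per suffix.
// `s3` is an S3 client such as `new AWS.S3()` (aws-sdk v2), passed in so a
// stub can be substituted when testing.
function getLatestFiles(s3, params, suffixes, latest, done) {
  s3.listObjectsV2(params, (err, data) => {
    if (err) return done(err);

    for (const obj of data.Contents || []) {
      const suffix = suffixes.find((s) => obj.Key.endsWith(s));
      if (!suffix) continue; // key does not match any wanted suffix
      const current = latest[suffix];
      if (!current || new Date(obj.LastModified) > new Date(current.LastModified)) {
        latest[suffix] = obj;
      }
    }

    if (data.IsTruncated) {
      // More keys remain: recurse with the token for the next page.
      const next = Object.assign({}, params, {
        ContinuationToken: data.NextContinuationToken,
      });
      return getLatestFiles(s3, next, suffixes, latest, done);
    }
    done(null, latest); // last page reached: hand the result to the parent
  });
}

// Wraps the recursive scan in a promise that settles only after the
// final page has been processed.
function getLatestFileForType(s3, bucket, prefix, suffixes) {
  return new Promise((resolve, reject) => {
    getLatestFiles(s3, { Bucket: bucket, Prefix: prefix }, suffixes, {}, (err, latest) =>
      err ? reject(err) : resolve(latest)
    );
  });
}
```

Because each page is requested from inside the previous page's callback, the promise in the parent function only resolves once `IsTruncated` comes back false, however many pages the bucket needs.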

 
