<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Lambda &#8211; blog.boro2g .co.uk</title>
	<atom:link href="https://blog.boro2g.co.uk/category/lambda/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog.boro2g.co.uk</link>
	<description>Some ideas about coding, dev and all things online.</description>
	<lastBuildDate>Fri, 10 Feb 2017 14:30:53 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.5.8</generator>
	<item>
		<title>Scan large S3 bucket with node js and AWS lambda</title>
		<link>https://blog.boro2g.co.uk/scan-large-s3-bucket-node-js-aws-lambda/</link>
					<comments>https://blog.boro2g.co.uk/scan-large-s3-bucket-node-js-aws-lambda/#respond</comments>
		
		<dc:creator><![CDATA[boro]]></dc:creator>
		<pubDate>Fri, 10 Feb 2017 14:15:58 +0000</pubDate>
				<category><![CDATA[AWS]]></category>
		<category><![CDATA[Lambda]]></category>
		<guid isPermaLink="false">https://blog.boro2g.co.uk/?p=811</guid>

					<description><![CDATA[<p>AWS lamdba&#8217;s are a really cool way to remove the need for specific hardware when running things like scheduled operations. The challenge we had was to find the latest 2 files in a large bucket which matched specific key prefixes. This is easy enough on smaller buckets as the listObjectsV2 call is limited to return [&#8230;]</p>
<p>The post <a rel="nofollow" href="https://blog.boro2g.co.uk/scan-large-s3-bucket-node-js-aws-lambda/">Scan large S3 bucket with node js and AWS lambda</a> appeared first on <a rel="nofollow" href="https://blog.boro2g.co.uk">blog.boro2g .co.uk</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>AWS lamdba&#8217;s are a really cool way to remove the need for specific hardware when running things like scheduled operations. The challenge we had was to find the latest 2 files in a large bucket which matched specific key prefixes. This is easy enough on smaller buckets as the <em>listObjectsV2</em> call is limited to return 1000 items. What to do if you need to scan more?</p>
<p>The following example shows how you can achieve this. You need to fill in a couple parts:</p>
<ul>
<li>the bucket name<br />
<pre class="crayon-plain-tag">this._bucket = "TODO:bucket-name";</pre>
</li>
<li>the filename / folder prefix<br />
<pre class="crayon-plain-tag">"folder/filename_prefix"</pre>
</li>
<li>the file suffixes<br />
<pre class="crayon-plain-tag">this._priceSets = ["Full", "Delta"];</pre>
</li>
</ul>
<p>What&#8217;s really neat with Lambda&#8217;s is you can pass in parameters from the test event e.g.:</p><pre class="crayon-plain-tag">{
  "accountAlias": "accountName",
  "environment": "qa",
  "region": "eu-west-1",
  "snsArn": "arn:aws:sns:eu-west-1:###:Channel name"
}</pre><p></p>
<p>When this runs it will fire off SNS alerts if it finds the files to be out of date.</p>
<p>The key bit is the recursive calls in GetLatestFiles which finally triggers the callback from the parent function (ie the promise in GetLatestFileForType).</p>
<p></p><pre class="crayon-plain-tag">"use strict";
const AWS = require("aws-sdk");
var Demo;
(function (Demo) {
    class LambdaFunctionDemo {
        constructor() {
            this._priceSets = ["Full", "Delta"];
            this._items = {};
            this._s3 = new AWS.S3();
            this._sns = new AWS.SNS();
        }
        Run(event, finishCallback) {
            this._environment = event.environment;
            this._bucket = "TODO:bucket-name";
            this._snsArn = event.snsArn;
            var promises = [];
            for (var priceSet of this._priceSets) {
                promises.push(this.InspectRoutePriceDelta(priceSet));
            }
            Promise.all(promises)
                .then((results) =&gt; {
                console.log(results);
                finishCallback(null, results);
            });
        }
        InspectRoutePriceDelta(priceSet) {
            this._items[priceSet] = [];
            return new Promise((resolve, reject) =&gt; {
                this.GetLatestFileForType(priceSet)
                    .then((priceFile) =&gt; {
                    if (this.CheckDataStatus(priceFile.LastModified, priceSet, priceFile.Key)) {
                        this.FireFileStaleAlert(priceFile.Key, "Delta file");
                        resolve(`Data Stale: ${priceSet}.`);
                    }
                    else {
                        resolve(`Data Fresh: ${priceSet}.`);
                    }
                })
                    .catch(() =&gt; {
                    this.FireFileMissingAlert(priceSet, "Delta file");
                    resolve(`Data Not Found: ${priceSet}`);
                });
            });
        }
        GetLatestFileForType(priceSet) {
            console.log("GetLatestFileForType: " + priceSet);
            return new Promise((resolve, reject) =&gt; {
                this.GetLatestFiles("folder/filename_prefix" + priceSet, null, priceSet, resolve);
            });
        }
        GetLatestFiles(prefix, continuationToken, priceSet, callback) {
            console.log("GetLatestFiles: " + this._items[priceSet].length + " " + priceSet);
            return new Promise((resolve, reject) =&gt; {
                this.GetLatestFile(prefix, continuationToken)
                    .then((data) =&gt; {
                    data.Contents.forEach((a) =&gt; this._items[priceSet].push(a));
                    if (data.IsTruncated) {
                        this.GetLatestFiles(prefix, data.NextContinuationToken, priceSet, callback)
                            .then((recursiveData) =&gt; { });
                    }
                    else {
                        this._items[priceSet].sort((a, b) =&gt; (b.LastModified).getTime() - (a.LastModified).getTime());
                        console.log("there are: " + this._items[priceSet].length + " items for: " + priceSet);
                        callback(this._items[priceSet][0]);
                    }
                });
            });
        }
        GetLatestFile(prefix, continuationToken) {
            console.log(`Scanning folder for file: ${prefix} with token: ${continuationToken}`);
            var params = {
                Bucket: this._bucket,
                Prefix: prefix,
                ContinuationToken: continuationToken
            };
            return this._s3.listObjectsV2(params).promise();
        }
        CheckDataStatus(fileTimeStamp, dataType, key) {
            console.log("Validating " + dataType + " :" + key);
            var currentTimeStamp = new Date();
            var timeDifferenceSeconds = (currentTimeStamp.getTime() - fileTimeStamp.getTime()) / 1000;
            console.log("TimeDifferenceSeconds = " + timeDifferenceSeconds);
            if (dataType == "Full") {
                return timeDifferenceSeconds &gt; (25 * 60 * 60);
            }
            else {
                return timeDifferenceSeconds &gt; (10 * 60);
            }
        }
        FireFileStaleAlert(fileKey, dataType) {
            console.error(fileKey + " on " + this._environment + " is stale in " + this._bucket);
            var snsParams = {
                Message: "ERROR: " + fileKey + " on " + this._environment + " is stale in " + this._bucket,
                Subject: this._environment + " " + dataType + " Data Stale",
                TopicArn: this._snsArn
            };
            this.FireAlert(snsParams);
        }
        FireFileMissingAlert(fileKey, dataType) {
            console.error(fileKey + " on " + this._environment + " is missing in " + this._bucket);
            var snsParams = {
                Message: "ERROR: " + fileKey + " on " + this._environment + " is missing in " + this._bucket,
                Subject: this._environment + " " + dataType + " Data Missing",
                TopicArn: this._snsArn
            };
            this.FireAlert(snsParams);
        }
        FireAlert(snsParams) {
            this._sns.publish(snsParams, (error, data) =&gt; {
                if (error) {
                    console.error("Failed to send SNS message: " + error, error.stack);
                }
                else {
                    console.log("Successfully sent SNS message");
                }
            });
        }
    }
    Demo.LambdaFunctionDemo = LambdaFunctionDemo;
})(Demo || (Demo = {}));
exports.handler = (event, context, finishCallback) =&gt; {
    var processor = new Demo.LambdaFunctionDemo();
    processor.Run(event, finishCallback);
};</pre><p>&nbsp;</p>
<p>The post <a rel="nofollow" href="https://blog.boro2g.co.uk/scan-large-s3-bucket-node-js-aws-lambda/">Scan large S3 bucket with node js and AWS lambda</a> appeared first on <a rel="nofollow" href="https://blog.boro2g.co.uk">blog.boro2g .co.uk</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://blog.boro2g.co.uk/scan-large-s3-bucket-node-js-aws-lambda/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
