Make requests with Fiddler based off a timer

Fiddler is a great tool for debugging web requests. Things like the Composer allow you to concoct requests and then test out the responses you get. If you’re using the Composer and look at the raw view you will see the raw HTTP request, something like this illustrative GET:
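```
GET http://example.com/api/values HTTP/1.1
User-Agent: Fiddler
Host: example.com
```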

If you want to run this request every N seconds you can set up a quick script to achieve this.

A sketch of the full script is shown below – the raw request string and the timer interval are placeholders to adjust. You simply need to add it into the Handlers class (Rules -> Customize Rules), save the script, and then use the new options that get added to the Tools menu (Request by Timer, Stop Timer and Request Once):
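```javascript
// Additions to the Handlers class in CustomRules.js (Rules -> Customize Rules).
// FiddlerScript is JScript.NET, so this drops straight in. The raw request
// below is a placeholder - paste in your own from the Composer's raw view
// and keep the trailing \r\n\r\n.
static var timer: System.Windows.Forms.Timer = null;
static var rawRequest: String =
    "GET http://example.com/api/values HTTP/1.1\r\nHost: example.com\r\n\r\n";

static function IssueRequest(sender, ea)
{
    FiddlerObject.utilIssueRequest(rawRequest);
}

public static ToolsAction("Request by Timer")
function RequestByTimer()
{
    timer = new System.Windows.Forms.Timer();
    timer.Interval = 5000; // every N seconds, in milliseconds
    timer.add_Tick(IssueRequest);
    timer.Start();
}

public static ToolsAction("Stop Timer")
function StopTimer()
{
    if (timer != null) { timer.Stop(); }
}

public static ToolsAction("Request Once")
function RequestOnce()
{
    FiddlerObject.utilIssueRequest(rawRequest);
}
```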

One thing to note: ensure you keep the trailing \r\n\r\n at the end of the raw request string, otherwise you’ll receive an error.

Enjoy

Log aggregation in AWS – part 3 – enriching the data sent into ElasticSearch

This is the third and last part of the series detailing how to aggregate all your log data within AWS. See Part 1 and Part 2 for getting started and for keeping control of the size of your index.

By this point you should have a self-sufficient ElasticSearch domain running that pools logs from all the CloudWatch log groups that have been configured with the correct subscriber.

The final step is to look at how we can enrich the data being sent into the index.

By default AWS will set you up a lambda function that extracts information from the CloudWatch event. It will contain things like: instanceId, the event timestamp, the source log group and a few more. This is handled in the lambda’s transform method, sketched here (field names may vary between blueprint versions):
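```javascript
// A trimmed sketch of the generated transform method - each log event is
// turned into a document, with metadata copied off the CloudWatch payload
// (buildSource and the bulk-request plumbing are as generated).
function transform(payload) {
    var bulkRequestBody = '';

    payload.logEvents.forEach(function (logEvent) {
        var source = buildSource(logEvent.message, logEvent.extractedFields);
        source['@id'] = logEvent.id;
        source['@timestamp'] = new Date(1 * logEvent.timestamp).toISOString();
        source['@message'] = logEvent.message;
        source['@owner'] = payload.owner;
        source['@log_group'] = payload.logGroup;
        source['@log_stream'] = payload.logStream; // for EC2 groups this is the instanceId

        // ...each source is then appended to bulkRequestBody as a bulk index action
    });

    return bulkRequestBody;
}
```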

Note, a tip around handling numeric fields – in order for ElasticSearch to treat fields as numbers rather than strings you can multiply the value by 1, e.g.: source['@fieldName'] = 1 * value;

What to enrich the data with?

This kind of depends on your use case. As we were aggregating logs from a wide range of boxes, applications and services, we wanted to enrich the data in the index with the tags applied to each instance. This sounds simple in theory but needed some planning around how to access the tags for each log entry – I’m not sure AWS would look kindly on us making 500,000 API requests in 15 mins!

Lambda and caching

Lambda functions are a very interesting offering and I’m sure you will start to see a lot more use cases for them over the next few years. One challenge they bring is they are stateless – in our scenario we need to provide a way of querying an instance for its tags based on its InstanceId. Enter DynamoDb, another AWS service that provides scalable key-value pair storage.

Amazon define Dynamo as: “Amazon DynamoDB is a fully managed non-relational database service that provides fast and predictable performance with seamless scalability.”

Our solution

There were 2 key steps to the process:

  1. Updating dynamo to gather tag information from our running instances
  2. Updating the lambda script to pull the tags from dynamo as log entries are processed

1. Pushing instance data into Dynamo

Set up a lambda function that periodically scans all running instances in our account and pushes the respective details into Dynamo.

  1. Set up a new DynamoDB table
    1. Named: kibana-instanceLookup
    2. Region: eu-west-1 (note, adjust this as you need)
    3. Primary partition key: instanceId (of type string)
      1. Note – we will tweak the read capacity units once things are up and running – for production we average about 50
  2. Set up a new lambda function
    1. Named: LogsToElasticSearch_InstanceQueries_regionName
    2. Add environment variable: region=eu-west-1
      1. Note, if you want this to pool logs from several regions into one Dynamo, set up a lambda function per region and set a different environment variable for each. You can use the same trigger and role for each region
    3. Use the script shown below
    4. Set the execution timeout to be: 1 minute (note, tweak this if the function takes longer to run)
    5. Create a new role and give it the following permissions:
      1. AmazonEC2ReadOnlyAccess (assign the out-of-the-box policy)
      2. Plus add the policy shown just after this list
        1. Note, the ### in the policy wants to be your account id
    6. Set up a trigger within CloudWatch -> Rules
      1. To run hourly, set the cron to be: 0 * * * ? *
      2. Select the target to be your new lambda(s)
        1. Note, you can always test your lambda by simply running on demand with any test event
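The extra policy from step 5, as a sketch (swap ### for your account id):

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "dynamodb:PutItem",
            "Resource": "arn:aws:dynamodb:eu-west-1:###:table/kibana-instanceLookup"
        }
    ]
}
```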

And the respective script – a sketch with minimal error handling that you’ll want to adapt:
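```javascript
// Sketch of the instance-scanning function (Node.js). Pagination of
// describeInstances is elided for brevity - fine for ~30 instances.
var AWS = require('aws-sdk');

exports.handler = function (event, context, callback) {
    var ec2 = new AWS.EC2({ region: process.env.region });
    var params = { Filters: [{ Name: 'instance-state-name', Values: ['running'] }] };

    ec2.describeInstances(params, function (err, data) {
        if (err) { return callback(err); }

        data.Reservations.forEach(function (reservation) {
            reservation.Instances.forEach(pushInstanceToDynamo);
        });
        callback(null, 'done');
    });
};

function pushInstanceToDynamo(instance) {
    var dynamo = new AWS.DynamoDB.DocumentClient({ region: 'eu-west-1' });

    // Flatten the EC2 tags onto the item so the streaming lambda can copy
    // them straight onto each ElasticSearch document.
    var item = { instanceId: instance.InstanceId };
    (instance.Tags || []).forEach(function (tag) {
        item[tag.Key] = tag.Value;
    });

    dynamo.put({ TableName: 'kibana-instanceLookup', Item: item }, function (err) {
        if (err) { console.log('Failed to store ' + instance.InstanceId + ': ' + err); }
    });
}
```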

Note, if your Dynamo runs in a region other than eu-west-1, update the first line of the pushInstanceToDynamo method and set the desired target region.

Running on demand should then fill your Dynamo with data, e.g. (illustrative values):
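```json
{
    "instanceId": "i-0123456789abcdef0",
    "Name": "web-01",
    "Environment": "production",
    "Role": "web"
}
```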

2. Querying dynamo when you process log entries

The final piece of the puzzle is to update the streaming function to query Dynamo as required. This needs a few things:

  1. Update the role used for the lambda that streams data from CloudWatch into ElasticSearch, adding the read policy shown just below (where ### is your account id)
  2. Update the lambda script set up in part 1 and tweak it as shown below
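A sketch of that read policy:

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "dynamodb:GetItem",
            "Resource": "arn:aws:dynamodb:eu-west-1:###:table/kibana-instanceLookup"
        }
    ]
}
```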

Add the AWS variable to the requires at the top of the file:
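```javascript
var AWS = require('aws-sdk');
```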

Update the exports.handler & transform methods and add loadFromDynamo, along the lines of the sketch below (the untouched parts of the generated function, e.g. buildSource and post, are elided):
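```javascript
// Sketch of the changes. It assumes the log stream name is the instanceId
// (true for our EC2 log groups) and the table set up in step 1 of this post.
// zlib is already required by the generated function.
var docClient = new AWS.DynamoDB.DocumentClient({ region: 'eu-west-1' });

// Pull the tags stored for an instance out of Dynamo; on a miss or error
// we carry on and index the entry without tags.
function loadFromDynamo(instanceId, callback) {
    var params = {
        TableName: 'kibana-instanceLookup',
        Key: { instanceId: instanceId }
    };
    docClient.get(params, function (err, data) {
        callback(err || !data.Item ? null : data.Item);
    });
}

exports.handler = function (input, context) {
    var zippedInput = new Buffer(input.awslogs.data, 'base64');

    zlib.gunzip(zippedInput, function (error, buffer) {
        if (error) { context.fail(error); return; }
        var awslogsData = JSON.parse(buffer.toString('utf8'));

        // One lookup per invocation - all events in a batch share a log stream
        loadFromDynamo(awslogsData.logStream, function (tags) {
            var elasticsearchBulkData = transform(awslogsData, tags);
            // ...the rest of the generated handler (the post(...) call) is unchanged
        });
    });
};

function transform(payload, tags) {
    // ...as generated, plus copying each tag onto the document:
    payload.logEvents.forEach(function (logEvent) {
        var source = buildSource(logEvent.message, logEvent.extractedFields);
        if (tags) {
            Object.keys(tags).forEach(function (key) {
                source['@tag_' + key] = tags[key];
            });
        }
        // ...remaining per-event fields and bulk body as generated
    });
}
```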

The final step is to refresh the index definition within Kibana: Management -> Index patterns -> Refresh field list.

Final thoughts
There are a few things to keep an eye on as you roll this out – bear in mind these may need tweaking over time:

  • The lambda function that scans EC2 times out – if so, up the timeout
  • The ElasticSearch index runs out of space – if so, adjust the environment variables used in part 2
  • The Dynamo read capacity hits its ceiling – if so, increase the read capacity (this can be seen in the Metrics section of the table in Dynamo)

Happy logging!

Log aggregation in AWS – part 2 – keeping your index under control

This is the second part in the series, a follow-on to /log-aggregation-aws-part-1/

Hopefully by this point you’ve got Kibana up and running, gathering all the logs from each of your desired CloudWatch groups. Over time the amount of data stored in the index constantly grows, so we need to keep things under control.

Here is a good view of the issue. We introduced our cleanup lambda on the 30th; if we hadn’t, I reckon we’d have had about 2 more days of uptime before the disks ran out. The oscillating items from the 31st onward are exactly what we’d want to see – we delete indices older than 10 days every day.

Initially this was done via a scheduled task on a box we host – it worked but wasn’t ideal, as it relies on the box running, potentially on user creds, and lots more. What seemed a better fit was to use AWS Lambda to keep our index under control.

Getting setup

Luckily you don’t need to set up much for this. One AWS Lambda, a trigger and some role permissions and you should be up and running.

  1. Create a new lambda function based off the script shown below
  2. Add 2 environment variables:
    1. daysToKeep=10
    2. endpoint=your ElasticSearch endpoint, e.g. search-###-###.eu-west-1.es.amazonaws.com
  3. Create a new role as part of the setup process
    1. Note, these can then be found in the IAM section of AWS e.g. https://console.aws.amazon.com/iam/home?region=eu-west-1#/roles
    2. Update the role to allow Get and Delete access to your index with the policy shown just after this list
  4. Set up a trigger (in CloudWatch -> Events -> Rules)
    1. Here you can set the frequency of how often to run, e.g. a CRON of 0 2 * * ? * will run at 2am every night
  5. Test your function – you can always run it on demand and then check whether the indices have been removed
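A sketch of the index policy from step 3 (### is your account id; the domain name is a placeholder):

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "es:ESHttpGet",
                "es:ESHttpDelete"
            ],
            "Resource": "arn:aws:es:eu-west-1:###:domain/your-domain-name/*"
        }
    ]
}
```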

And finally the lambda code – a sketch of the approach (the request signing follows the usual aws-sdk V4 signer pattern for calling an ElasticSearch domain):
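```javascript
// Sketch of the cleanup function (Node.js). It lists all indices, picks out
// those named cwl-YYYY.MM.dd and deletes any older than daysToKeep.
var AWS = require('aws-sdk');

var endpoint = process.env.endpoint;
var daysToKeep = parseInt(process.env.daysToKeep || '10', 10);

exports.handler = function (event, context) {
    signedRequest('GET', '/_cat/indices', context, function (body) {
        var cutoff = new Date();
        cutoff.setDate(cutoff.getDate() - daysToKeep);

        var pending = 0;
        body.split('\n').forEach(function (row) {
            var match = row.match(/cwl-(\d{4})\.(\d{2})\.(\d{2})/);
            if (!match) { return; }

            var indexDate = new Date(+match[1], +match[2] - 1, +match[3]);
            if (indexDate < cutoff) {
                console.log('Deleting index: ' + match[0]);
                pending++;
                signedRequest('DELETE', '/' + match[0], context, function () {
                    if (--pending === 0) { context.succeed('done'); }
                });
            }
        });
        if (pending === 0) { context.succeed('nothing to delete'); }
    });
};

function signedRequest(method, path, context, callback) {
    var req = new AWS.HttpRequest(new AWS.Endpoint(endpoint));
    req.method = method;
    req.path = path;
    req.region = 'eu-west-1';
    req.headers['Host'] = endpoint;

    var signer = new AWS.Signers.V4(req, 'es');
    signer.addAuthorization(new AWS.EnvironmentCredentials('AWS'), new Date());

    var client = new AWS.NodeHttpClient();
    client.handleRequest(req, null, function (httpResp) {
        var body = '';
        httpResp.on('data', function (chunk) { body += chunk; });
        httpResp.on('end', function () { callback(body); });
    }, function (err) { context.fail(err); });
}
```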

Note, if you are running in a different region you will need to tweak req.region = "eu-west-1";

How does it work?

ElasticSearch allows you to list all indices in the cluster via the url: /_cat/indices. The lambda function makes a web request to this url, parses each row and finds any indices matching the name pattern cwl-YYYY.MM.dd. If an index is found that is older than daysToKeep, a delete request is issued to ElasticSearch.

Was this the best option?

There are tools available for cleaning up old indices, even ones that Elastic themselves provide: https://github.com/elastic/curator – however, these require additional boxes to run, hence the choice to keep it wrapped in a simple lambda.

Happy indexing!

Log aggregation in AWS – part 1

As with most technologies these days you get plenty of options as to how you solve your technical and logistical problems. The following set of posts details one way you can approach solving what I suspect is quite a common problem – how to usefully aggregate large quantities of logs in the cloud.

What to expect from these blog posts

  1. Getting started – how to get log data off each box into a search index (this post)
  2. Keeping the search index under control
  3. Enriching the data you push into your search index

Some background
Our production infrastructure is composed of roughly 30 Windows instances, some web boxes and some SQL boxes. These are split between 2 regions and within each region are deployed to all the availability zones that AWS provides. We generate roughly 500k log entries every 15 mins, which ends up as 20-25GB of log data per day.

The first attempt
When we started out on this work we didn’t appreciate quite how much log data we’d be generating. Our initial setup was based around some CloudFormation templates provided by AWSLabs: https://github.com/awslabs/cloudwatch-logs-subscription-consumer. Initially this worked fine; however, we quickly hit an issue where the index, and hence Kibana, would stop working. We aren’t ElasticSearch, Kibana or even Linux experts here, so troubleshooting was taking more time than the benefit we got from the tools.

Getting log data off your instances
As much as the first attempt at querying and displaying log entries didn’t quite work out, we did make good progress on how we pooled all the log data generated across the infrastructure. You have a few options here – we chose to push everything to CloudWatch and then stream on to other tools.

Note, CloudWatch is a great way of aggregating all your logs however searching across large numbers of log groups and log streams isn’t particularly simple or quick.

To push data into CloudWatch you have a few options:
– Write log entries directly to a known group – e.g. set up a log4net appender that writes directly to CloudWatch
– If you are generating physical log files on disk, use EC2Config (an out-of-the-box solution provided by AWS) which streams the data from log files into CloudWatch.

Note, this needs configuring to specify which folders contain the source log files. See http://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/UsingConfig_WinAMI.html for more info.

Provided things have gone to plan, you should now start to see log entries show up in CloudWatch.

CloudWatch log group subscribers
CloudWatch allows you to wire up subscribers to log groups – these forward any log entries on to the respective subscriber. A subscriber can be one of several things: a Kinesis stream or a Lambda function. Via the web UI you can select a log group, choose Actions, then e.g. ‘Stream to AWS Lambda’.

AWS Lambda functions
Lambda functions can be used for many things – in these examples we use them to:
– Transform CloudWatch log entries into a format we want to index in ElasticSearch
– Run a nightly cleanup to kill off old search indices
– Run an hourly job to scrape meta data out of EC2 and store into Dynamo
Note, we chose to use the NodeJS runtime – see http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/top-level-namespace.html for the API documentation

ElasticSearch as a service
Our first attempt, i.e. self-hosting, did provide good insight into how things should work, but the failure rate was too high. We found we were needing to rebuild the stack every couple of weeks. Hence SaaS was a much more appealing option. Let the experts handle the setup.

Note, Troy Hunt has written some good posts on the benefits of pushing as much as possible to SaaS – https://www.troyhunt.com/heres-how-i-deal-with-managed-platform-outages/

Setting up ElasticSearch
This is the cool bit as it can all be done through the AWS UI! The steps to follow are:

  1. Create an Elasticsearch domain
    1. Ensure you pick a large enough volume size. We opted for 500GB in our production account
    2. Select a suitable access policy. We whitelisted our office IP
    3. This takes a while to rev up so wait until it goes green
    4. One neat thing is you now get Kibana automatically available. The UI will provide the kibana url.
  2. Set up a CloudWatch group subscriber
    1. Find the group you want to push to the index, then ‘Actions’ -> ‘Stream to Amazon Elasticsearch Service’
    2. Select ‘Other’ for the Log Format
    3. Complete the wizard, which will ultimately create a Lambda function for you
  3. Start testing things out
    1. If you now push items into CloudWatch, you should see indices created in ElasticSearch
    2. Within Kibana you need to let it know how the data looks:
      1. Visit ‘Management’ -> ‘Index Patterns’ -> ‘Add new’
      2. The index pattern is [cwl-]YYYY.MM.DD

Next up we’ll go through:

  • How to prune old indices in order to keep a decent level of disk space left
  • How to transform the data from CloudWatch, through a Lambda function, into the ElasticSearch index

TypeScript mixins

This popped into my news feed a couple of days ago and it seemed like a really neat feature that’s been added in version 2.2 of TypeScript – the concept of mixins.

This terminology (mixin) has been around in Sass for quite a while and refers to extending functionality by composing it from various sources. The same concept holds true in TypeScript.

A few examples, based off a hypothetical User class:
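```typescript
// An illustrative base class to mix into.
class User {
    constructor(public firstName: string, public lastName: string) {}

    get fullName(): string {
        return `${this.firstName} ${this.lastName}`;
    }
}
```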

Now a couple of mixins, following the class-expression pattern from the TypeScript 2.2 release notes (the snippets here are illustrative reconstructions):
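```typescript
// Any class (constructor function) can be passed into a mixin.
type Constructor<T = {}> = new (...args: any[]) => T;

// Adds the ability to 'tag' whatever class it is applied to.
function Taggable<TBase extends Constructor>(Base: TBase) {
    return class extends Base {
        tags: string[] = [];

        tag(name: string) {
            this.tags.push(name);
        }
    };
}

// Exposes a couple of methods to activate/deactivate the class.
function Activatable<TBase extends Constructor>(Base: TBase) {
    return class extends Base {
        isActive = false;

        activate() {
            this.isActive = true;
        }

        deactivate() {
            this.isActive = false;
        }
    };
}
```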

The first adds the ability to ‘tag’ the base class; the second exposes a couple of methods to activate/deactivate the class.

Pulling this together you can then create a basic TaggableUser:
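```typescript
// Continuing the illustrative sketch above.
const TaggableUser = Taggable(User);

const alice = new TaggableUser('Alice', 'Smith');
alice.tag('admin');
```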

Or an ActivatableUser:
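```typescript
const ActivatableUser = Activatable(User);

const bob = new ActivatableUser('Bob', 'Jones');
bob.activate();
```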

Or even better, a combination of both:
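```typescript
const TaggableActivatableUser = Taggable(Activatable(User));

const carol = new TaggableActivatableUser('Carol', 'Brown');
carol.tag('editor');
carol.activate();
console.log(carol.fullName, carol.tags, carol.isActive);
```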

Pretty neat huh?

The inspiration for this came from https://blog.mariusschulz.com/2017/05/26/typescript-2-2-mixin-classes

Newtonsoft – deserializing POCO objects that contain interfaces

Just a quick post here, but hopefully helpful if you hit the same issue. If you are dealing with JSON serialization, Newtonsoft’s Json.NET is one of our go-to options.

When deserializing JSON to POCOs, sometimes the structure of your POCOs requires a bit of extra setup to play fair with Newtonsoft. Consider the following objects (a hypothetical Person whose Address property is typed as an interface):
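```csharp
public interface IAddress
{
    string City { get; set; }
}

public class Address : IAddress
{
    public string City { get; set; }
}

public class Person
{
    public string Name { get; set; }
    public IAddress Address { get; set; }
}
```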

So when this gets serialized, you’d end up with something like:
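```json
{
  "Name": "Joe",
  "Address": {
    "City": "London"
  }
}
```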

If you then try to deserialize:
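```csharp
var person = JsonConvert.DeserializeObject<Person>(json);
```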

then you will get a JsonSerializationException – Newtonsoft can’t instantiate the interface on its own.

The solution is to set up a converter – a sketch of a generic InterfaceConverter:
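```csharp
using System;
using Newtonsoft.Json;

// Tells Newtonsoft which concrete type to instantiate for a given interface.
public class InterfaceConverter<TInterface, TConcrete> : JsonConverter
    where TConcrete : class, TInterface, new()
{
    public override bool CanConvert(Type objectType)
    {
        return objectType == typeof(TInterface);
    }

    public override object ReadJson(JsonReader reader, Type objectType,
        object existingValue, JsonSerializer serializer)
    {
        // Deserialize into the concrete type whenever the interface is hit
        return serializer.Deserialize<TConcrete>(reader);
    }

    public override bool CanWrite
    {
        get { return false; } // serialization needs no help
    }

    public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
    {
        throw new NotImplementedException();
    }
}
```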

This then gets passed into the deserialize call, e.g.:
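```csharp
var person = JsonConvert.DeserializeObject<Person>(
    json, new InterfaceConverter<IAddress, Address>());
```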

You can then simply add as many InterfaceConverters as you need.

Scan a large S3 bucket with Node.js and AWS Lambda

AWS Lambdas are a really cool way to remove the need for specific hardware when running things like scheduled operations. The challenge we had was to find the latest 2 files in a large bucket which matched specific key prefixes. This is easy enough on smaller buckets, as a single listObjectsV2 call returns up to 1000 items. What to do if you need to scan more?

The following example shows how you can achieve this. You need to fill in a couple of parts (marked in the sketch after this list):

  • the bucket name
  • the filename / folder prefix
  • the file suffixes
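A sketch of the function – the bucket name, prefix, suffixes and SNS topic below are placeholders, and GetLatestFiles pages through the bucket recursively via the ContinuationToken:

```javascript
var AWS = require('aws-sdk');
var s3 = new AWS.S3();
var sns = new AWS.SNS();

var bucket = 'your-bucket-name';     // <- fill in
var prefix = 'your/folder/prefix/';  // <- fill in
var suffixes = ['.bak', '.trn'];     // <- fill in

exports.handler = function (event, context, callback) {
    var maxAgeHours = event.maxAgeHours || 24; // passed in from the test event

    Promise.all(suffixes.map(GetLatestFileForType)).then(function (latestFiles) {
        latestFiles.forEach(function (file) {
            if (!file) { return; } // nothing found for that suffix

            var ageHours = (Date.now() - file.LastModified.getTime()) / 3600000;
            if (ageHours > maxAgeHours) {
                sns.publish({
                    TopicArn: 'arn:aws:sns:eu-west-1:###:your-topic', // <- fill in
                    Message: 'File out of date: ' + file.Key
                }, function () {});
            }
        });
        callback(null, latestFiles);
    }).catch(callback);
};

// Wraps the recursive scan in a promise, resolved once the whole bucket
// has been walked for the given suffix.
function GetLatestFileForType(suffix) {
    return new Promise(function (resolve, reject) {
        GetLatestFiles(suffix, null, null, resolve, reject);
    });
}

// Recursively page through the bucket keeping the newest matching object;
// when there are no more pages, trigger the parent promise's callback.
function GetLatestFiles(suffix, token, latest, resolve, reject) {
    var params = { Bucket: bucket, Prefix: prefix };
    if (token) { params.ContinuationToken = token; }

    s3.listObjectsV2(params, function (err, data) {
        if (err) { return reject(err); }

        data.Contents.forEach(function (obj) {
            if (obj.Key.endsWith(suffix) && (!latest || obj.LastModified > latest.LastModified)) {
                latest = obj;
            }
        });

        if (data.IsTruncated) {
            GetLatestFiles(suffix, data.NextContinuationToken, latest, resolve, reject);
        } else {
            resolve(latest);
        }
    });
}
```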

What’s really neat with Lambdas is you can pass parameters in from the test event, e.g. (maxAgeHours matching the sketch above):
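```json
{
  "maxAgeHours": 24
}
```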

When this runs it will fire off SNS alerts if it finds the files to be out of date.

The key bit is the recursive calls in GetLatestFiles, which finally trigger the callback from the parent function (i.e. the promise in GetLatestFileForType).

Sinj – scripted Sitecore changes

This post follows on from Migrating Sitecore content and looks to explain the way we migrate content between different environments.

Sinj is the framework we’ve put in place to facilitate a re-playable and scripted approach to any Sitecore changes. It enables you to create changesets via JSON/Javascript which then get run against a Sitecore authoring environment and any counterpart publishing targets. The code can be found at https://github.com/tcuk/sinj.

For setup instructions have a look at the wiki on GitHub. It also contains examples of the different kinds of operations you can run against your content. As the JS layer talks into the Sitecore API, if you find there’s an operation you need but can’t yet perform, the framework can simply be extended to add the desired functionality.

An example script for creating a template would take roughly the following shape (a hypothetical sketch of the JSON-driven approach – see the wiki for the exact syntax):
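```javascript
// Hypothetical shape only - consult the Sinj wiki for the real syntax.
var articleTemplate = {
    database: "master",
    id: "{A1B2C3D4-0000-0000-0000-000000000001}",
    parent: "/sitecore/templates/User Defined",
    template: "/sitecore/templates/System/Templates/Template",
    name: "Article",
    sections: {
        Content: {
            Title: { type: "Single-Line Text" },
            Body: { type: "Rich Text" }
        }
    }
};
```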

Now, this may seem overly verbose; however, there are easy ways to speed up the JSON generation. I’ll cover these in a subsequent post.

For me the real advantages we gain from this approach are:

  • Changes are as granular as you want – they can be applied to specific fields in specific languages on specific versions if desired.
    • However, if you want to update all languages in one go, it’s simply a case of iterating through each language in a for loop
  • Bulk sets of changes can be applied in one go. By simply including all the JS files you wish to deploy in one folder then all will get applied in sequence
  • You can run the scripts to any database, avoiding any need to publish scattered areas of the tree
  • Changesets can be replayable and don’t have the somewhat confusing concept of overwrite/merge (and its options)
  • You can query the Sitecore tree to gather data to feed into other updates

In the next post I’ll show some examples of how creating Sinj scripts can become a lot simpler…

Migrating Sitecore content

It’s something everyone’s had to do at some point: how can you best migrate content from your dev machine to other environments, e.g. QA / UAT / live?

Before we carry on let’s define exactly what we mean by content. I’m not talking about files here, let tools designed for deploying files handle that problem. By content I mean Sitecore items. These can be broken into two key areas:

  • content owned by content editors
    • typically items under /sitecore/content and /sitecore/media library
  • content owned by developers
    • typically items under /sitecore/layout, /sitecore/system and /sitecore/templates

In theory both can be migrated in exactly the same way – it’s out of the box and is called packages. Via the desktop you can decide which items to package up and simply download a package file. The counterpart is that you then install this package in other environments and you’ve migrated your content.

So, why a blog post on something you get for free?

A lot of the information here is based on experiences of migrating content between lots of environments, that’s been edited by lots of people in several different places. Trust me, packages work but can become somewhat cumbersome and error prone as you scale things up.

Some key factors it’s worth considering for the following discussion:

  • How easy is it to build up your changeset?
  • How quick is it to install these changes?
  • How easy is it to see which versions of your changeset have been installed?
  • What happens if you get a ‘merge’ conflict?
    • E.g. what happens if an item in your changeset has been updated in the destination environment
  • How easily can your changeset be source-controlled?
  • How easily can your changeset be updated / versioned?
  • How easy is it to deploy your changeset to your publishing targets?
  • Can the installation be clever?
    • E.g. could you add logic to the install process or even base content it installs on existing content

What options do you have?

Note, this list isn’t meant to be exhaustive, so apologies if you think items are missing – its aim is to highlight answers to the list of questions above. Several tools solve these issues in similar ways. The pros and cons are based on field experience of using each.

Sitecore packages:

These are available to use in an out-of-the-box Sitecore installation. Creating packages can be based on cherry-picking the items from a specific database you know have changed, or on some dynamic rules (e.g. what’s changed recently).

Pros:

  • They are out the box and quick to get started with
  • You can open the output zip and peer in
  • It’s possible to save an XML file which represents the full content of a package

Cons:

  • Source controlling their content is tricky as they are output as a zip file
  • It can be tricky to get clear visibility of which version of packages have been installed to specific environments
  • Content still requires publishing if installed into master
  • Keeping them up-to-date with changes, especially with a large team, can be laborious
  • Validating their content is slow
  • Installing lots of packages in one go is a painful process
  • Install options are somewhat unclear:
    • Overwrite can nuke existing content
    • Merge – does anyone really understand the 3 options?
  • Field level updates aren’t possible

Sitecore update files:

Much like packages, update files store a form of serialized content in zip files. There isn’t a way to generate update files out of the box, so I won’t dwell too much on this option. IMO they suffer from many of the same issues as packages.

FYI TDS allows you to generate these files. 

Pros:

  • Partial item updates can be achieved
  • You get detailed installation history and (undocumented) rollback options in the /temp folder

Cons:

  • You can’t simply generate this type of file

Unicorn / TDS:

Unicorn and TDS take a slightly different approach in that they store a view of the world in your solution. Both rely on serialization to generate a view of configurable areas of the tree, Unicorn diverging slightly by using a custom YAML format for its files.

Installing each is slightly different: Unicorn hosts a custom page that allows manual or automatic syncing of files; TDS allows you to generate update files.

I’d argue both these approaches suit developer content well; I’ve struggled storing large amounts of content editor content in both.

Pros:

  • Source control is your view of the truth – items can be branched & merged along with your code
  • The deployment process can be automated

Cons:

  • TDS does come with an additional cost
  • Deploying to all publishing targets requires the changeset to include content configured for each target
  • In TDS, building more complex installation rules is possible however difficult to visualize (note, I’ve not used the product for a couple of years now, this may well be better)
    • Examples in mind would be: sync once, field level configuration

Scripting your changes:

You build up custom scripts / helper pages / ???? to allow changes to be made via the Sitecore APIs (or the database if you are feeling particularly Chuck Norris). Let’s assume we have a means of scripting these changes via some JSON configuration (see the summary :)).

Pros:

  • If done right you get an easily re-playable process that can update content in any publishing target
  • All the scripts are source-controlled
  • Scripts can base decisions on existing content
  • Scripts can be as granular or as coarse as you want – bulk updates on multiple items vs single field updates on specific items

Cons:

  • Every change requires ‘scripting’
  • A considerable shift in approach is required
  • A raft of external tooling is required to facilitate generating and installing scripts

Summary, or should it be sales pitch?

We use the last approach across most dev teams here, so it’s used for countless deployments per day. For us it works and is infinitely re-playable. Think of it like advanced config transforms for your content.

I’ll write up more details on the specifics of sinj in a later post – to get started have a look at https://github.com/tcuk/sinj

I’m hoping the information above gets you thinking – just because certain tools exist doesn’t always guarantee they are the best for the job!
