Performance tuning and load testing xDB

I’ve recently been trying to explain the life of a developer to my fiancée, in particular the recent work we’ve been doing around load testing xDB. She suggested a rather apt metaphor for the problem: tummy pants – you prod and poke one area, then another starts to bulge.

Female fashion aside, the metaphor feels rather accurate, especially as you add more components to a system. Let’s consider the evolution of Sitecore. Originally you had a relatively simple model: SQL servers and web servers. Since the advent of xDB the landscape has shifted – you now need to consider things like Mongo, shared/private session, Solr, reporting services, aggregation services; the list goes on.

Recently we’ve been through a long phase of load testing. The primary focus: can we get personalized content to the customer quickly and reliably? In short, yes – but it took a lot of test runs to get there! Sitecore have a white paper on the load testing they ran, which is worth a read: https://doc.sitecore.net/White_papers

The goal:

The client in question has a really good track record of focusing on key parts of the development lifecycle such as load testing – their main sales outlet is the web so keeping online customers happy is rather high on their list of priorities. Based on this we often have a load test phase prior to deploying new applications or even when new features are added to the existing code base.

We had a clear target to achieve based on their existing online profile: 250 transactions per second with average response times sub 2s. There were more non-functional requirements but for the scope of this post they aren’t really important.
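As a rough sanity check on what that target implies, Little's law relates throughput, response time and the number of requests in flight. The sketch below is illustrative only: the function name is made up for this post, and it assumes a steady state where requests arrive at the target rate.

```typescript
// Little's law: average requests in flight = arrival rate x average time in the system.
// requestsInFlight is a hypothetical helper name, not part of any load testing tool.
function requestsInFlight(transactionsPerSecond: number, avgResponseSeconds: number): number {
  return transactionsPerSecond * avgResponseSeconds;
}

// The target from this post: 250 transactions per second, each taking up to 2s on average.
const inFlightAtTarget = requestsInFlight(250, 2);
console.log(inFlightAtTarget); // 500
```

In other words, at the upper bound of the target the whole system needs to cope with around 500 requests being processed at once, and halving the average response time halves that figure.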

The setup:

All the testing was performed against boxes hosted in AWS. The load tests were run in 2 ways: custom AWS boxes running JMeter, and VS Online Load Tests. We had control over the VS tests, while an external company ran the JMeter tests – this allowed us to quickly iterate our approach and finally get sign off once we were happy with our setup.

For AWS box specs have a look at https://aws.amazon.com/ec2/instance-types/. It’s worth noting we are looking to trim back the sizes of each – now we’ve achieved the target we can tune back the specs, and therefore the cost.

  • Web boxes: 3 / 5 / 7 web boxes – c4.2xlarge
  • Sql boxes: 1 (for core, master, web, session) – c4.2xlarge
  • xDB cluster: 3 – i2.xlarge linux
    • Mongo configured to use WiredTiger. When MMAPv1 was used we’d see large numbers of collection locks during test runs.
  • This was all monitored with New Relic, using their free 24hr retention account

How did we get on?

Initially pretty badly! We’d see stable response times under light load but as soon as we started to move up the load ramps this would quickly tail off – graphs would look a lot like:

[Image: response time graph from an early test run]

The overall average response times were ok but things really started to tail off towards the end.

Where did we get to?

By the end things were much rosier: we could get much more stable response times right through the test ramps. Note the number of requests we managed to handle between the 2 runs:

[Image: response time graph from a later test run]

Now it’s worth noting, we could perform the same test twice and get variations in results. Please don’t use the exact figures as gospel, they are more to indicate the improvements we managed to achieve – avg response times halved! 🙂

Tummy pants?!?!

We ran several iterations of tests against different spec boxes, and combinations of boxes. A common issue was that scaling up one aspect simply moved the bottleneck elsewhere. More web boxes wouldn’t necessarily buy you better results, and bigger Mongo boxes (even with promises of vast quantities of IOPS) might have little marked effect. We prodded one area and the problem appeared to move around.

How did we achieve the improvement?

It took a combination of a few things: some help and guidance from the guys at Sitecore that were involved in the load testing shown in the white paper listed above and some reconfiguration of the setup.

As we tweaked the setup, one area that remained unclear was the way the Linux boxes were handling each collection. Mongo allows you to create your own collections (in the Sitecore model, things like analytics) and also maintains its own collections for things like replication. The WiredTiger storage engine is I/O heavy, as documents are pulled to and from the disk when updates are issued.

In order to measure and tweak exactly what mongo was doing we made a few changes:

Prior to making these changes I had little experience of working with Linux. It took a while, and a fair amount of googling, to find the best resources. There are some good tools to help get going: PuTTY and WinSCP.

Tuning the changes

New Relic proved invaluable when diagnosing each disk’s resource usage. The next step for us will be to reduce the pre-allocated IOPS assigned to each collection so it suits the details below.

[Image: IOPS]

tl;dr

When you load test a system it’s key to get a clear picture of what is going on. Tools like New Relic are great for aggregating the performance of different components. That holds true for both Windows and Linux installs.

For your Mongo instances, assigning different performance to each collection will give you much better visibility and much more fine-grained control over each collection. In our testing this resulted in halving our average response times.

Real time view of the Sitecore log files

Just a quick post – if you want to get a realtime view of the log files then you have a few options.

If, like me, you find opening the latest file and scrolling to the bottom a bit tiresome, then the following options might help:

  1. Dynamic log viewer – I discovered this tool as it ships with SIM – alternatively you can download the exe from http://www.softpedia.com/get/Office-tools/Text-editors/Dynamic-Log-Viewer.shtml. You need to select the latest file and it then watches the tail of the log file
  2. DebugView – you need a couple of things: the download from https://technet.microsoft.com/en-us/sysinternals/debugview.aspx and a slight tweak to your log4net config to add a new appender (see below). The advantage here is the log always updates; you don’t need to select a new file after each rebuild. When you run the app:
    1. Run as an administrator
    2. Turn on ‘Capture -> Capture Global Win32’
    3. Add a filter to match your config – ‘Edit -> Filter/Highlight -> Include’ – [xDBPrototype]

The new config you need adds a new appender into the <log4net> section of the web.config/sitecore.config (depending on your version of Sitecore):

Enjoy

Sitecore Redis SessionState provider

Out of the box Sitecore offers 3 options for how to handle session when you set up xDB. One option is to keep things in process (InProc). This is ok for testing in dev but isn’t suitable when you have more than one front-end content delivery node, as the boxes wouldn’t be able to share the same information. The other two options are SQL Server or Mongo. See the docs site for more information on how to configure these 2 approaches.

I’ve uploaded an early version of a Sitecore Redis SessionProvider to github: https://github.com/boro2g/Sitecore-Redis-Session-Provider

Conceptually the implementation of Session_End is easy to get your head around – when keys expire you raise the corresponding events and Sitecore handles the rest. Redis makes this tricky: when keys time out they don’t raise events, and the data is then gone, so how could it get flushed to xDB?

To work around this I’ve combined the logic in the SitecoreSessionStateStoreProvider which gives you the ability to poll the repository, along with some custom keys to manage the concept of expiration.

By default the ASP.NET Redis implementation creates 3 types of keys:

  • DataKey e.g. "{" + applicationName + "_" + id + "}_Data"
  • LockKey e.g. "{" + applicationName + "_" + id + "}_Write_Lock"
  • InternalKey e.g. "{" + applicationName + "_" + id + "}_Internal"
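To make the naming concrete, here is a minimal sketch that builds those three key names for a single session. It is written in TypeScript purely for illustration; the provider itself is .NET, and the function and interface names here are made up.

```typescript
// Hypothetical shape, for illustration only.
interface DefaultSessionKeys {
  dataKey: string;
  lockKey: string;
  internalKey: string;
}

// Builds the three default key names listed above for one session id.
function defaultSessionKeys(applicationName: string, id: string): DefaultSessionKeys {
  const prefix = "{" + applicationName + "_" + id + "}";
  return {
    dataKey: prefix + "_Data",
    lockKey: prefix + "_Write_Lock",
    internalKey: prefix + "_Internal",
  };
}

console.log(defaultSessionKeys("MySite", "abc123").dataKey); // {MySite_abc123}_Data
```

The braces are worth keeping exactly as they are: in a Redis Cluster setup the text inside {...} is treated as a hash tag, so all the keys for one session land in the same slot.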

The provider also adds these new entries:

  • _log: this is a sorted set that keeps a record of all the marker sets
  • TimeoutKey e.g. "{" + applicationName + "_" + id + "}_Timeout"
  • MarkerKey e.g. yyyy MM dd HH:mm:ss_Marker
    • Note, this will contain sets of items (i.e. everything that expires at that time)

These new keys are used to store when items are added and updated. They are also then referenced in the callback to validate whether specific entries should expire.
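The idea can be sketched with plain in-memory objects standing in for Redis. This is not the code from the repo: the class and method names are hypothetical, the use of UTC in the marker name is an assumption, and a real implementation would read and write the keys described above rather than JavaScript objects.

```typescript
function pad2(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// Marker name in the "yyyy MM dd HH:mm:ss_Marker" format (UTC is an assumption).
function markerKey(expiresAt: Date): string {
  return expiresAt.getUTCFullYear() + " " + pad2(expiresAt.getUTCMonth() + 1) + " " +
    pad2(expiresAt.getUTCDate()) + " " + pad2(expiresAt.getUTCHours()) + ":" +
    pad2(expiresAt.getUTCMinutes()) + ":" + pad2(expiresAt.getUTCSeconds()) + "_Marker";
}

class ExpirationTracker {
  private timeouts: Record<string, number> = {};     // stand-in for the TimeoutKey entries
  private markers: Record<string, string[]> = {};    // marker key -> session ids expiring then
  private log: { key: string; at: number }[] = [];   // stand-in for the _log sorted set

  // Called whenever a session is added or updated.
  touch(sessionId: string, nowMs: number, ttlMs: number): void {
    const expiresAt = Math.floor((nowMs + ttlMs) / 1000) * 1000; // whole seconds
    this.timeouts[sessionId] = expiresAt;
    const key = markerKey(new Date(expiresAt));
    let ids = this.markers[key];
    if (ids === undefined) {
      ids = [];
      this.markers[key] = ids;
      this.log.push({ key: key, at: expiresAt });
    }
    if (ids.indexOf(sessionId) === -1) ids.push(sessionId);
  }

  // Called from the polling callback: returns the sessions that have really expired.
  collectExpired(nowMs: number): string[] {
    const expired: string[] = [];
    const remaining: { key: string; at: number }[] = [];
    for (const marker of this.log) {
      if (marker.at > nowMs) { remaining.push(marker); continue; }
      const ids = this.markers[marker.key] || [];
      delete this.markers[marker.key];
      for (const id of ids) {
        const expiresAt = this.timeouts[id];
        // A session touched after this marker was written has a later timeout, so skip it.
        if (expiresAt !== undefined && expiresAt <= nowMs) {
          delete this.timeouts[id];
          expired.push(id);
        }
      }
    }
    this.log = remaining;
    return expired;
  }
}
```

The validation step inside collectExpired is the important part: a marker only says a session might expire at that time, and the timeout entry has the final say.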

In the solution there are the implementation details for the provider along with a console app for monitoring a solution.

[Image: console app]

Do let us know how you get on! It’s worth noting this is currently an alpha release that’s undergone basic testing – any feedback / pull-requests would be greatly appreciated.

FYI, if you want to get Redis running locally you can install it via Chocolatey: https://chocolatey.org/packages/redis-64

Octopus deploy – Script Steps sourced from a package

In the latest version of Octopus Deploy you can now choose to run script steps where the files exist in a package. This might sound like a minor change but it opens up some very neat options. You can read more on the details of the change in https://octopus.com/blog/octopus-deploy-3.3#ScriptsInPackages. Note that at the time of writing this is only available in the beta of 3.3 (https://octopus.com/downloads/3.3.0-beta0001).

Ok, so why is this such a good thing?
Step templates are great – there is even a large library of pre-existing templates to download (https://library.octopusdeploy.com/#!/listing). If you’ve not used them before, step templates allow additional scripts to run during your deployment.

Examples could be: post to Slack, create certain folders, delete given files etc. Basically anything you can achieve with PowerShell can be done in step templates.

Let’s just stick with step templates then?
If you’ve set up several deployments with Octopus and want to replicate the same functionality across several projects or installs, you need to re-create all the step template configurations each time. It’s not the slowest process but the idea below helps streamline things.

Now that you can run scripts from a package, why not source control the steps you want to run? One key advantage is that you can then see things like history of all the deployment steps.

What needs setting up?
You need to be running version 3.3 or higher of Octopus – see above for the link.

I’ve been using a simple test deployment of an out-of-the-box MVC project along with a new project specifically for the scripts:
[Image: solution setup]

In Octopus this has 2 steps:
[Image: steps]

The first is a vanilla website deployment of ‘WebApplication1’. The second runs the startup scripts:
[Image: script step]

Note the package id. The idea behind using a separate project is that the PowerShell scripts never need to exist in the website project.

The startup script project
[Image: solution setup]

I chose to use a class library for the simple reason that I could include a reference to OctoPack, and hence building the output NuGet file was trivial.

The nuspec file is important as it tells the packaging to include all PowerShell files:

Packages simply contains a reference to OctoPack:

And finally the scripts:
Helloworld.ps1

And the more important one, Startup.ps1

It’s worth noting this should be considered a POC of the approach. The next steps would be to split the scripts up into more meaningful units, remove hello world and update the nuspec with more valid information.

If you struggle with accessing the Octopus parameters you require, the script in Helloworld.ps1 allows you to dump out all parameters and their values. In the startup script the parameter $OctopusParameters['Octopus.Action[Deploy website].Output.Package.InstallationDirectoryPath'] depends on the name of the step that deploys the website in your deployment process (here, step 1: Deploy website).
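To show how the parameter name hangs off the step name, here is a small sketch. The real lookup happens in PowerShell via $OctopusParameters; the TypeScript below only models the naming, and both function names and the example path in the comment are hypothetical.

```typescript
// Builds the output-variable name for a given step name (hypothetical helper).
function installationDirectoryParameter(stepName: string): string {
  return "Octopus.Action[" + stepName + "].Output.Package.InstallationDirectoryPath";
}

// Looks the value up in a dictionary of parameters. On a miss it lists the keys
// that do end in the expected suffix, which helps when the step has been renamed.
function resolveInstallationDirectory(parameters: Record<string, string>, stepName: string): string {
  const key = installationDirectoryParameter(stepName);
  const value = parameters[key];
  if (value !== undefined) return value;
  const suffix = "].Output.Package.InstallationDirectoryPath";
  const candidates = Object.keys(parameters).filter(function (k) {
    return k.slice(-suffix.length) === suffix;
  });
  throw new Error("No parameter named '" + key + "'. Similar keys: " + (candidates.join(", ") || "none"));
}

// e.g. resolveInstallationDirectory(allParameters, "Deploy website")
```

If the step were renamed, the key would change with it, which is exactly the situation where dumping all the parameters becomes useful.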

How to bulk import contacts from SQL Server into Sitecore Mongo xDB

Imagine the scenario: you have millions of customer records in an existing SQL Server instance and want to tie things together with your shiny new Mongo xDB. Where to start?!?!

Hopefully the following information will help guide you towards the different areas that will need researching and developing. If you follow these steps, I’d recommend the free Mongo University courses (https://university.mongodb.com/) to get acquainted with how Mongo and its queries work.

For this demo assume the following example infrastructure:

  • Existing on-premise SQL Server instance containing user data & order records. The Sitecore deployment has no r/w access to this instance.
    • Data is structured so that a user can have: 0, 1 or more orders
  • Sitecore, SQL server and Mongo all deployed to a cloud hosted provider. Assume Mongo is used for both xDB and session

So, how do we get the data from a relational database into xDB?

There are several options here. If the Sitecore instance and the on-premise database can talk you may choose slightly different approaches – for now let’s assume not.

You can get data out of SQL Server in many ways. One simple option is to right-click the database in question and follow the ‘export data’ wizard. Here you can specify things like source databases, destination dbs or files, queries to run etc. I’ve chosen to use CSV flat files as the interim data storage. Tip: remember to check ‘Column names in the first data row’ – it will make life easier when you come to import into Mongo.

One key difference between SQL and Mongo is the way you can represent linked data. The CSV will contain something like:

UserID | OrderID | Order total
123    | 456     | 50
123    | 789     | 150
280    | 535     | 20

Note: user 123 has made 2 orders.

Compare this to Mongo (note: this isn’t the only option for storing data in Mongo, but in this scenario the model fits well with the Sitecore approach to contact facets):

  • User: 123
    • orders
      • ID: 456, Cost: 50
      • ID: 789, Cost: 150
  • User: 280
    • orders
      • ID: 535, Cost: 20
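The jump from flat rows to nested documents can be sketched in a few lines. This is a minimal illustration rather than the import code itself: the type and function names are made up, and the CSV parsing assumes plain comma-separated values with a header row and no quoted fields.

```typescript
interface OrderRow { userId: string; orderId: string; orderTotal: number; }
interface OrderEntry { id: string; cost: number; }
interface UserOrders { user: string; orders: OrderEntry[]; }

// Parses a simple export with a header row (no quoted fields or embedded commas).
function parseOrderRows(csv: string): OrderRow[] {
  const lines = csv.split(/\r?\n/).filter(function (line) { return line.trim() !== ""; });
  return lines.slice(1).map(function (line) {
    const [userId = "", orderId = "", total = ""] = line.split(",");
    return { userId: userId.trim(), orderId: orderId.trim(), orderTotal: Number(total) };
  });
}

// Groups the flat rows so each user appears once, with all of their orders nested inside.
function groupByUser(rows: OrderRow[]): UserOrders[] {
  const byUser: Record<string, UserOrders> = {};
  const result: UserOrders[] = [];
  for (const row of rows) {
    let entry = byUser[row.userId];
    if (entry === undefined) {
      entry = { user: row.userId, orders: [] };
      byUser[row.userId] = entry;
      result.push(entry);
    }
    entry.orders.push({ id: row.orderId, cost: row.orderTotal });
  }
  return result;
}
```

Fed the three rows from the example above, groupByUser returns two documents: user 123 with two orders and user 280 with one.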

Lots of data? Don’t panic..!!
When dealing with big sets of data in Mongo, bulk operations are your friend – they will make things much quicker! Based on that I decided to blast the whole CSV into a temporary Mongo collection. Via a cmd prompt run:

Note, read the mongo docs for more info on mongoimport.

All well and good, but the data looks just like a sql table!
True, so we now need to process it into a format that we want for xDB.

Sitecore defines a schema for the xDB data based around Contacts and Contact facets. For example, for a given user you could have a facet that represents all of the user’s orders. I won’t go into too much detail on this – see here for some background.

The format you’d then expect to see within xDB and Mongo would be:

  • Contacts
    • Contact: _id
      • Customer (this name is up to you)
        • Orders (this name is up to you)
          • Order Id: X, Order Details: Y
          • Order Id: Z, Order Details: Q
          • etc

Ok, we need to map from rows to structured data, right?
This is a pretty common problem to solve when working with databases and the solution here is the result of several attempts, each with mixed results! 🙂

The code I arrived at was pretty specific to the exact schema we have – do ask if you want a copy. I used TypeScript to generate the JavaScript files used by the Mongo shell, as it gave me type-based development in a few areas.

The flow of the operations was:

  1. Bulk import the CSV file into a temporary Mongo db (CRM) and then access it via Mongo scripts:
  2. Batch the import of the data via queries based on information we know about the data – e.g. process each user whose email starts with a given key:
  3. Iterate through each entry in externalUsers, but don’t write them into your analytics db one by one. Instead, store them in an array, insert a batch every N users (this will be much faster!) and then reset the array. The reason for storing the createdUser dictionary is to handle users with more than 1 order:

    1. Note, the methods for the schema generation are:
  4. This should create you 4 collections: Contacts, Identifiers, Orders and EmailsTemp. The last 2 are temporary collections used later. CSUUID helper functions can be found: here.
  5. Now we should have contacts and identifiers filled out, but not associated orders.
  6. Finally we need to glue them together – this is where the temporary collections come into play:
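The batching idea from step 3 can be sketched generically. This is not the schema-specific code mentioned above: the class, interfaces and function are hypothetical, and writeBatch is a stand-in for whatever bulk insert you use against Mongo.

```typescript
// Buffers items and hands them to writeBatch every batchSize items.
class BatchWriter<T> {
  private buffer: T[] = [];
  private batchSize: number;
  private writeBatch: (items: T[]) => void;

  constructor(batchSize: number, writeBatch: (items: T[]) => void) {
    this.batchSize = batchSize;
    this.writeBatch = writeBatch;
  }

  add(item: T): void {
    this.buffer.push(item);
    if (this.buffer.length >= this.batchSize) this.flush();
  }

  flush(): void {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = []; // reset the array so memory use stays bounded
    this.writeBatch(batch);
  }
}

interface ExternalUser { email: string; orderId: string; }
interface ContactStub { email: string; }

// Queues one contact per distinct user, even when a user has several order rows.
function queueContacts(
  externalUsers: ExternalUser[],
  batchSize: number,
  writeBatch: (contacts: ContactStub[]) => void
): number {
  const createdUser: Record<string, boolean> = {};
  const writer = new BatchWriter<ContactStub>(batchSize, writeBatch);
  let created = 0;
  for (const row of externalUsers) {
    if (createdUser[row.email]) continue; // a second order for a user we already queued
    createdUser[row.email] = true;
    writer.add({ email: row.email });
    created++;
  }
  writer.flush(); // don't forget the final partial batch
  return created;
}
```

The final flush is easy to miss: without it the last partial batch never gets written.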

What issues did I run into?

  • No more power! (well, memory) – if you try to store a huge array in your scripts you will quickly run out of juice and the import will grind to a halt
  • You mis-map the Mongo properties so that the data stored is either null or doesn’t match the Sitecore facet properties – here you simply get a nasty ‘can’t convert type’ Mongo driver exception
  • What’s going on with the import? You can dump the output of any Mongo script to a file via: mongo runner.js > output.txt
  • Querying large sets of data can be slow – make sure you set up indexes on the collections if you need to do a lot of cross referencing e.g.: db.CRM.createIndex({"EmailAddress":1})
  • Sitecore expects arrays of data in a slightly unusual format – beware: it’s not a standard Mongo array
  • Mongo has a 16MB size limit per document – the cap in the OrderMapping prevents the import creating giant order histories. You will want to review this in your implementation!

There is a lot of information here (sorry about that!) – mainly because it took several iterations of code to arrive at a solution that even completed, let alone in a timely manner!

Sitecore patch include files and feature folders

I recently got caught out when trying to patch some include files within the FXM configuration. In the end the fix was simple – thanks support 🙂

For certain recent features the config is now set up with folders per feature. An example would be:
– app_config
– – include
– – – fxm
– – – – Sitecore.FXM.config
– – – – … etc
– – – Sitecore.Diagnostics.config
– – – … etc

Now say you want to patch the config within Sitecore.FXM.config. In order for the patch:before and patch:after logic to work correctly you need to create a new folder whose name sorts alphabetically after fxm. An example would be /app_config/include/zzz.

The reason being, Sitecore looks to process all the files in /app_config/include first, then all the folders e.g. /app_config/include/fxm etc.
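That ordering can be modelled with a short sketch. It is my reading of the behaviour described here, not Sitecore’s actual loader: the function names are made up, and the case-insensitive alphabetical ordering within each level is an assumption.

```typescript
function compareIgnoreCase(a: string, b: string): number {
  const x = a.toLowerCase();
  const y = b.toLowerCase();
  return x < y ? -1 : x > y ? 1 : 0;
}

// Takes paths relative to /app_config/include and returns them in the order
// described above: files at the current level first, then each sub-folder in turn.
function includeLoadOrder(relativePaths: string[]): string[] {
  const files: string[] = [];
  const folders: Record<string, string[]> = {};
  for (const relativePath of relativePaths) {
    const slash = relativePath.indexOf("/");
    if (slash === -1) { files.push(relativePath); continue; }
    const folder = relativePath.substring(0, slash);
    let children = folders[folder];
    if (children === undefined) { children = []; folders[folder] = children; }
    children.push(relativePath.substring(slash + 1));
  }
  const ordered = files.sort(compareIgnoreCase);
  for (const folder of Object.keys(folders).sort(compareIgnoreCase)) {
    for (const child of includeLoadOrder(folders[folder] || [])) {
      ordered.push(folder + "/" + child);
    }
  }
  return ordered;
}
```

With a hypothetical MyPatch.config placed both directly in include and in a zzz folder, the copy in include comes out before fxm/Sitecore.FXM.config, while zzz/MyPatch.config comes out after it, which is why the patch needs to live in the later folder.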

In the scenario I was interested in i.e. patching the FXM pipeline:

You need to remember to include the group tag to ensure the nesting is correct. The final patched config you’d need would be:

Happy patching 🙂

Testing Sitecore Federated Experience Manager without a deploy

We are starting the migration of a site to make use of Sitecore FXM (Federated Experience Manager) and wanted to do a very quick test of how it would play with our existing site’s JavaScript. The key question was: are there any glaringly obvious issues when we drop in the beacon?

There are a few options – a common one would be to add the beacon to a qa / uat site and test there. However, what if the content isn’t as up-to-date – is there another solution?

The approach below is a bit hacky so don’t rely on this for your final integration testing! However, on the plus side, it’s very quick to see things in action 🙂

  1. Select the site you want to test on. Nominally: www.sitecore.net
  2. Fire up a new instance of Sitecore on your dev machine (8.1 if possible) with the host: fxm.www.sitecore.net
  3. Create a new FXM site entry and set the host to be fxm.www.sitecore.net
    1. Note, we will change this later
  4. Through the FXM experience editor add a hello world placeholder and control to your page
    1. This should create you an item in the tree under: ‘/sitecore/system/Marketing Control Panel/FXM/www sitecore net’
  5. Open up the new placeholder you added in the tree and note the selector. This can be anything you want – for the Sitecore site, update this to be, say, ‘#Form1 > div.shell’ (without the quotes)
    1. To find the value to use, dive into Chrome developer tools, right click the element you want and choose ‘copy XPath’ or ‘copy CSS path’ – I found CSS was easier to work with as you can target specific elements, not array entries
  6. Update the primary domain entry added in step 3 to be www.sitecore.net
  7. Publish the lot
  8. Visit www.sitecore.net in a browser and note nothing has changed
  9. In the chrome console run the following script:
    1. var script = document.createElement('script'); script.type = 'text/javascript'; script.src = '//fxm.www.sitecore.net/bundle/beacon'; document.head.appendChild(script);
    2. If the script fails, make sure the quotes are plain single quotes ('), not funky curly ones.
  10. You should see in the network tab a new request to the beacon – the response should be json containing all the data needed for rendering your changes to the page
  11. Check the page – in theory anything set in step 4/5 should now be applied to the page

[Image: sitecore.net screenshot]

A word of caution – if you are interested in how the placeholders work you can always view: Presentation details -> Final renderings. However, be careful: don’t click OK (i.e. save) once you’ve reviewed them, as the format saved back into the field isn’t compatible with FXM.

Have you ever edited in the Sitecore web db by mistake? V8.1

To build on a previous post (http://blog.boro2g.co.uk/ever-edited-sitecore-web-db-mistake/) – if you want to achieve the same kind of thing in version 8.1, you need to tweak the js slightly:

Just replace the window.onload=function()… method listed in the previous post with:

Open command prompt here

Something really quick this time in the form of a couple of tips picked up at the Microsoft Future Decoded conference (which was really good btw! :))

If you are in Windows Explorer and want a command prompt for the current folder, either:

  • Ctrl+Shift+right click -> Open command window here
  • Or, type in the address bar: cmd .

I’m sure there are more ways but these both seemed pretty clean and simple.