Sitecore data providers – a week in the field

As part of a recent POC we needed to pull large amounts of data from an external set of APIs – some 'realtime', e.g. prices, and some more static, e.g. titles, descriptions, ISBN numbers etc. There were many options for how to surface the content into the site; in the end we decided on Sitecore data providers for the static content and AJAX for the realtime data.

If you are taking on something like this, I'd recommend you carefully consider whether any data ever needs to reside, or be enriched (i.e. by adding media, text etc.), within Sitecore. Data providers are hugely chatty when working in the master db (watch out for the IDTable!). Note – this problem largely goes away once the content is published, as real items are then created in the web db.

I found these examples a good resource to get started. 

Things that caught me out:

Quantities of items: Sitecore recommends not exceeding 100 (ish) items per folder. A basic implementation could pull all external items into one folder, but you don't really want to limit yourself to < 100 items. A few options are: build a folder structure, use buckets, or use a hybrid. Before trying buckets I'd recommend building some basic structure to get to grips with public override IDList GetChildIDs.
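As a rough sketch of that structure-building approach – assuming a hypothetical BookRepository, BooksRootId, GetPublisherFolderId and TryGetPublisher, none of which are Sitecore APIs – a GetChildIDs override that fans records out into one folder per publisher might look like this:

```csharp
// Sketch only: expose external records as one virtual folder per
// publisher rather than a single flat list of thousands of items.
public override IDList GetChildIDs(ItemDefinition itemDefinition, CallContext context)
{
    // Under the provider's root, return one folder ID per publisher.
    if (itemDefinition.ID == BooksRootId)
    {
        var folders = new IDList();
        foreach (string publisher in BookRepository.GetPublishers())
            folders.Add(GetPublisherFolderId(publisher));
        return folders;
    }

    // Under a publisher folder, return that publisher's books –
    // keeping each folder comfortably under the ~100 item guideline.
    if (TryGetPublisher(itemDefinition.ID, out string publisherName))
    {
        var books = new IDList();
        foreach (var book in BookRepository.GetBooksByPublisher(publisherName))
            books.Add(new ID(book.IsbnAsGuid));
        return books;
    }

    return null; // not ours – let the next provider answer
}
```

Returning null for anything the provider doesn't own keeps it out of the way of the rest of the chain.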

Related items: Implementing relationships between data-provided content isn't too tricky. One thing to be careful of: make sure the related item's data provider is patched in before the destination items'. Otherwise your lookups will fail.

Item IDs: Chances are you won't be the only person working on the codebase. You might be lucky and your source data contains IDs which are GUIDs – this works well because the Sitecore items can then be assigned the same GUID, ensuring everyone gets the same tree. If that's not possible, you might want to consider generating a GUID from the data you have.

The data we had contained an ISBN, so all our item IDs became the ISBN padded with zeros, e.g. {97807234-1576-3000-0000-000000000000}. If you don't have a distinct key like this there are ways to generate deterministic GUIDs from a string, however you stand a fair chance of duplicates if the ID value you use is common or exists in more than one place.
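For illustration, a minimal sketch of both approaches in plain .NET – the class and method names here are mine, not Sitecore's:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class StableIds
{
    // When the source key is short, numeric and unique (like an ISBN-13),
    // pad it out to the 32 GUID digits directly, as described above.
    public static Guid FromIsbn(string isbn13)
    {
        string digits = isbn13.Replace("-", "").PadRight(32, '0');
        return Guid.ParseExact(digits, "N"); // "N" = 32 digits, no dashes
    }

    // Fallback for arbitrary string keys: MD5 yields exactly the 16 bytes
    // a Guid needs, and the same input always maps to the same Guid.
    // Beware: if the key isn't unique across the whole data set,
    // two records will collide on the same item ID.
    public static Guid FromKey(string key)
    {
        using (var md5 = MD5.Create())
            return new Guid(md5.ComputeHash(Encoding.UTF8.GetBytes(key)));
    }
}
```

Either way, every developer (and every rebuild of the tree) gets identical item IDs without a shared lookup table.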

Saving (enriching) content: Unless you implement public override bool SaveItem, any changes you make to the content will simply be overwritten on the next read. The parameters of SaveItem give you plenty of information should you need to fire data back to the source – or, in our case, back to an interim Mongo db.
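A hedged sketch of that override – the Sitecore signature is real, but MongoStore is a hypothetical wrapper standing in for whatever interim store you push edits to:

```csharp
// Sketch only: persist content-editor changes so enrichment survives
// the next read from the external feed.
public override bool SaveItem(ItemDefinition itemDefinition, ItemChanges changes, CallContext context)
{
    if (!changes.HasFieldsChanged)
        return false; // nothing for us to do – let other providers run

    foreach (FieldChange change in changes.FieldChanges)
    {
        // Store each edited field against the item's stable ID, so the
        // provider can overlay it on the raw feed data when reading.
        MongoStore.SaveField(
            itemDefinition.ID.ToGuid(),
            change.FieldID.ToGuid(),
            change.Value);
    }

    return true;
}
```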

API access: So, what happens if the API you are using goes AWOL? Hopefully not a common scenario, but not one you want to ignore. In our case we only had access to the client's APIs when connected through a volatile VPN connection. To speed up local dev I harvested a good spread of data into an interim db (Mongo) which I could then work on locally.

Debugging what's going on: This may well be specific to my implementation, but I couldn't find a way to debug data providers efficiently – the debugger would take forever to reach breakpoints.

Being a good citizen: All data providers run, one after another, for every request. You can stop the rest of the chain via context.Abort(); either way, be mindful that your new operations are as lean as possible.
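In practice this mostly means bailing out cheaply when a call isn't for your content. A sketch, assuming a hypothetical TemplateIds.Book constant:

```csharp
// Sketch only: every provider in the chain sees every call, so test
// ownership first and return fast when the item isn't ours.
public override FieldList GetItemFields(ItemDefinition itemDefinition, VersionUri versionUri, CallContext context)
{
    if (itemDefinition.TemplateID != TemplateIds.Book)
        return null; // not ours – hand straight on to the next provider

    var fields = new FieldList();
    // ... populate from the external source or interim cache ...
    return fields;
}
```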

The IDTable: Be careful – this can become stale when you're working in dev, so don't be surprised if you nuke it several times. If you've sorted the item ID issue above this becomes less of a problem. More data means more SQL calls – fire up SQL Profiler and watch the calls fly! I'd experimented with a static cache over the whole table, however maintaining its consistency proved messy and out of scope for the POC. I do think the idea has legs, so if this were taken further it would definitely be an area for investigation.
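For context, the typical IDTable round-trip looks something like the sketch below – each unresolved key costs a SQL hit, which is where the chattiness comes from. GetOrCreateId is my helper and the "books" prefix is arbitrary:

```csharp
// Sketch only: map an external key to a stable Sitecore ID via the
// IDTable, creating the mapping on first sight.
private ID GetOrCreateId(string externalKey)
{
    IDTableEntry entry = IDTable.GetID("books", externalKey);
    if (entry != null)
        return entry.ID;

    ID newId = ID.NewID; // or a deterministic GUID, as discussed above
    IDTable.Add("books", externalKey, newId);
    return newId;
}
```

If your IDs are deterministic you can skip the table lookup entirely, which is exactly why the item ID issue above matters.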

Some ‘ah, that was easy’ moments:

Indexing: Once published, indexing works a treat. In the web db all items exist as proper Sitecore items.

Adding more data providers: I'd set things up so that common operations were squirreled away into a base class. Adding new types of content was then trivial, with only a very light subset of methods required – and linking up more related types became straightforward too.

So, in summary

Conceptually, data providers are great. In practice, however, they can be tricky to get right, especially for large data sets! You may find that calling APIs on the fly and wildcard (*) items give you everything you need, especially as caching for * items is straightforward to achieve.
