Eyes wide open - Correct Caching is always hard

May 03, 2018 Comment on this post [28] Posted in DotNetCore

Sponsored By

Image from Pixabay used under Creative Commons In my last post I talked about Caching and some of the stuff I've been doing to cache the results of a VERY expensive call to the backend that hosts my podcast.

As always, the comments are better than the post! Thanks to you, Dear Reader.

The code is below. Note that the MemoryCache is a singleton, but within the process. It is not (yet) a DistributedCache. Also note that Caching is Complex(tm) and that thousands of pages have been written about caching by smart people. This is a blog post as part of a series, so use your head and do your research. Don't take anyone's word for it.

Bill Kempf had an excellent comment on that post. Thanks Bill! He said:

The SemaphoreSlim is a bad idea. This "mutex" has visibility different from the state it's trying to protect. You may get away with it here if this is the only code that accesses that particular key in the cache, but work or not, it's a bad practice.
As suggested, GetOrCreate (or more appropriate for this use case, GetOrCreateAsync) should handle the synchronization for you.

My first reaction was, "bad idea?! Nonsense!" It took me a minute to parse his words and absorb. Ok, it took a few hours of background processing plus I had lunch.

Again, here's the code in question. I've removed logging for brevity. I'm also deeply not interested in your emotional investment in my brackets/braces style. It changes with my mood. ;)

public class ShowDatabase : IShowDatabase
{
    private readonly IMemoryCache _cache;
    private readonly ILogger _logger;
    private SimpleCastClient _client;
  
    public ShowDatabase(IMemoryCache memoryCache,
            ILogger<ShowDatabase> logger,
            SimpleCastClient client){
        _client = client;
        _logger = logger;
        _cache = memoryCache;
    }
  
    static SemaphoreSlim semaphoreSlim = new SemaphoreSlim(1);
   
    public async Task<List<Show>> GetShows() {
        Func<Show, bool> whereClause = c => c.PublishedAt < DateTime.UtcNow;
  
        var cacheKey = "showsList";
        List<Show> shows = null;
  
        //CHECK and BAIL - optimistic
        if (_cache.TryGetValue(cacheKey, out shows))
        {
            return shows.Where(whereClause).ToList();
        }
  
        await semaphoreSlim.WaitAsync();
        try
        {
            //RARE BUT NEEDED DOUBLE PARANOID CHECK - pessimistic
            if (_cache.TryGetValue(cacheKey, out shows))
            {
                return shows.Where(whereClause).ToList();
            }
  
            shows = await _client.GetShows();
  
            var cacheExpirationOptions = new MemoryCacheEntryOptions();
            cacheExpirationOptions.AbsoluteExpiration = DateTime.Now.AddHours(4);
            cacheExpirationOptions.Priority = CacheItemPriority.Normal;
  
            _cache.Set(cacheKey, shows, cacheExpirationOptions);
            return shows.Where(whereClause).ToList(); ;
        }
        catch (Exception e) {
            throw;
        }
        finally {
            semaphoreSlim.Release();
        }
    }
}
  
public interface IShowDatabase {
    Task<List<Show>> GetShows();
}

SemaphoreSlim IS very useful. From the docs:

The System.Threading.Semaphore class represents a named (systemwide) or local semaphore. It is a thin wrapper around the Win32 semaphore object. Win32 semaphores are counting semaphores, which can be used to control access to a pool of resources.
The SemaphoreSlim class represents a lightweight, fast semaphore that can be used for waiting within a single process when wait times are expected to be very short. SemaphoreSlim relies as much as possible on synchronization primitives provided by the common language runtime (CLR). However, it also provides lazily initialized, kernel-based wait handles as necessary to support waiting on multiple semaphores. SemaphoreSlim also supports the use of cancellation tokens, but it does not support named semaphores or the use of a wait handle for synchronization.

And my use of a Semaphore here is correct...for some definitions of the word "correct." ;) Back to Bill's wise words:

You may get away with it here if this is the only code that accesses that particular key in the cache, but work or not, it's a bad practice.

Ah! In this case, my cacheKey is "showsList" and I'm "protecting" it with a lock and double-check. That lock/check is fine and appropriate HOWEVER I have no guarantee (other than I wrote the whole app) that some other thread is also accessing the same IMemoryCache (remember, process-scoped singleton) at the same time. It's protected only within this function!

Here's where it gets even more interesting.

I could make my own IMemoryCache, wrap things up, and then protect inside with my own TryGetValues...but then I'm back to checking/doublechecking etc.
However, while I could lock/protect on a key...what about the semantics of other cached values that may depend on my key. There are none, but you could see a world where there are.

Yes, we are getting close to making our own implementation of Redis here, but bear with me. You have to know when to stop and say it's correct enough for this site or project BUT as Bill and the commenters point out, you also have to be Eyes Wide Open about the limitations and gotchas so they don't bite you as your app expands!

The suggestion was made to use the GetOrCreateAsync() extension method for MemoryCache. Bill and other commenters said:

As suggested, GetOrCreate (or more appropriate for this use case, GetOrCreateAsync) should handle the synchronization for you.

Sadly, it doesn't work that way. There's no guarantee (via locking like I was doing) that the factory method (the thing that populates the cache) won't get called twice. That is, someone could TryGetValue, get nothing, and continue on, while another thread is already in line to call the factory again.

public static async Task<TItem> GetOrCreateAsync<TItem>(this IMemoryCache cache, object key, Func<ICacheEntry, Task<TItem>> factory)
{
    if (!cache.TryGetValue(key, out object result))
    {
        var entry = cache.CreateEntry(key);
        result = await factory(entry);
        entry.SetValue(result);
        // need to manually call dispose instead of having a using
        // in case the factory passed in throws, in which case we
        // do not want to add the entry to the cache
        entry.Dispose();
    }

    return (TItem)result;
}

Is this the end of the world? Not at all. Again, what is your project's definition of correct? Computer science correct? Guaranteed to always work correct? Spec correct? Mostly works and doesn't crash all the time correct?

Do I want to:

Actively and aggressively avoid making my expensive backend call at the risk of in fact having another part of the app make that call anyway?
- What I am doing with my cacheKey is clearly not a "best practice" although it works today.
Accept that my backend call could happen twice in short succession and the last caller's thread would ultimately populate the cache.
- My code would become a dozen lines simpler, have no process-wide locking, but also work adequately. However, it would be naïve caching at best. Even ConcurrentDictionary has no guarantees - "it is always possible for one thread to retrieve a value, and another thread to immediately update the collection by giving the same key a new value."

What a fun discussion. What are your thoughts?

Sponsor: SparkPost’s cloud email APIs and C# library make it easy for you to add email messaging to your .NET applications and help ensure your messages reach your user’s inbox on time. Get a free developer account and a head start on your integration today!

About Scott

Scott Hanselman is a former professor, former Chief Architect in finance, now speaker, consultant, father, diabetic, and Microsoft employee. He is a failed stand-up comic, a cornrower, and a book author.

About Newsletter

Hosting By

Hosted on Linux using .NET in an Azure App Service

Comment on this post [28]

Share on BlueSky or use the Permalink and post anywhere!

May 03, 2018 1:53

I'd generally go for simplicity unless complexity makes a substantial difference.

If you did go for GetOrCreateAsync, could there be some value in priming it whenever you release a new episode so that most of the time it's actually you updating the cache?

I guess your current 4 hour expiration makes that not worthwhile, but you could replace that with an eviction strategy where it checked the shows list and only evicted the cache if it had actually changed. Very few requests would then be forced to wait for the shows list to download.

Mark Adamson

May 03, 2018 2:02

How expensive are we talking here? A few seconds? Personally, I would use GetOrCreate to keep the code clean and simple and just live with the fact that in some cases the call would happen twice.

Tim Boyce

May 03, 2018 2:08

I would recommend LazyCache, it uses MemoryCache but guarantees that the factory method only gets called once.

Thomas

May 03, 2018 3:03

Thomas - I'm looking at LazyCache and forgive my ignorance, but I'm not seeing how it "guarantees" it only get called once. Can you point me to where that lock/check/protection is? https://github.com/alastairtree/LazyCache

Tim - 7-10 seconds.

Scott Hanselman

May 03, 2018 3:21

They use Lazy<T>. I think it prevents double calls: AsyncLazy

Victor

May 03, 2018 5:07

I don't think it's unreasonable to write this code with the assumption that it is "the only code that accesses that particular key in the cache".

If you were strewing cache access throughout Controllers or some such ugliness, that would be a recipe for errors. But a "ShowDatabase" class which is the sole point where Show records are retrieved from the Database .. that seems fine.

At first glance, that GetOrCreateAsync extension method looks unambiguously worse for your purposes than what you have written. Your cache access is only protected from multiple data access calls within your one function. GetOrCreateAsync cache access is not protected at all.

carson63000

May 03, 2018 5:10

Scott what you have is good, especially if GetShows is expensive. You will only ever get it once per instance of your website.

I used this caching pattern to great success in a UWP app where it was critical to reduce the number of HTTP requests to keep the local device fast. If some data was displayed in two places in the UI simulataniously then we only wanted to fetch it from the server once.

Simpler caching, i.e. not getting the data inside a lock, is fine for a lot of people but if your website is high traffic and the data is expensive to get then every person who visits while it is uncached (either because your app is starting up or the cache expiried) will get their own copy.

1000 visitors/sec * 5 secs to get the data = 5000 requests to get the data.

On the other hand with a bit of locking - like you have now - you reduce that to 1 request.

James Newton-King

May 03, 2018 9:55

Silly question, but does your back-end not support HTTP caching? I.e. can it not offer ETags/Cache headers and allow cheap re-validation of stale items?

I know that would then still impose parsing overheads but where is the current trade-off between the parsing and accessing the remote API?

Damien

May 03, 2018 10:30

Would using ReaderWriterLockSlim be an approach that would allow a caller to safely "upgrade" from read-write to write, while at the same time being able to have managed thread affinity?

Leon

May 03, 2018 11:38

Does the 4 hour absolute expiry mean that every 4 hours the page takes 7-10 seconds to load? It seems it would be better to set a relative one (which will probably never expire in your case) and then just repopulate it when you know you've made a change.

Otherwise you could try two cache keys and populate the background one on a schedule before swapping it to be primary.

Mark Adamson

May 03, 2018 12:23

Hi,

Just looking at the problem.

You are looking at doing double, triple, quadruple work when 2/3/4 people hit you site in the 7(?) second window that you have when cache expired and a new cache record is set up? After that the next 4 hours you have the episodes cached and happy days. After 4 hours you again run the "risk" of having multiple people hit that "cache expired building up the cache again" window.

So how big is the window, how long does GetShows take and how many concurrent users hit your site at the same time.

Biggest risk is probably something like automated downloads that podcast apps have (at least mine does) although that probably does not use the showlist but they could be hitting you often in a small window.

Is it a problem worth spending time on ;) https://xkcd.com/1205/

Thanks for the posts though, nice to see your thought processes out in the open :)

Franc

May 03, 2018 13:05

@Scott - LazyCache wraps your "build stuff I want to cache" func in a Lazy<> or an AsyncLazy<> before passing it into MemoryCache to ensure the delegate only gets executed once as you retrieve it from the cache. It also allows you to swap between sync and async for the same cached thing. It is just a very thin wrapper around MemoryCache to save you the hassle of doing the locking yourself. A netstandard 2 version is in pre-release.
Since you asked the implementation is in CachingService.cs#L119 and proof it works is in CachingServiceTests.cs#L343

Alastair

May 03, 2018 15:13

I'm going to have to agree with some of the other posters and come out in favour of simplicity.

Whenever you start to introduce locking into web site software, I think you need to take a long hard look at the problem and decide if it is worth the trade off. Every lock you put into a site is essentially a pinch point for an inherently multi threaded environment.

For a single problem like the above, it probably won't show up as an issue, but the problem comes when (after reading this) someone decides that this is a great idea for *every* bit of static data that they load. Suddenly you are dealing with multiple pinch points and *shudder* the possibility of deadlocking your site.

If this is a serious slow point in the site, would you maybe be better served by introducing out-of-band caching and a dedicated cache server? This could be long lived and be primed as it starts rather than on demand.

Is this possibly overkill for your situation? Probably yes.

In that case, is the locking really necessary either? Probably not (in my opinion :)

Paddy Boyd

May 03, 2018 15:15

A catch block that does nothing but rethrow the exception...just...no. Please change that 🙂

Jason Bock

May 03, 2018 15:26

Solving one problem with a cache; now you have two problems.

Paw

May 03, 2018 16:06

@Jason, that's just because the logging code that is in that catch block has been removed for brevity

Mark Adamson

May 03, 2018 17:43

I have absolutely run into this myself. As I understand it, GetOrCreate still might call the creation function more than once, but when it does, it's simply recreating a new Lazy instance of something, which is a cheap operation. It's not calling the inner expensive operation more than once.

I've done it manually myself, but the LazyCache library looks quite interesting. Thanks to everyone who recommended it.

Brian Vallelunga

May 03, 2018 17:48

Unrelated to the main problem, it seems there is a small potential issue as you are using
AbsoluteExpiration = DateTime.Now
instead of UtcNow. Relatively more expensive and hopefully unnecessary. Will not work well a few hours a year.

Joe

May 03, 2018 19:31

Perhaps just reconsider how you solved the problem. This data will be used a bunch, so at application start-up, populate the cache. Then, fire off a task that fires every 30-60 minutes to see if any new shows were added. No locking, no worries. And the app doesn't start until the data is available.

Scott

May 03, 2018 21:49

If this cache reduces the load with a considerable improvement (let's say 50%), I think it does the job. Did you do some measurements on this?

Adrian

May 04, 2018 1:43

Jason - As mentioned before, that's because the logging methods were removed. It's collapsed code.

Adrian - Yes, a huge improvement. The backend is so slow that SOME cache is absolutely needed.

Joe - Good point!

Alastair! Thanks for commenting and for the links. I'll check it out and see how it works with my code!

Scott Hanselman

May 04, 2018 15:26

Blocking repeat access to backend services is an absolute necessity in high traffic situations if the backend service is brittle. I refer to it as the backend service not fit to "carry direct load". Sql server ends up in this category too. Now the mechanism of guarding/protecting ie lock or semaphoreslim or some nifty increment lockless scheme makes less of a difference and one can go for simplicity.

When it comes to structuring to "future proof" the code from developers, aye that's always difficult :) This is where you sometimes have to resort to supporting external documentation...

Dennis Malmgren

May 05, 2018 0:06

Well, if you are willing to take a dependency on the cache being in-memory (not distributed), you could cache the task instead of the result of the task. The caveat there being that you don't want to cache failed backend calls. My AsyncLazy has a RetryOnFailure option for just this scenario.

However, it's still not ideal, since it does force the cache to be local (Tasks don't serialize very well).

Stephen Cleary

May 05, 2018 2:07

This blog post is a welcome relief from the flood of Azure blog posts at MS.

greg

May 05, 2018 2:43

Alternative strategy: you could use a webjob on a schedule to download the remote content to a local file, then serve the data from there, with/without memorycache on top.

jon

May 05, 2018 17:44

If the number of cached objects is low [probably your case] and demand for them is high [list of shows is essential for your site] then pre-loading is the easiest. In that case you avoid chicken-egg problem with cashing. When your site is up, it is ready serving pre-cached content. So you solved the initial long running process - it does happen before you site is up. Static constructor may be good place. Along you initiate a single timer, that invokes your cache refresh each 4 hours - reload data, after it is ready, replace static reference to it with new one (which is believed to be atomic).

The most important prerequisite is "demand for them is high" - if this is not true, you approach with locking is more appropriate since holding always a copy won't be justified...

Jirka

May 06, 2018 16:13

Late to the party, but following on from Alastair's comment about Lazy:

The ASP.NET Core "Middleware as Filters" feature uses ConcurrentDictionary combined with Lazy to ensure "run once" semantics. In the worst case, multiple Lazy<> objects are created for multiple threads, but only one of the objects succeeds in running the factory method. That looks to be the same approach as LazyCache.

I wrote a post on it a while ago as it was a new approach to me: https://andrewlock.net/making-getoradd-on-concurrentdictionary-thread-safe-using-lazy/

Andrew Lock

May 07, 2018 6:54

Hi Scott. Have you considered using an Aspect orientated approach to caching? By using attribute based caching all tricky caching logic, key generation etc is kept in a single place. I have successfully used AspectCache for several projects.
https://github.com/andyvans/AspectCache

[Cache(minutes:5)]
public virtual List<Show> GetShows()
{
...
}

Andyvan

Comments are closed.