Serverless is a great way to go if you need on-demand compute that scales. However, it can come with a hidden price: irregular response times. Why does this happen, you ask? A good analogy is a car out in the cold trying to start for the first time in several days. When it's cold, the oil gets thicker and the battery delivers less current to the starter motor. It's kind of the same thing with serverless services. I'm not saying Azure servers run on thick car oil, but when a cold start occurs there are more steps involved in getting your code to run. In broad strokes, here's what happens when your function is triggered:

  • The very first time you trigger your function, Azure needs to allocate a server with capacity for your function to run on.
  • Next, the Functions runtime must start on that server.
  • Finally, your code executes.

After the initial run of a function, you should see a noticeable reduction in response time. This is because your function already has a server and runtime allocated, and you can now consider your function "warm".
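If you want to see the difference for yourself, a rough sketch like the one below can help. It times a series of calls and compares the first (possibly cold) call against the median of the rest. The helper names measure_latencies and cold_start_penalty are made up for this post, and the URL in the usage comment is a placeholder, so treat it as a starting point and not a proper benchmark.

```python
import statistics
import time
from typing import Callable, List


def measure_latencies(call: Callable[[], object], attempts: int = 5) -> List[float]:
    """Invoke `call` `attempts` times and return each call's duration in seconds."""
    latencies = []
    for _ in range(attempts):
        started = time.perf_counter()
        call()
        latencies.append(time.perf_counter() - started)
    return latencies


def cold_start_penalty(latencies: List[float]) -> float:
    """Return first-call latency minus the median latency of the later calls."""
    if len(latencies) < 2:
        raise ValueError("need at least two measurements")
    return latencies[0] - statistics.median(latencies[1:])


# Hypothetical usage against an HTTP-triggered function (placeholder URL):
#
#   import urllib.request
#   url = "https://<your-app>.azurewebsites.net/api/<your-function>"
#   timings = measure_latencies(lambda: urllib.request.urlopen(url).read())
#   print(f"cold start penalty: {cold_start_penalty(timings):.2f}s")
```

Run it after the function has been idle for a while, and keep in mind that network jitter will blur the numbers, so a single run only gives a rough indication.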

Microsoft states that they deallocate resources after roughly 20 minutes of inactivity, rendering your function "cold" again.

Keeping it warm

There are several ways of keeping your functions "warm". The easiest is to host your functions on a Dedicated plan. This means you rent a VM that runs 24/7 and hosts your functions, but this isn't really serverless, and it also changes how you're billed. What you usually want is a Consumption plan, where you pay as you go.
Microsoft actually has its own solution to the problem, which, let's be honest, is probably the best way to go. Recently they added a feature that pre-warms instances for you if you run a Premium plan. Here's a small excerpt from the documentation:

In the Premium plan, you can have your app pre-warmed on a specified number of instances, up to your minimum plan size. Pre-warmed instances also let you pre-scale an app before high load. As the app scales out, it first scales into the pre-warmed instances. Additional instances continue to buffer out and warm immediately in preparation for the next scale operation. By having a buffer of pre-warmed instances, you can effectively avoid cold start latencies. Pre-warmed instances is a feature of the Premium plan, and you need to keep at least one instance running and available at all times the plan is active.

You can read the article in full here.

As mentioned, this requires your functions to run on a Premium plan. However, there are other ways of solving our problem if you're not able to run a Premium plan.
One way of doing it that actually comes with added value is to create a health check endpoint in your function. All it has to do is return an HTTP 200 OK response. By calling it, say, every 15 minutes, you keep your function warm and verify that the service is available at the same time.
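The calling side of that health check could look something like the sketch below. It assumes a health endpoint already exists in your function app. The names ping and keep_warm are made up for illustration, and the opener and sleep parameters are only there so the code can be tried out without a network or a real 15-minute wait. In practice you would more likely let a scheduler or a monitoring service call ping than run a loop yourself.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, List


def ping(url: str, timeout: float = 10.0, opener: Callable = urllib.request.urlopen) -> int:
    """GET `url` and return the HTTP status code, or 0 if the host is unreachable."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with a success status.
        return err.code
    except OSError:
        # Covers URLError, timeouts, and refused connections.
        return 0


def keep_warm(
    check: Callable[[], int],
    rounds: int,
    interval_seconds: float = 15 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Run `check` `rounds` times, pausing `interval_seconds` between runs."""
    statuses = []
    for completed in range(1, rounds + 1):
        status = check()
        statuses.append(status)
        if status != 200:
            print(f"health check returned {status}")
        if completed < rounds:
            sleep(interval_seconds)
    return statuses


# Hypothetical usage (placeholder URL, four checks spread over 45 minutes):
#
#   url = "https://<your-app>.azurewebsites.net/api/health"
#   keep_warm(lambda: ping(url), rounds=4)
```

The 15-minute default is picked to stay under the roughly 20 minutes of inactivity mentioned earlier. Any status other than 200 is worth alerting on, since that is the "added value" part of the health check.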

Another way of doing it might be to send a request to the function just before you actually need it (maybe to an empty endpoint, like the health check). Depending on the nature of the function, you might not always know when you'll need it, but for argument's sake let's assume that in this scenario we do. For example, if the function handles user registration for an app or website, you can call the "dummy" endpoint when a user lands on the registration page.
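On a server-rendered site, that "call it just before you need it" idea could be sketched like this: fire the warm-up request on a background thread so the registration page isn't held up waiting for the function to wake. The function name warm_up_in_background and the URL in the usage comment are assumptions for illustration. The opener parameter defaults to the standard library's urlopen and is only injectable to make the sketch easy to try out.

```python
import threading
import urllib.request
from typing import Callable


def warm_up_in_background(
    url: str,
    timeout: float = 5.0,
    opener: Callable = urllib.request.urlopen,
) -> threading.Thread:
    """Send a best-effort GET to `url` on a daemon thread and return immediately."""

    def _request() -> None:
        try:
            with opener(url, timeout=timeout):
                pass  # The response body is irrelevant; the call itself warms the function.
        except OSError:
            pass  # Warm-up is best effort, so failures are deliberately ignored.

    thread = threading.Thread(target=_request, daemon=True)
    thread.start()
    return thread


# Hypothetical usage inside the handler that renders the registration page:
#
#   warm_up_in_background("https://<your-app>.azurewebsites.net/api/health")
#   return render_registration_page()
```

Errors are swallowed on purpose here: if the warm-up fails, the user is no worse off than without it, and the real registration call will surface any actual problem.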


"Cold" starts are a consequence of running code on-demand. If the function hasn't run for a while, the platform needs time to find a server with free capacity and load the proper runtime on it. Running Azure Functions on a Consumption plan can be really cost-effective, but "cold" starts should factor into your decision when architecting your solution. We've also looked at Microsoft's feature for handling "cold" starts. If it's affordable, I would recommend taking a more in-depth look at it. As an alternative, we looked at ways of pre-heating our functions ourselves.

One last thing to consider is writing your functions in a language that is generally available (GA). Those are languages that have been optimized for Azure Functions and tend to have lower "cold" start times than languages that are still in preview.

As always, thoughts and feedback in the comment section are appreciated.