December 12, 2024

On-Call Oasis: Creating a Peaceful Experience, Especially During the Holidays

Krzysztof Słonka
Software Engineer, Kong

Kong Konnect and our code in production

Kong Konnect is Kong’s infrastructural SaaS solution. We run the control planes and API management applications for the data planes (API gateway and mesh) that are run by our customers to power their APIs. Some of the most critical traffic in the world goes through these data planes. That makes Konnect highly critical, and our customers’ uptime expectations for it are very high.

As developers working on Kong Konnect, our role goes beyond building the core functionality. We're committed to ensuring that our services are resilient, and part of this commitment is adopting the principle of trending our MTTR (Mean Time to Repair) toward zero. Driving MTTR down means minimizing the time it takes to repair incidents by being proactive, well-prepared, and fully owning our production environments. We're responsible for monitoring and responding to issues, ensuring reliability, and recovering rapidly. During high-demand periods like the holidays, we need to act swiftly to maintain service quality. Being on-call is an essential part of owning our code in production.

With the holidays just around the corner, reviewing your on-call routine can make unexpected alerts much easier to handle. Being paged in the middle of the night is stressful enough. The last thing you need is to wake up your dog, kids, and significant other. Here are some tips I've found useful for making being on-call over the holidays less of a headache.

Setting Up Before Going On-Call

There are a few preemptive steps that can make it easier and quicker for you to react when a new alert pops up. I usually get a little bit stressed when an alert pops up, and knowing that I’m ready to react wherever I am — no matter the time and place — makes things easier. This is especially useful around the holidays when interruptions are best kept to a minimum.

Step Zero: Team Preparation

Before you even begin setting up your on-call environment, take some time to meet with your team. Review the latest alerts and incidents to understand any patterns or recurring issues. Ensure that all playbooks are up to date and address common scenarios. With this done, you’re less likely to be caught off guard during your on-call shift. This collective preparation ensures you’re equipped with the most relevant knowledge to handle incidents effectively.

Initial Configuration

Properly set up and test your phone/SMS/app notifications with your incident management software. Ensure that all notification settings are configured correctly, such as allowing the app to override “Do Not Disturb” mode, and test each notification channel to verify that it works as expected. This one is pretty basic, but you don’t want the alert going into the void.

Establish a Hotspot Connection Between Your Phone and Laptop

Preemptively establish a hotspot connection between your phone and laptop so that if you need to start tethering from a cafe or somewhere outside of your office, you won’t be fumbling around with passwords/settings.

Install Useful Software and Get Familiar with OS Features

For example: during a live incident, you don’t want to be bombarded with notifications from other applications. Find your OS notification settings, tune them accordingly, and, if possible, save them as a “profile” that you can toggle on when needed.

Increase Notification Availability

Here are a few tips to increase your notification availability.

  • Use Wi-Fi whenever possible.
  • Enable Wi-Fi calling (Android, iPhone).
  • Use two SIMs from different telecom companies if your phone supports that (check their coverage maps). In Poland (where I’m based), an extra SIM is quite cheap (no more than a couple of euros per year), so it gives me additional peace of mind that one of the carriers will have good enough coverage wherever I am. This is particularly handy during holiday travel to remote places with spotty cell coverage.

Know You’re Going On-Call

I don’t like being surprised or having to find someone to cover on-call for me at the last minute, especially during the busy holiday season. That’s why I use these tools to make sure I know exactly when I’m going on-call:

  • I export my on-duty calendar to my regular calendar, so I know at a glance if I’m on-call that day or not. I don’t have to open another app; it’s just right there where I schedule most of my activities.
  • I have email, SMS, and push notifications set up to remind me a week before going on-call.
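If your scheduling tool can export shifts as an iCalendar (.ics) feed, both checks above can also be scripted. The following is only a minimal sketch: the sample feed, the function names, and the seven-day window are made up for illustration, and the parser handles only plain UTC timestamps (no time zone parameters, no recurring events).

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sample of an exported on-call calendar (iCalendar format).
SAMPLE_ICS = """BEGIN:VCALENDAR
VERSION:2.0
BEGIN:VEVENT
SUMMARY:On-call (primary)
DTSTART:20241223T090000Z
DTEND:20241230T090000Z
END:VEVENT
END:VCALENDAR
"""


def parse_utc(value):
    """Parse an iCalendar UTC timestamp such as 20241223T090000Z."""
    return datetime.strptime(value, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)


def parse_shifts(ics_text):
    """Return a list of (start, end) tuples, one per VEVENT in the feed."""
    shifts = []
    current = None
    for raw in ics_text.splitlines():
        line = raw.strip()
        if line == "BEGIN:VEVENT":
            current = {}
        elif line == "END:VEVENT" and current is not None:
            if "DTSTART" in current and "DTEND" in current:
                shifts.append((current["DTSTART"], current["DTEND"]))
            current = None
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            if key in ("DTSTART", "DTEND"):
                current[key] = parse_utc(value)
    return shifts


def is_on_call(shifts, when):
    """True if `when` falls inside any shift."""
    return any(start <= when < end for start, end in shifts)


def needs_reminder(shifts, when, days=7):
    """True if a shift starts within the next `days` days."""
    window = timedelta(days=days)
    return any(timedelta(0) <= start - when <= window for start, _ in shifts)


if __name__ == "__main__":
    shifts = parse_shifts(SAMPLE_ICS)
    now = datetime.now(timezone.utc)
    print("On-call right now:", is_on_call(shifts, now))
    print("Shift starts within a week:", needs_reminder(shifts, now))
```

In practice your calendar app and incident management tool already do this for you; the sketch just shows how little logic is involved if you ever want your own reminder.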

Preparing for Late-Night Alerts

I very rarely get paged in the middle of the night, but when it has happened, it has been very disruptive. The ringtone wakes up every living creature in the house. To mitigate this, I configured my phone and wristband as follows.

Choose X depending on how many false positives you have in your alerts and how quickly they resolve themselves.

In the first phase, I have enough time to ACK the alert and not wake up anyone else. At the next level, it might wake someone up, but it also might not — and it does offer a higher chance of waking me up. At the final level, I accept that I will probably wake up everyone in the house but this is the last resort.
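To make the three levels concrete, here is a rough sketch of the escalation logic. It assumes that X is the number of minutes each level lasts before the next one kicks in, and the device action attached to each level is my illustrative guess, not a required setup. In reality you configure this in your incident management app and phone settings rather than in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    name: str
    action: str


# Assumed levels, quietest first. The actions are illustrative only.
LEVELS = (
    Level("quiet", "smart band vibrates"),
    Level("louder", "phone vibrates"),
    Level("last resort", "phone rings at full volume"),
)


def level_for(minutes_since_page: float, x_minutes: float) -> Level:
    """Return the notification level active this long after an un-ACKed page.

    Each level lasts `x_minutes`; the final level stays active until the ACK.
    """
    if x_minutes <= 0:
        raise ValueError("x_minutes must be positive")
    if minutes_since_page < 0:
        raise ValueError("minutes_since_page must not be negative")
    index = min(int(minutes_since_page // x_minutes), len(LEVELS) - 1)
    return LEVELS[index]


if __name__ == "__main__":
    for minute in (0, 1, 2, 3, 4, 10):
        level = level_for(minute, x_minutes=2)
        print(f"{minute:>2} min: {level.name} ({level.action})")
```

A small X escalates quickly, which suits alerts that are rarely false positives. A larger X gives flapping alerts time to resolve themselves before anyone else in the house is woken up.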

Gearing Up

As mentioned above, I have two devices to notify me when an alert comes in. The main one is my phone and the secondary one is a smart band. One thing that’s missing is a notification (vibration) when the band is out of Bluetooth range of the phone. It doesn’t happen often at my apartment, but I don’t always carry the phone with me everywhere. A notification like this would be handy to ensure I don't forget it, especially when I’m caught up in holiday activities. If you know of a workaround for this missing notification, please get in touch. I'd love to hear some tips on it!

If you’re often in a position where you can’t have your phone with you (like swimming, for example), you might consider a more expensive option of having a smartwatch with GSM support.

Before Starting a Shift

Staying Connected on a Laptop

Run all relevant updates before starting a shift, or disable automatic updates of OS/software if that’s possible. Disabling automatic updates can prevent unexpected restarts or performance issues during an incident, ensuring a smoother on-call experience.

Staying Connected on Your Phone

Before starting a shift:

  • Verify your LTE bandwidth limits with your telecom provider.
  • Ensure your phone is fully charged or has access to a power source to avoid losing connectivity during an incident.
  • Consider carrying a powerbank that can power both your phone and your laptop. (I use a powerbank with 140 W output and 24,000 mAh capacity.)
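To get a feel for what a capacity figure like that means, you can convert mAh to watt-hours and compare it with your devices. This is a back-of-the-envelope sketch: the 3.7 V nominal cell voltage, the 85% conversion efficiency, and the device battery sizes are all assumptions, so check the labels on your own hardware.

```python
def watt_hours(capacity_mah, nominal_voltage=3.7):
    """Convert a capacity in mAh to Wh, assuming the given nominal cell voltage."""
    return capacity_mah / 1000 * nominal_voltage


def estimated_full_charges(bank_mah, device_wh, efficiency=0.85):
    """Rough number of full charges, assuming a fixed conversion efficiency."""
    return watt_hours(bank_mah) * efficiency / device_wh


if __name__ == "__main__":
    bank_mah = 24_000   # capacity mentioned above
    phone_wh = 15       # hypothetical phone battery
    laptop_wh = 60      # hypothetical laptop battery
    print(f"Powerbank energy: {watt_hours(bank_mah):.1f} Wh")
    print(f"Phone charges:    {estimated_full_charges(bank_mah, phone_wh):.1f}")
    print(f"Laptop charges:   {estimated_full_charges(bank_mah, laptop_wh):.1f}")
```

Under these assumptions, 24,000 mAh works out to roughly 88.8 Wh, which is enough to get through a long incident on a phone hotspot and a laptop.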

Staying Connected When You Can’t Have Your Phone in Your Pocket

  • Activities like swimming or playing football may require leaving your phone behind.
  • Smart band/BT smartwatch: Use something with long-range Bluetooth. The Scosche Rhythm 24 supposedly has 30 m of range.
  • LTE smartwatch: Provides even greater connectivity without needing your phone nearby.

Conclusion

In summary, there are a few key things you can do to help ensure your time on-call goes as smoothly as possible.

  • Set up thoroughly: Ensure all your notifications, hotspot, and software are ready before going on-call.
  • Prepare for late-night alerts: Use tools like smart bands and tailored notifications to minimize disruptions.
  • Gear up: Have the right devices, such as a smartwatch, to stay connected even when your phone isn’t accessible.

Remember: this is just a list of things that work for me. Everyone is different, so pick the points that suit you and tune (or discard) the items that don’t. On-call does not have to be stressful or limiting in any way, especially during the holidays. If you prepare yourself well enough, you’ll be less worried about it. And most importantly, when an incident happens, take a deep breath and don’t worry. You've got this!