The art of debugging: a guide to squashing bugs efficiently

Being able to debug your code is one of the most important skills in software development. Finding the root cause of a bug that’s plaguing your clients can be a frustrating undertaking, but at the same time it’s a necessary and sometimes even rewarding practice.

Debugging applications can force you off the beaten path and give you insights into your application, programming language, framework, etc. that you would never have discovered otherwise. However, it can be hard to know where you should start your search.

In this article I have written down my own thought process whenever I encounter a bug in my work as a software developer, in the hopes that someone might benefit from reading it.

Reproduce the problem

The first step in any bug-hunting process should be trying to reproduce the problem. Not only does this make determining the cause a whole lot easier, but being able to reproduce the issue also means you can actually test whether it’s fixed.

A good rule of thumb is that a bug that happens every time (and thus is easy to reproduce) is usually also easy to fix. But when you hear the word ‘sometimes’, you should make yourself a cup of coffee and put on your thinking cap.

The more challenging bugs I have encountered were ones where I spent more time trying to reproduce the issue than actually solving it. And the hardest bugs to solve are the ones you can’t reproduce.

This is mainly why logging is so important. When you have proper logging, you can retrace the steps that led up to the bug, which makes reproducing it a whole lot easier.
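To make ‘proper logging’ a bit more concrete, here is a minimal sketch in Python using the standard logging module. The function save_profile_picture, the request ID field and the in-memory STORAGE dictionary are all assumptions made up for illustration. The point is only that every step is logged with enough context (who, what, which request) to retrace it afterwards.

```python
import logging
import uuid

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
logger = logging.getLogger("profile")

# Hypothetical in-memory stand-in for real file storage.
STORAGE = {}


def save_profile_picture(user_id, filename, data, request_id=None):
    """Store an uploaded picture and log every step with a request ID."""
    request_id = request_id or uuid.uuid4().hex
    logger.info(
        "request=%s user=%s upload received file=%s bytes=%d",
        request_id, user_id, filename, len(data),
    )
    if not data:
        # Log the unexpected case instead of silently reporting success.
        logger.warning(
            "request=%s user=%s empty upload rejected", request_id, user_id
        )
        return False
    STORAGE[user_id] = (filename, data)
    logger.info("request=%s user=%s picture stored", request_id, user_id)
    return True
```

With log lines like these, searching for a single request ID shows how far one particular upload got before things went wrong.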

The process of elimination

Once you have been able (or in the worst case unable) to reproduce the problem, the next step is to actually analyze the issue and to try and determine where it’s occurring.

Debugging is first and foremost a process of elimination. Finding where a bug is turns out to be a lot harder than finding where it is not. So, you should start by trying to eliminate large portions of your codebase as quickly as possible.

Let’s say your users are reporting that they’re trying to upload a profile picture but it’s not working. They select a valid image from their own computer, upload it and then click save, after which the application shows a message that the save action was successful. However, when they reload the page they see that there is still no visible profile picture.

The first thing to do here is to start following the flow of your data, starting from the outermost parts of your application. These are the parts where the flow of data ‘stops’. In the case of a web application (such as in the above example) this is your front-end on one side, and your database or file storage on the other.

So firstly, open your browser and check the network traffic. What gets sent to the server when the user saves the changes? Is the profile image included in the data that’s being sent? If not then we have already determined that the issue can be found in our client-side application, eliminating the entire back-end service.

But let’s say you see that the image is correctly sent to the back-end server. This means we can eliminate our client-side application as the source of our troubles, and continue with analyzing the flow of data into our back-end service.

Either way, we have already been able to narrow down the location of the bug significantly.

Moving to the back-end server, we start again at the extremities. So, we head over to the place where our data should end up. In this case, we should check the file storage and see if we can find the file there. Can’t find the file? Well then check the point where it enters the back-end. Found the file? Well then check the point where your back-end should be returning the data to the client.

The above steps took less than 5 minutes, and we have already eliminated most of our codebase as the cause of the bug. In each situation we narrowed it down to a smaller, manageable part of our larger application. And keep in mind, we did all that without starting the debugger or (if that’s more your kind of thing) writing tons of console.log() statements.
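The elimination steps above can be summarized as a small decision function. This is only a sketch in Python: the function name and the three yes/no observations are my own shorthand for the checks described in the example, and the final branch (everything arrives correctly) is an assumption that goes one step beyond what the example covers.

```python
def suspect_area(image_in_request, file_in_storage, image_in_response):
    """Map three boundary observations to the part of the system to inspect next."""
    if not image_in_request:
        # The upload never left the browser.
        return "client-side: building and sending the request"
    if not file_in_storage:
        # The server received the image but never persisted it.
        return "back-end: from the HTTP endpoint to file storage"
    if not image_in_response:
        # The file exists but the server does not hand it back.
        return "back-end: reading the picture and returning it to the client"
    # Assumption: if all boundaries look fine, the display code is next.
    return "client-side: rendering the returned picture"
```

Each observation costs a minute or two to make (browser network tab, a look in the file storage, a look at the response), and each answer removes a large part of the application from the list of suspects.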

Follow the flow

The next step is taking the part of our codebase we narrowed it down to, and following the flow of our program.

Again, start at the extremities. In an HTTP API we start at the HTTP endpoint that should be called. Starting from that endpoint, go through the code step by step.

Take your time and make sure you understand exactly what’s happening in each step. Step into all the different methods that are being called, all the way to the end where our data should end up.

Debugging code takes time and concentration. Don’t rush yourself!

In the best case, you have been able to reproduce the issue on your local development environment. This means you can use the debugger of your favorite IDE to go through the code step-by-step and find the mistake in your code.

In the worst case you have been unable to reproduce the issue locally and you’re stuck reading code and trying to reason through it in your head. Did I mention how important proper logging is?
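When stepping through with a debugger is not an option, one way to still see the flow is to log when each function is entered and left. Below is a minimal sketch in Python. The traced decorator and the two example functions are hypothetical names, not part of any framework.

```python
import functools
import logging

logger = logging.getLogger("flow")


def traced(func):
    """Log arguments, return value and exceptions of the wrapped function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("enter %s args=%r kwargs=%r", func.__name__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # Record the stack trace, then let the error propagate unchanged.
            logger.exception("error in %s", func.__name__)
            raise
        logger.debug("exit %s result=%r", func.__name__, result)
        return result
    return wrapper


@traced
def normalize_filename(name):
    return name.strip().lower()


@traced
def handle_upload(name):
    return {"stored_as": normalize_filename(name)}
```

The messages only show up once the log level is set to DEBUG, for example with logging.basicConfig(level=logging.DEBUG). If you can reproduce the issue locally, Python’s built-in breakpoint() placed at the endpoint gives you the step-by-step debugger instead.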

Be suspicious

While you’re doing the above and you’re stepping through the code, you should constantly be on the lookout for things that seem odd to you.

When you’re reading a piece of code and somewhere in the back of your mind you feel something is off, then don’t ignore that feeling. Most of this ‘code-spider-sense’ is something that comes with experience. If you have a few years of writing code under your belt, it gets easier to spot ‘anomalies’.

As my own focus is on web application development, the things I am especially suspicious of are singletons. There’s nothing wrong with the singleton pattern in itself, but it does attract a lot of bugs. Especially in web applications where you don’t want shared state between your clients, a singleton is usually a dangerous thing to rely on. It’s one of the causes of those ‘sometimes’ bugs I mentioned earlier.
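To illustrate how a singleton can cause a ‘sometimes’ bug, here is a small sketch in Python. The CurrentUser class and the two request functions are invented for this example. The overlap between two requests is simulated by simply calling the functions in an interleaved order.

```python
class CurrentUser:
    """Hypothetical singleton that (wrongly) holds per-request state."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.name = None
        return cls._instance


def start_request(user_name):
    # Every request writes to the same shared object.
    CurrentUser().name = user_name


def finish_request():
    # Reads whatever the most recent request stored.
    return f"Saved profile for {CurrentUser().name}"


# Two requests that overlap in time:
start_request("alice")
start_request("bob")             # arrives before alice's request finishes
alice_result = finish_request()  # "Saved profile for bob"
```

With a single user testing locally the code works every time. The bug only shows up when two requests happen to overlap, which is exactly why it is so hard to reproduce.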

Other frequent offenders are wrongly mapped models and boolean logic that is ‘reversed’. However, these types of bugs exhibit consistent behavior and always occur when that specific process is run, so they’re of the easier variety to find and fix.
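As a tiny illustration of ‘reversed’ boolean logic, consider the Python sketch below. Both functions and their parameters are hypothetical. Note that the buggy version is wrong on every single call, which is what makes this kind of bug comparatively easy to track down.

```python
def can_edit_profile_buggy(is_owner, is_admin):
    # Bug: the condition is inverted, so owners and admins are locked out
    # while everybody else is allowed in.
    return not (is_owner or is_admin)


def can_edit_profile(is_owner, is_admin):
    # Intended rule: owners and admins may edit the profile.
    return is_owner or is_admin
```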

Obviously the above examples will be completely different if you write embedded software, but the core message remains true: if you feel suspicious about a piece of code, pay extra attention to it!

Test your fix

This might seem obvious, but I have seen cases where supposedly the issue was found and fixed, but after deploying to production the bug still persisted.

A good habit is to write unit or integration tests for each bug you discover and fix. The idea is that you already have an automated testing suite in place, and that a bug that somehow escapes detection indicates your tests aren’t good enough!

The best way to approach debugging is actually to write a unit test that reproduces the bug and fails, and then change your code until the bug is fixed and the test passes. This way you make sure that the same bug will be caught right away if it ever comes back.
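Here is a minimal sketch of such a test in Python, using the standard unittest module. The attach_picture function is a hypothetical stand-in for the code that had the bug; it is shown in its fixed form. Against a broken version that forgets to set the picture, the first test would fail, which is exactly the starting point you want.

```python
import unittest


def attach_picture(profile, filename):
    """Hypothetical fixed version: returns a profile that includes the picture."""
    updated = dict(profile)
    updated["picture"] = filename
    return updated


class ProfilePictureRegressionTest(unittest.TestCase):
    def test_saved_profile_contains_picture(self):
        # Reproduces the reported bug: the picture was missing after saving.
        profile = attach_picture({"name": "alice"}, "avatar.png")
        self.assertEqual(profile["picture"], "avatar.png")

    def test_original_profile_is_not_modified(self):
        # Guards against a fix that mutates the caller's data as a side effect.
        original = {"name": "alice"}
        attach_picture(original, "avatar.png")
        self.assertNotIn("picture", original)
```

Saved in a test file, this can be run with python -m unittest and added to the existing test suite, so the check runs on every change from then on.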

Also (if possible), a quick ‘manual’ test of a fixed bug can give you some extra peace-of-mind.