25 votes

Code Quality Tip: Cyclomatic complexity in depth.

Posted March 1, 2019 by Emerald_Knight

Tags: programming.code quality tips, coding, technical debt, code quality, refactoring, recurring.infrequent

Preface

Recently I briefly touched on the subject of cyclomatic complexity. This is an important concept for any programmer to understand and think about as they write their code. In order to provide a more solid understanding of the subject, however, I feel that I need to address the topic more thoroughly with a more practical example.

What is cyclomatic complexity?

The concept of "cyclomatic complexity" is simple: the more conditional branching and looping in your code, the more complex--and therefore the more difficult to maintain--that code is. We can visualize this complexity by drawing a diagram that illustrates the flow of logic in our program. For example, let's take the following toy example of a user login attempt:

<?php

$login_data = getLoginCredentialsFromInput();

$login_succeeded = false;
$error = '';
if(usernameExists($login_data['username'])) {
    $user = getUser($login_data['username']);
    
    if(!isDeleted($user)) {
        if(!isBanned($user)) {
            if(!loginRateLimitReached($user)) {
                if(passwordMatches($user, $login_data['password'])) {
                    loginUser($user);
                    $login_succeeded = true;
                } else {
                    $error = getBadPasswordError();
                    logBadLoginAttempt();
                }
            } else {
                $error = getLoginRateLimitError($user);
            }
        } else {
            $error = getUserBannedError($user);
        }
    } else {
        $error = getUserDeletedError($user);
    }
} else {
    $error = getBadUsernameError($login_data['username']);
}

if($login_succeeded) {
    sendSuccessResponse();
} else {
    sendErrorResponse($error);
}

?>

A diagram for this logic might look something like this:

+-----------------+
|                 |
|  Program Start  |
|                 |
+--------+--------+
         |
         |
         v
+--------+--------+    +-----------------+
|                 |    |                 |
|    Username     +--->+    Set Error    +--+
|    Exists?      | No |                 |  |
|                 |    +-----------------+  |
+--------+--------+                         |
         |                                  |
     Yes |                                  |
         v                                  |
+--------+--------+    +-----------------+  |
|                 |    |                 |  |
|  User Deleted?  +--->+    Set Error    +->+
|                 | Yes|                 |  |
+--------+--------+    +-----------------+  |
         |                                  |
      No |                                  |
         v                                  |
+--------+--------+    +-----------------+  |
|                 |    |                 |  |
|  User Banned?   +--->+    Set Error    +->+
|                 | Yes|                 |  |
+--------+--------+    +-----------------+  |
         |                                  |
      No |                                  |
         v                                  |
+--------+--------+    +-----------------+  |
|                 |    |                 |  |
|   Login Rate    +--->+    Set Error    +->+
| Limit Reached?  | Yes|                 |  |
|                 |    +-----------------+  |
+--------+--------+                         |
         |                                  |
      No |                                  |
         v                                  |
+--------+--------+    +-----------------+  |
|                 |    |                 |  |
|Password Matches?+--->+    Set Error    +->+
|                 | No |                 |  |
+--------+--------+    +-----------------+  |
         |                                  |
     Yes |                                  |
         v                                  |
+--------+--------+    +----------+         |
|                 |    |          |         |
|   Login User    +--->+ Converge +<--------+
|                 |    |          |
+-----------------+    +---+------+
                           |
                           |
         +-----------------+
         |
         v
+--------+--------+
|                 |
|   Succeeded?    +-------------+
|                 | No          |
+--------+--------+             |
         |                      |
     Yes |                      |
         v                      v
+--------+--------+    +--------+--------+
|                 |    |                 |
|  Send Success   |    |   Send Error    |
|    Message      |    |    Message      |
|                 |    |                 |
+-----------------+    +-----------------+

It's important to note that between nodes in this directed graph, you can find certain enclosed regions being formed. Specifically, each conditional branch that converges back into the main line of execution generates an additional region. The number of these distinct enclosed regions is directly proportional to the level of cyclomatic complexity of the system--that is, more regions means more complicated code.

Clocking out early.

There's an important piece of information I noted when describing the above example:

. . . each conditional branch that converges back into the main line of execution generates an additional region.

The above example is made complex largely due to an attempt to create a single exit point at the end of the program logic, causing these conditional branches to converge and thus generate the additional enclosed regions within our diagram.

But what if we stopped trying to converge back into the main line of execution? What if, instead, we decided to interrupt the program execution as soon as we encountered an error? Our code might look something like this:

<?php

$login_data = getLoginCredentialsFromInput();

if(!usernameExists($login_data['username'])) {
    sendErrorResponse(getBadUsernameError($login_data['username']));
    return;
}

$user = getUser($login_data['username']);
if(isDeleted($user)) {
    sendErrorResponse(getUserDeletedError($user));
    return;
}

if(isBanned($user)) {
    sendErrorResponse(getUserBannedError($user));
    return;
}

if(loginRateLimitReached($user)) {
    logBadLoginAttempt($user);
    sendErrorResponse(getLoginRateLimitError($user));
    return;
}

if(!passwordMatches($user, $login_data['password'])) {
    logBadLoginAttempt($user);
    sendErrorResponse(getBadPasswordError());
    return;
}

loginUser($user);
sendSuccessResponse();

?>

Before we've even constructed a diagram for this logic, we can already see just how much simpler this logic is. We don't need to traverse a tree of if statements to determine which error message has priority to be sent out, we don't need to attempt to follow indentation levels, and our behavior on success is right at the very end and at the lowest level of indentation, where it's easily and obviously located at a glance.

Now, however, let's verify this reduction in complexity by examining the associated diagram:

+-----------------+
|                 |
|  Program Start  |
|                 |
+--------+--------+
         |
         |
         v
+--------+--------+    +-----------------+
|                 |    |                 |
|    Username     +--->+   Send Error    |
|    Exists?      | No |    Message      |
|                 |    |                 |
+--------+--------+    +-----------------+
         |
     Yes |
         v
+--------+--------+    +-----------------+
|                 |    |                 |
|  User Deleted?  +--->+   Send Error    |
|                 | Yes|    Message      |
+--------+--------+    |                 |
         |             +-----------------+
      No |
         v
+--------+--------+    +-----------------+
|                 |    |                 |
|  User Banned?   +--->+   Send Error    |
|                 | Yes|    Message      |
+--------+--------+    |                 |
         |             +-----------------+
      No |
         v
+--------+--------+    +-----------------+
|                 |    |                 |
|   Login Rate    +--->+   Send Error    |
| Limit Reached?  | Yes|    Message      |
|                 |    |                 |
+--------+--------+    +-----------------+
         |
      No |
         v
+--------+--------+    +-----------------+
|                 |    |                 |
|Password Matches?+--->+   Send Error    |
|                 | No |    Message      |
+--------+--------+    |                 |
         |             +-----------------+
     Yes |
         v
+--------+--------+
|                 |
|   Login User    |
|                 |
+--------+--------+
         |
         |
         v
+--------+--------+
|                 |
|  Send Success   |
|    Message      |
|                 |
+-----------------+

Something should immediately stand out here: there are no enclosed regions in this diagram! Furthermore, even our new diagram is much simpler to follow than the old one was.

Reality is rarely simple.

The above is a really forgiving example. It has no loops, and loops are going to create enclosed regions that can't be broken apart so easily; it has no conditional branches that are so tightly coupled with the main path of execution that they can't be broken up; and the scope of functionality and side effects are minimal. Sometimes you can't break those regions up. So what do we do when we inevitably encounter these cases?

High cyclomatic complexity in your program as a whole is inevitable for sufficiently large projects, especially in a production environment, and your efforts to reduce it can only go so far. In fact, I don't recommend trying to remove all or even most instances of cyclomatic complexity at all--instead, you should just be keeping the concept in mind to determine whether or not a function, method, class, module, or other component of your system is accumulating technical debt and therefore in need of refactoring.

At this point, astute readers might ask, "How does refactoring help if the cyclomatic complexity doesn't actually go away?", and this is a valid concern. The answer to that is simple, however: we're hiding complexity behind abstractions.

To test this, let's forget about cyclomatic complexity for a moment and instead focus on simplifying the refactored version of our toy example using abstraction:

<?php

function handleLoginAttempt($login_data) {
    if(!usernameExists($login_data['username'])) {
        sendErrorResponse(getBadUsernameError($login_data['username']));
        return;
    }

    $user = getUser($login_data['username']);
    if(isDeleted($user)) {
        sendErrorResponse(getUserDeletedError($user));
        return;
    }

    if(isBanned($user)) {
        sendErrorResponse(getUserBannedError($user));
        return;
    }

    if(loginRateLimitReached($user)) {
        logBadLoginAttempt($user);
        sendErrorResponse(getLoginRateLimitError($user));
        return;
    }

    if(!passwordMatches($user, $login_data['password'])) {
        logBadLoginAttempt($user);
        sendErrorResponse(getBadPasswordError());
        return;
    }

    loginUser($user);
    sendSuccessResponse();
}

$login_data = getLoginCredentialsFromInput();

handleLoginAttempt($login_data);

?>

The code above is functionally identical to our refactored example from earlier, but has an additional abstraction via a function. Now we can diagram this higher-level abstraction as follows:

+-----------------+
|                 |
|  Program Start  |
|                 |
+--------+--------+
         |
         |
         v
+--------+--------+
|                 |
|  Attempt Login  |
|                 |
+-----------------+

This is, of course, a pretty extreme example, but this is how we handle thinking about complex program logic. We abstract it down to the barest basics so that we can visualize, in its simplest form, what the program is supposed to do. We don't actually care about the implementation unless we're digging into that specific part of the system, because otherwise we would be so bogged down by the details that we wouldn't be able to reason about what our program is supposed to do.

Likewise, we can use these abstractions to hide away the cyclomatic complexity underlying different components of our software. This keeps everything clean and clutter-free in our head. And the more we do to keep our smaller components simple and easy to think about, the easier the larger components are to deal with, no matter how much cyclomatic complexity all of those components share as a collective.

Final Thoughts

Cyclomatic complexity isn't a bad thing to have in your code. The concept itself is only intended to be used as one of many tools to assess when your code is accumulating too much technical debt. It's a warning sign that you may need to change something, nothing more. But it's an incredibly useful tool to have available to you and you should get comfortable using it.

As a general rule of thumb, you can usually just take a glance at your code and assess whether or not there's too much cyclomatic complexity in a component by looking for either of the following:

Too many loops and/or conditional statements nested within each other, i.e. you have a lot of indentation.
Many loops in the same function/method.

It's not a perfect rule of thumb, but it's useful for at least 90% of your development needs, and there will inevitably be cases where you will prefer to accept some greater cyclomatic complexity because there is some benefit that makes it a better trade-off. Making that judgment is up to you as a developer.

As always, I'm more than willing to listen to feedback and answer any questions!

5 comments

[2]
meghan
March 2, 2019
Link
Awesome write up! It closely hearkens to an adage I read some time ago that went "Fail fast and fail often". The less time the program spends running, the less time it has to break haha

Awesome write up! It closely hearkens to an adage I read some time ago that went "Fail fast and fail often". The less time the program spends running, the less time it has to break haha

1 vote
1. Emerald_Knight (OP)
  March 2, 2019
  Link Parent
  Thanks! And yeah, you have a point there! Have it do only what it has to do, then have it get the hell out of the way so there are fewer opportunities for mistakes to be made. In fact, depending...
  
  Thanks! And yeah, you have a point there! Have it do only what it has to do, then have it get the hell out of the way so there are fewer opportunities for mistakes to be made.
  
  In fact, depending on the piece of software, sometimes it may make more sense to not handle the exceptional cases in your main path of execution and to instead allow them to fail "silently", while a separate process picks up any of the failed tasks for cleanup. This can allow for additional resilience in your code when unexpected, unhandled exceptions are thrown, giving your main path of execution a way of avoiding unexpected repeated instances of an otherwise blocking failure while the cleanup process reports on which tasks failed along with some debugging information. For example, marking a task as "in progress" prior to starting will prevent it from being picked back up if an exception is thrown, and the cleanup process only needs to find tasks that have been "in progress" for a period of time exceeding some maximum lifetime (i.e. tasks that must have failed).
  
  There are so many different architectural approaches to failures, each with their own pros and cons, that I could easily write up a series of posts just covering them alone.
  
  But I'm rambling, so I'm going to stop here before this comment becomes an essay :)
  
  1 vote
[2]
vakieh
March 2, 2019
Link
Both of these versions are 100% exactly the same level of complexity, once you recognise what 'return' is. It doesn't mean code magically stops - it is a convergence in and of itself. What it...

Both of these versions are 100% exactly the same level of complexity, once you recognise what 'return' is. It doesn't mean code magically stops - it is a convergence in and of itself. What it means is you abstract some of that complexity (and management of the convergence) off to some other layer (i.e. the compiler).

This is often a good thing, but it isn't always, because sometimes you need to react in particular ways to convergence where 'return' abstracts your ability to do so. Often the way people do that is with WORSE answers like globals/singletons to track state, because they religiously follow the above.

Like all things, it's important to understand what is going on rather than do it because it looks better or because something told you to.

1 vote
1. Emerald_Knight (OP)
  March 2, 2019
  Link Parent
  This is fair. This does, however, circle back to my point about using abstractions to hide away this complexity. I think one of the key takeaways here is that convergence is primarily a problem...
  
  It doesn't mean code magically stops - it is a convergence in and of itself. What it means is you abstract some of that complexity (and management of the convergence) off to some other layer (i.e. the compiler).
  
  This is fair. This does, however, circle back to my point about using abstractions to hide away this complexity. I think one of the key takeaways here is that convergence is primarily a problem when it interferes with your conceptual model of the problem, as in the case of the diagrams, whereas underlying implementation details are only really a secondary consideration.
  
  This is often a good thing, but it isn't always, because sometimes you need to react in particular ways to convergence where 'return' abstracts your ability to do so. Often the way people do that is with WORSE answers like globals/singletons to track state, because they religiously follow the above.
  
  This is also fair. Albeit briefly, I did mention at the end that there are cases in which you wouldn't want to make this reduction in cyclomatic complexity precisely because there is some alternative benefit for maintaining the structure. In those cases, it's far better to focus instead on breaking up your code into smaller components so it's easier to manage, rather than trying to shoehorn this sort of refactor.
  
  Like all things, it's important to understand what is going on rather than do it because it looks better or because something told you to.
  
  I completely agree. I want to stress again that cyclomatic complexity as a concept is only a tool, and should be one of many tools available to you, rather than the only tool in your arsenal. You don't want to use a hammer when what you need is a screwdriver, after all. Code that is simple, predictable, and easy to maintain takes precedence over everything. Even code that needs to be made more complicated for the sake of improving performance can still be written in a way that isn't complete garbage.
  
  Overall, we're both definitely on the same page here :)
  
  2 votes
lazer
March 4, 2019
Link
Very interesting and well written, thank you for sharing.

Very interesting and well written, thank you for sharing.

1 vote