Preface Recently I briefly touched on the subject of cyclomatic complexity. This is an important concept for any programmer to understand and think about as they write their code. In order to...
Preface
Recently I briefly touched on the subject of cyclomatic complexity. This is an important concept for any programmer to understand and think about as they write their code. In order to provide a more solid understanding of the subject, however, I feel that I need to address the topic more thoroughly with a more practical example.
What is cyclomatic complexity?
The concept of "cyclomatic complexity" is simple: the more conditional branching and looping in your code, the more complex--and therefore the more difficult to maintain--that code is. We can visualize this complexity by drawing a diagram that illustrates the flow of logic in our program. For example, let's take the following toy example of a user login attempt:
<?php
$login_data = getLoginCredentialsFromInput();
$login_succeeded = false;
$error = '';
if(usernameExists($login_data['username'])) {
$user = getUser($login_data['username']);
if(!isDeleted($user)) {
if(!isBanned($user)) {
if(!loginRateLimitReached($user)) {
if(passwordMatches($user, $login_data['password'])) {
loginUser($user);
$login_succeeded = true;
} else {
$error = getBadPasswordError();
logBadLoginAttempt();
}
} else {
$error = getLoginRateLimitError($user);
}
} else {
$error = getUserBannedError($user);
}
} else {
$error = getUserDeletedError($user);
}
} else {
$error = getBadUsernameError($login_data['username']);
}
if($login_succeeded) {
sendSuccessResponse();
} else {
sendErrorResponse($error);
}
?>
A diagram for this logic might look something like this:
+-----------------+
| |
| Program Start |
| |
+--------+--------+
|
|
v
+--------+--------+ +-----------------+
| | | |
| Username +--->+ Set Error +--+
| Exists? | No | | |
| | +-----------------+ |
+--------+--------+ |
| |
Yes | |
v |
+--------+--------+ +-----------------+ |
| | | | |
| User Deleted? +--->+ Set Error +->+
| | Yes| | |
+--------+--------+ +-----------------+ |
| |
No | |
v |
+--------+--------+ +-----------------+ |
| | | | |
| User Banned? +--->+ Set Error +->+
| | Yes| | |
+--------+--------+ +-----------------+ |
| |
No | |
v |
+--------+--------+ +-----------------+ |
| | | | |
| Login Rate +--->+ Set Error +->+
| Limit Reached? | Yes| | |
| | +-----------------+ |
+--------+--------+ |
| |
No | |
v |
+--------+--------+ +-----------------+ |
| | | | |
|Password Matches?+--->+ Set Error +->+
| | No | | |
+--------+--------+ +-----------------+ |
| |
Yes | |
v |
+--------+--------+ +----------+ |
| | | | |
| Login User +--->+ Converge +<--------+
| | | |
+-----------------+ +---+------+
|
|
+-----------------+
|
v
+--------+--------+
| |
| Succeeded? +-------------+
| | No |
+--------+--------+ |
| |
Yes | |
v v
+--------+--------+ +--------+--------+
| | | |
| Send Success | | Send Error |
| Message | | Message |
| | | |
+-----------------+ +-----------------+
It's important to note that between nodes in this directed graph, you can find certain enclosed regions being formed. Specifically, each conditional branch that converges back into the main line of execution generates an additional region. The number of these distinct enclosed regions is directly proportional to the level of cyclomatic complexity of the system--that is, more regions means more complicated code.
Clocking out early.
There's an important piece of information I noted when describing the above example:
. . . each conditional branch that converges back into the main line of execution generates an additional region.
The above example is made complex largely due to an attempt to create a single exit point at the end of the program logic, causing these conditional branches to converge and thus generate the additional enclosed regions within our diagram.
But what if we stopped trying to converge back into the main line of execution? What if, instead, we decided to interrupt the program execution as soon as we encountered an error? Our code might look something like this:
<?php
$login_data = getLoginCredentialsFromInput();
if(!usernameExists($login_data['username'])) {
sendErrorResponse(getBadUsernameError($login_data['username']));
return;
}
$user = getUser($login_data['username']);
if(isDeleted($user)) {
sendErrorResponse(getUserDeletedError($user));
return;
}
if(isBanned($user)) {
sendErrorResponse(getUserBannedError($user));
return;
}
if(loginRateLimitReached($user)) {
logBadLoginAttempt($user);
sendErrorResponse(getLoginRateLimitError($user));
return;
}
if(!passwordMatches($user, $login_data['password'])) {
logBadLoginAttempt($user);
sendErrorResponse(getBadPasswordError());
return;
}
loginUser($user);
sendSuccessResponse();
?>
Before we've even constructed a diagram for this logic, we can already see just how much simpler this logic is. We don't need to traverse a tree of if statements to determine which error message has priority to be sent out, we don't need to attempt to follow indentation levels, and our behavior on success is right at the very end and at the lowest level of indentation, where it's easily and obviously located at a glance.
Now, however, let's verify this reduction in complexity by examining the associated diagram:
+-----------------+
| |
| Program Start |
| |
+--------+--------+
|
|
v
+--------+--------+ +-----------------+
| | | |
| Username +--->+ Send Error |
| Exists? | No | Message |
| | | |
+--------+--------+ +-----------------+
|
Yes |
v
+--------+--------+ +-----------------+
| | | |
| User Deleted? +--->+ Send Error |
| | Yes| Message |
+--------+--------+ | |
| +-----------------+
No |
v
+--------+--------+ +-----------------+
| | | |
| User Banned? +--->+ Send Error |
| | Yes| Message |
+--------+--------+ | |
| +-----------------+
No |
v
+--------+--------+ +-----------------+
| | | |
| Login Rate +--->+ Send Error |
| Limit Reached? | Yes| Message |
| | | |
+--------+--------+ +-----------------+
|
No |
v
+--------+--------+ +-----------------+
| | | |
|Password Matches?+--->+ Send Error |
| | No | Message |
+--------+--------+ | |
| +-----------------+
Yes |
v
+--------+--------+
| |
| Login User |
| |
+--------+--------+
|
|
v
+--------+--------+
| |
| Send Success |
| Message |
| |
+-----------------+
Something should immediately stand out here: there are no enclosed regions in this diagram! Furthermore, even our new diagram is much simpler to follow than the old one was.
Reality is rarely simple.
The above is a really forgiving example. It has no loops, and loops are going to create enclosed regions that can't be broken apart so easily; it has no conditional branches that are so tightly coupled with the main path of execution that they can't be broken up; and the scope of functionality and side effects are minimal. Sometimes you can't break those regions up. So what do we do when we inevitably encounter these cases?
High cyclomatic complexity in your program as a whole is inevitable for sufficiently large projects, especially in a production environment, and your efforts to reduce it can only go so far. In fact, I don't recommend trying to remove all or even most instances of cyclomatic complexity at all--instead, you should just be keeping the concept in mind to determine whether or not a function, method, class, module, or other component of your system is accumulating technical debt and therefore in need of refactoring.
At this point, astute readers might ask, "How does refactoring help if the cyclomatic complexity doesn't actually go away?", and this is a valid concern. The answer to that is simple, however: we're hiding complexity behind abstractions.
To test this, let's forget about cyclomatic complexity for a moment and instead focus on simplifying the refactored version of our toy example using abstraction:
<?php
function handleLoginAttempt($login_data) {
if(!usernameExists($login_data['username'])) {
sendErrorResponse(getBadUsernameError($login_data['username']));
return;
}
$user = getUser($login_data['username']);
if(isDeleted($user)) {
sendErrorResponse(getUserDeletedError($user));
return;
}
if(isBanned($user)) {
sendErrorResponse(getUserBannedError($user));
return;
}
if(loginRateLimitReached($user)) {
logBadLoginAttempt($user);
sendErrorResponse(getLoginRateLimitError($user));
return;
}
if(!passwordMatches($user, $login_data['password'])) {
logBadLoginAttempt($user);
sendErrorResponse(getBadPasswordError());
return;
}
loginUser($user);
sendSuccessResponse();
}
$login_data = getLoginCredentialsFromInput();
handleLoginAttempt($login_data);
?>
The code above is functionally identical to our refactored example from earlier, but has an additional abstraction via a function. Now we can diagram this higher-level abstraction as follows:
+-----------------+
| |
| Program Start |
| |
+--------+--------+
|
|
v
+--------+--------+
| |
| Attempt Login |
| |
+-----------------+
This is, of course, a pretty extreme example, but this is how we handle thinking about complex program logic. We abstract it down to the barest basics so that we can visualize, in its simplest form, what the program is supposed to do. We don't actually care about the implementation unless we're digging into that specific part of the system, because otherwise we would be so bogged down by the details that we wouldn't be able to reason about what our program is supposed to do.
Likewise, we can use these abstractions to hide away the cyclomatic complexity underlying different components of our software. This keeps everything clean and clutter-free in our head. And the more we do to keep our smaller components simple and easy to think about, the easier the larger components are to deal with, no matter how much cyclomatic complexity all of those components share as a collective.
Final Thoughts
Cyclomatic complexity isn't a bad thing to have in your code. The concept itself is only intended to be used as one of many tools to assess when your code is accumulating too much technical debt. It's a warning sign that you may need to change something, nothing more. But it's an incredibly useful tool to have available to you and you should get comfortable using it.
As a general rule of thumb, you can usually just take a glance at your code and assess whether or not there's too much cyclomatic complexity in a component by looking for either of the following:
- Too many loops and/or conditional statements nested within each other, i.e. you have a lot of indentation.
- Many loops in the same function/method.
It's not a perfect rule of thumb, but it's useful for at least 90% of your development needs, and there will inevitably be cases where you will prefer to accept some greater cyclomatic complexity because there is some benefit that makes it a better trade-off. Making that judgment is up to you as a developer.
As always, I'm more than willing to listen to feedback and answer any questions!