-
12 votes
-
We fired our top talent. Best decision we ever made.
28 votes -
Code Quality Tip: Cyclomatic complexity in depth.
Preface Recently I briefly touched on the subject of cyclomatic complexity. This is an important concept for any programmer to understand and think about as they write their code. In order to...
Preface
Recently I briefly touched on the subject of cyclomatic complexity. This is an important concept for any programmer to understand and think about as they write their code. In order to provide a more solid understanding of the subject, however, I feel that I need to address the topic more thoroughly with a more practical example.
What is cyclomatic complexity?
The concept of "cyclomatic complexity" is simple: the more conditional branching and looping in your code, the more complex--and therefore the more difficult to maintain--that code is. We can visualize this complexity by drawing a diagram that illustrates the flow of logic in our program. For example, let's take the following toy example of a user login attempt:
<?php $login_data = getLoginCredentialsFromInput(); $login_succeeded = false; $error = ''; if(usernameExists($login_data['username'])) { $user = getUser($login_data['username']); if(!isDeleted($user)) { if(!isBanned($user)) { if(!loginRateLimitReached($user)) { if(passwordMatches($user, $login_data['password'])) { loginUser($user); $login_succeeded = true; } else { $error = getBadPasswordError(); logBadLoginAttempt(); } } else { $error = getLoginRateLimitError($user); } } else { $error = getUserBannedError($user); } } else { $error = getUserDeletedError($user); } } else { $error = getBadUsernameError($login_data['username']); } if($login_succeeded) { sendSuccessResponse(); } else { sendErrorResponse($error); } ?>
A diagram for this logic might look something like this:
+-----------------+ | | | Program Start | | | +--------+--------+ | | v +--------+--------+ +-----------------+ | | | | | Username +--->+ Set Error +--+ | Exists? | No | | | | | +-----------------+ | +--------+--------+ | | | Yes | | v | +--------+--------+ +-----------------+ | | | | | | | User Deleted? +--->+ Set Error +->+ | | Yes| | | +--------+--------+ +-----------------+ | | | No | | v | +--------+--------+ +-----------------+ | | | | | | | User Banned? +--->+ Set Error +->+ | | Yes| | | +--------+--------+ +-----------------+ | | | No | | v | +--------+--------+ +-----------------+ | | | | | | | Login Rate +--->+ Set Error +->+ | Limit Reached? | Yes| | | | | +-----------------+ | +--------+--------+ | | | No | | v | +--------+--------+ +-----------------+ | | | | | | |Password Matches?+--->+ Set Error +->+ | | No | | | +--------+--------+ +-----------------+ | | | Yes | | v | +--------+--------+ +----------+ | | | | | | | Login User +--->+ Converge +<--------+ | | | | +-----------------+ +---+------+ | | +-----------------+ | v +--------+--------+ | | | Succeeded? +-------------+ | | No | +--------+--------+ | | | Yes | | v v +--------+--------+ +--------+--------+ | | | | | Send Success | | Send Error | | Message | | Message | | | | | +-----------------+ +-----------------+
It's important to note that between nodes in this directed graph, you can find certain enclosed regions being formed. Specifically, each conditional branch that converges back into the main line of execution generates an additional region. The number of these distinct enclosed regions is directly proportional to the level of cyclomatic complexity of the system--that is, more regions means more complicated code.
Clocking out early.
There's an important piece of information I noted when describing the above example:
. . . each conditional branch that converges back into the main line of execution generates an additional region.
The above example is made complex largely due to an attempt to create a single exit point at the end of the program logic, causing these conditional branches to converge and thus generate the additional enclosed regions within our diagram.
But what if we stopped trying to converge back into the main line of execution? What if, instead, we decided to interrupt the program execution as soon as we encountered an error? Our code might look something like this:
<?php $login_data = getLoginCredentialsFromInput(); if(!usernameExists($login_data['username'])) { sendErrorResponse(getBadUsernameError($login_data['username'])); return; } $user = getUser($login_data['username']); if(isDeleted($user)) { sendErrorResponse(getUserDeletedError($user)); return; } if(isBanned($user)) { sendErrorResponse(getUserBannedError($user)); return; } if(loginRateLimitReached($user)) { logBadLoginAttempt($user); sendErrorResponse(getLoginRateLimitError($user)); return; } if(!passwordMatches($user, $login_data['password'])) { logBadLoginAttempt($user); sendErrorResponse(getBadPasswordError()); return; } loginUser($user); sendSuccessResponse(); ?>
Before we've even constructed a diagram for this logic, we can already see just how much simpler this logic is. We don't need to traverse a tree of if statements to determine which error message has priority to be sent out, we don't need to attempt to follow indentation levels, and our behavior on success is right at the very end and at the lowest level of indentation, where it's easily and obviously located at a glance.
Now, however, let's verify this reduction in complexity by examining the associated diagram:
+-----------------+ | | | Program Start | | | +--------+--------+ | | v +--------+--------+ +-----------------+ | | | | | Username +--->+ Send Error | | Exists? | No | Message | | | | | +--------+--------+ +-----------------+ | Yes | v +--------+--------+ +-----------------+ | | | | | User Deleted? +--->+ Send Error | | | Yes| Message | +--------+--------+ | | | +-----------------+ No | v +--------+--------+ +-----------------+ | | | | | User Banned? +--->+ Send Error | | | Yes| Message | +--------+--------+ | | | +-----------------+ No | v +--------+--------+ +-----------------+ | | | | | Login Rate +--->+ Send Error | | Limit Reached? | Yes| Message | | | | | +--------+--------+ +-----------------+ | No | v +--------+--------+ +-----------------+ | | | | |Password Matches?+--->+ Send Error | | | No | Message | +--------+--------+ | | | +-----------------+ Yes | v +--------+--------+ | | | Login User | | | +--------+--------+ | | v +--------+--------+ | | | Send Success | | Message | | | +-----------------+
Something should immediately stand out here: there are no enclosed regions in this diagram! Furthermore, even our new diagram is much simpler to follow than the old one was.
Reality is rarely simple.
The above is a really forgiving example. It has no loops, and loops are going to create enclosed regions that can't be broken apart so easily; it has no conditional branches that are so tightly coupled with the main path of execution that they can't be broken up; and the scope of functionality and side effects are minimal. Sometimes you can't break those regions up. So what do we do when we inevitably encounter these cases?
High cyclomatic complexity in your program as a whole is inevitable for sufficiently large projects, especially in a production environment, and your efforts to reduce it can only go so far. In fact, I don't recommend trying to remove all or even most instances of cyclomatic complexity at all--instead, you should just be keeping the concept in mind to determine whether or not a function, method, class, module, or other component of your system is accumulating technical debt and therefore in need of refactoring.
At this point, astute readers might ask, "How does refactoring help if the cyclomatic complexity doesn't actually go away?", and this is a valid concern. The answer to that is simple, however: we're hiding complexity behind abstractions.
To test this, let's forget about cyclomatic complexity for a moment and instead focus on simplifying the refactored version of our toy example using abstraction:
<?php function handleLoginAttempt($login_data) { if(!usernameExists($login_data['username'])) { sendErrorResponse(getBadUsernameError($login_data['username'])); return; } $user = getUser($login_data['username']); if(isDeleted($user)) { sendErrorResponse(getUserDeletedError($user)); return; } if(isBanned($user)) { sendErrorResponse(getUserBannedError($user)); return; } if(loginRateLimitReached($user)) { logBadLoginAttempt($user); sendErrorResponse(getLoginRateLimitError($user)); return; } if(!passwordMatches($user, $login_data['password'])) { logBadLoginAttempt($user); sendErrorResponse(getBadPasswordError()); return; } loginUser($user); sendSuccessResponse(); } $login_data = getLoginCredentialsFromInput(); handleLoginAttempt($login_data); ?>
The code above is functionally identical to our refactored example from earlier, but has an additional abstraction via a function. Now we can diagram this higher-level abstraction as follows:
+-----------------+ | | | Program Start | | | +--------+--------+ | | v +--------+--------+ | | | Attempt Login | | | +-----------------+
This is, of course, a pretty extreme example, but this is how we handle thinking about complex program logic. We abstract it down to the barest basics so that we can visualize, in its simplest form, what the program is supposed to do. We don't actually care about the implementation unless we're digging into that specific part of the system, because otherwise we would be so bogged down by the details that we wouldn't be able to reason about what our program is supposed to do.
Likewise, we can use these abstractions to hide away the cyclomatic complexity underlying different components of our software. This keeps everything clean and clutter-free in our head. And the more we do to keep our smaller components simple and easy to think about, the easier the larger components are to deal with, no matter how much cyclomatic complexity all of those components share as a collective.
Final Thoughts
Cyclomatic complexity isn't a bad thing to have in your code. The concept itself is only intended to be used as one of many tools to assess when your code is accumulating too much technical debt. It's a warning sign that you may need to change something, nothing more. But it's an incredibly useful tool to have available to you and you should get comfortable using it.
As a general rule of thumb, you can usually just take a glance at your code and assess whether or not there's too much cyclomatic complexity in a component by looking for either of the following:
- Too many loops and/or conditional statements nested within each other, i.e. you have a lot of indentation.
- Many loops in the same function/method.
It's not a perfect rule of thumb, but it's useful for at least 90% of your development needs, and there will inevitably be cases where you will prefer to accept some greater cyclomatic complexity because there is some benefit that makes it a better trade-off. Making that judgment is up to you as a developer.
As always, I'm more than willing to listen to feedback and answer any questions!
25 votes -
An Alternative Approach to Configuration Management
Preface Different projects have different use cases that can ultimately result in common solutions not suiting your particular needs. Today I'm going to diverging a bit from my more abstract,...
Preface
Different projects have different use cases that can ultimately result in common solutions not suiting your particular needs. Today I'm going to diverging a bit from my more abstract, generalized topics on code quality and instead focus on a specific project structure example that I encountered.
Background
For a while now, I've found myself being continually frustrated with the state of my project configuration management. I had a single configuration file that would contain all of the configuration options for the various tools I've been using--database, API credentials, etc.--and I kept running into the problem of wanting to test these tools locally while not inadvertently committing and pushing sensitive credentials upstream. For me, part of my security process is ensuring that sensitive access credentials never make it into the repository and to limit access to these credentials to only people who need to be able to access them.
Monolithic Files Cause Monolithic Pain
The first thing I realized was that having a single monolithic configuration file was just terrible practice. There are going to be common configuration options that I want to have in there with default values, such as local database configuration pointing to a database instance running on the same VM as the application. These should always be in the repo, otherwise any dev who spins up an instance of the VM will need to manually tread documentation and copy-paste the missing options into the configuration. This would be incredibly time-consuming, inefficient, and stupid.
I also use different tools which have different configuration options associated with them. Having to dig through a single file containing configuration options for all of these tools to find the ones I need to modify is cumbersome at best. On top of that, having those common configuration options living in the same place that sensitive access credentials do is just asking for a rogue
git commit -A
to violate the aforementioned security protocol.
Same Problem, Different Structure
My first approach to resolving this problem was breaking the configuration out into separate files, one for each distinct tool. In each file, a "skeleton" config was generated, i.e. each option was given a default empty value. The main config would then only contain config options that are common and shared across the application. To avoid having the sensitive credentials leaked, I then created rules in the
.gitignore
to exclude these files.This is where I ran into problem #2. I learned that this just doesn't work. You can either have a file in your repo and have all changes to that file tracked, have the file in your repo and make a local-only change to prevent changes from being tracked, or leave the file out of the repo completely. In my use case, I wanted to be able to leave the file in the repo, treat it as ignored by everyone, and only commit changes to that file when there was a new configuration option I wanted added to it. Git doesn't support this use case whatsoever.
This problem turned out to be really common, but the solution suggested is to have two separate versions of your configuration--one for dev, and one for production--and to have a flag to switch between the two. Given the breaking up of my configuration, I would then need twice as many files to do this, and given my security practices, this would violate the no-upstream rule for sensitive credentials. Worse still, if I had several different kinds of environments with different configuration--local dev, staging, beta, production--then for
m
such environments andn
configuration files, I would need to maintainn*m
separate files for configuration alone. Finally, I would need to remember to include a prefix or postfix to each file name any time I needed to retrieve values from a new config file, which is itself an error-prone requirement. Overall, there would be a substantial increase in technical debt. In other words, this approach would not only not help, it would make matters worse!
Borrowing From Linux
After a lot of thought, an idea occurred to me: within Linux systems, there's an
/etc/skel/
directory that contains common files that are copied into a new user's home directory when that user is created, e.g..bashrc
and.profile
. You can make changes to these files and have them propagate to new users, or you can modify your own personal copy and leave all other new users unaffected. This sounds exactly like the kind of behavior I want to emulate!Following their example, I took my
$APPHOME/config/
directory and placed askel/
subdirectory inside, which then contained all of the config files with the empty default values within. My.gitignore
then looked something like this:$APPHOME/config/* !$APPHOME/config/main.php !$APPHOME/config/skel/ !$APPHOME/config/skel/* # This last one might not be necessary, but I don't care enough to test it without.
Finally, on deploying my local environment, I simply include a snippet in my script that enters the new
skel/
directory and copies any files inside intoconfig/
, as long as it doesn't already exist:cd $APPHOME/config/skel/ for filename in *; do if [ ! -f "$APPHOME/config/$filename" ]; then cp "$filename" "$APPHOME/config/$filename" fi done
(Note: production environments have a slightly different deployment procedure, as local copies of these config files are saved within a shared directory for all releases to point to via symlink.)
All of these changes ensure that only
config/main.php
and the files contained withinconfig/skel/
are whitelisted, while all others are ignored, i.e. our local copies that get stored withinconfig/
won't be inadvertently committed and pushed upstream!
Final Thoughts
Common solutions to problems are typically common for a good reason. They're tested, proven, and predictable. But sometimes you find yourself running into cases where the common, well-accepted solution to the problem doesn't work for you. Standards exist to solve a certain class of problems, and sometimes your problem is just different enough for it to matter and for those standards to not apply. Standards are created to address most cases, but edge cases will always exist. In other words, standards are guidelines, not concrete rules.
Sometimes you need to stop thinking about the problem in terms of the standard approach to solving it, and instead break it down into its most abstract, basic form and look for parallels in other solved problems for inspiration. Odds are the problem you're trying to solve isn't as novel as you think it is, and that someone has probably already solved a similar problem before. Parallels, in my experience, are usually a pretty good indicator that you're on the right track.
More importantly, there's a delicate line to tread between needing to use a different approach to solving an edge case problem you have, and needing to restructure your project to eliminate the edge case and allow the standard solution to work. Being able to decide which is more appropriate can have long-lasting repercussions on your ability to manage technical debt.
16 votes -
Code Quality Tip: Wrapping external libraries.
Preface Occasionally I feel the need to touch on the subject of code quality, particularly because of the importance of its impact on technical debt, especially as I continue to encounter the...
Preface
Occasionally I feel the need to touch on the subject of code quality, particularly because of the importance of its impact on technical debt, especially as I continue to encounter the effects of technical debt in my own work and do my best to manage it. It's a subject that is unfortunately not emphasized nearly enough in academia.
Background
As a refresher, technical debt is the long-term cost of the design decisions in your code. These costs can manifest in different ways, such as greater difficulty in understanding what your code is doing or making non-breaking changes to it. More generally, these costs manifest as additional time and resources being spent to make some kind of change.
Sometimes these costs aren't things you think to consider. One such consideration is how difficult it might be to upgrade a specific technology in your stack. For example, what if you've built a back-end system that integrates with AWS and you suddenly need to upgrade your SDK? In a small project this might be easy, but what if you've built a system that you've been maintaining for years and it relies heavily on AWS integrations? If the method names, namespaces, argument orders, or anything else has changed between versions, then suddenly you'll need to update every single reference to an AWS-related tool in your code to reflect those changes. In larger software projects, this could be a daunting and incredibly expensive task, spanning potentially weeks or even months of work and testing.
That is, unless you keep those references to a minimum.
A Toy Example
This is where "wrapping" your external libraries comes into play. The concept of "wrapping" basically means to create some other function or object that takes care of operating the functions or object methods that you really want to target. One example might look like this:
<?php class ImportedClass { public function methodThatMightBecomeModified($arg1, $arg2) { // Do something. } } class ImportedClassWrapper { private $class_instance = null; private function getInstance() { if(is_null($this->class_instance)) { $this->class_instance = new ImportedClass(); } return $this->class_instance; } public function wrappedMethod($arg1, $arg2) { return $this->getInstance()->methodThatMightBecomeModified($arg1, $arg2); } } ?>
Updating Tools Doesn't Have to Suck
Imagine that our
ImportedClass
has some important new features that we need to make use of that are only available in the most recent version, and we're several versions behind. The problem, of course, is that there were a lot of changes that ended up being made between our current version and the new version. For example,ImportedClass
is now calledNewImportedClass
. On top of that,methodThatMightBecomeModified
is now calledmethodThatWasModified
, and the argument order ended up getting switched around!Now imagine that we were directly calling
new ImportedClass()
in many different places in our code, as well as directly invokingmethodThatMightBecomeModified
:<?php $imported_class_instance = new ImportedClass(); $imported_class_instance->methodThatMightBeModified($val1, $val2); ?>
For every single instance in our code, we need to perform a replacement. There is a linear or--in terms of Big-O notation--a complexity of
O(n)
to make these replacements. If we assume that we only ever used this one method, and we used it 100 times, then there are 100 instances ofnew ImportClass()
to update and another 100 instances of the method invocation, equaling 200 lines of code to change. Furthermore, we need to remember each of the replacements that need to be made and carefully avoid making any errors in the process. This is clearly non-ideal.Now imagine that we chose instead to use the wrapper object:
<?php $imported_class_wrapper = new ImportedClassWrapper(); $imported_class_wrapper->wrappedMethod($val1, $val2); ?>
Our updates are now limited only to the wrapper class:
<?php class ImportedClassWrapper { private $class_instance = null; private function getInstance() { if(is_null($this->class_instance)) { $this->class_instance = new NewImportedClass(); } return $this->class_instance; } public function wrappedMethod($arg1, $arg2) { return $this->getInstance()->methodThatWasModified($arg2, $arg1); } } ?>
Rather than making changes to 200 lines of code, we've now made changes to only 2. What was once an
O(n)
complexity change has now turned into anO(1)
complexity change to make this upgrade. Not bad for a few extra lines of code!
A Practical Example
Toy problems are all well and good, but how does this translate to reality?
Well, I ran into such a problem myself once. Running MongoDB with PHP requires the use of an external driver, and this driver provides an object representing a MongoDB ObjectId. I needed to perform a migration from one hosting provider over to a new cloud hosting provider, with the application and database services, which were originally hosted on the same physical machine, hosted on separate servers. For security reasons, this required an upgrade to a newer version of MongoDB, which in turn required an upgrade to a newer version of the driver.
This upgrade resulted in many of the calls to
new MongoId()
failing, because the old version of the driver would accept empty strings and other invalid ID strings and default to generating a new ObjectId, whereas the new version of the driver treated invalid ID strings as failing errors. And there were many, many cases where invalid strings were being passed into the constructor.Even after spending hours replacing the (literally) several dozen instances of the constructor calls, there were still some places in the code where invalid strings managed to get passed in. This made for a very costly upgrade.
The bugs were easy to fix after the initial replacements, though. After wrapping
new MongoId()
inside of a wrapper function, a few additional conditional statements inside of the new function resolved the bugs without having to dig around the rest of the code base.
Final Thoughts
This is one of those lessons that you don't fully appreciate until you've experienced the technical debt of an unwrapped external library first-hand. Code quality is an active effort, but a worthwhile one. It requires you to be willing to throw away potentially hours or even days of work when you realize that something needs to change, because you're thinking about how to keep yourself from banging your head against a wall later down the line instead of thinking only about how to finish up your current task.
"Work smarter, not harder" means putting in some hard work upfront to keep your technical debt under control.
That's all for now, and remember: don't be fools, wrap your external tools.
23 votes -
You should work at identifying technical debt, paying it down, and making sure it doesn’t keep popping up as fast as you can knock it down.
10 votes