• Activity
  • Votes
  • Comments
  • New
  • All activity
  • Showing only topics in ~comp with the tag "refactoring". Back to normal view / Search all groups
    1. Code Quality Tip: Cyclomatic complexity in depth.

      Preface Recently I briefly touched on the subject of cyclomatic complexity. This is an important concept for any programmer to understand and think about as they write their code. In order to...

      Preface

      Recently I briefly touched on the subject of cyclomatic complexity. This is an important concept for any programmer to understand and think about as they write their code. In order to provide a more solid understanding of the subject, however, I feel that I need to address the topic more thoroughly with a more practical example.


      What is cyclomatic complexity?

      The concept of "cyclomatic complexity" is simple: the more conditional branching and looping in your code, the more complex--and therefore the more difficult to maintain--that code is. We can visualize this complexity by drawing a diagram that illustrates the flow of logic in our program. For example, let's take the following toy example of a user login attempt:

      <?php
      
      $login_data = getLoginCredentialsFromInput();
      
      $login_succeeded = false;
      $error = '';
      if(usernameExists($login_data['username'])) {
          $user = getUser($login_data['username']);
          
          if(!isDeleted($user)) {
              if(!isBanned($user)) {
                  if(!loginRateLimitReached($user)) {
                      if(passwordMatches($user, $login_data['password'])) {
                          loginUser($user);
                          $login_succeeded = true;
                      } else {
                          $error = getBadPasswordError();
                          logBadLoginAttempt();
                      }
                  } else {
                      $error = getLoginRateLimitError($user);
                  }
              } else {
                  $error = getUserBannedError($user);
              }
          } else {
              $error = getUserDeletedError($user);
          }
      } else {
          $error = getBadUsernameError($login_data['username']);
      }
      
      if($login_succeeded) {
          sendSuccessResponse();
      } else {
          sendErrorResponse($error);
      }
      
      ?>
      

      A diagram for this logic might look something like this:

      +-----------------+
      |                 |
      |  Program Start  |
      |                 |
      +--------+--------+
               |
               |
               v
      +--------+--------+    +-----------------+
      |                 |    |                 |
      |    Username     +--->+    Set Error    +--+
      |    Exists?      | No |                 |  |
      |                 |    +-----------------+  |
      +--------+--------+                         |
               |                                  |
           Yes |                                  |
               v                                  |
      +--------+--------+    +-----------------+  |
      |                 |    |                 |  |
      |  User Deleted?  +--->+    Set Error    +->+
      |                 | Yes|                 |  |
      +--------+--------+    +-----------------+  |
               |                                  |
            No |                                  |
               v                                  |
      +--------+--------+    +-----------------+  |
      |                 |    |                 |  |
      |  User Banned?   +--->+    Set Error    +->+
      |                 | Yes|                 |  |
      +--------+--------+    +-----------------+  |
               |                                  |
            No |                                  |
               v                                  |
      +--------+--------+    +-----------------+  |
      |                 |    |                 |  |
      |   Login Rate    +--->+    Set Error    +->+
      | Limit Reached?  | Yes|                 |  |
      |                 |    +-----------------+  |
      +--------+--------+                         |
               |                                  |
            No |                                  |
               v                                  |
      +--------+--------+    +-----------------+  |
      |                 |    |                 |  |
      |Password Matches?+--->+    Set Error    +->+
      |                 | No |                 |  |
      +--------+--------+    +-----------------+  |
               |                                  |
           Yes |                                  |
               v                                  |
      +--------+--------+    +----------+         |
      |                 |    |          |         |
      |   Login User    +--->+ Converge +<--------+
      |                 |    |          |
      +-----------------+    +---+------+
                                 |
                                 |
               +-----------------+
               |
               v
      +--------+--------+
      |                 |
      |   Succeeded?    +-------------+
      |                 | No          |
      +--------+--------+             |
               |                      |
           Yes |                      |
               v                      v
      +--------+--------+    +--------+--------+
      |                 |    |                 |
      |  Send Success   |    |   Send Error    |
      |    Message      |    |    Message      |
      |                 |    |                 |
      +-----------------+    +-----------------+
      

      It's important to note that between nodes in this directed graph, you can find certain enclosed regions being formed. Specifically, each conditional branch that converges back into the main line of execution generates an additional region. The number of these distinct enclosed regions is directly proportional to the level of cyclomatic complexity of the system--that is, more regions means more complicated code.


      Clocking out early.

      There's an important piece of information I noted when describing the above example:

      . . . each conditional branch that converges back into the main line of execution generates an additional region.

      The above example is made complex largely due to an attempt to create a single exit point at the end of the program logic, causing these conditional branches to converge and thus generate the additional enclosed regions within our diagram.

      But what if we stopped trying to converge back into the main line of execution? What if, instead, we decided to interrupt the program execution as soon as we encountered an error? Our code might look something like this:

      <?php
      
      $login_data = getLoginCredentialsFromInput();
      
      if(!usernameExists($login_data['username'])) {
          sendErrorResponse(getBadUsernameError($login_data['username']));
          return;
      }
      
      $user = getUser($login_data['username']);
      if(isDeleted($user)) {
          sendErrorResponse(getUserDeletedError($user));
          return;
      }
      
      if(isBanned($user)) {
          sendErrorResponse(getUserBannedError($user));
          return;
      }
      
      if(loginRateLimitReached($user)) {
          logBadLoginAttempt($user);
          sendErrorResponse(getLoginRateLimitError($user));
          return;
      }
      
      if(!passwordMatches($user, $login_data['password'])) {
          logBadLoginAttempt($user);
          sendErrorResponse(getBadPasswordError());
          return;
      }
      
      loginUser($user);
      sendSuccessResponse();
      
      ?>
      

      Before we've even constructed a diagram for this logic, we can already see just how much simpler this logic is. We don't need to traverse a tree of if statements to determine which error message has priority to be sent out, we don't need to attempt to follow indentation levels, and our behavior on success is right at the very end and at the lowest level of indentation, where it's easily and obviously located at a glance.

      Now, however, let's verify this reduction in complexity by examining the associated diagram:

      +-----------------+
      |                 |
      |  Program Start  |
      |                 |
      +--------+--------+
               |
               |
               v
      +--------+--------+    +-----------------+
      |                 |    |                 |
      |    Username     +--->+   Send Error    |
      |    Exists?      | No |    Message      |
      |                 |    |                 |
      +--------+--------+    +-----------------+
               |
           Yes |
               v
      +--------+--------+    +-----------------+
      |                 |    |                 |
      |  User Deleted?  +--->+   Send Error    |
      |                 | Yes|    Message      |
      +--------+--------+    |                 |
               |             +-----------------+
            No |
               v
      +--------+--------+    +-----------------+
      |                 |    |                 |
      |  User Banned?   +--->+   Send Error    |
      |                 | Yes|    Message      |
      +--------+--------+    |                 |
               |             +-----------------+
            No |
               v
      +--------+--------+    +-----------------+
      |                 |    |                 |
      |   Login Rate    +--->+   Send Error    |
      | Limit Reached?  | Yes|    Message      |
      |                 |    |                 |
      +--------+--------+    +-----------------+
               |
            No |
               v
      +--------+--------+    +-----------------+
      |                 |    |                 |
      |Password Matches?+--->+   Send Error    |
      |                 | No |    Message      |
      +--------+--------+    |                 |
               |             +-----------------+
           Yes |
               v
      +--------+--------+
      |                 |
      |   Login User    |
      |                 |
      +--------+--------+
               |
               |
               v
      +--------+--------+
      |                 |
      |  Send Success   |
      |    Message      |
      |                 |
      +-----------------+
      

      Something should immediately stand out here: there are no enclosed regions in this diagram! Furthermore, even our new diagram is much simpler to follow than the old one was.


      Reality is rarely simple.

      The above is a really forgiving example. It has no loops, and loops are going to create enclosed regions that can't be broken apart so easily; it has no conditional branches that are so tightly coupled with the main path of execution that they can't be broken up; and the scope of functionality and side effects are minimal. Sometimes you can't break those regions up. So what do we do when we inevitably encounter these cases?

      High cyclomatic complexity in your program as a whole is inevitable for sufficiently large projects, especially in a production environment, and your efforts to reduce it can only go so far. In fact, I don't recommend trying to remove all or even most instances of cyclomatic complexity at all--instead, you should just be keeping the concept in mind to determine whether or not a function, method, class, module, or other component of your system is accumulating technical debt and therefore in need of refactoring.

      At this point, astute readers might ask, "How does refactoring help if the cyclomatic complexity doesn't actually go away?", and this is a valid concern. The answer to that is simple, however: we're hiding complexity behind abstractions.

      To test this, let's forget about cyclomatic complexity for a moment and instead focus on simplifying the refactored version of our toy example using abstraction:

      <?php
      
      function handleLoginAttempt($login_data) {
          if(!usernameExists($login_data['username'])) {
              sendErrorResponse(getBadUsernameError($login_data['username']));
              return;
          }
      
          $user = getUser($login_data['username']);
          if(isDeleted($user)) {
              sendErrorResponse(getUserDeletedError($user));
              return;
          }
      
          if(isBanned($user)) {
              sendErrorResponse(getUserBannedError($user));
              return;
          }
      
          if(loginRateLimitReached($user)) {
              logBadLoginAttempt($user);
              sendErrorResponse(getLoginRateLimitError($user));
              return;
          }
      
          if(!passwordMatches($user, $login_data['password'])) {
              logBadLoginAttempt($user);
              sendErrorResponse(getBadPasswordError());
              return;
          }
      
          loginUser($user);
          sendSuccessResponse();
      }
      
      $login_data = getLoginCredentialsFromInput();
      
      handleLoginAttempt($login_data);
      
      ?>
      

      The code above is functionally identical to our refactored example from earlier, but has an additional abstraction via a function. Now we can diagram this higher-level abstraction as follows:

      +-----------------+
      |                 |
      |  Program Start  |
      |                 |
      +--------+--------+
               |
               |
               v
      +--------+--------+
      |                 |
      |  Attempt Login  |
      |                 |
      +-----------------+
      

      This is, of course, a pretty extreme example, but this is how we handle thinking about complex program logic. We abstract it down to the barest basics so that we can visualize, in its simplest form, what the program is supposed to do. We don't actually care about the implementation unless we're digging into that specific part of the system, because otherwise we would be so bogged down by the details that we wouldn't be able to reason about what our program is supposed to do.

      Likewise, we can use these abstractions to hide away the cyclomatic complexity underlying different components of our software. This keeps everything clean and clutter-free in our head. And the more we do to keep our smaller components simple and easy to think about, the easier the larger components are to deal with, no matter how much cyclomatic complexity all of those components share as a collective.


      Final Thoughts

      Cyclomatic complexity isn't a bad thing to have in your code. The concept itself is only intended to be used as one of many tools to assess when your code is accumulating too much technical debt. It's a warning sign that you may need to change something, nothing more. But it's an incredibly useful tool to have available to you and you should get comfortable using it.

      As a general rule of thumb, you can usually just take a glance at your code and assess whether or not there's too much cyclomatic complexity in a component by looking for either of the following:

      • Too many loops and/or conditional statements nested within each other, i.e. you have a lot of indentation.
      • Many loops in the same function/method.

      It's not a perfect rule of thumb, but it's useful for at least 90% of your development needs, and there will inevitably be cases where you will prefer to accept some greater cyclomatic complexity because there is some benefit that makes it a better trade-off. Making that judgment is up to you as a developer.

      As always, I'm more than willing to listen to feedback and answer any questions!

      25 votes
    2. Light Analysis of a Recent Code Refactor

      Preface In a previous topic, I'd covered the subject of a few small lessons regarding code quality. Especially important was the impact on technical debt, which can bog down developer...

      Preface

      In a previous topic, I'd covered the subject of a few small lessons regarding code quality. Especially important was the impact on technical debt, which can bog down developer productivity, and the need to pay down on that debt. Today I would like to touch on a practical example that I'd encountered in a production environment.


      Background

      Before we can discuss the refactor itself, it's important to be on the same page regarding the technologies being used. In my case, I work with PHP utilizing a proprietary back-end framework and MongoDB as our database.

      PHP is a server-side scripting language. Like many scripting languages, it's loosely typed. This has some benefits and drawbacks.

      MongoDB is a document-oriented database. By default it's schema-less, allowing you to make any changes at will without an update to schema. This can blend pretty well with the loose typing of PHP. Each document is represented using a JSON-like structure and is stored in something called a "collection". For those of you accustomed to using relational database, a "collection" is analogous to a table, each document is a row, and each field in the document is a column. A typical query in the MongoDB shell would look something like this:

      db.users.findOne({
          username: "Emerald_Knight"
      });
      

      The framework itself has some framework-specific objects that are held in global references. This makes them easily accessible, but naturally littering your code with a bunch of globals is both error-prone and an eyesore.


      Unexpected Spaghetti

      In my code base are a number of different objects that are designed to handle basic CRUD-like operations on their associated database entries. Some of these objects hold references to other objects, so naturally there is some data validation that occurs to ensure that the references are both valid and authorized. Pretty typical stuff.

      What I noticed, however, is that the collection names for these database entries were littered throughout my code. This isn't necessarily a bad thing, except there were some use cases that came to mind: what if it turned out that my naming for one or more of these collections wasn't ideal? What if I wanted to change a collection name for the sake of easier management on the database end? What if I have a tendency to forget the name of a database collection and constantly have to look it up? What if I make a typo of all things? On top of that, the framework's database object was stored in a global variable.

      These seemingly minor sources of technical debt end up adding up over time and could cause some serious problems in the worst case. I've had breaking bugs make their way passed QA in the past, after all.


      Exchanging Spaghetti for Some Light Lasagna

      The problem could be characterized simply: there were scoping problems and too many references to what were essentially magic strings. The solution, then, was to move the database object reference from global to local scope within the application code and to eliminate the problem of magic strings. Additionally, it's a good idea to avoid polluting the namespace with an over-reliance on constants, and using those constants for database calls can also become unsightly and difficult to follow as those constants could end up being generally disconnected from the objects they're associated with.

      There turned out to be a nice, object-oriented, very PHP-like solution to this problem: a so-called "magic method" named "__call". This method is invoked whenever an "inaccessible" method is called on the object. Using this method, a database command executed on a non-database object could pass the command to the database object itself. If this logic were placed within an abstract class, the collection could then be specified simply as a configuration option in the inheriting class.

      This is what such a solution could look like:

      <?php
      
      abstract class MyBaseObject {
      
          protected $db = null;
          protected $collection_name = null;
      
          public function __construct() {
              global $db;
              
              $this->db = $db;
          }
      
          public function __call($method_name, $args) {
              if(method_exists($this->db, $method_name)) {
                  return $this->executeDatabaseCommand($method_name, $args);
              }
      
              throw new Exception(__CLASS__ . ': Method "' . $method_name . '" does not exist.');
          }
      
          public function executeDatabaseCommand($command, $args) {
              $collection = $this->collection_name;
              $db_collection = $this->db->$collection;
      
              return call_user_func_array(array($db_collection, $command), $args);
          }
      }
      
      class UserManager extends MyBaseObject {
          protected $collection_name = 'users';
      
          public function __construct() {
              parent::__construct();
          }
      }
      
      $user_manager = new UserManager();
      $my_user = $user_manager->findOne(array('username'=>'Emerald_Knight'));
      
      ?>
      

      This solution utilizes a single parent object which transforms a global database object reference into a local one, eliminating the scope issue. The collection name is specified as a class property of the inheriting object and only used in a single place in the parent object, eliminating the magic string and namespace polluting issues. Any time you perform queries on users, you do so by using the UserManager class, which guarantees that you will always know that your queries are being performed on the objects that you intend. And finally, if the collection name for an object class ever needs to be updated, it's a simple matter of modifying the single instance of the class property $collection_name, rather than tracking down some disconnected constant.


      Limitations

      This, of course, doesn't solve all of the existing problems. After all, executing the database queries for one object directly from another is still pretty bad practice, violating the principle of separation of concerns. Instead, those queries should generally be encapsulated within object methods and the objects themselves given primary responsibility in handling associated data. It's also incredibly easy to inadvertently override a database method, e.g. defining a findOne() method on UserManager, so there's still some mindfulness required on the part of the programmer.

      Still, given the previous alternative, this is a pretty major improvement, especially for an initial refactor.


      Final Thoughts

      As always, technical debt is both necessary and inevitable. After all, in exchange for not taking the excess time and considering structuring my code this way in the beginning, I had greater initial velocity to get the project off of the ground. What's important is continually reviewing your code as you're building on top of it so that you can identify bottlenecks as they begin to strain your efficiency, and getting those bottlenecks out of the way.

      In other words, even though technical debt is often necessary and is certainly inevitable, it's important to pay down on some of that debt once it starts getting expensive!

      7 votes