7 votes

Light Analysis of a Recent Code Refactor

Posted October 27, 2018 by Emerald_Knight (edited October 27, 2018)

Tags: programming.code quality tips, coding, refactoring, structure, recurring.infrequent

Preface

In a previous topic, I'd covered the subject of a few small lessons regarding code quality. Especially important was the impact on technical debt, which can bog down developer productivity, and the need to pay down on that debt. Today I would like to touch on a practical example that I'd encountered in a production environment.

Background

Before we can discuss the refactor itself, it's important to be on the same page regarding the technologies being used. In my case, I work with PHP utilizing a proprietary back-end framework and MongoDB as our database.

PHP is a server-side scripting language. Like many scripting languages, it's loosely typed. This has some benefits and drawbacks.

MongoDB is a document-oriented database. By default it's schema-less, allowing you to make any changes at will without an update to schema. This can blend pretty well with the loose typing of PHP. Each document is represented using a JSON-like structure and is stored in something called a "collection". For those of you accustomed to using relational database, a "collection" is analogous to a table, each document is a row, and each field in the document is a column. A typical query in the MongoDB shell would look something like this:

db.users.findOne({
    username: "Emerald_Knight"
});

The framework itself has some framework-specific objects that are held in global references. This makes them easily accessible, but naturally littering your code with a bunch of globals is both error-prone and an eyesore.

Unexpected Spaghetti

In my code base are a number of different objects that are designed to handle basic CRUD-like operations on their associated database entries. Some of these objects hold references to other objects, so naturally there is some data validation that occurs to ensure that the references are both valid and authorized. Pretty typical stuff.

What I noticed, however, is that the collection names for these database entries were littered throughout my code. This isn't necessarily a bad thing, except there were some use cases that came to mind: what if it turned out that my naming for one or more of these collections wasn't ideal? What if I wanted to change a collection name for the sake of easier management on the database end? What if I have a tendency to forget the name of a database collection and constantly have to look it up? What if I make a typo of all things? On top of that, the framework's database object was stored in a global variable.

These seemingly minor sources of technical debt end up adding up over time and could cause some serious problems in the worst case. I've had breaking bugs make their way passed QA in the past, after all.

Exchanging Spaghetti for Some Light Lasagna

The problem could be characterized simply: there were scoping problems and too many references to what were essentially magic strings. The solution, then, was to move the database object reference from global to local scope within the application code and to eliminate the problem of magic strings. Additionally, it's a good idea to avoid polluting the namespace with an over-reliance on constants, and using those constants for database calls can also become unsightly and difficult to follow as those constants could end up being generally disconnected from the objects they're associated with.

There turned out to be a nice, object-oriented, very PHP-like solution to this problem: a so-called "magic method" named "__call". This method is invoked whenever an "inaccessible" method is called on the object. Using this method, a database command executed on a non-database object could pass the command to the database object itself. If this logic were placed within an abstract class, the collection could then be specified simply as a configuration option in the inheriting class.

This is what such a solution could look like:

<?php

abstract class MyBaseObject {

    protected $db = null;
    protected $collection_name = null;

    public function __construct() {
        global $db;
        
        $this->db = $db;
    }

    public function __call($method_name, $args) {
        if(method_exists($this->db, $method_name)) {
            return $this->executeDatabaseCommand($method_name, $args);
        }

        throw new Exception(__CLASS__ . ': Method "' . $method_name . '" does not exist.');
    }

    public function executeDatabaseCommand($command, $args) {
        $collection = $this->collection_name;
        $db_collection = $this->db->$collection;

        return call_user_func_array(array($db_collection, $command), $args);
    }
}

class UserManager extends MyBaseObject {
    protected $collection_name = 'users';

    public function __construct() {
        parent::__construct();
    }
}

$user_manager = new UserManager();
$my_user = $user_manager->findOne(array('username'=>'Emerald_Knight'));

?>

This solution utilizes a single parent object which transforms a global database object reference into a local one, eliminating the scope issue. The collection name is specified as a class property of the inheriting object and only used in a single place in the parent object, eliminating the magic string and namespace polluting issues. Any time you perform queries on users, you do so by using the UserManager class, which guarantees that you will always know that your queries are being performed on the objects that you intend. And finally, if the collection name for an object class ever needs to be updated, it's a simple matter of modifying the single instance of the class property $collection_name, rather than tracking down some disconnected constant.

Limitations

This, of course, doesn't solve all of the existing problems. After all, executing the database queries for one object directly from another is still pretty bad practice, violating the principle of separation of concerns. Instead, those queries should generally be encapsulated within object methods and the objects themselves given primary responsibility in handling associated data. It's also incredibly easy to inadvertently override a database method, e.g. defining a findOne() method on UserManager, so there's still some mindfulness required on the part of the programmer.

Still, given the previous alternative, this is a pretty major improvement, especially for an initial refactor.

Final Thoughts

As always, technical debt is both necessary and inevitable. After all, in exchange for not taking the excess time and considering structuring my code this way in the beginning, I had greater initial velocity to get the project off of the ground. What's important is continually reviewing your code as you're building on top of it so that you can identify bottlenecks as they begin to strain your efficiency, and getting those bottlenecks out of the way.

In other words, even though technical debt is often necessary and is certainly inevitable, it's important to pay down on some of that debt once it starts getting expensive!