10 votes

How do you plan or outline a program?

I’m currently studying Python Object Oriented programming and got to a point where logic and syntax are the least of my problems. I always get to a stage where I’m completely lost between modules, classes, objects and a sea of “selfs”.

I’m not doing anything too complicated, just small projects for practice, but I think I would benefit from planning. My mental processes are highly disorganized (ADHD) and I need all the help I can get with that.

I don’t need an automated tool (even though it might come in handy) -- sketching things out on paper is probably enough.

I only know about UML, which seems fine. Can anyone recommend a tutorial about this and other tools?

Edit to link my last attempt at following a tutorial:

This is the last tutorial I tried to follow, a Pygame project from the book Python Crash Course 2ed. Following tutorials is frequently mostly typing, so what I achieved there is not a real representation of my abilities -- I would not be able to do something like that on my own. In fact, I failed to answer the latest exercises, which were basically smaller versions of this project.

My problem is not with syntax and the basics of how OOP works, but rather with memory and organization of information.

16 comments

  1. [5]
    spit-evil-olive-tips
    Link
    • Exemplary

    This is a huge topic, so I'm only going to be able to scratch the surface of it.

    In general, if you're still learning this, ignore fancy tools and standards like UML entirely. Learn to diagram on paper, or by writing plain old comments in your text editor / IDE that explain the design. Once you have a good handle on that, you'll be able to decide which, if any, tools you want to use.

    You mentioned confusion between modules/classes/objects, so I'm going to zero in on that.

    Say you have some Python code with a bunch of methods that look like this (but with real names, obviously):

    def method1(a, b, c, d):
      ...
    
    def method2(a, b, c, e, f, g):
      ...
    
    def method3(a, b, c, x):
      ...
    
    def method4(a, b, c, x, y, z):
      ...
    

    One thing to notice here is that a, b, c tend to get passed around as a group. This is called cohesion, because the variables "stick together" in the parameter lists.

    When I see code like this, a refactoring that always comes to my mind is that my code should reflect the fact that these three variables are tied together. Using the dataclasses that are in Python 3.7+:

    from dataclasses import dataclass

    @dataclass
    class Foo:
      a: str
      b: int
      c: dict
    

    Then I change the methods to accept an instance of my Foo class rather than individual a, b, c parameters:

    def method1(foo, d):
      ...
    
    def method2(foo, e, f, g):
      ...
    
    def method3(foo, x):
      ...
    
    def method4(foo, x, y, z):
      ...
    

    In addition to the a, b, c parameters "sticking together", I've also got method cohesion. All my methods do something with a Foo object. And frequently, I'll have something like this, where method1 is a sort of "primary" method that calls the others as helpers, and just passes along the same Foo object:

    def method1(foo, d):
      if d is None:
        method2(foo, some_e, some_f, some_g)
      else:
        some_x = some_function(d)
        method3(foo, some_x)
    

    When I see method cohesion like this, it's a sign that I should wrap those cohesive methods into a class, and "lift" their foo parameter so that it's an argument to the constructor rather than an argument to each individual method.

    class FooProcessor:
      def __init__(self, foo):
        self.foo = foo
    
      def method1(self, d):
        if d is None:
          self.method2(some_e, some_f, some_g)
        else:
          some_x = some_function(d)
          self.method3(some_x)
    
      def method2(self, e, f, g):
        ...
    
      def method3(self, x):
        ...
    
      def method4(self, x, y, z):
        ...
    

    Every method in the class, because it has the self parameter, has access to self.foo. Since there's only one member variable, this may not seem like a big change from just passing foo directly. But suppose all of our methods also took a Bar parameter. We could add that as a parameter to the constructor, and a member variable, very easily:

    class FooProcessor:
      def __init__(self, foo, bar):
        self.foo = foo
        self.bar = bar
    

    And then, every method within the class also has access to self.bar.
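
To make the refactoring above concrete, here is a minimal runnable sketch. The names Player and PlayerReport and their fields are made up for illustration and are not from the original comment; they just stand in for Foo and FooProcessor with real behavior filled in.

```python
from dataclasses import dataclass


@dataclass
class Player:
    # The three values that used to travel together as a, b, c.
    name: str
    score: int
    inventory: dict


class PlayerReport:
    """Worker class: every method reads the same Player via self.player."""

    def __init__(self, player):
        self.player = player

    def summary(self, verbose):
        # The "primary" method delegates to helpers without passing player along.
        if verbose:
            return self.headline() + " | " + self.items()
        return self.headline()

    def headline(self):
        return f"{self.player.name}: {self.player.score} points"

    def items(self):
        # Iterating a dict yields its keys; sort them for a stable order.
        return ", ".join(sorted(self.player.inventory))


report = PlayerReport(Player("Ada", 42, {"sword": 1, "potion": 3}))
print(report.summary(verbose=True))
```

Notice that none of the helper methods take a player argument any more; the constructor received it once and the rest of the class reaches it through self.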

    A book I'd highly recommend that goes into a lot of detail along these lines is Code Complete. The first edition is from the early 90s and the second is from the aughts, but it holds up with age better than almost any other programming book I've ever read.

    16 votes
    1. [3]
      mrbig
      (edited )
      Link Parent

      That was, without exaggeration, the best explanation on the usefulness of OOP I have ever read. Thank you very much for this.

      I already knew the basics of how self and Python OOP work in general, but the books and tutorials I've read before jump straight from functions to classes without providing consistent reasons for their use. "Cohesion" and "method cohesion" are new concepts for me, but I can already see their usefulness.

      This book looks awesome. Does it contain information similar to what you shared?

      And why did you use a dataclass in one example and a regular class in the other? Is there a clear disadvantage in using them? I prefer a cleaner syntax myself. Won't a dataclass work pretty much like a regular class, sharing its self with other methods?

      7 votes
      1. spit-evil-olive-tips
        Link Parent

        You're welcome! Glad you found it helpful.

        I think Code Complete is exactly the sort of book you're looking for. It's not an "intro to programming" type book, it's targeted at people who already know the nuts and bolts of programming and are struggling with the things you're struggling with.

        With the example classes I gave, there really are two different types of classes. I think of them as data classes and worker classes.

        Side note: the @dataclass stuff added in Python recently is just one way of writing data classes. You can also use a 3rd-party library like attrs, or write them yourself (though this involves a fair bit of boilerplate code). If you read the PEP that introduced Python dataclasses, a big part of their motivation is having a convenient way to do data class stuff within the Python standard library, where relying on a 3rd-party library like attrs isn't an option. So don't worry too much about specifics of Python's @dataclass, and focus on the underlying idea of "data classes". You could do all of this in Java or C++, which don't have the @dataclass magic that Python has.
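
As a rough illustration of the boilerplate mentioned above, here is a sketch of a hand-written data class next to its @dataclass equivalent. ManualFoo is a hypothetical name, and it only covers the constructor, repr and equality, which is a subset of what @dataclass can generate.

```python
from dataclasses import dataclass


class ManualFoo:
    """Hand-written data class: roughly what @dataclass writes for you."""

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

    def __repr__(self):
        return f"ManualFoo(a={self.a!r}, b={self.b!r}, c={self.c!r})"

    def __eq__(self, other):
        if not isinstance(other, ManualFoo):
            return NotImplemented
        return (self.a, self.b, self.c) == (other.a, other.b, other.c)


@dataclass
class Foo:
    # Same three fields; __init__, __repr__ and __eq__ are generated.
    a: str
    b: int
    c: dict


print(ManualFoo("x", 1, {}))
print(Foo("x", 1, {}))
```

Both versions behave the same from the caller's side, which is the point: the idea of a data class does not depend on the decorator.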

        The Foo class in my example above is a data class. It's just data, it doesn't do anything. The FooProcessor class is a worker class. It does things with the data in a Foo object.

        For a concrete example, let's suppose I have a Raspberry Pi with a little temperature & humidity sensor wired to it. I want to write a script that measures the temperature & humidity once a minute, and sends it off to a database. Then from that database I can draw graphs of temperature over time, or figure out averages, etc.

        I can write the skeleton of my program pretty easily:

        import time

        def main():
          while True:
            temperature, humidity = read_from_sensor()
            timestamp = time.time()
            write_to_database(timestamp, temperature, humidity)
            time.sleep(60)
        

        Now, I could write, all in the same single .py file, the read_from_sensor and write_to_database methods, plus all their helper methods. That would get unwieldy pretty fast. So instead I'm going to split it up into classes. Here's a data class:

        from attr import attrs, attrib

        @attrs
        class WeatherMeasurement:
          timestamp = attrib(type=float)  # time.time() returns a float
          temperature = attrib(type=float)
          humidity = attrib(type=float)
        

        It's just a bundle of data representing a measurement. At some time, we read the sensor and got a temperature and humidity value from it.

        I've switched to using an @attrs class instead of an @dataclass to underscore the point that the idea of data classes is what's important, not the specifics of how @dataclass functions in Python.

        And I've also got some worker classes:

        class SensorReader:
          def __init__(self, device_path):
            self.device_path = device_path
        
          def read(self):
            ...
            return measurement
        

        This class knows all the low-level electronic details of how to talk to the sensor I have connected to my Raspberry Pi, and decode its output format. It has a read() method, which returns an instance of the WeatherMeasurement object. This is encapsulation - I ask the object to read the sensor, and it gives me back a measurement. I don't have to think about how it reads from the sensor. If I'm working on a project with other people, someone else might write that class, and I just call its read method. The class might even be provided in a third-party library - say the manufacturer of the sensor wrote a Python library to handle the low-level details of talking to the sensor for you.

        In the constructor I give it a device path (or some other way of identifying which sensor I'm talking to). What that's useful for is, suppose I decide to connect two sensors to one Raspberry Pi, and have one sensor inside a window and the other outside. All I need to do is create two instances of the SensorReader class, and pass in the correct device path for each one.
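
A small sketch of the two-sensor idea, assuming a placeholder read() that returns a string instead of talking to real hardware. The second path, /dev/sensor1, is made up for illustration.

```python
class SensorReader:
    def __init__(self, device_path):
        self.device_path = device_path

    def read(self):
        # Placeholder: a real reader would talk to the hardware at device_path.
        return f"measurement from {self.device_path}"


# Same class, two independent instances, each remembering its own path.
indoor = SensorReader("/dev/sensor0")
outdoor = SensorReader("/dev/sensor1")
for sensor in (indoor, outdoor):
    print(sensor.read())
```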

        My other worker class takes those measurements and knows how to write them to a database, let's say over HTTP:

        class MeasurementRecorder:
          def __init__(self, database_url):
            self.database_url = database_url
        
          def record(self, measurement):
            ...
        

        My main method now looks like this:

        def main():
          sensor = SensorReader('/dev/sensor0')
          recorder = MeasurementRecorder('http://localhost:8888')
          while True:
            measurement = sensor.read()
            recorder.record(measurement)
            time.sleep(60)
        

        Two worker classes, passing a data class between them, which is a very common thing to do.

        A really useful thing to note here is that my worker objects are initialized once, at the beginning of the script, and then re-used over and over again. That's useful because worker classes very often have some long-lived state they keep around as member variables. For my sensor device, I'm opening up some kind of connection to it, and I might want to keep that connection open rather than closing & re-opening it every minute. If that's every minute, it may not be a big deal, but imagine I want to record every second, or even 10 times a second.

        That's even more important for the recorder - if I'm talking to the database over HTTP I'd want to use something like requests.Session for connection keep-alives. Otherwise I'd be opening up a whole new HTTP connection (including the TCP handshake and possibly the TLS negotiation too) every single second, which can add significant overhead.
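
Here is a sketch of the "open once, reuse many times" pattern described above. FakeConnection is a hypothetical stand-in for something expensive to open (such as an HTTP session); it only counts how often it gets created so the reuse is visible.

```python
class FakeConnection:
    """Stand-in for something expensive to open, such as an HTTP session."""

    opened = 0  # class-level counter of how many connections were created

    def __init__(self, url):
        FakeConnection.opened += 1
        self.url = url
        self.sent = []

    def send(self, payload):
        self.sent.append(payload)


class MeasurementRecorder:
    def __init__(self, database_url):
        # Opened once here and kept as a member variable.
        self.connection = FakeConnection(database_url)

    def record(self, measurement):
        # Every call reuses the same connection.
        self.connection.send(measurement)


recorder = MeasurementRecorder("http://localhost:8888")
for reading in (20.5, 20.6, 20.4):
    recorder.record(reading)
print(FakeConnection.opened, len(recorder.connection.sent))
```

Three measurements go out, but only one connection is ever created, because the recorder object outlives each individual record() call.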

        Where this object-oriented design really shines is when things start to change.

        Suppose you're using FooDatabase to store your weather measurements, and you aren't quite happy with it. You read on Hacker News about this hot new BarDatabase you want to try. BarDatabase uses a different input format, might use SQL instead of HTTP, or whatever. That's fine. Your current MeasurementRecorder class is really a FooDatabaseRecorder, so you rename it, and you write a brand new and different BarDatabaseRecorder class. But it has the same interface - a record() method that takes a Measurement object. Then with a couple lines changed in your main() function, you can write to both databases at the same time to see which one you like best. When you settle on which one you like, it's really easy to rip out the legacy code, because you only change a couple lines in main() then delete one entire class. Or, you might decide they're useful for different things and write to both permanently.
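
A minimal sketch of writing to both databases at once, assuming in-memory lists in place of real databases. The storage formats and the record_everywhere helper are invented for illustration; the only thing the two classes share is a record() method.

```python
class FooDatabaseRecorder:
    def __init__(self):
        self.rows = []

    def record(self, measurement):
        # Pretend FooDatabase wants a dict.
        self.rows.append({"value": measurement})


class BarDatabaseRecorder:
    def __init__(self):
        self.lines = []

    def record(self, measurement):
        # Pretend BarDatabase wants a line of text.
        self.lines.append(f"value={measurement}")


def record_everywhere(recorders, measurement):
    # The caller only relies on the shared record() interface.
    for recorder in recorders:
        recorder.record(measurement)


foo_db, bar_db = FooDatabaseRecorder(), BarDatabaseRecorder()
record_everywhere([foo_db, bar_db], 21.5)
print(foo_db.rows, bar_db.lines)
```

Dropping one database later means removing it from the list and deleting its class; nothing else needs to know.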

        Similarly, in the example I gave above of having an indoor and outdoor temperature sensor - let's say the outdoor temperature sensor needs to be replaced with a different model because it's not waterproof enough. And the better, more waterproof model has a different low-level interface. Same deal, you write a 2nd WaterproofSensorReader class (or, if the original model is an ABC1234 sensor and the waterproof one is model DEF9876, you can name your classes after that). Both classes have the same interface - a read() method that returns a Measurement object. Another couple lines changed in your main() method, and you're reading Measurement objects from both.

        The really important thing to notice here is that reading from the sensor and writing to the database are decoupled - you can do one without the other. Coupling, along with cohesion that I mentioned in the other comment, are the two crucial things to understand for the why of OOP.

        Another big benefit of decoupling is ease of development / testing. When you're writing & debugging your WaterproofSensorReader class, you don't need to set up a database (either a Foo or Bar one). You can just create an instance of that Reader class and ask it for a Measurement. And you can write the BarDatabaseRecorder class without needing access to any sensor hardware at all. All it needs to do is accept a Measurement and do the right thing with it, so you can feed it canned example data, even something absurd like -500 degrees and 200 percent humidity yesterday.
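
The testing benefit can be sketched like this, assuming a recorder that stores tuples in a list instead of talking to a real database. The written attribute is a stand-in I made up; the canned values are deliberately impossible to show that no sensor is involved.

```python
from dataclasses import dataclass


@dataclass
class WeatherMeasurement:
    timestamp: float
    temperature: float
    humidity: float


class BarDatabaseRecorder:
    def __init__(self):
        self.written = []  # stands in for the real database

    def record(self, measurement):
        self.written.append(
            (measurement.timestamp, measurement.temperature, measurement.humidity)
        )


# Canned, deliberately absurd data: no sensor hardware involved.
canned = WeatherMeasurement(timestamp=0.0, temperature=-500.0, humidity=200.0)
recorder = BarDatabaseRecorder()
recorder.record(canned)
print(recorder.written)
```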

        5 votes
      2. bhrgunatha
        Link Parent

        Yes, a dataclass is a regular python class and python doesn't treat them any differently.

        I think there are 2 main reasons to use them:

        1. For you to declare they are primarily intended for storing state (data) rather than any processing or manipulation of that state. You can still write as many methods as you want but if they aren't trivial I think it's better to make it a regular class.
        2. As a shorthand for declaring such classes (so-called syntactic sugar), which makes the code for those cases easier to read and write.
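
A quick sketch to check the "regular class" claim for yourself. Point and distance_from_origin are hypothetical names; is_dataclass comes from the standard dataclasses module.

```python
from dataclasses import dataclass, is_dataclass


@dataclass
class Point:
    x: float
    y: float

    def distance_from_origin(self):
        # A trivial method is fine on a data class; self works as usual.
        return (self.x ** 2 + self.y ** 2) ** 0.5


p = Point(3.0, 4.0)
print(type(Point) is type)  # the decorator returns an ordinary class
print(p.distance_from_origin())
print(is_dataclass(p))
```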
        4 votes
    2. skybrian
      Link Parent

      Nit: at the beginning of this explanation, in the first bit of code, there are technically no methods yet, only functions.

      2 votes
  2. [2]
    skybrian
    (edited )
    Link

    I would be careful not to get too deep into object orientation and other ways of organizing code too early.

    My approach is to delay organization until I'm sure I need it, refactoring as I go. I stick with a single file until it gets unwieldy. Maybe start with "hello world" (whatever that means for your app) and start coding in the main method. Extract a function after you've written the code inline and you have a block of code that works and would make sense on its own. Once you have a few functions that operate on the same data structure, then it's time to consider a class. Once you have a class, consider what functionality you've already written that belongs there.
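
The three stages described above might look something like this on a toy problem. The shopping-cart example and all its names are invented for illustration; in practice each stage would replace the previous one rather than sit beside it.

```python
# Stage 1: everything inline in one function.
def main_inline(prices):
    total = 0
    for price in prices:
        total += price
    return f"Total: {total:.2f}"


# Stage 2: the working block is extracted into a function of its own.
def total_of(prices):
    return sum(prices)


# Stage 3: several functions share the same data, so they become a class.
class Cart:
    def __init__(self, prices):
        self.prices = list(prices)

    def total(self):
        return total_of(self.prices)

    def summary(self):
        return f"Total: {self.total():.2f}"


print(main_inline([1.5, 2.25]))
print(Cart([1.5, 2.25]).summary())
```

Each step only happens once the previous version works, so there is always something running to fall back on.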

    At least, that's what I do when in doubt. Sometimes you know ahead of time the structure you want.

    Edit: It's kind of hard to talk about in the abstract, though. If you're willing to post some code publicly then you could get more specific advice.

    7 votes
    1. mrbig
      (edited )
      Link Parent

      I would be careful not to get too deep into object orientation and other ways of organizing code too early.

      I don't intend to. I'm just focusing on it for learning purposes. I don't even like OOP, to be honest.

      It's kind of hard to talk about in the abstract, though. If you're willing to post some code publicly then you could get more specific advice.

      This is the last tutorial I tried to follow, a Pygame project from the book Python Crash Course 2ed. Following tutorials is frequently mostly typing, so what I achieved there is not a real representation of my abilities -- I would not be able to do something like that on my own. In fact, I failed to answer the latest exercises, which were basically smaller versions of this project.

      I must note that I don't have a real problem with syntax and logic, but rather with memory and organization -- I literally cannot keep more than two or three things in my active memory at the same time and will eventually forget what's going on. Things can quickly become overwhelming. Curiously, I don't have this issue with Emacs lisp. Lisp just feels reasonable for me. Sadly, it won't help to get a job, and that's something I need right now.

      This is great advice, thank you very much!

      3 votes
  3. [2]
    Akir
    Link

    Since you are just starting out you might just want to start programming and see how well it turns out. If it really is not important, you can write absolute spaghetti code so you can get an idea where you personally need to organize.

    Personally I start off by writing a skeleton of all the classes and methods I need. Then I try to write the documentation first. Then I realize as I am writing that I made logical mistakes as I was planning and rewrite everything. 😸
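
One way to read "skeleton first, documentation first" in Python is a class whose methods have docstrings but no bodies yet. This is only a sketch of that idea; the Inventory class and its methods are hypothetical.

```python
class Inventory:
    """Tracks how many of each item the player is carrying."""

    def add(self, item, count=1):
        """Add `count` copies of `item`."""
        raise NotImplementedError

    def remove(self, item, count=1):
        """Remove `count` copies of `item`; fail if there are not enough."""
        raise NotImplementedError

    def total_items(self):
        """Return the total number of items carried."""
        raise NotImplementedError


# The skeleton already imports and documents itself before any logic exists.
print(Inventory.add.__doc__)
```

Raising NotImplementedError makes any forgotten method fail loudly instead of silently returning None.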

    5 votes
    1. mrbig
      (edited )
      Link Parent

      Since you are just starting out you might just want to start programming and see how well it turns out.

      I appreciate the advice and it's certainly good for most cases, but this approach makes me extremely anxious and triggers my ADHD big time. It's hard for me to do anything without a neat environment, I can't read or study in a messy room, for example. Minimal irregularities become incredibly distracting.

      So I tend to (over?)compensate by being rigorous and organized in my affairs.

      Thank you very much for answering!

      3 votes
  4. [2]
    stu2b50
    Link

    In general you just get better the more you work on a variety of programs. A general strategy is to get down a minimum viable product -- the most basic possible implementation of what you're trying to do. From there, it's all incremental, which is nice.

    Additionally, try to think about the parts of the program which radiate outwards design wise. Typically UIs, for instance, I do last, since the core functionality will dictate what the UI has, so if you make it first, you'll probably have to change it by the end.

    In terms of actual tools, I've been using Whimsical at work. It's just a nice diagram builder. Good for collaboration, though I suppose you won't use that.

    4 votes
    1. mrbig
      (edited )
      Link Parent

      Additionally, try to think about the parts of the program which radiate outwards design wise.

      That's quite reasonable. Thank you very much.

      In terms of actual tools, I've been using Whimsical at work. It's just a nice diagram builder

      Tools are awesome, but one must know what to do with them! Do you use it freeform, or are there any documented methods that can make tools like this one especially productive?

      2 votes
  5. [2]
    joplin
    Link

    Do you ever use libraries or modules that other people wrote? I've found that using a well-written library teaches you how to organize your own code even if you don't intentionally set out with that goal in mind. There are certain concepts that are well known that you start to see in other people's code and you can start to use in your own.

    One of the most important concepts that helps with organizing code is called separation of concerns. It's also sometimes referred to as "separating your business logic from your display logic." The idea is that the code that displays your data is completely separate from the code that manipulates your data. This is probably the single most helpful concept I've learned when it comes to writing better code. Concepts that go along with this include loose coupling and information or data hiding. These help with making your program easier to modify and less fragile.
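
A tiny sketch of separating business logic from display logic, with made-up function names. The first function knows nothing about how results are shown; the second knows nothing about how they are computed.

```python
def average_temperature(readings):
    # Business logic: works on plain data and returns plain data.
    return sum(readings) / len(readings)


def format_report(average):
    # Display logic: only decides how the result looks.
    return f"Average temperature: {average:.1f} C"


readings = [20.0, 21.0, 22.0]
print(format_report(average_temperature(readings)))
```

If the output later moves from the terminal to a Pygame window or a web page, only format_report needs to change.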

    4 votes
    1. mrbig
      Link Parent

      That’s great advice, thanks!

      2 votes
  6. [2]
    knocklessmonster
    (edited )
    Link

    I'm not a skilled programmer (com-sci dropout for Information Systems, lol), and I'm pretty bad at commenting, but I don't have issues tracking my admittedly simple programs. However, the way I design a program is to think about what I need, and choose object, function and variable names that describe what they do, which I assume you're already doing. Better commenting is a habit I should pick up.

    Basically what I've learned is do something like this:

    # What the class does
    class ThingClassDoes:  # abbreviated name for what it's supposed to do
      def __init__(self):  # this would be the constructor
        # brief description of what the constructor sets up
        ...

      def thing_for_class(self):  # this would be a method
        # brief description of what the method does
        ...

      def function_in_class(self):  # another method; not every class needs its own constructor
        # brief description of what the method does
        ...
    

    If your names are meaningful, it goes a long way; the rest is just keeping things clean and sort of having a flow to your code.

    3 votes
    1. mrbig
      Link Parent

      Thanks, I'll keep that in mind!

      2 votes
  7. skybrian
    Link

    One thing to keep in mind about object-oriented programming is that you have a choice of ways to organize your functions and these choices are sometimes arbitrary. For example, let's say you have two classes, Foo and Bar, and a function that takes two arguments:

    def quack(foo, bar):
       # do something
    

    There are three places that you could put this function:

    • As a method on Foo (the foo argument becomes self)
    • As a method on Bar (the bar argument becomes self)
    • As a standalone function. (It stays the way it is.)

    Which one you choose is often a matter of taste: where do you expect to find this function? It's often useful to look at a method's arguments and consider whether you can move the method to some other class, or when looking at a standalone function, you can decide whether it should really be a method on one of its parameters.
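
The three placements listed above can be written side by side to see that they do the same work. The method names quack_at and quacked_at_by are invented here so that all three versions can coexist in one file.

```python
class Foo:
    def __init__(self, name):
        self.name = name

    def quack_at(self, bar):
        # Option 1: method on Foo; the foo argument became self.
        return f"{self.name} quacks at {bar.name}"


class Bar:
    def __init__(self, name):
        self.name = name

    def quacked_at_by(self, foo):
        # Option 2: method on Bar; the bar argument became self.
        return f"{foo.name} quacks at {self.name}"


def quack(foo, bar):
    # Option 3: standalone function, unchanged.
    return f"{foo.name} quacks at {bar.name}"


foo, bar = Foo("foo"), Bar("bar")
print(foo.quack_at(bar) == bar.quacked_at_by(foo) == quack(foo, bar))
```

Since the results are identical, the choice really does come down to where a reader would look for the code first.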

    But, you don't want to get too caught up in tidying up. This is internal organization for the benefit of you and your team and the users aren't going to care. Moving things around can be helpful, but someone who expects to find them in their old location might be confused.

    1 vote