How do you plan or outline a program?
I’m currently studying Python Object Oriented programming and got to a point where logic and syntax are the least of my problems. I always get to a stage where I’m completely lost between modules, classes, objects and a sea of “selfs”.
I’m not doing anything too complicated, just small projects for practice, but I think I would benefit from planning. My mental processes are highly disorganized (ADHD) and I need all the help I can get with that.
I don’t need an automated tool (even though it might come in handy) -- sketching things out on paper is probably enough.
I only know about UML, which seems fine. Can anyone recommend a tutorial about this and other tools?
Edit to link my last attempt at following a tutorial:
This is the last tutorial I tried to follow, a Pygame project from the book Python Crash Course, 2nd edition. Following tutorials is frequently mostly typing, so what I achieved there is not a real representation of my abilities -- I would not be able to do something like that on my own. In fact, I failed to answer the latest exercises, which were basically smaller versions of this project.
My problem is not with syntax and the basics of how OOP works, but rather with memory and organization of information.
This is a huge topic, so I'm only going to be able to scratch the surface of it.
In general, if you're still learning this, ignore fancy tools and standards like UML entirely. Learn to diagram on paper, or by writing plain old comments in your text editor / IDE that explain the design. Once you have a good handle on that, you'll be able to decide which, if any, tools you want to use.
You mentioned confusion between modules/classes/objects, so I'm going to zero in on that.
Say you have some Python code with a bunch of methods that look like this (but with real names, obviously):
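The original snippet is missing. Here is a minimal sketch of the pattern being described, with hypothetical names and placeholder bodies: several functions whose parameter lists all include a, b, c.

```python
# Hypothetical placeholder functions: the names and bodies are made up,
# only the shape of the parameter lists matters here.
def method1(a, b, c):
    return a + b + c

def method2(a, b, c):
    return a * b * c

def method3(a, b, c, scale):
    # Even when a function needs something extra, a, b, c still travel together.
    return (a - b + c) * scale
```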
One thing to notice here is that a, b, c tend to get passed around as a group. This is called cohesion, because the variables "stick together" in the parameter lists.

When I see code like this, the refactoring that always comes to mind is making the code reflect the fact that these three variables are tied together, for example with the dataclasses available in Python 3.7+:
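A sketch of that Foo data class. The field types (int) are an assumption, since the original snippet is missing.

```python
from dataclasses import dataclass

@dataclass
class Foo:
    """Bundles the three values that always travel together."""
    a: int
    b: int
    c: int

# @dataclass generates __init__, __repr__ and __eq__ from the fields above.
foo = Foo(a=1, b=2, c=3)
```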
Then I change the methods to accept an instance of my Foo class rather than individual a, b, c parameters.

In addition to the a, b, c parameters "sticking together", I've also got method cohesion: all my methods do something with a Foo object. And frequently, method1 is a sort of "primary" method that calls the others as helpers and just passes along the same Foo object.

When I see method cohesion like this, it's a sign that I should wrap those cohesive methods into a class and "lift" their foo parameter, so that it's an argument to the constructor rather than an argument to each individual method.

Every method in the class, because it has the self parameter, has access to self.foo. Since there's only one member variable, this may not seem like a big change from just passing foo directly. But suppose all of our methods also took a Bar parameter. We could add that as a constructor parameter and a member variable very easily, and then every method within the class would also have access to self.bar.

A book I'd highly recommend that goes into a lot of detail along these lines is Code Complete. The first edition is from the early 90s and the second is from the aughts, but it holds up with age better than almost any other programming book I've ever read.
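The code samples for these steps were lost. A rough sketch of where the refactoring ends up might look like this: first the functions take a single Foo, then they become a FooProcessor class with foo and bar lifted into the constructor. The method bodies and Bar's scale field are hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass
class Foo:
    a: int
    b: int
    c: int

@dataclass
class Bar:
    scale: int  # hypothetical field, just so Bar carries some data

# Step 1: the functions take a single Foo instead of separate a, b, c.
# method1 is the "primary" function; it passes the same foo to its helpers.
def method1(foo):
    return method2(foo) + method3(foo)

def method2(foo):
    return foo.a + foo.b + foo.c

def method3(foo):
    return foo.a * foo.b * foo.c

# Step 2: the cohesive functions become a worker class. foo is "lifted" into
# the constructor, and bar is added the same way, so every method reaches
# both through self instead of taking them as parameters.
class FooProcessor:
    def __init__(self, foo, bar):
        self.foo = foo
        self.bar = bar

    def method1(self):
        return self.method2() + self.method3()

    def method2(self):
        return (self.foo.a + self.foo.b + self.foo.c) * self.bar.scale

    def method3(self):
        return self.foo.a * self.foo.b * self.foo.c * self.bar.scale
```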
That was, without exaggeration, the best explanation on the usefulness of OOP I have ever read. Thank you very much for this.
I already knew the basics of how self and Python OOP work in general, but the books and tutorials I've read before jump straight from functions to classes without giving consistent reasons for using them. "Cohesion" and "method cohesion" are new concepts for me, but I can already see their usefulness.

This book looks awesome. Does it contain information similar to what you shared?
And why did you use a dataclass in one example and a regular class in the other? Is there a clear disadvantage to using them? I prefer a cleaner syntax myself. Won't a dataclass work pretty much like a regular class, sharing its self with other methods?

You're welcome! Glad you found it helpful.
I think Code Complete is exactly the sort of book you're looking for. It's not an "intro to programming" type book, it's targeted at people who already know the nuts and bolts of programming and are struggling with the things you're struggling with.
With the example classes I gave, there really are two different types of classes. I think of them as data classes and worker classes.
Side note: the @dataclass stuff added in Python recently is just one way of writing data classes. You can also use a 3rd-party library like attrs, or write them yourself (though this involves a fair bit of boilerplate code). If you read the PEP that introduced Python dataclasses (PEP 557), a big part of their motivation is having a convenient way to do data class stuff within the Python standard library, where relying on a 3rd-party library like attrs isn't an option. So don't worry too much about the specifics of Python's @dataclass, and focus on the underlying idea of "data classes". You could do all of this in Java or C++, which don't have the @dataclass magic that Python has.

The Foo class in my example above is a data class. It's just data; it doesn't do anything. The FooProcessor class is a worker class. It does things with the data in a Foo object.

For a concrete example, let's suppose I have a Raspberry Pi with a little temperature & humidity sensor wired to it. I want to write a script that measures the temperature & humidity once a minute and sends it off to a database. Then from that database I can draw graphs of temperature over time, figure out averages, etc.
I can write the skeleton of my program pretty easily:
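The skeleton itself isn't shown. Here is a minimal sketch of what it might look like: read_from_sensor and write_to_database are the names used in the text, but their bodies here are fake placeholders, and the iterations / interval_seconds parameters are my additions so the loop can be tried without waiting a minute per pass.

```python
import time

def read_from_sensor():
    # Placeholder: real code would talk to the sensor hardware here.
    return 21.5, 40.0  # (temperature, relative humidity), made-up values

def write_to_database(temperature, humidity):
    # Placeholder: real code would send the values to the database here.
    print(f"temperature={temperature} humidity={humidity}")

def main(iterations=None, interval_seconds=60):
    # Run forever by default; iterations lets you stop after a few passes.
    count = 0
    while iterations is None or count < iterations:
        temperature, humidity = read_from_sensor()
        write_to_database(temperature, humidity)
        count += 1
        time.sleep(interval_seconds)
```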
Now, I could write, all in the same single .py file, the read_from_sensor and write_to_database methods, plus all their helper methods. That would get unwieldy pretty fast. So instead I'm going to split it up into classes, starting with a data class. It's just a bundle of data representing a measurement: at some time, we read the sensor and got a temperature and humidity value from it.
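The data class itself didn't survive. A minimal sketch follows; the field names, units, and frozen=True are assumptions. The original used the third-party attrs library, so to keep this runnable with only the standard library it uses @dataclass, with the attrs spelling shown in a comment.

```python
from dataclasses import dataclass
from datetime import datetime

# With attrs (pip install attrs) the equivalent would be roughly:
#     import attrs
#     @attrs.define(frozen=True)
#     class WeatherMeasurement: ...
@dataclass(frozen=True)
class WeatherMeasurement:
    """A bundle of data: what the sensor said, and when. No behaviour."""
    timestamp: datetime
    temperature_c: float   # degrees Celsius (unit is an assumption)
    humidity_pct: float    # relative humidity in percent
```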
I've switched to using an attrs class instead of a @dataclass to underscore the point that the idea of data classes is what's important, not the specifics of how @dataclass functions in Python.

And I've also got some worker classes:
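A minimal sketch of such a reader. The device path and the one-line "temperature,humidity" text format are hypothetical (a real sensor has its own protocol, or a vendor library); the WeatherMeasurement data class is repeated so the block runs on its own.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class WeatherMeasurement:
    timestamp: datetime
    temperature_c: float
    humidity_pct: float

class SensorReader:
    """Worker class: knows how to talk to one (hypothetical) sensor device."""

    def __init__(self, device_path):
        # Identifies which sensor to read, e.g. "/dev/sensor0" (hypothetical).
        self.device_path = device_path
        # Long-lived state: the device is opened once and kept open.
        self._device = open(device_path, "rb")

    def read(self):
        """Read one raw line from the device and decode it into a measurement."""
        raw = self._device.readline()
        return self._decode(raw)

    @staticmethod
    def _decode(raw):
        # Hypothetical wire format: b"<temperature>,<humidity>\n"
        temperature, humidity = (float(part) for part in raw.decode("ascii").split(","))
        return WeatherMeasurement(datetime.now(timezone.utc), temperature, humidity)

    def close(self):
        self._device.close()
```

With two sensors you'd just create two instances, e.g. SensorReader("/dev/sensor0") and SensorReader("/dev/sensor1") (hypothetical paths).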
This class knows all the low-level electronic details of how to talk to the sensor I have connected to my Raspberry Pi, and how to decode its output format. It has a read() method, which returns an instance of the WeatherMeasurement class. This is encapsulation: I ask the object to read the sensor, and it gives me back a measurement. I don't have to think about how it reads from the sensor. If I'm working on a project with other people, someone else might write that class, and I just call its read method. The class might even be provided in a third-party library; say the manufacturer of the sensor wrote a Python library to handle the low-level details of talking to the sensor for you.

In the constructor I give it a device path (or some other way of identifying which sensor I'm talking to). What that's useful for is, suppose I decide to connect two sensors to one Raspberry Pi and have one sensor inside a window and the other outside. All I need to do is create two instances of the SensorReader class and pass in the correct device path for each one.

My other worker class takes those measurements and knows how to write them to a database, let's say over HTTP:
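A minimal sketch of such a recorder, assuming a hypothetical database that accepts JSON via POST at /measurements. It uses the standard library's http.client, whose HTTPConnection reuses one HTTP/1.1 connection across requests when the server allows keep-alive; requests.Session is the more common third-party choice for the same thing.

```python
import http.client
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class WeatherMeasurement:
    timestamp: datetime
    temperature_c: float
    humidity_pct: float

class MeasurementRecorder:
    """Worker class: writes measurements to a (hypothetical) HTTP database API."""

    def __init__(self, host, path="/measurements", port=80):
        self.path = path
        # Long-lived state: one connection object, reused for every request.
        # (The socket is only opened on the first request.)
        self._conn = http.client.HTTPConnection(host, port, timeout=10)

    @staticmethod
    def to_json(measurement):
        # Hypothetical payload format expected by the database.
        data = asdict(measurement)
        data["timestamp"] = measurement.timestamp.isoformat()
        return json.dumps(data)

    def record(self, measurement):
        self._conn.request("POST", self.path, body=self.to_json(measurement),
                           headers={"Content-Type": "application/json"})
        response = self._conn.getresponse()
        response.read()  # drain the body so the connection can be reused
        if response.status >= 400:
            raise RuntimeError(f"database rejected measurement: {response.status}")
```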
My main method now looks like this:
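A sketch of that main function, assuming only that the reader has a read() method returning a measurement and the recorder has a record(measurement) method. Stand-in worker classes are included so the block runs without hardware or a database, and the iterations / interval_seconds parameters are additions for trying it out.

```python
import time
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class WeatherMeasurement:
    timestamp: datetime
    temperature_c: float
    humidity_pct: float

class FakeSensorReader:
    """Stand-in for SensorReader: same read() interface, no hardware."""
    def __init__(self, device_path):
        self.device_path = device_path

    def read(self):
        return WeatherMeasurement(datetime.now(timezone.utc), 21.5, 40.0)

class ListRecorder:
    """Stand-in for MeasurementRecorder: same record() interface, keeps a list."""
    def __init__(self):
        self.recorded = []

    def record(self, measurement):
        self.recorded.append(measurement)

def main(reader, recorder, iterations=None, interval_seconds=60):
    """Read, record, wait, repeat (forever if iterations is None)."""
    done = 0
    while iterations is None or done < iterations:
        measurement = reader.read()    # worker 1 produces a data object...
        recorder.record(measurement)   # ...worker 2 consumes it
        done += 1
        time.sleep(interval_seconds)
```

In the real script you'd construct the real workers once, at startup, and pass them in, so the same objects are reused on every pass through the loop.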
Two worker classes, passing a data class between them, which is a very common thing to do.
A really useful thing to note here is that my worker objects are initialized once, at the beginning of the script, and then re-used over and over again. That's useful because worker classes very often have some long-lived state they keep around as member variables. For my sensor device, I'm opening up some kind of connection to it, and I might want to keep that connection open rather than closing & re-opening it every minute. If I'm only reading once a minute, that may not be a big deal, but imagine I want to record every second, or even 10 times a second.
That's even more important for the recorder: if I'm talking to the database over HTTP, I'd want to use something like requests.Session for connection keep-alives. Otherwise I'd be opening up a whole new HTTP connection (including the TCP handshake and possibly the TLS negotiation too) every single second, which can add significant overhead.

Where this object-oriented design really shines is when things start to change.
Suppose you're using FooDatabase to store your weather measurements, and you aren't quite happy with it. You read on Hacker News about this hot new BarDatabase you want to try. BarDatabase uses a different input format, might use SQL instead of HTTP, or whatever. That's fine. Your current MeasurementRecorder class is really a FooDatabaseRecorder, so you rename it, and you write a brand new and different BarDatabaseRecorder class. But it has the same interface: a record() method that takes a Measurement object. Then with a couple of lines changed in your main() function, you can write to both databases at the same time to see which one you like best. When you settle on one, it's really easy to rip out the legacy code, because you only change a couple of lines in main() and then delete one entire class. Or, you might decide they're useful for different things and write to both permanently.

Similarly, in the example I gave above of having an indoor and an outdoor temperature sensor, let's say the outdoor sensor needs to be replaced with a different model because it's not waterproof enough, and the better, more waterproof model has a different low-level interface. Same deal: you write a second WaterproofSensorReader class (or, if the original model is an ABC1234 sensor and the waterproof one is model DEF9876, you can name your classes after that). Both classes have the same interface: a read() method that returns a Measurement object. Another couple of lines changed in your main() function, and you're reading Measurement objects from both.

The really important thing to notice here is that reading from the sensor and writing to the database are decoupled: you can do one without the other. Coupling and cohesion (which I mentioned in my other comment) are the two crucial things to understand for the why of OOP.
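One way to sketch the "same interface" idea is typing.Protocol (Python 3.8+), which writes the interface down explicitly. This isn't from the thread: plain duck typing works just as well in Python, and Protocol is only enforced by static type checkers such as mypy, not at runtime. The two recorders here are hypothetical in-memory stand-ins for the FooDatabase and BarDatabase versions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Protocol

@dataclass(frozen=True)
class WeatherMeasurement:
    timestamp: datetime
    temperature_c: float
    humidity_pct: float

class Recorder(Protocol):
    """The shared interface: anything with record(measurement) fits."""
    def record(self, measurement: WeatherMeasurement) -> None: ...

class FooDatabaseRecorder:
    # Hypothetical stand-in: a real one would talk to FooDatabase over HTTP.
    def __init__(self) -> None:
        self.rows = []

    def record(self, measurement: WeatherMeasurement) -> None:
        self.rows.append(measurement)

class BarDatabaseRecorder:
    # Hypothetical stand-in: a real one might issue SQL INSERTs instead.
    def __init__(self) -> None:
        self.rows = []

    def record(self, measurement: WeatherMeasurement) -> None:
        # Different storage format, same interface.
        self.rows.append((measurement.timestamp.isoformat(),
                          measurement.temperature_c, measurement.humidity_pct))

def record_everywhere(recorders: Iterable[Recorder], measurement: WeatherMeasurement) -> None:
    # Callers only depend on the interface, so writing to both databases
    # at once is just a matter of passing both recorders.
    for recorder in recorders:
        recorder.record(measurement)
```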
Another big benefit of decoupling is ease of development and testing. When you're writing & debugging your WaterproofSensorReader class, you don't need to set up a database (either a Foo or a Bar one). You can just create an instance of that reader class and ask it for a Measurement. And you can write the BarDatabaseRecorder class without needing access to any sensor hardware at all. All it needs to do is accept a Measurement and do the right thing with it, so you can feed it canned example data (say, -500 degrees and 200 percent humidity yesterday).

Yes, a dataclass is a regular Python class, and Python doesn't treat it any differently.
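To illustrate, here is a sketch comparing a @dataclass with a hand-written class containing roughly the boilerplate it generates (the real generated code differs in details). A method on the dataclass uses self exactly as in any other class.

```python
import math
from dataclasses import dataclass

@dataclass
class PointDC:
    x: float
    y: float

    # An ordinary method: self works exactly as in any other class.
    def distance_to_origin(self):
        return math.hypot(self.x, self.y)

class PointManual:
    """Roughly the boilerplate @dataclass writes for you (simplified)."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"PointManual(x={self.x!r}, y={self.y!r})"

    def __eq__(self, other):
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def distance_to_origin(self):
        return math.hypot(self.x, self.y)
```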
I think there are 2 main reasons to use them:
Nit: at the beginning of this explanation, in the first bit of code, there are technically no methods yet, only functions.
I would be careful not to get too deep into object orientation and other ways of organizing code too early.
My approach is to delay organization until I'm sure I need it, refactoring as I go. I stick with a single file until it gets unwieldy. Maybe start with "hello world" (whatever that means for your app) and start coding in the main method. Extract a function after you've written the code inline and you have a block of code that works and would make sense on its own. Once you have a few functions that operate on the same data structure, then it's time to consider a class. Once you have a class, consider what functionality you've already written that belongs there.
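A tiny, hypothetical illustration of "write it inline first, extract later", using a word-count script.

```python
# Step 1: write it inline, in main, until it works.
def main_inline():
    text = "the cat sat on the mat"
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    print(counts)

# Step 2: the loop is a block that makes sense on its own, so extract it.
def count_words(text):
    """Return a dict mapping each word in text to how often it appears."""
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def main():
    print(count_words("the cat sat on the mat"))
```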
At least, that's what I do when in doubt. Sometimes you know ahead of time the structure you want.
Edit: It's kind of hard to talk about in the abstract, though. If you're willing to post some code publicly then you could get more specific advice.
I don't intend to. I'm just focusing on it for learning purposes. I don't even like OOP, to be honest.
This is the last tutorial I tried to follow, a Pygame project from the book Python Crash Course, 2nd edition. Following tutorials is frequently mostly typing, so what I achieved there is not a real representation of my abilities -- I would not be able to do something like that on my own. In fact, I failed to answer the latest exercises, which were basically smaller versions of this project.

I should note that I don't have a real problem with syntax and logic, but rather with memory and organization -- I literally cannot keep more than two or three things in my active memory at the same time and will eventually forget what's going on. Things can quickly become overwhelming. Curiously, I don't have this issue with Emacs Lisp. Lisp just feels reasonable to me. Sadly, it won't help me get a job, and that's something I need right now.
This is great advice, thank you very much!
Since you are just starting out, you might just want to start programming and see how well it turns out. If it really isn't important, you can even write absolute spaghetti code to get an idea of where you personally need to organize.
Personally I start off by writing a skeleton of all the classes and methods I need. Then I try to write the documentation first. Then I realize as I am writing that I made logical mistakes as I was planning and rewrite everything. 😸
I appreciate the advice and it's certainly good for most cases, but this approach makes me extremely anxious and triggers my ADHD big time. It's hard for me to do anything without a neat environment, I can't read or study in a messy room, for example. Minimal irregularities become incredibly distracting.
So I tend to (over?)compensate by being rigorous and organized in my affairs.
Thank you very much for answering!
In general, you just get better the more you work on a variety of programs. A general strategy is to get down a minimum viable product -- the most basic possible implementation of what you're trying to do. From there, it's all incremental, which is nice.
Additionally, try to think about which parts of the program radiate outwards, design-wise. UIs, for instance, I typically do last, since the core functionality will dictate what the UI has; if you make it first, you'll probably have to change it by the end.
In terms of actual tools, I've been using Whimsical at work. It's just a nice diagram builder. Good for collaboration, though I suppose you won't use that.
That's quite reasonable. Thank you very much.
Tools are awesome, but one must know what to do with them! Do you use it freeform, or are there any documented methods that can make tools like this one especially productive?
Do you ever use libraries or modules that other people wrote? I've found that using a well-written library teaches you how to organize your own code even if you don't intentionally set out with that goal in mind. There are certain concepts that are well known that you start to see in other people's code and you can start to use in your own.
One of the most important concepts that helps with organizing code is called separation of concerns. It's also sometimes referred to as "separating your business logic from your display logic." The idea is that the code that displays your data is completely separate from the code that manipulates your data. This is probably the single most helpful concept I've learned when it comes to writing better code. Concepts that go along with this include loose coupling and information or data hiding. These help with making your program easier to modify and less fragile.
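A minimal sketch of the idea, with hypothetical function names: the calculation doesn't know how results are shown, and the formatting doesn't know how they're computed, so either side can change (say, drawing in a Pygame window instead of printing to the console) without touching the other.

```python
# Business logic: computes things, knows nothing about how they're shown.
def average(values):
    if not values:
        raise ValueError("need at least one value")
    return sum(values) / len(values)

# Display logic: turns results into text, knows nothing about the computation.
def format_average(label, value):
    return f"Average {label}: {value:.1f}"

def main():
    readings = [20.0, 21.0, 22.5]
    print(format_average("temperature", average(readings)))
```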
That’s great advice, thanks!
I'm not a skilled programmer (comp-sci dropout for Information Systems, lol), and I'm pretty bad at commenting, but I don't have issues keeping track of my admittedly simple programs. The way I design a program is to think about what I need and choose object, function, and variable names that describe what they're for, which I assume you're already doing. Better commenting is a habit I should start on.
Basically, what I've learned is to do something like this:
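The commenter's example didn't survive. Here is a hypothetical before/after sketch of the idea: the same calculation with vague names, then with descriptive names plus a docstring.

```python
# Hard to follow: you have to remember what l and r mean.
def calc(l, r):
    return sum(l) * (1 + r)

# Easier: the names carry the meaning, and the docstring covers the rest.
def total_price_with_tax(item_prices, tax_rate):
    """Return the total of item_prices with tax applied.

    tax_rate is a fraction, e.g. 0.2 for 20%.
    """
    subtotal = sum(item_prices)
    return subtotal * (1 + tax_rate)
```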
If your names are meaningful, it goes a long way; the rest is just keeping things clean and having a flow to your code.
Thanks, I'll keep that in mind!
One thing to keep in mind about object-oriented programming is that you have a choice of ways to organize your functions and these choices are sometimes arbitrary. For example, let's say you have two classes, Foo and Bar, and a function that takes two arguments:
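The snippet is missing. Here is a hypothetical stand-in, with made-up fields name and count: two small classes and a standalone function that needs one of each.

```python
from dataclasses import dataclass

@dataclass
class Foo:
    name: str   # hypothetical field

@dataclass
class Bar:
    count: int  # hypothetical field

def describe(foo, bar):
    """A standalone function that needs one Foo and one Bar."""
    return f"{bar.count} x {foo.name}"
```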
There are three places that you could put this function:
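Sketched with hypothetical Foo and Bar classes (made-up fields name and count), the three options are a standalone function, a method on Foo that takes a Bar, or a method on Bar that takes a Foo. All three compute the same thing; only where a reader would look for it changes.

```python
from dataclasses import dataclass

# Option 1: a standalone function.
def describe(foo, bar):
    return f"{bar.count} x {foo.name}"

@dataclass
class Foo:
    name: str

    # Option 2: a method on Foo that takes the Bar.
    def describe(self, bar):
        return f"{bar.count} x {self.name}"

@dataclass
class Bar:
    count: int

    # Option 3: a method on Bar that takes the Foo.
    def describe(self, foo):
        return f"{self.count} x {foo.name}"
```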
Which one you choose is often a matter of taste: where do you expect to find this function? It's often useful to look at a method's arguments and consider whether you can move the method to some other class, or, when looking at a standalone function, to decide whether it should really be a method on one of its parameters.
But, you don't want to get too caught up in tidying up. This is internal organization for the benefit of you and your team and the users aren't going to care. Moving things around can be helpful, but someone who expects to find them in their old location might be confused.