Architecture for untrained software engineers (Python)
I've been programming for some time now, but I notice that without any formal education in CS I often get lost in the weeds when developing larger applications. I'm familiar with the principles of TDD and SOLID, which have helped with maintainability, but I still feel I lack the ability to architect a properly structured system. As an example, I'm currently developing a Flask REST API for a website (just for learning purposes). This involves parsing an HTML response and serializing the result as JSON. I'm still quite unclear on how to structure this sort of thing. If any more experienced developers could point me in the right direction or offer their opinion, I'd be very appreciative. Currently I have something like this (based, I hope correctly, on Uncle Bob's Clean Architecture).
Firstly, I'm defining the domain model, i.e. the structure of the API response. Then, from the outside in:
- Infrastructure (Flask): User makes request via interface (in my case a request to some endpoint)
- Adapters: request object checks if the request is valid (on the way back it checks if the response is valid) - Is this layer only for error handling?
- Repository: I'm struggling a bit here; AFAIUI this layer is traditionally a database. In my case, however, where the request is valid, is this where I should handle the networking layer, i.e. all the requests that fetch the website source? I'm also confused because at this stage I should be returning the relevant domain model, like an ORM would, but as my data is unstructured I need to transform the response first in order to do that. Where would it be best to handle this?
- Use Cases: Here I transform the domain model depending on the request. For example, filter all objects by id. Have I understood this correctly?
- Serializers: Encode the domain model as JSON to return from the Flask route.
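As a rough illustration of the layers listed above, here is a minimal sketch in plain Python. It is one possible reading under assumptions, not a canonical Clean Architecture layout. Every name in it (`Song`, `SongSource`, `InMemorySongSource`, `songs_by_artist`, `to_json`) and the model's fields are hypothetical, and the in-memory source merely stands in for whatever code would fetch and parse the website.

```python
import json
from dataclasses import dataclass, asdict
from typing import Protocol


@dataclass(frozen=True)
class Song:
    """Domain model: one record of the API response (hypothetical fields)."""
    artist: str
    title: str
    year: int


class SongSource(Protocol):
    """'Repository' boundary: anything that can hand back domain objects."""
    def all_songs(self) -> list[Song]: ...


class InMemorySongSource:
    """Stand-in for the code that would fetch and parse the website."""
    def __init__(self, songs: list[Song]) -> None:
        self._songs = songs

    def all_songs(self) -> list[Song]:
        return list(self._songs)


def songs_by_artist(source: SongSource, artist: str) -> list[Song]:
    """Use case: filters domain objects; knows nothing about HTTP or HTML."""
    return [s for s in source.all_songs() if s.artist == artist]


def to_json(songs: list[Song]) -> str:
    """Serializer: turns domain objects into the JSON a route would return."""
    return json.dumps([asdict(s) for s in songs])
```

Under this reading, the networking and HTML-to-model transformation would both live behind `SongSource`, so the use case and serializer can be tested with the in-memory stand-in and no network access.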
If you got this far, thanks so much for reading. I really hope to hear the opinions of more experienced devs who can steer me in the right direction/correct me should I have misunderstood anything.
Sounds like you're trying to fit a square peg into a circular hole. Tbh I think these architectural models are worthless, but if you're going to use them, treat them as very big picture, abstract buckets.
Just focus on the flow of data. First, your client sends some data to your REST endpoint (what kind of data? What format?). Your endpoint handler takes that, deserializes it into something you can use (what specifically?), does some processing (what?), and returns some data (what?).
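To make that flow concrete, here is a minimal sketch with the web framework left out so the three steps stay visible. The names (`SearchRequest`, `parse_request`, `find_songs`, `handle`) and the `artist`/`limit` parameters are assumptions for illustration only; in a Flask app, `handle` would be called from the route with the request's query parameters.

```python
import json
from dataclasses import dataclass


@dataclass
class SearchRequest:
    """Deserialized form of the incoming query parameters (hypothetical)."""
    artist: str
    limit: int


def parse_request(params: dict[str, str]) -> SearchRequest:
    """Step 1: turn raw strings from the client into typed values."""
    artist = params.get("artist", "").strip()
    if not artist:
        raise ValueError("artist is required")
    return SearchRequest(artist=artist, limit=int(params.get("limit", "10")))


def find_songs(request: SearchRequest, rows: list[dict]) -> list[dict]:
    """Step 2: the actual processing, here a simple filter."""
    matches = [r for r in rows if r["artist"] == request.artist]
    return matches[: request.limit]


def handle(params: dict[str, str], rows: list[dict]) -> tuple[int, str]:
    """Step 3: glue the steps together; returns (HTTP status, JSON body)."""
    try:
        request = parse_request(params)
    except ValueError as exc:
        return 400, json.dumps({"error": str(exc)})
    return 200, json.dumps(find_songs(request, rows))
```

Each "what?" in the paragraph above maps to one function signature here, which is roughly the point: answering those questions tends to give you the structure.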
I'm not qualified to comment specifically on networking stuff as I don't do any of that regularly. However, I will say that the best way to learn is to do it wrong, and then figure out how to fix it. I've sometimes written the same program 5 or 10 times (over many years) before being happy with it. Each time I improve it, I'm able to do more with it or add more useful features to it.
Eventually you start to see patterns that you can reuse. These are not as formal as you get from the Design Patterns book, or whatever, but you learn the basic way to model a particular type of program, and you start to learn where each potential choice in the design can be improved for the particular case you're working on. Soon you can sit down and say, "I'm going to need a program that does x, y, and z. I know that I'll be dealing with a, b, and c, so I need to make sure I do m, n, and o." But it takes trying the different combinations to figure that out. (And also reading, etc.)
I used to be highly involved with the Code Review site on Stack Exchange. Once you have your code working, I recommend posting it there for further insight into how it could be improved. You can either post the whole thing, or focus on a specific class or file you'd like to improve. (Just be sure to read the rules. Whatever you post should be compilable/runnable by someone trying to review it.)
Thanks so much for this. I think I may have just been waiting for someone to give me the go-ahead to make mistakes. I'll be sure to post the repo on Code Review once it's done. I'm already starting to see patterns; I was just concerned I might be picking up bad habits or something, hence all the GoF/Uncle Bob research. Hopefully it'll get easier over time. Thanks again.
I wrote a big long response, but I figured that since my way of doing things is so different from the ways you have learned, it would be better not to confuse you. But I do have some advice.
Don't worry about the repository. From the description you gave us, I see no reason to include one unless you're looking to log data. And even then you might not need to use a database - it can be any data store, even a text file. But then again, I am not sure if you've given us the full scope of your project.
Use Cases are typically supposed to be real-world. Something like "An accountant uses our system to find a customer's account balances".
Don't worry too much about trying to apply different structural systems. You should really understand a number of different design patterns because they all have their strengths and weaknesses. When it comes to large projects, there are usually a lot of exceptions to explicit design rules in practice. This project is relatively simple, so you might not even need to bother with these huge enterprise-grade structures. Heck, you might not even want to use Flask - it's a very large hammer to drive this very small nail with. But then again, since this is a learning project, that might go against the very reason why you're building this to begin with.
Thanks so much for your reply Akir. It's a shame you didn't post your initial reply. I'd be really interested to know how you'd structure an application like this or the steps you'd take to structure any application really.
The full scope of the project is something like this. Let's say there is a table on a website somewhere consisting of an artist, song, album, year etc per row
If structural systems aren't the way to go, perhaps you could direct me to some appropriate design patterns? I often get into the situation where, after an application I'm working on grows to a certain size, it becomes really unwieldy and not much fun to work on. After a few failed attempts I'm thinking I must be missing something here :)
Ideally I'd like to be able to draw out a simple diagram before starting and stick to that from start to finish. That way I'd have a guide and a reference as to how the whole system works in tandem when the going gets tough.
Thanks again for your reply Akir, I really appreciate your time.
Not the same person you replied to, just fyi
Yep, I'd recommend the requests library
Use BeautifulSoup. Do not parse HTML with regex: https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags
You'll probably store it as a Python dictionary, which you can then easily convert to JSON with the standard library's json module
json.dumps handles the headers, too
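A minimal sketch of that pipeline (HTML table to dictionaries to JSON) might look like the following. Note the swap: it uses the standard library's `html.parser` as a stand-in for BeautifulSoup purely so it runs without extra installs; with bs4 you would walk the `tr`/`td` elements instead. The `SAMPLE` markup, its column names, and the `TableParser`/`table_to_records` names are made up for illustration, and in a real project the HTML string would come from something like `requests.get(url).text`.

```python
import json
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])  # start a new row
        elif tag in ("td", "th") and self.rows:
            self._in_cell = True
            self.rows[-1].append("")  # start a new, empty cell

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def table_to_records(html: str) -> list[dict[str, str]]:
    """Use the first row as keys and every later row as one record."""
    parser = TableParser()
    parser.feed(html)
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample markup standing in for the fetched page.
SAMPLE = """
<table>
  <tr><th>artist</th><th>song</th><th>year</th></tr>
  <tr><td>Artist A</td><td>Song X</td><td>2001</td></tr>
  <tr><td>Artist B</td><td>Song Y</td><td>2002</td></tr>
</table>
"""

records = table_to_records(SAMPLE)
payload = json.dumps(records)  # JSON string ready to return from a route
```

All values come out as strings here; converting `year` to an integer would be a separate, explicit step.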
Imo focus on the flow of data, and draw it out, either on paper or with Whimsical: https://whimsical.com/
Hey, thanks for this. Yeah, I'm using bs4 (regex as a last resort). I'm storing as a dict and using json.dumps. I will try to focus on drawing it out though, good advice. Can I ask why you think architectural models are useless? Most of the experienced devs I've spoken to have said I should focus on getting a good understanding there.
Because software architecture is complicated, and often specific to your particular needs and wants. Making it fit into these rigid boxes is a pointless exercise. Think about it: you fit your problem into the model as best you could. Did you actually gain anything out of that exercise? No. The only benefit was that it forced you to think through all the parts, but you could do that on your own.
It's like in literature when people try to jam every story into the Hero's Journey or other such structures. Or when you force every essay into the five-paragraph format.
Or like the much panned 8-legged essay (now that's an obscure reference)
I think making me think through it, whether or not a lot of it was extraneous, was actually helpful. Even if I disregard what I learnt, it's good to know where you don't want to go. I have a better picture of that now thanks to this thread. Appreciate your reply.
My personal philosophy when it comes to software development is to keep things as simple as possible, with the sidenote that one should also consider the future for what you are building. If this project has no future outside of your experimenting, then you can skip a lot of the extra structural planning and documentation steps - unless, of course, that's what you are trying to learn.
There are a ton of different ways to plan your software projects. As an analyst, my planning includes more than the software itself; it requires me to understand the greater system of which that software will be a part. So things like Structured Analysis / SADT are useful frameworks to help me understand everything. But there are tons of steps in those methods that are redundant, obvious, or otherwise unnecessary if I'm just programming a simple tool that may not be needed for more than a few weeks.
God, I really hate saying this. It sounds like I'm telling you to be lazy. In reality, I'm a proponent of planning. It will save you a ton of pain later on most of the time. So do me a favor and ignore everything I just said and start planning things the way you wanted in the beginning. If your planning doesn't work out as well as you intended, it's a learning experience and you will get better over time.
What I'd recommend for you in particular since you said you would like to have a diagram is to learn about the various types of UML diagrams and the entire visual language, because those can really help quite a lot to keep you in check for larger projects. UML lets you diagram all the way down to the classes you write, so it's great for understanding your structure before you start coding.
There's lots of different software you can use to make your diagrams, but I use Dia. Just be warned that it's not terribly user-friendly.
Sitting here thinking about this I thought, "that's great advice, easily implemented". I then very quickly backpedalled :) I find both of these things very, very difficult to do. That said, with the encouragement offered up in this thread I can only hope it'll get easier over time.
I appreciate the UML recommendation. I worked with a much older sys admin years back and he would UML everything. It's funny though, many of the younger programmers I've met don't seem to do this. Is it some old school concept? I don't have a formal CS background so don't know. Is that kind of thing taught at the college/university level?
I'm not sure you can call UML outdated. It's just a visual language useful for understanding systems. I did learn about it in college but I was aware of it beforehand. It's one of those things you may not see unless you are involved with a systems architect on a large complex program.
That being said, "lean" and Agile software development is pretty popular now, and in those paradigms the motto is basically to build now and make changes later. And while that works for some things - particularly if you have a huge team - it doesn't really work for everything IMHO. But that also depends on the person working on the project.
Most of the REST APIs I've worked on had an architectural plan something like this:
So these are the layers. All of the "rules" I stated in each layer are "soft". They should be followed right up until they don't make sense. The point is less to be strict, and more to follow a convention, until someone presents a reasonable argument to depart from the established pattern. Some other things you've mentioned:
One thing I personally advocate for in most APIs that I've built is to have the team deliberately separate the domain model from the API. By this I mean, if you had a DB with a Customer table/object, the ORM class should NOT be the class that is serialized and returned to the user. The reason is that doing so combines two separate concepts into a single class. The domain model is how you, the developers, think about and manipulate your data. The user usually doesn't need all of that data, or needs it formatted differently. Separating the two from the beginning allows each to evolve separately, even though there's extra overhead involved in mapping the domain model to the API before returning the user's response.
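A minimal sketch of that separation, using plain dataclasses rather than a real ORM. The field names and the `CustomerResponse`/`to_response` names are hypothetical, chosen only to show one class for the developers and one for the client with an explicit mapping in between.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass
class Customer:
    """Domain model: how the developers think about the data (hypothetical)."""
    id: int
    first_name: str
    last_name: str
    password_hash: str
    signed_up: date


@dataclass
class CustomerResponse:
    """API model: only what the client should see, shaped for the client."""
    id: int
    display_name: str
    member_since: str


def to_response(customer: Customer) -> CustomerResponse:
    """The mapping step: the one place that knows about both shapes."""
    return CustomerResponse(
        id=customer.id,
        display_name=f"{customer.first_name} {customer.last_name}",
        member_since=customer.signed_up.isoformat(),
    )
```

With this split, renaming a database column or adding an internal field such as `password_hash` touches only `Customer` and `to_response`; the JSON the client sees (for example via `asdict(to_response(customer))`) stays the same.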
Thanks so much for this. It's encouraging there's not so much divergence across projects. I wouldn't want my learning to only apply to the problems I'm currently working on.
This is great advice and reminds me of Phil Karlton: "There are only two hard things in Computer Science: cache invalidation and naming things." :)
Listen up young whippersnapper, if yer gonna tell a joke, tell it right! :p