How Instagram uses static analysis like linting and automated refactoring to help manage their multi-million-line Python codebase

  1. cptcobalt

    Our server app is a monolith, one big codebase of several million lines and a few thousand Django endpoints all loaded up and served together.

    This sort of thing makes me shiver. I have never worked on such a large codebase before, but it almost seems nonsensical? Ostensibly, IG is a glorified image host that has tacked on tons of functionality over the years: multi-images, IGTV, videos, DMs, etc. But I don't think it quite stands to reason that IG's server codebase needs to be several million lines of code. What the heck are they doing?

    1. Deimos

      I can certainly imagine it happening at the larger companies. Tildes has about 10,000 lines of Python right now, and that's almost entirely from a single developer over a couple of years while also doing... everything else too. When you're a 10-year-old company with hundreds of developers constantly adding more stuff, I can easily imagine it building up to that point. It's rarely anyone's job to go back and clean up old code that isn't needed any more; everyone just keeps adding more.

      1. Emerald_Knight

        Hell, I've been managing a production back-end system that I built from the ground up a little over a year ago. It has more than 36k lines of application code and more than 28k lines of unit tests, and that's after continually paying down technical debt and cutting down on the number of lines of code needed. Had I focused on rapid feature development instead of paying down that technical debt or caring about long-term maintainability, I could easily have a codebase exceeding 100k lines of code. That's just the back-end code for our API. That's not accounting for our app development or website. That's not accounting for configuration, deployment scripts, or third-party libraries.

        That back-end is going to keep growing, too. There will be several, likely many, third-party service integrations that will inevitably be needed in the future. There will be many more task automations to implement. There will be many more reports and analytics that will need to be generated. There will be predictive algorithms that will need to be put in place. There are so many components yet to be built that this back-end alone will easily become a monolith of its own. It will become so complex and require so much additional functionality that it will inevitably need to be broken apart into several services just to keep it manageable and scalable, and splitting it up will inevitably mean identical code living in those different services that has to be maintained in parallel.

        Instagram having several million lines of code is not even the tiniest bit surprising.

    2. Staross

      Probably a lot of spaghetti code, and ads.

      1. cptcobalt

        I mean, yes, but that's what remains unsaid: a giant amount of the Instagram server codebase is likely ads, user tracking, profiling, recommendations and engagement schemes, etc.
