8 votes

What are the main considerations/downsides to make before adding an index to a column

I was asked this question in an interview and I said the downsides would be extra disk space used to store it and more disk writes from the database to maintain the index on updates. I asked a friend and he said the downsides are space used in memory to store the index as well as the fact that an index could take days to build on a large dataset.

What does everyone here think about this question?

3 comments

  1. [3]
    ThreeMachines
    Link
    DBA isn’t my bread-and-butter, but I don’t think there’s usually much downside to think about for indexing; if there were major tradeoffs, DBs would be a lot harder to work with. Having said that,...

    DBA isn’t my bread-and-butter, but I don’t think there’s usually much downside to think about for indexing; if there were major tradeoffs, DBs would be a lot harder to work with. Having said that, they’re not literally free.

    • There’s a nominal memory and disk cost, as you addressed. But if the resource cost for a single index was nontrivial, it’d be because the dataset was large enough that the index would be really important.
    • Indexes can be slow, as your friend said, although not “days”. If you need to query across a dataset that would take days to index, then you need, like, a columnar data warehouse, not Postgres.
    • If you’re indexing across a small enough dataset, the query planner might ignore your index. But that seems harmless; when the data grows you’ll be glad you had the index already there.
    • If rows will be frequently deleted, you might have to reindex frequently, I guess?
    • You don’t need to make an index if one already exists?

    Honestly, this seems like a weird question, and it would be a little bit of a yellow flag to me in an interview. The performance impacts of surprise table scans are so, so much worse than any of the costs of having an index that I don’t know why you’d ever want your engineers intentionally not add indexes for their queries.

    (There is the question of how you remove indexes that are no longer needed by any queries, but that feels to me like something you need to address at an organizational level as part of your software development lifecycle controls.)

    5 votes
    1. [2]
      Comment deleted by author
      Link Parent
      1. ThreeMachines
        Link Parent
        Oh, interesting; I wouldn’t have thought of clustering as being part of the index, since it’s a separate statement (which is also why I was unaware of the CLUSTER command until now). Unless...

        Oh, interesting; I wouldn’t have thought of clustering as being part of the index, since it’s a separate statement (which is also why I was unaware of the CLUSTER command until now). Unless clustering works differently outside of Postgres?

    2. teaearlgraycold
      Link Parent
      Only other thing I can think of is that each index increases the cost of insertions. Once it's already built (and if you're just reading) then it can usually be considered "free" - as in, there's...

      Only other thing I can think of is that each index increases the cost of insertions. Once it's already built (and if you're just reading) then it can usually be considered "free" - as in, there's no real downside.