34 votes

Life altering PostgreSQL patterns

35 comments

  1. [8]
    EsteeBestee
    (edited )
    Link

    There's some obvious-sounding, but very sound, advice in there. I have direct experience with being fucked over by bad database designs. You would think it's obvious to have created_at and updated_at everywhere you can, but please, for the love of god, make sure you actually have those columns and actually write to them. Our damn schema has those in most tables now, but there are still some using created_on (and some tables with both created_on and created_at, where different bits of code write to one or the other, no I'm not joking), and some with no indication of any sort of creation date at all, and it's a fucking nightmare to run queries against when generating reports.

    One that maybe isn't super obvious, but that I will say is super great advice, was "Represent statuses as a log". You never know whether, 5 years after you create a table, you'll need to audit the data in there for some reason and need to know past values instead of current values. For example, we have an integration with a vendor that sends us account data with statuses. The data is under a lot of scrutiny since it has to do with financial shit, and there are statuses linked to the accounts we see. Sometimes shit hits the fan with one of our customers and they need a data trail of the statuses on their accounts at certain points in the past, and you/they are just fucked sometimes if you only keep the current state of the data. At the very least, if you have an 'account' table or something, have an 'account_status' table, too, that can be joined on 'account.id' or whatever and tracks all status changes per account.
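
    A minimal sketch of that status-log table (hypothetical names, assuming an 'account' table with an integer 'id'):

    create table account_status (
        account_id  bigint      not null references account (id),
        status      text        not null,
        recorded_at timestamptz not null default now()
    );

    -- Current status per account: the newest row for each account_id.
    select distinct on (account_id) account_id, status, recorded_at
    from account_status
    order by account_id, recorded_at desc;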

    14 votes
    1. vord
      Link Parent

      but please for the love of god make sure you actually have those columns and actually write to them

      And that's why you always set up those two triggers: nobody ever has to think about writing to those columns, it happens automatically.

      For stuff that needs auditing, a trigger on update or delete that copies the prior row into a different table helps.
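
      One way to wire that up in Postgres (a minimal sketch; 'accounts' is a stand-in table name assumed to have created_at and updated_at columns):

      create or replace function touch_timestamps() returns trigger as $$
      begin
          -- Stamp created_at once, keep updated_at current on every write.
          if tg_op = 'INSERT' then
              new.created_at := now();
          end if;
          new.updated_at := now();
          return new;
      end;
      $$ language plpgsql;

      create trigger accounts_touch_timestamps
          before insert or update on accounts
          for each row execute function touch_timestamps();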

      9 votes
    2. [6]
      Eji1700
      (edited )
      Link Parent

      I am currently working on getting data out of a vendor database into something we can actually use. This is from a BILLION dollar company on one of their flagship products.

      Let's say you have table1. Table1 has

      numA, name, date, thingId

      Then you have zable1 (yes... zable... so it appears at the bottom of table lists). zable1 has

      numA, name, option, stuff

      Every. Single. Table. has a "numA" column. That's not the real name, but they all use the SAME name. In most cases it's a primary key, but not in all of them. In the example above, table1 has duplicate values in numA, and you join zable1 like this:

      select * 
      from table1 t
      left join zable1 z on t.thingId = z.numA
      

      Oh, and those names? They're shockingly accurate. The only way you know what zable1 does (and they go up into the zable90s) is if you have access to the data dictionary, which is woefully lacking in useful information.

      There are NO "updated at" columns of any real use.

      I could ramble more about it, but I'd just wind up sad.

      Please, for the love of god, whatever personal "I'll fix it later" project you're doing, FIX IT NOW.

      2 votes
      1. [3]
        Weldawadyathink
        Link Parent
        • Exemplary

        That reminds me of this wonderful database story. Ping /u/EsteeBestee, since you might be interested too.

        4 votes
        1. [2]
          EsteeBestee
          Link Parent

          That was an incredible read and now I suddenly don’t feel as bad about my situation. HOW DO YOU RUN OUT OF COLUMNS?!

          2 votes
          1. vord
            Link Parent

            One product that I know of relies on MS Dynamics and SQL Server. Due to the user-driven nature of the product ("You don't need programmers to do all this stuff!"), which autogenerated table structures, things got ugly, and we did actually hit a SQL Server column limit back in 2016 (which IIRC was something like 1023).

            Edit: as the linked story says, it was 1024 at the time. I think Dynamics had a reserved column or two which prevented you from hitting the database limit with your own stuff.

      2. EsteeBestee
        Link Parent

        Oh my god, holy shit. That also reminds me that we have 80 tables prefixed with “xx_” because someone wanted them grouped together in the schema and THAT’S NOT HOW THAT SHOULD WORK!!!

        I also checked this morning and we’re now up to 666 tables in our cursed platform DB. At the very least we have a data warehouse with good architecture so that I can use that for queries instead in most cases.

        3 votes
      3. vord
        Link Parent

        Old systems are the worst, because back in the day table names and sometimes column names were limited to 8 characters.

        A large ERP system I worked with has arcane table names like ASCABINC, where each letter is an abbreviation for something.

        3 votes
  2. [4]
    krellor
    Link

    The only one I disagree with is strictly adhering to singular name forms for tables. I disagree that the majority of queries will result in selecting individual records. In fact, my experience is that for major applications, I'm mostly working with collections of things, and uniformly referencing collections in plural form and individuals in singular makes for intuitive code.

    That said, if you find a situation where the reverse is true, you shouldn't be dogmatic either way. English grammar is just a guideline to aid in readable code; if you need to take a few liberties with spelling or names, do what makes sense, as long as it's documented.

    5 votes
    1. xk3
      Link Parent

      selecting individual records

      I think they meant that when writing the select statement you reference individual values, like "city.name" instead of "cities.name". I agree this shouldn't be dogmatic, but it should be consistent (whenever possible) to minimize confusion. I've always preferred plurals, and I never had second thoughts about "cities.name".

      But I agree, it's not a strong tip like the others are.

      7 votes
    2. [2]
      tanglisha
      Link Parent

      I don't think it matters whether they're singular or plural, so long as you pick one and stick with it. Some random combination is so much worse than whatever bad might come from either.

      1 vote
      1. krellor
        (edited )
        Link Parent

        Agreed. Like most things, consistency within the project is king.

        2 votes
  3. [2]
    Eji1700
    Link

    I think the only thing I really take issue with is:

    Name your tables singularly

    If you're writing any serious amount of SQL without using aliases in your queries, you're insane. Having descriptive table names, so that when you're looking at your table list you know exactly what's in there, is very helpful. There are arguments for singular names vs. plural either way.
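
    For instance, with aliases the full table name only appears once per query anyway (hypothetical tables):

    select c.name, o.placed_at
    from customers c
    join orders o on o.customer_id = c.id;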

    Everything else seems like sound enough advice or is coming from an area I know little about or might be at odds with my particular use cases.

    5 votes
    1. vord
      Link Parent

      We're not in 1999 anymore. Much like variables, make your table and column names as long as they need to be to convey what they do at a glance.

      Oh and put comments on your tables and columns. Future you will thank you.

      4 votes
  4. [11]
    vord
    Link

    I've worked as a DBA most of my adult life so far, most experienced with Oracle, then SQL Server, then Postgres. The article is pretty good, but I have two major thoughts on the author's points.

    UUIDs for PKs suck unless you have a very specific need for them. Indexes are the lifeblood of anything beyond a toy database, and using UUIDs to join many things together gets nasty fast. The only time UUIDs make sense in my mind is highly distributed systems with intermittent connectivity which rely on eventual consistency.

    Views are absolutely horrid. Don't use them. NEVER join views together. This is the understatement of the century:

    The query planner also seems to have trouble seeing through them in general.

    While Oracle has made progress on that front, it is still very much a crapshoot whether a view will help or hurt any given query. A view is by and large a black box to an optimizer: it must compute the entire view to make use of it in a join. If you have two views which leverage a table with 100,000,000 rows in it, and you try to join those views together, you'll end up querying that table twice. The hit can be masked if there is good indexing and the tables are smaller, but if you end up hitting a full table scan (and with UUIDs for PKs, it'll happen more) on a gigantic table, it's a nightmare.
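
    A hypothetical illustration of the double-scan problem, using a made-up 'orders' table:

    create view recent_orders as
        select * from orders where placed_at > now() - interval '30 days';

    create view large_orders as
        select * from orders where total > 1000;

    -- If the planner can't flatten the views, this may scan 'orders' twice:
    select r.id
    from recent_orders r
    join large_orders l on l.id = r.id;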

    3 votes
    1. [2]
      xk3
      (edited )
      Link Parent

      Huh... TIL. I always thought postgres views are mostly transparent to the query planner but that is not always the case:

      dbt (a common tool in the Analytics Engineering community) often favors using views:

      Generally start with views for your models, and only change to another materialization when you notice performance problems.

      But this is mostly for OLAP workflows which run a handful of times per week where the performance difference might not matter or might not immediately be obvious.

      The speed difference between INT4 and UUID index lookup can be over an order of magnitude in really long and wide tables. Luckily they aren't usually that bad: https://ardentperf.com/2024/02/03/uuid-benchmark-war/

      However, I absolutely agree that UUIDs are pretty awful, even as external IDs: they aren't great for debugging since they're hard to visually compare (you might think two are the same when your glance only compared the first and last 2 chars, etc.). You can't even double-click to select them because they have dashes. And users definitely prefer smaller IDs like "BOB123".

      3 votes
      1. vord
        Link Parent

        It does seem that the newer UUID schemes are significantly better than the ones I remember, but the memory pressure problem still applies when OLTP databases are sized in multiples of terabytes.

        UUIDs seem to be much better in distributed systems where the network will be your bottleneck, but that's not always viable.

        1 vote
    2. [3]
      TurtleCracker
      Link Parent

      Doesn’t UUIDv7 address most of the indexing and performance concerns?

      1 vote
      1. [2]
        vord
        Link Parent

        I mentioned elsewhere in this topic that UUIDv7 is much better, but the added 25% memory usage is still a killer in large-scale applications. I like @krellor's solution of having a regular integer PK, then a UUID for API usage.

        3 votes
        1. TurtleCracker
          Link Parent

          We are transitioning from bigint to uuid for primary keys. The flexibility for the frontend to set the identifier without checking the backend first is super helpful. We also have many applications in our ecosystem that communicate, and having certainty on ID uniqueness across them is worth it for us.

          4 votes
    3. [5]
      0xSim
      Link Parent

      using UUIDs to join many things together gets nasty fast

      How?

      We always use UUIDs in our tables in SQL Server, and it seems to be OK? We also manage a lot of data. Not financial transactions, but enough to have performance as the first concern.

      1. [3]
        zod000
        Link Parent

        UUID/GUIDs in general are just a lot slower to index and take up more space, so at scale they can become a performance issue.

        The only time I really care to use UUIDs for a PK is when the value in question will be publicly accessible as part of an API or URL. I really don't like having systems that are able to be trivially iterated over.

        2 votes
        1. [2]
          krellor
          Link Parent

          In those situations I typically create an auto-incrementing integer as the PK, and a separate UUID that can be exposed for public API invocations. That lets me handle foreign keys and other internal operations like joins with integers, which are more efficient at scale, and have UUIDs for public reference.
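
          A minimal sketch of that layout in Postgres (hypothetical table; gen_random_uuid() is built in since Postgres 13):

          create table account (
              id        bigint generated always as identity primary key,
              public_id uuid   not null default gen_random_uuid() unique
          );

          -- Joins and foreign keys use the compact integer id;
          -- the public API only ever sees public_id.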

          The largest public-facing platform I ever managed was a massive regulatory database that allowed the public to generate accounts and API tokens, allowed custom query creation through the API or a web interface, and allowed third parties to build apps on top of our APIs using rate-limited accounts. That was a ton of fun to optimize, and the work that went into it was quite the education. Ultimately, the relational database that the queries ran against used set operations on the internal keys to build a result set containing only the unique IDs of what to return, and then a separate select ran behind the scenes against a denormalized warehouse of the data, so that the request didn't have to materialize data across so many foreign key relationships.

          2 votes
          1. zod000
            Link Parent

            I've done the exact same thing in specific instances where I felt the UUIDs were likely to cause performance issues, but generally it wasn't worth the time unless the tables were really huge and/or had a constant flood of writes.

      2. vord
        (edited )
        Link Parent

        Had a vendor system that relied on UUIDs exclusively in an OLTP system. It had atomic row transactions on the order of several thousand per minute, often with a lot of updates or deletes.

        The indexes would get trashed every hour. And because it was highly normalized, and often wide (see @xk3's reply), performance would tank without constant rebuilds because the indexes would be wide near the top and sparse on the branches, which is the opposite of what you would want.

        Unless your index is quite sparse, inserting UUIDs into a sorted index (best for performance) requires re-sorting the block. Having too sparse of an index makes it less performant. Imagine you're trying to create a sorted list in minimal amount of memory (as a large database will almost always be memory-constrained). If you put everything in a sequential row, you merely allocate another bit of memory and tack it on the end. But if you need to insert random data, and you run out of room, you have to reallocate the memory to fit the new record, moving around the data to fit it in. This also applies cache pressure which can age things out while they're still hot. And since a good database will cache both hot indexes and hot data, any cache pressure isn't great, and you can only scale memory so high relative to size of data.

        Other systems we've had with similar levels of transactional workload didn't have this problem because the PKs weren't essentially randomized at the top. They didn't need additional sorting, just being tacked on the end and periodically compacted.

        Edit: I edited quite heavily to fix some early-morning wrong-word logic.

        1 vote
  5. [7]
    revivinglaziness
    Link

    Thank you for sharing this! I'd seen many of these but not all in one place.

    This is my first time seeing enum tables, but I like the idea of that pattern a lot. It matches well with the liberal (and justified, IMO) use of FK constraints they suggest. But I think it could also help avoid the need for some schema migrations, which are always a little bit nerve-wracking no matter how well you automate them.
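
    For reference, the pattern is roughly this (hypothetical names):

    create table order_status (
        value text primary key
    );

    insert into order_status (value)
    values ('pending'), ('shipped'), ('cancelled');

    create table orders (
        id     bigint generated always as identity primary key,
        status text   not null references order_status (value)
    );

    -- Adding a status is a plain insert rather than a schema migration:
    insert into order_status (value) values ('refunded');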

    1 vote
    1. [6]
      em-dash
      Link Parent

      I hate enum tables so much. They sound so tempting and yet they're so bad in practice.

      The thing about enum values is they usually correspond to enum values in your codebase (otherwise they wouldn't really be enums, just data). Now instead of your application just requiring a specific schema version to work, it requires a specific schema and data version. This isn't better, it's just marginally different to migrate and more tempting to do it the wrong way.

      At least this implementation uses the text representation as the key. Most people will autopilot add an integer or UUID key and now you have to make sure that mapping is consistent across environments.

      4 votes
      1. [5]
        vord
        Link Parent

        The main way enum tables make sense is if you'll have many disparate teams working against your data model while not working closely together. Having a strict enum table helps ensure data integrity and notification across teams ("yo, the enum table is adding/changing values, make sure you won't have breakage").

        That said, they can be a real PITA if you don't need them.

        3 votes
        1. [4]
          em-dash
          Link Parent

          helps insure data integrity and notification across teams

          Amusingly, with one particularly memorable pair of enum tables (in the same product) I had the opposite experience because of how badly they were handled: they had serial integer keys and people kept adding values to the end and using the same number as each other, and that caused Weird and Exciting Results.

          I finally "solved" this by writing the current maximum value for each in dry erase marker on the window next to my desk, and declaring that anyone wanting to add a new value must first physically walk over and increment the appropriate value and use the new value as the key. It's a mutex enforced by physics!

          3 votes
          1. [3]
            vord
            Link Parent

            Use an in-database sequence for declaring the enumID, and throw out anything they try to insert in its place.
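
            Something like this sketch (table and column names are made up):

            create sequence enum_values_id_seq;

            create or replace function force_enum_id() returns trigger as $$
            begin
                -- Ignore whatever ID the caller supplied; the sequence wins.
                new.enum_id := nextval('enum_values_id_seq');
                return new;
            end;
            $$ language plpgsql;

            create trigger enum_values_force_id
                before insert on enum_values
                for each row execute function force_enum_id();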

            1 vote
            1. [2]
              em-dash
              Link Parent

              The numbers had to match numbers in the corresponding code-enum, so no, you really did need to insert them with explicit IDs.

              (Epilogue: eventually I did just drop the tables, after inventing a mildly terrifying pipeline to move all the extra metadata they had accumulated into constants in the right places in the codebase. It involved SQL that generated vim commands which were then pasted into a terminal with the right file open in vim (this was before bracketed paste was widely supported). I do not remember why "just regex-replace a CSV dump into an array of structs" wasn't an option, but I'm pretty sure I did it with vim specifically because I thought it was funny.)

              1 vote
              1. vord
                Link Parent

                I'm sure I'm just missing key bits of context, but if they're choosing the numbers in a codebase, that would imply they could change them, wouldn't it?

                If it is technically possible but never done, it is a policy failure. Which is the worst kind of failure, because fixing it often involves so many meetings that coping with the bad pattern is often more cost-effective.

  6. Akir
    Link

    Man, this conversation is making me miss the days when DBA was one of the hats I was wearing. I miss how much control you have over the data, and spending time optimizing how it's stored, queried, and restricted. Sadly, I stopped around the time Postgres started putting really interesting data types in there, because as a one-man operation in an increasingly complex business, in-house development for the things that needed any sort of data management was untenable.

  7. [2]
    gco
    Link

    On UUIDs:

    You don't need to coordinate with the database to produce one.

    I thought you still had to check for uniqueness to avoid collisions; is there something preventing this issue?

    1. xk3
      (edited )
      Link Parent

      It's good practice to check but a lot of applications don't because the likelihood of collisions is almost zero from a probability perspective.

      But I would still do it, if only for mental hygiene; it helps me sleep better at night. But I rarely use UUIDs anyway. I would give a unique name to each database and then use a sequence generator for internal keys; then you can use db_id + row_id as a unique internal key. The db_id could be the hardware MAC address if that is guaranteed to be unique. This serves essentially the same purpose that UUIDs serve for distributed databases. This is a surrogate key: an artificially created key that has no inherent business meaning; a system-generated identifier, such as an auto-incrementing integer or UUID.
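
      In Postgres terms, that db_id + row_id idea might look like this (a hypothetical table; db_id is assigned once per database instance):

      create table widget (
          db_id  text   not null, -- unique per database, e.g. derived from a MAC address
          row_id bigint generated always as identity,
          primary key (db_id, row_id)
      );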

      These are in contrast to "business keys" (aka natural keys): columns which have business meaning and need to be unique, in which case you need to use that data anyway to check uniqueness. UUIDs do not help here.

      1 vote