And it's all thanks to a computer scientist who decided that, when it comes to programming, the word should be reserved for signifying an invalid or non-existent value.
The origin of this coding blunder traces back six decades to a British computer scientist who first gave null its special reserved status. He obviously didn't take into account the 4,910th most common surname when he did so, and has regretted the move ever since, even calling it a "billion-dollar mistake."
Hilariously, this article broke Tildes (not really though).
This is Tildes hall of fame material.
I know it's framed as a relatively low-stakes human interest story, but this is wrong enough to annoy me. It's not an issue with null as a concept; that's just as easy to distinguish from the actual name "Null" as it is from any other string.
The problems here are entirely due to data being parsed from strings without proper safeguards. That's a very strong sign the systems were written in a way that isn't fit for purpose, and edge cases like this are just the tip of the iceberg.
Yup, null is considered a "billion-dollar mistake" not because it comes from a common surname or anything, but because any data can be null, leaving you either needing to check for it everywhere or (what happens in the real world) hitting really strange bugs because something was null when you didn't expect it to be. Sum types that let you implement an optional type are much better, because they make you explicit about where None can appear.
I feel like this mischaracterizes the issue somewhat. The problem here isn't null references (the mistake that Tony Hoare regrets), but assigning special status to representable values of value types. A null reference in the Algol sense isn't the same thing as a string value with the content "null", or any value at all. It really boils down to shoddy engineering; null was invented exactly so that you can have a reference value that represents nothing without needing a special value-type (as opposed to reference-type) value that represents nothing.
Weak type systems e.g. of the "stringly typed" kind, awful engineering around things like building database queries, bad band-aid solutions to poor data or poor database schema, other "clever" representation solutions...these are the problems in question, not null.
Tony Hoare regrets null as it was implemented in Algol for its own qualities: the language doesn't force you to check whether references are null before you attempt to dereference them, and many still-popular languages inherited this, resulting in the "billion-dollar mistake". It truly is a billion-dollar mistake in its own right - it has caused so many crashes and exceptions, not to mention worse, more subtle bugs - but this issue in particular is caused by a whole different league of poor engineering that you probably couldn't coax out of Tony Hoare at gunpoint.
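A minimal sketch of the explicit-optional idea mentioned above, in plain Python with type hints (the `format_last_name` function and its messages are invented for illustration):

```python
from typing import Optional


def format_last_name(last_name: Optional[str]) -> str:
    # The Optional type forces callers to acknowledge that None is possible;
    # a checker like mypy flags any use of last_name that ignores it.
    if last_name is None:
        return "(no last name on record)"
    # A person actually named "Null" is just an ordinary string value --
    # nothing special happens to it.
    return last_name


print(format_last_name(None))    # (no last name on record)
print(format_last_name("Null"))  # Null
```

The point being that "missing" is a distinct case the type makes you handle, while the string "Null" is never confused with it.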
If I play devil's advocate, there is one way I can think of for the article to be using the billion-dollar mistake correctly, although it's a bit of a stretch and requires some extra assumptions.
If we assume that the relevant code is written in one of the worst-offending languages for the mistake, such as Java, then the fact that a string is always nullable makes the error of handling the name incorrectly much easier to commit. What I mean by this is that String name = input.name; makes it feel like a "real" value, in a way that might make you not think twice about putting it into string builders or other code that would normally expect it to actually be a string. Never needing to express that it's nullable may have led to poor handling in other layers, such as DB query serialization, because nobody was forced to explicitly decide how null would be handled.
And this is part of why I love JavaScript's usage of undefined and optional chaining like myObject?.name to write more robust code.
Eh. Anything can be undefined in JS, plus there's undefined vs null (with inconsistent use left up to the developer). TypeScript with well-defined types helps put guardrails on JS, and optional chaining makes working with them easier, but I still prefer a language with an explicit Option[Foo] and handlers in the core. (I write a lot of JS/TS and Scala)
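To make the "shoddy serialization" failure mode concrete, here is a sketch (function names invented) of the kind of string-based handling that confuses the name with a missing-value marker, next to a version that keeps them distinct:

```python
def serialize_bad(last_name):
    # Broken pattern: the *string* "null" doubles as the missing-value
    # marker, so a real surname of "Null" is silently treated as absent.
    if last_name is None or last_name.lower() == "null":
        return ""
    return last_name


def serialize_ok(last_name):
    # Only the genuine None sentinel means "missing"; the string "Null"
    # is just data and survives untouched.
    return "" if last_name is None else last_name


print(serialize_bad("Null"))  # "" -- the surname vanished
print(serialize_ok("Null"))   # Null
```

The bug lives entirely in the first function's marker-in-band convention, not in null itself.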
The ?. operator is also known as the Elvis operator.
Probably related: Falsehoods Programmers Believe About Names
The small (like three people) company I work at is all but certain to grow even with the world tensions for various reasons. I'm still mentally preparing myself for when our older databases have to deal with this and more.
Oh man, I can't help but laugh about it, but this can be really serious - think about your appointment getting cancelled at a hospital because of this!
A pizza place I use semi-frequently only required me to enter a first name when I had originally created my account with them, but when they upgraded their system to include a last name, they actively gave me the last name "NULL". Apparently it doesn't break their system though, since they still collect my payment and the 'za still arrives hot!
More likely it is null - as in nothing - in their system, and it's only converted to the string "NULL" when it needs to be displayed.
In most(?) regions of the US, if you have a "modern" vehicle (not designated as "historic") that is gas powered, you have to get an emissions test regularly - once every 2 years seems to be the norm.
I have a family member who never had to do this over their entire ownership of a vehicle (12+ years).
Their name contains the letters "DEL". I'm convinced this is the reason.
Vehicle inspections are less common than you'd think: https://en.wikipedia.org/wiki/Vehicle_inspection_in_the_United_States
Apparently they used to be more common, but as the number of service stations/mechanics (as opposed to mere gas stations) declined, many states abolished them rather than establish their own facilities.
Edit: NM, changed link back to techspot since the WSJ article actually returns 'Application error: a client-side exception has occurred (see the browser console for more information).' after a few seconds in both Firefox and Chrome.
The submitted techspot article is basically a significantly pared-down version of the WSJ one it quotes, with absolutely no new information added (AFAICT), so I have changed the link to the WSJ article.
p.s. Mirror of WSJ article:
https://archive.is/Usm8i
I wonder if, between comments and any tags they have, WSJ is suffering from the same issue we had a couple of hours ago ;)
I love how this article is taking down multiple sites and just reinforcing the whole issue with "Null"
LOL, yeah it's pretty funny... unintended consequences! WSJ finally fixed it now, at least. But I ain't gonna change the link and tags a third time. If people want to read the WSJ version they can just use the mirror I posted. :P
Just a meta-observation (having been there myself), I love how the first three comments on here are all (possibly annoyed-sounding) software peers commenting within the first 30 minutes - all with overlapping (because they were being written at the same time) frustration at how and why this is a problem!
I resemble this, and approve, and am vaguely amused. You guys win today. Thank you
I've always wanted to talk to someone who understood database development and design, because that's where a lot of this stems from.
Programs can null-check and ideally just not allow nulls unless told to (part of why I use F#), but every time I deal with SQL it's like I'm teleporting back 20 years, dealing with worse tooling, worse rules, worse support and a lot more nonsense.
And to be clear I use SQL a LOT. I'm not in one of those positions where I just pop up a mysql db for storage in my app, but instead doing some fairly major heavy lifting in it (and thank god have found someone better than me to handle the tricky stuff).
I've tried to make sure that every column in every table we make is "NOT NULL", because that way you can quickly tell that a null means a join isn't returning a value (it's the only way a null can get in there), versus... well, it's just a null.
As far as I can tell, though, SQL is very, very finicky about stuff like this, even in more modern flavors like Postgres. The minimal understanding I have is that it's a system built to be performant first, sorta-kinda human readable second (poorly, I might add), with modern coder needs a distant third.
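A small sketch of the all-columns-NOT-NULL convention described above, using SQLite through Python's sqlite3 (the schema and data are made up): because no stored column can hold NULL, any NULL in a result set can only have come from a join that matched nothing.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         person_id INTEGER NOT NULL,
                         item TEXT NOT NULL);
    INSERT INTO people VALUES (1, 'Ada'), (2, 'Null');
    INSERT INTO orders VALUES (10, 1, 'pizza');
""")

# Every column is NOT NULL, so a None here can only mean
# "the LEFT JOIN found no matching order".
rows = con.execute("""
    SELECT p.name, o.item
    FROM people p LEFT JOIN orders o ON o.person_id = p.id
    ORDER BY p.id
""").fetchall()
print(rows)  # [('Ada', 'pizza'), ('Null', None)]
```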
May be that we're using them for very different things, but I want nulls in my database! I'm a lot more concerned about accuracy than anything else when it comes to my data model - including accurately recording the difference between things like '' (that person is known to have no last name), 'Null' (that person has the last name Null), and NULL (the information for that person's last name is missing). I don't know of a robust way to mark a value as missing without using null.
But then again, I've got the luxury of being able to keep my actual queries relatively basic and handle the rest of the logic in the application layer. I do that primarily because I find SQL itself pretty arcane and clunky when I do need to dive in to do something deep and complex with ACID guarantees, so I can very much believe I might feel differently if I were using it more in my day-to-day work.
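The three-way distinction is easy to see in any SQL database - a quick sketch using SQLite via Python's sqlite3 (the table and names are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (first TEXT NOT NULL, last TEXT)")
con.executemany("INSERT INTO people VALUES (?, ?)", [
    ("Teller", ""),    # known to have no last name
    ("Jeff", "Null"),  # last name really is the word Null
    ("Sam", None),     # last name not recorded -> SQL NULL
])

for first, last in con.execute("SELECT first, last FROM people"):
    if last is None:
        print(first, "-> missing")
    elif last == "":
        print(first, "-> known to have none")
    else:
        print(first, "->", last)
```

All three cases round-trip cleanly because NULL, '', and 'Null' are genuinely different values at the storage level.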
In my model we default to ‘None’ or ‘None Provided’ in that scenario.
Again the goal being to make sure that we can instantly identify that NULL means “join returned nothing” not “join returned nothing or blank value”
It makes sense, it’d just always give me a creeping sense of unease waiting for Dr. None Provided to come along and break things when they sign up for the service. Obviously you know your data domain and I’m sure you’re choosing placeholder values that are definitely outside that and couldn’t possibly end up in the DB in reality, but I dunno, I’ve just seen enough 30 year old systems running on duct tape and wishes a decade past their decommissioning date with at least 20 “impossible” changes somehow bolted to the side that it makes me nervous to build in assumptions.
I'm that annoying guy in the meeting insisting on being impractically purist about the data at rest and then throwing coalesce statements in at query time with abandon, basically.
That's why I have 'None Provided' for last name - and again, the chance of a Mr/Mrs None Provided is substantially smaller than the chance of me putting in a null and then getting a null on a join, which happens all the time.
Ideally ALL reasonable states should be modeled, which is what drives me nuts about SQL. Failed joins should return a SQL-specific "join returned nothing" code or something, but since they use null for it, I'm never ever using null in my database if I can avoid it, as it adds one extra layer of confusion and investigation to every null value.
In short, in my circumstance I will fail/have issues if I ever get someone with 'None Provided' as a last name.
In your circumstance you can get issues on a join that returns null, because that alone isn't enough information to determine whether the record doesn't exist or the value for that record is null.
That's what all those coalesces are for in my queries! Nulls get replaced with a column-specific UUID just within the context of the query; that way you can identify what was missing (or nonexistent) without changing what gets stored. But yeah, hardly elegant and definitely prone to human error, so don't take this as me saying everyone should always be doing it that way.
I keep mulling this over and feel like that's got to be brutal? You'd have coalesce wrappings on everything (since you're already willing to use null), and now you're handling it in your view/query definitions rather than in your actual table structure?
Seems like it'd only cause more confusion when you have to debug something complex like multiple CTE's with windowing functions and the like?
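For concreteness, a sketch of the coalesce-with-UUID-placeholder idea under discussion (SQLite via Python's sqlite3; the table and data are invented):

```python
import sqlite3
import uuid

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (first TEXT NOT NULL, last TEXT)")
con.executemany("INSERT INTO people VALUES (?, ?)",
                [("Jeff", "Null"), ("Sam", None)])

# Query-scoped placeholder: a fresh UUID that can't collide with real data.
placeholder = str(uuid.uuid4())
rows = con.execute(
    "SELECT first, coalesce(last, ?) AS last FROM people ORDER BY first",
    (placeholder,),
).fetchall()

for first, last in rows:
    # Stored NULLs are now identifiable without changing what's on disk.
    print(first, "missing" if last == placeholder else last)
# Jeff Null
# Sam missing
```

The substitution exists only for the lifetime of the query; the table itself still stores a plain NULL.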
In my experience it's verbose but not too bad in terms of logic - you pretty much just blindly replace last_name with coalesce(last_name, last_name_placeholder) as last_name anywhere it's used, and then you can treat it just the same as you would a column that isn't nullable. In your case, using 'None Provided' as the placeholder value would reproduce the exact behaviour you've got now - and for debugging you can drop in something easy to spot in the output, like 'HEY I AM A LAST NAME PLACEHOLDER' - but in real use you just set it to a UUID (or another value that can be transactionally confirmed not to be in the column for the duration of the query) and it all functions the same way.
It's wordy as hell, and the other half of the answer is that most of the time my real solution is "if I'm joining on this particular column, the application layer doesn't want records where it's null anyway", but I've found strategic coalescing pretty helpful on the occasions I do need it! Possibly because if I find myself doing anything more than the basics in SQL rather than at the application side, I've already resigned myself to a somewhat frustrating day...
How does that work with non-text columns? What's your default "I don't actually know this" value for bool? Int? Timestamp?
Ideally you don't allow the data to get there and handle that in your model. You don't populate a bool, int, timestamp unless you know it, you populate something else saying "these are the known values". You try your best to model in such a way that you cannot represent erroneous states.
Obviously I rarely get to do that outside of apps I build, but for the vast majority of the remaining data it's pretty easy?
Dates are the easiest since they default to 1/1/1900. I still hate this because you can wind up doing math on a date that is a "not date", but that's fairly easy to sanity check.
Numbers? It depends - in the vast majority of cases I'm dealing with positive ints, keys and whatnot, so I can just use something like -1.
Bool I basically don't use if it can be unknown, because I don't think that's ever a good idea. In the cases where I inherit one from some other system I'll see if there's a rational "less damaging" default, but otherwise I won't use bool, because it's just asking for trouble to have null/false/true as your options in a zillion little spots.
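A sketch of the sentinel-default approach described above (the specific sentinels - 1/1/1900 and -1 - are the ones mentioned in the comment; the `describe` function is invented for illustration):

```python
from datetime import date

SENTINEL_DATE = date(1900, 1, 1)  # "not a real date"
SENTINEL_ID = -1                  # only safe because real ids are positive


def describe(d: date, customer_id: int) -> str:
    # Sanity-check sentinels before using them, since the type system
    # won't stop you from doing date math on a "not date".
    parts = []
    parts.append("date unknown" if d == SENTINEL_DATE else d.isoformat())
    parts.append("id unknown" if customer_id == SENTINEL_ID else str(customer_id))
    return ", ".join(parts)


print(describe(SENTINEL_DATE, 42))     # date unknown, 42
print(describe(date(2024, 5, 1), -1))  # 2024-05-01, id unknown
```

This also shows the downside the comment mentions: nothing prevents arithmetic on the sentinel date except discipline, which is exactly why the checks have to live at every use site.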