Just skimming right now.
Don't do correlations like this - you're bound to find a correlation:
(1) Fundamentally, nothing can be in the top right quadrant. There is no product that appeared last year with a TTOSA of 10000. Can't be. Adjust for that.
(2) The bottom left is also bound to be sparse, because open source wasn't much of a thing long ago, and old products that were quickly displaced by FOSS may simply have been forgotten. In particular, a dead-since-the-80s proprietary inspiration wouldn't necessarily appear in the readme.
With how messily intertwined these two measurements are, I hope the author is a lot more careful about his statistics than just drawing a line through that scatter plot.
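To make point (1) concrete, here is a rough simulation sketch. Everything in it is an assumption for illustration, not the author's data: founding years are uniform between 1980 and 2024, TTOSA is measured in years and drawn uniformly from 0 to 40, and the two are independent by construction. The only thing the sketch does is drop pairs we could not have observed yet (founding year plus TTOSA lies in the future).

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient, written out to avoid dependencies."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=20000, now=2024, seed=0):
    """Return (correlation over all pairs, correlation over observable pairs)."""
    rng = random.Random(seed)
    # Hypothetical data: founding year and TTOSA are drawn independently.
    founded = [rng.uniform(1980, now) for _ in range(n)]
    ttosa = [rng.uniform(0, 40) for _ in range(n)]  # years, an assumed unit

    # A pair is only observable if the open source alternative already exists.
    observed = [(f, t) for f, t in zip(founded, ttosa) if f + t <= now]

    r_all = pearson(founded, ttosa)
    r_obs = pearson([f for f, _ in observed], [t for _, t in observed])
    return r_all, r_obs


if __name__ == "__main__":
    r_all, r_obs = simulate()
    print(f"all pairs:        r = {r_all:+.2f}")
    print(f"observable pairs: r = {r_obs:+.2f}")
```

Under these assumptions the full sample should show essentially no correlation, while the observable subset shows a clearly negative one, purely because the top right quadrant has been cut away.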
One mitigation for (1) could be to bake a cap into the scatter plot: ignore products whose proprietary founding was less than 10 years ago, as well as products whose TTOSA is more than 10 years. That's quite restrictive, but I think it cuts off the data in a way that doesn't manufacture a correlation that doesn't exist: you capture all the products released in that timeframe, and for every one of them you know whether it was replaced within 10 years.
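A minimal sketch of that cap as a filter, in case it helps. The `Product` record, its field names, and the `capped_sample` helper are all hypothetical, TTOSA is assumed to be in years, and treating "no alternative yet" as `None` (and dropping it along with TTOSA above the cap) is my reading of the rule, not something from the article.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Product:
    name: str
    founded: int            # year the proprietary product appeared
    ttosa: Optional[float]  # years until an open source alternative; None if none yet


def capped_sample(products: List[Product], now: int = 2024, cap: int = 10) -> List[Product]:
    """Keep only products for which a TTOSA of up to `cap` years was fully observable."""
    kept = []
    for p in products:
        if now - p.founded < cap:
            continue  # too young: a TTOSA near the cap could not have been seen yet
        if p.ttosa is None or p.ttosa > cap:
            continue  # not replaced within the cap, so it falls outside the plot
        kept.append(p)
    return kept


if __name__ == "__main__":
    sample = [
        Product("old-and-replaced-fast", 2000, 3),
        Product("too-young", 2020, 1),
        Product("replaced-late", 1990, 25),
    ]
    print([p.name for p in capped_sample(sample)])
```

Within the kept set, every combination of founding year and TTOSA up to the cap was observable, so the missing top right quadrant no longer shapes the result. It does nothing for point (2), though.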