7
votes
Fortnightly Programming Q&A Thread
General Programming Q&A thread! Ask any questions about programming, answer the questions of other users, or post suggestions for future threads.
Don't forget to format your code using the triple backticks or tildes:
Here is my schema:
```sql
CREATE TABLE article_to_warehouse (
article_id INTEGER
, warehouse_id INTEGER
)
;
```
How do I add a `UNIQUE` constraint?
Have you ever faced the problem of taking a float or widely-varying int value and constraining it to a handful of discrete values? What did you use to do that? Did you like it?
I'm working on a project that does this right now and I'm having a heck of a time designing an API.
So a function that just maps from a float to an enum?
Or from a float or int to a restricted set of floats/ints. A great example is a Eurorack module that quantizes voltage to a scale, so you're only playing notes in key. The use case at work is binning ages into "child", "adolescent", "young adult", etc.
A big if statement is fine if you're only going to do it for one value but it gets tedious and brittle for many.
Do you intend to always go with the naive process of evenly chunking the domain and mapping each chunk to a value in the range? If not then there is no generic way to implement this and I would just write a function for each use case. But if you do go with the naive way then I would recommend something like:
Note that in the above code
input_max
is a non-inclusive boundary. Ifinput
were to equalinput_max
then you would index past the last element ofoutputs
.This is a really good approach for that case, thank you! I'm thinking also about cases where there are overlapping ranges, though - I was going to use something like an interval tree internally, but then I need an API for specifying precedence.
I think in that case iterating over a vector of ranges will work just fine. The order in the vector defines precedence.
Edit:
For example:
Yeah, absolutely. What I'm interested in is designing the API - would it be better to produce a closure from these ranges? A mutable collection of ranges you can change over time? Etc.
I don’t know enough about your use case to really say - but using closures for this seems like a weird solution, especially so in rust. My recommendation is use the solution that is the most boring and least impressive.
Huh. I think I should capture some context around this (so to speak), maybe I'll write a blog post.
Thank you for the help!
That would be helpful.
Okay, threw together an example in the Playground. https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=ecbcccda9a0a3995cec9457d7366cfb7
The name "bucketizer" made me laugh.
So it looks like they'd both work. But with the closure the API is obtuse at the calling site. If you're not the person that created the closure there's no obvious explanation for how this domain/range map works. The type is
impl Fn(&T) -> Option<R>
which could be implemented in one of an infinite number of ways. With the struct option you can quickly go to the implementation and notice thebuckets
struct field.Edit:
Additionally, if you ever want a second way to interact with these buckets then you're SOL with a closure unless you want to introduce a ton of code smell. A struct with methods can be easily extended.
Heh, yeah, I considered
Quantizer
(what they're called in the audio world) but it is overloaded a bit.And, yeah, the closure API does seem a little opaque. I would definitely add some documentation to both, though.
Maybe you don’t plan to have this used in such a way where this matters, but there’s no way to attach documentation to a stray closure variable. Imagine you’re in a function far far away from the bucketizing closure’s heap allocation. All you have is a closure. It’s passed in as a parameter. This code was written by someone else so you don’t know where the closure comes from without going up the call stack. There’s not a great way to know from a single source of truth what this closure is and how it works. A modern text editor using a language server could point you directly to a variable’s struct type definition though.
Depends on your requirements but chucking the mantissa and just using a map of exponents and sign value would be simple to bin by powers of ten.
Oh, good call! Could be very useful for that special case.
Need more info on your stack, data scale, and app-specific context to give a good answer. But some basic approaches I would use on my preferred stack as a data engineer: Pandas’ Series.cut, or db-side, stored procedure to create/update a table of balanced Postgres Range types to map onto. “Binning” is the term to search for in your respective ecosystem to find existing solutions.
I'll look into those APIs and see if there's anything I can pull for the design, thank you! In this case, it's Rust, and there's no good library function for it at the moment.
To clarify, are we talking about a library API to be consumed by other programs, or a REST-style web API that you’re designing?
The former! The latter would be very wasteful, most likely.
Gotcha, in that case I’m out of my element - I don’t know the Rust ecosystem well enough to comment on design standards.