What programming/technical projects have you been working on?
This is a recurring post to discuss programming or other technical projects that we've been working on. Tell us about one of your recent projects, either at work or personal projects. What's interesting about it? Are you having trouble with anything?
We just launched a rewrite of our app at work. This is the first SaaS product I've launched and the first time I'm operating an application that actually gets a consequential amount of traffic (2-3k new signups per day). It's been a bit exhausting but it's also really rewarding. This is so much better than my last job where maybe once every 3 months I'd feel like I learned something new.
Congratulations, it's not every day that a rewrite actually succeeds! That's a heck of an accomplishment.
Thanks! It hasn’t been without its issues but we’re still up and running and working through them.
I've been spending a lot of time on a project at work that involves changing over our data model from an old design (if you can call it that) using a bunch of JSONB blobs to store core business data, to a mostly well-normalized database with dozens of new tables.
Part of this is re-computing this data for every item in our system, in order to make sure it's available in the new format before any users actually need the new functionality. Unfortunately, a lot of the older data is completely malformed, with things like years in place of ages (like, 2001 instead of 56) and people whose recorded age at menopause is greater than their current age. Stuff like that.
I've been adding patches to the part of the system that computes this data to do things like recognize a year in an age field and convert it to the appropriate age for that item. For instance:
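Something along these lines — the function name and cutoffs here are hypothetical, just to illustrate the kind of patch:

```python
from datetime import date

def reconcile_age(value, reference_year=None):
    """Best-effort cleanup of an age field that sometimes holds a year.

    If the value looks like a plausible year (e.g. 2001), convert it to
    an age relative to the reference year; if it is already a plausible
    age, keep it; otherwise give up and return None so the item gets
    logged for manual review. Thresholds are illustrative.
    """
    year = reference_year or date.today().year
    if 1900 <= value <= year:   # looks like a year, not an age
        return year - value
    if 0 <= value <= 130:       # already a plausible age
        return value
    return None                 # unsalvageable; log and handle manually
```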
With several tens of thousands of items, we're now down to just 33 items with data so malformed that even all my reconciliation logic can't figure things out. We'll probably go through them one by one today and just delete most of them; I seriously doubt they're actually in use.
I'm pretty proud of myself!
Did you do anything to try to automate finding anomalous data, or just plan on going through it manually?
Thank you! A lot of the work was actually adding robust logging so that we could identify which items couldn't be computed, and why. Then we'd kick off a batch job every evening to compute all the uncomputed data and slowly whittle down the number of failing items.
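The shape of that nightly job is roughly this — `compute()` and the failure categories are hypothetical, the point is just logging the reason alongside each failure so the counts shrink run by run:

```python
import logging

logger = logging.getLogger("recompute")

def recompute_batch(items, compute):
    """Try to recompute every item; tally failures by reason.

    Failed items are logged with their exception so the remaining
    malformed records can be whittled down on the next run.
    """
    failures = {}
    for item in items:
        try:
            compute(item)
        except Exception as exc:
            reason = type(exc).__name__
            failures[reason] = failures.get(reason, 0) + 1
            logger.warning("item %r failed: %s", item, exc)
    return failures
```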
My Ultimate Tic Tac Toe neural network (previous post) has finished the first training run of 5000 games. The results are somewhat disappointing, but that was kind of expected. Since it was the first attempt, all necessary training parameters and the network architecture were best guesses and probably far from the optimum. Tuning hyper-parameters sucks when you have to wait a week between each attempt...
The results of the first run by themselves are fairly weak, reaching at best an 85% win-rate against a random opponent when deployed without tree search.
Combining them with tree search (1s search time) has essentially a 100% win-rate against Random, so I need a better opponent to measure against. To that end, I'm in the process of writing a Rust program that uses Monte Carlo Tree Search (using the excellent — though somewhat under-documented — mcts crate); that should allow for a more nuanced comparison once I'm done.
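For context, the selection rule at the heart of UCT-style MCTS (which, as I understand it, is what the mcts crate implements) is UCB1. A minimal Python sketch, with illustrative names:

```python
import math

def ucb1(wins, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: exploitation (win-rate) plus an exploration bonus
    that shrinks as a child gets visited more often."""
    if visits == 0:
        return float("inf")  # always try unvisited children first
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children):
    """Pick the index of the child with the highest UCB1 score.
    `children` is a list of (wins, visits) pairs under one parent node."""
    parent_visits = sum(v for _, v in children)
    return max(range(len(children)),
               key=lambda i: ucb1(*children[i], parent_visits))
```

The full algorithm repeats select → expand → simulate (random playout) → backpropagate for as long as the search budget (here, 1 s) allows.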
The second iteration has gone through 800 games so far, and the latest snapshot has a 93% win-rate against Random (without tree search), so the tweaks I made to the hyper-parameters and the network architecture (now an 8-block ResNet) seem to have helped. I'm looking forward to seeing the final results of this training run.
I ended up implementing ultimate tic-tac-toe myself just now: https://github.com/Apostolique/UltimateTicTacToe.
My friend is gonna add multiplayer to it later.
It's not my cleanest code, but the implementation was pretty fun. A bit of a brain twister.
Maybe I can do an AI too afterwards. This is my favorite way to play tic-tac-toe now.
Edit: Added animations and recorded this: snip1, snip2.
Well, I haven't been doing much computer programming outside of work these days. Instead, I have been doing synthesizer programming. I'm trying to unify my understanding of various types of synthesis. As mentioned in another thread, I downloaded dexed, a DAW plug-in that emulates the Yamaha DX7 synth from the 80s. I have a pretty good understanding of analog synthesis. I used to have an Oberheim Matrix 6 and spent hours creating and tweaking sounds, some of which I actually performed with. But the FM synthesis used by the DX line was pretty different. I had a couple of FB01s, but they weren't really programmable.
So far I've figured out how to make a few usable sounds. The first thing I did was watch some tutorials about how to recreate analog waveforms like sawtooth and square waves. Useful, but I have a bunch of analog synth emulators, so I don't really need that.
What I did next was make a bass sound that when played softly is a simple mellow triangle-wave-like sound, but the louder you play it, the more growly it gets. FM is great for creating growly sounds.
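The trick behind that kind of patch is tying the modulation index to velocity: soft notes stay nearly sinusoidal, loud notes pick up extra harmonics. A two-operator sketch in plain Python (parameter values are illustrative, not DX7 settings):

```python
import math

def fm_bass(freq=55.0, velocity=0.5, dur=0.5, sr=44100,
            ratio=1.0, max_index=6.0):
    """Two-operator FM: the carrier's phase is modulated by a sine at
    ratio * freq. Scaling the modulation index by velocity makes the
    tone grow brighter ("growlier") the harder you play."""
    index = max_index * velocity  # louder -> brighter
    samples = []
    for n in range(int(sr * dur)):
        t = n / sr
        modulator = index * math.sin(2 * math.pi * ratio * freq * t)
        samples.append(velocity * math.sin(2 * math.pi * freq * t + modulator))
    return samples
```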
After that I experimented with a few of the different FM algorithms they supply and ended up making something resembling a male voice. I mean, it sounds synthetic, but it's pretty decent. I definitely need to dive into the various algorithms more and understand what each one is for. I have some leads on some books for programming it, but haven't gotten around to purchasing any yet.
After doing that, I also went in the opposite direction and tried to see what sorts of "digital" sounds I could get out of my analog synths. I managed to make an OK DX7-style electric piano on my Oberheim 8 Voice emulator. I tried doing the same thing on my ARP 2600 emulator, since it has a ring modulator which is good for getting those high harmonics you need for the bell-like tone at the start of the voice. But those attempts proved less fruitful than the other stuff I did. Overall, it's been fun digging into this stuff.
I'm doing a project that involves training an artificial neural network, more specifically a multi-layer perceptron. I'm doing it in Python using TensorFlow, which makes things relatively simple. The hard part comes when I want to create a custom loss function that includes the derivative of the ANN's output with respect to the model's inputs.
I have more than 6M observations and it blows up (meaning the available RAM is not enough) both my machine (16GB) and Google's Colab (who knows how many GBs that thing has). I've resigned myself to slow training on only around 100k observations and to continuing the project that way. I need it finished by the end of the year.
There is obviously some problem in my code (most likely in how I'm calculating the derivatives) but I can't figure it out. Tried StackOverflow and Reddit but to no avail.
This does sound interesting. Would you be willing to elaborate a bit more?
What exactly are you trying to achieve with the custom loss function? What is your batch size?
This is pure speculation, but if the computed derivatives become part of the loss, Tensorflow may be using automatic differentiation on top of the derivatives you computed manually, so you end up with derivatives of derivatives, increasing the RAM requirements.
I can share a StackOverflow post with the code and some discussion on the problem.
Hm. That code looks really strange:

- Your MLP has a LeakyReLU directly after the input layer (?)
- The LeakyReLUs in your hidden layers have `alpha=1`, which means they don't do anything at all
- At the beginning of `constrained_mse` you're converting the entire six million data points into a Tensor. That will run out of memory (edit: or at least be very slow) because Keras will call that function multiple times IIUC. I get error messages like the following when I run the code:
2021-09-30 17:54:51.181884: W tensorflow/core/framework/cpu_allocator_impl.cc:80] Allocation of 3041280000 exceeds 10% of free system memory.
I can't really comment on the rest of `constrained_mse`; I'm more familiar with PyTorch than Keras/TF, but I think the RAM issue is really caused by the `tf.convert_to_tensor` call. What you need is a data loader that gives you the data in smaller batches.
TBH, my suggestion would be to go through the Tensorflow tutorials again, especially the ones on loading data from a pandas frame.
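The core idea, independent of the exact TensorFlow API, is to materialize one batch at a time rather than all six million rows at once. A pure-Python sketch:

```python
def batches(data, batch_size):
    """Yield successive slices of `data` so that only one batch at a
    time needs to be materialized (e.g. converted to a Tensor),
    instead of the whole dataset up front."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]
```

In TensorFlow itself, `tf.data.Dataset.from_tensor_slices(...).batch(n)` gives you the same thing, with shuffling and prefetching available on top.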
I will pick up the code again in the next few days so I will think about what you said and reply back.
I am adding functionality to linklonk.com that would let users share their collections of liked content through a public link. Trying to decide how to structure the public link: should it be randomly generated (like Google Docs) or controlled by the user?
Trying to get some user feedback here https://linklonk.com/item/3854292744564965376
One user suggested: let the user control the prefix and make the suffix random (and therefore hard to guess) like: /collection/movies_QRadZdIWL.
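That hybrid scheme is easy to generate — a sketch, with the path layout and suffix length as assumptions:

```python
import secrets
import string

def collection_url(prefix, suffix_len=9):
    """Build a link like /collection/movies_QRadZdIWL: a user-chosen
    prefix plus a cryptographically random, hard-to-guess suffix.
    (Path layout and suffix length are illustrative.)"""
    alphabet = string.ascii_letters + string.digits
    suffix = "".join(secrets.choice(alphabet) for _ in range(suffix_len))
    return f"/collection/{prefix}_{suffix}"
```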
The initial version is done. The announcement explains the design decisions: https://linklonk.com/item/9109722933069545472
I don't really code or anything, but a few weeks ago I decided to set up fancyindex for nginx.
I decided to add in some icons for common file types and per-folder readme.md files, which are handy. I initially had plaintext, but I wanted to emphasize something and couldn't -- so I went overboard, using zero-md.
Anyway, here's a screenshot. I changed the header so people wouldn't find it and crack the ridiculous password.
All in all, fancyindex is handy and a total breeze to work with.