What programming/technical projects have you been working on?
This is a recurring post to discuss programming or other technical projects that we've been working on. Tell us about one of your recent projects, either at work or personal projects. What's interesting about it? Are you having trouble with anything?
I finally managed to beat LunarLander-v2 (see my last post for details) in about 400 episodes and about 90 minutes of runtime on a CPU.
The winning solution came as a bit of a surprise: it is a Deep Q-Network (DQN) variant called Implicit Quantile Networks (IQN). The big difference between DQN and IQN is that the latter is a distributional algorithm. That means it does not just try to estimate the mean of the rewards for each action, but the entire distribution, which helps when the reward distribution is multimodal. If you imagine an action that could either be very good or very bad, then simply taking the mean is going to be an inaccurate characterization of that action.
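To make the "very good or very bad" intuition concrete, here is a small sketch with made-up numbers: a hypothetical action that either crashes (about -100) or lands (about +100) with equal odds. Its mean sits near 0, an outcome that essentially never happens, while the low and high quantiles reveal the two modes. The `empirical_quantile` helper is a simple nearest-rank estimate written for this illustration.

```python
import random
import statistics

def empirical_quantile(samples, tau):
    """Return the tau-quantile of a list of samples (simple nearest-rank estimate)."""
    ordered = sorted(samples)
    index = min(int(tau * len(ordered)), len(ordered) - 1)
    return ordered[index]

# Hypothetical "risky" action: crash (-100) or land (+100) with equal odds,
# plus a little Gaussian noise. The numbers are invented for illustration.
rng = random.Random(0)
risky = [rng.choice([-100.0, 100.0]) + rng.gauss(0, 5) for _ in range(10_000)]

mean = statistics.fmean(risky)
low = empirical_quantile(risky, 0.1)   # lands in the "crash" mode
high = empirical_quantile(risky, 0.9)  # lands in the "safe landing" mode
print(f"mean={mean:.1f}  10th pct={low:.1f}  90th pct={high:.1f}")
```

The mean alone would rate this action as mediocre, even though almost every actual outcome is extreme in one direction or the other.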
Unlike DQN, I find IQN difficult to comprehend. From what I understand, it works by transforming uniform random samples from the [0, 1] interval into estimates of the reward distribution at the sampled quantile. That is, you pick a random number, say 0.6, and the network gives you an estimate of the reward (for each possible action) at the 60th percentile of that action's reward distribution.
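What lets a network output "the value at quantile τ" is the loss it is trained with. IQN uses a quantile Huber loss; the sketch below uses the plain pinball (quantile-regression) loss, which is that loss with the Huber threshold κ taken to 0. It fits a single number rather than a network conditioned on τ, so it is only an illustration of the principle: minimizing this asymmetric loss over samples drives the estimate to the τ-quantile. The data (uniform rewards on [0, 10]) and the learning rate are assumptions chosen for the demo.

```python
import random

def pinball_loss(tau, target, estimate):
    """Quantile-regression (pinball) loss for quantile level tau."""
    u = target - estimate
    # Under-estimates (u > 0) cost tau * u; over-estimates (u < 0) cost (1 - tau) * |u|.
    return u * (tau - (1.0 if u < 0 else 0.0))

def fit_quantile(samples, tau, lr=0.05, epochs=10, seed=0):
    """Estimate the tau-quantile of `samples` by stochastic subgradient descent on pinball_loss."""
    rng = random.Random(seed)
    data = list(samples)
    q = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for y in data:
            # d(pinball_loss)/d(estimate) = 1{y < q} - tau
            q -= lr * ((1.0 if y < q else 0.0) - tau)
    return q

rng = random.Random(1)
rewards = [rng.uniform(0.0, 10.0) for _ in range(5_000)]
q60 = fit_quantile(rewards, tau=0.6)
print(round(q60, 2))  # close to 6.0, the 60th percentile of Uniform(0, 10)
```

IQN does the same thing for many τ at once: it samples τ values, feeds them into the network alongside the state, and applies this kind of loss at each sampled level.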
In order to act in the environment, you draw several samples and average them as a characterization of each action. I have no idea why this works so much better than just estimating the mean in the first place (other than the intuition I gave above), but it does, and the spaceship quickly lands safely.
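The acting step can be sketched as follows. The "network" here is a hypothetical stand-in: each action's return is assumed to be Normal, so its quantile function is known exactly via `statistics.NormalDist.inv_cdf`. The action names and parameters are invented. One property worth noting: the mean of a distribution equals the integral of its quantile function over τ in [0, 1], so averaging quantile values at uniformly drawn τ is a Monte Carlo estimate of each action's mean. In this standard (risk-neutral) form, acting is still greedy with respect to the mean, and the difference from DQN lies in how the estimates are trained.

```python
import random
from statistics import NormalDist

# Hypothetical stand-in for a trained IQN: each action's return distribution
# is assumed to be Normal, so its quantile function is known exactly.
ACTION_RETURNS = {
    "noop": NormalDist(mu=10.0, sigma=1.0),
    "fire_main": NormalDist(mu=12.0, sigma=20.0),
    "fire_side": NormalDist(mu=8.0, sigma=2.0),
}

def quantile(action, tau):
    """Return value of `action` at quantile level `tau` (what the network would predict)."""
    return ACTION_RETURNS[action].inv_cdf(tau)

def select_action(rng, num_samples=32):
    """Greedy IQN-style action selection: average sampled quantiles per action."""
    # One shared set of quantile levels for all actions, kept strictly inside
    # (0, 1) because inv_cdf rejects 0 and 1.
    taus = [rng.uniform(1e-6, 1 - 1e-6) for _ in range(num_samples)]
    scores = {
        action: sum(quantile(action, tau) for tau in taus) / num_samples
        for action in ACTION_RETURNS
    }
    return max(scores, key=scores.get), scores

best, scores = select_action(random.Random(0), num_samples=10_000)
print(best, {a: round(s, 2) for a, s in scores.items()})
```

With many samples, each score converges to that action's mean; with a small `num_samples` (a few dozen), the choice between close actions becomes noisier.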
I have some more studying to do if I want to thoroughly understand why exactly it works, but I'm glad that I finally finished my quest to find solutions for LunarLander on both ends of the time vs. sample-efficiency trade-off.
My Pinephone Brave Heart edition finally arrived, so I've been playing with Ubuntu Touch. I have grand plans to have a crack at submitting patches for it, but we'll see how I go. I haven't exactly got any OS development experience as it is.
I wrote about this in the "what did you do last week?" thread, but in retrospect it was probably a better fit for this thread! Long story short, I had a pretty big breakthrough on a complicated project that had a really rocky start and a looming deadline. Lots of details in that comment thread: https://tildes.net/~talk/lg6/what_did_you_do_this_week#comment-4izq
I'm poking at building a simple risk-matrix generator web site, in part to scratch an itch I have, and in part to get caught back up on some of the more recent development tools. I'm looking at whether it makes any sense to integrate some fancy JS frontend with my preferred Rails backend, but honestly, at this point the site is so simple that I'm hard-pressed for any reason to take on the technical complexity.
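The post doesn't say how the generated matrix is scored. A common convention is a grid of likelihood against impact, with each cell scored as likelihood × impact and bucketed into ratings. The sketch below assumes that convention; the five-level labels and the Low/Medium/High thresholds are illustrative assumptions, and it is written in Python only to show the cell logic a Rails backend might implement.

```python
LIKELIHOOD = ["Rare", "Unlikely", "Possible", "Likely", "Almost certain"]
IMPACT = ["Negligible", "Minor", "Moderate", "Major", "Severe"]

def band(score):
    """Map a likelihood x impact score (1-25) to a rating; thresholds are illustrative."""
    if score >= 15:
        return "High"
    if score >= 6:
        return "Medium"
    return "Low"

def risk_matrix():
    """Return rows of (likelihood label, [(score, band), ...]), most likely row first."""
    rows = []
    for l_index in reversed(range(len(LIKELIHOOD))):
        cells = []
        for i_index in range(len(IMPACT)):
            score = (l_index + 1) * (i_index + 1)
            cells.append((score, band(score)))
        rows.append((LIKELIHOOD[l_index], cells))
    return rows

def render(rows):
    """Format the matrix as a fixed-width text table (12 characters per cell)."""
    lines = [" " * 16 + "".join(f"{name:>12}" for name in IMPACT)]
    for label, cells in rows:
        lines.append(f"{label:<16}" + "".join(f"{s:>3} {b:<8}" for s, b in cells))
    return "\n".join(lines)

rows = risk_matrix()
print(render(rows))
```

Keeping the scoring in plain functions like these means the same logic can back a server-rendered page without needing a JS frontend.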
Currently writing a Discord bot that analyzes server activity, like channel and user activity, top users, etc. I'm using Python for the bot and hosting it on a Raspberry Pi Zero.
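A bot like this can keep its activity statistics cheaply in memory, which suits a Pi Zero. The sketch below assumes a hypothetical `MessageRecord` with just an author and a channel; in the real bot those records would be built from the Discord library's message events, which are deliberately not shown here to avoid guessing at that library's API. `collections.Counter.most_common` handles the "top users" ranking.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class MessageRecord:
    """Hypothetical minimal record the bot might log for each message it sees."""
    author: str
    channel: str

def activity_report(messages, top_n=3):
    """Count messages per channel and per user, and return the most active of each."""
    per_channel = Counter(m.channel for m in messages)
    per_user = Counter(m.author for m in messages)
    return {
        "top_channels": per_channel.most_common(top_n),
        "top_users": per_user.most_common(top_n),
        "total_messages": len(messages),
    }

log = [
    MessageRecord("alice", "general"),
    MessageRecord("bob", "general"),
    MessageRecord("alice", "memes"),
    MessageRecord("carol", "general"),
    MessageRecord("alice", "general"),
]
report = activity_report(log, top_n=2)
print(report)
```

For long-running stats, the counters could be periodically written to a small file or SQLite database so they survive a restart.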
At work: setting up CI/CD for our Qt program to build Linux, macOS, and Windows executables. Coming from a web dev and interpreted-language background in general, I was intimidated by the task, but I'm learning that the world of C++ and compiled software isn't really all that hard.
At home: working through Project Euler in C++ to learn the language, trying to wrap up Advent of Code, and designing a minimal weekly meal prep web app to learn Elm and brush up on Django.
Gonna start working my way through How to Design Programs and hopefully learn me some things.
As I mentioned a couple weeks ago, I'm very slowly working on a rewrite of Arx Fatalis. So far, I think the most interesting thing to come out of this is seeing how little some things have changed. In retrospect it makes sense; it's probably a combo of backwards compatibility and "if it ain't broke."
For example, the way that Rust's winit window event loop works had me worried. It only allows handling a single event at a time, so I thought I might need to write my own input buffering code. Then it turned out that the way a window's event loop works hasn't changed much since at least 2002. Arx Fatalis also had to work with this model, so when I get to this part no significant changes will be needed.
Also, I've found more oddities in the source code. One I'm trying to figure out right now is the visible paranoia surrounding the initialization code: any time something is initialized, it's deleted first. For instance, one of the very first functions called, even before the window is created, nulls and zeroes out all of the minimap's memory. It has me really wondering if older versions of C++ had issues with uninitialized memory creeping in.
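For anyone unsure what "input buffering" on top of a one-event-at-a-time loop would mean: the event callback only records each event, and the game consumes the whole batch once per frame. This is a language-neutral sketch in Python; `InputEvent`, `on_event`, and `drain` are hypothetical names standing in for the window event callback and the per-frame update, not winit's API.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class InputEvent:
    """Hypothetical input event; a real window system delivers richer types."""
    kind: str  # e.g. "key_down" or "key_up"
    key: str

class InputBuffer:
    """Collects events one at a time from the event loop, then hands them over in a batch."""

    def __init__(self):
        self._pending = deque()
        self.keys_down = set()

    def on_event(self, event):
        # Called once per window event: just record it, no game logic here.
        self._pending.append(event)

    def drain(self):
        """Called once per frame: apply buffered events in arrival order and return them."""
        frame_events = []
        while self._pending:
            event = self._pending.popleft()
            if event.kind == "key_down":
                self.keys_down.add(event.key)
            elif event.kind == "key_up":
                self.keys_down.discard(event.key)
            frame_events.append(event)
        return frame_events

buf = InputBuffer()
for ev in [InputEvent("key_down", "W"), InputEvent("key_down", "Shift"), InputEvent("key_up", "Shift")]:
    buf.on_event(ev)
events = buf.drain()
print(len(events), sorted(buf.keys_down))
```

Since the game loop only reads `keys_down` and the drained batch, it doesn't matter that the window system hands over events one by one.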
Recently I've been delving into Win95 stuff a lot, because I've run out of portable games to run on my locked-down school-provided laptop, but QEMU works. Aside from all the fuckery in the depths of QEMU to get everything working, I've been playing some really obscure games. I've particularly been getting into the modding scene for a Frogger game that came out in 1997, which exists for some reason. It's honestly kind of neat how much effort people have put into this. There's an actual dedicated modding tool and everything. Right now I'm working on finishing the unused levels left in the game files. It's surprisingly complicated for a game from 1997. It never ceases to amaze me how much programmers from back then could cram into 700MB.