19 votes

What programming/technical projects have you been working on?

This is a recurring post to discuss programming or other technical projects that we've been working on. Tell us about one of your recent projects, either at work or personal projects. What's interesting about it? Are you having trouble with anything?

16 comments

  1. Weldawadyathink
    • Exemplary

    I spent all last weekend in an ADHD dopamine-fueled coding session working on my website redesign. For background, it is Audiobookcovers.com. One of the best subreddits is /r/audiobookcovers. The people there find or create artwork and assemble it into audiobook cover art. There is some pretty fantastic artwork there. Well, Reddit search is awful, and a few years ago I wanted a way to fix it. So I have been working on this website as a hobby project ever since.

    The original version of the site was some terrible Python scripts hosted on pythonanywhere.com. It used a Google Cloud API to OCR the images and had full-text search through either MySQL or Elasticsearch (can't remember which). It was very clunky, but I learned quite a lot. I have since rewritten it many times with different technologies. The currently hosted version is using Next.js on Vercel, and is what I used to learn React. It uses OpenAI's CLIP model for the search, in a Supabase Postgres DB.

    If you haven't heard of it, CLIP is a really cool AI model. Ultimately it just generates embeddings. An embedding is just a fancy name for a vector, which is a fancy name for a long list of numbers. For example, (111, 386, 99) is an embedding; specifically, it is a 3-dimensional embedding. A CLIP model is actually a pair of two separate models, one for images and one for text. If you gave it a picture of a dog and the text "a dog", the embeddings you get from the two models would be similar. That property is called "mapping to the same vector space", and it is incredibly useful for image search. You first take a list of images, run the image model on each, and store that list of embeddings. Then, when a user wants to search, you run their search term through the text model. Finally, you take the search embedding, calculate its distance from all of the image embeddings, and show the closest ones. For those of you wondering, this isn't directly related to the GPT LLMs that are in the news nowadays. The CLIP model was first released before ChatGPT existed. And while it does take some computing power, it doesn't take anywhere near the power needed for an LLM, even a very small one.
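
    A minimal sketch of that search step (assuming the embeddings are already computed and normalized to unit length; the names here are illustrative, not my actual code):

    // Cosine similarity on unit-length embeddings reduces to a dot product.
    type ImageRow = { id: string; embedding: number[] };

    function dot(a: number[], b: number[]): number {
      return a.reduce((sum, x, i) => sum + x * b[i], 0);
    }

    // Rank the stored image embeddings against the text-query embedding
    // and return the k closest matches.
    function searchByEmbedding(
      queryEmbedding: number[],
      images: ImageRow[],
      k = 5,
    ): ImageRow[] {
      return [...images]
        .sort((a, b) => dot(b.embedding, queryEmbedding) - dot(a.embedding, queryEmbedding))
        .slice(0, k);
    }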

    This elegantly fixes a ton of problems with traditional image search. My original website used OCR, so you could only search for the text on the image. If the OCR messed something up, the image was very difficult to search for, and if the image had no text, it simply wasn't searchable. With this AI search, finding text from images pretty much just works. But you can also describe images, and that works too: searching for "blue dragon" will give you images of the Eragon cover. It's a very cool technology.

    Anyway, I tested multiple setups for this. The model needs a decent amount of RAM (2-3 GB, for a cheap server), so keeping it running constantly on a basic VM host would be costly. I also tried a bunch of different on-demand services that start your Docker image when a request comes in. The big issue was cold starts. One of the best services was Google Cloud Run, but starting the machine took 30+ seconds, and keeping a machine alive all the time would cost more than I wanted to spend. So I ended up on Replicate. It has the same cold start problem, but their public models are shared between customers, so if you use a commonly used model, you should never have cold starts. This worked fantastically for the current iteration of the website.

    The issue with Replicate is cost. Shortly after I coded this, I went back to school and stopped working completely. The bills started being higher than I budgeted, so I set a $4/month spend cap. Unfortunately, that meant the website would just stop working at some point each month once the spend cap was hit. I was hoping to cover this cost with basic Google AdSense, but I made about $1.50 in the year I had ads on the site, so that definitely didn't pan out.

    That is all to say, I have been working for quite a while now on a redesigned system that wouldn't use Replicate. Previously I was focused on changing the stack (Deno + Fresh), so I only tried to solve the Replicate problem recently. Somehow I stumbled onto Fly.io. I knew of them previously, but I didn't think they were a reasonable solution to my problem. If you aren't aware, they are a virtual machine hosting provider that is quite good at low-traffic websites but can scale up very easily. They don't just give you a VM and let you set it up from scratch, like EC2 or DigitalOcean. You give them a Docker image, and they run it as a full virtual machine. I previously thought this seemed backwards. The benefit of Docker containers is that they are fast to start and don't have the overhead of a full virtual machine. Why would I want a full VM instead? Well, they aren't using a traditional hypervisor; they use a technology called Firecracker. It's the same tech that powers AWS Lambda. It is built in Rust and allows you to run very lightweight virtual machines with almost no overhead, and with far better isolation between machines than Docker provides. They somehow make cold starts on these machines incredibly quick: quick enough to be almost unnoticeable, even for a webserver. They can also suspend machines instead of stopping them, which I think is similar to hibernation. You pay for compute and memory when the machine is running, but when it's off, you only pay a tiny amount for the root filesystem storage. Very cool technology.
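
    The idle-stop behavior is just a few lines of fly.toml; roughly something like this (a sketch based on my reading of Fly's docs, with illustrative values):

    # fly.toml (sketch): let Fly suspend the machine when idle and wake it
    # on the next request, so you only pay for storage while it sleeps.
    app = "audiobookcovers"    # illustrative app name

    [http_service]
      internal_port = 8000
      force_https = true
      auto_start_machines = true
      auto_stop_machines = "suspend"  # "stop" also works; suspend resumes faster
      min_machines_running = 0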

    Also, as part of this rewrite, I wanted to look into newer CLIP models. There are a few, but the ones I am interested in are Apple's MobileCLIP models, a series of CLIP models that are free to use. Unlike other models, they are designed to run on mobile phones, so they are faster to run than other CLIP models. I assume these are what power the iPhone's Photos search interface. They have 5 versions: s0, s1, s2, b, and blt. s0 is the fastest, and blt is the highest quality but slowest. In order to compare, I needed to regenerate the embeddings for my entire list of 5000 images with the new models. My original embedding pass with OpenAI CLIP running on Replicate cost about $2.50. It's a one-time cost, so I didn't think much of it. With the MobileCLIP models running on Replicate, the entire embedding pass for all 5 models cost around $0.75. Needless to say, I am sticking with Fly.io and not using Replicate again. I think Replicate is probably a good service for larger LLM or diffusion models, but for a small model like CLIP, it's just the wrong tool for the job.

    I am still in the process of comparing the models, but I really like the results so far. I found one search where the OpenAI model brought up very weird results; I don't know if the embeddings in my database got messed up or something. All of the MobileCLIP models perform better than the OpenAI model, and run much faster and more cheaply. I think s0 is going to be suitable for my needs, but I am still planning to test more.

    Anyway, this is a rewrite of the website as well. I am using Deno as the runtime. Throughout my time learning JavaScript, React, and web development, I've come to know one thing for sure: TypeScript is essential. TypeScript is my favorite language by a huge margin, and plain JavaScript is one of my least favorites. Learning TS has actually made me dislike Python because of its lackluster type system. Before this rewrite, I made a Factorio mod and have been working on a Playdate game, both of which use Lua. I think I like Lua, but I miss TypeScript's type system all the time. A ts-lua would be such a fantastic project. So I want a runtime that has good support for TypeScript. I have never been impressed with ts-node or tsc, because setting them up was a lot of work. I really like Bun, but I have settled on Deno. With Deno 2.0's Node compatibility, I don't see any reason to use Bun or other alternate runtimes. And for the website, I have been using the Fresh 2.0 alpha. It is a much different way of building a website than Next or client-side React, but for the right project, I think it works well. With the rewrite, the new site is almost exclusively server rendered. I use a small bit of client-side Preact to swap out the blurred placeholder images once the real images load (sketched below). If you have more than a tiny slice of client interactivity, Fresh is probably a bad framework, but for Audiobookcovers.com, it seems to work well.
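
    That swap is one tiny island; a sketch of the idea (the component and prop names are illustrative, not the site's actual code):

    // islands/CoverImage.tsx (illustrative name): fades the real cover in
    // over the blurhash placeholder once the full image has loaded.
    import { useState } from "preact/hooks";

    export default function CoverImage(
      props: { src: string; placeholder: string; alt: string },
    ) {
      const [loaded, setLoaded] = useState(false);
      return (
        <div style={{ position: "relative" }}>
          <img src={props.placeholder} alt="" style={{ position: "absolute", inset: 0 }} />
          <img
            src={props.src}
            alt={props.alt}
            onLoad={() => setLoaded(true)}
            style={{ position: "relative", opacity: loaded ? 1 : 0, transition: "opacity 0.2s" }}
          />
        </div>
      );
    }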

    Also as part of this, I wanted to stop using Supabase. I am currently using it as a plain Postgres database. I like it as a way to host a cheap database, but I wanted to try hosting my own. I have an Oracle Cloud free tier VM with a 4-core Ampere CPU, 24 GB RAM, and 200 GB storage. This seemed like a perfect candidate for a database server that could be used for all my future projects. So I had to learn how to set up Postgres, PgBouncer, and pgBackRest myself. I had done this once before, but ran into some issues. This time, I had help from Gemini pretty extensively. It hallucinated quite a bit, but I was able to work through the hallucinations and get it working. I also think I have it secured quite well: you have to connect through PgBouncer, you have to use SSL, and you have to use a SCRAM-SHA-256 password. I hope that is everything I have to worry about. SSH is only possible through Tailscale or an SSH key that lives in 1Password. Once you know all the pieces that go into it, it isn't terribly difficult to run Postgres.
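
    The connection rules boil down to a few lines of pg_hba.conf; roughly like this (a sketch, not my exact config):

    # pg_hba.conf (sketch): remote connections must use TLS and SCRAM;
    # anything else is rejected.
    # TYPE    DATABASE  USER  ADDRESS  METHOD
    hostssl   all       all   all      scram-sha-256
    host      all       all   all      reject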

    I also needed a new host for the images themselves. I currently use Backblaze with Fastly CDN in front. For the rewrite, I am trying to avoid services whose price jumps significantly once you leave the free tier. The Fastly free tier includes $50/month of CDN traffic, but if you exceed that, you have to pay at least $50/month. Same issue with Supabase: if I exceed their limits, I have to pay at least $20/month. I am happy to pay, but I don't want the $0 to $20+ jump if my website gets a bit more traffic than usual. In my testing with Fly.io, I found Tigris Data. It's an S3 clone built on top of Fly.io. Storage costs about 3x more than Backblaze B2, but it has a global CDN and unlimited free egress. Migrating to them was very easy, and I didn't have to set up a CDN in front, which was nice.

    My current iteration uses Drizzle for the database layer. It's a lightweight TypeScript wrapper for SQL: you write queries that look very similar to SQL in TypeScript and get full type safety. For example, a simple DB lookup looks like this:

    import { eq } from "drizzle-orm";

    // Fetch a single image row by id. `db`, `image`, and `input` are
    // defined elsewhere in the app.
    const [result] = await db
      .select({
        id: image.id,
        extension: image.extension,
        blurhash: image.blurhash,
        source: image.source,
      })
      .from(image)
      .where(eq(image.id, input))
      .limit(1);
    

    Very slick, which is why I used it in the first place. But for more complex queries, it quickly gets annoying to use. For the embedding search, I need to order the results by a calculated column, and the only way to do that in Drizzle is with a subquery. Here is what that query looked like:

    import { cosineDistance, eq, lte } from "drizzle-orm";

    // CTE ("sq") that computes the cosine distance between each
    // searchable image's embedding and the query embedding as a
    // "similarity" column.
    const sq = db.$with("sq").as(
      db
        .select({
          id: image.id,
          extension: image.extension,
          blurhash: image.blurhash,
          source: image.source,
          similarity: cosineDistance(image.embedding, query.embedding).as(
            "similarity",
          ),
        })
        .from(image)
        .where(eq(image.searchable, true)),
    );

    // Filter and order by the computed column via the CTE.
    const dbResult = await db
      .with(sq)
      .select()
      .from(sq)
      .where(lte(sq.similarity, similarity))
      .orderBy(sq.similarity);

    Not very slick. The way I saw it, I wanted to just use basic SQL, because I knew how to write that query in SQL. But I needed type safety (see above). I considered building my own wrapper around a Node Postgres API that would validate the results with Zod. But I did some searching, and someone had already built exactly that: slonik. You give it a template-string SQL query and a Zod validator for what the result should look like. Exactly what I wanted! That exact same query looks like this in slonik:

    import { createSqlTag } from "slonik";
    import { z } from "zod";

    // The validators live in another file in my setup; sql tags created
    // with createSqlTag can look one up by name via typeAlias().
    const sql = createSqlTag({
      typeAliases: {
        imageDataWithDistance: z.object({
          id: z.string().uuid(),
          source: z.string(),
          extension: z.string(),
          blurhash: z.string(),
          distance: z.number(),
        }),
      },
    });

    // `pool` is a slonik pool from createPool(); many() runs the query
    // and validates every row against the alias above.
    const results = await pool.many(
      sql.typeAlias("imageDataWithDistance")`
        SELECT
          id,
          source,
          extension,
          blurhash,
          ${model.dbColumn} <=> ${JSON.stringify(vector.embedding)} as distance
        FROM image
        WHERE searchable
        ORDER BY distance ASC
        LIMIT 24
      `,
    );
    

    So much nicer! (Some of you who aren't familiar with modern TypeScript may be worried about SQL injection, but that isn't an issue here. JS tagged template functions are invoked as foo``, which hands the template to the function foo. The function receives the literal string chunks and the interpolated values separately and assembles the result itself, so a smart SQL API can use that to build a properly parameterized SQL query; see the sketch below.) Deno has a compatibility issue with node-postgres, which slonik uses internally. It took a lot of debugging to file a bug report, but I finally found a workaround for slonik that someone else had posted on GitHub.
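
    A minimal sketch of the mechanism (demoSql is a made-up tag, not slonik's actual implementation):

    // A tagged template receives the literal chunks and the interpolated
    // values separately, so it can emit $1, $2, ... placeholders instead
    // of splicing user input into the SQL text.
    function demoSql(strings: TemplateStringsArray, ...values: unknown[]) {
      const text = strings.reduce(
        (query, chunk, i) => query + chunk + (i < values.length ? `$${i + 1}` : ""),
        "",
      );
      return { text, values };
    }

    const userInput = "'; DROP TABLE image; --";
    console.log(demoSql`SELECT * FROM image WHERE id = ${userInput}`);
    // -> { text: "SELECT * FROM image WHERE id = $1",
    //      values: ["'; DROP TABLE image; --"] }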

    This rewrite has been a ton of fun! I am looking forward to getting my image downloader and the frontend website finished, but I am hopeful it will be soon!

    14 votes
  2. derekiscool

    I was recently laid off from Microsoft and have been getting almost no traction in my job search - so I've decided to start building a website full of short little tech demos.

    The first thing I've been working on is creating and showcasing a charting library built with WebGL. It's quite ambitious for just a tech demo, but it's enjoyable work so far, and it seems you really need something extra to stand out in this job market!

    15 votes
  3. lynxy

    Designing and printing mine own minirack (10"), to hold some audio and networking equipment in one or two rooms. Because it's a fun challenge, and because I just can't force myself to use somebody else's solution. Ever.

    8 votes
  4. earlsweatshirt

    I’ve been working on a big-screen optimized UI for the iPad version of Surfboard.

    (Current version for comparison)

    I’m pretty happy with the UI itself; I just have a laundry list of bugs to fix, mostly related to the fact that adding/removing a column causes the width of a scroll view (e.g. the feed) to change. It’s definitely one of those moments where I kinda wish I had just done the app in UIKit. It’s also been kinda eye-opening going back to the project’s code (some of my earliest SwiftUI work, and pre many improvements in newer SwiftUI versions) and seeing a lot of room for performance improvements. Surely some day I’ll get around to doing that too 😆

    4 votes
  5. [9]
    Parou

    Still working on my Virtual Tabletop.
    It's progressing nicely. I have like 2 or 3 major features left to have it ready for a first game in our group.

    It turns out the UI stuff I was pushing off till now was actually the easiest part, as it is pretty simple to translate the screen cursor position back to the VTT cursor position and drop tokens and other things in. (Of course it's not really dropping anything in: it creates a copy of the image that follows the cursor in the window, and when the mouse button is released, it deletes the copy again and sends a command to the server asking it to add the actual token on all connected clients. The coordinate translation is sketched below.)
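
    The translation is just undoing the canvas offset, the pan, and the zoom; a sketch in TypeScript (the names are illustrative, assuming a simple pan/zoom camera):

    // Convert a mouse event into VTT map coordinates by undoing the
    // canvas's position on the page, the current pan, and the zoom factor.
    function screenToMap(
      e: MouseEvent,
      canvas: HTMLCanvasElement,
      pan: { x: number; y: number },
      zoom: number,
    ): { x: number; y: number } {
      const rect = canvas.getBoundingClientRect();
      return {
        x: (e.clientX - rect.left - pan.x) / zoom,
        y: (e.clientY - rect.top - pan.y) / zoom,
      };
    }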

    The interface looks pretty nice already. Thankfully, design trends stepped back to simple flat looks with a rounded corner here and there about 10 years ago, after all those gradients, shadows, and such.

    In the past two days I fixed a lot of things that I had been trying to figure out on my own in the beginning, after some quick late-night thinking.

    I'm looking forward to everything.

    3 votes
    1. [4]
      jmpavlec

      Sounds really neat, do you have any screenshots to share? I'm curious to see what it looks like.

      2 votes
      1. [3]
        Parou

        I do, actually! Keep in mind that this is my first time working with HTML5 Canvas, my first time working this extensively with JavaScript in general, and my very first time fighting through Python for the websocket server. It has been about 2 weeks since I started working on this, and only in the last few days did I finally get to the UI part!

        Screenshot 1 (GM View)
        Screenshot 2 (Player View)

        3 votes
        1. [2]
          jmpavlec

          That's fantastic progress for 2 weeks using some tech you don't know too well. Are you using any AI to help out? I find it quite useful for languages I'm not that proficient in.

          What js library are you using for websockets? I use websockets for my game website as well. Amazing how simple it makes it to add some interactivity to the frontend across multiple browsers.

          What kind of state management are you using on the frontend?

          2 votes
          1. Parou

            I'm actually using all standard stuff. Standard websockets in JS, standard websockets library in Python, etc.

            I don't use AI; in fact, disabling the AI features in my editor was the first thing I did when I installed it on my new system some time ago. A friend talked to me about this stuff though; I haven't looked into it much yet. I'm back to writing code after years of not doing anything because of health and other factors. My first ever mechanical keyboard definitely helps me stay motivated compared to the last few years.

            4 votes
    2. [4]
      Gazook89

      What is the motivation behind creating a new VTT? Which have you tried that you didn’t like or came up short?

      1 vote
      1. [3]
        Parou

        Our group used Roll20 until about 2.5 years ago, when they updated something that caused a widespread issue with extremely high GPU usage. They tried to address it a few times but failed, leaving everyone with that issue unable to use their VTT (there's a giant-ass thread in their forums). It affected our entire group.

        Then for the last 2.5 years, we used Owlbear, which is great, but certain things are still annoying as a user, especially the limited space on free plans if you accumulate multiple ongoing campaigns (it would be an easy thing to pay for a subscription, but in our group everyone GMs a lot, so everyone would need to pay for one).

        We tried Foundry and others, but their focus seems to be graphical effects that keep GPU usage as high as a video game's even while idling on the screen.

        My goal here is primarily to bring back, for my group, the ease of use and performance Roll20 had in the past, with barely any GPU usage, especially when nothing is actively moving.

        Bonus points for the fact that almost all of us have a homeserver running in some way, so we can easily have the Python server running 24/7 and even split our games across separate servers.

        2 votes
        1. [2]
          Gazook89

          Is it for dnd, or another system, or system agnostic? Are you trying to make something that is appealing to a large audience or mostly just bespoke for your games?

          In the past couple weeks I also became aware of Fey-Gate, which is also a solo-ish dev project.

          1 vote
          1. Parou

            Oh, that one looks pretty interesting. Thank you for the link.

            Mine is not going to be focused on any specific system. I will add basic mod/extension support though, which I might use to add some character sheets for Call of Cthulhu and others.

            I'm mainly working on this as a hobby project for our group, but will make it public later down the road.

            1 vote
  6. doors_cannot_stop_me

    I'm working on (among 20 other projects) a light source build. It's effectively a funny-shaped flashlight with really specific button inputs, but with a funky overall shape for its application. I've managed to hand-solder everything it needs for a prototype (LEDs, resistors, switches) and 3D print an ugly shell, and it works!

    Now I (a complete novice when it comes to electronics) am going to have to learn how to design a PCB so I don't have to keep doing it the way I know how (ugly soldering and a half-day turnaround on assembly). I've picked KiCad for now, and hopefully I don't eff it all up!

    3 votes
  7. CrypticCuriosity629
    (edited)

    So this is a bit of a creative project as well as a software project and hardware.

    I'm creating a Swiss Army knife of a USB drive, with Windows, Mac, and Linux recovery and penetration testing capabilities, multiple live OSes, and a bunch of portable tools.

    BUT in the theme of the Relic Chip from Cyberpunk 2077.

    I recently bought a micro USB to use in the drive, and I've already 3D Printed a prototype shell for it.

    The idea is to have 3 partitions. One to hold Ventoy and the ISOs, one for encrypted persistent storage for the ISOs, and one to be used to store portable apps and software, as well as general multi-OS storage.

    When I plugged it into my Linux machine, the USB drive popped up as VenderCoProductCode. This wasn't the partition; this was the name/info of the device itself.

    Because I'm neurotic, I set out to change this information, so I fell down a rabbit hole of flashing the USB drive with custom controller firmware so I could change all the device descriptors to something Cyberpunk-themed. I ended up nuking the block map, so I spent a very long time with poorly translated mass-production software re-linking the physical NAND pages to the disk sectors.

    But I ended up getting it to work, so now as far as any computer is concerned, the USB drive is manufactured by Arasaka and is the RELIC_BIOCHIP_XCX-34092.

    While doing so, I realized I can also flash LED-controlling firmware and settings onto the drive, so I'm really curious about the other leads on the drive and whether they can be connected to an LED to make the Biochip light up. If anyone knows, let me know! I might try to experiment.

    Now that the device descriptors have been altered and it's been re-flashed with the physical memory mapped correctly, I'm going to be partitioning it and gathering all the resources that will go on the USB, and naming the partitions in theme with everything else.

    Eventually I want a one-stop shop for all computer troubleshooting and, if needed, penetration testing tools.

    3 votes
  8. whs

    Tried a few things

    First, work gave me Cursor, even though I had already told them I'm not submitting AI code. So I decided that if they're gonna give me one anyway, I might as well pick Copilot, since it's cheaper for the company and Microsoft doesn't need to pirate its own VS Code extension store, unlike Cursor. Yesterday I had this idea that maybe I'd check out libdatrie, an open-source double-array trie library, and try using o3-mini to find vulnerabilities in it, just like somebody already did with the full o3.

    Well, it turns out you can't do that, because Copilot does block outputs matching public code, and since libdatrie is public, you can't use Copilot to work on it. Normally you'd disable that feature, but since it's company-provided, I can't. I tried o3-mini with Aider instead (my work provides API keys as well, which was my work project for the past few weeks), but it didn't find anything useful.


    Second, I improved my Thai LLM evaluation - WeiXiaoTien. There's this ad from the early 2000s with a family tree told in a deliberately confusing way, so that you'd need to use their ointment. I put the story into various LLMs and ask them to chart out the family tree as JSON.

    I got a comment, probably from one of the Thai-tuned LLM authors, that this puts models that don't do JSON at a disadvantage. It's like the eval is testing two things: whether the model can get the family tree right, and whether it can do JSON. From what I've learned, the "standard" way of doing LLM evaluation is to feed the LLM's text output along with the solution into another LLM and ask it to return a score. I don't want to use this approach, as the expected results are fixed and I only need machine-readable results to grade them.

    I tested a few approaches:

    1. Pydantic AI (ab)uses LLM Tool Use. LLMs can call tools to aid them. This is how MCP works, and if you use Agent mode in Cursor/Copilot, it is how the LLM edits things. If you use Kagi AI with web search, the web search is a tool. Internally, the LLM receives a list of tools along with their expected parameters. Pydantic AI submits just one tool to the LLM: output. This forces the AI to always call output with the correct parameter schema. If the LLM doesn't behave (e.g. it doesn't call tools, or it uses the wrong schema), Pydantic AI throws the error back to the LLM and asks it to try again. However, models do need to be trained to use tools, and models from smaller makers are not. (Even Google's current Gemma3 does not officially support tools.)
    2. Structured Output. The way an LLM works is that it generates a list of possible next tokens with their probabilities, then picks one at weighted random. People figured out that you can filter that list of tokens so the output follows a schema: for example, the first token of a JSON object must be either whitespace or {. Ollama says that after a few tokens, the LLM should understand that it is speaking JSON, and the output should flow more naturally. The limitation of this technique is that the model does not receive the JSON schema or field descriptions. I assume it may also break the thinking process - say the model wants to output "A is Uncle of B", but at the Uncle token the runtime forces it to choose between Mother and Sister; it can't undo that it already said "A is", so it's forced to choose from wrong answers. Some LLMs like to think before answering, and since you're forcing them to speak JSON, they can't think in the way they were trained. Finally, JSON Schema is a complex system, and different LLM providers have different levels of support for it - some, none at all. (Most do support JSON output without a schema, however.)

    Finally I realized I could've just used the same old technique... Just ask the model to speak the answer normally, then use a fixed model that supports tool use to convert the original text output into JSON (a rough sketch of that conversion step is below). The problem with this technique is that some smaller models produce output like a madman's writing, which should be disqualified since no human would accept it, but an LLM always tries to understand even a madman's writing. Also, even though the conversion LLM does not receive the problem statement, some models think out loud about the problem and leak it to the conversion LLM anyway (which is probably how phi4-reasoning got very high points).
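
    The conversion step could look roughly like this with OpenAI's Node SDK as the fixed, tool-capable model (the model name, prompt, and schema here are illustrative, not my actual eval code):

    import OpenAI from "openai";

    const client = new OpenAI(); // reads OPENAI_API_KEY from the environment
    const freeTextAnswer = "..."; // the evaluated model's plain-text answer

    // Force the conversion model to answer through a single "output"
    // tool; tool_choice pins it to that tool so it can't reply in prose.
    const completion = await client.chat.completions.create({
      model: "gpt-4o-mini", // illustrative; any tool-capable model works
      messages: [
        {
          role: "system",
          content:
            "Convert the given answer into the output schema. Do not solve the problem yourself.",
        },
        { role: "user", content: freeTextAnswer },
      ],
      tools: [
        {
          type: "function",
          function: {
            name: "output",
            description: "Report the extracted family tree.",
            parameters: {
              type: "object",
              properties: {
                relations: {
                  type: "array",
                  items: {
                    type: "object",
                    properties: {
                      person: { type: "string" },
                      relation: { type: "string" },
                      relativeTo: { type: "string" },
                    },
                    required: ["person", "relation", "relativeTo"],
                  },
                },
              },
              required: ["relations"],
            },
          },
        },
      ],
      tool_choice: { type: "function", function: { name: "output" } },
    });

    // The arguments arrive as a JSON string; parse them for grading.
    const call = completion.choices[0].message.tool_calls?.[0];
    const familyTree = call ? JSON.parse(call.function.arguments) : null;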

    Anyway, the result is that Claude Sonnet 4 got the best result this time (on the very first version, GPT 4.5 nailed the problem), followed by Claude 3.5 Sonnet, Qwen Max (not open weight), GPT 4.5, ChatGPT 4o, o3, Phi4 reasoning (for the reason above), Claude Opus 4, Gemini 2.5 Flash, Grok 3, and DeepSeek R1 0528 (the best open-weight model). For local LLMs, Gemma3 27b scored the highest.

    Interestingly, I have an IQ2_S quant of the OpenThaiGPT 72b model, and it performed better than Qwen3 32B Q4_K_M.

    2 votes