What programming/technical projects have you been working on?
This is a recurring post to discuss programming or other technical projects that we've been working on. Tell us about one of your recent projects, either at work or personal projects. What's interesting about it? Are you having trouble with anything?
I started a blog! I've maintained a personal website for a while but never put much more than a resume on it. I've been experimenting with post formats, trying to see what's most interesting to write.
Definitely trying to keep it going, but the biggest challenge I've been facing is figuring out what I want to dive into.
When I saw only 3 posts, I thought it'd be a beginner blog. But I was immensely impressed by the depth of your latest post about MFAs. Then I started to notice the interactivity, and the fact that you even made an authenticator inside the post?! (Nice ZomboCom reference too ;)
Great work! How long does it take for you to complete one of the posts?
Thanks!
It depends pretty heavily on the post: the first two were written in about a week each, and the MFA post in about two weeks. I usually just spend some time after work when I have a lighter work-from-home day, or on the weekend at the coffee shop.
It's all on GitHub, so you can see me struggle to make everything work exactly as I'd like it to haha
Over the past two weeks I have been working on a moderation bot for the /r/history Discord server. A few years ago I made a script, inspired by AutoModerator, that lets me set up regex rules and have any match posted to an alert channel for the Discord moderators.
As it turns out, word/regex matching can easily cause a lot of false positives, so while useful, the options there are limited.
Two weeks ago I was playing around with the idea of using the OpenAI API and basically routing all messages on the server through GPT-4. This worked really well after I tweaked the system prompt. Unfortunately, the number of tokens used made it really expensive to run.
I then switched to GPT-3.5, hoping it would be nearly as good. As it turns out, there is a huge difference between GPT-3.5's ability to incorporate context and GPT-4's, to the point that GPT-3.5, even after a ton of prompt tweaking, was nearly as useless as the simple word/regex matching. Just looking up the prices, I see that I could have trialed `gpt-3.5-turbo-instruct`, which should be better with context (maybe? information is a bit lacking), but it is not as cheap as `gpt-3.5-turbo-0125`.
I then switched to a two-stage system where, after GPT-3.5 flagged a message, I asked GPT-4 to validate the flag. This does work, but is still too expensive for my taste due to the use of GPT-4. Not criminally expensive, but easily several dollars per day if there are a lot of GPT-3.5 flags.
Naturally, instead of abandoning the idea, I then switched to a three-stage system: the original regex word-match system fires first, matches are fed to GPT-3.5, and if GPT-3.5 agrees, the message finally goes to GPT-4 for confirmation.
Which actually works pretty well!
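Roughly, the pipeline looks like this. A simplified sketch, not the actual bot code; the regexes are stand-ins and the GPT-4 model name is an assumption:

```ts
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment
const RULES = [/\bplaceholder\b/i, /another pattern/i]; // stand-in regex rules

// Ask a model whether a message needs moderator attention.
// The system prompt instructs the model to answer as {"flag": true|false}.
async function llmFlags(model: string, message: string): Promise<boolean> {
  const res = await client.chat.completions.create({
    model,
    response_format: { type: "json_object" },
    messages: [
      { role: "system", content: 'Moderation assistant... reply as JSON: {"flag": boolean}' },
      { role: "user", content: message },
    ],
  });
  return JSON.parse(res.choices[0].message.content ?? "{}").flag === true;
}

// Stage 1: cheap regexes. Stage 2: GPT-3.5 on matches only.
// Stage 3: GPT-4 confirms only what GPT-3.5 already flagged.
async function shouldAlert(message: string): Promise<boolean> {
  if (!RULES.some((r) => r.test(message))) return false;
  if (!(await llmFlags("gpt-3.5-turbo-0125", message))) return false;
  return llmFlags("gpt-4-turbo-preview", message);
}
```

Each stage only pays for messages the previous stage let through, which is what keeps the GPT-4 bill down.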
In the past two days I have also explored Discord slash commands, which allow moderators to put users on a watchlist: any message from a user on that list is fed directly to GPT-3.5, together with whatever reason the moderator provided when adding the user. This is basically intended to be similar to usernotes on Reddit. But Discord does not provide a way to visually tag users that is only visible to mods, so we let the bot keep an eye on folks on the naughty list.
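The command side is surprisingly little code. In discord.js terms it would look roughly like this (a sketch for illustration, not the bot's actual code; the in-memory map stands in for real storage):

```ts
import { SlashCommandBuilder, ChatInputCommandInteraction } from "discord.js";

// Hypothetical store mapping user IDs to the mod-provided reason.
const watchlist = new Map<string, string>();

export const data = new SlashCommandBuilder()
  .setName("watch")
  .setDescription("Put a user on the watchlist")
  .addUserOption((o) => o.setName("user").setDescription("User to watch").setRequired(true))
  .addStringOption((o) => o.setName("reason").setDescription("Why").setRequired(true));

export async function execute(interaction: ChatInputCommandInteraction) {
  const user = interaction.options.getUser("user", true);
  const reason = interaction.options.getString("reason", true);
  watchlist.set(user.id, reason);
  // Ephemeral reply: only the invoking moderator sees the confirmation.
  await interaction.reply({ content: `Watching ${user.tag}: ${reason}`, ephemeral: true });
}
```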
For people who are really curious, these are the prompts I am currently using: https://gist.github.com/creesch/91def12accc08980adfb2e1c739def46
I do feel they are a bit on the long side, as they are the main token usage, but certainly for GPT-3.5 a lot of the extra instructions are needed.
In conclusion, offloading a bunch of moderation duties onto LLMs really is feasible, but unless you are flush with cash it starts to become expensive pretty quickly. It also isn't really feasible if you aren't already familiar with the community and all its various moderation challenges.
Anyway, it has been a fun project to play with.
Have you considered looking into any of the myriad self-hostable large language models? Whether they'd behave as capably, whether it would be cheaper to host one on a VPS than to use GPT-4, whether the maintenance work is worth it?
A lot of work has gone into making tools such as LLaMA run on even very simple hardware, such as the Raspberry Pi. Depending on the amount of content you're piping in, it might be capable enough! Just a thought :)
I briefly considered it but have not explored it, for a variety of reasons.
tl;dr: doubts about it being capable enough, time, and mostly convenience.
Edit: Just to make sure my last point wasn't total hyperbole, I did some quick looking around, but most guides I've found so far further reinforce my point. They are more or less a retelling of the author getting it to work in the first place, not really guides for setting it up in some sort of production capacity. I will certainly keep an eye on this space, but at the moment I don't think it is worth it for what I wanted to do.
That's fair enough! I outlined a few of these concerns in my comment, not knowing quite how detrimental they were at this point in time, and it looks like you know better than I on that front :)
LLaMA is... not good locally, and definitely not on a Pi. Phi-2 is better, but I wouldn't trust it either; they're just too easily perplexed. Look into api.together.xyz, which can run inference on Mixtral, especially Nous Hermes 2, at around the cost of GPT-3.5 when you count input only. I believe it will perform better, so it may be worth checking out.
You might have had the page open for a while before you replied. But no, so far Mixtral is not really doing better. Of the alternatives suggested, the various Anthropic Claude versions seem fairly capable. But as far as context goes, specifically historical context, most models do seem to struggle.
Ah, yes, I have. Too bad. I hope you find something good.
Seems like you've done a lot of work to reduce false positives. Are false negatives not as much of a concern?
Valid question, but no, not nearly as much. False positives in this context have a much higher detrimental effect.
All the flagged messages are sent to an alert channel and still handled by human moderators to figure out what sort of action needs to be taken.
When you have a high false-positive rate, people ignore the channel more often than not, making it a dead tool. Additionally, the number of false negatives with a simple word filter will already be several factors higher anyway, as you can only input words where you are absolutely sure some shit has hit the fan, simply because you are trying your best to keep the false positives down as well.
Finally, this is an additional tool next to an active user base pinging mods and mods themselves being part of the community.
I wonder whether a more effective filter than regular expressions would help here.
To be honest, I've only used this on toy projects, but a Winnow classifier with Orthogonal Sparse Bigrams as features was both very simple to implement and fairly accurate; this paper (PDF) describes the approach in the context of email spam.
The basic idea is to track small word groups and adjust their weights upwards or downwards whenever the classifier is wrong about a message. That way, you might be able to train a cheaper alternative to GPT-4 using GPT-4 feedback over time.
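A toy version really is short. A sketch of the idea (the promotion/demotion factors are just plausible defaults; a real filter would add feature hashing and threshold tuning):

```ts
// Toy Winnow classifier over Orthogonal Sparse Bigrams (OSB).
function osbFeatures(text: string, window = 4): string[] {
  const tokens = text.toLowerCase().split(/\W+/).filter(Boolean);
  const feats: string[] = [];
  for (let i = 0; i < tokens.length; i++) {
    for (let d = 1; d <= window && i + d < tokens.length; d++) {
      // Encode the skip distance so "a b" and "a _ b" are distinct features.
      feats.push(`${tokens[i]} <${d}> ${tokens[i + d]}`);
    }
  }
  return feats;
}

class Winnow {
  private weights = new Map<string, number>();
  constructor(private promote = 1.23, private demote = 0.83) {}

  score(feats: string[]): number {
    // Unseen features start at weight 1.0.
    return feats.reduce((s, f) => s + (this.weights.get(f) ?? 1.0), 0);
  }

  classify(feats: string[]): boolean {
    return this.score(feats) > feats.length; // threshold = number of active features
  }

  train(text: string, label: boolean): void {
    const feats = osbFeatures(text);
    if (this.classify(feats) === label) return; // only update on mistakes
    const factor = label ? this.promote : this.demote; // multiplicative update
    for (const f of feats) {
      this.weights.set(f, (this.weights.get(f) ?? 1.0) * factor);
    }
  }
}
```

The mistake-driven multiplicative update is the whole trick: weights only move when the classifier gets a message wrong, which is also what would let you train it from GPT-4's verdicts over time.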
Possibly, though I have my doubts with history as the subject. Over the years I have tried a bunch of different filters (though not this one) and they all struggle very much with historical context, to the point that you might as well stick to regex for the obvious cases.
Aside from that, I am not looking to implement such a thing from scratch. So if I wanted to explore it as an option, are there resources available other than research papers?
Ah, okay. Then you've probably tried Naive Bayes filters already, and those have somewhat comparable accuracy.
I mostly mentioned the algorithm because it is so easy to implement, and it deals with context better than the usual implementation of Naive Bayes filters because it uses short phrases instead of single words for classification.
There does not appear to be a well-maintained, ready-to-use implementation of it, however, so the point is moot.
Try OctoML's hosted API running the Mixtral model. I've found it to be highly reliable and of similar quality to GPT-4 at a tiny fraction of the cost. The API has the same structure as the GPT-4 API, which makes it a drop-in replacement.
It costs $0.0003 per 1k tokens for input and $0.0005 per 1k tokens for output.
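Since it speaks the OpenAI wire format, switching is basically just pointing the client at a different base URL. A sketch; the endpoint URL and model name here are assumptions to double-check against the provider's docs:

```ts
import OpenAI from "openai";

// Same SDK, different endpoint; both values below are assumptions to verify.
const client = new OpenAI({
  apiKey: process.env.OCTOAI_TOKEN,
  baseURL: "https://text.octoai.run/v1",
});

const res = await client.chat.completions.create({
  model: "mixtral-8x7b-instruct",
  messages: [{ role: "user", content: "Does this message break the rules? ..." }],
});
console.log(res.choices[0].message.content);
```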
I have done some experiments through OpenRouter, which included various Mixtral models. So far none of them does (much) better than GPT-3.5, and certainly not GPT-4. Specifically, when it involves the context of discussing history, they simply seem to struggle.
Great find. I recommended Together for this above, but if your numbers are right, then OctoML is even cheaper. And I have little doubt Mixtral would excel at this task.
Have you tried Anthropic's Claude? It's quite effective and cheap. It's also compatible with your existing code if you use OpenRouter, so it's dead easy to try.
I haven't. I could give it a go, but I am having a hard time finding pricing. I can find a PDF that prices per million tokens; converting that to the OpenAI prices suggests it is either the same price or more expensive.
https://openrouter.ai/docs#models
OpenAI's pricing has gotten more competitive recently, but still, the best Anthropic model is only $0.008 per 1k prompt tokens and $0.024 per 1k output tokens. There are also some ultra-cheap models at OpenRouter that you might want to try for prefiltering.
I didn't realize openrouter is a service, I assumed it was a library of sorts. This is pretty neat.
Unfortunately, Claude doesn't seem to be able to respond in JSON format, making it unusable for my purposes. I am going to play around with the other models available, though!
Claude can respond in JSON though? I use it all day long via langchain. How did you try to use it?
As suggested: through my current OpenAI implementation, which specifically sets JSON as the response type. Other models do fine; Claude provides a markdown response with the JSON somewhere in a code block.
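One workaround I could try is just fishing the JSON out of the code fence. A quick sketch:

```ts
// Fallback for models that wrap their JSON answer in a markdown code fence.
function extractJson(reply: string): unknown {
  // Prefer the contents of a ```json ... ``` (or bare ```) block if present.
  const fenced = reply.match(/```(?:json)?\s*([\s\S]*?)```/);
  const candidate = fenced ? fenced[1] : reply;
  return JSON.parse(candidate.trim()); // throws if the model returned junk
}
```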
Ah, I’d recommend trying out langchain and its pydantic LLM integrations. It does a great job of making all major LLMs (including Claude and OpenAI) behave nicely with their JSON output.
Heh, I'll check it out. Though I do feel like the sales pitch moved somewhat from the initially promised code compatibility ;)
To be fair, other models work just fine so I have been able to test a bunch without much of a code change. It's just Claude not playing nice.
Edit: Just tried a bit more. v2 doesn't play nice, but v1-instant actually does, so I am going to play around with that one for a bit.
How do you get access? I tried to sign up a while ago but got no answer. (I'm just a hobbyist, though.)
Going to Anthropic direct is fussy, but I just use openrouter.ai which resells it.
Huh, I hadn't heard of them and didn't know that was a thing. Is reselling something that the AI vendors are aware of, or is this a gray-market kind of thing?
Most of those models are definitely opt-in from the vendors; you can even sell your own model through them quite easily. I don't know if that's the case for all of them, though.
I've been working on a basic webpage with JavaScript to make a self-care self-assessment quiz, based on a cool self-care worksheet I found online. I want to share it on Tildes when I finish. It's not technically impressive, but it might just be fun to take, and to talk about the self-care categories and how we're doing in them.
Currently I'm stuck on nicely styling the 1-through-5 scale of radio buttons. Bleh, dealing with CSS and HTML.
I thought it would be cool to see if I could make a graph of the user's results, like the autism pie chart spectrum graph, using JS. I don't know how to do that yet.
It's remarkably challenging to get some inputs working just right in HTML; I was very surprised to see there's not any kind of native numeric range selector.
For graphing, I'd highly recommend looking at chart.js. It's not perfect, but it's much easier to use than d3 and has a lot of options for customization.
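For a 1-to-5 results chart, the kind of thing you'd write is pretty short. A sketch; the canvas id and category names are invented:

```ts
import Chart from "chart.js/auto";

// Hypothetical self-care categories and 1-5 scores from the quiz.
const canvas = document.getElementById("results") as HTMLCanvasElement;
new Chart(canvas, {
  type: "polarArea", // close to the "pie chart spectrum" look
  data: {
    labels: ["Sleep", "Nutrition", "Exercise", "Social", "Leisure"],
    datasets: [{ data: [4, 2, 3, 5, 1] }],
  },
  options: { scales: { r: { min: 0, max: 5, ticks: { stepSize: 1 } } } },
});
```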
Wow, this chart.js is so cool, thank you! What is d3?
It's pretty much the go-to for really advanced data-viz tasks; if you're wanting to break beyond the confines of a "chart", it's what most people would use.
Here are some examples from the creators: https://observablehq.com/@d3/gallery
Very cool! I got chart.js to work, so I am very excited. Thank you! 😁
I just released a chill block stacking game for the Vision Pro! Didn’t know any Swift 6 months ago, and built the whole thing in RealityKit. It’s my first real dev project that I’ve seen through to an actual finished product, so I’m super proud of it. It was a huge learning endeavor, but I feel so much more comfortable developing for Apple platforms now. Definitely worth the effort!
Looks awesome! I don't have or intend to get a vision pro, but that looks nicely polished for only ~6 months of Swift. Nicely done!
I'm curious if it is possible to play on a non-flat surface. Does the vision pro accurately model a slope that can be used with your physics? Could you play on top of a ball?
Hah, I do think if I used immersive mode and factored in the room scan, it could work, but right now I have the game situated in a Volume to let the player reposition it easily. I feel like most spatial games would benefit from using this instead of surface detection, as the latter doesn't allow multitasking while the game's open.
Damn that looks really really good, I want to get into VR coding as well but don't have anything to test it on right now haha
I've been working on upgrading the motherboard in my Framework laptop. It's been a great experience and it's so wonderful that there's a laptop manufacturer out there that has this kind of upgradability. They even make a case for the old motherboard so I can use it as a micro server.
This week I have been working on my complex-plotting program, written in Fortran. I got it to output TIFFs, and to compress them too, using zlib bindings for Fortran. That was fun. The zlib bindings were not up to my standards at first, and not really packageable, so I worked on that project's makefile to have it build a shared library, and added pkg-config integration. I also added an install target, so now it can be packaged. All of that resulted in a pull request for those bindings, and after that a package added to the AUR. That was a fun detour that gives back. My next step will be fully object-orientifying my TIFF writer. After that, I'll add LogLuv support, as currently floats compress very badly.
I'm building an expense tracking app, similar to Splitwise. See my older posts on these threads for more context.
After a 3-day slump (mostly because my partner bought The Lost Ruins of Arnak, and we just had to hyperfixate on it for the week) I finished the group logic and now you can have multiple groups with separate expenses each.
I now need to add members to each group, and create the logic for the "who paid, who owes what" part of the app. For now my partner and I just need "X paid, split equally" and "2+ people paid, split equally", but I'll be sure to add more options as we approach v1.0.
Before I dive into new features, I need to rinse the spaghetti a little - mostly just refactoring to better separate the UI from the business logic, but also reworking the API to be a little nicer.
Right now I have three classes/structs, one for the top-level state/model, one for the groups, and one for the individual expense. Each with its own set of methods.
The problem comes when I want the UI to call a method on an expense (e.g., edit the cost or the name). Right now I call a method on State, which does validation, then calls a method of Group, which calls a method of Expense.
I'm struggling to figure out which approach is better. Is this cascading from method to method alright, or should I just expose the relevant fields directly and let the initial State method mutate them? I dunno, I can see benefits to both. The first is more encapsulated: each class has its own methods, and you can't change data without calling them. The second is clearer and more concise, but you mutate fields directly, which is considered bad practice in traditional OOP circles. Rust isn't traditional, though. Most Rust codebases I've seen picked one style and stuck with it, but I've seen both approaches done. I'll research more.
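For concreteness, the two shapes look something like this (sketched in TypeScript only to keep it brief; the trade-off carries over, and all the names are made up):

```ts
// Style 1: cascade through methods; each layer guards its own invariants.
class Expense {
  constructor(private cost: number) {}
  setCost(cost: number) {
    if (cost < 0) throw new Error("negative cost");
    this.cost = cost;
  }
}
class Group {
  private expenses = new Map<string, Expense>();
  setExpenseCost(id: string, cost: number) {
    this.expenses.get(id)?.setCost(cost);
  }
}
class StateCascading {
  private groups = new Map<string, Group>();
  setExpenseCost(groupId: string, expenseId: string, cost: number) {
    // validate here, then delegate down the chain
    this.groups.get(groupId)?.setExpenseCost(expenseId, cost);
  }
}

// Style 2: expose the data; State validates and mutates directly.
interface FlatExpense { cost: number }
interface FlatGroup { expenses: Map<string, FlatExpense> }
class StateFlat {
  private groups = new Map<string, FlatGroup>();
  setExpenseCost(groupId: string, expenseId: string, cost: number) {
    if (cost < 0) throw new Error("negative cost");
    const exp = this.groups.get(groupId)?.expenses.get(expenseId);
    if (exp) exp.cost = cost;
  }
}
```

Style 1 keeps the invariants next to the data they protect; Style 2 keeps all the logic in one place at the cost of open fields.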
I also want to start dogfooding this ASAP so I can catch bugs and figure out which features I need and which I don't. Also figuring out the UX: right now I have the most basic UI ever, and I think I want to add a bit more sparkle so it doesn't look like 1996 called to get its design back.
Tangentially, does anyone have any recs for free budget apps? I just started working and am thinking of making my own Excel sheet, or doing it with pen and paper, as my expenses are still very trackable right now.
I don't have any, but my partner made an Excel sheet for his personal expenses and he's been very happy with it, so I would say it's worth exploring.
https://actualbudget.org/
Essentially YNAB, but free and open-source. Though you need to host it yourself.
I wrote a script that prompts OpenAI's multi-modal `gpt-4-vision-preview` model to (lossily) convert PDF pages to Markdown via OpenAI's API. GitHub repo here: https://github.com/zyocum/pdf2md
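The core of the trick is just sending each rendered page image to the chat API and asking for Markdown back. A rough sketch of the shape of it, not the repo's actual code; the page filename is a placeholder:

```ts
import OpenAI from "openai";
import { readFileSync } from "node:fs";

// Send a rendered PDF page as a base64 image and ask for Markdown.
const client = new OpenAI();
const page = readFileSync("page-001.png").toString("base64"); // hypothetical file

const res = await client.chat.completions.create({
  model: "gpt-4-vision-preview",
  messages: [{
    role: "user",
    content: [
      { type: "text", text: "Transcribe this page to Markdown." },
      { type: "image_url", image_url: { url: `data:image/png;base64,${page}` } },
    ],
  }],
  max_tokens: 4096, // the vision preview model defaults to a low cap
});
console.log(res.choices[0].message.content);
```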
Couple of little projects:
I can't stop watching this little program do its thing: pong wars
I'm really digging this little equation plotter in 100 lines that can dynamically adjust the ticks on the x- and y-axis of a plot as I pan around and zoom in and out on a multitouch screen.
Both using a parsimonious and trustworthy (not just open source, but easy to build) software stack. The part I've added is a pimple on the backside of a gnat perched atop a mountain.
I've regained the motivation to work on my homelab again and I'm really happy with the progress I've made. Using ansible to orchestrate podman containers and it's going great!
Still working on my personal blog / forum / wiki software. (Working name: Keeper.)
There's more to do, but at this point I'm happy with the code that syncs drafts between browser and server. I added event handlers to autoload on page visibility and autosave when focus is lost, and it just works when switching tabs. Handling edit conflicts is pretty basic, but at least the user is warned, and for now these are just conflicts between different windows where the same user is working, so good enough.
Except... what if the reason that focus is lost is that the user clicked on a link? And suppose the network is slow, so it doesn't complete before page reload? It seems like a race condition.
I thought about hacky ways to handle it, but I'm going to go with avoiding page reloads. At least, switching from edit to preview shouldn't be a page reload.
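If I ever do need a save to survive navigation, the usual escape hatch is a keepalive request on visibilitychange. A sketch; the endpoint and editor state names are placeholders:

```ts
declare const draftId: string;            // hypothetical app state
declare const editor: { value: string };  // hypothetical editor handle

// Fire the autosave in a way that can outlive the page itself.
document.addEventListener("visibilitychange", () => {
  if (document.visibilityState !== "hidden") return;
  // keepalive lets the request finish even if the page unloads mid-flight;
  // navigator.sendBeacon is the older fire-and-forget alternative.
  fetch("/api/drafts/save", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    keepalive: true,
    body: JSON.stringify({ draftId, text: editor.value }),
  });
});
```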
For the coming two years, frames of the movie Star Wars: Episode VIII will be shared hourly through this Twitter account.
I'm looking forward to some nice frames, and maybe people will see some things they appreciate.