What programming/technical projects have you been working on?
This is a recurring post to discuss programming or other technical projects that we've been working on. Tell us about one of your recent projects, either at work or personal projects. What's interesting about it? Are you having trouble with anything?
I recently completed "building" an AI chatbot as a final project for my coursework.
As most people here would probably say, it left me feeling pretty "meh."
The technical meat and potatoes was pretty basic: install Ollama and a few models to try, throw together a pretty small chunk of Python code to read in some documents and feed the model a prompt, and that's about it.
The part that took longest was selecting a model and fine-tuning the prompt. My initial prompt may have been more lines than the actual code. I had to attempt to give it instructions that it would never completely follow, because AI is just supposed to have a mind of its own, I guess?
Seeing my GPU max out for 1-2 minutes to generate nonsensical responses was a bit depressing. I could have put the same amount of time into training a human support agent and they would work with a higher level of accuracy and cause less frustration to everyone involved.
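For anyone curious about the shape of such a script, here's a minimal sketch (the model name, file paths, and question are placeholders, not my actual setup; it assumes the `ollama` Python package and a running Ollama server):

```python
# Minimal document-Q&A sketch: read some files, prepend a system prompt,
# and ask the model a question. Everything named here is a placeholder.
import pathlib
import ollama

SYSTEM_PROMPT = "You are a support assistant. Answer only from the documents provided."

docs = "\n\n".join(p.read_text() for p in pathlib.Path("docs").glob("*.txt"))

response = ollama.chat(
    model="llama3.1",  # whichever model was pulled with `ollama pull`
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"{docs}\n\nQuestion: How do I reset my password?"},
    ],
)
print(response["message"]["content"])
```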
Ollama defaults to a context size of 2048 tokens, and feeding in a single document, let alone multiple, can very quickly blow through that budget. If your system prompt comes before you feed in the documents, it may fall out of the context window entirely -- then the model reverts to just trying to make up plausible continuations because it can't see the instructions. This is a common pain point with Ollama and might explain the poor performance you're seeing.
I've been bitten by this myself and have avoided Ollama since then. You can configure it to use a larger context size, but since it is just a thin-ish wrapper around llama.cpp, it doesn't add all that much value over using llama.cpp directly IMO.
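For reference, raising the context size doesn't require leaving Python. A sketch (the model name is a placeholder, and 8192 must not exceed what the model itself supports):

```python
# Override Ollama's 2048-token default context window per request.
# The same setting can also go in a Modelfile as `PARAMETER num_ctx 8192`.
import ollama

response = ollama.chat(
    model="llama3.1",  # placeholder
    messages=[{"role": "user", "content": "..."}],
    options={"num_ctx": 8192},  # default is 2048
)
```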
Interesting, thanks for the suggestions. I have almost zero experience in C++ so that would have extended the time expenditure of this project substantially.
Do you have any examples of anything you've built that performs well?
I've built a few smaller tools, like one that renames scientific papers to a readable title instead of the `<opaque numbers>.pdf` you usually get from downloading them. The first page is run through `pdftotext` and the LLM extracts the title as the new filename. Simple, but useful -- and almost impossible to do right with just regex.

I've also used local LLMs for extracting JSON from raw OCR data (you can instruct, and even force, a model to output JSON only). That worked reasonably well; it even caught a few cases where the ground-truth labels I had were wrong. And since the process was automated, I threw multiple different models at it and then looked at the examples where the LLMs disagreed.
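A sketch of how such a renaming tool might look (the port, model name, and prompt are placeholders; it assumes the `openai` package pointed at a local OpenAI-compatible server, and real code would sanitize the title before using it as a filename):

```python
# Hypothetical sketch: extract page 1 with pdftotext, ask a local
# OpenAI-compatible server for the title, and rename the PDF.
import pathlib
import subprocess
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

def rename_paper(pdf: pathlib.Path) -> None:
    # `pdftotext -f 1 -l 1 file.pdf -` writes only the first page to stdout
    first_page = subprocess.run(
        ["pdftotext", "-f", "1", "-l", "1", str(pdf), "-"],
        capture_output=True, text=True, check=True,
    ).stdout
    reply = client.chat.completions.create(
        model="local",  # placeholder; local servers often ignore this
        messages=[
            {"role": "system", "content": "Reply with only the paper's title."},
            {"role": "user", "content": first_page},
        ],
    )
    title = reply.choices[0].message.content.strip()
    pdf.rename(pdf.with_name(f"{title}.pdf"))
```

For the JSON-only case, llama.cpp can constrain generation with a GBNF grammar so the output is guaranteed to parse.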
More about llama.cpp
llama.cpp is basically a toolkit for running LLMs and vision-language models -- it can be compiled from scratch, but there are also pre-built packages available, e.g. in Homebrew on Mac and Linux.
If you're on Windows, KoboldCPP, a llama.cpp fork geared towards role-playing and creative writing, has the easier setup -- there's an `.exe` file to download which comes with a graphical wizard for configuration.

The software provides an API endpoint that is (partly) compatible with the OpenAI API, so you can use, for example, the `openai` package from PyPI to interact with it once you start the server. KoboldCPP listens on http://localhost:5001/v1/ for API requests by default, and llama.cpp comes with a `llama-server` executable that has a web interface and a similar OpenAI-compatible endpoint. Switching a program from OpenAI or OpenRouter to a local server, or vice versa, generally involves just a two-line change (setting the API base URL and adding or removing the API key).

That said, since you're using Ollama already, the easiest path would probably be to just set the context to a value larger than 2048 (if that is indeed the problem). I think you need to create a new modelfile for that, but it's been a while since I played with Ollama.
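To illustrate the two-line change (a sketch using the `openai` package; the model name is a placeholder, and local servers often ignore it):

```python
# Only base_url and api_key differ between a hosted and a local server.
from openai import OpenAI

# Hosted:  client = OpenAI(api_key="sk-...")
# Local KoboldCPP (the client requires a key, but the server ignores it):
client = OpenAI(base_url="http://localhost:5001/v1", api_key="not-needed")

reply = client.chat.completions.create(
    model="local-model",  # placeholder
    messages=[{"role": "user", "content": "Say hello."}],
)
print(reply.choices[0].message.content)
```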
From what I've seen, LLMs do appear to be pretty successful at extracting text from documents. Your scientific paper titles use case makes sense.
I have to admit, I do find the concept of AI-based document organization and search to be worrying. Not too much on the technical side, but more about societal impact. If an LLM is going to parse through a company's internal docs and give responses, it encourages employees to just dump files in a central location instead of organizing them at all.
Yes, organizing files etc. is hard to do well, especially in large companies that haven't done it properly from the very beginning. But I think this will amplify the issue further.
That particular horse left the barn at least twenty years ago, when Gmail decided to rely on search instead of organizing emails into folders. Sometimes it is a better trade-off to spend some time searching for something rather than organizing everything.
I think I get what you mean though. People -- and I definitely include myself here -- will inevitably start trusting the automation (since it works almost all of the time) to a degree that is unwarranted, and then they won't want to, or even be able to, complete the task on their own.
I've been bitten by that too, but Ollama has one big advantage over llama.cpp: it has become the de facto standard for self-hosting LLMs. All the client apps I see support Ollama, but almost none of them support llama.cpp, so I'm sticking with Ollama for now.
I haven't used llama with it yet, but maybe try chunking and vectorizing the documents and then also the input prompt, feed the model only the relevant passages, and see how it goes -- basically building a small RAG pipeline. I can imagine it would work quite well.
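A minimal sketch of that idea (the embedding model and file name are placeholders; assumes the `sentence-transformers` and `numpy` packages):

```python
# Tiny RAG sketch: chunk documents, embed chunks and query, keep the
# top-scoring chunks, and build a prompt that fits a small context window.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

def chunk(text: str, size: int = 500) -> list[str]:
    # naive fixed-size chunking; real pipelines split on sentences/paragraphs
    return [text[i:i + size] for i in range(0, len(text), size)]

docs = chunk(open("manual.txt").read())  # placeholder document
doc_vecs = model.encode(docs, normalize_embeddings=True)

query = "How do I reset my password?"
q_vec = model.encode([query], normalize_embeddings=True)[0]

scores = doc_vecs @ q_vec  # cosine similarity (vectors are normalized)
top = [docs[i] for i in np.argsort(scores)[-3:][::-1]]
prompt = "\n\n".join(top) + f"\n\nQuestion: {query}"
# `prompt` now holds only the relevant passages plus the question
```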
I'm still writing new stuff in my esoteric programming language, Funciton. It uses Unicode box-drawing characters and looks a bit like flow charts; the factorial function, for example, is a small diagram of connected boxes (you can see it on the esolangs page linked below).
Each of the function invocations in it, including the multiplication (×) and even the if-then-else (‽), is itself written in Funciton. The only built-in operations are bit shifting, less-than, bitwise NAND, function invocation, and lambdas. The only data type is the arbitrary-size integer.
Here's a YouTube playlist describing the language and then detailing all of the library functionality I've implemented so far, starting with simple arithmetic and building up to more advanced structures like lists. The most recent video at the time of writing covers regular expressions, which I created about a month ago.
I am now working on a new data structure which I will call “arrays”. On the surface that sounds basically identical to lists, but lists don't support arbitrary indexing, so you can't do binary search on them. The implementation of arrays is considerably more elaborate than that of lists. Stay tuned; I estimate the video will come out within the next two weeks (but it's a hobby project, so I make no promises).
https://esolangs.org/wiki/Funciton
I... what... I can't figure out how to read any of that.
So I have made some progress, but also hit some setbacks, on my audio player. All the parts I ordered have arrived, and yesterday I did the first battery capacity test, which went poorly. The 14500 battery claims a capacity of 2500 mAh and lasted 35 minutes. For comparison, I am benchmarking against an old 10000 mAh battery bank; it is at 3:30 of runtime and has not yet indicated that it is below 75% (it only has four indicator lights for remaining charge, and all four are still on).

I was doubtful of the 14500 cell's capacity when ordering, since it seemed to be over double what is normally the maximum for a 14500, but I thought maybe there had been capacity improvements since whenever the articles I was reading were published. Turns out, probably not; instead it was false advertising, so I will need to return them. I am going to look at some 18650 cells in person later this week to get a rough idea of the physical size, and then maybe order a charge controller and switch to 18650 if it is not too bulky. According to my current capacity tests, I can get at least a 3000 mAh 18650 cell, which should give several hours of playback.
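As a rough sanity check of those numbers (assuming the same load in both tests and ignoring converter losses, so runtime should scale with capacity):

```python
# Back-of-the-envelope capacity estimate from the two runtime tests.
bank_mah = 10_000
bank_runtime_min = 210   # 3:30 so far, and the bank is not yet empty
cell_runtime_min = 35

implied_cell_mah = bank_mah * cell_runtime_min / bank_runtime_min
print(round(implied_cell_mah))  # ~1667 mAh, and that's an overestimate since
                                # the bank outlasted 210 minutes; nowhere near
                                # the advertised 2500 mAh
```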
I am also considering switching from running the device with a Pi 3a+ to a Pi Zero 2W, since I do not need the slight performance boost of the 3a+, and a Zero 2W will give me space savings.
I have been working on some other slight improvements to the code as well while waiting for the batteries to arrive. I have also been researching transistors, since that seems to be the only way to cut power to my screen backlight via software. My screen is running on one of the 3.3 V GPIO output pins, so I am considering using another GPIO pin and a transistor to handle turning the screen backlight off. I have also done some code cleanup and documentation, which will help prepare for some changes and features I want to add later.
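The control side of that could be as simple as this sketch (the pin number is hypothetical; assumes the `gpiozero` library, with the GPIO pin driving the transistor rather than powering the backlight directly):

```python
# Toggle a transistor-switched backlight from software.
from gpiozero import DigitalOutputDevice

backlight = DigitalOutputDevice(18)  # BCM pin wired to the transistor's base/gate

def set_backlight(on: bool) -> None:
    # the transistor switches the 3.3 V backlight supply on and off
    if on:
        backlight.on()
    else:
        backlight.off()
```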
Lithium batteries are rife with fraud (especially on Amazon). I had good luck getting cells that match the advertised capacity from B&H Photo if you just need a few.
Yeah, I was suspicious of them when ordering, but they seemed comparable to the other options I could source. B&H would be a good option if I were not in Canada and therefore dealing with tariffs. If I go the 18650 route, which I probably will, I can source those locally from a reputable seller.
While waiting for these parts to arrive, I have started to chip away at some features to implement and bugs to fix. One of those features was a hold switch that toggles the clickwheel on and off. This code was relatively easy to implement. The next was to reimplement the screen-off feature: the original code turned the screen off after 60 seconds of inactivity. I decided instead to have the hold switch turn the screen on and off, which was another easy feature to implement. Once those were done, I decided it would be a good opportunity to address the bug of the screen freezing, normally after 60 seconds of playback.
I am starting to think that the original developer implemented the screen turning off to hide this bug, and not to save battery, since the screen backlight stays on regardless. The program uses tkinter for its graphical interface, so I am researching that now. It appears that tkinter freezes the GUI while its thread is busy, so I may have to research multi-threading in Python and how well Raspberry Pis handle multithreaded applications.

Edit: One of the things I have enjoyed about this project is that, in designing the hardware and modifying the software, I get to design the product around what I care about and how I am going to use it. Using a standard 18650 battery was one of those decisions, as I wanted replacements to be easy to source. The screen sleep was another: when the player is in my pocket, the hold switch will be on, so it makes sense for the screen to sleep, but if I am just sitting in a chair listening to music, I want to see what song is playing. Therefore, I decided to trigger the screen sleep on the hold switch and not on inactivity.
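If it comes to multi-threading, the usual pattern for the tkinter freeze (a sketch, not the player's actual code) is to do blocking work in a background thread and let the GUI thread poll for updates, since tkinter widgets aren't thread-safe:

```python
# Keep tkinter responsive: a worker thread produces updates, and the GUI
# thread polls a queue via root.after() instead of blocking the mainloop.
import queue
import threading
import time
import tkinter as tk

root = tk.Tk()
label = tk.Label(root, text="...")
label.pack()
updates = queue.Queue()  # of str

def playback_worker() -> None:
    while True:  # placeholder for the real playback loop
        time.sleep(1)
        updates.put("now playing: ...")

def poll() -> None:
    while not updates.empty():
        label.config(text=updates.get_nowait())
    root.after(100, poll)  # re-check every 100 ms without blocking the GUI

threading.Thread(target=playback_worker, daemon=True).start()
poll()
root.mainloop()
```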
Edit 2: I hooked the RPi up to an HDMI monitor, which mirrors my display. The HDMI output was not freezing while the display I am using was. My configuration uses FBCP porting, which may be the cause. Some quick research indicates there is an open-source GitHub library that implements FBCP better, so I will experiment with that.
It's a tiny thing, but I registered gotb.li for the TheatreBunch client project I've been working on - I'll use it for shortlinks as well as what's set up now, which is links to the project's website and various social media.
The project is to set up a community for theatres. The first client of NullusAnxietas. Whee!
this is really stupid -- but every day I watch Countdown off the channel4 site. I want to go direct to the latest episode, so why not write a bash script that fetches the URL and generates a little HTML redirect? Overkill.
run that crap every day after the episode is posted and I'm made in the shade.
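In the same spirit, the whole thing can be sketched in a few lines (here in Python rather than bash; the channel4 URL pattern is a made-up placeholder):

```python
# Fetch the show page, find the newest episode link, and write a one-line
# HTML redirect page pointing at it.
import re
import requests

page = requests.get("https://www.channel4.com/programmes/countdown").text
# hypothetical pattern; the real page markup will differ
match = re.search(r'href="(/programmes/countdown/on-demand/[^"]+)"', page)
latest = "https://www.channel4.com" + match.group(1)

with open("countdown.html", "w") as f:
    f.write(f'<meta http-equiv="refresh" content="0; url={latest}">')
```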
I love these kinds of scripts. Just a way to automate a basic task.
Somewhat similar: I've been using a browser extension that lets you write URL rewrite patterns to redirect Reddit links to old.reddit, and it doesn't work perfectly. I wish it only had a single page to think about. :)
that extension is great. if you’re logged in, you should be able to opt out of the new design. i’ve got another to clean the reddit image urls, which is boss.
edit: clean reddit image urls
I actually use a Firefox extension called Redirector and put my own regex into it. I have been having issues with gallery links, so maybe looking at the code of the extension you shared will help with that.
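For the curious, the kind of rule being discussed looks roughly like this (expressed here as a Python regex for clarity; Redirector itself takes JavaScript-flavored patterns with $1-style captures):

```python
# Rewrite www.reddit.com links to old.reddit.com, keeping the path intact.
import re

url = "https://www.reddit.com/r/programming/comments/abc123/some_title/"
old = re.sub(r"^https?://(?:www\.)?reddit\.com(/.*)",
             r"https://old.reddit.com\1", url)
print(old)  # https://old.reddit.com/r/programming/comments/abc123/some_title/
```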
hot damn. I'm gonna use that. I just switched to Firefox a month ago and there are so many little things to adjust to get back to my old Chrome-life :)
this one?
Yep, that's it! There's a ton of them but this one looked... the least sketchy I guess?
it's perfect, and I really like how it lets you see the output of the regex. That's a great find. I've been using this userscript for ages, but it isn't always perfect and only covers a few sites.
It'd be great to have something like this that had the ability to follow pattern lists -- sort of a uBO for url cleaning.
If you find yourself building regexes, I highly recommend regexr. When I have to do this for work every so often, I use this site and could not live without it.
I've just started reading through the river Wayland compositor's upcoming river-window-management protocol to start working on my own wm for it. I'm planning to write it in Zig, but that's about all I've got done so far :P I currently use river, and I think the idea of an entire wm being built into a Wayland protocol is super cool, so I can't wait to see how things turn out.
Woah this sounds very cool. Also +1 on zig, I love this beautiful little language. It's so clean and functional (not in the math sense, just very useful and usable). Ngl I've been hyped since Zig 0.6 :D
I have begun to believe that Khronos just likes designing spec objects and functions. I think they might also just enjoy making Vulkan users allocate objects just for the fun of it.
Edited in this paragraph because I realized I forgot an example: I wanted to just bind a storage buffer to basically 0,0 (set 0, binding 0). The juggling of descriptor sets, descriptor pools, layout objects, and all that noise just to say "bind buffer x with offset 0 to 0, 0" is crazy. Even though I have it working, it's so crazy that I think I'll ultimately abandon that nonsense and just use a push constant to push a device buffer address.
I'm slightly into a project where my real goal is just to learn some Vulkan stuff through building a silly little thing. The silly little thing I've decided to do is trying to push textureless voxel rendering as far as I reasonably can. The "fun" part will hopefully start once I get into meshing and such. One of my goals with meshing is to not ever have 2 vertices in the same location even though the triangles connected to a single location may have up to 3 different normals and up to 4 different colors. I think this is doable for as long as I carefully lay out nointerpolates.
I also have some really dumb additional ideas that are probably slower because of conversions, even though they're amusing from the perspective of minimizing data size. For example, could I represent every face in a 256x256x256 cube of colors using 3 bytes for the vertex positions? The issue is that I'd need 257 positions on each axis, but maybe I could squeak by somehow using data from the normals. Helpfully, there are only 6 distinct normals in a cube if I assume all the raw vertex data only composes unit-size axis-aligned cubes. I think the smallest I could reasonably make each vertex is 5 bytes, and that's using a 256-color palette. Maybe 4 bytes if I drop to 32 colors, but that feels kind of low.
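One possible 5-byte layout, just to make the arithmetic concrete (a sketch; it glosses over the 257th-position problem mentioned above):

```python
# Pack one voxel-face vertex into 5 bytes: x/y/z position (0-255 each),
# one of 6 axis-aligned normals, and an index into a 256-color palette.
import struct

def pack_vertex(x: int, y: int, z: int, normal: int, color: int) -> bytes:
    # 256 positions per axis, not the 257 a full 256^3 cube of faces needs
    assert 0 <= x <= 255 and 0 <= y <= 255 and 0 <= z <= 255
    assert 0 <= normal < 6 and 0 <= color < 256
    return struct.pack("5B", x, y, z, normal, color)

vertex = pack_vertex(12, 0, 255, 4, 17)
assert len(vertex) == 5
```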
I'm currently working on completing Land of Lisp. It's really fascinating so far, and has subverted my expectations at almost every turn.
I'm also working on a first-person multiplayer cooking video game in Godot :D