What programming/technical projects have you been working on?
This is a recurring post to discuss programming or other technical projects that we've been working on. Tell us about one of your recent projects, either at work or personal projects. What's interesting about it? Are you having trouble with anything?
I've been dabbling with GIS stuff lately, and it turns out to be a pretty fun rabbit hole to explore.
I'm planning on picking morel mushrooms this spring. It turns out to be insanely lucrative. Last year I was able to pick 5lbs+ per hour, at a price of $25/lb. Essentially I'm getting a senior IT wage to hike through forests and climb mountains in some of the most beautiful wilderness in the world, and that was just my first year with no experience. Some people walk away from this with 6 figures cash in a season of picking.
Morels grow like crazy in places where there was a forest fire the previous year. Last year, British Columbia had an insane fire season, with 465 wildfires covering 869688 hectares. That's an area almost the size of Puerto Rico, or more than 3x the size of Hong Kong. Here's what that looks like.
Finding morels involves knowing where these fires are, what sort of geography you're dealing with, elevation, temperature, precipitation, tree types, soil drainage, solar exposure, etc. Mushroom hunters are traditionally a very secretive bunch, and their knowledge is passed on through word of mouth. I think I've found an edge though; I can use my geek powers to download vast amounts of data and bring those into the forest with me.
So I've started making a map. A very detailed map. I've downloaded more than 50GB of datasets from a dozen different sources, covering everything from mean spring temperature to tree species to grizzly bear habitats. I have topographic data down to 5 metre resolution, aerial imagery from 4 different sources, down to 0.5m resolution in some cases. I've set up a caching proxy server called mapproxy so that all the normally online data is stored on my laptop for when I'm in the forest without internet.
This whole project has turned out to be a much deeper rabbit hole than I expected. It's tested my skills in Python, SQL, graphic design, and a number of other fields. It's been really rewarding, too. It was a neat moment when some of this clicked for me. For example, elevation data is distributed by the BC government as a giant 500MB .TIF image. It's just an image file showing a greyscale picture of the province, but each grey level corresponds directly to an elevation in metres, so it can be used to generate accurate topographic contour lines or hillshading. Climate data works the same way, and you can use vector analysis to calculate the average temperature within a polygon and things like that. I've been able to overlay that info onto the map for each fire, like so.
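A minimal pure-Python sketch of what "average temperature within a polygon" boils down to. This is not how QGIS implements it; the grid, cell size, and fire outline are made-up toy values, and `point_in_polygon` and `zonal_mean` are names invented here for illustration. The idea is simply to treat the raster as a grid of numbers and average the cells whose centres fall inside the polygon.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon given as [(x, y), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def zonal_mean(grid, origin_x, origin_y, cell_size, polygon):
    """Average the raster cells whose centres lie inside the polygon.

    grid[row][col] holds the cell value; row 0 is the northern (top) edge,
    and (origin_x, origin_y) is the top-left corner of the raster.
    Returns None if no cell centre falls inside the polygon.
    """
    total = 0.0
    count = 0
    for row, values in enumerate(grid):
        for col, value in enumerate(values):
            cx = origin_x + (col + 0.5) * cell_size
            cy = origin_y - (row + 0.5) * cell_size
            if point_in_polygon(cx, cy, polygon):
                total += value
                count += 1
    return total / count if count else None


# Toy 4x4 "temperature" raster with 100 m cells (values are made up).
temperature = [
    [4.0, 5.0, 6.0, 7.0],
    [5.0, 6.0, 7.0, 8.0],
    [6.0, 7.0, 8.0, 9.0],
    [7.0, 8.0, 9.0, 10.0],
]
# Hypothetical fire outline covering the top-left 2x2 block of cells.
fire_outline = [(0, 400), (200, 400), (200, 200), (0, 200)]

mean_temp = zonal_mean(temperature, 0, 400, 100, fire_outline)
print(mean_temp)
```

Real tools handle partial cells, nodata values, and coordinate systems, all of which this sketch ignores.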
It's just astounding how much data there is here, and it's been a lot of fun digging into it. OpenStreetMap data, for instance, shows things like individual park benches, with metadata about whether or not they have a backrest. I was able to download data about grizzly bear habitats, overlay that with the fire outlines, calculate area times population density for the overlapping areas, and output that as a skull and crossbones on the map showing the estimated number of grizzly bears in that area. Here's what that looks like. I also got fairly depressed by some of this data, like the sheer amount of deforestation that happens in this province. Before, and after. It's really disheartening to see exactly how little old-growth we have left. There's virtually nowhere in this province that hasn't been logged or mined already.
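The bear estimate is simple arithmetic once the overlap polygons exist. A rough sketch, assuming the GIS tool has already produced the overlaps in a projected coordinate system with metre units, and that density is expressed as bears per 1000 km². That unit, the function names, and every number below are assumptions for illustration, not values from the actual dataset.

```python
def polygon_area_m2(polygon):
    """Shoelace formula; polygon is [(x, y), ...] in metres."""
    twice_area = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def estimated_bears(overlaps):
    """Sum area x density over (polygon, bears_per_1000_km2) pairs."""
    total = 0.0
    for polygon, density_per_1000_km2 in overlaps:
        area_km2 = polygon_area_m2(polygon) / 1_000_000
        total += area_km2 * density_per_1000_km2 / 1000
    return total


# Two made-up overlap regions between one fire outline and two habitat units.
overlaps = [
    ([(0, 0), (20_000, 0), (20_000, 10_000), (0, 10_000)], 30.0),  # 200 km2
    ([(0, 0), (10_000, 0), (10_000, 10_000), (0, 10_000)], 10.0),  # 100 km2
]
bears = estimated_bears(overlaps)
print(f"Estimated grizzlies in the burn: {bears:.1f}")
```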
Anyway, I've been working on this for a week and I've gone from zero GIS knowledge to the Dunning–Kruger point where I realize how far in over my head I am. I'm learning shitloads about a topic I previously had no interest in, and it's absolutely fascinating.
Definitely one of the coolest projects I’ve seen on here in a while! Thanks for sharing.
Happy to bounce ideas around if you get stuck on the data front - though GIS isn’t my standard fare either.
So, I've run into a bit of a hurdle on the data front. Yesterday I stumbled across some data that was too tantalizing to ignore, but it's proven to be moderately annoying to completely impossible to deal with. I hope by writing this out I'll figure out a solution, but I'm also looking for any advice anyone can give.
I found an FTP server that has post-burn aerial imagery for most of the fires in BC last year, but it's in the form of enormous GeoTIFF files. The smallest fires are something like 2GB, but the ones I'm interested in are more like 32GB, split into 16x 2GB files. I can add these to QGIS directly, but naturally it grinds to a halt.
My solution to this has been to make an image pyramid out of the GeoTIFF images, which provides lower resolutions for lower zoom levels as well as splitting everything into a bunch of smaller PNG tiles that compress better than TIFF. These tiles can then be served by mapproxy's WMS server that QGIS connects to.
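To get a feel for what an image pyramid costs, here is a simplified model that halves the image at each level until it fits in one tile and counts the tiles along the way. It is only a sketch: gdal2tiles aligns tiles to a global zoom-level grid, so its real tile counts will differ, and the 256-pixel tile size and the image dimensions are assumptions.

```python
import math


def pyramid_levels(width, height, tile_size=256):
    """Return [(level, width, height, tile_count), ...] from full resolution down.

    Level 0 is the full-resolution image; each following level halves both
    dimensions (rounding up) until the whole image fits in a single tile.
    """
    levels = []
    level = 0
    while True:
        cols = math.ceil(width / tile_size)
        rows = math.ceil(height / tile_size)
        levels.append((level, width, height, cols * rows))
        if cols == 1 and rows == 1:
            break
        width = math.ceil(width / 2)
        height = math.ceil(height / 2)
        level += 1
    return levels


# Hypothetical 20,000 x 16,000 pixel image; the real image sizes aren't given.
levels = pyramid_levels(20_000, 16_000)
for level, w, h, tiles in levels:
    print(f"level {level}: {w} x {h} px, {tiles} tiles")

total_tiles = sum(tiles for _, _, _, tiles in levels)
print(f"total tiles: {total_tiles}")
```

Because each level has about a quarter of the tiles of the one before it, the lower-resolution levels add only around a third on top of the full-resolution tile count.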
I've got this working, but it's been tedious. For a fire with 16x GeoTIFF images, I need to make 16 separate image pyramids, then 16 layers and cache sources in mapproxy's yaml config file, all with the correct bounding boxes.
So, I've tried using gdal_merge to take all the giant 2GB GeoTIFF images and tile them together into one enormous one I can feed to gdal2tiles to create the pyramid. I did this on a smaller fire, it works, great! So now I can automate this, let's make a crummy bash script:
Aaaand it crashes immediately as it tries to allocate 32GB of RAM to create the merged GeoTIFF. I only have 16GB in this laptop, so the OOM killer kicks in and kills my process. I don't know what to do except create a big swap partition and let it thrash my SSD. I'm not really sure how to deal with such enormous images, let alone the hundreds of them I'm going to have to deal with if I want all this imagery on my map.
Bleh. I think I'll go for a walk and try and think about something else for a bit.
Hmmm that is tricky. I don’t know enough about TIFFs in particular as a format to know the viability of any larger-than-memory strategies. Got any samples you can share? (Same format is ok - doesn’t have to be the specific locations you’re interested in). Hard to give pointed advice without knowing the image compression characteristics and how the GIS metadata is encoded.
First thought: allocate 32GB swap, create the massive combo images, then pass them through ImageMagick’s minifyImage routine (halves each dimension, so the pixel count drops to 1/4), or just see if you can reencode the TIFF to under 16GB with better compression flags. Then tile as normal once you can fit the base image in mem. But I don’t know if traditional image processing routines will impact the geodata…
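On the larger-than-memory question, the general trick is to stream the image through in strips instead of loading it whole. A toy sketch of a 2x downsample that only ever holds two rows in memory; it uses plain Python lists rather than ImageMagick or GDAL, and `downsample_rows` and `fake_scanlines` are hypothetical names standing in for a real scanline reader.

```python
def downsample_rows(rows):
    """Yield rows of a half-resolution image from an iterator of pixel rows.

    Each output pixel is the mean of a 2x2 block of input pixels. Only two
    input rows are held in memory at a time. A trailing odd row or column
    is dropped for simplicity.
    """
    rows = iter(rows)
    for top in rows:
        bottom = next(rows, None)
        if bottom is None:
            break  # odd number of rows: drop the last one
        out = []
        for i in range(0, len(top) - 1, 2):
            block_sum = top[i] + top[i + 1] + bottom[i] + bottom[i + 1]
            out.append(block_sum / 4)
        yield out


def fake_scanlines(width, height):
    """Stand-in for a reader that yields one row of pixel values at a time."""
    for y in range(height):
        yield [x + y for x in range(width)]


half = list(downsample_rows(fake_scanlines(4, 4)))
print(half)
```

Memory use scales with the image width rather than its total size. For georeferenced data, the pixel size in the metadata would also have to double to match; whether a given tool preserves that is something to check rather than assume.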
Barring that, happy to do the crunching locally on my workstation and send you the results if you want to package the process up into a self-contained script. Away from home at the moment, but returning at some point tomorrow.
The offer of processing power is compelling, and really generous of you. I'd be tempted to take you up on that, but I ended up realizing that having 16 different pyramids for a single fire isn't so bad. I can import them as one layer in QGIS, and I fleshed out my bash script to iterate over all the files and generate the mapproxy configuration for all the layers, grids, and caches automatically.
I was mostly just combining the tiles into one enormous file for the logistics of not having to deal with so many individual layers by hand, and that's the only part that needed absurd amounts of ram.
Now I can just put all the tiles in their own directory and go:
for a in *; do ./maketiles.sh "$a"; done
and it'll process everything. It looks like it'll take a good long while to complete, perhaps days, but at least I don't have to do anything by hand. The bottleneck is now the fact that I have to use an external hard drive since I can't fit this all on my SSD.
Awesome, glad you found your fix!
As I said here, I am still working on Musyca, a program that stores the music you listen to in a SQLite database. Nothing has changed, just some fixes here and there.
But now Musyca has a companion: Kolekti. It is a program written in Go with a barebones TUI where you enter a start date and an end date, and it outputs a table of your top artists, songs, or albums. It is still a work in progress, so for now it only shows topArtists.
I need to work on the TUI and add a selection for artist, song, or album. I'm still a bit lost working with this kind of visual library.
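The "top artists between two dates" part comes down to a single grouped query. A minimal sketch of that query, written in Python with the standard `sqlite3` module to keep it short (Kolekti itself is in Go). The `listens` table, its columns, and the text timestamp format are hypothetical, since Musyca's real schema isn't shown here.

```python
import sqlite3


def top_artists(conn, start_date, end_date, limit=10):
    """Return [(artist, play_count), ...] for plays from start_date to end_date inclusive.

    Dates are 'YYYY-MM-DD' strings; played_at is assumed to be stored as
    'YYYY-MM-DD HH:MM:SS' text, so plain string comparison orders correctly.
    """
    query = """
        SELECT artist, COUNT(*) AS plays
        FROM listens
        WHERE played_at >= ? AND played_at < date(?, '+1 day')
        GROUP BY artist
        ORDER BY plays DESC, artist ASC
        LIMIT ?
    """
    return conn.execute(query, (start_date, end_date, limit)).fetchall()


# In-memory database with made-up rows, standing in for the real file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listens (artist TEXT, song TEXT, album TEXT, played_at TEXT)"
)
conn.executemany(
    "INSERT INTO listens VALUES (?, ?, ?, ?)",
    [
        ("Artist A", "Song 1", "Album X", "2024-01-05 10:00:00"),
        ("Artist A", "Song 2", "Album X", "2024-01-20 18:30:00"),
        ("Artist B", "Song 3", "Album Y", "2024-01-31 23:00:00"),
        ("Artist B", "Song 3", "Album Y", "2024-02-02 09:00:00"),  # outside range
    ],
)

result = top_artists(conn, "2024-01-01", "2024-01-31")
for artist, plays in result:
    print(f"{artist}: {plays}")
```

Swapping `artist` for `song` or `album` in the SELECT and GROUP BY would cover the other two tables; since column names can't be bound as query parameters, that choice would need to come from a fixed whitelist rather than raw user input.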
I've been working a bit more on my archive tool mentioned last week, and I'm getting ridiculous compression rates with it. I posted this there, too, but god damn, the rates are so high for lossless compression that I feel like I'm doing something wrong, and this is just the first step. I also dove into researching and trying to "restore" slightly artifacted JPEGs, but I don't feel like it would be too practical, despite my datahoarding heart screaming at me.
That being said, I have so many project ideas popping up lately, it's not even funny.
Due to reasons, I had to reboot my VPS for the first time in a long while. As a result, I have about 5 new project ideas. Probably my fault for treating my server as a pet instead of as cattle, but it's not like I'll learn anything from it :V
Due to not dealing with Rails for... almost a decade now, I spent far more time than necessary trying to get my current booruwiktionary up and running again. Between various gem updates, updates to the booru itself, and lots of updates via pacman that probably didn't get to be tested because the server itself was running so long, I succeeded. Or, at least I thought I did. Browsing everything works, but the fact that I can't log in and can't upload new content means that I'm pretty damn frustrated. I'm legitimately thinking of working on my own booru again (holy crap, this was pre-pandemic? I really should get around to it), something that can compile to something self-contained instead of dealing with weird issues like this.
Due to the above rebooting issue, I've also been thinking about writing my own wiki software. Probably won't go anywhere, since the only real issue I currently have with it is that Semantic MediaWiki is slightly borked: the main page throws an error, but only if logged in. Logged-out users can still see it fine. I can probably get around that by either updating SMW or removing it completely.
I personally feel like if you have less than 3 servers, treating them as pets is Good Enough™ (maybe aside from a small install script for ssh config and the like)
All the advice and tools I've seen for "cattle" behavior (infra as code, cloud setup stuff, etc.) seem to be aimed at businesses and other scenarios where you can accept the extra complexity in exchange for teamwork and/or scalability and stability promises that are (I think) "overkill" for most "hobby" work. (Except for learning and experimentation.)
(It's very quick and simple for me to just docker-compose pull and get an update done or a config changed, rather than editing an Ansible script and waiting for it to slowly realize it can skip a majority of the script.)
Just have good backups (which I am still delaying doing and which will probably bite me in the ass sometime) and the rest doesn't really matter IMO.
I found this from your comment in the most recent technical projects thread and just wanted to say that personal booru software is neat and it would be really cool to see some competition for hydrus.
From what I remember, Hydrus is a desktop booru, which isn't quite what I want to create (at least at the time of writing; I can definitely see myself trying to create something similar if I ever got the time to do so, and if I do eventually get used to GUI coding). One of my goals with the booru project was to be able to share links to images so that others could browse and access them, like a regular booru.
Oh! Yes, it is, and I think I forgot that most boorus aren't. Web based is also very cool, and I guess makes your leaning toward Elixir/Phoenix for it make more sense.