What programming/technical projects have you been working on? - ~comp

[6]

Weldawadyathink

August 21, 2024

To get some more experience with Postgres and sql, I have been trying to put the ssn data leak in a database.

First attempt was with sqlite, which didn't work out very well. sqlite is great, but a 350GB database was just too much. More precisely, my patience was too little. All queries would peg one core at max, and lock the entire database, as expected. I spun up a Postgres server for the better concurrency.

Next was trying to clean the data with postgres. Getting it in wasn't too much work, it only took a couple hours. But the csv wasn't properly formatted. The last column is the ssn, and the 3 columns before that are people's aliases. Some of the aliases have commas in them, and they are not quoted properly. So importing straight to columnar data wasn't possible. I imported everything as text, and worked out some regex to extract the needed columns. Should have worked just fine, just needed to process it. After waiting 12 hours to pull one column from the dataset, I decided that this wasn't the correct route forward.
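One way to sidestep a heavy regex pass on lines like these is to split each line from both ends: the well-behaved leading columns from the left, the ssn from the right, and whatever is left over is the alias text. This is only a sketch under assumptions. `N_LEADING` is a hypothetical column count (the real layout isn't given here), and because the alias commas are ambiguous, the three alias columns are kept as one quoted field rather than guessed apart.

```python
import csv
import io

# Assumption: hypothetical number of comma-free columns before the aliases.
N_LEADING = 4


def repair_line(line, n_leading=N_LEADING):
    """Split one malformed line into leading fields, alias blob, and ssn."""
    line = line.rstrip("\r\n")
    # The ssn is the last column, so peel it off from the right.
    # Raises ValueError if the line contains no comma at all.
    rest, ssn = line.rsplit(",", 1)
    # Leading columns are assumed comma-free, so split from the left.
    parts = rest.split(",", n_leading)
    if len(parts) <= n_leading:
        raise ValueError(f"too few columns: {line!r}")
    leading, aliases = parts[:n_leading], parts[n_leading]
    return leading + [aliases, ssn]


def repair_stream(src, dst, n_leading=N_LEADING):
    """Write src to dst as properly quoted CSV; return the skipped line count."""
    writer = csv.writer(dst)
    skipped = 0
    for line in src:
        try:
            writer.writerow(repair_line(line, n_leading))
        except ValueError:
            skipped += 1
    return skipped


if __name__ == "__main__":
    sample = io.StringIO("a,b,c,d,Smith, John,J Smith,Johnny,000-00-0000\n")
    out = io.StringIO()
    repair_stream(sample, out)
    print(out.getvalue(), end="")
```

Because it streams line by line, the same function works on a file handle of any size, and the quoted output can then be loaded with a normal CSV import.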

Now I wanted to clean the dataset with other tools and import straight into postgres. I decided to clean it with Google BigQuery. I worked with BigQuery for analytics in my previous role, so I knew how powerful it could be. I signed up for a GCP trial for free credits and downloaded the torrent into a VM. That download/transfer took quite a while, but I finally got everything set up. BigQuery was able to import the 350GB of text files in a few seconds. From there, it could run 12 regexes per entry and output everything as a new csv in about 15 minutes!

Now those csv files are imported into postgres on my laptop, and postgres is churning away making some indexes. Next I want to try out a project I found called Evidence. It uses a flavor of markdown to create statically generated websites of analytics data. I have been using Power BI for years, and using something that is entirely code will be refreshing.

6 votes

[3]
archevel
August 21, 2024
Could take a peek at doing the data extraction/cleaning using plain old awk. It is quite fast, but a bit esoteric :)

2 votes
1. [2]
  Weldawadyathink
  August 21, 2024
  I might play around with that, thanks for the tip. Do you know how it compares to sed as far as speed? I used sed a few times to remove some lines that caused problems, and sed itself would take probably 45 minutes to an hour to run through one of the csv files without even doing any real processing.
  
  1 vote
  1. archevel
    August 21, 2024
    I would guess sed would be faster in most cases. Awk is more of a programming language for processing text files whereas sed is more of a way to edit the content of some stream. Then again I suppose this will depend on what you actually do. Maybe an awk program can be faster if you can replace some massive regex you're using with sed.
    
    1 vote
[2]
kej
August 21, 2024
Evidence looks pretty neat, and thank you for saving me some time on my own leak analysis by telling me that the next several things I was going to try didn't work.

1 vote
1. Weldawadyathink
  August 21, 2024
  No problem! I wish you luck. If you want a copy of my cleaned files, shoot me a pm.
  
  4 votes

skybrian

August 21, 2024

This week I published my repeat-test library. Getting started documentation is pretty minimal, though, so I don’t expect it will get attention. Next up: writing a tutorial and an explanation about why anyone should care about it.

3 votes

[3]

first-must-burn

August 22, 2024

Been learning how to set up a static website with AstroJS and deploy on Cloudflare pages. Pretty painless and, importantly, free!

The template I started with is pretty good visually, but a real hodgepodge under the hood, so I'll be slowly cleaning it up / normalizing with TS components, etc.

One of the priorities with this website is accessibility, so I've been learning about and implementing the various guidelines, testing and tuning tab orders, and running it through screen readers. It's actually not too bad once you get things set up, but I can't imagine doing it successfully in any environment where I have anything less than complete control of the page structure.

3 votes

[2]
brogeroni
August 22, 2024
Exciting! What's your experience with ASTRO so far? Do you like it?

2 votes
1. first-must-burn
  August 22, 2024
  Most of my recent frontend experience is in vue.js, and so far I'd say it compares favorably with that. I know one can use vue and other framework components with Astro, but for now I've been doing vanilla Astro for a static mpa site. I find it, if anything, easier to pick up than vue.
  
  The documentation was really thorough and readable. It was worth actually reading a lot of it start to finish before I started coding. Most of the time, this is not the case, and I muddle along in a tutorial and get into the docs for specific questions.
  
  The component file structure in Astro feels like the Vue structure in useful ways. I like the way you can type the components and pass arguments into them. I wish the argument props pattern were a little more explicit, not just one statement chunked among the code, but the vs code tools do good type checking, so it's okay.
  
  This was my first experience with JSX (or whatever you call the mixed-mode HTML+JS). I am still not quite sure when I am writing one or the other (e.g. JS curly braces inside a JS block delimited by curly braces). But I expect that will get easier with time. I find it visually messy, but I can see why people like the flexibility. I imagine it will take me some time to adopt a set of conventions for it that feel readable and maintainable.
  
  I find the compiler pretty painfully slow. The dev server is responsive enough once it starts, but the build takes way longer than it feels like it should for a five page site. But it's definitely usable. I like the way the dev server annotates elements with file and line number and leaves everything unminified.
  
  This is only my second crack at using Tailwind, which is not required or even part of Astro, but it seems like a pretty common pairing. It feels like we've come full circle doing inline styles again (but it is much more concise). Using it for components makes a lot of sense because most of those styles are one-off and closely tied to specific elements, so being able to edit in situ, as well as passing style class lists as arguments to customize the styling of specific component instances makes a lot of sense. And you can drop back to defining your own styles if needed. For example, I have one called nicefocus that makes an element have consistent highlight on focus during keyboard navigation. I can tweak that in the style sheet and update everything at once across all components. But then I can locally adjust the padding and margin of specific elements without having to dig through a style sheet to find the named style.
  
  My other CSS framework project used Bootstrap. Bootstrap is more component oriented, but there are several component libraries available for Tailwind. From a development perspective, I don't see much difference (both are just class annotations), but the fact that Tailwind compiles only what you use seems much better than downloading a giant Bootstrap package just to get flex layouts.
  
  2 votes

hellojavalad

August 22, 2024 (edited August 22, 2024)

I had a house fire a few months ago that upended everything in my life. I haven't coded outside of work (I work with FileMaker, so not what most people think of when thinking about programming) since then. I've been majorly depressed the last two weeks and don't have any hobbies post fire (waiting on insurance to cover items I need to resume things). While I do game, nothing I play is terribly mentally stimulating. I am trying to code a bit in my free time to help knock me out of my funk.

I had done the Roguelike tutorial earlier this year and just decided to run through it again. I'd like to do some simple game dev as a side project and I am drawing up plans for my own roguelike. As indicated above, I'm not a traditional programmer, but I am familiar with Python enough to be dangerous. It is really nice to work with an actual proper programming language and a real IDE.

I will also note that I am trying to get into Starfield modding as well, but the Creation Kit baffles me a bit sometimes. I don't know if there is something wrong with my installation, but sometimes I load it up and get tons of warnings. Other times I do it and it's fine. And it crashes a lot. I know game engines can be big and complicated beasts, but I haven't needed to ctrl+s every five minutes worrying about crashes to desktop since I worked with Photoshop on a low-end Windows XP high school PC.

3 votes

gil

August 23, 2024

I've been working on something interesting over the past weeks. I need to find specific text in 2.3 million images. The problem is that running Tesseract OCR takes a few seconds, sometimes minutes, depending on the complexity of each image. So I wrote a small Node.js server that controls that queue and I'm distributing the work between a few devices I have at home. The server stores the OCR text in a SQLite DB and does the text matching logic as well. Love this kind of little project!
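The queue described above (clients ask the server for work, the server records OCR text in SQLite and does the matching) can be sketched roughly as follows. This is a Python illustration of the idea, not the Node.js server from the post: the `jobs` table, its columns, and every function name are hypothetical, and it assumes a single server process owns the database connection.

```python
import sqlite3


def open_queue(path=":memory:"):
    """Open the database and create the (hypothetical) jobs table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY,"
        " image TEXT NOT NULL UNIQUE,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " ocr_text TEXT)"
    )
    return db


def add_images(db, images):
    """Queue image paths, ignoring ones that are already known."""
    with db:
        db.executemany(
            "INSERT OR IGNORE INTO jobs (image) VALUES (?)",
            [(image,) for image in images],
        )


def claim_next(db):
    """Hand out one pending image and mark it claimed, or return None."""
    with db:
        row = db.execute(
            "SELECT id, image FROM jobs WHERE status = 'pending' "
            "ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        db.execute("UPDATE jobs SET status = 'claimed' WHERE id = ?", (row[0],))
    return row


def store_result(db, job_id, text):
    """Record the OCR text a client sent back for a job."""
    with db:
        db.execute(
            "UPDATE jobs SET status = 'done', ocr_text = ? WHERE id = ?",
            (text, job_id),
        )


def find_matches(db, needle):
    """Return images whose OCR text contains needle (ASCII case-insensitive)."""
    rows = db.execute(
        "SELECT image FROM jobs WHERE status = 'done' "
        "AND instr(lower(ocr_text), lower(?)) > 0 ORDER BY id",
        (needle,),
    )
    return [row[0] for row in rows]
```

A real deployment would also want to re-queue jobs that were claimed but never finished, for example when a device goes offline mid-image.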

I've considered running it in some cheap cloud, but free is better than cheap and I can wait a few days anyways. I would also have to store TBs of files somewhere, open my server to the internet, or make the server send work to the clients instead of clients calling the server.

3 votes

[3]

archevel

August 21, 2024

I recently cobbled together a small bash utility for easily saving bash commands. I've run into the issue multiple times of having several shells open, then closing them only to have the history "forget" some subset of commands I've run in one of the shells. So I made:
https://github.com/archevel/stored_command

It really just stores your previously executed command in a hidden sub folder where you wrote it. This is so it retains some of the execution context (come to think of it I might want to capture the environment variables as well). Neat little utility and useful in my case for saving some regularly used curl where I don't want to memorize and type out a bunch of arguments.
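The general idea of keeping saved commands in a hidden folder next to where they were run can be sketched as below. This is a Python illustration only and says nothing about how the linked bash utility is implemented: the `.stored_commands` directory name and both function names are hypothetical.

```python
from pathlib import Path

# Assumption: hypothetical name for the hidden per-directory store.
STORE_DIR = ".stored_commands"


def store_command(name, command, cwd="."):
    """Save a command line under a short name in the hidden folder of cwd."""
    store = Path(cwd) / STORE_DIR
    store.mkdir(exist_ok=True)
    path = store / name
    path.write_text(command + "\n")
    return path


def list_commands(cwd="."):
    """Return {name: command} for everything stored in cwd."""
    store = Path(cwd) / STORE_DIR
    if not store.is_dir():
        return {}
    return {
        path.name: path.read_text().rstrip("\n")
        for path in sorted(store.iterdir())
        if path.is_file()
    }
```

Keeping the store per directory means the same short name can map to different commands in different projects.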

2 votes

[2]
ShroudedScribe
August 21, 2024
If I understand your readme correctly, if you only have one stored command, rsc will execute that command without confirmation? If so, it would be useful to make that a preference in a config file. I personally don't like to execute commands blind, but I can also see where someone would prefer that functionality.

1 vote
1. archevel
  August 21, 2024
  Good suggestion! I also figured I should probably print out the command executed so it's a bit clearer what's going on.
  
  2 votes

Banazir

August 23, 2024

I'm working on the same stuff I was two months ago but I've made some incremental progress.

On the lab side, I ended up not needing a reverse proxy at all (sorry @Mendanbar, I can't give suggestions like you asked). I was in the middle of setting up fast reverse proxy and noticed I couldn't ping out from any of my servers. That's how I discovered that somehow none of them had a gateway set up. I did that, and suddenly they knew where to find the router and where to listen, which meant the ports are now considered open. Now I'm slowly working on figuring out what I really want to do with the lab - is it purely for learning or will it have some personal utility as well? I'm leaning toward a mix of both, so I'll probably set up Samba as a domain controller and then figure out other services as I go.

I did finish my MD2 hash algorithm, although less successfully than I hoped. I think I'm running into a problem with how Lua represents numbers, and since I can't force it to use integers it means I'm going to get different results than if I used another language. Either way, I've ironed out every other bug I can find and tests run smoothly, it just... doesn't work. I'm planning on trying MD4 and MD5 next because they're more complex variations of the same algorithm, and hopefully I'll gain some insight there. If not, I might have to rewrite everything in ~~Rust~~ C just to guarantee that I'm not running into issues with the language.
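For the planned MD4 and MD5 work, the arithmetic is defined on 32-bit words: addition wraps modulo 2^32 and rotations are 32 bits wide. In a language whose numbers are wider than that (or floating point), each result has to be reduced explicitly. A minimal sketch in Python, where integers are unbounded and so need the same treatment; the helper names are hypothetical, and this is not a diagnosis of the MD2 bug described above.

```python
MASK32 = 0xFFFFFFFF


def add32(*values):
    """Add integers modulo 2**32."""
    return sum(values) & MASK32


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits (0 <= n < 32)."""
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def md5_f(x, y, z):
    """Round-1 auxiliary function F(X, Y, Z) = (X and Y) or (not X and Z)."""
    # ~x is negative in Python, so the final mask brings it back to 32 bits.
    return ((x & y) | (~x & z)) & MASK32
```

Checking small helpers like these against hand-computed values first can narrow down whether a mismatch comes from number handling or from the algorithm's logic.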

2 votes

talklittle

August 21, 2024

Recently started moving iOS Swift code into a separate "Common" package that can be shared between projects. I went with a Swift package generated using `swift package init`. Packages generated this way are smaller (fewer files) and easier to move around if needed, compared to a Framework project generated in Xcode, as far as I can tell.

There were some annoying gotchas with Xcode, e.g. the Common project can't be opened alongside an app project at the same time, at least not the usual way. Nor can two app projects that depend on the Common package be open at the same time. Instead you must create a Workspace to open the two/three projects simultaneously.

I think I saw somewhere there's a possible workaround, like importing the Swift package as a file:// URI instead of as a local filesystem package, but I stopped messing with it when I got it working, instead of trying every possible setup.

1 vote