  • Showing only topics in ~comp with the tag "databases".
    1. How do Reddit, Lemmy, Tildes, etc. process, store, and serve ranked threads/links?

      I'm familiar with how the ranking algorithms work on a high level. What I'm curious about is how those algorithms are actually applied.

      How do these platforms actually apply the ranking algorithms so that the user sees the threads appropriately ordered? My knowledge is limited to PHP and MySQL, so I'm looking at it through the lens of those systems. I've thought of a few possible ways, but all of them seem pretty resource-intensive.

      1. Maintain a table of threads with all relevant information required to calculate ranking, as well as the ranking itself. A server-side script running every X minutes (a cron job?) updates the rankings on all the threads so they can be easily ordered. However, people most likely don't care about threads more than Y days old, so those can be excluded or automatically deranked somehow.

      2. Maintain a table of threads with all relevant information required to calculate ranking. When a user visits, the last X threads (again, users probably don't care about really old stuff) are pulled out of the database, run through a ranking and sorting algorithm, reordered, and displayed to the user. This seems the most resource-intensive?

      I am by no means a professional developer, but I've been dabbling recently and the concept of how these large quantities of data are ranked both perplexes and interests me.
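
      For concreteness, here is a minimal sketch of what option 1 might look like: a script run periodically (from cron) that recomputes a Reddit-style "hot" score for recent threads, so page views can simply ORDER BY the stored rank. The table and column names are made up, and sqlite3 is used only to keep the example self-contained; the same idea applies to MySQL.

      ```python
      # Sketch of option 1: periodically recompute a "hot" rank for recent threads
      # so page views can just ORDER BY the stored rank. Schema is hypothetical.
      import math
      import sqlite3
      import time

      EPOCH = 1134028003  # fixed reference point, as in the commonly cited Reddit formula

      def hot(score: int, created_at: float) -> float:
          """Reddit-style hot score: vote magnitude traded off against thread age."""
          order = math.log10(max(abs(score), 1))
          sign = 1 if score > 0 else -1 if score < 0 else 0
          return round(sign * order + (created_at - EPOCH) / 45000, 7)

      def refresh_ranks(conn: sqlite3.Connection, max_age_days: int = 30) -> None:
          """Recompute rank for threads newer than max_age_days; older threads keep their last rank."""
          cutoff = time.time() - max_age_days * 86400
          rows = conn.execute(
              "SELECT id, score, created_at FROM threads WHERE created_at > ?",
              (cutoff,),
          ).fetchall()
          with conn:  # one transaction for all the updates
              conn.executemany(
                  "UPDATE threads SET rank = ? WHERE id = ?",
                  [(hot(score, created), tid) for tid, score, created in rows],
              )

      # A page view is then a cheap indexed query:
      #   SELECT id, title FROM threads ORDER BY rank DESC LIMIT 25;
      ```

      Option 2 would be the same hot() calculation applied at query time instead of stored.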

      15 votes
    2. What are your favorite Postgres features?

      You could use any SQL database to create a few tables, insert some data, and do queries, while ignoring anything vendor-specific. But Postgres has a lot of other features and many extensions. What do you recommend checking out beyond the basics?

      I've used a lot of databases, but it's been many years, so I assume things have changed quite a bit. I skimmed a few PostgreSQL release notes and learned that there's now a MERGE statement that looks pretty handy. (It's standard SQL.) And from Neon's list of supported extensions, the plv8 extension caught my eye. It would let me write stored procedures in JavaScript. Does anyone use that? Do you use stored procedures at all?
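
      To make the MERGE mention concrete, here's a small hypothetical upsert driven from Python with psycopg against Postgres 15+; the table and column names are invented.

      ```python
      # Hypothetical MERGE (Postgres 15+) driven from Python with psycopg.
      # "stock" and its columns are made-up names for illustration.
      import psycopg

      with psycopg.connect("dbname=demo") as conn:
          conn.execute("CREATE TABLE IF NOT EXISTS stock (sku text PRIMARY KEY, qty int)")
          conn.execute(
              """
              MERGE INTO stock AS t
              USING (VALUES (%s, %s)) AS s(sku, qty)
              ON t.sku = s.sku
              WHEN MATCHED THEN UPDATE SET qty = t.qty + s.qty
              WHEN NOT MATCHED THEN INSERT (sku, qty) VALUES (s.sku, s.qty)
              """,
              ("ABC-1", 5),
          )
      ```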

      I wonder what Tildes uses?

      (To keep discussion organized, please write about one feature per top-level comment.)

      17 votes
    3. What's a simple, cheap way to run a database-backed website as a hobbyist?

      I use GitHub and Netlify to run some simple websites for free. It works well. However, I've been thinking of experimenting with a database-backed website for fun, and Netlify doesn't have any persistence.

      What's a good way to do this that scales to zero when nobody's using it? I want to be able to forget about it entirely for months or years at a time. When someone visits, it should start up and run on demand without costing me $20 a month on standby.

      Back in the day, I used Google App Engine for this. I learned a lot of datastore tricks to get around its poor latency, but I'm lazy and don't want to do that anymore. I'm pretty sure I want a SQL database and full text search. Either sqlite or Postgres would do, but I doubt there's a cheap enough way to run Postgres.
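
      As a quick check that sqlite alone would cover the "SQL database plus full text search" requirement, here's a tiny sketch using its bundled FTS5 extension; the file name and schema are just placeholders.

      ```python
      # Sanity check: sqlite's bundled FTS5 extension gives full text search
      # with no separate server. Names below are placeholders.
      import sqlite3

      conn = sqlite3.connect("site.db")
      conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS posts USING fts5(title, body)")
      conn.execute("INSERT INTO posts VALUES (?, ?)", ("Hello", "a database-backed hobby site"))
      conn.commit()

      for row in conn.execute("SELECT title FROM posts WHERE posts MATCH ?", ("database",)):
          print(row)
      ```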

      Litestream looks interesting, and so does LiteFS, except that LiteFS is pre-1.0 and I don't know what changes fly.io will make that I'd have to keep up with. If I used Litestream, I'd have to figure out how to run it and where to store the replication logs.

      Edit: one nice-to-have is being able to easily dump the database and run it locally or on another cloud provider. (I don't anticipate it getting so big that it's impractical.)

      47 votes
    4. Dealing with databases, inserts, updates, etc. in Python

      Current library: built-in sqlite3
      Current db: sqlite (but will have access to Snowflake soon for option 1 below)

      Wondering if anyone here has some advice or a good place to learn about dealing with databases with Python. I know SQL fairly well for pulling data and simple updates, but I'm running into potential performance issues with the way I've been doing it. Here are 2 examples.

      1. Dealing with Pandas dataframes. I'm doing some reconciliation between a couple of different data sources, and I do not have a primary key to work with. I have some very specific matching criteria to determine a match (5 columns specifically: customer, date, contract, product, quantity). The matching process is all built within Python. Is there a good way to do the database commits with updates/inserts en masse vs. line by line? I've looked into upserts (inserts with a clause to update existing data), but pretty much all the examples I've seen rely on primary keys, which I don't have since I'm matching on those 5 columns. (A rough sketch of one approach follows this list.)

      2. Dealing with JSON files which have multiple layers of related data. My database is built in such a way that I have a table for header information, one for line-level detail, and a third with specific references assigned to the line-level detail. As with a lot of transactional databases, there can be multiple references per line and multiple lines per header. I'm currently looping through the JSON file, starting with the header information to create its primary key, then going to the line-level detail to create a primary key for each line (which also carries the foreign key for the header), and doing the same for the reference data. Before inserting, I do a lookup to see if the data already exists, then update if it does or insert a new record if it doesn't. This works fine, but it's slow: several seconds for maybe 100 inserts in total. While that's not a big deal since it's a low volume of sales, I'd rather learn best practice and do this properly with commits/transactions, instead of inserting and updating each record individually, and with the ability to roll back should an error occur.
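
      A rough sketch of one way to handle the first case, using the built-in sqlite3 module: a UNIQUE index over the five matching columns stands in for the missing primary key so INSERT ... ON CONFLICT works, and executemany() inside a single transaction replaces the row-by-row lookup/insert/update. All table and column names are guesses.

      ```python
      # Upsert without a conventional primary key: a UNIQUE constraint over the
      # five matching columns acts as the conflict target. Names are hypothetical.
      import sqlite3

      conn = sqlite3.connect("recon.db")
      conn.execute("""
          CREATE TABLE IF NOT EXISTS matches (
              customer TEXT, date TEXT, contract TEXT, product TEXT, quantity REAL,
              amount REAL,
              UNIQUE (customer, date, contract, product, quantity)
          )
      """)

      rows = [
          ("ACME", "2024-01-05", "C-100", "Widget", 10, 99.50),
          ("ACME", "2024-01-06", "C-100", "Widget", 4, 41.00),
      ]

      with conn:  # one transaction: commits on success, rolls back on error
          conn.executemany(
              """
              INSERT INTO matches (customer, date, contract, product, quantity, amount)
              VALUES (?, ?, ?, ?, ?, ?)
              ON CONFLICT (customer, date, contract, product, quantity)
              DO UPDATE SET amount = excluded.amount
              """,
              rows,
          )
      ```

      The same with-conn transaction pattern covers the JSON case too: insert the header row, take cursor.lastrowid as the foreign key for its lines, queue everything up, and let the block commit it all at once or roll back on error.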

      11 votes
    5. I have a dictionary in sqlite3 and I want to convert it to something usable by StarDict. Is that possible?

      I don't know if my question makes any sense, and it's okay to say "No, sorry, it doesn't work like that".

      I have a (pirated) dictionary that has the words and definitions in an sqlite3 database format.

      I want to convert this into something that can be used by either Fora, GoldenDict, or StarDict.

      Fora can use StarDict, DSL, XDXF, Dictd, and TSV/Plain dictionaries.

      GoldenDict can use Babylon .BGL, StarDict .ifo/.dict/.idx/.syn, Dictd .index/.dict(.dz), and ABBYY Lingvo .dsl source files.

      I've found lots of software that goes the other way - it'll take a dictionary and dump it into an sqlite database. But none of it goes the other way around.

      Is what I'm asking for coherent, does it make any sense?

      (I'd prefer Windows or FreeBSD, but at this point I'd install Linux to get it done.)
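
      If the schema really is just words and definitions, one possible path is to dump the database to a TSV/Plain dictionary (which Fora can read, per the list above) and then hand that file to a converter such as PyGlossary for StarDict; I haven't verified that whole chain. The table and column names below are guesses, so inspect the real schema first with `.schema` in the sqlite3 shell.

      ```python
      # Dump an sqlite3 dictionary to a tab-separated word/definition file.
      # "entries", "word", and "definition" are guessed names; adjust to the
      # actual schema (check it with: sqlite3 dict.db ".schema").
      import csv
      import sqlite3

      conn = sqlite3.connect("dict.db")
      with open("dictionary.tsv", "w", newline="", encoding="utf-8") as out:
          writer = csv.writer(out, delimiter="\t")
          for word, definition in conn.execute("SELECT word, definition FROM entries"):
              writer.writerow([word, definition])
      ```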

      9 votes
    6. Tell me your worst experience with database performance (as a developer)

      I’d like your help – and your stories!

      I’m working on an article with a tentative title of “Tales of the Crypt: Horror stories (about your past) where database performance caused a real problem.” It’s meant to be schadenfreude nostalgia, about your late nights coping with a performance issue (with, hopefully, a happy ending of “…and this is what we did to fix it”).

      So, what happened? Tell me about it.

      I do want to quote you, but we can be oblique about the attribution – especially because sometimes these stories are from a previous employer and do not represent any current affiliation. But I do want the verisimilitude that demonstrates that these tales-of-woe come from real people. As a result, I’m fine with writing, “Kim’s first job was as a mainframe programmer at a hotel chain, where database transactions required tape changes. ‘Yada yada story,’ says Kim, who now is CIO of a Midwest insurance firm.” Real person, but you don’t need to worry about getting anyone to approve your words. (Though if you’re happy with full name, company, and role, I’m even happier; send in a private message if you prefer.)

      I used an ancient example above, but I’m hoping for more recent database performance stories. Ideally some of the “here’s how we fixed it” become practical suggestions for developers who are enduring such a situation today.

      8 votes
    7. Interest in a 0-downtime managed Postgres migration tool?

      I recently discovered how much of a hassle it is to migrate off of Heroku Postgres. Both to keep the user from screwing themselves and as a vendor lock-in mechanism, there's no possibility of running an external replica for your Heroku Postgres database. Unless I'm missing something (after a bunch of Googling), it seems like there is a market gap for a tool that allows for a seamless migration away from a Postgres DB that won't allow replicas.

      I imagine that, given some constraints on what queries you allow on your database, you could proxy connections to a managed Postgres database while feeding them simultaneously to a new database. Starting from a snapshot of the old database one could catch the new database up to a live state of the old database and then swap over all connections to the new one.

      Does such a proxy already exist? I'd love to know, as I could use it. If not, it might be fun to build and a good side project.
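
      For what it's worth, here's a very rough sketch of the dual-write idea described above, using psycopg; the connection strings are placeholders, and a real tool would still need the snapshot/catch-up and cutover steps, plus transaction and failure handling.

      ```python
      # Dual-write sketch: every statement runs against the current (Heroku)
      # database, which stays the source of truth, and is mirrored to the new one.
      # Connection strings are placeholders; error handling is omitted.
      import psycopg

      old = psycopg.connect("postgresql://old-heroku-host/app")
      new = psycopg.connect("postgresql://new-host/app")

      def execute(sql: str, params: tuple = ()) -> list | None:
          cur = old.execute(sql, params)   # old DB remains authoritative for reads
          new.execute(sql, params)         # mirror the same statement to the target
          old.commit()
          new.commit()
          return cur.fetchall() if cur.description else None
      ```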

      4 votes
    8. When should you really use a NoSQL database?

      I've always used Postgres. For a medium-sized app that I work on right now, it's running great. Were I to seriously need more throughput, I'd either shard (no small task, I know) or use CockroachDB (which, to my understanding, is basically Postgres with built-in sharding and no extension support). Throwing away relationships, constraints, unique compound indices, and all of the tools I love that Postgres provides just to get schemaless JSON with high write throughput out of the box doesn't sound like a good deal. But so many people have made the decision to go NoSQL that there must be something I'm missing.

      8 votes
    9. What are the main considerations/downsides to weigh before adding an index to a column?

      I was asked this question in an interview and I said the downsides would be extra disk space used to store it and more disk writes from the database to maintain the index on updates. I asked a friend and he said the downsides are space used in memory to store the index as well as the fact that an index could take days to build on a large dataset.
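
      A quick, unscientific way to see the write-overhead point in action is to time the same bulk insert with and without a secondary index; this uses sqlite3 in memory, and the numbers will vary by machine and engine.

      ```python
      # Tiny benchmark: bulk inserts into the same table with and without a
      # secondary index, to illustrate the extra write work an index causes.
      import sqlite3
      import time

      def timed_inserts(with_index: bool) -> float:
          conn = sqlite3.connect(":memory:")
          conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, email TEXT)")
          if with_index:
              conn.execute("CREATE INDEX idx_email ON t(email)")
          rows = [(i, f"user{i}@example.com") for i in range(200_000)]
          start = time.perf_counter()
          with conn:  # single transaction
              conn.executemany("INSERT INTO t (id, email) VALUES (?, ?)", rows)
          return time.perf_counter() - start

      print(f"without index: {timed_inserts(False):.3f}s")
      print(f"with index:    {timed_inserts(True):.3f}s")
      ```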

      What does everyone here think about this question?

      8 votes