• Activity
  • Votes
  • Comments
  • New
  • All activity
  • Showing only topics with the tag "sql". Back to normal view
    1. Overwhelmed with the realm of data exploration (datalakes, AI, plus some c-level pressure)

      Hi all, I have been tasked with the gargantuan task of understanding and eventually implementing what is effectively turning our database into an all-knowing human. What they want at the base...

      Hi all,

      I have been tasked with the gargantuan task of understanding and eventually implementing what is effectively turning our database into an all-knowing human.

      What they want at the base level is to be able to open up a chat bot or similar and ask "where can I put an ice cream shop in <x region of our portfolio>?" And the result should be able to reason against things like demographics in the area, how many competing ice cream shops are in the area, etc.

      They also want it to be able to read into trends in things like rents, business types, etc., among many other "we have the data, we just don't know how to use it" questions.

      You may be sitting there saying "hire a data analyst" and I agree with you but the ai bug has bitten c-level and they are convinced our competition has advanced systems that can give this insight into their data with a snap of a finger.

      I don't know if this is true but regardless, here I am knee deep in the shit trying to find some kind of solution. My boss thinks we can throw everything into a datalake and connect it to chatgpt and it will just work, but I have my reservations.

      We have one large database that is "relational" (it has keys that other tables reference but they rarely have proper foreign keys, this is a corporate accounting software specifically for commercial real estate and was not our design and is 30 years old at this point) and we have a couple of smaller databases for things like brokerage and some other unrelated things.

      I'm currently of the opinion that a datalake won't do much for us. Maybe I'm wrong but I think cultivating several views that combine our various tables in a sensible way with sensible naming will help to give AI a somewhat decent chance at being successful.

      My first entry point was onelake + powerbi + copilot, but that isn't what they're looking for and it's ridiculously expensive. I then looked at powerbi "q&a" which was closer but still not there. You can do charts and sums and totals etc but you can't ask it introspective questions, it just falls on its face. I don't think it was designed for the type of things my company wants.

      I have since pivoted to retrieval augmented generation (rag-ai) with azure openai and I feel like I'm on the right path but I can't get it to work. I'm falling face first through azure and the tutorials that exist are out of date even though they're 3 months old. It's really frustrating to try to navigate azure and fabric and foundry with no prior understanding. Every time I try something I have to create 6 resource group items, permissions left right and center, blob stores, etc, and in the end it just...doesn't work.

      I think I'm headed in the right direction. I think I need to make some well formatted views/data warehouses, then transform those into vector matrices which azure's openai foundry can take and reason against in addition to the normal LLM that 4o or o1 mini uses

      I tried to do a proof of concept with an exported set of data that I had in a big excel sheet but uploading files as part of your dataset is painful as they get truncated and even if they don't, the vectorizing doesn't seem to work if it's not a PDF or image etc.

      I need to understand whether I'm in the right universe and I need to figure out how to get this implemented without spending 10 grand a month on powerbi and datalakes that don't even work the way they want.

      Anyone got any advice/condolences for me? I've been beating my head against this for days and I'm just overwhelmed by all the buzz words and over promises and terrible "demos" of someone making a pie chart out of 15 records out of the contoso database and calling it revolutionary introspective conversational AI

      I'm just tired 😩

      20 votes
    2. What are your favorite Postgres features?

      You could use any SQL database to create a few tables, insert some data, and do queries, while ignoring anything vendor-specific. But Postgres has a lot of other features and many extensions. What...

      You could use any SQL database to create a few tables, insert some data, and do queries, while ignoring anything vendor-specific. But Postgres has a lot of other features and many extensions. What do you recommend checking out beyond the basics?

      I've used a lot of databases, but it's been many years, so I assume things have changed quite a bit. I skimmed a few PostgreSQL release notes and learned that there's now a MERGE statement that looks pretty handy. (It's standard SQL.) And from Neon's list of supported extensions, the plv8 extension caught my eye. It would let me write stored procedures in JavaScript. Does anyone use that? Do you use stored procedures at all?

      I wonder what Tildes uses?

      (To keep discussion organized, please write about one feature per top-level comment.)

      17 votes
    3. Dealing with databases, inserts, updates, etc. in Python

      Current Library: built in sqlite Current db: sqlite (but will have access to Snowflake soon for option 1 below) Wondering if anyone here has some advise or a good place to learn about dealing with...

      Current Library: built in sqlite
      Current db: sqlite (but will have access to Snowflake soon for option 1 below)

      Wondering if anyone here has some advise or a good place to learn about dealing with databases with Python. I know SQL fairly well for pulling data and simple updates, but running into potential performance issues the way I've been doing it. Here are 2 examples.

      1. Dealing with Pandas dataframes. I'm doing some reconciliation between a couple of different datasources. I do not have a primary key to work with. I have some very specific matching criteria to determine a match (5 columns specifically - customer, date, contract, product, quantity). The matching process is all built within Python. Is there a good way to do the database commits with updates/inserts en masse vs. line by line? I've looked into upsert (or inserts with clause to update with existing data), but pretty much all examples I've seen rely on primary keys (which I don't have since the data has 5 columns I'm matching on).

      2. Dealing with JSON files which have multiple layers of related data. My database is built in such a way that I have a table for header information, line level detail, then third level with specific references assigned to the line level detail. As with a lot of transactional type databases there can be multiple references per line, multiple lines per header. I'm currently looping through the JSON file starting with the header information to create the primary key, then going to the line level detail to create a primary key for the line, but also include the foreign key for the header and also with the reference data. Before inserting I'm doing a lookup to see if the data already exists and then updating if it does or inserting a new record if it doesn't. This works fine, but is slow taking several seconds for maybe 100 inserts in total. While not a big deal since it's for a low volume of sales. I'd rather learn best practice and do this properly with commits/transactions vs inserting an updating each record individually within the ability to rollback should an error occur.

      11 votes
    4. Recommendations to learn SQL?

      I read the AskReddit thread on "What costs less than $100 that changed your life?" (link unavailable since I'm at work) but someone responded "SQL" - jobs just open up that make a ton of money. I...

      I read the AskReddit thread on "What costs less than $100 that changed your life?" (link unavailable since I'm at work) but someone responded "SQL" - jobs just open up that make a ton of money.

      I did a cursory search on Indeed and holy moly they were right -- SQL jobs get easily 2x what I make now. I'm pretty good at Excel and that sort of thinking, so I was thinking I'd try taking a class.

      Do yall have any recommendations as to a good course to take in SQL, preferably online, preferably free or cheap? I'm willing to pay a bit if it'll mean I can make a lot more, but I'm currently not making a ton, haha.

      Any responses welcome, including ideas as to how to break into like, tech-oriented fields as well.

      9 votes