Overwhelmed by the realm of data exploration (data lakes, AI, plus some C-level pressure)
Hi all,
I have been handed the gargantuan task of understanding, and eventually implementing, what is effectively turning our database into an all-knowing human.
What they want at the base level is to be able to open up a chatbot or similar and ask "where can I put an ice cream shop in <x region of our portfolio>?" The answer should reason about things like demographics in the area, how many competing ice cream shops are nearby, etc.
They also want it to be able to read into trends in things like rents, business types, etc., among many other "we have the data, we just don't know how to use it" questions.
You may be sitting there saying "hire a data analyst", and I agree with you, but the AI bug has bitten the C-level and they are convinced our competition has advanced systems that can give this kind of insight into their data with a snap of the fingers.
I don't know if this is true, but regardless, here I am knee deep in the shit trying to find some kind of solution. My boss thinks we can throw everything into a data lake, connect it to ChatGPT, and it will just work, but I have my reservations.
We have one large database that is "relational": it has keys that other tables reference, but they rarely have proper foreign keys. It belongs to a corporate accounting package built specifically for commercial real estate, it was not our design, and it is 30 years old at this point. We also have a couple of smaller databases for things like brokerage and some other unrelated things.
I'm currently of the opinion that a data lake won't do much for us. Maybe I'm wrong, but I think curating several views that join our various tables in a sensible way, with sensible naming, will give AI a somewhat decent chance at being successful.
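To make the "views with sensible naming" idea concrete, here's a rough sketch using Python's built-in sqlite3 as a stand-in for our real database. The table names (PROP01, LSE07), columns, and sample rows are all made up for illustration; our actual schema is different and much bigger.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical legacy tables: cryptic names, no declared foreign keys.
    CREATE TABLE PROP01 (PID INTEGER, PNM TEXT, RGN TEXT);
    CREATE TABLE LSE07 (LID INTEGER, PID INTEGER, TNT TEXT, BTYP TEXT, RNT REAL);

    INSERT INTO PROP01 VALUES
        (1, 'Harbor Plaza', 'North'),
        (2, 'Elm Street Center', 'South');
    INSERT INTO LSE07 VALUES
        (10, 1, 'Scoops', 'Ice cream shop', 2500.0),
        (11, 1, 'Book Nook', 'Bookstore', 3100.0),
        (12, 2, 'Gym One', 'Fitness', 5400.0);

    -- The view does the join once and exposes readable column names.
    CREATE VIEW tenant_leases_by_property AS
    SELECT p.PNM  AS property_name,
           p.RGN  AS region,
           l.TNT  AS tenant_name,
           l.BTYP AS business_type,
           l.RNT  AS monthly_rent
    FROM LSE07 AS l
    JOIN PROP01 AS p ON p.PID = l.PID;
""")


def ice_cream_shops_per_region(connection):
    """Count ice cream shop tenants per region using only the readable view."""
    return connection.execute(
        """
        SELECT region, COUNT(*) AS shop_count
        FROM tenant_leases_by_property
        WHERE business_type = 'Ice cream shop'
        GROUP BY region
        ORDER BY region
        """
    ).fetchall()


print(ice_cream_shops_per_region(conn))
```

The point is that anything (a person or a model) writing queries against tenant_leases_by_property has a fighting chance of guessing what the columns mean, which it doesn't against the raw tables.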
My first entry point was OneLake + Power BI + Copilot, but that isn't what they're looking for and it's ridiculously expensive. I then looked at Power BI "Q&A", which was closer but still not there. You can do charts, sums, totals, etc., but you can't ask it introspective questions; it just falls on its face. I don't think it was designed for the type of things my company wants.
I have since pivoted to retrieval-augmented generation (RAG) with Azure OpenAI and I feel like I'm on the right path, but I can't get it to work. I'm falling face first through Azure, and the tutorials that exist are out of date even though they're only 3 months old. It's really frustrating to try to navigate Azure, Fabric, and Foundry with no prior understanding. Every time I try something I have to create 6 resources in a resource group, permissions left, right, and center, blob stores, etc., and in the end it just...doesn't work.
I think I'm headed in the right direction. I think I need to make some well-formatted views/data warehouses, then transform those into vector embeddings that Azure's OpenAI/Foundry tooling can retrieve from and reason against, on top of what the base model (GPT-4o or o1-mini) already knows.
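As I understand it, the retrieval half of that boils down to something like the sketch below: turn the question and each document into vectors, rank by similarity, and paste the top matches into the prompt. This is a toy, not what Azure actually does. The embed function here is just a bag-of-words count standing in for a real embedding model, and the documents are made-up examples.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=2):
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, documents, k=2):
    """Assemble the text that would be sent to the chat model."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Made-up documents, e.g. generated from rows of a curated view.
DOCS = [
    "Harbor Plaza in the North region has one ice cream shop tenant paying 2500 per month.",
    "Elm Street Center in the South region has a fitness tenant paying 5400 per month.",
    "Brokerage listings in the South region show two vacant retail units.",
]

print(build_prompt("How many ice cream shop tenants are in the North region?", DOCS, k=1))
```

If that mental model is right, then the model only ever sees the handful of chunks that get retrieved, which is why I'm skeptical that dumping everything into a lake helps with questions that need aggregation across thousands of rows.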
I tried to do a proof of concept with an exported set of data that I had in a big Excel sheet, but uploading files as part of your dataset is painful: they get truncated, and even when they don't, the vectorizing doesn't seem to work if the file isn't a PDF, image, etc.
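One workaround I'm considering is to skip the spreadsheet upload entirely and turn the rows into small plain-text chunks myself before indexing them. A minimal sketch, assuming the sheet is exported to CSV first; the column names and rows below are invented, and the chunk size is arbitrary.

```python
import csv
import io


def rows_to_documents(csv_text, rows_per_chunk=2):
    """Turn CSV rows into small text chunks, one 'col: value' line per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = [
        "; ".join(f"{column}: {value}" for column, value in row.items())
        for row in reader
    ]
    # Group a few rows per chunk so no single document gets too large.
    return [
        "\n".join(lines[i:i + rows_per_chunk])
        for i in range(0, len(lines), rows_per_chunk)
    ]


# Made-up sample standing in for the exported sheet.
SAMPLE = """property_name,region,business_type,monthly_rent
Harbor Plaza,North,Ice cream shop,2500
Harbor Plaza,North,Bookstore,3100
Elm Street Center,South,Fitness,5400
"""

for chunk in rows_to_documents(SAMPLE):
    print(chunk)
    print("---")
```

No idea yet whether the indexer handles plain text chunks like these any better than the raw file, but at least I'd control where things get cut off.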
I need to understand whether I'm in the right universe, and I need to figure out how to get this implemented without spending 10 grand a month on Power BI and data lakes that don't even work the way they want.
Anyone got any advice/condolences for me? I've been beating my head against this for days and I'm just overwhelmed by all the buzzwords, overpromises, and terrible "demos" of someone making a pie chart out of 15 records from the Contoso database and calling it revolutionary introspective conversational AI.
I'm just tired 😩