21 votes

Is it possible to easily finetune an LLM for free?

So Google's AI Studio used to have an option to fine-tune Gemini Flash for free by simply uploading a CSV file, but it seems they've removed that option, so I'm looking for something similar. I know models can be fine-tuned on Colab, but the problem with that is it's way too complicated for me; I want something simpler. I think I know enough Python to prepare a dataset, so that shouldn't be a problem.

7 comments

  1. [6]
    SloMoMonday
    Link

    I can't speak specifically to Google's environment, but if you could provide an example of what that CSV contained, it would help to figure out what kind of fine-tuning you're looking for.

    But to put it mildly, easy fine-tuning is one of the many big problems with the current architecture of LLMs, and it touches many, many different topics. There are a lot of parameters in the current retail models, and any potential user needs to nail down exactly what they want these tools to do so they're not wasting compute.

    If you want to go down the rabbit hole that has consumed the last two years of my free time: fine-tuning methods range from simply setting up an interaction template, to reinforcement learning tools, to specializing models and cutting compute requirements with parameter-efficiency strategies, like low-rank adaptation (LoRA) narrowing down the range of parameters your model considers, or quantization simply reducing memory requirements. It's like saying you want to fine-tune your car: is it the fuel injection, the suspension, the brakes, or your seat position?
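
    To make the quantization option concrete, here's a toy sketch of the core trick (my own illustration, not any specific library's method): store weights as int8 plus a single scale factor and reconstruct approximate values on the fly, so each weight takes 1 byte instead of 4.

```python
import numpy as np

# Toy 8-bit quantization: keep int8 values plus one float scale.
# Real schemes (bitsandbytes, GGUF, ...) are more sophisticated,
# but the memory arithmetic is the same.
weights = np.random.randn(1024).astype(np.float32)

scale = np.abs(weights).max() / 127.0          # map the largest weight to 127
q = np.round(weights / scale).astype(np.int8)  # 1 byte per value
dequant = q.astype(np.float32) * scale         # approximate reconstruction

print(q.nbytes, "vs", weights.nbytes)          # 1024 vs 4096 bytes
```

    The reconstruction error is bounded by half a quantization step, which is usually fine for inference; fine-tuning on top of quantized weights (QLoRA-style) needs extra care.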

    If you want to fine-tune, my guess is that you have a very specific use case you want to build for. While Google and the other big AI providers have very expensive hardware running massive models, I'd seriously reconsider whether you need all the parameters those models have scraped off the entire internet. A smaller model can run on your own hardware and gets you out of an unhealthy Google dependency. It's also less prone to collapse over longer contexts, and you'll have a far easier time identifying where errors come from.

    12 votes
    1. [5]
      cuteFox
      Link Parent

      The CSV was literally just pairs of questions and answers: one column contains the questions and the other the answers. The primary reason I want to fine-tune is to customise the style of the responses. I don't think I can fine-tune on my GPU with only 6 GB of VRAM, and as I said in my post, I don't want anything complicated, just very simple fine-tuning. I don't even care if they train on my data, as I wouldn't upload anything personal.
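
      For what it's worth, a CSV like that is easy to produce with Python's standard library. A minimal sketch (the "question"/"answer" header names are my guess, not whatever AI Studio actually required):

```python
import csv

# Two-column layout as described: one column of questions, one of answers.
pairs = [
    ("What's the capital of France?", "Paris."),
    ("Who wrote Dune?", "Frank Herbert."),
]

with open("finetune.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["question", "answer"])  # header row
    writer.writerows(pairs)

# Read it back to sanity-check the layout.
with open("finetune.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))
print(rows[0])  # ['question', 'answer']
```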

      1 vote
      1. [3]
        creesch
        Link Parent

        I want to finetune ... to customise the style of the response

        Is the style of the response so specific that you can't get it by adjusting the system prompt? In AI Studio it's called "system instructions", and most chat interfaces allow you to define one in some way or another.

        You can put all sorts of stuff in there, including style guidance. For example, here are the specific style rules I have in most of my system prompts:

        Follow these rules in all responses unless the user explicitly overrides them:
        - Avoid emojis, filler, and hype. 
        - Avoid persuasive "marketing" language, buzzwords (e.g., "revolutionary," "game-changing"), and exaggerated claims.
        - Don't offer unsolicited praise or compliments.
        - Aim for a neutral, somewhat casual tone. Use contractions (e.g., "it's" instead of "it is"), shorter sentences, and everyday vocabulary. Avoid overly formal or academic language unless necessary.
        - Don't use em dashes (—) in writing.
        

        It takes some experimentation to get right, and different models respond better to different instructions, but it's generally quite effective for broad style adjustments in the responses.

        4 votes
        1. [2]
          cuteFox
          Link Parent

          I know about system prompts, but I found fine-tuning to work much, much better. I'll give the system prompt another try, though.

          2 votes
          1. thumbsupemoji
            Link Parent

            Gemini especially is extremely adaptable and eager to please. Get a burner Gmail account and check out the /r/chatgptjailbreak page; I'm not sure you'd need it to do anything that it won't already do.

            2 votes
      2. SloMoMonday
        Link Parent

        I think I have an idea of what that tool is. A CSV upload does not seem right for system prompts or templates, but it feels like the perfect interface for people who needed to tune their system using datasets pulled from conventional data stores. And even then it wouldn't be a full model tuning, because that's a money and time furnace.

        My guess is that the file uploader was a streamlined LoRA trainer. LoRAs work by letting you nudge an LLM towards specific behaviour using specialized training datasets. In prompting or system variables, you can tell the LLM how much to lean towards the LoRA data. You just need to be careful of overfitting, where the LoRA can only talk about its training data.
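
        The low-rank idea is easy to see in a few lines of NumPy (a bare-bones illustration of the math, not any library's actual API): instead of retraining a big d x d weight matrix W, you train two skinny matrices A and B and add a scaled A @ B on top of the frozen W.

```python
import numpy as np

d, r = 512, 8                     # model dimension, adapter rank
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))   # frozen base weights
A = rng.standard_normal((d, r))   # trainable, tall and thin
B = rng.standard_normal((r, d))   # trainable, short and wide
scale = 0.5                       # how hard to lean on the adapter

W_eff = W + scale * (A @ B)       # weights actually used at inference

# The whole point: far fewer trainable parameters.
print(A.size + B.size, "vs", W.size)   # 8192 vs 262144
```

        With d = 512 and rank 8, the adapter has 8,192 trainable values against 262,144 in the base matrix, which is why LoRA training fits on much smaller GPUs than full fine-tuning.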

        I guess Google did a poor job of communicating how this method of fine-tuning worked, and pulled the feature when it broke things or reserved it for paying power users.

        You can actually get a lot of free libraries from Hugging Face's Transformers ecosystem to make all kinds of specialized versions of your model.

        But the bad news is that I can't think of an easy and free tool to get to the same result you had. There are plenty of free tools and libraries to get there, but they are not very user-friendly, and all the easy tools tend to be expensive. The easiest alternative I can find is a Colab LoRA creation wizard where you just plug and play the files and parameters. And you may need to convert that CSV into a JSON object.
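
        If it helps, that CSV-to-JSON step is a few lines of standard-library Python. A hedged sketch using an in-memory CSV (the "prompt"/"completion" field names are a common convention, not a fixed standard, so check what your Colab notebook actually expects):

```python
import csv, io, json

# One training example per output line (JSONL), built from the Q/A columns.
csv_text = (
    "question,answer\n"
    "What's the capital of France?,Paris.\n"
    "Who wrote Dune?,Frank Herbert.\n"
)

lines = []
for row in csv.DictReader(io.StringIO(csv_text)):
    lines.append(json.dumps({"prompt": row["question"], "completion": row["answer"]}))

jsonl = "\n".join(lines)
print(jsonl)
```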

        Besides that, there are unfortunately not a lot of non-technical ways to do a technical task like fine-tuning that I can think of. Reinforcement training could help, where you basically keep drilling test prompts and rating the outputs. Or you could have a logical, rules-based text checker in the background that gives the LLM answers for definitive questions.

        1 vote
  2. Handshape
    Link

    The good/fast/cheap trifecta is in play. It'll be important to know exactly what you want the difference in your fine-tune to look like.

    Like SloMoMonday, I encourage you to look to a smaller open-weight model as your starting point.

    The kind folks at Unsloth have done some amazing work making the process of running parameter-efficient fine tunes faster and less painful.

    Perhaps most of all, I encourage you to look at easier, cheaper options like RAG before jumping all the way to fine tuning.
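
    To give a feel for what RAG involves: the core loop is just "retrieve a relevant snippet, then put it in the prompt". Here's a deliberately tiny sketch that uses word overlap instead of a real embedding model or vector store (all the documents and names are made up):

```python
docs = [
    "Our refund window is 30 days from delivery.",
    "Support is available Monday to Friday, 9am to 5pm.",
    "Shipping to Europe takes 5 to 7 business days.",
]

def retrieve(query: str, documents: list[str]) -> str:
    """Return the document sharing the most words with the query."""
    q_words = set(query.lower().split())
    return max(documents, key=lambda d: len(q_words & set(d.lower().split())))

question = "what is the refund window"
context = retrieve(question, docs)
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
print(context)  # -> Our refund window is 30 days from delivery.
```

    In practice you'd swap the word-overlap scoring for embedding similarity, but the prompt-assembly step stays the same, and no model weights ever change.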

    7 votes