9 votes

Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B

2 comments

  1. skybrian

    Here's the abstract:

    This paper introduces the MCT Self-Refine (MCTSr) algorithm, an innovative integration of Large Language Models (LLMs) with Monte Carlo Tree Search (MCTS), designed to enhance performance in complex mathematical reasoning tasks. Addressing the challenges of accuracy and reliability in LLMs, particularly in strategic and mathematical reasoning, MCTSr leverages systematic exploration and heuristic self-refine mechanisms to improve decision-making frameworks within LLMs. The algorithm constructs a Monte Carlo search tree through iterative processes of Selection, self-refine, self-evaluation, and Backpropagation, utilizing an improved Upper Confidence Bound (UCB) formula to optimize the exploration-exploitation balance. Extensive experiments demonstrate MCTSr's efficacy in solving Olympiad-level mathematical problems, significantly improving success rates across multiple datasets, including GSM8K, GSM Hard, MATH, and Olympiad-level benchmarks, including Math Odyssey, AIME, and OlympiadBench. The study advances the application of LLMs in complex reasoning tasks and sets a foundation for future AI integration, enhancing decision-making accuracy and reliability in LLM-driven applications.

    This might be thought of as a sophisticated way of asking the LLM to repeatedly make improvements to the answers it came up with so far. The LLM self-evaluates (giving itself a score after writing each answer) and the tree search algorithm picks the next answer to try to improve.
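
    To make that concrete, here's a rough Python sketch of the loop as I understand it. The `llm_answer`, `llm_refine`, and `llm_score` helpers are placeholders for prompting the model, and the selection rule is plain UCT rather than the paper's modified UCB formula, so treat it as an illustration of the idea rather than their actual implementation:

    ```python
    import math

    # Stand-in model calls -- replace with real LLM prompts; the names are
    # illustrative, not the paper's API.
    def llm_answer(problem):
        return "initial attempt"          # naive first answer

    def llm_refine(problem, answer):
        return answer + " (refined)"      # ask the model to improve its own answer

    def llm_score(problem, answer):
        return 5.0                        # model grades its own answer, e.g. 0-10


    class Node:
        def __init__(self, answer, parent=None):
            self.answer, self.parent, self.children = answer, parent, []
            self.visits, self.total_reward = 0, 0.0

        def uct(self, c=1.4):
            # Plain UCT: trade off a node's average self-evaluated score against
            # how rarely it has been visited. Unvisited nodes are tried first.
            if self.visits == 0:
                return float("inf")
            return (self.total_reward / self.visits
                    + c * math.sqrt(math.log(self.parent.visits + 1) / self.visits))


    def mctsr(problem, rollouts=8):
        root = Node(llm_answer(problem))
        root.visits, root.total_reward = 1, llm_score(problem, root.answer)

        for _ in range(rollouts):
            # Selection: walk down the tree picking the highest-UCT child.
            node = root
            while node.children:
                node = max(node.children, key=lambda n: n.uct())

            # Self-refine + self-evaluation: expand with an improved answer, score it.
            child = Node(llm_refine(problem, node.answer), parent=node)
            reward = llm_score(problem, child.answer)
            node.children.append(child)

            # Backpropagation: push the score up to the root.
            while child is not None:
                child.visits += 1
                child.total_reward += reward
                child = child.parent

        # Return the best-scoring answer found anywhere in the tree.
        best, stack = root, [root]
        while stack:
            n = stack.pop()
            stack.extend(n.children)
            if n.visits and n.total_reward / n.visits > best.total_reward / max(best.visits, 1):
                best = n
        return best.answer
    ```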

    They only do eight "rollouts", so that's a tiny tree. By contrast, Monte Carlo Tree Search is normally used with games like chess and Go, where the evaluation functions run much faster and the search trees are enormous. So it's kind of a toy as far as the search goes, but they do get a meaningful improvement out of it. It would have been nice if they had compared it against a less sophisticated baseline, like asking the LLM for eight answers and then choosing the best one.
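
    The baseline I have in mind would look something like this, reusing the same placeholder helpers from the sketch above:

    ```python
    def best_of_n(problem, n=8):
        # Sample n independent answers and keep the one the model scores highest:
        # no tree, no refinement, just pick the best of a batch.
        answers = [llm_answer(problem) for _ in range(n)]
        return max(answers, key=lambda a: llm_score(problem, a))
    ```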

    To use tree search more, LLMs will have to get a lot faster, and they'll be limited by their ability to self-evaluate. It's probably a better fit for domains where something more reliable than an LLM can do the evaluation, like mathematical proofs or program fuzzing.

    This came up because of a speculative blog post about how the future of AI is LLMs combined with tree search.

    7 votes
  2. pete_the_paper_boat

    I wonder if more permanent gains could be made if the output of this process were used as fine-tuning data.

    I presume it wouldn't be too dissimilar from learning from forum discussions, or educational books.
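
    Roughly what I'm picturing, reusing the `mctsr` sketch from the comment above as a stand-in for the search (the JSONL prompt/completion format is just one common convention, not anything from the paper):

    ```python
    import json

    def to_finetune_jsonl(problems, path="mctsr_traces.jsonl"):
        # Turn each problem plus its best searched-and-refined answer into a
        # prompt/completion pair that could feed a standard fine-tuning run.
        with open(path, "w") as f:
            for problem in problems:
                answer = mctsr(problem)  # best answer found by the search loop above
                f.write(json.dumps({"prompt": problem, "completion": answer}) + "\n")
    ```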

    3 votes