5 votes

How to build a GPT-3 for science

2 comments

  1. skybrian

    It seems like a good way to make scientific papers based on fictional data, which we have too many of already.

    We need something quite different from this. A good start might be making a text generator that never makes up quotes and always cites documents that actually exist. It would be sort of like a search engine that generates summaries.

    4 votes
  2. patience_limited

    Generative AI for science could help reverse the deceleration of innovation in science by making it easier and cheaper to find new ideas. Such models could also provide data-backed warnings of therapeutic hypotheses that are certain to fail, counterbalancing human bias and avoiding billion-dollar, decades-long blind alleys. Finally, such models could combat the reproducibility crisis by mapping, weighing, and contextualizing research results, providing a score on trustability.

    So why don’t we have a DALL-E or GPT-3 for science? The reason is that although scientific research is the world’s most valuable content, it is also the world’s least accessible and understandable content. I’ll explain what it would take to unlock scientific data at scale to make generative AI for science possible, and how it would transform the way we engage with research.

    To some extent, I'm sceptical that language processing alone will unlock truly novel and reliable insights from the vast corpus of published papers (and certainly not without expensive, extensive manual quality rating on the training data). AlphaFold was a very specialized example of ML working on robustly coded data. But I'll be happy to see the outcome!