4 votes

The Big Five are word vectors



    An interesting blog post about the Big Five personality traits. Many people don't know how they were derived. (This was new to me.)

    Back in 1933, L. L. Thurstone administered a survey of 60 adjectives to 1,300 people. In his seminal The Vectors of Mind, he reports that “five factors are sufficient” to explain the data. In subsequent decades, such studies, more or less, resulted in five principal components: Agreeableness, Extroversion, Conscientiousness, Neuroticism, and Openness/Intellect.

    But it's possible to choose a different number of components. Apparently, three would have mostly worked:

    Have you ever heard of component over-extraction? It’s not a story the psychologists would tell you. It’s when a researcher extracts too many Principal Components then rotates variance from the earlier, valid PCs on to the later marginal PCs. This is what happened with the Big Five, believe it or not! What is now Agreeableness was once a much more robust and theoretically satisfying ‘Socialization’ factor which was spread out over PCs 3-5 to make Conscientiousness, Neuroticism, and Openness. Rotation can be justified to produce interpretable factors. But if you ever find yourself rotating then arguing about the correct number of factors, check yourself!
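To make "rotating variance onto later PCs" concrete, here is a minimal sketch with two components and four adjectives. The loadings are made-up numbers for illustration, not values from the blog post or any survey, and the 40 degree angle is arbitrary. It shows that an orthogonal rotation leaves the total explained variance unchanged while moving a large share of it from the strong first component to the marginal second one.

```python
import math


def column_variance(loadings):
    """Sum of squared loadings per component (variance each one explains)."""
    n_components = len(loadings[0])
    return [sum(row[j] ** 2 for row in loadings) for j in range(n_components)]


def rotate(loadings, degrees):
    """Orthogonally rotate a two-column loading matrix by the given angle."""
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    return [[a * c - b * s, a * s + b * c] for a, b in loadings]


# Hypothetical loadings: a strong first PC and a marginal second PC.
unrotated = [
    [0.80, 0.10],
    [0.70, -0.20],
    [0.75, 0.15],
    [0.65, -0.10],
]

before = column_variance(unrotated)
after = column_variance(rotate(unrotated, 40))  # arbitrary angle

print("before rotation:", [round(v, 3) for v in before])
print("after rotation: ", [round(v, 3) for v in after])
print("total preserved:", math.isclose(sum(before), sum(after)))
```

After rotation the second component looks far more substantial than it did, even though no new information entered the data. Real analyses use criteria such as varimax over many components, but the bookkeeping is the same.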


    I started my PhD predicting Big Five traits from Facebook statuses. After reading how the personality sausage was made, I realized the project used word vectors (of Facebook statuses) to predict noisy approximations of where individuals lived in Big Five space, which was originally defined by word vectors. It seemed more interesting to cut to the chase and learn something fundamental about personality from word vectors. (Also, the dataset I was using became toxic after Cambridge Analytica.) The rest of my PhD was working to constrain word vectors in order to reproduce the Big Five. This involved using transformers rather than LSA (more on that in future posts).

    The resulting correlations between factors from word vectors (DeBERTa) vs surveys are below. As you can see, there is very close agreement for the first three factors. Where the results diverge, it’s not clear which method is in error. Maybe surveys are right and all the correlations will go to 1 when we get GPT-5. Maybe surveys are just biased and noisy and too many PCs were extracted. Maybe they are measuring different things and we need to refine our interpretation of both. At any rate, it’s not obvious to me that surveys should be considered the gold standard between the two. The Lexical Hypothesis is about language structure, after all, and psychology is the only field that uses surveys to analyze natural language.

    2 votes
    1. skybrian

      From another blog post by the same author:

      A little-known fact in personality science is that the first factors dwarf the minor factors of the Big Five. Consider the eigenvalues of 435 adjectives below. They represent how much variance (personality information) each factor explains in the survey and NLP data.


      The first factor is 8 times greater than the fifth; a disparity papered over with a name like Big Five.

      So what is the first factor (alpha)?

      considerate, peaceful, respectful, kind, courteous, unaggressive, polite, agreeable, cordial, reasonable, pleasant, benevolent, compassionate, understanding, charitable, helpful, accommodating, cooperative, amiable, tolerant, humble, trustful, patient, genial, altruistic, easygoing, modest, unselfish, friendly, down-to-earth, generous, diplomatic, mannerly, relaxed, selfless, sincere, undemanding, warm, tactful, affectionate


      abusive, belligerent, disrespectful, quarrelsome, unkind, rude, bigoted, intolerant, inconsiderate, uncooperative, irritable, vindictive, impolite, prejudiced, antagonistic, ungracious, crabby, egotistical, cruel, surly, uncouth, cranky, scornful, impatient, selfish, egocentric, possessive, greedy, jealous, tactless, combative, callous, conceited, bitter, uncharitable, unsympathetic, unruly, unstable, bullheaded, unfriendly


      It turns out that the first PC of almost any personality survey looks like α. Surveys of drug dependence, psychiatric disorders, or what you think of werewolves all return a suspiciously similar first PC. This has come to be known as the general factor of personality (GFP). If you are looking for commentary beyond “do unto others”, there is an extensive literature.

      And here is the paper he coauthored:

      Deep Lexical Hypothesis: Identifying personality structure in natural language

      Recent advances in natural language processing (NLP) have produced general models that can perform complex tasks such as summarizing long passages and translating across languages. Here, we introduce a method to extract adjective similarities from language models as done with survey-based ratings in traditional psycholexical studies but using millions of times more text in a natural setting. The correlational structure produced through this method is highly similar to that of self- and other-ratings of 435 terms reported by Saucier and Goldberg (1996a). The first three unrotated factors produced using NLP are congruent with those in survey data, with coefficients of 0.89, 0.79, and 0.79. This structure is robust to many modeling decisions: adjective set, including those with 1,710 terms (Goldberg, 1982) and 18,000 terms (Allport & Odbert, 1936); the query used to extract correlations; and language model. Notably, Neuroticism and Openness are only weakly and inconsistently recovered. This is a new source of signal that is closer to the original (semantic) vision of the Lexical Hypothesis. The method can be applied where surveys cannot: in dozens of languages simultaneously, with tens of thousands of items, on historical text, and at extremely large scale for little cost. The code is made public to facilitate reproduction and fast iteration in new directions of research.
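For readers wondering what a congruence of 0.89 measures: a minimal sketch, assuming the coefficients are Tucker's congruence coefficient, the usual measure for comparing factor loadings (the abstract does not name the formula). It is the cosine of the angle between two loading vectors over the same items. The loadings below are hypothetical, not taken from the paper.

```python
import math


def congruence(x, y):
    """Tucker's congruence coefficient between two loading vectors."""
    if len(x) != len(y):
        raise ValueError("loading vectors must cover the same items")
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator


# Hypothetical loadings of five adjectives on one factor, from two sources.
survey_loadings = [0.71, 0.65, -0.58, -0.62, 0.40]
nlp_loadings = [0.66, 0.59, -0.49, -0.70, 0.22]

phi = congruence(survey_loadings, nlp_loadings)
print(f"congruence: {phi:.2f}")
```

Unlike a Pearson correlation, the loadings are not mean-centered first, so the coefficient is sensitive to the sign and overall direction of the factor but not to its scale.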

      And a comment on the approach used:

      I spent more than a year fine-tuning a method to extract personality relationships from RoBERTa, the state-of-the-art model at the time. Soon after, GPT-3 was released, and it performed better right off the shelf. That compute supersedes domain knowledge is a recurring bitter lesson within AI. Compute increases exponentially. If you can get 30% gains over a general ML solution using domain knowledge, you can also just wait for compute to catch up and get the same results using general methods. Finding ways to relate psychology questions to off-the-shelf NLP models is therefore a good way forward. A new model with noticeably better performance is made public every six months or so. Those validating word space are preparing the way for greater intelligences, such as PaLM, GPT-7, and OSCar (Optimal Sentience Cartography), to rain down psychological truths.

      (I've been quoting the more reasonable bits, but reading further, there's a fair bit of wild speculation in other blog articles.)

      1 vote