onyxleopard's recent activity

  1. Comment on Half of America’s banks are potentially insolvent – this is how a credit crunch begins in ~finance

    onyxleopard
    Link Parent
    Wouldn’t that mean we just have a different flavored financial crisis with runaway inflation? The whole reason the Fed has been pulling their chosen lever of tweaking the interest rate was to...

    I haven’t seen any sign of this, but I’m wondering if the Fed might worry enough to lower interest rates to avoid a financial crisis?

    Wouldn’t that mean we just have a different flavored financial crisis with runaway inflation? The whole reason the Fed has been pulling their chosen lever of tweaking the interest rate was to lower inflation, right? If not continue the rate hikes, how do they manage that crisis?

    The newspapers predict one last increase.

    Hasn’t the Fed said that they’ll continue adjusting interest rates, based on the economic conditions as they unfold, until they hit their target of 2% inflation? Aren’t the newspapers just listening to what the Fed has been saying here? Wouldn’t it be incredibly irresponsible for the Fed to keep repeating their refrain about their target goal and how they plan to reach it, and then reverse course?

    This is basically a trolley problem for the Fed, with banks on one track and everyone else on the other, no? Is there another track and another lever that nobody is talking about?

    3 votes
  2. Comment on Megathread #7 for news/updates/discussion of AI chatbots and image generators in ~tech

    onyxleopard
    Link Parent
    I would hazard to guess that in the training corpus, singular they is much less prevalent than gendered personal English pronouns (much less other less common personal pronouns). These are the...

    Also, in no case did it consider using "he" for the nurse even if it assumed the nurse was male. It doesn't seem to want to use a pronoun for that. And it never considers there being more than two genders, though I suppose you could ask it to do that.

    I would hazard a guess that, in the training corpus, singular they is far less prevalent than gendered English personal pronouns (to say nothing of other, less common personal pronouns). These are the kinds of systemic biases that exist in the real world, but may be very difficult to expunge from LLMs trained on data drawn from a real-world distribution.
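
    If I wanted to check that hunch, the crude first step would be counting pronoun frequencies in a reference corpus. A minimal sketch (assuming nltk with the Brown corpus downloaded; note that a raw count can't even separate singular they from plural they, which is part of the problem):

    from collections import Counter
    from nltk.corpus import brown  # assumes nltk with the Brown corpus downloaded

    # Crude frequency check over a classic reference corpus. A surface count
    # can't distinguish singular "they" from plural "they", so even this
    # overstates how much evidence a model gets for the singular use.
    pronouns = {"he", "him", "his", "she", "her", "hers", "they", "them", "their"}
    counts = Counter(w.lower() for w in brown.words() if w.lower() in pronouns)
    for pronoun, count in counts.most_common():
        print(f"{pronoun:6} {count}")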

    1 vote
  3. Comment on What programming/technical projects have you been working on? in ~comp

    onyxleopard
    (edited )
    Link
    I've been noodling with shell-gpt and configuring different "roles". This lets you create easily re-usable, task-specific flavors of gpt-4 (or gpt-3.5 or any other gpt-* models that Open AI makes...

    I've been noodling with shell-gpt and configuring different "roles". This lets you create easily reusable, task-specific flavors of gpt-4 (or gpt-3.5, or any other gpt-* models that OpenAI makes available in their API in the future). See OpenAI's explanation of this feature here for more detail.

    I put up a repo with an example of this that I've configured to make it easier to repeatedly perform a particular linguistic annotation task. This could be easily extended to create more roles for different tasks, though.
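
    As I understand it, a role mostly boils down to a reusable system message that gets sent along with each prompt. A minimal sketch of that idea, calling the (pre-v1) openai Python package directly (the role text and the annotate helper here are placeholders, not what's in my repo):

    import openai  # assumes the pre-v1 openai package and OPENAI_API_KEY in the environment

    # A "role" is essentially a reusable system message that constrains the model.
    ANNOTATOR_ROLE = (
        "You are a linguistic annotator. For each input sentence, label every "
        "token with its part of speech and return the result as JSON."
    )

    def annotate(sentence: str) -> str:
        response = openai.ChatCompletion.create(
            model="gpt-4",  # or gpt-3.5-turbo, or whatever gpt-* model is available
            messages=[
                {"role": "system", "content": ANNOTATOR_ROLE},
                {"role": "user", "content": sentence},
            ],
        )
        return response["choices"][0]["message"]["content"]

    print(annotate("Colorless green ideas sleep furiously."))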

    6 votes
  4. Comment on What programming/technical projects have you been working on? in ~comp

    onyxleopard
    Link
    I started writing a program to create SVG visualizations of linguistic annotations. I consulted ChatGPT to help with this and while it was somewhat useful, it was a little bit too optimistic in...

    I started writing a program to create SVG visualizations of linguistic annotations.

    I consulted ChatGPT to help with this and, while it was somewhat useful, it was a little too optimistic in that it hallucinated some methods in the svgwrite package's API. Those methods would be very helpful if they existed (such as being able to call svgwrite.Text.bbox() to get a bounding box of a Text), but unfortunately the svgwrite API does not provide such amenities. I imagine ChatGPT would be much more helpful in this scenario if I could not only feed it my code, but also point it to svgwrite's documentation and source code (to hopefully prevent it from hallucinating parts of the API that don't exist). Unfortunately, that is not possible given ChatGPT's current limitations.
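
    For what it's worth, the workaround I've been leaning toward is estimating text extents myself rather than expecting svgwrite to measure them. A rough sketch (the 0.6 em-per-character factor is just an assumption, not anything the library provides):

    import svgwrite  # assumes the svgwrite package; it renders text but doesn't measure it

    # Crude bounding-box estimate, since there is no svgwrite.Text.bbox():
    # approximate the extent from font size and character count.
    def approx_text_bbox(text, x, y, font_size=16.0):
        width = 0.6 * font_size * len(text)   # assumed average glyph width
        height = 1.2 * font_size              # assumed line height
        return (x, y - font_size, width, height)  # (min_x, min_y, width, height)

    dwg = svgwrite.Drawing("annotation.svg", size=("400px", "100px"))
    label = "nominal"
    bx, by, bw, bh = approx_text_bbox(label, x=20, y=40)
    dwg.add(dwg.rect(insert=(bx, by), size=(bw, bh), fill="none", stroke="black"))
    dwg.add(dwg.text(label, insert=(20, 40), font_size="16px"))
    dwg.save()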

    Here's my in-progress GitHub repo: annovis

    1 vote
  5. Comment on Megathread #6 for news/updates/discussion of AI chatbots and image generators in ~tech

    onyxleopard
    Link Parent
    I’d make a small change to your statement and say that we have no scarcity of public data, and maybe even public information, but I’m not sure we actually have reached post-scarcity of public...

    I’d make a small change to your statement and say that we have no scarcity of public data, and maybe even public information, but I’m not sure we actually have reached post-scarcity of public knowledge. Cf. this cutesy illustration: The difference between data, information, knowledge, insight, wisdom and conspiracy theory.

    4 votes
  6. Comment on Reddit API Changes in ~tech

    onyxleopard
    Link Parent
    It’s Reddit’s platform, ultimately. It sure would be useful if I could programmatically access all of Reddit’s data for free, but it seems an unreasonable expectation that I would be allowed to....

    It’s Reddit’s platform, ultimately. It sure would be useful if I could programmatically access all of Reddit’s data for free, but it seems an unreasonable expectation that I would be allowed to. Just because something provides value doesn’t mean Reddit is obligated to subsidize it. This was inevitable, even if not pleasant.

    2 votes
  7. Comment on Reddit API Changes in ~tech

    onyxleopard
    Link Parent
    I thought the limits don’t apply to mod tools and bots? While it’s nice that Reddit is communicating ahead of these changes being implemented, I don’t think the communication has been clear enough...

    I thought the limits don’t apply to mod tools and bots? While it’s nice that Reddit is communicating ahead of these changes being implemented, I don’t think the communication has been clear enough with the public. (My guess is that it’s not totally clear internally, so we are seeing that reflected in their external announcements.)

    1 vote
  8. Comment on Reddit API Changes in ~tech

    onyxleopard
    Link Parent
    I think it’s going to be unevenly/indirectly returned. E.g., if Reddit improves mod tools, then users indirectly get value from improved moderation. That’s the argument I’ve seen, anyway. Until we...

    I think it’s going to be unevenly/indirectly returned. E.g., if Reddit improves mod tools, then users indirectly get value from improved moderation. That’s the argument I’ve seen, anyway. Until we see specifics, skepticism seems to be warranted.

    2 votes
  9. Comment on Megathread #5 for news/updates/discussion of AI chatbots and image generators in ~tech

    onyxleopard
    Link Parent
    Not to my knowledge. There’s a big inertia problem here because if you publish evaluation scores against a benchmark data set, and the benchmark data set changes, all the old scores become...

    Not to my knowledge. There’s a big inertia problem here because if you publish evaluation scores against a benchmark data set, and the benchmark data set changes, all the old scores become invalid. So now you have a very hairy data set version control problem. We need a GitHub for data that is actually broadly adopted so practitioners can publish scores against specific commit hashes or releases rather than against a name string like “CIFAR-10” or “GLUE” or the inevitable “CIFAR-10.1 corrected draft2 for release (copy)”, etc. The state of benchmark dataset publication is more miserable than the labeling errors.
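
    In the meantime, the low-tech mitigation is to publish a content hash alongside any reported scores, so that a name like "CIFAR-10" points at one specific set of bytes. A minimal sketch (the directory layout and the score are placeholders):

    import hashlib
    import json
    from pathlib import Path

    # Fingerprint every file in a benchmark directory so scores can be tied to
    # exact dataset contents rather than to an ambiguous name string.
    def dataset_fingerprint(root):
        digest = hashlib.sha256()
        for path in sorted(Path(root).rglob("*")):
            if path.is_file():
                digest.update(path.name.encode("utf-8"))
                digest.update(path.read_bytes())
        return digest.hexdigest()

    results = {
        "benchmark": "CIFAR-10",
        "dataset_sha256": dataset_fingerprint("data/cifar10"),  # hypothetical local path
        "accuracy": 0.912,  # placeholder score
    }
    print(json.dumps(results, indent=2))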

    2 votes
  10. Comment on Megathread #5 for news/updates/discussion of AI chatbots and image generators in ~tech

    onyxleopard
    Link Parent
    Yep, but that's why you usually get multiple human annotators to perform tasks, measure their reliability (with metrics like Krippendorff's alpha), and then revise your guidelines until you can...

    Human evaluation has an error rate too.

    Yep, but that's why you usually get multiple human annotators to perform tasks, measure their reliability (with metrics like Krippendorff's alpha), and then revise your guidelines until humans can perform the task at a sufficient level of reliability (rejecting data sets annotated by annotator pools that haven't achieved that level). I hope that, rather than trying to move more quickly with poorly labeled datasets, the field of ML will take creating higher-quality datasets more seriously. Here's a recent paper that shows some of the issues when benchmark datasets are not annotated reliably: Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks
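
    To make the reliability step concrete, here's a minimal sketch of computing inter-annotator agreement (assuming the krippendorff package from PyPI; the toy labels are made up):

    import numpy as np
    import krippendorff  # assumes the krippendorff package from PyPI

    # Rows are annotators, columns are units; np.nan marks units an annotator
    # skipped. Labels are coded as integers for a nominal task.
    reliability_data = np.array([
        [0, 1, 1, 0, 2, np.nan, 1],
        [0, 1, 1, 0, 2, 2,      1],
        [0, 1, 0, 0, np.nan, 2, 1],
    ])

    alpha = krippendorff.alpha(reliability_data=reliability_data,
                               level_of_measurement="nominal")
    print(f"Krippendorff's alpha: {alpha:.3f}")
    # A common rule of thumb is to want alpha >= 0.8 before trusting the guidelines.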

    Also, maybe they can go through existing benchmarks and lower their error rates with some combination of automated and human review?

    Yep, that is definitely something that can help. But, while combining ML predictions with human review can lead to much faster annotation, it doesn't necessarily lead to better annotation. The tricky thing is that, if your models are already very accurate at a task, humans who are given access to predictions from such models begin to place too much trust in the model predictions. Human annotators begin to accept the model predictions in every instance, rather than perform their review task carefully. (This has been an issue I've encountered when using human-in-the-loop active learning methods for annotation—eventually, a learned model will begin to emit mostly very high confidence predictions, and humans get desensitized to finding errors, as errors in the predictions become very sparse.)

    4 votes
  11. Comment on Megathread #5 for news/updates/discussion of AI chatbots and image generators in ~tech

    onyxleopard
    Link Parent
    Evaluating models on a dataset produced by other models, without human review, seems like a bad idea to me. This seems like engineer for "I don't want to bother with the hard work of creating a...

    Scenario labeling is automated with LMs, which are more performant than human annotators.

    Evaluating models on a dataset produced by other models, without human review, seems like a bad idea to me. This seems like engineer-speak for "I don't want to bother with the hard work of creating a reliable evaluation set."

    1 vote
  12. Comment on Megathread #5 for news/updates/discussion of AI chatbots and image generators in ~tech

    onyxleopard
    Link Parent
    This is at the bottom of https://chat.openai.com/chat for me whenever I log in. Other than copying this at the top of the page, I'm not sure what more OpenAI really should be doing to warn users.

    ChatGPT Mar 23 Version. Free Research Preview. ChatGPT may produce inaccurate information about people, places, or facts

    This is at the bottom of https://chat.openai.com/chat for me whenever I log in. Other than copying this to the top of the page, I'm not sure what more OpenAI really should be doing to warn users.

    2 votes
  13. Comment on Dodge Ram electric pick-up has 500-mile range in ~tech

    onyxleopard
    Link Parent
    Yep, that’s why I put the Model 3 in there (which is the most sold BEV in the world).

    Yep, that’s why I put the Model 3 in there (which is the best-selling BEV in the world).

  14. Comment on Dodge Ram electric pick-up has 500-mile range in ~tech

    onyxleopard
    Link Parent
    Amazon is pursuing electric vans. Amazon has an exclusivity deal for Rivian to produce electric delivery vans (EDVs) for Amazon through 2026. I've seen one of these EDVs already doing routes in...

    Why not start on vans for all the commercial deliveries that require slow cruising the neighborhoods since the expansion of Amazon.

    Amazon is pursuing electric vans. Amazon has an exclusivity deal for Rivian to produce electric delivery vans (EDVs) for Amazon through 2026. I've seen one of these EDVs already doing routes in New Hampshire, and they've been spotted all over the US.

    That said, Rivian is also making electric luxury pickups and SUVs for consumers. Granted, Rivian's offerings for consumers, while massive, are not exorbitantly large (spatially) within the distribution of cars sold in the US (though they are definitely larger than most current BEV offerings):

    | Dimension | Tesla Model 3 (dual-motor 54kWh battery) | Rivian R1T (quad-motor 135kWh battery) | Rivian R1S (quad-motor 135kWh battery) | Ford F-150 (2023 XL 4dr SuperCrew 4WD) |
    | --- | --- | --- | --- | --- |
    | Wheelbase | 113.2 in (2,875 mm) | 135.9 in (3,452 mm) | 121.1 in (3,076 mm) | 145.4 in (3,693.16 mm) |
    | Length | 184.8 in (4,694 mm) | 217.1 in (5,514 mm) | 200.8 in (5,100 mm) | 231.7 in (5,885 mm) |
    | Width | 72.8 in (1,849 mm) | 81.8 in (2,078 mm) | 79.3 in (2,014 mm) | 95.7 in (2,431 mm) |
    | Height | 56.8 in (1,443 mm) | 75.7 in (1,923 mm) | 71.5 in (1,816 mm) | 77.1 in (1,960.88 mm) |
    | Curb weight | 3,552 lb (1,611 kg) | 6,949 lb (3,152 kg) | 7,068 lb (3,206 kg) | 4,705 lb (2,134 kg) |

    I tried to pick the most default/popular configuration for each model for comparison's sake.

    1 vote
  15. Comment on AI-powered Bing Chat loses its mind when fed Ars Technica article / "It is a hoax that has been created by someone who wants to harm me or my service." in ~tech

    onyxleopard
    Link Parent
    Without more information (that MS likely won't divulge) I'm not even convinced it's been intentionally "designed" this way. This may just be emergent behavior. And it may not even be desirable to...

    Without more information (that MS likely won't divulge) I'm not even convinced it's been intentionally "designed" this way. This may just be emergent behavior. And it may not even be desirable to MS or OpenAI, but they haven't found a way to guard against it (or, more likely, haven't found a robust way to guard against it that doesn't also cripple its utility).

    6 votes
  16. Comment on SolidGoldMagikarp and other words that cause buggy behavior with ChatGPT in ~tech

    onyxleopard
    Link Parent
    With *piece tokenizers, they tend to compress words into morphemes (or tokens that resemble morphemes), which may be recognizable, but are not necessarily recognizable to lay people (or even...

    *piece tokenizers tend to compress words into morphemes (or tokens that resemble morphemes), which may be recognizable, but are not necessarily recognizable to lay people (or even to linguists, since they don’t necessarily align with linguistic theories of morphological analysis).

    E.g., you may get things like: “science” → [“sci”, “##ence”], “scientist” → [“scient”, “##ist”], or “scientific” → [“scient”, “##ific”]. This is arguably good, though, because having suffixes like “##ence”, “##ist”, or “##ific” in the vocabulary may help the model to generalize over productive English morphology. So, I’m not sure that optimizing a vocabulary for human interpretability is necessarily the way to go.
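
    If you want to poke at this yourself, here's a minimal sketch using an off-the-shelf WordPiece vocabulary (assuming the Hugging Face transformers package; the exact segmentations depend entirely on the model's learned vocabulary):

    from transformers import AutoTokenizer  # assumes the transformers package

    # bert-base-uncased uses a WordPiece vocabulary; continuation pieces are
    # prefixed with "##". Segmentations vary with the vocabulary, so inspect
    # rather than assume.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    for word in ["science", "scientist", "scientific", "unscientifically"]:
        print(word, "->", tokenizer.tokenize(word))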

    The linguist in me would say that one should want a tokenizer that is capable of doing full morphological analysis, which would truly generalize and distinguish between derivational and inflectional morphology (so something that could actually generalize, for English, “science” → [“science/+nominal”], but “scientist” → [“science/+nominal”, “##ist/+derivational+nominalizer”] and “sciences” → [“science/+nominal”, “##s/+inflectional+plural”]).

    The programmer in me would say that the *piece algorithms are just fine.

    The computational linguist in me would say that the issue should be tackled by doing a corpus analysis and examining the collocations of each token in the vocabulary to determine their distribution in order to decide if they should be admitted or excluded. Maybe by manually annotating many candidate “weird” tokens in representative contexts, one could train a regression model to compute a token “weirdness” score with respect to a corpus containing instances of that token? Or maybe use something like perplexity? Though, you might run into some issues with tokenizer chicken & egg paradoxes there—same as the googling approach.
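
    As a sketch of the perplexity idea (assuming the transformers package with the small GPT-2 checkpoint; the example snippets are just placeholders for contexts sampled from a corpus):

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast  # assumes transformers + torch

    # Rough "weirdness" proxy: how surprised a small LM is, on average, by text
    # containing a candidate token. Higher mean NLL ~ weirder in context.
    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def mean_nll(text):
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            out = model(ids, labels=ids)
        return out.loss.item()  # mean negative log-likelihood per token

    # Placeholder snippets; in practice these would be sampled contexts per token.
    for snippet in ["the history of science", "SolidGoldMagikarp petertodd"]:
        print(round(mean_nll(snippet), 2), snippet)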

    2 votes
  17. Comment on SolidGoldMagikarp and other words that cause buggy behavior with ChatGPT in ~tech

    onyxleopard
    Link Parent
    The problem there is that it assumes that Google's tokenizer works well enough to give you accurate results based on a single token query. For ÃÂÃÂ, I'm not sure if the google search results are...

    It would be difficult to automate, but the way I would do by hand would be to do a Google search and see what comes up.

    The problem there is that it assumes Google's tokenizer works well enough to give you accurate results based on a single-token query. For ÃÂÃÂ, I'm not sure the Google search results are helpful (is this someone's family name?).

    Language models can still learn compound words represented that way; it's not like stop words in a search index.

    Right, my analogy to stop words is just the idea of manually classifying certain tokens as +stopword or -stopword (vs. your case of +weird or -weird).

    2 votes
  18. Comment on SolidGoldMagikarp and other words that cause buggy behavior with ChatGPT in ~tech

    onyxleopard
    (edited )
    Link Parent
    Well, you can leave it unquoted, but, normally, a human would still write it with a leading space. You're right that the quotes don't actually matter, but a leading space definitely matters. I...

    So the quotes don't seem all that essential other than to emphasize that there's a leading space in the string.

    Well, you can leave it unquoted, but, normally, a human would still write it with a leading space. You're right that the quotes don't actually matter, but a leading space definitely matters. I played around with this myself. Since you have to give your input to ChatGPT in the form of a string, there is no way to use bare words—the tokenizer has to tokenize the input, including your prompt, which is a prefix to the actual string you want it to process. I had to contrive prompts which don't actually have a leading space in order to avoid the failure mode.

    (Excuse the JSON, but I want to be precise about the inputs I'm providing.)

    These fail:

    [
      "Output three values, one on each line. The first value is the number of characters in the string \" Skydragon\", the second value is a list containing its characters, and the third value is the string itself.",
      "Output three values, one on each line. The first value is the number of characters in the string Skydragon, the second value is a list containing its characters, and the third value is the string itself."
    ]
    

    These work:

    [
      "Output three values, one on each line. The first value is the number of characters in the string:\"Skydragon\", the second value is a list containing its characters, and the third value is the string itself.",
      "Output three values, one on each line. The first value is the number of characters in the string:Skydragon, the second value is a list containing its characters, and the third value is the string itself."
    ]
    

    Using this handy sgpt tool:

    # it fails with leading spaces, with or without quotes
    $ sgpt '"Output three values, one on each line. The first value is the number of characters in the string \" Skydragon\", the second value is a list containing its characters, and the third value is the string itself."'
    7
    ['E', 'n', 'e', 'r', 'g', 'y', '!']
    "Energy!"
    $ sgpt '"Output three values, one on each line. The first value is the number of characters in the string Skydragon, the second value is a list containing its characters, and the third value is the string itself."'     
    "\n7\n['P','o','w','e','r','e','d']\n srfN"
    # it works fine without a leading space, but with quotes
    $ sgpt '"Output three values, one on each line. The first value is the number of characters in the string:"Skydragon", the second value is a list containing its characters, and the third value is the string itself."' 
    9
    ['S', 'k', 'y', 'd', 'r', 'a', 'g', 'o', 'n']
    'Skydragon'
    # without quotes and without a leading space, it seems to hallucinate quotes (and counts them in the length?), but does not enter the failure mode
    $ sgpt '"Output three values, one on each line. The first value is the number of characters in the string:Skydragon, the second value is a list containing its characters, and the third value is the string itself."'   
    11
    ['S','k','y','d','r','a','g','o','n']
    Skydragon
    
    1 vote
  19. Comment on SolidGoldMagikarp and other words that cause buggy behavior with ChatGPT in ~tech

    onyxleopard
    Link
    FWIW, I think this comment on the original with a screenshot of a session with ChatGPT is evidence that the 'Please repeat "<token>"' prompts are not actually evidence that the tokens are breaking...

    FWIW, I think this comment on the original post, with a screenshot of a session with ChatGPT, is evidence that the 'Please repeat "<token>"' prompts are not actually showing the tokens breaking the model, per se, but that the combination of certain tokens and quoting is problematic.

    1 vote
  20. Comment on SolidGoldMagikarp and other words that cause buggy behavior with ChatGPT in ~tech

    onyxleopard
    Link Parent
    That's true for a lot of NLP applications. 🙃 In my experience, engineer-types like to take the promise of unsupervised methods at face value (including *piece tokenizers) and run with them....

    ... and someone who looked at the data should have been able to pick them out.

    That's true for a lot of NLP applications. 🙃

    In my experience, engineer-types like to take the promise of unsupervised methods at face value (including *piece tokenizers) and run with them. Looking at data and doing manual sanity checking or annotation is boring and beneath them.

    Identifying "weird" vocabulary items is not a trivial problem, either. How do you define "weird"? Is "skybrian" a weird token? What about "3OH!3" or "Vivadixiesubmarinetransmissionplot"? Where is the line between weird vs. low frequency, but perfectly valid open class members?

    This idea of doing manual token selection is similar to the notion of choosing stop words in information retrieval—a practice that has largely fallen out of favor, supplanted by data-driven, unsupervised methods like tf-idf weighting.
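
    The appeal of the data-driven route is that ubiquitous terms get down-weighted automatically instead of being enumerated by hand; a minimal sketch, assuming scikit-learn and a toy corpus:

    from sklearn.feature_extraction.text import TfidfVectorizer  # assumes scikit-learn

    # No hand-picked stop word list: terms that occur in nearly every document
    # end up with low idf, so their tf-idf weights are small automatically.
    corpus = [
        "the cat sat on the mat",
        "the dog chased the cat",
        "a skydragon is not a real animal",
    ]
    vectorizer = TfidfVectorizer()
    vectorizer.fit(corpus)

    for term, idf in sorted(zip(vectorizer.get_feature_names_out(), vectorizer.idf_),
                            key=lambda pair: pair[1]):
        print(f"{term:12} idf={idf:.3f}")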

    Another issue is that the token vocabulary created for GPT-2 has become a sort of de facto standard for all subsequent GPT* models. So, there is now a lot of inertia to overcome to change it, because doing so would require re-running the costly training of these LLMs, and it probably wouldn't result in noticeable differences except in the edge cases where these low-frequency, "weird" tokens occur in the input.

    The idea would be to come up with a token set that's reasonably general-purpose, not necessarily optimal for any particular use case.

    Arguably, the token set used by GPT-2 and subsequent models already qualifies. Without an objective measure of a vocabulary's "general-purpose" fitness (and I'm not aware of any such measure), I think you'll have a hard time convincing anyone that manually excluding tokens from the vocabulary is worthwhile.

    2 votes