26 votes

When using the wrong shell causes weird bug reports

5 comments

  1. priw8

    Linking my blog post about a pretty... interesting issue to debug, since I'm curious if anyone here had similar things happen to them. Here's also the full content, in case reading it directly on Tildes is what you prefer:

    Microsoft Pain™: the UTF16 beam

    There are times when you're dealing with a bug so weird and ridiculous that you begin to question reality itself and wonder whether some higher entity is intentionally messing with you. This is one such story.

    This is going to be about a program called thanm. What it does isn't really relevant here; just keep in mind that it can do a few things, and 2 of them are:

    • extract metadata from an ANM archive,
    • recreate the archive from scratch using metadata extracted earlier.

    Generally, freshly extracted metadata should always be valid and usable for archive creation. If it wasn't, that'd be a pretty serious bug. A small caveat here is that the metadata extraction has no feature to specify the output file, and so it just writes to stdout. Though that isn't an issue, since every shell supports output redirection.

    The bug report

    A person I know reported an issue where they weren't able to recreate an archive, and were getting the following error:

    syntax error, unexpected invalid token
    

    A bit of a confusing error - is there an error in the error message that prevents the character from being printed? Or is it some sort of whitespace character?

    Either way, the first thing to figure out was whether the issue came from an edit to the file, or whether the file was already bad immediately after its creation. So, I asked them to run the following 2 commands:

    thanm -l filename.anm > filename.txt    
    thanm -c filename.anm filename.txt
    

    ...and that still gave the same error. That's concerning, since as I said earlier, it'd be a pretty serious bug if an archive can't be recreated. So I went and checked the same file on my machine, and naturally it worked fine. Huh. In this case, maybe it's a bug that's already patched? My next question was about the version the bug happened on, and it turned out that it's the exact same one as the one I have locally. You know, the one that worked without errors :D

    At this point, it's getting pretty confusing. How would the exact same executable yield 2 different results for 2 people?

    Running out of ideas, I asked the person to send me the file created by the first command. And at last, with that one I was able to reproduce the issue. Finally some progress! Somewhat, at least; this still doesn't explain what's actually wrong with this file, as it looks just like the valid one in the text editor. But clearly, I must be missing something, so I checked with diff to be sure:

    Binary files filename.txt and filename-broken.txt differ
    

    Hold up, what do you mean by "binary files"?!

    Somebody touched my text encoding!

    thanm only accepts and outputs ASCII encoded files. And yet, somewhere along the way, something reencoded the file into UTF16, even though the 2 commands were run with nothing between them. So... what? When? How?
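    To make the "binary files" message less mysterious, here is a small sketch of what ASCII text turns into once it is stored as UTF16. The sample line is made up (it is not real thanm output), and the little-endian layout with a byte order mark is an assumption about how the broken file was written. The NUL-byte check is only a rough stand-in for the kind of heuristic diff tools use, not diff's actual code.

```python
import codecs


def as_utf16le_with_bom(text: str) -> bytes:
    """Encode text as little-endian UTF-16 with a leading byte order mark."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def looks_binary(data: bytes) -> bool:
    """Rough heuristic: treat data containing NUL bytes as binary."""
    return b"\x00" in data


# Hypothetical line of text, used purely for illustration.
sample = "entry entry0 {\n"

plain = sample.encode("ascii")
wide = as_utf16le_with_bom(sample)

print(plain)                 # one byte per character
print(wide[:12])             # BOM first, then every ASCII byte followed by a NUL
print(looks_binary(plain))   # no NUL bytes in plain ASCII
print(looks_binary(wide))    # NUL bytes everywhere in the UTF-16 version
```

    Most text editors decode both versions and show identical text, which would explain why the two files looked the same on screen.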

    This was back when I still used Windows, and so was the person getting the error - that's how I knew the exact same executable worked for me. And Microsoft sure loves UTF16, so that's a small clue we have. That was when I realized that the terminal screenshots I was being sent showed a PS prompt, which is what PowerShell displays by default. So, I tried to run the commands with it instead of using cmd or bash, and sure enough - the issue was finally reproduced from scratch.

    When using output redirection to a file, PowerShell for some unfathomable reason decides that it's a wonderful idea to reencode everything as UTF16! What encoding does it even assume the original text is in? No idea, it's probably based on system locale, just to make things extra spicy. Anyway, after figuring out the cause, we had a good laugh about it.
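    For anyone stuck with an already reencoded file, a minimal repair sketch follows. It assumes the file starts with a UTF16 byte order mark, which is what redirection in Windows PowerShell is commonly reported to produce; the function names are hypothetical and nothing here is part of thanm.

```python
import codecs
from pathlib import Path


def decode_redirected_output(raw: bytes) -> str:
    """Decode bytes that may have been reencoded as UTF-16 (detected via BOM)."""
    if raw.startswith(codecs.BOM_UTF16_LE):
        return raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if raw.startswith(codecs.BOM_UTF16_BE):
        return raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    # No BOM: assume the file is already plain ASCII.
    return raw.decode("ascii")


def repair_file(path: str) -> None:
    """Rewrite the file at `path` in place as plain ASCII."""
    target = Path(path)
    text = decode_redirected_output(target.read_bytes())
    # encode("ascii") raises UnicodeEncodeError if non-ASCII characters
    # are present, rather than silently writing something lossy.
    target.write_bytes(text.encode("ascii"))
```

    Calling repair_file("filename.txt") on the broken file should leave a file that an ASCII-only parser can read again, as long as the BOM assumption holds.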

    ...to this day, thanm is still missing an option to write the output to a file instead of stdout. Every once in a while, someone runs into the encoding issue because they use PowerShell. I think the last one I saw was 2 or 3 weeks ago, but realistically not every instance of this happening comes to my attention, so there's bound to be a few cases I did not hear about.
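    Until such an option exists, one way to sidestep shell redirection entirely is a tiny wrapper that hands the program a file opened in binary mode as its stdout, so no shell gets a chance to touch the bytes. This is only a sketch: run_to_file is a hypothetical helper, not something thanm ships.

```python
import subprocess
from typing import List


def run_to_file(cmd: List[str], out_path: str) -> int:
    """Run cmd and write its stdout bytes, unmodified, to out_path."""
    with open(out_path, "wb") as out:
        # The child process writes straight into the file handle,
        # so the calling shell never reencodes the output.
        result = subprocess.run(cmd, stdout=out)
    return result.returncode


# Example call, mirroring the first command from the bug report:
# run_to_file(["thanm", "-l", "filename.anm"], "filename.txt")
```

    Because the file is opened in binary mode, the bytes on disk are exactly what the program emitted, regardless of which shell launched the wrapper.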

    So please, do not implicitly reencode the text output of unrelated programs in your software; save others the pain. Until next time!

    12 votes
  2. [4]
    tauon

    That sounds… horrible as well as typically™️ Microsoft.

    Anyway, after figuring out the cause, we had a good laugh about it.

    That, I know all too well.

    Going on a tangent, but recently I've been struggling with a very, very puzzling issue myself:

    I have a shell function defined in my .zshrc that does some seemingly normal stuff (although it's not working yet – I get some form of error from the non-shell program I call, which I haven't been able to resolve). But after it has ended, my shell session seems to lose all functionality. Like

    zsh: command not found: man
    
    

    levels of gone (most/all shell built-ins seem to survive this cut-off). Long shot, but does anybody here have any ideas, by any chance? I'm unfortunately basically a complete newbie with regards to anything shell scripting…

    1. [3]
      smores

      If you still have shell built-ins but you’ve lost access to e.g. man, it sounds like your PATH environment variable is getting unset/reset. PATH holds a colon-delimited list of directories that your shell searches for executable files when you type a bare executable name, like man.

      I would try running echo $PATH before and after running your function to see whether this is the issue or not. If PATH is being unset, it might be worth running env to see if this is specific to PATH, or if your whole environment is getting blown away for some reason.
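      To illustrate why an emptied PATH makes external commands "disappear" while built-ins keep working, here is a rough sketch of the lookup in Python. The helper names are hypothetical, and shutil.which only approximates what a real shell does.

```python
import os
import shutil
from typing import List, Optional


def split_path(path_value: str) -> List[str]:
    """Split a PATH-style string into its directories, skipping empty entries."""
    return [d for d in path_value.split(os.pathsep) if d]


def find_command(name: str, path_value: str) -> Optional[str]:
    """Search only the directories in path_value for an executable called name."""
    return shutil.which(name, path=path_value)


if __name__ == "__main__":
    current = os.environ.get("PATH", "")
    print(split_path(current))           # the directories that get searched
    print(find_command("man", current))  # a full path if man is installed, else None
    print(find_command("man", ""))       # None: nothing to search with an empty PATH
```

      Built-ins are implemented inside the shell itself, so they never go through this lookup, which fits the symptom of built-ins surviving while man does not.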

      9 votes
      1. [2]
        tauon

        Ding ding ding, we have a winner! I don't know why I didn't see that myself, haha.

        Seriously though, thank you. The amount of in-principle-it’s-easy stuff you can (and should) consider at all times in shell or most programming languages just adds up, seemingly never-ending, and when you only know the basics like me, it can become a bit… too much to remember.

        I have identified the mistake and already fixed at least that part of what I'm working on, and now it’s back to

        Anyway, after figuring out the cause, we had a good laugh about it.

        3 votes
        1. smores

          Yay! I'm glad I was able to help!

          1 vote