That’s fascinating! I learned about a number of encodings once I got on the Internet. On the Mac, a bunch of stuff would render weirdly if it wasn’t straightforward Roman letters. Even accents and stuff would be off if you didn’t know what you were doing.
Old Soviet systems mostly used an encoding named KOI8-R. KOI stands for “Код Обмена Информацией” (Information Interchange Code), 8 means 8-bit, and R means Russian (there’s also a Ukrainian version named KOI8-U, since the Ukrainian alphabet is distinct from the Russian one). That encoding is, to put it politely, mildly insane: it was designed so that stripping the 8th bit from it leaves you with a somewhat readable ASCII transliteration of the Russian text, so Russian letters don’t come in their usual order.
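If anyone wants to see the 8th-bit trick in action, here is a minimal sketch in Python. It relies on the `koi8_r` codec from Python's standard library; the helper name `strip_high_bit` and the sample text are just mine for illustration.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear the 8th (most significant) bit of every byte."""
    return bytes(b & 0x7F for b in data)


# Sample text chosen for illustration: "hello world" in Russian.
text = "привет мир"
koi8_bytes = text.encode("koi8_r")

# With the top bit cleared, each Cyrillic letter lands on a Latin letter
# that sounds roughly similar. Case comes out inverted: lowercase Cyrillic
# maps to uppercase Latin.
print(strip_high_bit(koi8_bytes).decode("ascii"))  # PRIWET MIR
```

Note the "W" standing in for "в": the transliteration is readable-ish rather than pretty.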
That is ingenious, bonkers, and a terrible idea! It’s ingenious if you need to transliterate a lot of stuff to Roman letters, but how common is that for most people? It’s a terrible idea because it makes sorting things extremely awkward, and sorting things is one of the most important functions of a computer. I’m old enough to have run into EBCDIC in actual practice once or twice. It’s terrible (because it’s designed for technology we no longer use), but at least it’s in lexicographical order!
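To make the sorting complaint concrete, here is a rough sketch comparing a naive byte-wise sort of KOI8-R data with a sort by Unicode code point. The function names `koi8_sort` and `codepoint_sort` are made up for this example, and the four letters are simply the first four of the Russian alphabet.

```python
def koi8_sort(words):
    """Sort by raw KOI8-R byte values, as a naive byte-wise sort would."""
    return sorted(words, key=lambda w: w.encode("koi8_r"))


def codepoint_sort(words):
    """Sort by Unicode code point, which follows alphabetical order
    for these basic Russian letters."""
    return sorted(words)


letters = ["в", "г", "а", "б"]

print(codepoint_sort(letters))  # ['а', 'б', 'в', 'г']
print(koi8_sort(letters))       # ['а', 'б', 'г', 'в']
```

In KOI8-R, "в" sits where the Latin "W" would be, so it sorts after "г" (which sits at "G"). Sorting KOI8-R text properly needs a lookup table or a conversion step first.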
It’s ingenious if you need to transliterate a lot of stuff to Roman letters, but how common is that for most people
Presumably it was built that way not for convenient transliteration, but because back in the day lots of systems weren't 8-bit clean. For example, the default data transfer mode for FTP only transfers 7 bits per character. For ASCII content that gives you a nice little transfer speed improvement. You have to switch to "image" mode to get that elusive 8th bit.
So imagine you've accidentally copied a KOI8-R file over FTP's ASCII mode. It should still display in a readable-ish manner. You could just switch to image mode, but there would have been times when you'd have no control over whether the servers you were using kept the 8th bit.