2 votes

Plain Text - Dylan Beattie - NDC Oslo 2021

Posted April 24, 2022 by onyxleopard

Tags: text, unicode, encoding, databases

https://www.youtube.com/watch?v=_mZBa3sqTrI

Link information

This data is scraped automatically and may be incorrect.

Authors: NDC Conferences
Duration: 54:13
Published: Feb 15 2022

4 comments

[4]
FlippantGod
April 25, 2022

Much time was spent on examples of encoding troubles, but not much on losing data.

From personal experience, Windows 10 in the US locale does not handle certain Japanese encodings in File Explorer, or possibly just in its zip archive viewer. It was probably EUC-JP, but I forget. In any case, some of the files inside appeared not to exist.
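
For anyone curious how a filename can go wrong like this, here is a minimal sketch. It assumes the archive stored names as EUC-JP bytes without the UTF-8 flag (only my guess above, it may well have been something else) and that the reading tool decoded them as CP437, which is what Python's `zipfile` does for unflagged names. The filename and the `recover_zip_name` helper are made up for illustration, and this says nothing about what the Windows viewer actually does internally.

```python
def recover_zip_name(mojibake_name: str, real_encoding: str = "euc_jp") -> str:
    """Undo a CP437 misdecoding of a legacy-encoded ZIP filename.

    Hypothetical helper: real_encoding is an assumption about how the
    archive was created, not something the ZIP file itself records.
    """
    # CP437 maps every byte value to a character, so encoding the
    # garbled name back to CP437 yields the original raw bytes.
    raw = mojibake_name.encode("cp437")
    return raw.decode(real_encoding)


original = "日本語.txt"                # illustrative filename
stored = original.encode("euc_jp")    # bytes as written into the archive
as_seen = stored.decode("cp437")      # what a CP437-assuming tool displays

print(as_seen)                        # garbled, but no bytes were lost
print(recover_zip_name(as_seen))      # round-trips back to the original
```

The round trip only works because nothing was thrown away in the misdecoding. A tool that rejects or skips names it cannot decode would behave differently.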

1 vote
1. [3]
  onyxleopard (OP)
  April 25, 2022
  
  I think the risk of data loss at the source is often minor: even if some software can’t represent the data in a way that humans understand, the actual bits are still there on disk. Over the wire or via some protocol, however, bits sometimes get discarded. I know I’ve seen people befuddled by GitHub nuking carriage returns, or by naive software that belligerently decodes using the wrong encoding and replaces erroneous data with � (U+FFFD).
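
A rough sketch of that last failure mode, assuming EUC-JP bytes being decoded as UTF-8 with Python's `errors="replace"` handler (the sample string is made up). Once the invalid bytes have been replaced with U+FFFD, the original bytes cannot be recovered from the decoded string, even though the raw bytes themselves were fine.

```python
original = "日本語.txt"              # illustrative sample text
raw = original.encode("euc_jp")     # the bytes as they exist on disk

# Decoding with the wrong encoding and errors="replace" substitutes
# U+FFFD for every byte sequence that is not valid UTF-8.
lossy = raw.decode("utf-8", errors="replace")
print(lossy)

# The replacement is one-way: re-encoding gives different bytes.
print(lossy.encode("utf-8") == raw)        # False

# The untouched raw bytes still decode correctly with the right codec.
print(raw.decode("euc_jp") == original)    # True
```

Using `errors="strict"` instead raises `UnicodeDecodeError`, which is noisier but at least makes the problem visible before anything is overwritten.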
  
  2 votes
  1. FlippantGod
    April 25, 2022 (edited April 25, 2022)
    
    In this case a user could easily fail to realize that there were actually more files inside the archive. If they extracted with the broken viewer and then deleted the zip, instead of using 7-Zip or another non-broken tool, they would never have known. But I do understand your point.
    
    edit: now that I think back on it, I may have lost something. I did not discover the issue until one of my old archives opened up empty, which was obviously incorrect. I had to recheck earlier archives to find that some files had been missed, and of course by that time I had already deleted many and didn't bother to check others (calculated guess because I was lazy).
    
    2 votes
  2. FlippantGod
    April 25, 2022
    
    Also, I know this is a discussion of encoding errors, but there is another area of data encoding and reading that I find interesting and am less informed about: CD reading.
    
    I seem to recall some software project focused on properly reading the different encodings of compact discs, I believe music discs specifically. It seemed to be concerned with correctly understanding and actually reading the various parity schemes, and with ways to deal with damaged discs, read errors, skips, hardware reader bugs, etc. It wasn't really something I had been aware of.
    
    2 votes