21 votes

I am fighting back by switching this website from HTML to PDF

86 comments

  1. [43]
    Wes
    There's about a thousand things I could point out about how flawed this is (performance, accessibility, indexability, caching, and so on), but I guess they're willing to make those tradeoffs. Viva la open web, where things like this are possible.

    27 votes
    1. trazac
      Literally all of the arguments for PDFs also apply to markup, other than maybe 'page oriented' but I'm sure there is some trick out there that can make that happen in HTML.

      8 votes
    2. [41]
      onyxleopard
      (edited )
      It's fine to disagree. Still, I found your comment inscrutably dismissive, and I couldn't surmise what the substance of your disagreements actually is.

      Let me play devil's advocate:

      You've hinted at some particular flaws, but without actually explaining them, I can only surmise what your actual beefs are with the premise of the post.

      For instance, you claim that "indexability" is a flaw, yet, the post itself claims:

      PDFs are discoverable. Search engines index them as easily as any other format.

      I know Google and DuckDuckGo index PDFs—just add filetype:pdf to your query. So, what do you mean when you say "indexability" is a flaw? And for your other parenthetical claims, what do you mean by them?

      The post goes into some depth about how "page-oriented" documents are much more accessible than infinite-scroll HTML pages. So, what exactly do you mean when you claim "accessibility" is a flaw for PDFs?

      Edit: In "New Frontiers in PDF Accessibility" there is more content on accessibility features of PDF.

      I don't really know much about performance of PDFs (I'd imagine it varies a lot by PDF reader implementation, much like browser rendering engines), caching, and so on, so I can't really play devil's advocate well there. Given the post itself refutes some of your claims, would you care to back them up?

      4 votes
      1. [39]
        stu2b50
        Not OP, but at one point I wanted to do conversions of PDFs to ePub, and it was then that I discovered the eldritch horror that is extracting text in a reliable way from the PDF spec. It is an absolute mess; if you have a "nice" document, i.e., when you save as PDF in Word or something, it usually works... okay, but it's very easy for a plain-looking PDF to have no simple way to access its text.

        PDF in general is an eldritch abomination of a spec.

        21 votes
        1. [6]
          Octofox
          PDF is a printing format, not a machine readable format. If you use PDF for anything other than storing data in a nice way to be printed, you are using it wrong.

          A website as a PDF is about as misused as it gets.

          17 votes
          1. [5]
            onyxleopard
            I'm not sure where you're getting this idea from. PDFs need never be printed if the consumer just wishes to view or edit them electronically.

            1. [4]
              Octofox
              The PDF format is an adaptation of the format used to print pages. Its primary purpose is to store things in a way that reliably prints. Sure you don't have to print it, but that is what it was designed for. If you just wanted a text document you would be better off with basic HTML or an ePub (which is HTML). PDF will never work well on mobile because it has no support for reflowing text or other elements.

              18 votes
              1. [2]
                Greg
                Seems like ePub is actually a really nice tool for the job in this context - it fulfils a lot of the more specific requirements the author was talking about: sensible subset of HTML without having to create a new standard, single self-contained file that can be hashed and verified, good existing open source tooling, etc. but then it also allows reflowing, user control over styling, and generally much improved accessibility over PDF.

                11 votes
                1. Adys
                  Epub is a wonderful format. Unfortunately browsers don't support it. It's a real shame.

                  6 votes
              2. onyxleopard
                PDF will never work well on mobile because it has no support for reflowing text or other elements.

                Well, my personal experience directly contradicts your assertion. I read PDFs just fine on my phone (assuming they're just reasonably typeset textual content). This is how academic papers are usually still published. ePub can be better than PDF, but so can plain old HTML (for which I don't need a specialized .epub reader software). The whole premise of the radical reaction of the OP is that plain old, self-contained HTML is basically non-existent in this day and age. Hell, just give me UTF-8 encoded plain-text and that would be preferable to most web pages!

                3 votes
        2. [32]
          onyxleopard
          PDF in general is an eldritch abomination of a spec.

          But is it less abominable than the specifications for all the HTML/JS used in an average web page?

          Like, could you reliably convert a website to ePub? (As someone who has scraped content off a lot of web pages, I'd argue it's equally abominable.)

          2 votes
          1. [31]
            stu2b50
            Yes, I would say. HTML by itself is fairly predictable, especially if you limit yourself to strictly formatted HTML. If you try and allow for malformed HTML it gets trickier, but in general it's pretty easy to work with. Well formed HTML is just a tree. It's a thing of beauty compared to PDF parsing.
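
            To illustrate the "just a tree" point, here is a minimal sketch using only Python's standard-library html.parser. The class name TextExtractor, the helper extract_text, and the sample markup are made up for illustration; this is not a general-purpose scraper and makes no attempt to handle malformed input.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from HTML, skipping <script> and <style> contents."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the non-empty text nodes of an HTML string, in document order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical sample document, assumed to be well formed.
sample = (
    "<html><body><h1>Title</h1><p>Hello <b>world</b>.</p>"
    "<script>var x = 1;</script></body></html>"
)
print(extract_text(sample))
```

            With well-formed input the text nodes come out in reading order with almost no code, which is the contrast being drawn with PDF.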

            In terms of JS, the deeper into shadow DOM you get the wackier it gets, but apart from canvas apps, at least in the end it does have to present a DOM to be rendered, so you can use Selenium or something.

            Out of all the document formats that exist today, PDF has to be one of the worst to work with from a programming point of view. The good parts are compatibility and strict layouts, but it's definitely god-awful to do anything with other than view.

            15 votes
            1. joplin
              100% agree. It’s unfortunate because it doesn’t have to be that way. You can create PDFs where the text is easily extractable, but because some designer somewhere felt that the spacing between an “i” and an “n” in Helvetica was 3 twips too generous, we can’t have nice things.

              6 votes
            2. [29]
              onyxleopard
              so you can use selenium or something.

              I've used Selenium a bunch because it is the only sane way to access some content that is loaded dynamically. But you're basically saying we need a full on browser driver just to extract the content from a web page (and we're still dependent on the browser's JS engine etc.). If there were such a PDF viewer driver software then I'm sure it would make scraping content from PDFs similarly easy. (I'd argue even Selenium is not that great—it wouldn't even exist but for the poor souls who need to test their websites on different browsers because implementing a standards-compliant browser is effectively impossible.)

              The good parts is compatibility, and strict layouts, but it's definitely god awful to do anything with than view.

              Right, but that's really all it was ever intended to be—a portable document format.

              1 vote
              1. [28]
                stu2b50
                I mean we're pretty far off topic. Circling back,

                First, I concur with the OP that text indexing on PDFs is no doubt significantly worse than on other document formats, including HTML documents. The format simply is not made for programmatic traversal - not to mention being a complex beast over all the ages of iteration. Any student who has ever CTRL+F'd a PDF textbook can tell you that much.

                Secondly, why is the comparison even between a dynamic website with javascript and PDFs? That is the dichotomy the parent website made, but it's not one that needs to exist in reality. I presume it's a PDF simply for shock value.

                You could, y'know, have an HTML website without JavaScript. Then you have a nice, clean tree-based document format that's easy to parse, and people on mobile can look at it without having to horizontally scroll. That's another way to "protest" against the modern web.

                Or go the way of Gemini and use a markdown based document spec.

                10 votes
                1. [27]
                  onyxleopard
                  You could, y'know, have a HTML website without javascript.

                  I'd love it if more websites were published as such. But, that's not the world we live in.

                  I think PDFs are a bit radical, too, but I'm still skeptical that carefully published PDFs are any worse than carefully published HTML documents. (You can do some pretty sinful stuff with HTML if you either don't know the semantics of the tags or you intentionally defy their intended uses.)

                  Regardless of the document format, for programmatic access, I'd absolutely prefer to work with source content, rather than the published content.

                  1 vote
                  1. [26]
                    stu2b50
                    I'd love it if more websites were published as such. But, that's not the world we live in.

                    That's really orthogonal to the point, though. This thread has a lot of pushback against PDF, and PDF as a "better" format than other easily available, well defined, more modern, easier to work with formats - plain HTML just being an obvious one. Not to mention one more amenable to different screen resolutions - this page is not fun to go on on a phone.

                      Whether or not the modern web is full of plain HTML (clearly it isn't) is completely off on some other island of discussion. Basically, the website creator can just as well protest against the tyranny of JavaScript with plain HTML - it would, in fact, be a much better experience in multiple ways than the PDF document he has now.

                    I'm not skeptical. An XML based document format is so much easier to work with than PDF. Not even close.

                    9 votes
                    1. [10]
                      onyxleopard
                      this page is not fun to go on on a phone.

                      It's totally fine on my iPhone, but I admit that's due to Apple having a really good PDF viewer built in to Safari.

                      An XML based document format is so much easier to work with than PDF.

                      I think it depends on your perspective. If you are receiving a document as a human and all you want to do is consume the contents as the author intended, PDF is basically still the only option. Outside of web pages, you don't see people publishing to .htm files and sharing them (except maybe in email land, but that is another special circle of hell). I don't think you're actually advocating that .docx is superior to PDF, but as you worded it, that would be a claim you're making? If so, I wholeheartedly disagree.

                      2 votes
                      1. [9]
                        stu2b50
                        It's totally fine on my iPhone, but I admit that's due to Apple having a really good PDF viewer built in to Safari.

                        Is it? I also opened it on an iPhone. The problem is the lack of dynamic layout - not a con per se, sometimes the precise layout of a page is important, but clearly suboptimal for displaying a page of text to a phone screen. The text is tiny, and the margins take up too much space - you have to manually zoom in.

                        Outside of web pages, you don't see people publishing to .htm files and sharing them

                        You actually do, though. Perhaps not raw, but there's a whole host of "html/xhtml in a bag" formats. Docx is one of them; ePub is another.
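
                        As a rough sketch of the "in a bag" idea: both .epub and .docx files are ZIP archives, so Python's standard-library zipfile can look inside them. The toy archive below is built in memory purely for illustration; the member name OEBPS/chapter1.xhtml and the placeholder contents are assumptions, and the result is not a valid ePub.

```python
import io
import zipfile


def build_toy_epub():
    """Build a tiny ePub-like ZIP in memory (illustrative only, not a valid ePub)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Real ePubs carry a `mimetype` entry and META-INF/container.xml;
        # the contents written here are placeholders.
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr("META-INF/container.xml", "<container/>")
        # Hypothetical chapter path and markup.
        zf.writestr(
            "OEBPS/chapter1.xhtml",
            "<html><body><p>Chapter one.</p></body></html>",
        )
    return buf.getvalue()


def list_xhtml_members(data):
    """Return the names of the (X)HTML files inside a ZIP given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [n for n in zf.namelist() if n.endswith((".xhtml", ".html"))]


print(list_xhtml_members(build_toy_epub()))
```

                        The same list_xhtml_members call could be pointed at the bytes of a real .epub file; which entries turn up depends entirely on how that book was packaged.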

                        Again, I feel like you're just digressing here. No, I'm not saying HTML is an objectively better format in every way to PDF. I'm saying, as per the topic of the website, it's a vastly superior document format for websites than PDFs. More indexable, simpler to work with programmatically, supports user styling, strict layouts are mostly a con in this medium.

                        I feel like I'm wrangling cats - fundamentally, is PDF a good format for document websites? I'll say no, for the myriad of reasons above. There are many better formats, HTML included.

                        8 votes
                        1. [8]
                          onyxleopard
                          Is it? I also opened it on an iPhone. The problem is the lack of dynamic layout - not a con per se, sometimes the precise layout of a page is important, but clearly suboptimal for displaying a page of text to a phone screen. The text is tiny, and the margins take up too much space - you have to manually zoom in.

                          Pro-tip: Double-tap any paragraph and iOS Safari's PDF viewer will auto-zoom so the text content's width fills the screen. (Double-tap again to zoom back out to 100% zoom level.)

                          is PDF a good format for document websites?

                            From a human reader's perspective, most of the time, I actually do think so, yes. If all you have, content-wise, is text, hyperlinks, and images (which, if you get rid of all the boilerplate and cruft is what most websites' content comprises), I do really think PDF would be great! I agree that the human reader's perspective is not the only perspective to consider, but I think it's a perspective that often gets forgotten in the context of technologists who have competing concerns with the human audience.

                          2 votes
                          1. [7]
                            stu2b50
                            This has circled around a few too many times, so I'll leave this with my thoughts on PDF from a human reader's perspective.

                            I think it's really bad. My first contribution to this discussion was about an experience converting PDFs to ePUBs - there was a reason for that. Apart from some specific niches, I find reading on PDFs to be much inferior to ePUB (which, in the end, is a bag of xhtml files with metadata).

                            With ePubs, I can

                            • Change the fonts

                            • Change the font size

                            • Text reflow is seamless, perfect, does not require anything from the creator

                            • Can change the text color trivially

                            • Can change the background color trivially

                              With PDFs, you are really at the mercy of whoever created the PDF, and what dimensions they designed it for. Certainly PDFs have their niche, when you want things to look strictly as the author intended. But in many cases, for documents of text, hyperlinks, and images, I couldn't be arsed what the author intended - I want to read in a layout that is comfortable to me, on the device I'm currently on.

                              That applies to ePub vs PDF, and that applies to HTML/CommonMark vs PDF for a web format.

                            13 votes
                            1. [6]
                              vord
                              I couldn't be arsed what the author intended - I want to read in a layout that is comfortable to me, on the device I'm currently on.

                              This, so, so much. This is what I despise so much about the style-driven web. That somehow, for text, hyperlinks, and images, the author's typesetting preferences are more important than the user experience.

                              I don't want 100 websites to have 100 styles. I want them all to have a consistent, readable style. Like a book.

                              8 votes
                              1. [2]
                                Cycloneblaze
                                I get where this is coming from, but I really disagree with it. Not on an objective level or anything, I just like that webpages look different to each other and websites each have their own identity. I like the level of customisability that CSS brings. I've enjoyed messing with it to make things that I think look nice (which I realise is kind of a luxury of doing it as a creative pursuit and not, like, to sell something). I like that HTML isn't bound to the shape of an A4 page - or pagination at all. I use reader mode as much as the next person but if every website looked like a book, I'd be very bored.

                                I actually think the application of reading long form text (or, well, anything over a few paragraphs) is a minority of what's on the web, and it's reductionist to say any website can be compressed (metaphorically) into that format. Also, hypertext was specifically designed to include more than just raw text, so I kind of can't believe that people want to retreat from that. Some websites are cluttered messes, but a lot aren't.

                                8 votes
                                1. vord
                                  (edited )
                                  I actually think the application of reading long form text (or, well, anything over a few paragraphs) is a minority of what's on the web

                                  If you step away from Youtube, Imgur, Facebook, and their ilk, you'll find little else. Search engines function because the web is mostly text.

                                  Sure there are pretty colors, and embedded images and videos now, but hour for hour spent there's more text on the web than anything else. English Wikipedia alone, compressed beyond belief, stripped of images, pushes well over 25GB. Of pure text. Here on tildes text-based content generates more conversation (also text) than anything else.

                                  Also, hypertext was specifically designed to include more than just raw text

                                  Not really. It's in the name. It was designed as a way of linking pages together, specifically by turning text into a link to more text. It predates video and inline images.

                                  In the beginning, Google's primary differentiator was that they factored in how different websites, sharing some common terminology, linked to each other. It was a gamechanger for finding relevant content, well at least until it began getting exploited for more ads and malware.

                                  2 votes
                              2. [3]
                                petrichor
                                Out of curiosity, do you use Reader View? If so, does it work well for you?

                                2 votes
                                1. spctrvl
                                  I'm not the parent poster, but I feel similarly, and I love reader view.

                                  3 votes
                                2. vord
                                  I use reader view for pretty much anything over 500 words. Especially on the phone for dark mode.

                                  That's what I want for the majority of the web... raw text that I can make visually appealing for myself, with or without inline images/video.

                                  2 votes
                    2. [15]
                      mrbig
                      That's really orthogonal to the point, though

                      What does the word orthogonal mean in this context?

                      1. [14]
                        stu2b50
                        It means that the discussion item brought up is unrelated.

                        If you're curious on the origin, two vectors are considered orthogonal if they have a dot product of 0. You can see the dot product in a cartesian context as the projection of one vector onto another - so if they have a dot product of 0, that implies that these two vectors share no commonality.

                        Probably easier to see in 2D. These two vectors are parallel - they would have the maximal dot product (of their magnitudes multiplied together)

                        ----->
                        -------->
                        

                        These vectors are orthogonal. They have a dot product of 0.

                        ^
                        |
                        |
                        |
                        --------->
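
                        For anyone who wants to check the arithmetic, here is a small sketch in plain Python. The vectors are made-up examples matching the two pictures above, and dot and norm are hypothetical helper names.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean length (magnitude) of a vector."""
    return math.sqrt(dot(u, u))


# Two parallel vectors pointing right (assumed lengths 5 and 8).
parallel_a = (5.0, 0.0)
parallel_b = (8.0, 0.0)

# One vector pointing up, one pointing right: orthogonal.
up = (0.0, 3.0)
right = (9.0, 0.0)

# Parallel: the dot product equals the product of the magnitudes.
print(dot(parallel_a, parallel_b), norm(parallel_a) * norm(parallel_b))

# Orthogonal: the dot product is zero.
print(dot(up, right))
```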
                        
                        5 votes
                        1. [13]
                          mrbig
                          (edited )
                          I see. So it means "unrelated". This was driving me crazy. Thanks.

                          Any advantage in avoiding more popular terms which do not require familiarity with geometry?

                          1 vote
                          1. [4]
                            spctrvl
                            Using orthogonal in that context is a little formal, but not at all uncommon. Certainly I don't expect most writers I've seen use the phrase to have first encountered it in a math course.

                            10 votes
                            1. [3]
                              mrbig
                              Yeah... if the interlocutor doesn't know where the term comes from, though, I don't see an advantage...

                              1. [2]
                                spctrvl
                                You don't need to know where it comes from to know what it means.

                                4 votes
                                1. mrbig
                                  Sure. I just wonder if it would be advantageous to use it in that case.

                          2. [6]
                            Weldawadyathink
                            I like orthogonal since it has more nuance than unrelated. A more accurate definition might be “related to the discussion, but going in a different direction. Leaving the main focus.”

                            6 votes
                            1. [5]
                              mrbig
                              That makes sense. I can see how that might be useful. It will only work if the interlocutor understands the concept though. For everyone else it is quite obscure, I think.

                              1. [4]
                                Greg
                                Equally, I had to look up the word interlocutor and make sure I was understanding it correctly! I'd say both can just be chalked up as necessary perils of using a wide vocabulary.

                                3 votes
                                1. [3]
                                  mrbig
                                  (edited )
                                  I use this word because I don't know any better, since this is not my first language, really. It is actually a direct translation from Portuguese, and I don't think it is ideal. Something more well known is preferable, I think. I'm open to suggestions!

                                  1. [2]
                                    Greg
                                    Link Parent
                                    It honestly seems a pretty ideal word for the context! If it were me I'd probably just have said "person", but that's definitely less precise. Either way, I hope my tone came across as I meant it: not a criticism at all, just faint amusement at an uncommon term in a discussion of uncommon terms.

                                    1. mrbig
                                      (edited )
                                      Link Parent
                                      No problem. I actually gave it some thought, which is why I answered. In Portuguese, "interlocutor" is not that unknown; more people know it, it seems.

                          3. [2]
                            stu2b50
                            Link Parent
                            There's no particular reason. I don't think that much about word choice when posting in online forum-like places - after all, it's not that serious of a medium. Mostly just stream of mind. I use orthogonal a fair amount as a word in that context so it just comes up.

                            5 votes
                            1. mrbig
                              Link Parent
                              Yeah, I've seen this word in other places, so that is why I'm asking. I can see how that might make more sense on Hacker News or Stack Overflow. Tildes is pretty STEM heavy but is not exactly a technical forum, so it's a friendlier place to pose the question.

      2. userexec
        Link Parent
        Just to respond to your accessibility question, there's a whole industry that does "PDF remediation" for government and higher education to help re-open parts of the web for people with disabilities and avoid ADA lawsuits in the US. You can structure a PDF in such a way that a screen reader can parse it nicely, but it takes awareness and some skill in document creation. Far too many PDFs have structural issues that either leave them a garbled mess for those with visual impairments, or just exclude entire blocks of content.

        An individual, skilled author could possibly handle making a PDF-based site that still delivered its content to all audiences, but as soon as you move to an organization level and have unskilled authors creating poorly-structured forms and flyers and schedules as sole sources of information, PDF starts excluding people with disabilities and becomes a legal nightmare.

        Now this is not to say that you can't make an HTML site an accessibility nightmare--of course you can. Some JS developers (and I say this as a primarily JS developer) seem like they're on some holy crusade to ensure nobody using assistive devices can ever use the web properly again. But what you can do with HTML that you can't with PDF is force non-technical authors to express themselves through accessibly-coded content types inside of a Content Management System. The way websites separate concerns by having structure and content separate from style and layout makes them simple(r) to code in a predictable and navigable way for blind users, for example, and with a CMS you can actually scale that reliably across a whole organization.

        8 votes
  2. [18]
    post_below
    Link
    But... PDF? He's angry at user hostile trends on the web and he picks PDF?

    Compared to HTML, PDFs are:

    • Larger (a lot larger) for the same content.
    • Not as well indexed (despite his claim). Some text in some PDFs gets indexed, but it will never be as well indexed (read: discoverable) as HTML.
    • Not dynamic. Though you could no doubt come up with workarounds, PDFs don't make interaction easy for anyone.
    • Hard to integrate with server-side scripting and databases. You're going to have huge overhead given even medium traffic levels if you want to dynamically generate anything.
    • Not mobile friendly. Unfriendly enough to be called hostile.
    • Not as accessible (for a variety of demographics)

    I get his point, the web is a mess. Some of it is annoying, a lot of it is profit driven. I agree with a lot of what he's getting at. I just don't see why going from incidentally user hostile to actively user hostile makes the statement he wants to make... except maybe that the low hanging controversy gets people talking.

    Side note: The web has always been a mess. It's just a higher volume mess now. People have been complaining about standards and bloat since the beginning. Best just to embrace the chaos.

    24 votes
    1. [14]
      babypuncher
      Link Parent
      The web was arguably in a much worse place 20 years ago. Many websites were reliant on proprietary browser plugins (Flash), Java applets, and even browser/OS exclusive features like ActiveX.

      13 votes
      1. [9]
        post_below
        Link Parent
        Add to that IE's destructive dominance for a lot of years. MS' actively anti-standards philosophy and attempts to make everything proprietary were a threat to the core concepts of the web.

        Fortunately Chrome derailed that train. Once Google gets bloated and slow and incompetent on a level comparable with the MS of those days, maybe their train will get derailed too.

        The beauty of the web, so far, is that giant, organic messes are really good at evolving.

        8 votes
        1. [8]
          vord
          Link Parent
          Fortunately Chrome derailed that train.

          Firefox derailed that train. :) Chrome started a new one. And it's only open insofar as it is independent of Google and isn't dictating new standards.

          7 votes
          1. [7]
            post_below
            Link Parent
            Not sure what you mean, Firefox's share of the browser market never exceeded IE's. It was great that it took some of the market though.

            You made me wonder if I was misremembering so I found this: browser market share history (YouTube)

            4 votes
            1. Eric_the_Cerise
              Link Parent
              Firefox has been my primary browser since it was called Phoenix. I also worked as a web developer before and during the (second) browser wars.

              I don't buy that video. I agree that FF has never, ever been #1 in market share (never even close), but for a solid 2-3 years around 2010, they were pretty consistently above 30%.

              But that's not even the real point. Once FF started to get above 10% (easily 2+ years before Chrome's debut), that was when we web developers had to start taking it into account for web development. It meant, effectively, writing two web pages in one -- writing a lot of "If IE, do it like this; else if FF, do it the right way" ... and that was when Microsoft started getting a lot of grief over all of the "special" rules over how IE was rendering pages.

              5 votes
            2. [5]
              vord
              (edited )
              Link Parent
              They never surpassed IE, no. That video is roughly in line with my memory. But Chrome was no accident.

              Firefox had an absolute meteoric rise inside of 4 years (makes sense, it was born out of Netscape's ashes, hence the original names Phoenix -> Firebird). A no-name open source project yanked away 10% marketshare from the most dominant tech company in the world.

              Google correctly figured that with its household brand (given that Googling was already a verb) it could release its own browser and take the lead.

              Things are changing again. Google isn't quite as beloved as circa 2006 and Firefox is looking better and better these days.

              3 votes
              1. [4]
                hungariantoast
                Link Parent
                A no-name open source project yanked away 10% marketshare from the most dominant tech company in the world

                Can we, for just a moment, circle back to the article at hand and lament that this will never happen with the web again?

                It's too large. It grows too fast. It seems out of reach for all but the largest of organizations.

                That is... exceedingly sad and bleak, when considering the popularity and importance of the web.

                2 votes
                1. [2]
                  Wes
                  Link Parent
                  All the same, both Chromium and Firefox are open-source software with proven forks. Compare it to something like desktop operating systems and the web starts looking pretty healthy in comparison.

                  3 votes
                  1. hungariantoast
                    Link Parent
                    I think there is a greater quantity and quality of innovative takes on desktop operating systems than there are on web browsers 🤷

                    I'm not saying the "desktop operating system ecosystem" is doing particularly better than the "web browser ecosystem", I don't think either area of computing is doing nearly as well as it could, but I do think trying to compare them in an effort to make one seem healthier than another isn't really worth doing...

                    2 votes
                2. vord
                  Link Parent
                  Can we, for just a moment, circle back to the article at hand and lament that this will never happen with the web again?

                  So long as we allow the 5ish tech titans to grow ever larger and more dominant over every possible sector? Yea, it'll never happen.

                  But I still believe in anti-trust. Disruptive players breaking the status quo. We just gotta stop letting the largest companies in the world devour everyone else.

                  No. More. Acquisitions. Hope is not lost.

                  1 vote
      2. [4]
        vord
        Link Parent
        We still have proprietary browser/OS exclusive features. It's called Widevine, and is essentially the main replacement for Flash/Java. Hell, the most widespread use of Flash/Java I remember were for games/music/video and fancy nav bars. They likely would have never attained such dominance if HTML had syntax for collapsible and hover menus.

        There's also something to be said for that horrid diversity of addons: Half-decent sites needed to accommodate users without those things. That was more true pre-IE6.

        1 vote
        1. [3]
          babypuncher
          Link Parent
          Widevine is hardly a replacement for Flash/Java/ActiveX. It is a DRM platform for protecting HTML5 video streams and not much else. The purpose of Flash, Java, and ActiveX was to provide a framework for developers to more easily write interactive graphical applications free of the limitations of HTML and performance implications of JavaScript. They have been replaced by expanded HTML and CSS features, as well as faster JavaScript engines and new JavaScript frameworks that make developing rich apps using web standards much easier.

          6 votes
          1. [2]
            vord
            Link Parent
            As I remember, the three main buckets of those interactive applications were:

            • Video (hence why Youtube and Netflix used it)
            • Basic UI elements like collapsible menus. This has gotten better, but was likely going to happen regardless... tables were a nightmare.
            • Apps, namely games and business software.

            Video still uses OS specific proprietary bits. While I'll admit that JS UI elements are better than the Flash ones, they have also utterly broken the ability to use most websites without JS. A proper HTML/CSS only solution for UI elements would have been far preferable. I kinda count this as a wash.

            Most apps that I've used online would be far better either as standalone apps or static HTML. This Javascript requirement for browsers has basically just reinvented the JVM, but with Javascript now.

            I want the web to be functional again without mandatory arbitrary code execution. I guess that ship has sailed though. My best hope is mandating API availability for anything a user can see/access for automation purposes, from which a better web could be born.

            1. babypuncher
              Link Parent
              A proper HTML/CSS only solution for UI elements would have been far preferable. I kinda count this as a wash.

              I think the move from a proprietary solution to a standards-based solution is absolutely a net positive.

              1 vote
    2. [3]
      hungariantoast
      (edited )
      Link Parent
      The author describes why they aren't just sticking to a subset of HTML. So, if not HTML or PDF, then what else could they have used right now?

      2 votes
      1. Greg
        Link Parent
        I'm not as anti-PDF as some of the comments here - I actually find it quite an interesting experiment, and it's certainly spurred a discussion, which I imagine was partly the point - but I think ePub is probably a better fit. I mentioned further up that it seems to tick a lot of the boxes that the author was asking for, while also doing a better job of the things that a lot of readers here are worried about.

        The biggest downside is probably the lack of native browser support, and there are philosophical questions to ask about how much of the HTML spec's complexity is included by reference in the ePub spec, but for me it still would've been the better choice.

        8 votes
      2. post_below
        Link Parent
        The reason they gave is that using a subset of HTML would leave you tempted to use more of the available features. So then I guess the selling point is limits.

        Maybe printed fliers as an alternative?

        2 votes
  3. [4]
    mrbig
    (edited )
    Link
    Yeah... the PDF reader that comes with my Android phone does not reflow, so I'll go out on a limb here and say that most people will have a bad reading experience on mobile. It's like a book for ants. And it is not like reflow always works anyway.

    20 votes
    1. [3]
      Octofox
      Link Parent
      Reflowing PDFs is almost impossible. The format was made to lay out a document with pixel-perfect precision, ready for printing. Adobe's solution to reflowing is to have AI process the document and try to come up with a more mobile friendly version.

      18 votes
      1. Thra11
        Link Parent
        Reflowing PDFs is almost impossible.

        Yes. The problem is that semantic information is lost when the text is laid out. (This is one reason nobody uses PDF as a "working" format, only as something they export or convert to when they've finished. The original document will typically be LaTeX or Word.) Software attempting to reflow a PDF can guess the semantics from the layout correctly 90% of the time, but every so often there will be a situation where a sentence happens to end at the end of a page, and the reflow software can't tell whether the sentence on the next page is a continuation of the current paragraph or the start of a new paragraph. Similarly, when words are hyphenated and split across a newline in the PDF, it's not always obvious whether they were hyphenated before they landed on a line boundary. You can guess some of them by checking whether the two halves are words in their own right and whether removing the hyphen yields a known word (Yay, now your reflowing software needs to include a dictionary for every single language!), but there are plenty of words which can be written either as a single word or a hyphenated pair.

        Page numbers and repeated chapter headings on every page can also cause problems.
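
        To make the hyphen guess above concrete, here is a minimal Python sketch. It is only an illustration: the function name `rejoin_hyphenated` and the tiny `VOCABULARY` set are invented for the example, and a real reflow tool would need a full dictionary per language plus many more rules.

```python
from __future__ import annotations

# Toy word list standing in for a real per-language dictionary (assumption).
VOCABULARY = {"document", "pixel", "perfect", "web", "site", "website"}


def rejoin_hyphenated(left: str, right: str, vocabulary: set[str]) -> tuple[str, bool]:
    """Guess how to rejoin a word that was split as 'left-' / 'right' at a line break.

    Returns (text, confident). 'confident' is False when the dictionary
    check cannot decide between the joined and the hyphenated form.
    """
    joined = left + right
    halves_are_words = left.lower() in vocabulary and right.lower() in vocabulary
    joined_is_word = joined.lower() in vocabulary

    if joined_is_word and not halves_are_words:
        # The hyphen was probably inserted by the layout engine.
        return joined, True
    if halves_are_words and not joined_is_word:
        # The hyphen was probably part of the original text.
        return f"{left}-{right}", True
    # Either both readings are plausible or neither is known: keep the hyphen
    # and flag the result as a guess.
    return f"{left}-{right}", False


print(rejoin_hyphenated("docu", "ment", VOCABULARY))      # ('document', True)
print(rejoin_hyphenated("pixel", "perfect", VOCABULARY))  # ('pixel-perfect', True)
print(rejoin_hyphenated("web", "site", VOCABULARY))       # ('web-site', False)
```

        The last call is the ambiguous case: both halves and the joined form are known words, so the dictionary alone cannot tell which one the author wrote.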

        8 votes
      2. hook
        Link Parent
        I have so far seen one serviceable-to-good PDF reflow and that was on my sadly crushed iRiver eInk reader. IIRC the PDF reader on that thing was made by Adobe.

        4 votes
  4. [14]
    onyxleopard
    Link
    I loved this quote from Drew DeVault. It captures my feelings towards the modern web and its near limitless baggage:

    “The total word count of the W3C specification catalogue is 114 million words at the time of writing. If you added the combined word counts of the C11, C++17, UEFI, USB 3.2, and POSIX specifications, all 8,754 published RFCs, and the combined word counts of everything on Wikipedia’s list of longest novels, you would be 12 million words short of the W3C specifications.
    I conclude that it is impossible to build a new web browser. The complexity of the web is obscene. The creation of a new web browser would be comparable in effort to the Apollo program or the Manhattan project.
    It is impossible to:
    • Implement the web correctly
    • Implement the web securely
    • Implement the web at all”

    12 votes
    1. [13]
      soks_n_sandals
      Link Parent
      Is there a proposed solution to reduce the complexity of web standards?

      3 votes
      1. [8]
        Octofox
        Link Parent
        No, because the complexity exists because people wanted it all. The world desired a way to quickly distribute and run applications from untrusted authors and the web was a good way to do that. These days the browser is just a sandboxed application runner.

        10 votes
        1. [7]
          Akir
          Link Parent
          people

          You mean software developers.

          People just want to browse the web. Developers are the people who decided to change what the web is.

          The only people I see who are excited about using the web as an application framework are developers. When average people hear about a web version of a type of application, they might say "Cool" but I don't think I've ever actually heard anyone talk about it as being a positive feature.

          Granted, people are now accustomed to the web being an application framework, so now there would be a backlash if they were not able to use the web in that way anymore.

          4 votes
          1. [4]
            Octofox
            Link Parent
            It’s the same thing. People say “I want the site to be able to do this” or “I’m using the other site because they do this and yours doesn’t.”

            5 votes
            1. [3]
              Akir
              Link Parent
              It is not the same thing. It only begins to become functionally similar if you are providing a service, in which case your site is an application.

              The thing about websites that the people in charge of developing them tend to forget is that the greatest value of any given page is typically in the content. People don't buy from online stores because the website is fun, do they? Amazon would fold overnight if that were the case. Social networks, news outlets, and just about every other major category of website would completely fall apart without content. Even if you have a website that is purely to fill a business role, it only exists because it serves the content being brought to it.

              1 vote
              1. [2]
                Octofox
                Link Parent
                People want applications in the browser. Google Docs was crushing Microsoft Office so badly that MS put in the monumental effort to get Word in the web browser. The average user does not care how hard it is to build a web browser from scratch. They care about being able to click a link and see a document pop up without having to install software which may not work on their OS.

                6 votes
                1. petrichor
                  Link Parent
                  ...and today I learned that Word works in the web browser.

          2. [2]
            stu2b50
            Link Parent
            I don't think laypeople appreciate the web in that direct of a technological way, but I do think they appreciate it. Webapps are a big jump in "it just works". You don't have to install anything, you don't have to worry about malware, you just go to the page and it just works. And it works the same on all your devices - no downloads, no installs, no license transfers.

            Like my parents have gone full g-suite, and while they can't derive the why and what of their gains, they do see this "it just works" as a major benefit over their old model of computing.

            5 votes
            1. Akir
              Link Parent
              This is what I find to be the most ironic thing about making the web into an application framework; I think that there's a good class of people who just simply don't want to use them. It's not because they are worse than desktop apps, but because they require a degree of abstraction.

              Do your parents use the web version of all of G-Suite, or do they use the apps? I can tell you that when it comes to my grandmother using her android tablet, she gets very confused when you have to go to a website instead of an app. If she can't find it in her home screen, she's not going to have an easy time remembering how to get to something. And no, this isn't just an "old people can't do this" thing, it's a "people won't bother if they don't have to" thing.

              And the thing is that even our corporate overlords seem to want us to use apps instead of websites. Basically every major social network is full of nag screens telling you to use their app whenever you access it on a mobile device, and some take active steps to keep mobile users from accessing the web version. Heck, even Apple and Google are taking steps to get you to use more apps with things like Web Clips.

              "It just works" is important, but being on the web does not automatically mean that it will just work. I've had plenty of issues using web applications. But there's nothing stopping mobile apps or full-blown desktop applications from having that "it just works" quality - in theory, at least.

              3 votes
      2. [4]
        FlippantGod
        Link Parent
        Gopher and/or Gemini? Just for small web type stuff of course.

        3 votes
        1. [3]
          Thra11
          Link Parent
          I like the idea of gemini, but I think excluding inline images goes too far, as there's a lot of static content that benefits from images.

          I'd actually like to see web pages in a format something like a javascript-less epub[1] or FictionBook. Static, self contained, semantic markup, embedded images. Embedding the images means you can't host images on a separate CDN and share them across multiple pages, which I actually view as a feature: Images that appear on multiple web pages probably aren't part of the content, and should be discouraged.

          Interestingly, the original author also proposes a "gemini with images" format: https://www.lab6.com/2 (pdf, sorry). However, they seem to be obsessed with shoving PDF and pagination in there, which I just don't understand. I imagine they must want the viewer to see the document exactly as they intended, but frankly that's deluded. None of my devices are A4 sheets of paper, so your pages aren't going to fit on my pages. As a result, I'm either going to reflow your pages, making them look like shit, or I'm going to get tired of scrolling from side to side to see your text and not read it at all. If you want your content to look (or sound: not everyone consumes text with their eyes) good for everyone, stick to simple semantic markup and let the reader's device lay it out in a manner appropriate to the device and the reader's needs.

          1. I don't see much javascript usage in epub, but it is allowed. I'm assuming it's because people creating epubs know that most of the devices and applications used to read them don't really support javascript.

          10 votes
          1. [2]
            Akir
            Link Parent
            I also think that it's strange to fixate on pages. And ironically, this has been somewhat solved in the modern web; you can now use CSS to define page breaks in an HTML document. Granted, nobody seems to actually care about the printability of web pages. The number of published webpages with print-specific CSS is so small as to be an outlier.

            3 votes
            1. Greg
              Link Parent
              It's way off in outlier territory for sure, but I'm glad it exists in the spec because proper print CSS is brilliant if you want to use something like AsciiDoc to generate HTML and PDF in parallel (which is something I've done on multiple occasions specifically to avoid the kind of disagreements this very thread contains!).

              1 vote
  5. [4]
    dozens
    Link
    Be sure to read the next installments: https://www.lab6.com/1 and https://www.lab6.com/2. I was skeptical at first, but they go to really neat places.

    8 votes
    1. Thra11
      Link Parent
      The first one of those is 16.1MB, despite the fact that it's 99% text. I assume that most of the bulk is the mp3 data, which appears to be completely hidden / inaccessible unless you download the pdf and change the file extension to mp3. It's a clever trick, but I don't see a practical use for it.
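      For anyone curious how one file can pass as two formats, here is a heuristic sketch of signature sniffing. It assumes the usual conventions: a `%PDF-` header that lenient readers accept anywhere in the first 1024 bytes, and MP3 data that starts with either an `ID3` tag or a frame sync (eleven set bits). It does not claim to describe how the lab6.com file is actually laid out, and the function names are invented for the example.

```python
def looks_like_pdf(data: bytes) -> bool:
    """True if a PDF header appears within the first 1024 bytes."""
    return b"%PDF-" in data[:1024]


def looks_like_mp3(data: bytes) -> bool:
    """True if the data starts with an ID3v2 tag or an MPEG frame sync."""
    if data[:3] == b"ID3":
        return True
    # Frame sync: first byte 0xFF, top three bits of the second byte set.
    return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0


def detected_formats(data: bytes) -> list[str]:
    """List every format whose signature matches; a polyglot matches several."""
    formats = []
    if looks_like_pdf(data):
        formats.append("pdf")
    if looks_like_mp3(data):
        formats.append("mp3")
    return formats


# Synthetic bytes for demonstration only; not a valid file in either format.
sample = b"ID3" + b"\x00" * 7 + b"%PDF-1.7\n"
print(detected_formats(sample))
```

      Because each program only looks for its own signature, the file extension ends up deciding which reader opens the file, which is why renaming it is what exposes the audio.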

      It's like one of those instruction manuals that come with an appliance, where the few pages of text are repeated in every language. That makes sense in that context, as it allows them to produce a single type of packaged product and ship it worldwide, without having to care about where it eventually gets sold. However, I've never bought a book that contained the same text in multiple different languages. It would be a massive waste of paper, in the same way that embedding an audio version that I'm not going to listen to in a text document is a massive waste of bandwidth / mobile data.

      12 votes
    2. petrichor
      Link Parent
      That first PDF took forty-five seconds to load.

      7 votes
    3. Tygrak
      Link Parent
      That PDF and UTF8 plain text (and MP3 I guess??) polyglot is pretty cool! Of course not really practical but I love it.

      1 vote
  6. Eric_the_Cerise
    Link
    Fascinating, how much controversy this idea has generated. Tentatively, I'm on the "PDF sucks even harder than HTML/CSS/JS" side, but I also think this guy's main goal was just to generate this kind of discussion ... that the actual tools for web development have become so bloated and twisted and perverted, that a document formatting tool is now, potentially, a viable alternative.

    6 votes
  7. petrichor
    Link
    So I'm not a big fan of PDFs, for the reasons already laid out by users in this thread (primarily their size and inability to reflow - two points which the author believes are solved, but certainly don't appear that way to me).

    The author's nicely-outlined main points are interesting, though. They can really be split into two groups: things PDFs do better than HTML (the first three) and reasons PDFs aren't worse than HTML (despite kinda being worse; the latter six). Those first three seem very closely related:

    • PDFs are self-contained and offlineable.
    • PDFs are files.
    • PDFs are decentralized.

    It seems like one of the author's bigger concerns (outside of spec complexity and tracking, which are ever-pervasive) is that saving a webpage doesn't really save a webpage. In my mind, this is better solved by using an extension like SingleFile and lobbying browsers to change their default "save" behavior than taking the drastic jump to running a website on PDFs - but I'm still glad to see this, it's fun to see people trying new things with the open web.

    Latter six
    • PDFs are discoverable.
    • PDFs are independent of browsers – but can still be read easily by most browsers.
    • PDFs and a PDF tool ecosystem exist today.
    • PDF is an open standard.
    • PDFs are part of the web.
    • PDFs are page-oriented. [not comparable to HTML - although it certainly can be with print media queries - but also a downside in my book]

    3 votes
  8. guts
    Link
    IPFS could be used as well.

    1 vote