- ~comp

13 votes

Posted December 23, 2019 by unknown user

Topic deleted by author

16 comments

[15]
onyxleopard
December 23, 2019

The web is great. But its real power was in separating content from design.
HTML for your words.
CSS for your layout.
That's a good start. But we can go one deeper. Because content isn't HTML.
Your content probably lives in a database. Whether you're running a personal blog, or an ecommerce platform, the content is actually data.

No, this is backwards. Data that isn’t marked up with standards ratified by the W3C is just data that is not on the web. The web works because of standards. Stuff like schema.org and the semantic web are part of the web, even if they’re relatively esoteric. I just wish the people who designed the standards recognized that nobody is going to use these standards if they aren’t documented and communicated in a way that lets a wide array of developers actually implement them correctly.

8 votes
1. [14]
  skybrian
  December 23, 2019
  
  It seems like if you're publishing a dataset then you'd use whatever formats seemed most useful to your users and that's quite likely not a markup language. It might be JSON or CSV files? Or maybe you could upload the data to BigQuery.
  
  2 votes
  1. [13]
    onyxleopard
    December 23, 2019
    
    It might be JSON or CSV files?
    
    That’s fine—sharing files over the internet is not a problem. Host them via FTP, make a torrent, or use some other file transfer protocol. Then, link the data files in your web page. Is that problematic to anyone?
    
    1 vote
    
    [12]
    skybrian
    December 23, 2019
    
    I'm not sure how well search engines index them? If you have a bunch of recipes and want them to be searchable, you're probably better off following Google's advice.
    
    2 votes
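    For the recipe case, schema.org data is commonly embedded in a page as JSON-LD. The sketch below builds a minimal schema.org Recipe object and wraps it in a script tag of type application/ld+json. The helper names (recipe_jsonld, script_tag) and the recipe itself are hypothetical, and it fills in only a few properties, not everything a search engine's structured-data guidelines may ask for.

```python
import json


def recipe_jsonld(name, ingredients, steps):
    """Build a minimal schema.org Recipe as a JSON-LD string (hypothetical helper)."""
    data = {
        "@context": "https://schema.org",
        "@type": "Recipe",
        "name": name,
        "recipeIngredient": ingredients,
        # Each instruction is expressed as a schema.org HowToStep.
        "recipeInstructions": [
            {"@type": "HowToStep", "text": step} for step in steps
        ],
    }
    return json.dumps(data, indent=2)


def script_tag(jsonld):
    """Wrap JSON-LD in a script element so it stays out of the visible page text."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


if __name__ == "__main__":
    print(script_tag(recipe_jsonld(
        "Pancakes",
        ["200 g flour", "2 eggs", "300 ml milk"],
        ["Whisk everything together.", "Fry in a hot pan."],
    )))
```

    The words on the page stay ordinary HTML; the script element just carries the same content as machine-readable data alongside it.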
    
    [11]
    onyxleopard
    December 23, 2019
    
    A search engine should only index data on the web. I’ve noticed engines like Google and DuckDuckGo do index and try to provide helpful results for data that falls outside of the web. And site creators do seem to put a lot of effort into optimizing sites for search engines, even if that means eschewing web standards. But if people want data to be indexable and available on the web, we need to either get site creators on board with the semantic web or concede that the web isn’t sufficient and let Google, or whoever has the most resources, take over, index everything, and shape how data on the internet is presented. I’d prefer the former because if the latter happens, I doubt we’ll get another chance to make the web open.
    
    [10]
    skybrian
    December 23, 2019
    
    Is the semantic web different from what schema.org says you should do? Because Google's advice seems to be to use the schemas documented at schema.org.
    
    I'm not sure if there is any difference between getting on board with the semantic web and helping search engines? The idea is to make the data easily usable by anyone.
    
    1 vote
    
    [2]
    dblohm7
    December 23, 2019
    
    Because Google's advice seems to be to use the schemas documented at schema.org.
    
    Google recommends what is best for Google. Sometimes those decisions are the right ones for users or for the web, but other times they are not.
    
    Personally I believe that the industry has become much too deferential to Google engineers (it's the "nobody ever got fired for buying IBM" of modern times) without looking critically at whether or not their rationale for making a decision is objectively the best thing to do.
    
    2 votes
    
    skybrian
    December 24, 2019
    
    It works the other way too. A lot of people are sure Google must be doing something nefarious even if they're recommending pretty much the same thing as the rest of the industry.
    
    Either way this is lazy thinking. To decide whether some technical advice is good or bad, you need to look at what's actually being recommended rather than taking the shortcut of looking at who is behind it.
    
    Or maybe you don't; we don't need to be experts in everything that comes up in discussion. But when we don't investigate because we don't care that much, it means we don't know whether it's good or bad advice.
    
    1 vote
    
    [7]
    onyxleopard
    December 23, 2019
    
    While Google may recommend things like schema.org, they also implicitly recommend things like AMP (because they will rank AMP sites above others, all else being equal), and they also provide results for local weather data, sports scores, or even Magic: The Gathering cards. Is the weather part of the web? Are Magic cards?
    
    [6]
    skybrian
    December 24, 2019
    
    I'm not sure definitions are all that important? Google went beyond indexing the web a long time ago.
    
    But it seems like files available to any web crawler should be considered part of the public web. Pay sites and websites that restrict crawlers using robots.txt might be a gray area.
    
    1 vote
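    On the robots.txt point: a site can ask crawlers to stay out of certain paths, or shut out one particular crawler, and well-behaved crawlers check the file before fetching. Python's standard library includes a parser for it (urllib.robotparser). The sketch below assumes a made-up robots.txt and a made-up crawler name, ExampleBot. Compliance is voluntary, so this only keeps out crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is shut out entirely,
# and every other crawler is asked to skip /private/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent, url):
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network fetch is needed here.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed("OtherBot", "https://example.com/recipes/pancakes.html"))
    print(allowed("OtherBot", "https://example.com/private/notes.html"))
    print(allowed("ExampleBot", "https://example.com/recipes/pancakes.html"))
```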
    
    [5]
    onyxleopard
    December 24, 2019
    
    Anything not indexed is the so-called deep web. But it is still the web.
    
    Google went beyond indexing the web a long time ago.
    
    Which is fine, but I think it’s important not to confuse the web with whatever Google indexes. I think that gives Google way too much power. It may be a matter of semantics, but it has a way of infecting the way people think.
    
    1 vote
    
    [4]
    skybrian
    December 24, 2019
    
    I agree that websites should ideally optimize for web crawlers in general and not Google specifically.
    
    But the semantic web seems like a difficult sell because for many people, making things easy for web scrapers seems bad. Someone is copying all your data! You need to be rather generous and open-minded to see it as a good thing.
    
    I doubt most Tildes users would be all that happy about our posts and comments being copied in bulk by unknown parties, even though, realistically, we should assume this is happening.
    
    By comparison getting indexed by Google seems fairly benign? You know what it's for.
    
    1 vote
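    To illustrate why structured markup cuts both ways: once data is published as schema.org JSON-LD, any scraper, not just Google, can lift it out with a few lines of standard-library code. This is a minimal sketch. The class name, helper function, and sample page are hypothetical, and a real crawler would also need to handle microdata, RDFa, and malformed JSON.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so buffer them.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.items.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html):
    """Return a list of JSON-LD objects found in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Recipe", "name": "Pancakes"}
</script>
</head><body><p>Grandma's pancakes.</p></body></html>"""

if __name__ == "__main__":
    for item in extract_jsonld(PAGE):
        print(item["@type"], item["name"])  # prints "Recipe Pancakes"
```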
    
    [3]
    onyxleopard
    December 24, 2019
    
    I doubt most Tildes users would be all that happy about our posts and comments being copied in bulk by unknown parties, even though, realistically, we should assume this is happening.
    
    Yeah it’s a fundamental trade-off. It reminds me of the maxim: "Information wants to be free." People often take this out of context, though, as when Stewart Brand made this observation he was actually commenting on a fundamental tension:
    
    "On the one hand information wants to be expensive, because it's so valuable. The right information in the right place just changes your life. On the other hand, information wants to be free, because the cost of getting it out is getting lower and lower all the time. So you have these two fighting against each other."
    
    By comparison getting indexed by Google seems fairly benign? You know what it's for.
    
    Well, actually I don’t think most people realize what the vast quantities of data Google harvests are used for. Sure, one can see the benefits of being visible to crawlers, but I think it’s only just beginning to dawn on people that giving Google access to your browsing habits, your location data, recordings of your Google Assistant queries, your music preferences, your purchasing history, etc., gives Google incredible amounts of information, each piece of which reinforces the value of the others. And you’re not only giving away your own data.
    
    An interesting anecdote: A friend recently downloaded all of the personal data Google had about him, and it included audio files, among them a recording of him making a Google Assistant query in which his phone’s microphone also picked up my conversation in the background. I don’t use Google Assistant, I did not give my friend consent to record my conversation, and we were in a private home at the time. I don’t think I was even aware that my friend had made a query to Google. So, even in a private environment, I basically have to assume I have no expectation of privacy. Who knows? Maybe your neighbors have a Google Home or an Alexa device that is picking up your conversation, and it’s being filed away in some database for all kinds of analysis, such as n-gram language model training, speaker differentiation, or maybe even speaker identification via voice.
    
    3 votes
    
    [2]
    skybrian
    December 24, 2019
    
    It seems like that's another conversation about cameras and microphones being everywhere?
    
    Deciding who can copy publicly available files from your website is a different problem.
    
    onyxleopard
    December 24, 2019
    
    I was just trying to explain the distinction about the web vs. data that is accessed via the internet and indexed by Google (or Facebook etc.).
    
    1 vote
unknown user
December 23, 2019

Sharing it because that's a conversation I want to see happening. Extending Web data beyond the HTML/CSS/JS stack sounds like a good thing because it broadens the horizons considerably.

That said... HTML5 formally allows WAI-ARIA roles and aria-* attributes, which are meant to enhance accessibility. I'm not yet well-versed in the matter – my skills are limited to making things work for people with no perception disabilities, like myself – but you can very much take the second path (data → speech synthesis → ear) with what we already have.

1 vote