  • Showing only topics with the tag "annotation".
    1. Mod annotations for removed comments

      I just came across this field of 13 admin-removed comments and frankly it left me feeling rather unsettled. That's a lot of content to just nuke all at once. Contextually, the thread up to that point was genial and non-controversial, so it seems especially odd that there's just this black hole there. What struck me mostly was how opaque the moderation was. There is no indication of what kind of content was removed, why it was removed, or specifically who did the removal or when it happened.

      Then I scrolled down and at the very bottom I found what I guess is meant to address these concerns, a comment from Deimos:

      "Sigh, I saw this thread was active and thought it was going to have an actual on-topic discussion in it. Let's (mostly) start this over."

      It's not always clear online so I want to say that I'm not rage-posting or bellyaching about censorship or any of the usual drama that tends to crop up on sites like Tildes from time to time. I trust Deimos' moderation and give this the benefit of the doubt. What I'm actually doing, I guess, is making a feature request about better annotation for removed comments.

      Would it make sense to show a note (like Deimos' comment) in-thread at the position of the deleted content, instead of down at the bottom of the page, unattached to anything relevant? In my opinion some kind of "reason" message should always be provided with any moderation activity as a matter of course, even if it's just boilerplate text chosen from a dropdown menu.

      Also, would a single bulk-annotation for all of the related removals make for better UX than 13 separate ones? I think that would be both easier to read, and easier for Deimos to generate on the backend.

      I feel like we may have had this conversation previously, but I couldn't find it. Apologies if I'm beating a dead horse.

      13 votes
    2. XML Data Munging Problem

      Here’s a problem I had to solve at work this week that I enjoyed solving. I think it’s a good programming challenge that will test if you really grok XML.

      Your input is some XML such as this:

      <DOC>
      <TEXT PARTNO="000">
      <TAG ID="3">This</TAG> is <TAG ID="0">some *JUNK* data</TAG> .
      </TEXT>
      <TEXT PARTNO="001">
      *FOO* Sometimes <TAG ID="1">tags in <TAG ID="0">the data</TAG> are nested</TAG> .
      </TEXT>
      <TEXT PARTNO="002">
      In addition to <TAG ID="1">nested tags</TAG> , sometimes there is also <TAG ID="2">junk</TAG> we need to ignore .
      </TEXT>
      <TEXT PARTNO="003">*BAR*-1
      <TAG ID="2">Junk</TAG> is marked by uppercase characters between asterisks and can also optionally be followed by a dash and then one or more digits . *JUNK*-123
      </TEXT>
      <TEXT PARTNO="004">
      Note that <TAG ID="4">*this*</TAG> is just emphasized . It's not <TAG ID="2">junk</TAG> !
      </TEXT>
      </DOC>
      

      The above XML has so-called in-line textual annotations because the XML <TAG> elements are embedded within the document text itself.

      Your goal is to convert the in-line XML annotations to so-called stand-off annotations, where the text is kept separate from the annotations and each annotation refers to the text by a starting and ending character offset, treating the text as a character array. While in-line annotations are more human-readable, stand-off annotations are equally machine-readable, and stand-off annotations can be modified without changing the document content itself (the text is immutable).
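
      To make the idea concrete, here is a minimal sketch of stand-off annotations in Python. The variable and function names are hypothetical, and the half-open offsets (end is exclusive, as in Python slicing) are an assumption that matches how "This" spans 1 to 5 in the expected output further down.

```python
# The text is stored once and never modified.
text = "Sometimes tags in the data are nested ."

# Annotations live apart from the text and point into it by offset.
# Offsets are half-open: start is inclusive, end is exclusive.
annotations = [
    {"id": 1, "start": 10, "end": 37},
    {"id": 0, "start": 18, "end": 26},  # nested inside the first span
]


def mention_text(text, annotation):
    """Recover the annotated substring by slicing the text."""
    return text[annotation["start"]:annotation["end"]]


for annotation in annotations:
    print(annotation["id"], repr(mention_text(text, annotation)))
```

      Adding, removing, or re-labelling an entry in the annotations list never touches the text, which is the property the in-line form lacks.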

      The challenge, then, is to convert to a stand-off JSON format that includes the plain-text of the document and the XML tag annotations grouped by their tag element IDs. In order to preserve the annotation information from the original XML, you must keep track of each <TAG>’s starting and ending character offset within the plain-text of the document. The plain-text is defined as the character data in the XML document ignoring any junk. We’ll define junk as one or more uppercase ASCII characters [A-Z]+ between two *, optionally followed by a dash - and one or more digits [0-9]+.
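
      The junk definition translates fairly directly into a regular expression. The sketch below follows the wording literally; the names JUNK_RE and find_junk are made up for illustration, and how whitespace around a junk token should be treated is a separate decision that this pattern does not make.

```python
import re

# One or more uppercase ASCII letters between two asterisks,
# optionally followed by a dash and one or more digits.
JUNK_RE = re.compile(r"\*[A-Z]+\*(?:-[0-9]+)?")


def find_junk(text):
    """Return every junk token found in text, in order of appearance."""
    return JUNK_RE.findall(text)


print(find_junk("*BAR*-1 Junk *JUNK*-123 and *this*"))
# prints ['*BAR*-1', '*JUNK*-123']
```

      Note that *this* is left alone because it contains lowercase letters.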

      Here is the desired JSON output for the example XML above, which you can use to test your solution:

      {
        "data": "\nThis is some data .\n\n\nSometimes tags in the data are nested .\n\n\nIn addition to nested tags , sometimes there is also junk we need to ignore .\n\nJunk is marked by uppercase characters between asterisks and can also optionally be followed by a dash and then one or more digits . \n\nNote that *this* is just emphasized . It's not junk !\n\n",
        "entities": [
          {
            "id": 0,
            "mentions": [
              {
                "start": 9,
                "end": 18,
                "id": 0,
                "text": "some data"
              },
              {
                "start": 41,
                "end": 49,
                "id": 0,
                "text": "the data"
              }
            ]
          },
          {
            "id": 1,
            "mentions": [
              {
                "start": 33,
                "end": 60,
                "id": 1,
                "text": "tags in the data are nested"
              },
              {
                "start": 80,
                "end": 91,
                "id": 1,
                "text": "nested tags"
              }
            ]
          },
          {
            "id": 2,
            "mentions": [
              {
                "start": 118,
                "end": 122,
                "id": 2,
                "text": "junk"
              },
              {
                "start": 144,
                "end": 148,
                "id": 2,
                "text": "Junk"
              },
              {
                "start": 326,
                "end": 330,
                "id": 2,
                "text": "junk"
              }
            ]
          },
          {
            "id": 3,
            "mentions": [
              {
                "start": 1,
                "end": 5,
                "id": 3,
                "text": "This"
              }
            ]
          },
          {
            "id": 4,
            "mentions": [
              {
                "start": 289,
                "end": 295,
                "id": 4,
                "text": "*this*"
              }
            ]
          }
        ]
      }
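
      One way to sanity-check output in this shape is to confirm that every mention's offsets slice back to its recorded text. The helper below is a hypothetical sketch, not part of the original challenge, and the sample is a hand-made excerpt covering only the first TEXT part.

```python
def bad_mentions(doc):
    """Return the mentions whose offsets or ids look inconsistent."""
    data = doc["data"]
    problems = []
    for entity in doc["entities"]:
        for mention in entity["mentions"]:
            sliced = data[mention["start"]:mention["end"]]
            # The slice must equal the stored text, and the mention's id
            # must agree with the entity it is grouped under.
            if sliced != mention["text"] or mention["id"] != entity["id"]:
                problems.append(mention)
    return problems


sample = {
    "data": "\nThis is some data .\n",
    "entities": [
        {"id": 0, "mentions": [
            {"start": 9, "end": 18, "id": 0, "text": "some data"}]},
        {"id": 3, "mentions": [
            {"start": 1, "end": 5, "id": 3, "text": "This"}]},
    ],
}
print(bad_mentions(sample))  # prints []
```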
      

      Python 3 solution here.

      If you need a hint, see if you can find an event-based XML parser (or if you’re feeling really motivated, write your own).
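
      For readers who want to see the hint in action (spoiler ahead), here is one possible sketch using xml.parsers.expat from the Python standard library. It is not the solution linked above, and the names to_standoff and JUNK are hypothetical. Two behaviours are assumptions inferred from the expected output rather than stated in the problem: one whitespace character directly after a junk token is dropped along with it, and character data that appears before the first <TEXT> element is ignored.

```python
import re
from xml.parsers import expat

# Junk as defined in the post. The trailing "\s?" is an assumption
# inferred from the expected output, not part of the stated definition.
JUNK = re.compile(r"\*[A-Z]+\*(?:-[0-9]+)?\s?")


def to_standoff(xml_text):
    """Convert in-line <TAG> annotations to a stand-off dict (a sketch)."""
    chunks = []        # cleaned text pieces collected so far
    length = 0         # total number of characters held in chunks
    open_tags = []     # stack of (tag id, start offset) for open <TAG>s
    mentions = {}      # tag id -> list of mention dicts
    seen_text = False  # assumption: skip data before the first <TEXT>

    def start(name, attrs):
        nonlocal seen_text
        if name == "TEXT":
            seen_text = True
        elif name == "TAG":
            open_tags.append((int(attrs["ID"]), length))

    def chars(data):
        nonlocal length
        if seen_text:
            clean = JUNK.sub("", data)
            chunks.append(clean)
            length += len(clean)

    def end(name):
        if name == "TAG":
            tag_id, begin = open_tags.pop()
            # Re-joining on every close is quadratic; fine for small inputs.
            text = "".join(chunks)[begin:length]
            mentions.setdefault(tag_id, []).append(
                {"start": begin, "end": length, "id": tag_id, "text": text})

    parser = expat.ParserCreate()
    # Deliver each run of character data in one callback, so a junk token
    # is not split across calls (holds while a run fits in expat's buffer).
    parser.buffer_text = True
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)

    entities = [{"id": i, "mentions": mentions[i]} for i in sorted(mentions)]
    return {"data": "".join(chunks), "entities": entities}
```

      Because junk is removed before offsets are counted, a tag's start is simply the length of the cleaned text seen so far when the tag opens, and its end is the length when it closes; the stack handles nesting. Passing the result to json.dumps with indent=2 gives output in the shape shown earlier.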

      4 votes