  • All activity
    1. HTML <ruby> test

      <ruby>明日<rp>(</rp><rt>Ashita</rt><rp>)</rp></ruby> <ruby>明日<rp>(</rp><rt>Ashita</rt><rp>)</rp></ruby>

      1 vote
    2. An Alternative Approach to Configuration Management


      Preface

      Different projects have different use cases that can ultimately result in common solutions not suiting your particular needs. Today I'm going to diverge a bit from my more abstract, generalized topics on code quality and instead focus on a specific project structure example that I encountered.


      Background

      For a while now, I've found myself continually frustrated with the state of my project configuration management. I had a single configuration file that contained all of the configuration options for the various tools I've been using--database, API credentials, etc.--and I kept running into the problem of wanting to test these tools locally while not inadvertently committing and pushing sensitive credentials upstream. For me, part of my security process is ensuring that sensitive access credentials never make it into the repository and limiting access to those credentials to only the people who need them.


      Monolithic Files Cause Monolithic Pain

      The first thing I realized was that having a single monolithic configuration file was just terrible practice. There are going to be common configuration options that I want to have in there with default values, such as local database configuration pointing to a database instance running on the same VM as the application. These should always be in the repo; otherwise, any dev who spins up an instance of the VM will need to manually read the documentation and copy-paste the missing options into the configuration. This would be incredibly time-consuming, inefficient, and stupid.

      I also use different tools which have different configuration options associated with them. Having to dig through a single file containing configuration options for all of these tools to find the ones I need to modify is cumbersome at best. On top of that, having those common configuration options living in the same place that sensitive access credentials do is just asking for a rogue git commit -A to violate the aforementioned security protocol.


      Same Problem, Different Structure

      My first approach to resolving this problem was breaking the configuration out into separate files, one for each distinct tool. In each file, a "skeleton" config was generated, i.e. each option was given a default empty value. The main config would then only contain config options that are common and shared across the application. To avoid having the sensitive credentials leaked, I then created rules in the .gitignore to exclude these files.

      This is where I ran into problem #2. I learned that this just doesn't work. You can either have a file in your repo and have all changes to that file tracked, have the file in your repo and make a local-only change to prevent changes from being tracked, or leave the file out of the repo completely. In my use case, I wanted to be able to leave the file in the repo, treat it as ignored by everyone, and only commit changes to that file when there was a new configuration option I wanted added to it. Git doesn't support this use case whatsoever.

      This problem turned out to be really common, but the suggested solution is to have two separate versions of your configuration--one for dev, and one for production--and to have a flag to switch between the two. Since I had broken my configuration up, I would then need twice as many files to do this, and given my security practices, it would still violate the no-upstream rule for sensitive credentials. Worse still, if I had several different kinds of environments with different configuration--local dev, staging, beta, production--then for m such environments and n configuration files, I would need to maintain n*m separate files for configuration alone. Finally, I would need to remember to include a prefix or postfix in each file name any time I needed to retrieve values from a new config file, which is itself an error-prone requirement. Overall, there would be a substantial increase in technical debt. In other words, this approach would not only not help, it would make matters worse!


      Borrowing From Linux

      After a lot of thought, an idea occurred to me: within Linux systems, there's an /etc/skel/ directory that contains common files that are copied into a new user's home directory when that user is created, e.g. .bashrc and .profile. You can make changes to these files and have them propagate to new users, or you can modify your own personal copy and leave all other new users unaffected. This sounds exactly like the kind of behavior I want to emulate!

      Following that example, I took my $APPHOME/config/ directory and placed a skel/ subdirectory inside, which contained all of the config files with their empty default values. My .gitignore then looked something like this:

      $APPHOME/config/*
      !$APPHOME/config/main.php
      !$APPHOME/config/skel/
      !$APPHOME/config/skel/*
      # This last one might not be necessary, but I don't care enough to test it without.
      

      Finally, on deploying my local environment, I simply include a snippet in my script that enters the new skel/ directory and copies any files inside into config/, as long as they don't already exist:

      cd $APPHOME/config/skel/
      for filename in *; do
          if [ ! -f "$APPHOME/config/$filename" ]; then
              cp "$filename" "$APPHOME/config/$filename"
          fi
      done
      

      (Note: production environments have a slightly different deployment procedure, as local copies of these config files are saved within a shared directory for all releases to point to via symlink.)

      All of these changes ensure that only config/main.php and the files contained within config/skel/ are whitelisted, while all others are ignored, i.e. our local copies that get stored within config/ won't be inadvertently committed and pushed upstream!


      Final Thoughts

      Common solutions to problems are typically common for a good reason. They're tested, proven, and predictable. But sometimes you find yourself running into cases where the common, well-accepted solution to the problem doesn't work for you. Standards exist to solve a certain class of problems, and sometimes your problem is just different enough for it to matter and for those standards to not apply. Standards are created to address most cases, but edge cases will always exist. In other words, standards are guidelines, not concrete rules.

      Sometimes you need to stop thinking about the problem in terms of the standard approach to solving it, and instead break it down into its most abstract, basic form and look for parallels in other solved problems for inspiration. Odds are the problem you're trying to solve isn't as novel as you think it is, and that someone has probably already solved a similar problem before. Parallels, in my experience, are usually a pretty good indicator that you're on the right track.

      More importantly, there's a delicate line to tread between needing to use a different approach to solving an edge case problem you have, and needing to restructure your project to eliminate the edge case and allow the standard solution to work. Being able to decide which is more appropriate can have long-lasting repercussions on your ability to manage technical debt.

      16 votes
    3. Suggestions regarding clickbait and misinformation


      One thing (amongst many) that always bothered me in my 6+ years of using Reddit was their lax rules about posting clickbait articles and straight-up misinformation. In my opinion this was something that contributed to the rise of radical communities and echo chambers on the website.

      In this post I'll talk about Clickbait, Unreliable studies, and Misinformation. I'll give examples for each one and suggest a way to deal with it.

      Clickbait-

      Let's start with the most benign one. These days most big websites use clickbait and hyperbole to gain more traffic. It's something that they have to do in order to survive in today's media climate, and I sort of understand that. But I think that as a community on Tildes we should raise our standards and avoid posting any article that uses clickbait, and instead directly link to the source that the article cites.

      An example would be: An article titled "Life on Mars found: Scientists claim that they have found traces of life on the red planet".

      But when you read the original source, it only states that "Mars rover Curiosity has identified a variety of organic molecules" and that "These results do not give us any evidence of life."
      (This may be a bad/exaggerated example, but I think it gets my point across.)

      On Reddit the mods give these kinds of posts a "Misleading" tag. But the damage is already done, most of the users won't read the entire article or even the source, and instead will make comments based on the headline.
      I personally think that these kinds of posts should be deleted even if they get a discussion going in the comments.

      Unreliable studies-

      This is a bit more serious than clickbait. It's something that I see the most in subjects of psychology, social science and futurism.
      These are basically articles about studies that conclude a very interesting result, but when you dig a bit you find that the methodologies used to conduct the study were flawed and that the results are inconclusive.

      A (real) example would be: "A new study finds that cutting your time on social media to 30 minutes a day reduces your risk of depression and loneliness"
      Link: https://www.businessinsider.com/facebook-instagram-snapchat-social-media-well-being-2018-11

      At first glance this looks legit; I even agree with the results. But let's see how this study was conducted:

      In the study, 143 undergraduate students were tested over the course of two semesters.

      After three weeks, the students were asked questions to assess their mental health across seven different areas

      Basically, their test group was 143 students, the test only ran for about six months, and the results were self-reported.

      Clearly, this is junk. This study doesn't show anything reliable. Yet still, it received a lot of upvotes on Reddit and there was a lot of discussion going. I only spotted 2-3 comments (at the bottom) mentioning that the study is unreliable.

      Again, I think that posts with studies like this should be deleted regardless of whether there is a discussion going in the comments or not.

      Misinformation-

      This is in my opinion the biggest offender and the most dangerous one. It's something that I see in political subreddits (even the big ones like /r/politics and /r/worldnews). It's when an article straight up spreads misinformation both in the headline and in the content in order to incite outrage or paint a narrative.

      Note: I will give an example that bashes a "left-leaning" article that is against Trump. I'm only doing this because I only read left-leaning to neutral articles and don't go near anything that is right-leaning. Because of this I don't have any examples of a right-leaning article spreading misinformation (I'm sure that there are a lot).

      An example would be this article: "ADMINISTRATION ADMITS BORDER DEPLOYMENT WAS A $200 MILLION ELECTION STUNT"
      Link: https://www.vanityfair.com/news/2018/11/trump-troops-border-caravan-stunt

      There are two lies here:

      1. The Trump administration did not admit to anything. (The article's use of the word 'Admit' is supposedly justified with 'They indirectly admitted to it'. I personally think this is a bad excuse.)
      2. Most importantly, the 200 million figure is pure speculation. If you go to the older article that this article cites, the 200m figure comes from speculation that the operation could cost up to 200m if the number of troops sent to the border was 15,000 and they stayed there for more than 2 months.
        In reality the number of troops sent was 8,500 and they stayed for only a few days/weeks.

      A few days after this article was published, it turned out that the operation cost 70 million. Still a big sum, still ridiculous. But it's only about a third of what the article claimed.

      The misinformation in this example is fairly benign. But I've seen countless other articles with even more outrageous claims that force a certain narrative. This is done by both sides of the political spectrum.

      Not only do I think that we should delete these kinds of posts on Tildes, in my opinion we should blacklist websites that are frequent offenders of spreading misinformation.
      Examples off the top of my head would be: Vanity Fair, Salon.com, and of course far-right websites like Fox News, Info Wars and Breitbart.
      A good rule in my opinion would be: If three posts from a certain website get deleted for spreading misinformation, that website should be blacklisted from Tildes.

      In conclusion:
      I think we should set some rules against these problems while our community is still in the early stages. Right now I don't see any of these 3 problems on Tildes. But if we don't enforce rules against them, they will start to pop up the more users we gain.

      I'll be happy to know your opinions and suggestions on the matter!

      32 votes
    4. Automatic tag generation?

      It looks like tags are currently added manually. Would it make sense to automatically tag posts by their content? This would be similar to the behavior of Flairer, /r/flairer

      2 votes
    5. Code Quality Tip: Wrapping external libraries.


      Preface

      Occasionally I feel the need to touch on the subject of code quality, particularly because of the importance of its impact on technical debt, especially as I continue to encounter the effects of technical debt in my own work and do my best to manage it. It's a subject that is unfortunately not emphasized nearly enough in academia.


      Background

      As a refresher, technical debt is the long-term cost of the design decisions in your code. These costs can manifest in different ways, such as greater difficulty in understanding what your code is doing or making non-breaking changes to it. More generally, these costs manifest as additional time and resources being spent to make some kind of change.

      Sometimes these costs aren't things you think to consider. One such consideration is how difficult it might be to upgrade a specific technology in your stack. For example, what if you've built a back-end system that integrates with AWS and you suddenly need to upgrade your SDK? In a small project this might be easy, but what if you've built a system that you've been maintaining for years and it relies heavily on AWS integrations? If the method names, namespaces, argument orders, or anything else has changed between versions, then suddenly you'll need to update every single reference to an AWS-related tool in your code to reflect those changes. In larger software projects, this could be a daunting and incredibly expensive task, spanning potentially weeks or even months of work and testing.

      That is, unless you keep those references to a minimum.


      A Toy Example

      This is where "wrapping" your external libraries comes into play. "Wrapping" basically means creating another function or object that takes care of calling the functions or object methods that you really want to target. One example might look like this:

      <?php
      
      class ImportedClass {
          public function methodThatMightBecomeModified($arg1, $arg2) {
              // Do something.
          }
      }
      
      class ImportedClassWrapper {
          private $class_instance = null;
      
          private function getInstance() {
              if(is_null($this->class_instance)) {
                  $this->class_instance = new ImportedClass();
              }
      
              return $this->class_instance;
          }
      
          public function wrappedMethod($arg1, $arg2) {
              return $this->getInstance()->methodThatMightBecomeModified($arg1, $arg2);
          }
      }
      
      ?>
      

      Updating Tools Doesn't Have to Suck

      Imagine that our ImportedClass has some important new features that we need to make use of that are only available in the most recent version, and we're several versions behind. The problem, of course, is that there were a lot of changes that ended up being made between our current version and the new version. For example, ImportedClass is now called NewImportedClass. On top of that, methodThatMightBecomeModified is now called methodThatWasModified, and the argument order ended up getting switched around!

      Now imagine that we were directly calling new ImportedClass() in many different places in our code, as well as directly invoking methodThatMightBecomeModified:

      <?php
      
      $imported_class_instance = new ImportedClass();
      $imported_class_instance->methodThatMightBecomeModified($val1, $val2);
      
      ?>
      

      For every single instance in our code, we need to perform a replacement. Making these replacements is a linear operation--in terms of Big-O notation, a complexity of O(n). If we assume that we only ever used this one method, and we used it 100 times, then there are 100 instances of new ImportedClass() to update and another 100 instances of the method invocation, equaling 200 lines of code to change. Furthermore, we need to remember each of the replacements that need to be made and carefully avoid making any errors in the process. This is clearly non-ideal.

      Now imagine that we chose instead to use the wrapper object:

      <?php
      
      $imported_class_wrapper = new ImportedClassWrapper();
      $imported_class_wrapper->wrappedMethod($val1, $val2);
      
      ?>
      

      Our updates are now limited only to the wrapper class:

      <?php
      
      class ImportedClassWrapper {
          private $class_instance = null;
      
          private function getInstance() {
              if(is_null($this->class_instance)) {
                  $this->class_instance = new NewImportedClass();
              }
      
              return $this->class_instance;
          }
      
          public function wrappedMethod($arg1, $arg2) {
              return $this->getInstance()->methodThatWasModified($arg2, $arg1);
          }
      }
      
      ?>
      

      Rather than making changes to 200 lines of code, we've now made changes to only 2. What was once an O(n) complexity change has now turned into an O(1) complexity change to make this upgrade. Not bad for a few extra lines of code!


      A Practical Example

      Toy problems are all well and good, but how does this translate to reality?

      Well, I ran into such a problem myself once. Running MongoDB with PHP requires the use of an external driver, and this driver provides an object representing a MongoDB ObjectId. I needed to perform a migration from one hosting provider over to a new cloud hosting provider, where the application and database services--which were originally hosted on the same physical machine--would now be hosted on separate servers. For security reasons, this required an upgrade to a newer version of MongoDB, which in turn required an upgrade to a newer version of the driver.

      This upgrade resulted in many of the calls to new MongoId() failing, because the old version of the driver would accept empty strings and other invalid ID strings and default to generating a new ObjectId, whereas the new version of the driver raised errors for invalid ID strings. And there were many, many cases where invalid strings were being passed into the constructor.

      Even after spending hours replacing the (literally) several dozen instances of the constructor calls, there were still some places in the code where invalid strings managed to get passed in. This made for a very costly upgrade.

      The bugs were easy to fix after the initial replacements, though. After wrapping new MongoId() inside of a wrapper function, a few additional conditional statements inside of the new function resolved the bugs without having to dig around the rest of the code base.
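
      For illustration, here's a minimal sketch of that kind of wrapper. It's not the original PHP code: it uses Python and the bson package's ObjectId purely to show the shape of the fix, and the helper name is made up. The point is that the old driver's lenient behaviour (generating a fresh id for bad input) now lives in exactly one place.

      from bson import ObjectId
      from bson.errors import InvalidId

      def make_object_id(value=None):
          """Return an ObjectId, falling back to a freshly generated one when
          the input is empty or invalid, mirroring the old driver's behaviour."""
          if not value:
              return ObjectId()
          try:
              return ObjectId(value)
          except (InvalidId, TypeError):
              # The old driver silently generated a new id for bad input; keeping
              # that fallback here means no other call site needs to change.
              return ObjectId()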


      Final Thoughts

      This is one of those lessons that you don't fully appreciate until you've experienced the technical debt of an unwrapped external library first-hand. Code quality is an active effort, but a worthwhile one. It requires you to be willing to throw away potentially hours or even days of work when you realize that something needs to change, because you're thinking about how to keep yourself from banging your head against a wall later down the line instead of thinking only about how to finish up your current task.

      "Work smarter, not harder" means putting in some hard work upfront to keep your technical debt under control.

      That's all for now, and remember: don't be fools, wrap your external tools.

      23 votes
    6. Syntax test


      C:

      int main(void)
      {
      	puts("test");
      	return 0;
      }
      

      Go:

      package main
      
      import "os"
      
      func main() {
      	_, err := os.Stdout.WriteString("test")
      	if err != nil {
      		_, _ = os.Stderr.WriteString("fail")
      	}
      }
      

      Ruby:

      10.times { |n| puts "test_#{ n }" }
      

      XML:

      <html>
        <body>
          <h1>test</h1>
        </body>
      </html>
      
      1 vote
    7. Triple the apparatuses, triple the weirdness: a layperson's introduction to quantisation and spin, part 2


      EDIT: With the help of @ducks the post now has illustrations to clear up the experimental set-up.

      Introduction

      I want to give an introduction on several physics topics at a level understandable to laypeople (high school level physics background). Making physics accessible to laypeople is a much discussed topic at universities. It can be very hard to translate the professional terms into a language understandable by people outside the field. So I will take this opportunity to challenge myself to (hopefully) create an understandable introduction to interesting topics in modern physics. To this end, I will take liberties in explaining things, and not always go for full scientific accuracy, while hopefully still getting the core concepts across. If a more in-depth explanation is wanted, please ask in the comments and I will do my best to answer.

      Previous topics

      Spintronics
      Quantum Oscillations
      Quantisation and spin, part 1

      Today's topic

      Today's topic will be a continuation of the topics discussed in my last post, so if you haven't, please read part 1 first (see link above). We will be sending particles through two Stern-Gerlach apparatuses, and then we'll put the particles through three of them. We will discuss our observations and draw some very interesting conclusions from them about the quantum nature of our universe. Not bad for a single experiment that can be performed easily!

      Rotating the Stern-Gerlach apparatus

      We will start simple and rotate the set-up of the last post 90 degrees, so that the magnets face left and right instead of up and down. Now let's think for a moment about what we expect would happen if we sent silver atoms through this setup. Logically, there should not be any difference in outcome if we rotate our experiment 90 degrees (neglecting gravity, whose strength is very low compared to the strength of the magnets). This is a core concept of physics: there are no "privileged" frames of reference in which the results would be more correct. So it is reasonable to assume that the atoms would split left and right in the same way they split up and down last time. This is indeed what happens when we perform the experiment. Great!

      Two Stern-Gerlach apparatuses

      Let's continue our discussion by chaining two Stern-Gerlach apparatuses together. The first apparatus will be oriented up-down, the second one left-right. We will be sending silver atoms with unknown spin through the first apparatus. As we learned in the previous post, this will cause them to separate into spin-up and spin-down states. Now we take only the spin-up silver atoms and send them into the second apparatus, which is rotated 90 degrees compared to the first one. Let's think for a moment what we expect would happen. It would be reasonable to assume that spin-left and spin-right would both appear 50% of the time, even if the silver atoms all have spin-up too. We don't really have a reason to assume a particle cannot both have spin up and spin right, or spin up and spin left. And indeed, once again we find a 50% split between spin-left and spin-right at the end of our second apparatus. Illustration here.

      Three Stern-Gerlach apparatuses and a massive violation of common sense

      So it would seem silver atoms have spin up or down as a property, and spin left or spin right as another property. Makes sense to me. To be sure, we take all the silver atoms that went up at the end of the first apparatus and right at the end of the second apparatus and send them through a third apparatus which is oriented up-down (so the same way as the first). Surely, all these atoms are spin-up so they will all come out up top again. We test this and find... a 50-50 split between up and down. Wait, what?

      Remember that in the previous post I briefly mentioned that if you take two apparatuses that are both up-down oriented and send only the spin-up atoms through the second one, they all come out up top again. So why do they suddenly split 50-50 now? We have to conclude that being forced to choose spin-left or spin-right causes the atoms to forget whether they were spin-up or spin-down.

      This result forces us to fundamentally reconsider how we describe the universe. We have to introduce the concepts of superposition and wave function collapse to be able to explain these results.

      Superpositions, collapse and the meaning of observing in quantum physics

      The way physicists make sense of the kind of behaviour described above is by saying the particles start out in a superposition; before the first experiment they are 50% in the up-state and 50% in the down-state at the same time. We can write this as 50%[spin up]+50%[spin down], and we call this a wave function. Once we send the particles through the first Stern-Gerlach apparatus, each one will be forced to choose to exhibit spin-up or spin-down behaviour. At this point they are said to undergo (wave function) collapse; they are now in either the 100%[spin up] or 100%[spin down] state. This is the meaning of observing in quantum mechanics: once we interact with a property of an atom (or any particle, or even a cat) that is in a superposition, that superposition is forced to collapse into a single definite state. In this case the property spin is in a superposition, and upon being observed it is forced to collapse to spin up or spin down.

      However, once we send our particles through the second apparatus, they are forced to collapse into 100%[spin left] or 100%[spin right]. As we saw above, this somehow also makes them go back into the 50%[spin up]+50%[spin down] state. The particles cannot collapse into both a definite [spin up] or [spin down] state and a definite [spin left] or [spin right] state. Knowing one precludes knowing the other. An illustration can be seen here.

      This has far-reaching consequences for how knowable our universe is. Even if we can perfectly describe the universe and everything in it, we still cannot know such simple things as whether a silver atom will go left or right in a magnetic field - if we know it would go up or down. It's not just that we aren't good enough at measuring; it's fundamentally unknowable. Our universe is inherently random.

      Conclusion

      In these two posts we have broken the laws of classical physics and were forced to create a whole new theory to describe how our universe works. We found out our universe is unknowable and inherently random. Even if we could know all the information of the state our universe is in right now, we still would not be able to track perfectly how our universe would evolve, due to the inherent chance that is baked into it.

      Next time

      Well that was quite mind-blowing. Next time I might discuss fermions vs bosons, two types of particles that classify all (normal) matter in the universe and that have wildly different properties. But first @ducks will take over this series for a few posts and talk about classical physics and engineering.

      Feedback

      As always, please feel free to ask for clarification and give me feedback on which parts of the post could be made clearer. Feel free to discuss the implications for humanity of existing in a universe that is inherently random and unknowable.

      Addendum

      Observant readers might argue that in this particular case we could just as well have described spin as a simple property that will align itself to the magnets. However, we find the same type of behaviour happens with angles other than 90 degrees. Say the second apparatus is at an angle phi to the first apparatus, then the chance of the particles deflecting one way is cos^2(phi/2)[up] and sin^2(phi/2)[down]. So even if there's only a 1 degree difference between the two apparatuses, there's still a chance that the spin will come out 89 degrees rotated rather than 1 degree rotated.
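
      To put a couple of numbers to that formula: at phi = 90 degrees, cos^2(45°) = sin^2(45°) = 1/2, which reproduces the 50-50 splits we saw in the chained-apparatus experiments above; at phi = 1 degree, sin^2(0.5°) is roughly 0.00008, so the "wrong" outcome is rare but never impossible.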

      32 votes
    8. any sites like 23andme that don't sell your DNA to corporations?


      I'd really like to have a DNA test done to know my family history beyond 2 generations (adopted relatives).

      I've heard numerous times that 23andme will abuse the information they obtain and either target you with ads or sell your DNA to marketing agencies. Are there any non-invasive DNA tests available online or elsewhere?

      14 votes
    9. XML Data Munging Problem


      Here’s a problem I had to solve at work this week that I enjoyed solving. I think it’s a good programming challenge that will test if you really grok XML.

      Your input is some XML such as this:

      <DOC>
      <TEXT PARTNO="000">
      <TAG ID="3">This</TAG> is <TAG ID="0">some *JUNK* data</TAG> .
      </TEXT>
      <TEXT PARTNO="001">
      *FOO* Sometimes <TAG ID="1">tags in <TAG ID="0">the data</TAG> are nested</TAG> .
      </TEXT>
      <TEXT PARTNO="002">
      In addition to <TAG ID="1">nested tags</TAG> , sometimes there is also <TAG ID="2">junk</TAG> we need to ignore .
      </TEXT>
      <TEXT PARTNO="003">*BAR*-1
      <TAG ID="2">Junk</TAG> is marked by uppercase characters between asterisks and can also optionally be followed by a dash and then one or more digits . *JUNK*-123
      </TEXT>
      <TEXT PARTNO="004">
      Note that <TAG ID="4">*this*</TAG> is just emphasized . It's not <TAG ID="2">junk</TAG> !
      </TEXT>
      </DOC>
      

      The above XML has so-called in-line textual annotations because the XML <TAG> elements are embedded within the document text itself.

      Your goal is to convert the in-line XML annotations to so-called stand-off annotations where the text is separated from the annotations and the annotations refer to the text via slicing into the text as a character array with starting and ending character offsets. While in-line annotations are more human-readable, stand-off annotations are equally machine-readable, and stand-off annotations can be modified without changing the document content itself (the text is immutable).

      The challenge, then, is to convert to a stand-off JSON format that includes the plain-text of the document and the XML tag annotations grouped by their tag element IDs. In order to preserve the annotation information from the original XML, you must keep track of each <TAG>’s starting and ending character offset within the plain-text of the document. The plain-text is defined as the character data in the XML document ignoring any junk. We’ll define junk as one or more uppercase ASCII characters [A-Z]+ between two *, and optionally a trailing dash - followed by any number of digits [0-9]+.

      Here is the desired JSON output for the above example to test your solution:

      {
        "data": "\nThis is some data .\n\n\nSometimes tags in the data are nested .\n\n\nIn addition to nested tags , sometimes there is also junk we need to ignore .\n\nJunk is marked by uppercase characters between asterisks and can also optionally be followed by a dash and then one or more digits . \n\nNote that *this* is just emphasized . It's not junk !\n\n",
        "entities": [
          {
            "id": 0,
            "mentions": [
              {
                "start": 9,
                "end": 18,
                "id": 0,
                "text": "some data"
              },
              {
                "start": 41,
                "end": 49,
                "id": 0,
                "text": "the data"
              }
            ]
          },
          {
            "id": 1,
            "mentions": [
              {
                "start": 33,
                "end": 60,
                "id": 1,
                "text": "tags in the data are nested"
              },
              {
                "start": 80,
                "end": 91,
                "id": 1,
                "text": "nested tags"
              }
            ]
          },
          {
            "id": 2,
            "mentions": [
              {
                "start": 118,
                "end": 122,
                "id": 2,
                "text": "junk"
              },
              {
                "start": 144,
                "end": 148,
                "id": 2,
                "text": "Junk"
              },
              {
                "start": 326,
                "end": 330,
                "id": 2,
                "text": "junk"
              }
            ]
          },
          {
            "id": 3,
            "mentions": [
              {
                "start": 1,
                "end": 5,
                "id": 3,
                "text": "This"
              }
            ]
          },
          {
            "id": 4,
            "mentions": [
              {
                "start": 289,
                "end": 295,
                "id": 4,
                "text": "*this*"
              }
            ]
          }
        ]
      }
      

      Python 3 solution here.

      If you need a hint, see if you can find an event-based XML parser (or if you’re feeling really motivated, write your own).
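
      If it helps to see the shape of it, here's a minimal sketch of that event-based approach using Python 3's built-in xml.sax. This is not the linked solution: the junk regex and whitespace handling are simplified, so it may need tweaking to reproduce the expected output byte-for-byte.

      import json
      import re
      import xml.sax

      # Junk: uppercase ASCII between asterisks, an optional -digits suffix, plus
      # one trailing space so removing it doesn't leave a double space behind.
      JUNK = re.compile(r"\*[A-Z]+\*(?:-[0-9]+)? ?")

      class StandoffHandler(xml.sax.ContentHandler):
          def __init__(self):
              super().__init__()
              self.pieces = []    # plain-text pieces collected so far
              self.length = 0     # running character offset into the plain text
              self.stack = []     # open <TAG>s as (id, start_offset)
              self.mentions = {}  # tag id -> list of mention dicts

          def characters(self, content):
              # Caveat: SAX may deliver character data in chunks, which could split
              # a junk marker; a fuller solution would buffer per <TEXT> element.
              cleaned = JUNK.sub("", content)
              self.pieces.append(cleaned)
              self.length += len(cleaned)

          def startElement(self, name, attrs):
              if name == "TAG":
                  self.stack.append((int(attrs["ID"]), self.length))

          def endElement(self, name):
              if name == "TAG":
                  tag_id, start = self.stack.pop()
                  text = "".join(self.pieces)[start:self.length]
                  self.mentions.setdefault(tag_id, []).append(
                      {"start": start, "end": self.length, "id": tag_id, "text": text})

      def convert(xml_string):
          handler = StandoffHandler()
          xml.sax.parseString(xml_string.encode("utf-8"), handler)
          return json.dumps({
              "data": "".join(handler.pieces),
              "entities": [{"id": tag_id, "mentions": mentions}
                           for tag_id, mentions in sorted(handler.mentions.items())],
          }, indent=2)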

      4 votes
    10. Shooting Stars as a Service - Japanese space entertainment company ALE will provide on-demand shooting stars for your event


      I was watching my favorite weekly space show on YouTube, TMRO, and I learned about Astro Live Experiences (ALE). They will soon launch two test satellites which will be able to provide a burst of 30-40 man-made shooting stars at a prearranged time and place, for a fee.

      Japanese company ALE is the first "space entertainment" company of which I am aware. The only event in the same ballpark was New Zealand-based RocketLab's Humanity Star, which caused a large amount of controversy. ALE's initial technology will allow everyone within a 200 km radius on Earth to see their multi-color shooting star show. According to the interview on TMRO, in the long term they are planning to allow image rendering and even artificial aurora.

      This type of business seems inevitable as we advance into space. I can see some benefits and some downsides to this technology. What do you all think of this?

      Maybe this topic belongs in ~misc

      14 votes
    11. Do mention notifications work if they happen during am edit?


      Given the quality of code here, this probably works.. but could someone write a comment here which says “test”, then edit it to include @adams? I am curious if the edit will notify me.

      Thx.

      Edit: during “an” edit

      1 vote
    12. What are your plans for the best holiday of the year, Halloween?


      I freaking love Halloween. The costumes, the candy, the temperature, the pumpkins, and the whole aesthetic. I'm just testing the waters here to see how everyone else here feels about it and what y'all's plans are.

      EDIT: haha, while my title is technically correct I meant "best" holiday, not next

      16 votes
    13. Many updates to The Feature Formerly Known as Comment Tagging


      A couple of weeks ago, I re-enabled the comment tagging feature. Since then, I've been keeping an eye on how it's being used, reading all the feedback people have posted, and have made a few other small adjustments in the meantime. Today, I'm implementing quite a few more significant changes to it.

      First, to try to head off some confusion: if you're very new to Tildes, you won't have access to this feature yet. Currently, only accounts that are at least a week old can use it. Also, the docs haven't been updated yet, but I'll do that later today.

      Here's what's changed:

      • The name has changed from "tag" to "label". I think it's better to use a different term to separate it more easily from topic tags, since the features are very different, and "label" shouldn't have the implications that some people attach to "tagging".

      • As suggested by @patience_limited, "Troll" and "Flame" have now been replaced with a single label named "Malice". I don't think the distinction was important in most cases, and the meanings of them were a bit ambiguous, especially with how overused the word "troll" has become lately.

        Basically, you should label a comment as Malice if you think it's inappropriate for Tildes for some reason - whether the poster is being an asshole, trolling, spamming, etc.

      • This new Malice label requires entering a reason when you apply it. The reason you enter is only visible to me.

      • Another new label named "Exemplary" has been added, which is the first clearly positive one. This label is intended for people to use on comments that they think are exceptionally good, and it effectively acts as a multiplier to the votes on that comment (and the multiplier increases if more people label the comment Exemplary). Like Malice, it requires entering a reason for why you consider that comment exemplary, but the reason is visible (anonymously) to the author of the comment.

        Currently, you can only use this label once every 8 hours - don't randomly use it as a test, or you won't be able to use it again for 8 hours.

      The interface for some of these changes is a bit janky still and will probably be updated/adjusted before long, but it should be good enough to start trying them out. And as always, beyond the interface, almost everything else is subject to change as well, depending on feedback/usage. Let me know what you think—comment labels have a lot of potential, so it's important to figure out how to make them work well.

      105 votes
    14. Pretty Terrible Story About Death or Something


      I don’t know about you, but I’d always been taught one of 2 things about death. Either
      You die and that’s that, nothing else happens and you slowly turn to unthinking dust or
      You die and get transported to some mystical outside realm, either a heaven, hell, or purgatory where your immortal soul spends an infinite amount of time

      Now, these aren’t nearly the only interpretations in this wide world, but if you grew up as a middle class white kid in suburban America, this is likely all you heard.

      It took until my 30th year for one of these to be the official accepted scientific theory on the afterlife. Finally, after all these years, science had an answer for what happened after death, and it was-

      Well

      Actually, it’s not really what happens after, per se. No, this perception could not occur after death. There simply was no way any living thing could continue to perceive after death, or else any way of defining life we have would be thrown out the window. Instead, this was an explanation for those pernicious near-death experiences that pop up every now and again. Rather than being dead and having moved on, these were all visions people have in the moments prior to death.

      Essentially, the afterlife was all a dream put on by the brain in a vain attempt to keep itself happy and alive.

      This led to a thought. What were the limits of these dreams? Would they continue forever? Would the occupant of the dream believe they could still die in the dream, or would they be an immortal thought, a ghost of firing neurons? Is the brain capable of nesting time ad infinitum, or is the clock speed of the brain too slow for that?

      All signs seemed to point towards the brain giving the occupant infinite joy. Citing coma patients who believed they lived millennia in only a few weeks, the majour scientists of the day claimed a way to cheat death. After all, the only limiting factor here was how fast a bolt of electricity could move across the brain, and since that was basically light speed, time didn’t really matter.

      It didn’t really matter.

      This of course led to a massive increase in suicides throughout the globe. It seemed the main limiting factor for many was whether suicide might lead to an unpleasant scenario. Even those who hadn’t, prior to the discovery, had a single suicidal thought cross their mind jumped at the chance of eternal joy. It wasn’t until much later that any sense came into people.

      See, it seems most people are born without a fear of the infinite. I won’t assume, of course, but would you truly find an infinite heaven scary? I would. Infinite time leads to infinite scenarios leads to infinite amounts of both joy and pain. Any amount of fun, after a sufficiently long time, gets boring.

      So, the world was whipped into a global frenzy of life. Wars ended as neither side could really justify it anymore. People finally began to help each other.

      And then, just as quickly as this afterlife frenzy started, it was announced the initial findings were incorrect. Perhaps a decimal slipped, so the official story was death was finite and there was no afterlife.

      That was the official story, of course. The unofficial story…

      Well,

      Imagine you’re trying to do infinite things in two seconds. If you could split your time infinitely, you could complete all infinite things in two seconds. But all the same, everything would be done in two seconds.

      Imagine now you’re trying to do those infinite things in two seconds again, but you have to work against your hands slowly disappearing. Much more difficult, and now you’re less likely to complete those infinite things, but a more finite set. If you think this whole scenario is ridiculous, it’s all based off an account by a Survivor.

      The Survivors were a test group who were used to poke and prod at their afterlives until they could be fully explored. They were the first to discover the effects of cell death on the afterlife.

      As a body dies, the cells begin to die at a rate of 10 millimeters every second. The initial researchers thought this irrelevant, as the speed of the brain was too fast for it to matter. What they didn’t factor in was that the brain is one of the first parts of the body to die. Sure, electricity moving across perfectly kempt brain cells moved near light speed, but add in broken highways of neurons and suddenly it grew much, much slower.

      The first Survivor to discover this recounted the sky slowly darkening and a void suddenly appearing on the horizon. They were lucky, as the test was ended prior to any majour brain damage. One less so had their memories scanned to reveal their perfect paradise being reduced to a one by one meter square and their representation writhing on the floor in apparent pain. They were not recovered.

      Of course, the researchers were horrified. Only weeks prior had they stressed how painless death should now be, and here was a gauntlet thrown at their feet. So they did the only sensible thing: Lie to prevent a mass hysteria ending in the death of all humans.

      And so it’s seemed to work. Just remember, if you see an empty horizon, this is the explanation:
      Death has always been with us.
      Nobody cheats Death.
      Death will always win in a cosmic tug of war.
      And, most importantly, It’s already too late It's already too late
      It’s already too late
      It’s already too late
      It’s already too late
      It’s already too late
      It’s already too late
      It’s already too late
      It’s already too late
      It’s already too late
      It’s already too late
      It’s already too late
      It’s already too late
      It’s already too late
      It’s already too late
      It’s already too late
      It’s already too late
      It’s already too late
      It’s already too late
      It’s already too late
      It’s already too late
      It’s already too late
      It’s already too late
      It’s already too late
      It’s already too late

      6 votes