DataWraith's recent activity

  1. Comment on Signal is finally bringing its secure messaging to the masses in ~tech

    DataWraith

    Yes, that makes sense.

    I find it interesting that the SMS/data plan balance is exactly inverted in the two countries. If you go over your data allowance here, you're generally throttled to a lower speed but don't lose access or have to pay extra. SMS, on the other hand, costs about €0.10 per message.

    2 votes
  2. Comment on Signal is finally bringing its secure messaging to the masses in ~tech

    DataWraith

    Thank you for that counterpoint, maybe I'm simply caught up in my own filter bubble.

    I thought Riot was leagues ahead of Signal, even for one-on-one chats, but with the caveat that I haven't used the latter in a while. I generally don't use SMS anymore, and that seems to hold for most people I know (Germany) -- WhatsApp is free, whereas SMS still costs money. You generally get some free messages per month (depending on your contract), but those can be used up quickly in a back-and-forth chat, so people are almost exclusively using WhatsApp, rarely Telegram or, in my case, Matrix.

    I'm sorry to hear you had such a bad time on Matrix; I mostly use it to communicate with a few trusted friends (using a self-hosted server), and we're also using it for communication at work as a Slack replacement.

    The porn channels struck me as a mere annoyance; there is going to be some spam in a decentralized system after all. I can see how this can be a turn-off, but then again, I'm not browsing random channels very often, so it's rarely a problem.

    5 votes
  3. Comment on Signal is finally bringing its secure messaging to the masses in ~tech

    DataWraith

    On the one hand I admire Signal for its cryptography and privacy features in general, but on the other hand, the user experience has always been severely lacking.

    The article sounds like they are trying to change exactly that, but I think it's too little, too late. A billion users within five years sounds downright ludicrous as a goal.

    Personally I stopped using it a long time ago because of their restrictive "you may not use more than three desktop clients"-policy. Dual-booting a single machine of course takes up two slots, so I was limited to two machines total.

    That limit has apparently since been raised to five, but it's still strange. Is it a technical limitation? Is it so you can recognize when foreign devices are added to your account? They could at least tell you why the limit is in place, but "There is a limit of 5 linked devices. Confirm you have not hit this limit." is all I could find about it on their website just now.

    Call me a pessimist, but with WhatsApp having introduced e2e crypto and Matrix.org rapidly approaching maturity (with device cross-signing and end-to-end encryption by default on the horizon), I can only see Signal dying a slow death.

    11 votes
  4. Comment on What is your favorite opening scene in a movie? in ~movies

    DataWraith

    I love the opening scene of Contact.

    Technically this could be construed as a spoiler:

    A view of the Earth. Silence, then rock music. A slow zoom out from Earth through the entire solar system, as the radio transmissions get farther and farther back in history. Then the camera is so far away from Earth that any sign of humanity is absent. More silence. The entire Milky Way comes into view. Then you see that it is just a single galaxy among its peers, and not necessarily the grandest one. And then even that fades into obscurity as we move away from it. Millions of galaxies fill the screen. And then we end up transitioning back to Earth through a reflection in the eye of young Ellie Arroway, at the very start of her story...

    8 votes
  5. Comment on What programming/technical projects have you been working on? in ~comp

    DataWraith

    I finally managed to beat LunarLander-v2 (see my last post for details) in about 400 episodes and about 90 minutes of runtime on a CPU.

    The winning solution came as a bit of a surprise, as it is a Deep Q-Network variant, Implicit Quantile Networks. The big difference between DQN and IQN is that the latter is a distributional algorithm. That means that it does not just try to estimate the mean of the rewards for each action, but the entire distribution, which helps when the reward distribution is multimodal. If you imagine an action that could either be very good or very bad, then simply taking the mean is going to be an inaccurate characterization of that action.

    Unlike DQN, I find IQN difficult to comprehend. From what I understand, it works by transforming uniform random samples from the [0, 1]-interval into estimates of the reward distribution at the sampled quantile. That is, you pick a random number, say 0.6, and the network gives you the expected reward (for each possible action) at the 60th percentile of the reward distribution (of that action).

    In order to act in the environment, you draw several samples and average them as a characterization of each action. I have no idea why this works so much better than just estimating the mean in the first place (other than the intuition I gave above), but it does, and the spaceship quickly lands safely.
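
    A toy sketch of that action-selection step, without any neural network. The two lambdas are made-up stand-ins for what a trained IQN would output (the reward of an action at a given quantile tau), and every name in the block is hypothetical:

```ruby
# Hypothetical stand-ins for a trained network: each lambda maps a quantile
# tau in [0, 1) to the reward of that action at that quantile.
QUANTILE_FUNCTIONS = {
  # "safe": rewards spread evenly between 4 and 6
  safe: ->(tau) { 4.0 + 2.0 * tau },
  # "risky": bimodal, crashes (-100) 30% of the time, otherwise pays 20
  risky: ->(tau) { tau < 0.3 ? -100.0 : 20.0 }
}.freeze

# Average the quantile function at n uniformly sampled taus.
def estimate_value(quantile_fn, rng, n: 32)
  taus = Array.new(n) { rng.rand }
  taus.sum { |tau| quantile_fn.call(tau) } / n
end

# Greedy action: the one with the highest sampled-average reward.
def best_action(quantile_fns, rng, n: 32)
  quantile_fns.max_by { |_name, fn| estimate_value(fn, rng, n: n) }.first
end

rng = Random.new(42)
QUANTILE_FUNCTIONS.each do |name, fn|
  puts format("%-5s ~ %.2f", name, estimate_value(fn, rng, n: 1000))
end
puts "greedy choice: #{best_action(QUANTILE_FUNCTIONS, rng, n: 1000)}"
```

    The averaging only covers how actions are picked; the interesting part of IQN, training the network so that its outputs really are quantiles, is not shown here.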

    I have some more studying to do if I want to thoroughly understand why exactly it works, but I'm glad that I finally finished my quest to find solutions for LunarLander on both ends of the time vs. sample-efficiency trade-off.

    6 votes
  6. Comment on YAML: probably not so great after all in ~comp

    DataWraith

    There's also Hjson, though I'm not sure I'd want to use either of them.

    It just seems strange to use JSON like that -- if you need to import a custom library to deal with a file format such as Hjson, you might just as well use a library that reads a file format that is easier for humans to read and modify (INI, TOML, Dhall, etc.).

    2 votes
  7. Comment on What programming/technical projects have you been working on? in ~comp

    DataWraith

    I've been learning a lot about reinforcement learning. In particular, I've become somewhat obsessed with the OpenAI Gym LunarLander-v2 environment. As the name implies, your algorithm controls a small spacecraft that is supposed to land on a landing pad in the center of the screen by firing its directional and main thrusters at appropriate times.

    From what I can tell, the environment is considered solved when your average score over the past 100 episodes reaches or exceeds 200. I've seen several reports of people solving the environment within 600 episodes, which is something I still can't do. Sometimes I suspect they don't use the same criterion for calling the environment solved, but that is hard to verify.
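
    Spelled out as code, the criterion as I understand it looks like this. The window of 100 and the threshold of 200 come from the paragraph above; the function name is made up:

```ruby
# Solved criterion as described above: the mean score of the most recent
# `window` episodes must reach `threshold`.
def solved?(scores, window: 100, threshold: 200.0)
  return false if scores.size < window

  scores.last(window).sum / window.to_f >= threshold
end

puts solved?(Array.new(100, 210)) # enough episodes, average high enough
puts solved?(Array.new(99, 300))  # high scores, but fewer than 100 episodes
```

    Reports that average over fewer episodes, or that stop at the first episode scoring 200, would be using a looser criterion than this one.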

    There is an interesting tension between sample efficiency (few frames/episodes) and wall-clock time (few minutes). At the wall-clock time end of the spectrum, I implemented the Cross-Entropy Method with a linear policy, and it reliably solves the environment in about 10 minutes (on a single CPU), but it can run through 2000+ episodes while doing so.

    Aside: That it works so well is somewhat surprising; LunarLander-v2 is very well-suited for gradient-based algorithms due to its dense reward structure, but the Cross-Entropy Method is gradient-free. It works more like an Evolution Strategy or Genetic Algorithm in that it only cares about the sum of rewards over the entire episode and ignores the temporal distribution of the rewards.
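
    A bare-bones sketch of the Cross-Entropy Method, to show how little machinery it needs. This is not my LunarLander code: the toy_return lambda is a stand-in for "run one episode with these policy weights and return the summed reward", and the population size, elite count and initial spread are arbitrary choices for the example.

```ruby
# Box-Muller transform: one sample from a normal distribution.
def gaussian(rng, mean, std)
  u1 = 1.0 - rng.rand # in (0, 1], avoids log(0)
  u2 = rng.rand
  mean + std * Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math::PI * u2)
end

# Cross-Entropy Method: sample parameter vectors from a diagonal Gaussian,
# keep the best-scoring ones (the elites), refit the Gaussian to them, repeat.
# Only the total score of each candidate is used; there are no gradients.
def cross_entropy_method(score, dim:, rng:, iterations: 40, population: 100,
                         elites: 20, initial_std: 5.0)
  mean = Array.new(dim, 0.0)
  std = Array.new(dim, initial_std)

  iterations.times do
    candidates = Array.new(population) do
      Array.new(dim) { |i| gaussian(rng, mean[i], std[i]) }
    end
    best = candidates.max_by(elites) { |params| score.call(params) }

    # Refit mean and standard deviation to the elite set, per dimension.
    mean = Array.new(dim) { |i| best.sum { |elite| elite[i] } / elites }
    std = Array.new(dim) do |i|
      Math.sqrt(best.sum { |elite| (elite[i] - mean[i])**2 } / elites)
    end
  end

  mean
end

# Toy "episode return": highest when the parameters are at [3, -2].
toy_return = ->(params) { -((params[0] - 3.0)**2 + (params[1] + 2.0)**2) }

solution = cross_entropy_method(toy_return, dim: 2, rng: Random.new(1))
puts solution.map { |x| x.round(3) }.inspect
```

    For a linear policy, the parameter vector would be the flattened weight matrix mapping observations to action scores, and score would run a full episode.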

    At the other extreme is Bootstrapped Dual Policy Iteration. From eyeballing the charts in the paper, it seems to come close to solving the environment within 600 episodes, but it is incredibly slow. It took 10 hours to simulate 1000 episodes on my machine, and sadly it had only reached an average score of about 130 by then.

    I've thrown a lot of different algorithms at the problem over the past six months (REINFORCE, Proximal Policy Optimization, Augmented Random Search, Advantage Weighted Regression, several DQN variants, A2C, UDRL, and probably a couple more I'm forgetting -- I should get a life...). Some are quite complicated, and others surprisingly simple. At times it is extremely frustrating, because you have to re-read and re-read a paper until you figure out how exactly the pieces fit together, but once everything is in place and the spaceship lands, it feels great.

    3 votes
  8. Comment on reCAPTCHA: Is there method in monotony? in ~tech

    DataWraith

    The concept is referred to as "Human Computation" by Luis von Ahn, the original founder of reCAPTCHA and co-founder of Duolingo (which also makes use of human computation). He gave an interesting Google Tech Talk about it in 2006. It's basically about how to motivate people to do work for free, such as by packaging it as a game.

    3 votes
  9. Comment on Europe Is Officially out of IPv4 Addresses in ~tech

    DataWraith

    Yggdrasil is great, but a heads-up if you're new to it and trying it out: you probably want to configure the SessionFirewall in the config file, otherwise everyone else in the Yggdrasil network can connect directly to anything listening on the Yggdrasil tunnel interface (i.e. you lose the "protection" of NAT, just like real IPv6).

    2 votes
  10. Comment on What programming/technical projects have you been working on? in ~comp

    DataWraith

    I've been using PyTorch to train neural networks to do OCR on and off over the last few months.
    This was prompted by a problem we had at work that was not solvable using the open source OCR engine Tesseract due to a noisy document background.
    So I thought I'd apply my nascent machine-learning expertise to the problem in my spare time.

    OCR of machine printed documents is apparently considered a solved problem, so there is surprisingly little information to be found that goes beyond "use Tesseract".
    There are a bajillion scientific papers about text spotting (finding text in natural images) and scene-text recognition (recognizing said text in images once cropped) though. Most of them are kind of complicated...

    The simplest approach I found, Facebook's Rosetta, simply wires a convolutional neural network (ResNet-18) to directly output character-class probabilities for each vertical slice of the image using CTC -- Connectionist Temporal Classification. That's a method that optimizes a neural network to output the correct character sequence without one having to annotate where each character is in the image.

    However, since CTC decoding collapses repeated characters (the "ll" in "hello", for instance, would come out as a single "l"), the networks I ended up training had a lot of difficulty producing doubled letters. CTC introduces a blank character to separate such doubled characters, but the network almost never managed to produce it, for reasons I have yet to understand.
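
    The decoding rule itself is tiny. A minimal sketch, assuming the per-slice predictions are given as a string and using "-" as a stand-in for the blank (real models use a reserved class index); the function name is made up:

```ruby
BLANK = "-" # stand-in symbol for the CTC blank

# Greedy CTC decoding rule: first merge runs of identical symbols,
# then drop the blanks.
def ctc_collapse(frames, blank: BLANK)
  frames.chars
        .chunk_while { |a, b| a == b }
        .map(&:first)
        .reject { |symbol| symbol == blank }
        .join
end

puts ctc_collapse("hheel-lloo") # blank between the l's keeps both: "hello"
puts ctc_collapse("hheellloo")  # no blank, the l's merge: "helo"
```

    So a doubled letter can only survive if the network emits a blank in exactly the right slice, which is the output my networks kept failing to produce.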

    I was stuck for a long while, until I reluctantly gave up on the idea of simplicity and brought out the big guns: encoder-decoder LSTMs enhanced with an attention mechanism.

    And that finally worked.

    The next step is to apply the network design to the actual problem, not to the easier proxy-task I've been using for development.

    4 votes
  11. Comment on We’re excited to unveil Half-Life: Alyx, our flagship VR game, this Thursday at 10am Pacific Time. in ~games

    DataWraith

    If difficulty is your only concern, you should definitely give it a try!

    Most of the videos on YouTube are done on the highest difficulty settings (or even on custom maps, which tend to skew towards insanely hard) because that's what looks impressive.

    The base game has five difficulty settings:

    • Easy
    • Normal
    • Hard
    • Expert
    • Expert+

    Easy and Normal are quite doable even for someone playing for the first time. Hard is challenging; you'll have to practice the songs. Expert is very hard; sometimes the notes come almost faster than I can physically move -- but I'm not a very fit person. Expert+ is beyond my abilities.

    The game also has a practice mode that lets you slow down the songs or disable obstacles.

    4 votes
  12. Comment on What is your favourite tv show ? in ~talk

    DataWraith

    Stargate SG-1. It had its highs and its lows, but it is still my overall favorite. It doesn't take itself too seriously and has a lot of fun moments, but can still build suspense despite you knowing things will work out in the end.

    2 votes
  13. Comment on What programming/technical projects have you been working on? in ~comp

    DataWraith

    That's incredibly cool!

    I've been reading many semantic segmentation papers lately while trying to improve an NN for localizing key fields on badly scanned documents. From what I read, DeepLab v3 is apparently considered a bit heavy/slow, so many lighter models have been devised for real-time use on drones or vehicles.

    As an aside: the semantic segmentation images in your YouTube video remind me of Stanley, Stanford's winning entry to the second DARPA Grand Challenge -- they used their LIDAR scanners to map out flat terrain in front of the vehicle and then extrapolated from that what the road looked like all the way to the horizon in real-time. It's amazing that that can be done from monocular images nowadays.

    3 votes
  14. Comment on Fortnightly Programming Q&A Thread - 2019W40 in ~comp

    DataWraith

    "The other thing is trying to determine if I am on a bit of a fools errand for an idea I had. I would like to make a time tracker that can keep track of not only what application is currently in focus, but also what applications are out of focus but visible on the current desktop."

    arbtt keeps track of what windows are on the screen and logs it to a file you can later query; I'm not sure whether it tracks inactive windows or how it does it, but this might be a program to dig into for inspiration.

    But, are you sure you need that level of granularity? I found arbtt to be pretty invasive because it keeps track of everything (e.g. browsed websites via the window titles). Writing the rules required for fine-grained tracking (e.g. "if I'm in $EDITOR and working in ~/code/foo then that is project X") is also a bit of a pain.

    My favorite time tracker is TagTime, which just pops up a "What are you doing?"-window at random. It isn't precise enough for billing clients, but I found it to be great for getting a coarse view of where time goes.
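
    The sampling idea can be sketched in a few lines. This is not TagTime's actual code; the 45-minute average gap is what I remember as its default and should be treated as an assumption, and the function names are made up:

```ruby
MEAN_GAP_MINUTES = 45.0 # assumed average gap between pings

# Draw the waiting time until the next ping from an exponential distribution
# (inverse-transform sampling), so that pings cannot be anticipated.
def next_gap(rng, mean: MEAN_GAP_MINUTES)
  -mean * Math.log(1.0 - rng.rand)
end

# Each answered ping counts as one mean gap worth of time for its tag.
def estimate_minutes(answers, mean: MEAN_GAP_MINUTES)
  answers.tally.transform_values { |count| count * mean }
end

rng = Random.new(7)
puts Array.new(3) { next_gap(rng).round(1) }.inspect
puts estimate_minutes(%w[coding coding email coding]).inspect
```

    With only a handful of pings per day the estimates are noisy, which is why it is not precise enough for billing; over weeks they even out.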

    4 votes
  15. Comment on Programming Q&A Thread in ~comp

    DataWraith

    "I also want to break out of solely just using python but I honestly can't break out of it."

    This may sound obvious, but consider doing the same small project in several different programming languages. This allows you to compare and contrast what you like and dislike.

    In my experience, something of limited scope, but that is non-trivial, works best to learn a new programming language. I like to write game-playing programs; those can be as simple as Noughts and Crosses or Connect4 or as complicated as a chess engine.

    The best task I found was writing bots for online challenges such as Vindinium (sadly defunct now) or Halite (possibly defunct now?). It's really motivating to climb the leaderboards, and it is a task that is challenging enough to illuminate the potential of each programming language.

    If that is too time-consuming, you could look at Exercism. They have small exercises in just about any programming language.

    1 vote
  16. Comment on Programming Q&A Thread in ~comp

    DataWraith

    I'm not sure if I qualify as someone who has good design, so take the following with a grain of salt, but I find it useful to do outside-in or "wish driven" development. I think Peter Norvig articulated the idea best, but unfortunately I can't seem to find the source again right now, so I might be mistaken.

    The idea is to start with a high-level wish, "I wish the program did X". And then you write a class or procedure that does X by making additional wishes: "I wish I already had a function/class/subroutine that did Y". You complete functionality X by making use of non-existent functionality Y. Then the compiler yells at you that Y doesn't exist, so you write Y the same way you wrote X, recursing until the program is complete.

    This can be formalized using unit tests (cf. Test Driven Design). Instead of merely saying "I wish I had X", you write a test that establishes a contract for what X does before building it. This makes you think hard about what X's responsibilities are.
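
    A small made-up illustration of the order in which things get written. The word-counting task and all function names are invented for the example; in Ruby it is a NoMethodError at run time rather than a compiler that points out the missing wish:

```ruby
# Wish 1: "I wish the program listed the most common words in a text."
# Written first, in terms of helpers that did not exist yet at that point.
def top_words(text, limit)
  counts = count_words(tokenize(text))
  counts.sort_by { |word, count| [-count, word] }.first(limit)
end

# Wish 2: "I wish I had something that splits text into lowercase words."
def tokenize(text)
  text.downcase.scan(/[a-z']+/)
end

# Wish 3: "I wish I had something that counts occurrences."
def count_words(words)
  words.each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
end

# The contract for the top-level wish, written down as a check.
expected = [["the", 2], ["cat", 1]]
actual = top_words("The cat saw the dog", 2)
raise "contract broken: #{actual.inspect}" unless actual == expected
puts actual.inspect
```

    In the test-first variant, the check at the bottom would be written before top_words, and each helper would get its own contract in the same way.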

    While I often have a general idea of what I want to build, the specifics tend to only emerge as I gain understanding of the task by solving parts of it. When I'm not under time-pressure, I often go as far as writing a spike, an initial version of the program that is meant to be thrown away once you learn from it what does and doesn't work.

    5 votes
  17. Comment on What's Your Cloud/Syncing Setup for Files, Pics, Mail, Bookmarks, etc? in ~tech

    DataWraith

    I've looked into the problem of synchronizing files quite extensively.

    If you're worried about losing a device, it is best to use full-disk encryption when possible. As for access from a device you don't own... that is a tricky problem that I have no solution for.

    For the file-synchronization part alone, though, I want to mention three programs I have used or am still using, with their pros and cons as I see them. Depending on your use case, one of them might fit the bill (at least for files; calendar, email, etc. are a different matter entirely).

    Syncthing

    Syncthing is basically Dropbox minus the cloud storage. Written in Go, it's simple to install and use and, from what I can tell, it is quite secure. The downside is the "minus the cloud storage" part. You have to have at least two machines running for the data transfer to work, and while you can shard your files into different folders that are synchronized independently, it can still be a pain to make sure everything is everywhere it is supposed to be. There is an Android app that will let you move photos off-device.

    I'm no longer using Syncthing.

    KBFS

    Keybase is currently making headlines with its XLM drop, but the app does have a built-in storage mechanism that gives you 250GB of free, (apparently) S3-backed, storage, end-to-end encrypted. It works quite well; however, as I understand it, the local devices don't hold onto the data (except for a cache), so everything usually goes across the internet. It's nice for backups or selectively sharing files with other people, but sync isn't near-instantaneous, so you can't simply use it as a Dropbox replacement. It also crapped out often in the beginning; this has gotten much better, but if it does fail, you need to notice and restart the app.

    git-annex

    git-annex is... complicated. Complicated but very powerful. It combines git with a content-addressed storage system. The main selling-point of git-annex is that it supports tracking where what file is and how many copies you have. It also supports a lot of different backends: S3, Backblaze B2, KBFS, a plain SFTP server, whatever. Everything can optionally be encrypted. It allows you to utilize just about any file storage or sync service you can think of, including Sneakernet.

    Checkouts can be partial, so you don't have to store everything everywhere -- it is very convenient to be able to run "git annex drop $FILE" to get rid of something large on my limited disk-space laptop, knowing that the file is still backed up in at least three other places. Similarly, if you're missing a file, it's only a "git annex get" away; the program will fetch it from the cheapest available source. git-annex can import podcasts and download from YouTube and BitTorrent as well. It even has a Dropbox-style synchronization function, the git-annex assistant. That never worked very well for me though.

    The downside is that you're likely to get things wrong the first (or second) time you try to set everything up, especially if your data spans across different operating systems. Be sure to read the documentation thoroughly before you start. There also doesn't appear to be an Android app, so you need an actual computer to access your files. Windows works, but the NTFS filesystem isn't ideal for git-annex.

    12 votes
  18. Comment on What are some of the most emotionally affecting or resonant games you've played? in ~games

    DataWraith

    The most emotionally affecting game I have played so far is Life is Strange, Episode I.

    It's one of the few instances in a game where I actually genuinely cared about the characters. And that is the problem, because bad things happen to those characters, and you might be to blame, at least partially.

    The game is largely set at a school, and the protagonist gains the limited ability to rewind time in the first few minutes of gameplay. The game, from then on, mostly consists of endlessly repeating every scene and exploring what happens if you do or say things differently.

    At first, your choices are harmless: do you turn back time to give a correct answer to your favorite teacher? Are you going to use your ability to get even with a bully? Do you water your potted plant or not?

    At some point you have to decide and move on with the story, committing to one of the choices. As the game goes on, the choices become more momentous and can have dire consequences, and worst of all, your past choices start to come back to haunt you, especially the ones where you thought you did the right thing... it's quite upsetting, which is why I never attempted to play Episode II.

    9 votes
  19. Comment on Programming Challenge: Convert between units in ~comp

    DataWraith

    To me this sounds like a graph-search problem: I'd make every measurement unit a vertex in the graph and connect them with edges annotated with the multiplier necessary to go from one unit to the other. Then you can simply do a shortest-path search from start unit to goal unit (assuming that there is a path) and convert the initially given amount along the way.

    I whipped that solution up into a quick Ruby script, but it could probably be made nicer.

    require 'bigdecimal'
    require 'set'
    
    # Define conversions
    CONVERSIONS = [
      # 1e9nm in a meter
      [BigDecimal(1_000_000_000), "nm", "m"],
    
      # 10cm in a decimeter
      [BigDecimal(10), "cm", "dm"],
    
      # 10decimeter in a meter
      [BigDecimal(10), "dm", "m"] 
    
      # Inserting all conversions is left as an exercise for the reader
    ]
    
    # Define unit mapping (name => integer)
    UNITS = CONVERSIONS.flat_map { |c| c[1, 2] }.uniq.map.with_index { |c, idx| [c, idx]}.to_h
    
    # Create graph 
    graph = Array.new(UNITS.size) { Array.new(UNITS.size, nil) }
    
    # Fill graph with conversions
    CONVERSIONS.each do |multiplier, to, from|
      graph[UNITS[from]][UNITS[to]] = multiplier
      graph[UNITS[to]][UNITS[from]] = BigDecimal(1) / multiplier
    end
    
    # Do a BFS for the right unit
    def convert(graph, quantity, from, to)
      q = [[UNITS[from], quantity]]
      s = Set.new
    
      while q.size > 0
        cur_unit, cur_value = q.shift
        return cur_value if cur_unit == UNITS[to]
        s << cur_unit
    
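        # Enqueue every unvisited neighbor, converting the value along the edge.
        # (The block variable no longer shadows the method's `to` parameter.)
        graph[cur_unit].each_with_index do |multiplier, idx|
          next if multiplier.nil?
          next if s.include?(idx)
          q << [idx, cur_value * multiplier]
        end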
      end
    
      nil
    end
    
    print "10 nm in nm: "
    puts convert(graph, BigDecimal(10), "nm", "nm").to_s("10F") # 10.0
    
    print "100 cm in m: "
    puts convert(graph, BigDecimal(100), "cm", "m").to_s("10F") # 1.0
    
    print "1 cm in nm: "
    puts convert(graph, BigDecimal(1), "cm", "nm").to_s("10F") # 10000000.0
    
    7 votes
  20. Comment on How would trade and economics work in a space opera setting with FTL travel but no FTL communication? in ~hobbies

    DataWraith

    I thought about how communication might work in an FTL scenario like this a while back.

    It's somewhat unorganized, and there may be contradictions, because I didn't think too seriously about it, but I thought it might fit this thread.

    Encryption

    Let's assume powerful computers can crack almost any encryption scheme. There is some research into post-quantum cryptography that would withstand a working quantum computer, but let's assume that even those will be cracked. So we will need to fall back to the One Time Pad, the only theoretically unbreakable encryption.

    Key distribution

    Obviously it is relatively easy to ship the equivalent of a hard drive with encryption keys to someone else if you're a government or otherwise have access to a courier ship. You probably don't trust any single courier with the keys to the kingdom, so you split the key material up into several pieces that can be recombined (using XOR) by the ultimate recipient of the key.
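
    A minimal sketch of the XOR splitting and the one-time pad itself, using Ruby's SecureRandom as the source of key material. The function names and the example message are made up, and nothing here addresses the validation problem discussed next:

```ruby
require 'securerandom'

# XOR two equal-length byte strings.
def xor_bytes(a, b)
  raise ArgumentError, "length mismatch" unless a.bytesize == b.bytesize

  a.bytes.zip(b.bytes).map { |x, y| x ^ y }.pack("C*")
end

# Split `key` into `n` shares. All n shares are needed to rebuild the key;
# any smaller subset is indistinguishable from random noise.
def split_key(key, n)
  random_shares = Array.new(n - 1) { SecureRandom.random_bytes(key.bytesize) }
  last_share = random_shares.reduce(key) { |acc, share| xor_bytes(acc, share) }
  random_shares + [last_share]
end

# Recombine by XOR-ing all shares together.
def combine_shares(shares)
  shares.reduce { |acc, share| xor_bytes(acc, share) }
end

# One-time pad: XOR the message with an equally long, never reused key.
message = "SEND 40 CRATES OF SPICE".b
pad = SecureRandom.random_bytes(message.bytesize)

shares = split_key(pad, 3)           # one share per courier
rebuilt = combine_shares(shares)     # done by the recipient
ciphertext = xor_bytes(message, pad) # sent later over an open channel
puts xor_bytes(ciphertext, rebuilt)
```

    The pad is used up byte for byte, which is why key material becomes a consumable that has to be shipped in bulk ahead of time.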

    Then there is the question of how to validate the key material. This depends on how the encryption-cracking computers work, but I guess in the worst case, you'd send some armed goons along with the hard drive who keep an eye on it to ensure it is not tampered with.

    Message compression

    Communication becomes somewhat expensive if you have to rely on consumable encryption keys. So messages would probably be sent in the old telegram style, with audio and video reserved for matters of importance.

    Money

    Once you have secure communication, you can theoretically send money. It might be limited to one bank, because there will perhaps not be an n-by-n matrix of encryption keys for every bank pair, so you'd have a separate currency for every bank out there.

    Message distribution

    But how do you get a message from A to B if you cannot afford a courier ship? Using a P2P network!

    Given that even today, we can fit a lot of data into small spaces, it is likely that any ship computer will have free space. Messages can hitch a ride on any ship that is going anywhere by attaching a 'recipient pays x amount of money on delivery, to the first one delivering' offer. The computer can then decide whether it is worth carrying a kilobyte-sized message (and how many of them), in case its route matches the recipient address. More remote deliveries would be worth more money, and a chain of intermediate payments could make it more likely for a message to arrive -- kind of like onion routing with financial incentives. Whenever two ships (or a ship and a station) meet, they'd exchange mail. The mail would simply expire after a while if storage got full, with the least valuable deleted first, of course.
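
    A rough sketch of the storage side of this, just to make the eviction rule concrete. Everything here (Message, Mailbag, the fields) is invented for the example, and route matching and the payment chain are left out entirely:

```ruby
# A message hitching a ride: the bounty is what the recipient pays on delivery.
Message = Struct.new(:recipient, :bounty, :size_kb)

# Hypothetical ship mailbag with a fixed storage budget.
class Mailbag
  attr_reader :messages

  def initialize(capacity_kb)
    @capacity_kb = capacity_kb
    @messages = []
  end

  # Accept a message, then evict the least valuable ones until everything fits.
  def accept(message)
    @messages << message
    @messages.sort_by! { |m| -m.bounty }
    @messages.pop while used_kb > @capacity_kb
  end

  # When two ships meet, each offers the other a copy of all of its mail.
  def exchange_with(other)
    mine = @messages.dup
    other.messages.dup.each { |m| accept(m) unless @messages.include?(m) }
    mine.each { |m| other.accept(m) unless other.messages.include?(m) }
  end

  private

  def used_kb
    @messages.sum(&:size_kb)
  end
end

freighter = Mailbag.new(2)
freighter.accept(Message.new("Vega Station", 10, 1))
freighter.accept(Message.new("Luna", 1, 1))
freighter.accept(Message.new("Ceres", 5, 1)) # bag is full, the Luna mail is dropped
puts freighter.messages.map(&:recipient).inspect
```

    Copies spread to every ship that has room, and since only the first delivery is paid, carriers are effectively racing each other.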

    5 votes