DataWraith's recent activity

  1. Comment on Riot Games' new Vanguard anti-cheat system for Valorant involves a kernel mode driver that launches at boot, raising security concerns in ~games

    DataWraith
    (edited )
    Link Parent

    You're right, there is no bulletproof way (as I already conceded), but I thought interleaving the checksums was a neat technical trick -- cracking that is not as simple as just stubbing out all the call-sites with dummy functions that return pre-recorded hashes. (Edit: I realized that stubbing out all the call-sites -- assuming you can find them -- would accomplish a crack...)

    What I'm proposing looks more like hacking the function that determines the checksum.

    The Gamasutra article I linked talks about that. The point is that the checksums cross-check each other so you can't modify one checksum calculation (to make it lie about the checksum) without modifying all others too, because they'll detect the modification of the first one. It's not bulletproof, but it takes time to crack.

    3 votes
  2. Comment on Riot Games' new Vanguard anti-cheat system for Valorant involves a kernel mode driver that launches at boot, raising security concerns in ~games

    DataWraith
    Link Parent

    This is only tangentially related, but I thought it was fascinating: You can interleave multiple checksums to make it harder to modify any single one, which is what was done for the Spyro: Year of the Dragon copy protection:

    multiple checksums were applied to the same data. Each checksum used a different start offset into the data, and stepped through the data by different amounts. This meant that overlapping and interleaved sections of data were checksummed at different points, making it almost impossible to alter anything and still have all the checksums add up.

    (Source)
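    The scheme is easy to sketch. Below is a toy illustration of my own (not the actual Spyro code): several checksums over the same buffer, each with its own start offset and stride, so a single byte change trips more than one of them at once.

```python
def checksum(data: bytes, start: int, step: int) -> int:
    """Sum every `step`-th byte, beginning at `start`, modulo 2**32."""
    return sum(data[start::step]) % 2**32

# Stand-in for a block of protected game data.
data = bytes(range(50)) * 4

# Several checksums over the same buffer, each with its own start offset
# and stride, so their coverage interleaves and overlaps.
checks = [(0, 3), (1, 4), (2, 5), (7, 2)]
expected = [checksum(data, s, k) for s, k in checks]

# Flipping one byte now disturbs several of the interleaved sums at once,
# so patching a single checksum routine is not enough to hide the change.
tampered = bytearray(data)
tampered[12] ^= 0xFF
failed = [i for i, (s, k) in enumerate(checks)
          if checksum(bytes(tampered), s, k) != expected[i]]
# failed == [0, 2]: two independent checks catch the single-byte edit
```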

    That said, even that was eventually cracked -- as you say, there is likely nothing that can be done to prevent user modifications once they are in possession of the binary.

    3 votes
  3. Comment on Neuroevolution of Self-Interpretable Agents in ~comp

    DataWraith
    Link

    This is an incredibly cool and creative project.

    They take a flaw in human mental processing (inattentional blindness) and make a virtue out of it by forcing the learning agent to pay attention to no more than 10 image patches at once ("attention bottleneck"). This results in a neural network that has very few parameters, making it possible to evolve its weights with an evolutionary algorithm. And since it only pays attention to 10 different things, its 'thought process' can even be visualized.
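    As a toy sketch of such an attention bottleneck (my own illustration, not the paper's actual architecture): score all image patches, then let only the K highest-scoring ones through to the downstream controller.

```python
import random

K = 10  # attention bottleneck: the controller may only "see" 10 patches

def patch_scores(patches):
    # Stand-in for the learned scoring function (e.g. self-attention
    # logits); here just a sum of pixel values for illustration.
    return [sum(p) * 0.01 for p in patches]

# 64 fake image patches, each a small list of pixel values.
random.seed(0)
patches = [[random.random() for _ in range(16)] for _ in range(64)]

scores = patch_scores(patches)
# Keep only the indices of the K best-scoring patches; everything else
# is invisible to the rest of the agent -- that is the bottleneck.
top_k = sorted(range(len(patches)), key=lambda i: scores[i],
               reverse=True)[:K]
```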

    NOTE: Page has flickering images

    3 votes
  4. Comment on Self-hosting a tiny git remote in ~comp

    DataWraith
    Link

    The author has a server running 24/7 so I'm not sure what using Syncthing on top of that really accomplishes vs. just using SSH to host bare repos on that server.

    For my own use, I've set up gitolite on a Raspberry Pi, and it is fantastic. I especially like the feature that you don't have to create a repository manually before cloning it, as an empty one gets automatically created just-in-time. This is compatible with my endless stream of small projects I don't necessarily want to share.

    3 votes
  5. Comment on What programming/technical projects have you been working on? in ~comp

    DataWraith
    Link

    I've been playing around with Snorkel.

    Snorkel is a project from Stanford DAWN; DAWN aims to democratize AI by making it easier for hobbyists or smaller companies to build AI-powered applications.

    One of the problems DAWN identified for small-scale practitioners (apart from the lack of compute power) is the lack of training data. Companies like Google have a huge workforce that labels training data for their machine learning pipelines (even involuntarily through ReCaptcha), but as a hobbyist, I don't have access to anything like that.

    Snorkel tries to improve the situation through data programming: you write small functions in Python that heuristically classify an unlabeled dataset. If you did this naively (as I have attempted in the past) you probably tried to assign each heuristic function a score-value ("if this review contains 'worst' that's -3 points on the sentiment axis"), but then you need to tune those scores manually, which gets intractable quickly as their interactions multiply.

    Snorkel's approach is to learn a probabilistic graphical model of your heuristics -- it measures when and how they conflict and estimates how reliable they are. This works entirely on unlabeled data, and the result is a set of pseudo-labels for the entire dataset you can then train an ML model with. The idea here is that the trained model might learn to generalize beyond what the labeling functions themselves provide.
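    A stripped-down illustration of the idea in plain Python (Snorkel's real API looks different, and a simple majority vote stands in here for its learned generative model):

```python
ABSTAIN, NEG, POS = -1, 0, 1

# Tiny keyword heuristics in the spirit of labeling functions.
def lf_worst(review):  return NEG if "worst" in review else ABSTAIN
def lf_great(review):  return POS if "great" in review else ABSTAIN
def lf_boring(review): return NEG if "boring" in review else ABSTAIN

lfs = [lf_worst, lf_great, lf_boring]

def pseudo_label(review):
    votes = [v for v in (lf(review) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN  # no heuristic fired; leave the example unlabeled
    # Majority vote; Snorkel instead fits a generative model that weighs
    # each heuristic by its estimated accuracy and correlations.
    return max(set(votes), key=votes.count)

reviews = ["worst movie ever", "great fun, never boring", "just okay"]
labels = [pseudo_label(r) for r in reviews]
```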

    I've so far experimented with an IMDB sentiment dataset, and after a few hours of fiddling and with ~100 simple keyword heuristics, reached an accuracy of 78%.
    I think that is quite good given that I did not need to hand-label 25000 movie reviews -- using the provided ground-truth labels works better of course (86% accuracy on the test set), but the entire point of Snorkel is that you don't need perfect ground-truth annotations to train a useful model.

    7 votes
  6. Comment on Thoughts on performance & optimization in ~comp

    DataWraith
    Link Parent

    Embedded programming and game consoles are an exception since then you're targeting fixed hardware, so it's easier. Sometimes I wonder if we should have VM's that provide guaranteed, standardized performance like game emulation engines?

    This reminded me of a lecture game developer Casey Muratori gave a while back. It's long, and it's been a while since I've seen it (so the following may be inaccurate), but the thesis he puts forward is that there are too many lines of code between a game and the hardware (operating system, graphics drivers, etc.), which makes it impossible to squeeze every last drop of performance out, as you would have to understand the whole stack -- and of course the stack is different for different users.

    In the end he proposes a versioned hardware platform, sort of like x86 provides a versioned instruction set. He envisions a game console-like system that has fixed specs and is programmed basically from bare-metal up, allowing you to have full control over the machine in order to eke out the last bit of performance -- I like to think of the idea as a standardized machine for running game-unikernels.

    I'm not sure how practical all this is (probably not very), but it's still fascinating to think about.

    5 votes
  7. Comment on Signal is finally bringing its secure messaging to the masses in ~tech

    DataWraith
    Link Parent

    Yes, that makes sense.

    I find it interesting that the SMS/data plan balance is exactly inverted in the two countries. If you go over your data allowance here, you're generally throttled to a lower speed but don't lose access or have to pay extra. SMS on the other hand cost about 0.10€ per message.

    2 votes
  8. Comment on Signal is finally bringing its secure messaging to the masses in ~tech

    DataWraith
    Link Parent

    Thank you for that counterpoint, maybe I'm simply caught up in my own filter bubble.

    I thought Riot was leagues ahead of Signal, even for one-on-one chats, but with the caveat that I haven't used the latter in a while. I generally don't really use SMS anymore, and that seems to hold for most people I know (Germany) -- WhatsApp is free, whereas SMS still cost money. You generally get some free messages per month (depending on your contract), but those can be used up quickly in a back-and-forth chat, so people are almost exclusively using WhatsApp, rarely Telegram or, in my case, Matrix.

    I'm sorry to hear you had such a bad time on Matrix; I mostly use it to communicate with a few trusted friends (using a self-hosted server), and we're also using it for communication at work as a Slack replacement.

    The porn channels struck me as a mere annoyance; there is going to be some spam in a decentralized system, after all. I can see how this can be a turn-off, but then again, I'm not browsing random channels very often, so it's rarely a problem.

    5 votes
  9. Comment on Signal is finally bringing its secure messaging to the masses in ~tech

    DataWraith
    Link

    On the one hand I admire Signal for its cryptography and privacy features in general, but on the other hand, the user experience always has been severely lacking.

    The article sounds like they are trying to change exactly that, but I think it's too little, too late. A billion users within five years sounds downright ludicrous as a goal.

    Personally I stopped using it a long time ago because of their restrictive policy of allowing no more than three desktop clients. Dual-booting a single machine of course takes up two slots, so I was limited to two machines total.

    That limit has apparently since been raised to five, but it's still strange. Is it a technical limitation? Is it so you can recognize when foreign devices are added to your account? They could at least tell you why the limit is in place, but "There is a limit of 5 linked devices. Confirm you have not hit this limit." is all I could find about it on their website just now.

    Call me a pessimist, but with WhatsApp having introduced e2e crypto and Matrix.org rapidly approaching maturity (with device cross-signing and end-to-end by-default on the horizon), I can only see Signal dying a slow death.

    11 votes
  10. Comment on What is your favorite opening scene in a movie? in ~movies

    DataWraith
    (edited )
    Link

    I love the opening scene of Contact.

    Technically this could be construed as a spoiler:

    A view of the earth. Silence, then rock music. A slow zoom out from earth through the entire solar system, as the radio transmissions get farther and farther back in history. Then the camera is so far away from earth that any sign of humanity is absent. More silence. The entire Milky Way comes into view. Then you see that it is just a single galaxy among its peers, and not necessarily the grandest one. And then even that fades into obscurity as we move away from it. Millions of galaxies fill the screen. And then we end up transitioning back to earth through a reflection in the eye of young Ellie Arroway, at the very start of her story...

    8 votes
  11. Comment on What programming/technical projects have you been working on? in ~comp

    DataWraith
    Link

    I finally managed to beat LunarLander-v2 (see my last post for details) in about 400 episodes and about 90 minutes of runtime on a CPU.

    The winning solution was a bit unexpected, as it is a Deep Q-Network variant, Implicit Quantile Networks (IQN). The big difference between DQN and IQN is that the latter is a distributional algorithm: it does not just try to estimate the mean of the rewards for each action, but the entire distribution, which helps when the reward distribution is multimodal. If you imagine an action that could either be very good or very bad, then simply taking the mean is going to be an inaccurate characterization of that action.

    In contrast to DQN, I find IQN difficult to comprehend. From what I understand, it works by transforming uniform random samples from the [0, 1]-interval into estimates of the reward distribution at the sampled quantile. That is, you pick a random number, say 0.6, and the network gives you the expected reward (for each possible action) at the 60th percentile of that action's reward distribution.

    In order to act in the environment, you draw several samples and average them as a characterization of each action. I have no idea why this works so much better than just estimating the mean in the first place (other than the intuition I gave above), but it does, and the spaceship quickly lands safely.
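    A toy sketch of that sampling procedure (with a hand-written quantile function and made-up numbers in place of a trained network): one fake "risky" bimodal action illustrates what the quantile view captures, and action selection averages several quantile samples per action.

```python
import random

ACTIONS = ["noop", "left", "main", "right"]

def quantile_value(tau, action):
    # Stand-in for the trained IQN network: maps a quantile level
    # tau in [0, 1] to the estimated return of `action` at that quantile.
    # The made-up "main" action is bimodal: crash or perfect landing.
    if action == "main":
        return -100.0 if tau < 0.5 else 120.0
    return 10.0  # the other actions are safely mediocre

def select_action(n_samples=32, seed=0):
    rng = random.Random(seed)
    best_action, best_value = None, float("-inf")
    for action in ACTIONS:
        # Characterize each action by averaging several quantile samples.
        taus = [rng.random() for _ in range(n_samples)]
        value = sum(quantile_value(t, action) for t in taus) / n_samples
        if value > best_value:
            best_action, best_value = action, value
    return best_action
```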

    I have some more studying to do if I want to thoroughly understand why exactly it works, but I'm glad that I finally finished my quest to find solutions for LunarLander on both ends of the time vs. sample-efficiency trade-off.

    6 votes
  12. Comment on YAML: probably not so great after all in ~comp

    DataWraith
    Link Parent

    There's also Hjson, though I'm not sure I'd want to use either of them.

    It just seems strange to use JSON like that -- if you need to import a custom library to deal with a file format such as Hjson, you may just as well use a library that reads a file format that is easier to read and modify for humans (INI, TOML, Dhall, etc.).

    2 votes
  13. Comment on What programming/technical projects have you been working on? in ~comp

    DataWraith
    Link

    I've been learning a lot about reinforcement learning. In particular, I've become somewhat obsessed with the OpenAI Gym LunarLander-v2 environment. As the name implies, your algorithm controls a small spacecraft that is supposed to land on a landing pad in the center of the screen by firing its directional and main thrusters at appropriate times.

    From what I can tell, the environment is considered solved when your average score over the past 100 episodes reaches or exceeds 200. I've seen several reports of people solving the environment within 600 episodes, which is something I still can't do. Sometimes I suspect they don't use the same criterion for calling the environment solved, but that is hard to verify.
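    The criterion itself is simple to state in code (a sketch of the convention, not the Gym API):

```python
from collections import deque

def make_solved_checker(window=100, threshold=200.0):
    """Track episode scores; report solved once the mean of the last
    `window` episodes reaches `threshold` (the usual Gym convention)."""
    scores = deque(maxlen=window)
    def record(score):
        scores.append(score)
        return len(scores) == window and sum(scores) / window >= threshold
    return record

record = make_solved_checker()
solved = [record(s) for s in [250.0] * 99]  # 99 good episodes: not yet
done = record(250.0)  # the 100th episode completes the window
```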

    There is an interesting tension between sample efficiency (few frames/episodes) and wall-clock time (few minutes). At the wall-clock time end of the spectrum, I implemented the Cross-Entropy Method with a linear policy, and it reliably solves the environment in about 10 minutes (on a single CPU), but it can run through 2000+ episodes while doing so.

    Aside: That it works so well is somewhat surprising; LunarLander-v2 is very well-suited for gradient-based algorithms due to its dense reward structure, but the Cross-Entropy Method is gradient-free. It works more like an Evolution Strategy or Genetic Algorithm in that it only cares about the sum of rewards over the entire episode and ignores the temporal distribution of the rewards.
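    For reference, here is roughly what the Cross-Entropy Method boils down to (a generic sketch with a toy objective standing in for the LunarLander episode return):

```python
import random

def cem(evaluate, dim, iters=30, pop=50, elite_frac=0.2, seed=0):
    """Cross-Entropy Method: sample parameter vectors from a diagonal
    Gaussian, keep the top elite fraction by score, refit mean/std."""
    rng = random.Random(seed)
    mean, std = [0.0] * dim, [1.0] * dim
    n_elite = int(pop * elite_frac)
    for _ in range(iters):
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(pop)]
        # Only the total score of each sample matters -- no gradients,
        # no per-timestep reward information.
        elite = sorted(samples, key=evaluate, reverse=True)[:n_elite]
        mean = [sum(e[i] for e in elite) / n_elite for i in range(dim)]
        std = [max(1e-3, (sum((e[i] - mean[i]) ** 2 for e in elite)
                          / n_elite) ** 0.5) for i in range(dim)]
    return mean

# Toy stand-in for an episode return, maximized at theta = (2, -1).
best = cem(lambda th: -sum((a - b) ** 2
                           for a, b in zip(th, [2.0, -1.0])), dim=2)
```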

    At the other extreme is Bootstrapped Dual Policy Iteration. From eyeballing the charts in the paper, it seems to come close to solving the environment within 600 episodes, but it is incredibly slow. It took 10 hours to simulate 1000 episodes on my machine, and sadly it had only reached a score average of about 130 at that time.

    I've thrown a lot of different algorithms at the problem over the past six months (REINFORCE, Proximal Policy Optimization, Augmented Random Search, Advantage Weighted Regression, several DQN variants, A2C, UDRL, and probably a couple more I'm forgetting -- I should get a life...). Some are quite complicated, and others surprisingly simple. At times it is extremely frustrating, because you have to re-read and re-read a paper until you figure out how exactly the pieces fit together, but once everything is in place and the spaceship lands, it feels great.

    3 votes
  14. Comment on reCAPTCHA: Is there method in monotony? in ~tech

    DataWraith
    Link Parent

    The concept is referred to as "Human Computation" by Luis von Ahn, the original founder of reCAPTCHA and co-founder of Duolingo (which also makes use of human computation). He gave an interesting Google Tech Talk about it in 2006. It's basically about how to motivate people to do work for free, such as by packaging it as a game.

    3 votes
  15. Comment on Europe Is Officially out of IPv4 Addresses in ~tech

    DataWraith
    Link Parent

    Yggdrasil is great, but a heads-up if you're new to it and trying it out: you probably want to configure the SessionFirewall in the config file, otherwise everyone else in the Yggdrasil network can connect directly to anything listening on the Yggdrasil tunnel interface (i.e. you lose the "protection" of NAT, just like with real IPv6).

    2 votes
  16. Comment on What programming/technical projects have you been working on? in ~comp

    DataWraith
    Link

    I've been using PyTorch to train neural networks to do OCR on and off over the last few months.
    This was prompted by a problem we had at work that was not solvable using the open source OCR engine Tesseract due to a noisy document background.
    So I thought I'd apply my nascent machine-learning expertise to the problem in my spare time.

    OCR of machine-printed documents is apparently considered a solved problem, so there is surprisingly little information to be found that goes beyond "use Tesseract".
    There are a bajillion scientific papers about text spotting (finding text in natural images) and scene-text recognition (recognizing said text in images once cropped) though. Most of them are kind of complicated...

    The simplest approach I found, Facebook's Rosetta, simply wires a convolutional neural network (Resnet18) to directly output character-class probabilities for each vertical slice of the image using CTC -- Connectionist Temporal Classification. That's a method that optimizes a neural network to output the correct character sequence without one having to annotate where each character is in the image.

    However, since CTC decoding collapses repeated characters (the "ll" in "hello", for instance), the networks I ended up training had a lot of difficulty producing them. CTC introduces a blank character to separate such doubled characters, but the network almost never managed to emit it, for reasons I have yet to understand.
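    Concretely, greedy CTC decoding merges consecutive repeats and then strips the blank, so a doubled letter is unrecoverable unless the network emits a blank between the two copies (toy illustration):

```python
BLANK = "-"

def ctc_collapse(raw):
    """Greedy CTC decoding: merge consecutive repeats, then drop blanks."""
    out, prev = [], None
    for ch in raw:
        if ch != prev:
            out.append(ch)
        prev = ch
    return "".join(c for c in out if c != BLANK)

# Without a blank between the two l's, the repeat is merged away:
a = ctc_collapse("hheelllloo")   # -> "helo"
# Only an emitted blank preserves the doubled letter:
b = ctc_collapse("hheell-lloo")  # -> "hello"
```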

    I was stuck for a long while, until I reluctantly gave up on the idea of simplicity and brought out the big guns: encoder-decoder LSTMs enhanced with an attention mechanism.

    And that finally worked.

    The next step is to apply the network design to the actual problem, not to the easier proxy-task I've been using for development.

    4 votes
  17. Comment on We’re excited to unveil Half-Life: Alyx, our flagship VR game, this Thursday at 10am Pacific Time. in ~games

    DataWraith
    Link Parent

    If difficulty is your only concern, you should definitely give it a try!

    Most of the videos on YouTube are recorded on the highest difficulty settings (or even custom maps, which tend toward insanely hard) because that's what looks impressive.

    The base game has five difficulty settings:

    • Easy
    • Normal
    • Hard
    • Expert
    • Expert+

    Easy and Normal are quite doable even for someone playing for the first time. Hard is challenging; you'll have to practice the songs. Expert is very hard; sometimes the notes come almost faster than I can physically move -- but I'm not a very fit person. Expert+ is beyond my abilities.

    The game also has a practice mode that lets you slow down the songs or disable obstacles.

    4 votes
  18. Comment on What is your favourite tv show ? in ~talk

    DataWraith
    Link

    Stargate SG-1. It had its highs and its lows, but it is still my overall favorite. It doesn't take itself too seriously and has a lot of fun moments, but can still build suspense despite you knowing things will work out in the end.

    2 votes
  19. Comment on What programming/technical projects have you been working on? in ~comp

    DataWraith
    Link Parent

    That's incredibly cool!

    I've been reading many semantic segmentation papers lately while trying to improve an NN for localizing key fields on badly scanned documents. From what I read, DeepLab v3 is apparently considered a bit heavy/slow, so many lighter models have been devised for real-time use on drones or vehicles.

    As an aside: the semantic segmentation images in your YouTube video remind me of Stanley, Stanford's winning entry to the second DARPA Grand Challenge -- they used their LIDAR scanners to map out flat terrain in front of the vehicle and then extrapolated from that what the road looked like all the way to the horizon in real-time. It's amazing that that can be done from monocular images nowadays.

    3 votes