9 votes

What programming/technical projects have you been working on?

This is a recurring post to discuss programming or other technical projects that we've been working on. Tell us about one of your recent projects, either at work or personal projects. What's interesting about it? Are you having trouble with anything?

20 comments

  1. [15]
    crdpa
    (edited )
    Link

    I wrote an XKCD password generator for fun today. I'm a beginner so feel free to give me tips.

    Here it is.

    Edit: fixed the program to use the secrets module instead of random. Doing this, I found another problem and took some time to come up with a solution, but I think it's ready now. Thanks everyone!

    5 votes
    1. [4]
      Deimos
      (edited )
      Link Parent
      • Exemplary

      A bunch of advice (a lot of this will be overkill for such a simple script, but I'm assuming you're interested in more general Python/programming advice):

      Generally, you should try not to write code in Python at the "top level"; that is, almost everything should be inside a function or class. The main reason for this is that if something else imports your code (similar to the way you did import secrets), all of the top-level code in the imported module (file) gets executed at that point. So hypothetically, if someone imported your program into a different one to use one of its functions, it would run the code (and almost certainly cause the other program to crash). The "proper"/idiomatic way to structure the program is to have a function named main() and call it like this:

      def main():
          # main program code goes in here
          ...
      
      if __name__ == "__main__":
          main()
      

      The two lines at the bottom are a (totally unintuitive) way of saying "if this program is being executed directly (not imported), call main()". Almost every program that's intended to be run as a script should be structured this way.

      Similarly, you should try to avoid global variables, and especially be careful about modifying them. In this case, a lot of it is just because you wrote the program globally like I described above, but you should still be careful about doing things like removing words from allWords by using .pop(), especially when you're making changes inside a function that didn't create those variables itself. It's hard to keep track of what's happening in a program if functions are changing "outside" variables without them being passed in or returned.

      For example, imagine that you decided to change the program to be able to generate multiple passwords. This would be awkward with your current method, because each time you generate a password, all of the words used in it get removed from allWords, and you might end up with any further passwords losing access to those words (or needing to re-generate the word list each time). It would be better to try to keep all the words in the list. I'll come back to this later with specific advice about what to do instead.

      Also, be careful about putting too much into one function, especially when it makes the function's name no longer describe what it's doing. Your only function is createDict(), but it's doing a lot more than creating a dict. It also generates the password, so it would be much better to have at least two functions: one for generating the word list, and one that uses that word list to generate a password.

      Try to make sure you're using the most appropriate types as much as possible too, since Python has a very good standard library and it will generally have useful functions available as long as your data is in the right type. For example, you're using a dict for the word list, but should just be using a list, since the keys aren't really needed for anything. That makes it easy to use the secrets.choice() function to get a random word, instead of needing to generate a random number for the key.
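
      To make the difference concrete, here is a small sketch comparing the two approaches. The sample words and the names words and words_by_index are made up for illustration and are not taken from your program.

```python
import secrets

# Hypothetical sample data, for illustration only.
words = ["correct", "horse", "battery", "staple"]

# Dict keyed by position: a random key has to be generated first.
words_by_index = dict(enumerate(words))
key = secrets.randbelow(len(words_by_index))
word_from_dict = words_by_index[key]

# Plain list: secrets.choice() picks an element directly.
word_from_list = secrets.choice(words)

print(word_from_dict, word_from_list)
```

      Both lines end up with a random word, but the list version needs no bookkeeping about keys.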

      Without getting off into the weeds and messing around with some of the more tangential stuff like command-line arguments, here's how I'd write the program:

      import secrets
      import sys
      
      
      def build_word_list(filename):
          word_list = []
      
          with open(filename) as words_file:
              for line in words_file:
                  word = line.strip()
      
                  # only include words without a hyphen and at least 4 characters long
                  if "-" not in word and len(word) >= 4:
                      word_list.append(word)
      
          return word_list
      
      
      def generate_password(word_list, num_words, sep="-"):
          chosen_words = []
      
          for _ in range(num_words):
              random_word = secrets.choice(word_list)
      
              # keep choosing words until we get one that we haven't already chosen
              while random_word in chosen_words:
                  random_word = secrets.choice(word_list)
      
              chosen_words.append(random_word)
      
          return sep.join(chosen_words)
      
      
      def main():
          words_filename_by_language = {
              "en": "words.txt",
              "br": "palavras.txt",
          }
      
          # display help and exit if not called with a valid language code
          if len(sys.argv) != 2 or sys.argv[1] not in words_filename_by_language:
              script_name = sys.argv[0]
              print(f"""Usage:
                  {script_name} br (brazilian portuguese)
                  {script_name} en (english)
              """)
              sys.exit(1)
      
          word_list = build_word_list(words_filename_by_language[sys.argv[1]])
          password = generate_password(word_list, num_words=4)
      
          print(password)
      
      
      if __name__ == "__main__":
          main()
      

      I tried to write it in a straightforward way and not change anything significantly from how your program currently works. There would definitely be fancier ways to do some of this stuff (for example, if you want to see how to do command-line arguments "properly", take a look at the argparse module).
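
      For reference, a rough sketch of what the argparse version of the argument handling might look like. The parse_args() helper and the --num-words option are my own additions for illustration; only the two language codes come from the program above.

```python
import argparse

# Same mapping as in main() above.
WORDS_FILENAME_BY_LANGUAGE = {
    "en": "words.txt",
    "br": "palavras.txt",
}


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Generate an XKCD-style password.")
    # argparse prints a usage message and exits if the code is not one of the choices.
    parser.add_argument(
        "language",
        choices=sorted(WORDS_FILENAME_BY_LANGUAGE),
        help="language code of the word list to use",
    )
    # Hypothetical extra option, not part of the original script.
    parser.add_argument("--num-words", type=int, default=4)
    return parser.parse_args(argv)


args = parse_args(["en"])
print(args.language, args.num_words)
```

      Passing argv explicitly like this is only for the demonstration; calling parse_args() with no argument reads sys.argv as usual.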

      I could write a bunch more about why I did things a certain way or the specific changes I made, but I've already rambled a lot. Feel free to ask questions about anything though, and I'll be happy to explain more.

      8 votes
      1. crdpa
        Link Parent

        Wow! Thanks a lot! This was a huge help.

        I always wondered about the main in Python code I saw out there. I can understand having a main function like other languages have (I was learning Go at some point), but since I saw a lot of Python code both with and without it, I got confused. I thought people wrote Python code like shell scripts.

        So, anyway, your explanation really clears up a lot of things for me. I will read it again tomorrow morning with more time and come back to you if I have any questions.

        Thanks a lot!

        4 votes
      2. [2]
        crdpa
        (edited )
        Link Parent

        Since the number of words is always 4 (following XKCD guidelines), why not put num_words=4 in the arguments of the generate_password function like you did with the separator?

        Also, is there any book or resource I can read about good practices when programming? Like when to put something somewhere and the other things you described.

        2 votes
        1. Deimos
          Link Parent

          Adding 4 as the default for num_words would be totally fine, there's not really any strong reason I did it that way. It felt like something that made sense for the caller to need to specify, but having a default makes sense too.
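
          With the default added, the function would look something like this. The sample_words list is made up purely to show the two ways of calling it.

```python
import secrets


def generate_password(word_list, num_words=4, sep="-"):
    chosen_words = []

    for _ in range(num_words):
        random_word = secrets.choice(word_list)

        # keep choosing words until we get one that we haven't already chosen
        while random_word in chosen_words:
            random_word = secrets.choice(word_list)

        chosen_words.append(random_word)

    return sep.join(chosen_words)


# Hypothetical word list, just to show both call styles.
sample_words = ["apple", "breeze", "candle", "dolphin", "ember", "forest"]
default_password = generate_password(sample_words)
longer_password = generate_password(sample_words, num_words=5)
print(default_password, longer_password)
```

          Callers that are happy with 4 words can leave the argument out, and anything else can still override it.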

          There are a ton of books and sites and articles about programming practices, but I think one of the difficult parts is that it's often hard to tell whether particular advice is actually good or not. It's probably been 10 years since I read it now, but I remember Code Complete being a good book with a lot of solid general advice. A lot of it will just come from experience though, learning what makes programs easy or hard to work with (in both your own programs and others').

          3 votes
    2. [10]
      whbboyd
      Link Parent

      I see you're using random for your randomness source. It uses an insecure pseudorandom number generator and isn't appropriate for use in secure contexts, e.g. generating passwords. The documentation includes a warning to this effect and suggests the secrets module instead, which has a similar enough API that I believe you can just swap it in.
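
      A small sketch of why this matters. It only demonstrates that random is fully determined by its internal state (here, a seed I picked arbitrarily); it is not an attack, and the word list is made up.

```python
import random
import secrets

words = ["apple", "breeze", "candle", "dolphin", "ember", "forest"]

# Two generators seeded with the same value produce identical "random" picks.
first = random.Random(1234)
second = random.Random(1234)
picks_first = [first.choice(words) for _ in range(4)]
picks_second = [second.choice(words) for _ in range(4)]
print(picks_first == picks_second)  # True

# secrets draws from the operating system's randomness source and has no seed.
secure_pick = secrets.choice(words)
print(secure_pick)
```

      Anyone who can recover or guess the generator's state can reproduce the same picks, which is exactly what you don't want for a password.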

      This is actually a really common footgun in many programming languages. Back in the '90s, a number of online poker platforms were exploitable due to weak random number generators. Rust specifically chose a secure PRNG as the system default even though it's much slower to try to protect programmers not familiar with this issue.

      5 votes
      1. [3]
        crdpa
        Link Parent

        Thanks! I'll change that today.

        But is that true in my case too? random is just being used to pick a random word from a dictionary of words, not to randomly generate a password.

        Does this mean that the chance of random picking the same words as another person is high?

        4 votes
        1. [2]
          novov
          Link Parent

          Yes, probably. I haven't looked at your code, but using a proper RNG for passwords is always a good idea. You could think of each possible word as equivalent to a character in an n-letter alphabet (where n is the number of words in your list), and then it's not very different from generating a random password character by character.
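
          The alphabet analogy can be turned into a quick calculation. This is only a sketch: the list size of 7776 words is an assumption for the sake of the arithmetic, and it treats every word as an independent, uniformly random pick (repeats allowed).

```python
import math


def passphrase_bits(alphabet_size, length):
    # Each pick is one "character" from an alphabet of alphabet_size symbols,
    # so there are alphabet_size ** length equally likely results.
    return length * math.log2(alphabet_size)


# Assumed sizes, for illustration: 4 words from a 7776-word list,
# compared with 8 random lowercase letters.
four_words = passphrase_bits(7776, 4)
eight_letters = passphrase_bits(26, 8)
print(round(four_words, 1), round(eight_letters, 1))
```

          The numbers only hold if the picks really are unpredictable, which is the reason to use a secure source.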

          5 votes
          1. crdpa
            (edited )
            Link Parent

            I just finished reading the article linked above about the poker exploit and it explains this.

            That was a great read.

            I'll fix this after breakfast.

            4 votes
      2. [5]
        crdpa
        (edited )
        Link Parent

        Done! Could you check this snippet for me? It is working, but maybe you can see something I don't.

        for x in range(0, 4):
            # randomize a number below the length of the dictionary of words
            k = secrets.randbelow(i)
            # if k is in dictionary and it's not the first loop,
            # regenerate the random number
            if k in allWords and x != 0:
                k = secrets.randbelow(i)
                password.append(allWords[k])
                allWords.pop(k)
            else:
                password.append(allWords[k])
                allWords.pop(k)
        return
        
        4 votes
        1. [4]
          whbboyd
          Link Parent

          You have a potential KeyError here when you do password.append(allWords[k]): consider what happens if k happens to be the same twice in a row. Depending on order, you'll generate this specific k, pop it from allWords, generate it again, observe that it's not in allWords, and then in the else block, attempt to look it up again.

          Perhaps you meant the condition on the if to be if k not in allWords…? In that case, the x != 0 would be redundant (since by construction, allWords initially contains every possible k), and the issue could still be triggered if the same k is generated three times in a row.
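
          Here is the failure in isolation, as a minimal sketch. The three-word dict is a made-up stand-in for allWords, and k is fixed by hand instead of being random so the problem shows up every time.

```python
# Hypothetical three-word dict standing in for allWords.
all_words = {0: "apple", 1: "breeze", 2: "candle"}

k = 1
first_word = all_words.pop(k)  # the key 1 is now gone from the dict

try:
    all_words[k]  # looking up the same key a second time
    raised = False
except KeyError:
    raised = True

print(first_word, raised)
```

          In the real program the repeat only happens occasionally, which is why it can look like it works.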

          It's possible to fix this, and I'll describe how in a moment, but first, what I'd actually recommend is using the choice function from secrets, which I believe has the same behavior (other than source of randomness) as random.choice and neatly sidesteps the issue; and, because it's at a higher level of abstraction, it's clearer what the code is trying to do, as well.

          To keep this general structure, but avoid the potential to reselect the same word twice, you need to keep picking ks until you get one that won't KeyError when you look it up from allWords. This immediately suggests using a loop instead of an if, and Python actually provides an unusual loop/else structure that lets you immediately drop in a while loop:

          for x in range(0, 4):
              k = secrets.randbelow(i)
              while k not in allWords:
                  k = secrets.randbelow(i)
              else:
                  password.append(allWords[k])
                  allWords.pop(k)
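
          Since the while loop never hits a break, its else branch always runs once the loop ends, so the same logic can also be written without it. A self-contained sketch, assuming that allWords is a dict keyed 0..n-1 and that i is its original length (pick_words and the sample data are my own names):

```python
import secrets


def pick_words(all_words, count=4):
    # all_words is assumed to be a dict keyed 0..n-1, like allWords above.
    # It is modified in place, matching the original snippet.
    # Assumes count <= len(all_words); otherwise the while loop never ends.
    upper = len(all_words)  # plays the role of i
    password = []
    for _ in range(count):
        k = secrets.randbelow(upper)
        # keep drawing until k is a key that has not been popped yet
        while k not in all_words:
            k = secrets.randbelow(upper)
        password.append(all_words.pop(k))
    return password


sample = dict(enumerate(["apple", "breeze", "candle", "dolphin", "ember"]))
print(pick_words(sample))
```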
          

          By the way, I hope this doesn't come across as me ragging on you or your code! I think it's great to see people getting into programming, and my hope is that by describing my observations and thought processes, I can help you to improve. Don't let anyone try to convince you that bugs or mistakes in code are a sign of incompetence or lack of talent. I've been doing this for well over a decade, and I regularly make mistakes that are far more obvious in retrospect than this.

          6 votes
          1. Deimos
            (edited )
            Link Parent

            Here's a pretty interesting possibility that I just figured out, since I was curious about the internals:

            The random module has a function random.sample() that gets a random selection of unique elements from a collection, which is exactly what we're trying to do. It also has choices(), which gets random non-unique elements, which would work too if we're okay with the possibility of a repeated word (which should be totally fine for this use, but that's a different topic).

            Regardless, the secrets module doesn't have either of those functions, and only has choice() like you mentioned, so there's no "native" way to get multiple choices in one function when you're using secrets. However, if you look at the actual source code for the secrets module, all it does is create an object of the SystemRandom class from the random module, and sets secrets.choice to that object's .choice() method.

            So looking into the random module's code, we can see that SystemRandom is a subclass of Random that overrides a few methods to have a better source of randomness, but is otherwise the same as the base random implementation. That means we can get access to the sample() function using the same more-secure randomness source by just doing this instead of using secrets:

            from random import SystemRandom
            
            secure_random = SystemRandom()
            secure_random.sample(word_list, k=4)
            

            (assuming word_list is a list of the words like the way I wrote it in my comment's implementation, but you could replace it with list(allWords.values()) to use the original dict setup)
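
            If you want to check this yourself, something like the following should work. It leans on a CPython implementation detail (secrets.choice being a bound method of a SystemRandom instance), so treat it as a curiosity rather than something to rely on; the word list is made up.

```python
import secrets
from random import SystemRandom

# In CPython, secrets.choice is a bound method of a module-level
# SystemRandom instance; __self__ is the object it is bound to.
is_system_random = isinstance(secrets.choice.__self__, SystemRandom)
print(is_system_random)

# sample() on a SystemRandom returns unique picks from the same OS-backed source.
word_list = ["apple", "breeze", "candle", "dolphin", "ember", "forest"]
picked = SystemRandom().sample(word_list, k=4)
print(picked)
```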

            Also this is all completely unnecessary and overkill for this, but I just thought it was interesting to find out.

            5 votes
          2. crdpa
            Link Parent

            No, you don't come across like that at all. I ask because I want to learn and your answer was really awesome. I learned a lot.

            These things get too confusing sometimes, and another person explaining using my code is a lot better than just reading documentation.

            Thanks!

            4 votes
          3. unknown user
            Link Parent

            I'm not OP, but this was a great response, and I learnt a bit here too. Great explanation.

            2 votes
      3. Moonchild
        Link Parent

        "Rust specifically chose a secure PRNG as the system default even though it's much slower to try to protect programmers not familiar with this issue."

        As, perhaps more relevantly, did OpenBSD. (The libc rand forwards to arc4random by default, unless you explicitly call srand_deterministic.)

        2 votes
  2. Wulfsta
    Link

    I finally have TensorFlow for ROCm building on NixOS, if anyone with an AMD GPU and NixOS wants to test it out. I have been working on getting this building for some time now, and am pretty sure that this is a viable path to moving ROCm support into nixpkgs.

    4 votes
  3. [3]
    joplin
    Link

    I've been doing more work on my Genetic Programming. I've got a basic framework up and running for generating GPU shaders to modify input images and I've got the framework to test the results and rank them. I'm working on the genetic operations to generate new generations.

    I've created a Program object that is a list of Statement objects. For crossover, I'm picking a point in each list of Statements and swapping everything beyond that point. The problem is that each Statement works with a given variable in the Program it started out in. That variable may not exist in the new Program. So one option is to bring over all the variables from the old program to the new. Another is to use an existing variable from the new program. Or, we can just call the offspring invalid. But that seems inefficient. I'm not sure how much difference it makes to the outcome of the process which method I choose, but it should be interesting.
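
    As a rough sketch of the second option (rebinding to a variable that exists in the new program), the crossover could look something like this. It is written in Python for brevity, and the Statement fields, the operation names and the repair rule are all assumptions for illustration, not the actual implementation. It also assumes there is at least one input variable.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    target: str      # variable the statement writes
    op: str          # hypothetical operation name
    operands: tuple  # variables the statement reads


def repair(statements, inputs, rng):
    """Rebind operands that are not defined earlier in the program."""
    known = sorted(inputs)
    repaired = []
    for s in statements:
        operands = tuple(o if o in known else rng.choice(known) for o in s.operands)
        repaired.append(Statement(s.target, s.op, operands))
        if s.target not in known:
            known.append(s.target)
    return repaired


def crossover(parent_a, parent_b, inputs, rng):
    # One cut point per parent; the tails are swapped and then repaired.
    cut_a = rng.randrange(len(parent_a) + 1)
    cut_b = rng.randrange(len(parent_b) + 1)
    child_a = repair(parent_a[:cut_a] + parent_b[cut_b:], inputs, rng)
    child_b = repair(parent_b[:cut_b] + parent_a[cut_a:], inputs, rng)
    return child_a, child_b


def is_valid(program, inputs):
    """True if every operand is an input or was written by an earlier statement."""
    known = set(inputs)
    for s in program:
        if any(o not in known for o in s.operands):
            return False
        known.add(s.target)
    return True


rng = random.Random(0)  # seeded so runs are repeatable
a = [Statement("t0", "blur", ("img",)), Statement("t1", "add", ("t0", "img"))]
b = [Statement("u0", "invert", ("img",)), Statement("u1", "mul", ("u0", "u0"))]
child_a, child_b = crossover(a, b, inputs=["img"], rng=rng)
print(is_valid(child_a, ["img"]), is_valid(child_b, ["img"]))
```

    Every offspring stays valid this way, at the cost of sometimes changing what a transplanted Statement computes.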

    3 votes
    1. [2]
      DataWraith
      Link Parent

      I hope I'm not spoiling your fun with this, in case you want to do more experimenting yourself, but there is some prior work that used Cartesian Genetic Programming to evolve GPU shaders for image denoising, and it may be worth checking out.

      Instead of representing the shaders as a sequence of instructions, they construct a graph structure and then translate that back to program code.

      Each gene encodes a function node that refers back to previous input or function nodes in the graph. This makes variables implicit.
      Genes are encoded as 4 (I think) floating point values per function -- you can then use normal crossover on them, although it is rather common to just not use crossover at all with CGP and rely solely on mutation.
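
      To illustrate the "variables are implicit" part, here is a toy decoder in Python. It is a simplification and not the encoding from that work: genes are integer triples instead of floating point values, and the function set is made up.

```python
# Hypothetical function set; a real shader would use image operations.
FUNCTIONS = [
    lambda a, b: a + b,
    lambda a, b: a - b,
    lambda a, b: a * b,
    lambda a, b: max(a, b),
]


def evaluate(genome, inputs, output_index):
    """Evaluate a feed-forward graph given as (function, source_a, source_b) genes."""
    values = list(inputs)
    for func_id, src_a, src_b in genome:
        # Sources are reduced modulo the number of values available so far,
        # so every gene is valid no matter what mutation produces.
        a = values[src_a % len(values)]
        b = values[src_b % len(values)]
        values.append(FUNCTIONS[func_id % len(FUNCTIONS)](a, b))
    return values[output_index % len(values)]


genome = [(0, 0, 1), (2, 2, 2), (1, 3, 0)]
result = evaluate(genome, inputs=[2.0, 3.0], output_index=-1)
print(result)
```

      Each node only ever refers to positions earlier in the list, so there are no named variables that could go missing after crossover or mutation.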

      3 votes
      1. joplin
        Link Parent

        Not spoiling it at all! Thanks very much for the links!

        2 votes
  4. jgb
    (edited )
    Link

    I have been working on my config language that I mentioned in the last of these threads, JACL.

    I've streamlined the syntax and semantics for v0.2 and re-implemented it in Rust.

    https://github.com/jgbyrne/jacl-rs/

    I would like to hear people's feedback on how they think it compares to existing alternatives like TOML and YAML.

    In case you don't feel like clicking through to the spec here is the lead example.

    servers {
        freenode {
            name = "Freenode"
            addr = "chat.freenode.org"
            port = 6667
            nick = "martin"
        }
    
        rizon {
            name = "Rizon"
            addr = "irc.rizon.net"
            port = 9999
            nick = "sarah"
        }
    
        default = freenode
    }
    
    filters [
        {
            server = "freenode"
            user   = "matthew"
            action = "ignore"
        }
    
        {
            server = "rizon"
            user   = "carl"
            action = "highlight"
        }
    ]
    
    2 votes