What programming/technical projects have you been working on?
This is a recurring post to discuss programming or other technical projects that we've been working on. Tell us about one of your recent projects, either at work or personal projects. What's interesting about it? Are you having trouble with anything?
I wrote an XKCD password generator for fun today. I'm a beginner, so feel free to give me tips.
Here it is.
Edit: fixed the program to use the `secrets` module instead of `random`. While doing this, I found another problem and took some time to come up with a solution, but I think it's ready now. Thanks everyone!
A bunch of advice (a lot of this will be overkill for such a simple script, but I'm assuming you're interested in more general Python/programming advice):
Generally, you should try not to write code in Python at the "top level"; that is, almost everything should be inside a function or class. The main reason for this is that if something else imports your code (similar to the way you did `import secrets`), all of the top-level code in the imported module (file) gets executed at that point. So hypothetically, if someone imported your program into a different one to use one of its functions, it would run the code (and almost certainly cause the other program to crash). The "proper"/idiomatic way to structure the program is to have a function named `main()` and call it from the bottom of the file with the two lines `if __name__ == "__main__":` and `main()`.
Those two lines are a (totally unintuitive) way of saying "if this program is being executed directly (not imported), call `main()`". Almost every program that's intended to be run as a script should be structured this way.
Similarly, you should try to avoid global variables, and especially be careful about modifying them. In this case, a lot of it is just because you wrote the program globally like I described above, but you should still be careful about doing things like removing words from `allWords` by using `.pop()`, especially when you're making changes inside a function that didn't create those variables itself. It's hard to keep track of what's happening in a program if functions are changing "outside" variables without them being passed in or returned.
For example, imagine that you decided to change the program to be able to generate multiple passwords. This would be awkward with your current method, because each time you generate a password, all of the words used in it get removed from `allWords`, and you might end up with any further passwords losing access to those words (or needing to re-generate the word list each time). It would be better to try to keep all the words in the list. I'll come back to this later with specific advice about what to do instead.
Also, be careful about putting too much into one function, especially when it makes the function's name no longer describe what it's doing. Your only function is `createDict()`, but it's doing a lot more than creating a dict. It also generates the password, so it would be much better to have at least two functions: one for generating the word list, and one that uses that word list to generate a password.
Try to make sure you're using the most appropriate types as much as possible too, since Python has a very good standard library and it will generally have useful functions available as long as your data is in the right type. For example, you're using a `dict` for the word list, but should just be using a `list`, since the keys aren't really needed for anything. That makes it easy to use the `secrets.choice()` function to get a random word, instead of needing to generate a random number for the key.
Without getting off into the weeds and messing around with some of the more tangential stuff like command-line arguments, here's how I'd write the program:
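The listing itself is not reproduced here. A minimal sketch along the lines described above might look like the following. The small inline word list and the names `SAMPLE_WORDS`, `load_word_list()` and `generate_password()` are assumptions for illustration; a real script would load a much larger list from a file.

```python
import secrets

# Hypothetical stand-in word list; a real script would read a large list
# from a file instead.
SAMPLE_WORDS = ["correct", "horse", "battery", "staple",
                "orange", "window", "river", "candle"]


def load_word_list():
    """Return the candidate words as a plain list (no dict keys needed)."""
    return list(SAMPLE_WORDS)


def generate_password(word_list, num_words, separator=" "):
    """Pick num_words words with secrets.choice() and join them.

    The word list is never modified, so it can be reused for further
    passwords. A word may be picked more than once.
    """
    words = [secrets.choice(word_list) for _ in range(num_words)]
    return separator.join(words)


def main():
    word_list = load_word_list()
    print(generate_password(word_list, 4))


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported.
    main()
```

Because nothing is popped from the list, calling `generate_password()` several times in a row works without rebuilding the word list.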
I tried to write it in a straightforward way and not change anything significantly from how your program currently works. There would definitely be fancier ways to do some of this stuff (for example, if you want to see how to do command-line arguments "properly", take a look at the `argparse` module).
I could write a bunch more about why I did things a certain way or the specific changes I made, but I've already rambled a lot. Feel free to ask questions about anything though, and I'll be happy to explain more.
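For the curious, a rough sketch of what the `argparse` route could look like. The option names `--num-words` and `--separator` and the helper name `parse_args()` are made up for this example and are not part of the program discussed above.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options; argv=None means "use sys.argv[1:]"."""
    parser = argparse.ArgumentParser(
        description="Generate an XKCD-style password.")
    # Hypothetical options mirroring the two knobs discussed in the thread.
    parser.add_argument("--num-words", type=int, default=4,
                        help="number of words in the password")
    parser.add_argument("--separator", default=" ",
                        help="string placed between the words")
    return parser.parse_args(argv)
```

Passing an explicit list, as in `parse_args(["--num-words", "5"])`, makes it easy to try without a shell; the result exposes the values as `args.num_words` and `args.separator`.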
Wow! Thanks a lot! This was a huge help.
I always wondered about the main in Python code I saw out there, because I can understand having a main function like other languages have (I was learning Go at some point), but since I saw a lot of Python code out there both with and without it, I got confused. I thought people wrote Python code like shell scripts.
So, anyway, your explanation really clears up a lot of things for me. I will read it again tomorrow morning with more time and come back to you if I have any questions.
Thanks a lot!
Since the number of words is always 4 (following XKCD guidelines), why not put `num_words=4` in the arguments of the generate_passwords function like you did with the separator?
Also, is there any book or resource I can read about good practices when programming? Like when to put something somewhere and the other things you described.
Adding 4 as the default for `num_words` would be totally fine, there's not really any strong reason I did it that way. It felt like something that made sense for the caller to need to specify, but having a default makes sense too.
There are a ton of books and sites and articles about programming practices, but I think one of the difficult parts is that it's often hard to tell whether particular advice is actually good or not. It's probably been 10 years since I read it now, but I remember Code Complete being a good book with a lot of solid general advice. A lot of it will just come from experience though, learning what makes programs easy or hard to work with (in both your own programs and others').
I see you're using `random` for your randomness source. It uses an insecure pseudorandom number generator, and isn't appropriate for use in secure contexts, e.g. generating passwords. The documentation includes a warning to this effect, suggesting the use of the `secrets` module instead, which has a similar-enough API that I believe you can just replace it.
This is actually a really common footgun in many programming languages. Back in the '90s, a number of online poker platforms were exploitable due to weak random number generators. Rust specifically chose a secure PRNG as the system default, even though it's much slower, to try to protect programmers not familiar with this issue.
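To make the difference concrete, here is a small sketch. The word list and the seed value are arbitrary choices for the demonstration.

```python
import random
import secrets

words = ["correct", "horse", "battery", "staple"]  # hypothetical sample list

# random uses a seedable generator: the same seed replays the same picks,
# which is exactly what you do not want for a password.
random.seed(1234)
first = [random.choice(words) for _ in range(4)]
random.seed(1234)
second = [random.choice(words) for _ in range(4)]
print(first == second)  # True: the "random" words are reproducible

# secrets.choice() has the same call shape but draws from the operating
# system's cryptographically secure source and cannot be seeded.
secure = [secrets.choice(words) for _ in range(4)]
```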
Thanks! I'll change that today.
But is that true in my case too? `random` is just being used to pick a random word from a dictionary of words, not to generate the password character by character.
Does this mean that the chance of `random` picking the same words for me as for another person is high?
Yes, probably - I haven't looked at your code, but using a proper RNG for passwords is always a good idea. You could think of each possible word as equivalent to a character in an n-letter alphabet (where n is the number of words in your list), and it's not very different.
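The "n-letter alphabet" view can be put into numbers. A small sketch, assuming every word is drawn uniformly and independently; the 2048-word list size in the usage note is an assumption, not something stated about the program above.

```python
import math


def passphrase_entropy_bits(list_size, num_words):
    """Bits of entropy when each word is drawn uniformly and independently.

    Each word behaves like one character from an alphabet of list_size
    symbols, so the entropy adds up per word.
    """
    return num_words * math.log2(list_size)
```

For example, `passphrase_entropy_bits(2048, 4)` gives 44.0 bits, since each word from a 2048-word list contributes 11 bits. That figure only holds if the picks are unpredictable, which is why the randomness source matters.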
I just finished reading the article linked above about the poker exploit and it explains this.
That was a great read.
I'll fix this after breakfast.
Done! Could you check this snippet for me? It is working, but maybe you can see something I don't.
You have a potential `KeyError` here when you do `password.append(allWords[k])`: consider what happens if `k` happens to be the same twice in a row. Depending on order, you'll generate this specific `k`, pop it from `allWords`, generate it again, observe that it's not in `allWords`, and then in the `else` block, attempt to look it up again.
Perhaps you meant the condition on the `if` to be `if k not in allWords…`? In that case, the `x != 0` would be redundant (since by construction, `allWords` initially contains every possible `k`), and the issue could still be triggered if the same `k` is generated three times in a row.
It's possible to fix this, and I'll describe how in a moment, but to preface, what I'd actually recommend is using the `choice` function from `secrets`, which I believe has the same behavior (other than source of randomness) as `random.choice` and neatly sidesteps the issue; and, because it's at a higher level of abstraction, it's clearer what the code is trying to do, as well.
To keep this general structure, but avoid the potential to reselect the same word twice, you need to keep picking `k`s until you get one that won't `KeyError` when you look it up from `allWords`. This immediately suggests using a loop instead of an `if`, and Python actually provides an unusual loop/`else` structure that lets you immediately drop in a `while` loop.
By the way, I hope this doesn't come across as me ragging on you or your code! I think it's great to see people getting into programming, and my hope is that by describing my observations and thought processes, I can help you to improve. Don't let anyone try to convince you that bugs or mistakes in code are a sign of incompetence or lack of talent. I've been doing this for well over a decade, and I regularly make mistakes that are far more obvious in retrospect than this.
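The `while`/`else` retry loop described above could be sketched roughly as follows. The snippet under review is not shown here, so this assumes `all_words` is a dict keyed by the integers 0 to N-1 and uses a hypothetical function name; it is not a reconstruction of the actual code.

```python
import secrets


def pick_unique_words(all_words, num_words):
    """Pop num_words distinct words from a dict keyed by 0..N-1.

    Mutates all_words, like the approach being discussed.
    """
    if num_words > len(all_words):
        raise ValueError("not enough words left to pick from")
    key_range = len(all_words)  # keys were assigned before any pops
    password = []
    for _ in range(num_words):
        k = secrets.randbelow(key_range)
        while k not in all_words:
            # This key was already used, so draw again.
            k = secrets.randbelow(key_range)
        else:
            # Runs once the while condition is false: k is still available.
            password.append(all_words.pop(k))
    return password
```

The `else` branch is not strictly needed here (the code after the loop would run anyway), but it keeps the shape of the original `if`/`else`.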
Here's a pretty interesting possibility that I just figured out, since I was curious about the internals:
The `random` module has a function `random.sample()` that gets a random selection of unique elements from a collection, which is exactly what we're trying to do. It also has `choices()`, which gets random non-unique elements, which would work too if we're okay with the possibility of a repeated word (which should be totally fine for this use, but that's a different topic).
Regardless, the `secrets` module doesn't have either of those functions, and only has `choice()` like you mentioned, so there's no "native" way to get multiple choices in one function when you're using `secrets`. However, if you look at the actual source code for the `secrets` module, all it does is create an object of the `SystemRandom` class from the `random` module, and set `secrets.choice` to that object's `.choice()` method.
So looking into the `random` module's code, we can see that `SystemRandom` is a subclass that overrides a few methods to have a better source of randomness, but is otherwise the same as the base `random` implementation. That means we can get access to the `sample()` function using the same more-secure randomness source by calling it on a `SystemRandom` object instead of using `secrets` (assuming `word_list` is a list of the words like the way I wrote it in my comment's implementation, but you could replace it with `list(allWords.values())` to use the original dict setup).
Also this is all completely unnecessary and overkill for this, but I just thought it was interesting to find out.
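A minimal sketch of that idea; the contents of `word_list` are a hypothetical stand-in.

```python
import random

word_list = ["correct", "horse", "battery", "staple",
             "orange", "window"]  # hypothetical sample list

# SystemRandom uses the operating system's randomness source, and it
# inherits sample() from random.Random, so the picks are unique.
secure_rng = random.SystemRandom()
words = secure_rng.sample(word_list, 4)
print(" ".join(words))
```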
No, you don't come across like that at all. I ask because I want to learn, and your answer was really awesome. I learned a lot.
These things get too confusing sometimes, and another person explaining using my code is a lot better than just reading documentation.
Thanks!
I'm not OP, but this was a great response, and I learnt a bit here too. Great explanation.
As, perhaps more relevantly, did OpenBSD. (The libc `rand` forwards to arc4random by default, unless you explicitly call `srand_deterministic`.)
I finally have TensorFlow for ROCm building on NixOS, if anyone with an AMD GPU and NixOS wants to test it out. I have been working on getting this building for some time now, and am pretty sure that this is a viable path to moving ROCm support into `nixpkgs`.
I've been doing more work on my Genetic Programming. I've got a basic framework up and running for generating GPU shaders to modify input images and I've got the framework to test the results and rank them. I'm working on the genetic operations to generate new generations.
I've created a Program object that is a list of Statement objects. For crossover, I'm picking a point in each list of Statements and swapping everything beyond that point. The problem is that each Statement works with a given variable in the Program it started out in. That variable may not exist in the new Program. So one option is to bring over all the variables from the old program to the new. Another is to use an existing variable from the new program. Or, we can just call the offspring invalid. But that seems inefficient. I'm not sure how much difference it makes to the outcome of the process which method I choose, but it should be interesting.
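As a rough illustration of the second option (rebinding to a variable that exists in the new program), here is a sketch. The `Statement` fields, the function names and the repair strategy are all assumptions for the example; the actual framework is not shown in the post.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Statement:
    target: str      # variable written by this statement
    op: str          # operation name, e.g. "add" or "mul"
    operands: tuple  # names of the variables this statement reads


def repair(statements, inputs, rng):
    """Rebind operands whose variable does not exist at that point.

    inputs must be non-empty (e.g. the input image variable).
    """
    available = list(inputs)
    repaired = []
    for stmt in statements:
        operands = tuple(
            name if name in available else rng.choice(available)
            for name in stmt.operands
        )
        repaired.append(replace(stmt, operands=operands))
        if stmt.target not in available:
            available.append(stmt.target)
    return repaired


def one_point_crossover(parent_a, parent_b, inputs, rng=random):
    """Pick a cut point in each parent and swap everything beyond it."""
    cut_a = rng.randrange(len(parent_a) + 1)
    cut_b = rng.randrange(len(parent_b) + 1)
    child_a = parent_a[:cut_a] + parent_b[cut_b:]
    child_b = parent_b[:cut_b] + parent_a[cut_a:]
    return repair(child_a, inputs, rng), repair(child_b, inputs, rng)
```

Every offspring stays valid this way, at the cost of the repaired statements meaning something different from what they meant in the parent.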
I hope I'm not spoiling your fun with this, in case you want to do more experimenting yourself, but there is some prior work that successfully evolved GPU shaders and may be worth checking out:
There is some prior work that used Cartesian Genetic Programming to evolve GPU shaders for image denoising.
Instead of representing the shaders as a sequence of instructions, they construct a graph structure and then translate that back to program code.
Each gene encodes a function node that refers back to previous input or function nodes in the graph. This makes variables implicit.
Genes are encoded as 4 (I think) floating point values per function -- you can then use normal crossover on them, although it is rather common to just not use crossover at all with CGP and rely solely on mutation.
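To show how the graph encoding makes variables implicit, here is a toy sketch. It uses integer genes and plain arithmetic rather than the floating-point encoding and shader code mentioned above, and the function table and names are assumptions for illustration only.

```python
import operator

# Hypothetical function table; a gene's first field indexes into it.
FUNCTIONS = [operator.add, operator.sub, operator.mul, max]


def evaluate_cgp(genome, inputs):
    """Evaluate a genome of (function_index, source_a, source_b) genes.

    Sources index into the inputs followed by the outputs of earlier
    nodes, so no variable names are needed. The last node is the output.
    """
    values = list(inputs)
    for func_index, src_a, src_b in genome:
        values.append(FUNCTIONS[func_index](values[src_a], values[src_b]))
    return values[-1]
```

With inputs `[3, 4]`, the genome `[(0, 0, 1), (2, 2, 0)]` computes `(3 + 4) * 3`, i.e. 21. Mutating a gene only changes which function or which earlier node it points at, so any mutated genome is still a valid program as long as the indices stay in range.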
Not spoiling it at all! Thanks very much for the links!
I have been working on my config language that I mentioned in the last of these threads, JACL.
I've streamlined the syntax and semantics for v0.2 and re-implemented it in Rust.
https://github.com/jgbyrne/jacl-rs/
I would like to hear people's feedback on how they think it compares to existing alternatives like TOML and YAML.
In case you don't feel like clicking through to the spec here is the lead example.