Introduction to Genetic Algorithms
Genetic algorithms can be used to solve problems that are difficult, or impossible to solve with traditional algorithms. Much like neural networks, they provide good-enough solution in short amount of time, but rarely find the best one. While they're not as popular as neural networks nor as widely used, they still have their place, as we can use them to solve complicated problems very fast, without expensive training rigs and with no knowledge of math.
Genetic algorithms can be used for variety of tasks, for example for determining the best radio antenna shape, aerodynamic shapes of cars and planes, wind mill shapes, or various queing problems. We'll use it to print "Hello, World!".
How does it work?
Genetic algorithm works in three steps.
- Generate random solutions
- Test how good they are
- Pick the best ones, breed and mutate them, go to step 2
It works just like evolution in nature. First, we generate randomised solutions to our problem (in this case: random strings of letters).
Then, we test each solution and give it points, where better solutions gain more points. In our problem, we would give one point for each correct letter in the string.
Afterwards, we pick the best solutions and breed it together (just combine the strings). It's not bad idea to mutate (or randomize) the string a bit.
We collect the offsprings, and repeat the process until we find good enough solution.
Generate random solutions
First of all, we need to decide in which form we will encode our solutions. In this case, it will be simply string. If we wanted to build race cars, we would encode each solution (each car) as array of numbers, where first number would be size of the first wheel, the second number would be size of the second wheel, etc. If we wanted to build animals that try to find food, fight and survive, we would choose a decision tree (something like this).
So let's start and make few solutions, or entities. One hundred should be enough.
from random import randint goal = "Hello, World!" allowed_characters = list("qwertyuiopasdfghjklzxcvbnmQWERTYUIOPASDFGHJKLZXCVBNM ,!") def get_random_entity(n, string_length): entities =  for _ in range(0, n): entity = "" for _ in range(0, string_length): entity += allowed_characters[randint(0, len(allowed_characters)-1)] entities.append(entity) return entities print(get_random_entity(100, 13))
Test how good they are
This is called a "fitness function". Fitness function determines how good a solution is, be it a car (travel distance), animal (food gathered), or a string (number of correct letters).
The most simple function we can use right now will simply count correct letters. If we wanted, we could make something like Levenshtein distance instead.
def get_fitness(entity): points = 0 for i in range(0, len(entity)): if goal[i] == entity[i]: points += 1 return points
Crossover and mutation
Now it's time to select the best ones and throw away the less fortunate entities. Let's order entities by their fitness.
Crossover is a process, when we take two entities (strings) and breed them to create new one. For example, we could just give the offspring one part from one parent and another part from second parent.
There are many ways how to do this, and I encourage you to try multiple approaches when you will be doing something like this.
P: AAAABBB|BCCCC P: DDDDEEE|FGGGG F1: AAAABBB|FGGGG
Or we can just choose at random which letter will go from which parent, which works the best here. After we have the offsprint (
F1), we should mutate it. What if we were unfortunate, and
H (which we need for our
Hello, World!) was not in any of the 100 entities? So we take the string and for each character of the string, there is a small chance to mutate it - change it at random.
F1: ADDDEBEFGCGG F1`: ADHDEBEFGCGG
And it's done. Now kill certain part of old population. I don't know which percentage is best, but I usually kill about 90% of old population. The 90% that we killed will be replaced by new offsprings.
There is just one more thing: which entities do we select for crossover? It isn't bad idea - and it generally works just fine - to just give better entities higher chance to breed.
def get_offspring(first_parent, second_parent, mutation_chance): new_entity = "" for i in range(0, len(first_parent)): if randint(0, 100) < mutation_chance: new_entity += allowed_characters[randint(0, len(allowed_characters)-1)] else: if randint(0, 1) == 0: new_entity += first_parent[i] else: new_entity += second_parent[i] return new_entity
When we add everything together, we get this output:
Generation 1, best score: 2 ::: QxZPjoptHfNgX Generation 2, best score: 3 ::: XeNlTOQuAZjuZ Generation 3, best score: 4 ::: weolTSQuoZjuK Generation 4, best score: 5 ::: weTgnC uobNdJ Generation 5, best score: 6 ::: weTvny uobldb Generation 6, best score: 6 ::: HellSy mYbZdC Generation 7, best score: 7 ::: selOoXBWoAKn! Generation 8, best score: 8 ::: HeTloSoWYZlh! Generation 9, best score: 8 ::: sellpX WobKd! Generation 10, best score: 9 ::: welloq WobSdb Generation 11, best score: 9 ::: selloc WoZjd! Generation 12, best score: 10 ::: wellxX WoVld! Generation 13, best score: 10 ::: welltX World! Generation 14, best score: 10 ::: welltX World! Generation 15, best score: 10 ::: welltX World! Generation 16, best score: 11 ::: zellov Wobld! Generation 17, best score: 11 ::: Hellty World! Generation 18, best score: 11 ::: welloX World! Generation 19, best score: 11 ::: welloX World! Generation 20, best score: 11 ::: welloX World! Generation 21, best score: 12 ::: welloX World! Generation 22, best score: 12 ::: Helloy World! Generation 23, best score: 12 ::: Helloy World! Generation 24, best score: 12 ::: Helloy World! Generation 25, best score: 12 ::: Helloy World! Generation 26, best score: 12 ::: Helloy World! Generation 27, best score: 12 ::: Helloy World! Generation 28, best score: 12 ::: Helloy World! Generation 29, best score: 12 ::: Helloy World! Generation 30, best score: 12 ::: Helloy World! Generation 31, best score: 12 ::: Helloy World! Generation 32, best score: 12 ::: Helloy World! Generation 33, best score: 12 ::: Helloy World! Generation 34, best score: 13 ::: Helloy World! Generation 35, best score: 13 ::: Hello, World!
As we can see, we find pretty good solution very fast, but it takes very long to find perfect solution. The complete code is here.
When we solve difficult problems, it starts to be increasingly important to maintain diversity. When all your entities are basically the same (which happened in this example), it's difficult to find other solutions than those that are almost the same as the currently best one. There might be a much better solution, but we didn't find it, because all solutions that are different to currently best one are discarded. Solving this is the real challenge of genetic algorithms. One of the ideas is to boost diverse solutions in fitness function. So for every solution, we compute distance to the current best solutions and add bonus points for distance from it.