PHYTON
Genetic Algorithms
• History (John Holland 1960)
• Heuristic search strategy
• Essential parts of Genetic Algorithms
– Representation: chromosome
• Candidate solution to a problem, encoded as a bit string
– Fitness function: Fx
• Applied to each chromosome in a population to check its fitness(closeness to the desired result)
0100110001001
Genetic Algorithms
– Operators:
• Crossover
• Mutation
10 15 3 80 20 1 16 30
10 15 20 1 3 80 16 30
10 15 3 80 10 15 60 80
Parent 1 Parent 2
Child 1 Child 2
Market Basket problem
• Market basket as an association problem for machine learning
• Problem: To determine four items that are most often purchased together out of a set of 100 different items
• Ground Work:
– Creating a synthetic retailer database
– Implanting specific relationships in the database. Items 11,22,33,44 were implanted each with a 90% probability i.e. 65.6% if taken together
Implementation
• GAlib: Matthew Wall - GA library
• Chromosome: Binary String
– A multi-parameter coding, consisting of four sub- strings representing four item numbers
• Fitness function:
– Chromosomes which match all its sub strings to four most occurred, potentially related items are the most fit
0110001 0001001 0010001 0110010
49 9 17 50
Implementation
• Operators:
– Crossover
• Single point/Mid point
– Mutation
• Random
• Simulations:
– Crossover operator: Single point/Mid point
– Overlap: GASimpleGA/GASteadyStateGA
– Population Size
– Mutation Rate
Results
• Crossover operators
Four related items (11,22,33,44) were identified in
both cases. Midpoint crossover identified the solution
in 6th generation while Single point crossover
identified it in the 9th generation.
Results
• Overlapping population:
Four related items (11,22,33,44) were identified in 6th
generation for a Steady State GA, while the simulation
converged on a false solution for a SimpleGA. This could
be because of population overlap in case of Steady State
GA that replaces the worst chromosomes in each
generation.
Results
• Population size:
If population size is too less it becomes hard for the GA to
find solution because of lack of variety in chromosomes.
Also, a huge population size is detrimental in effect
because the potential solution may get lost in the huge
search space or take long time. Population size of 1000
yielded solution in 9th generation as compared to 14th
generation for a population size of 500.
• Mutation Rate:
An increased mutation could be useful in flushing out a solution stuck on local optima, but a very high rate could lead to losing important combinations, that could lead to potential solutions. A 0.001 mutation rate yielded the result in 9th generation as compared to 13th generation for a higher mutation rate of 0.010
Results
Summary
• Does it work
– In some cases, where optimal set of parameters were
used
– Where data can be encoded as a bit string of fixed
length
– Where search space is big and the solution is not
required
– Where the results need to be explainable