GeneticAlgorithms.pdf

Genetic Algorithms

• History (John Holland 1960)

• Heuristic search strategy

• Essential parts of Genetic Algorithms

– Representation: chromosome

• Candidate solution to a problem, encoded as a bit string

– Fitness function: Fx

• Applied to each chromosome in a population to check its fitness(closeness to the desired result)

0100110001001

Genetic Algorithms

– Operators:

• Crossover

• Mutation

10 15 3 80 20 1 16 30

10 15 20 1 3 80 16 30

10 15 3 80 10 15 60 80

Parent 1 Parent 2

Child 1 Child 2

Market Basket problem

• Market basket as an association problem for machine learning

• Problem: To determine four items that are most often purchased together out of a set of 100 different items

• Ground Work:

– Creating a synthetic retailer database

– Implanting specific relationships in the database. Items 11,22,33,44 were implanted each with a 90% probability i.e. 65.6% if taken together

Implementation

• GAlib: Matthew Wall - GA library

• Chromosome: Binary String

– A multi-parameter coding, consisting of four sub- strings representing four item numbers

• Fitness function:

– Chromosomes which match all its sub strings to four most occurred, potentially related items are the most fit

0110001 0001001 0010001 0110010

49 9 17 50

Implementation

• Operators:

– Crossover

• Single point/Mid point

– Mutation

• Random

• Simulations:

– Crossover operator: Single point/Mid point

– Overlap: GASimpleGA/GASteadyStateGA

– Population Size

– Mutation Rate

Results

• Crossover operators

Four related items (11,22,33,44) were identified in

both cases. Midpoint crossover identified the solution

in 6th generation while Single point crossover

identified it in the 9th generation.

Results

• Overlapping population:

Four related items (11,22,33,44) were identified in 6th

generation for a Steady State GA, while the simulation

converged on a false solution for a SimpleGA. This could

be because of population overlap in case of Steady State

GA that replaces the worst chromosomes in each

generation.

Results

• Population size:

If population size is too less it becomes hard for the GA to

find solution because of lack of variety in chromosomes.

Also, a huge population size is detrimental in effect

because the potential solution may get lost in the huge

search space or take long time. Population size of 1000

yielded solution in 9th generation as compared to 14th

generation for a population size of 500.

• Mutation Rate:

An increased mutation could be useful in flushing out a solution stuck on local optima, but a very high rate could lead to losing important combinations, that could lead to potential solutions. A 0.001 mutation rate yielded the result in 9th generation as compared to 13th generation for a higher mutation rate of 0.010

Results

Summary

• Does it work

– In some cases, where optimal set of parameters were

used

– Where data can be encoded as a bit string of fixed

length

– Where search space is big and the solution is not

required

– Where the results need to be explainable