pangadfs
pangadfs is a pandas-based genetic algorithm framework for fantasy sports, written in Python. It uses a plugin architecture to enable maximum flexibility while also providing a fully functional implementation of a genetic algorithm for lineup optimization.
Documentation: https://sansbacon.github.io/pangadfs/
Source Code: https://github.com/sansbacon/pangadfs
The key pangadfs features are:
- Fast: takes advantage of pandas and numpy to generate thousands of lineups quickly.
- Extensible: any desired functionality can be added with a straightforward plugin architecture.
- Pythonic: the library is easy to use and extend if you are familiar with data analysis in Python (pandas and numpy). You do not also need to be an expert in linear programming.
- Fewer bugs: a small core means fewer bugs and code that is easier to trace. Unlike other optimizers, pangadfs does not generate complicated equations behind the scenes that are difficult to comprehend and debug.
Requirements
- Python 3.8+
- pandas 1.0+
- numpy 1.19+
- stevedore 3.30+
- numpy-indexed 0.3+
Installation
$ pip install pangadfs
Example
Create It
A simple pangadfs optimizer could look like the following:
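import logging
from pathlib import Path

import numpy as np
from pangadfs import GeneticAlgorithm

# show INFO-level progress messages
logging.basicConfig(level=logging.INFO)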
ctx = {
    'ga_settings': {
        'csvpth': Path(__file__).parent.parent / 'appdata' / 'pool.csv',
        'n_generations': 20,
        'population_size': 30000,
        'stop_criteria': 10,
        'verbose': True
    },
    'site_settings': {
        'flex_positions': ('RB', 'WR', 'TE'),
        'lineup_size': 9,
        'posfilter': {'QB': 14, 'RB': 8, 'WR': 8, 'TE': 5, 'DST': 4, 'FLEX': 8},
        'posmap': {'DST': 1, 'QB': 1, 'TE': 1, 'RB': 2, 'WR': 3, 'FLEX': 7},
        'salary_cap': 50000
    }
}
# set up GeneticAlgorithm object
ga = GeneticAlgorithm()
# create pool and pospool
pop_size = ctx['ga_settings']['population_size']
pool = ga.pool(csvpth=ctx['ga_settings']['csvpth'])
posfilter = ctx['site_settings']['posfilter']
flex_positions = ctx['site_settings']['flex_positions']
pospool = ga.pospool(pool=pool, posfilter=posfilter, column_mapping={}, flex_positions=flex_positions)
# create salary and points arrays
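# cmap maps logical names to pool column names; it is assumed here to
# match the 'proj' and 'salary' columns shown in the sample output
cmap = {'proj': 'proj', 'salary': 'salary'}
points = pool[cmap['proj']].values
salaries = pool[cmap['salary']].values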
# create initial population
initial_population = ga.populate(
    pospool=pospool,
    posmap=ctx['site_settings']['posmap'],
    population_size=pop_size
)
# apply validators (default are salary and duplicates)
initial_population = ga.validate(
    population=initial_population,
    salaries=salaries,
    salary_cap=ctx['site_settings']['salary_cap']
)
population_fitness = ga.fitness(
    population=initial_population,
    points=points
)
# set overall_max based on initial population
omidx = population_fitness.argmax()
best_fitness = population_fitness[omidx]
best_lineup = initial_population[omidx]
# create new generations
n_unimproved = 0
population = initial_population.copy()
for i in range(1, ctx['ga_settings']['n_generations'] + 1):
    # end program after n generations if not improving
    if n_unimproved == ctx['ga_settings']['stop_criteria']:
        break
    # display progress information with verbose parameter
    if ctx['ga_settings'].get('verbose'):
        logging.info(f'Starting generation {i}')
        logging.info(f'Best lineup score {best_fitness}')
    # select the population
    # here, we are holding back the fittest 20% to ensure
    # that crossover and mutation do not overwrite good individuals
    elite = ga.select(
        population=population,
        population_fitness=population_fitness,
        n=len(population) // ctx['ga_settings'].get('elite_divisor', 5),
        method='fittest'
    )
    selected = ga.select(
        population=population,
        population_fitness=population_fitness,
        n=len(population),
        method='roulette'
    )
    # cross over the population
    crossed_over = ga.crossover(population=selected, method='uniform')
    # mutate the crossed over population (leave elite alone)
    mutated = ga.mutate(population=crossed_over, mutation_rate=.05)
    # validate the population (elite + mutated)
    population = ga.validate(
        population=np.vstack((elite, mutated)),
        salaries=salaries,
        salary_cap=ctx['site_settings']['salary_cap']
    )
    # assess fitness and get the best score
    population_fitness = ga.fitness(population=population, points=points)
    omidx = population_fitness.argmax()
    generation_max = population_fitness[omidx]
    # if new best score, then set n_unimproved to 0
    # and save the new best score and lineup
    # otherwise increment n_unimproved
    if generation_max > best_fitness:
        logging.info(f'Lineup improved to {generation_max}')
        best_fitness = generation_max
        best_lineup = population[omidx]
        n_unimproved = 0
    else:
        n_unimproved += 1
        logging.info(f'Lineup unimproved {n_unimproved} times')
# show best score and lineup at conclusion
print(pool.loc[best_lineup, :])
print(f'Lineup score: {best_fitness}')
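To make the validate and fitness steps above more concrete, here is a minimal numpy sketch of the same ideas on a toy pool. It is not the pangadfs implementation: the function bodies, the toy data, and the assumption that invalid lineups are simply dropped are illustrative only. It shows why representing each lineup as a row of integer indices into the pool keeps both steps vectorized.

```python
import numpy as np

# Toy pool (made-up subset): position i in each array describes player i.
salaries = np.array([8000, 9500, 4600, 4000, 3000, 7100])
points = np.array([26.6, 27.2, 15.9, 12.8, 10.7, 21.6])

# Each row is one lineup, stored as integer indices into the pool.
population = np.array([
    [0, 1, 2],  # valid: total salary 22100
    [0, 1, 5],  # over the cap: total salary 24600
    [2, 2, 3],  # repeats player 2
    [3, 4, 5],  # valid: total salary 14100
])
salary_cap = 23000


def fitness(population, points):
    """Sum projected points per lineup with a single indexing lookup."""
    return points[population].sum(axis=1)


def validate(population, salaries, salary_cap):
    """Keep lineups at or under the cap that contain no repeated player."""
    under_cap = salaries[population].sum(axis=1) <= salary_cap
    # After sorting each row, a duplicate appears as two equal neighbours.
    ordered = np.sort(population, axis=1)
    no_dupes = (np.diff(ordered, axis=1) != 0).all(axis=1)
    return population[under_cap & no_dupes]


valid = validate(population, salaries, salary_cap)
scores = fitness(valid, points)
best = valid[scores.argmax()]
print(valid)
print(best)
```

Only the first and last lineups survive validation here, and the first one scores highest. The same indexing pattern scales to tens of thousands of lineups without a Python-level loop.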
Run It
Run the sample application with:
$ basicapp
INFO:root:Starting generation 1
INFO:root:Best lineup score 153.00000000000003
INFO:root:Lineup unimproved 1 times
INFO:root:Starting generation 2
INFO:root:Best lineup score 153.00000000000003
INFO:root:Lineup improved to 155.2
. . .
INFO:root:Starting generation 19
INFO:root:Best lineup score 156.3
INFO:root:Lineup improved to 156.5
INFO:root:Starting generation 20
INFO:root:Best lineup score 156.5
INFO:root:Lineup unimproved 1 times
               player  team  pos  salary  proj
0              Saints    NO  DST    3800   9.8
34    Patrick Mahomes    KC   QB    8000  26.6
62        Dalvin Cook   MIN   RB    9500  27.2
68       Nyheim Hines   IND   RB    4600  15.9
72         Brian Hill   ATL   RB    4000  12.8
109     Gabriel Davis   BUF   WR    3000  10.7
136   Keelan Cole Sr.   JAX   WR    3600  11.9
138     Calvin Ridley   ATL   WR    7100  21.6
142  Justin Jefferson   MIN   WR    6300  20.0
Lineup score: 156.5
Extensibility
pangadfs is extensible by design and is motivated by difficulties I encountered with other optimizers, which tend to have a monolithic design and don't make it easy to swap out components.
This flexibility is made possible by the stevedore plugin system, which allows applications to customize one or more of the internal components.
As recommended by the stevedore documentation, the base module includes base classes to define each pluggable component. Each namespace has a default implementation (crossover, fitness, mutate, select, and so forth), and together these provide a fully functional implementation of a genetic algorithm.
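The base-class-plus-default pattern described above can be sketched as follows. This is a self-contained illustration, not pangadfs code: the class names (FitnessBase, SumFitness, CappedFitness), the REGISTRY dictionary, and load_fitness are all hypothetical. The dictionary stands in for plugin discovery; stevedore itself resolves plugin names from setuptools entry points registered under a namespace.

```python
from abc import ABC, abstractmethod

import numpy as np


class FitnessBase(ABC):
    """Hypothetical base class: the contract every fitness plugin meets."""

    @abstractmethod
    def fitness(self, *, population, points):
        """Return one score per lineup (row) in population."""


class SumFitness(FitnessBase):
    """Illustrative default: total projected points per lineup."""

    def fitness(self, *, population, points):
        return points[population].sum(axis=1)


class CappedFitness(FitnessBase):
    """Illustrative replacement: clip each projection before summing."""

    def __init__(self, ceiling=25.0):
        self.ceiling = ceiling

    def fitness(self, *, population, points):
        return np.minimum(points, self.ceiling)[population].sum(axis=1)


# Stand-in for plugin discovery (an assumption for this sketch). With
# stevedore, the name would be looked up among registered entry points.
REGISTRY = {'sum': SumFitness, 'capped': CappedFitness}


def load_fitness(name):
    """Instantiate the fitness plugin registered under name."""
    return REGISTRY[name]()


points = np.array([10.0, 20.0, 30.0])
population = np.array([[0, 1], [1, 2]])

default_scores = load_fitness('sum').fitness(population=population, points=points)
capped_scores = load_fitness('capped').fitness(population=population, points=points)
print(default_scores, capped_scores)
```

Because both classes honor the same method signature, the calling code does not change when one is swapped for the other. That is the property that lets a single component be replaced without touching the rest of the algorithm.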
License
This project is licensed under the terms of the MIT license.