Miscellaneous
pangadfs.misc
calculate_jaccard_diversity(lineup1, lineup2)
Calculates the Jaccard diversity between two lineups: 1 minus the ratio of shared players to all distinct players across both lineups.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lineup1 | | First lineup (array-like of player IDs) | required |
lineup2 | | Second lineup (array-like of player IDs) | required |
Returns:
Type | Description |
---|---|
float | Jaccard diversity (1 - Jaccard similarity) |
Examples:
>>> lineup1 = [1, 2, 3, 4, 5]
>>> lineup2 = [1, 2, 6, 7, 8]
>>> diversity = calculate_jaccard_diversity(lineup1, lineup2)
>>> print(f"Diversity: {diversity:.3f}")
Diversity: 0.750
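As a rough illustration of the definition (1 - Jaccard similarity), the calculation can be sketched with Python sets. This is not necessarily how pangadfs implements it; the name `jaccard_diversity_sketch` and the treatment of two empty lineups are assumptions made for this example.

```python
def jaccard_diversity_sketch(lineup1, lineup2):
    """Hypothetical helper: 1 - |A ∩ B| / |A ∪ B| for two lineups."""
    a, b = set(lineup1), set(lineup2)
    union = a | b
    if not union:
        # Assumption: two empty lineups are treated as identical.
        return 0.0
    # Shared players divided by all distinct players gives the similarity.
    similarity = len(a & b) / len(union)
    return 1.0 - similarity


# Two of four distinct players are shared, so similarity is 0.5.
print(jaccard_diversity_sketch([1, 2, 3], [2, 3, 4]))  # 0.5
```

A value of 0.0 means the lineups contain exactly the same players; 1.0 means they share none.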
Source code in pangadfs/misc.py
diversity(population)
Calculates the pairwise diversity between the lineups in a population.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
population | ndarray | the population | required |
Returns:
Type | Description |
---|---|
ndarray | Square array of shape (len(population), len(population)) |
diversity_optimized(population)
Calculates pairwise diversity between samples (overlap of player IDs).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
population | ndarray | shape (N, K), where each row is a lineup | required |
Returns:
Type | Description |
---|---|
ndarray | shape (N, N), matrix of pairwise overlap scores |
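One way to compute an (N, N) matrix of pairwise overlaps is to count each ID per lineup and multiply the count matrix by its transpose. The sketch below only illustrates that idea and may differ from the pangadfs implementation; `overlap_matrix_sketch` is a hypothetical name.

```python
import numpy as np


def overlap_matrix_sketch(population):
    """Hypothetical helper: entry (i, j) counts IDs shared by lineups i and j."""
    population = np.asarray(population)
    ids = np.unique(population)
    # counts[i, u] is how many times ids[u] appears in lineup i.
    counts = (population[:, :, None] == ids).sum(axis=1)
    # The dot product of two count rows is the number of shared IDs.
    return counts @ counts.T


population = np.array([[1, 2, 3], [1, 2, 4], [5, 6, 7]])
print(overlap_matrix_sketch(population))
```

For lineups without repeated IDs, the diagonal equals the lineup size K, and smaller off-diagonal values indicate more diverse pairs.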
exposure(population=None)
Returns a dict that maps each index to its count in the population.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
population | ndarray | the population | None |
Returns:
Type | Description |
---|---|
Dict[int, int] | key is the index, value is its count across lineups |
Examples:
>>> import numpy as np
>>> from pangadfs.misc import exposure
>>> fittest_population = population[np.where(fitness > np.percentile(fitness, 97))]
>>> counts = exposure(fittest_population)
>>> values = np.array(list(counts.values()))
>>> top_counts = values[np.argpartition(values, -10)[-10:]]
>>> print([round(i, 3) for i in sorted(top_counts / len(fittest_population), reverse=True)])
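The counting step can be sketched with the standard library, assuming each row of the population is a lineup of integer indices. The name `exposure_sketch` is hypothetical, and the library itself may compute the counts differently.

```python
from collections import Counter
from itertools import chain


def exposure_sketch(population):
    """Hypothetical helper: map each index to the number of times it appears."""
    # Flatten the lineups into one stream of indices and count them.
    return dict(Counter(chain.from_iterable(population)))


population = [[1, 2, 3], [1, 2, 4], [1, 5, 6]]
counts = exposure_sketch(population)
print(counts)  # {1: 3, 2: 2, 3: 1, 4: 1, 5: 1, 6: 1}

# Dividing by the number of lineups turns counts into exposure rates.
rates = {idx: n / len(population) for idx, n in counts.items()}
```

When an index appears at most once per lineup, its count equals the number of lineups that contain it.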
multidimensional_shifting(elements, num_samples, sample_size, probs)
Based on https://medium.com/ibm-watson/incredibly-fast-random-sampling-in-python-baf154bd836a
Parameters:
Name | Type | Description | Default |
---|---|---|---|
elements | iterable | iterable to sample from, typically a dataframe index | required |
num_samples | int | the number of rows (e.g. initial population size) | required |
sample_size | int | the number of columns (e.g. team size) | required |
probs | iterable | sampling probabilities; same size as elements | required |
Returns:
Type | Description |
---|---|
ndarray | of shape (num_samples, sample_size) |
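The general random-shifting idea can be sketched in NumPy: draw a normalised random shift for every element in every row, subtract the probabilities, and keep the sample_size smallest entries per row. This is a sketch under those assumptions rather than the library's code; `shifting_sample_sketch` and its `seed` parameter are hypothetical.

```python
import numpy as np


def shifting_sample_sketch(elements, num_samples, sample_size, probs, seed=None):
    """Hypothetical helper: weighted sampling without replacement, row by row."""
    rng = np.random.default_rng(seed)
    elements = np.asarray(elements)
    probs = np.asarray(probs, dtype=float)
    # One copy of the probability vector per sample row.
    replicated = np.tile(probs, (num_samples, 1))
    # Random shifts, normalised so that each row sums to 1.
    shifts = rng.random(replicated.shape)
    shifts /= shifts.sum(axis=1, keepdims=True)
    # Likely elements end up with smaller (more negative) shifted values.
    shifted = shifts - replicated
    # Column indices of the sample_size smallest values in each row.
    idx = np.argpartition(shifted, sample_size - 1, axis=1)[:, :sample_size]
    return elements[idx]


sample = shifting_sample_sketch(np.arange(10, 20), 5, 3, np.full(10, 0.1), seed=0)
print(sample.shape)  # (5, 3)
```

Because each row selects distinct column positions, no element repeats within a row.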
multidimensional_shifting_fast(num_samples, sample_size, probs, elements=None)
High-performance probabilistic sampling using random shifting.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
num_samples | int | Number of sample rows to generate. | required |
sample_size | int | Number of items to select per row. | required |
probs | ndarray | Probability vector of shape (n_elements,), dtype float32 recommended. | required |
elements | ndarray | Optional array of element IDs (defaults to np.arange(len(probs))). | None |
Returns:
Type | Description |
---|---|
ndarray | shape (num_samples, sample_size) |
multidimensional_shifting_numba(num_samples, sample_size, probs, elements=None)
Numba-accelerated version of multidimensional shifting. Fast for large numbers of samples and small element sets.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
num_samples | int | Number of rows to sample. | required |
sample_size | int | Number of items per sample. | required |
probs | ndarray | Probability vector of shape (n_elements,). | required |
elements | ndarray | IDs to sample from. Defaults to np.arange(len(probs)). | None |
Returns:
Type | Description |
---|---|
ndarray | shape (num_samples, sample_size) |
parents(population)
Evenly splits the population into two equal-size arrays.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
population | ndarray | the population to crossover. Shape is n_individuals x n_chromosomes. | required |
Returns:
Type | Description |
---|---|
Tuple[ndarray, ndarray] | population split into two equal-size arrays |
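An even split can be sketched with plain slicing. The sketch assumes that, when the population has an odd number of rows, the last row is dropped so both halves match in size; the library may handle that case differently, and `split_parents_sketch` is a hypothetical name.

```python
import numpy as np


def split_parents_sketch(population):
    """Hypothetical helper: split rows into two equal-size arrays."""
    population = np.asarray(population)
    half = len(population) // 2
    # Assumption: with an odd row count, the final row is left out.
    return population[:half], population[half:2 * half]


population = np.arange(10).reshape(5, 2)
first, second = split_parents_sketch(population)
print(first.shape, second.shape)  # (2, 2) (2, 2)
```

Equal-size halves let each row of the first array be paired with the row at the same position in the second, for example during crossover.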