Deterministic python generator for K disparate M-sized subsets of a set
I'm looking for an efficient way to generate many M-sized subsets from a
set S, of size N.
Ideally I would like to generate all of these, but because I'm using them
for other computations, this becomes infeasible.
Instead, I would like to generate K disparate subsets of S, such that the
K chosen subsets minimize the sum of the size of the all pairwise
intersections between the K subsets.
In other words If I have K subsets And I take the pairwise intersection of
all of those subsets. And then I sum the size of all of those sets
together. I get as low of a number as I can.
Basically, I want these subsets to be as "far away" from each other was
possible. I've been trying to think of how I would go about doing this,
but I'm drawing a blank.
To simulate it in the meantime I've written this function
def subset_split(full_set, M, K):
np.random.seed(0) # repeatibility
seen = set([])
subset_list = []
for kx in xrange(K):
np.random.shuffle(full_set)
failsafe = 0
while True:
np.random.shuffle(full_set)
subset = tuple(full_set[0:M])
if not subset in seen:
seen.add(subset)
subset_list.append(subset)
break
failsafe += 1
if failsafe > 100:
break
return subset_list
which just generates K random subsets that haven't been seen before. But
this isn't exactly what I want, because I want those K subsets to be
repeatable and to not accidentally be close to each if they don't have to
be.
No comments:
Post a Comment