# Numpy random choice to produce a 2D-array with all unique values

January 25, 2018, at 10:42 PM

I am wondering whether there is a more efficient way to generate a 2-D array using `np.random.choice` where each row has unique values.

For example, for an array with shape `(3,4)`, we expect an output of:

``````
# Expected output given a shape (3,4)
array([[0, 1, 3, 2],
       [2, 3, 1, 0],
       [1, 3, 2, 0]])
``````

This means that the values in each row must be unique, with their range set by the number of columns. So for each row of the output, the integers should fall between 0 and 3, each appearing exactly once.

I know that I can achieve it by passing `False` to the `replace` argument. But I can only do it for each row and not for the whole matrix. For instance, I can do this:

``````
>>> np.random.choice(4, size=(1,4), replace=False)
array([[0, 2, 3, 1]])
``````

But when I try to do this:

``````
>>> np.random.choice(4, size=(3,4), replace=False)
``````

I get an error like this:

``````
  File "<stdin>", line 1, in <module>
  File "mtrand.pyx", line 1150, in mtrand.RandomState.choice (numpy\random\mtrand\mtrand.c:18113)
ValueError: Cannot take a larger sample than population when 'replace=False'
``````

I assume this is because it tries to draw `3 x 4 = 12` samples without replacement (the total size of the matrix), while the population I give it has only `4` values.
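The assumption above can be checked with a minimal sketch. The variable names `population`, `row`, and `error_message` are illustrative and not part of the original question:

```python
import numpy as np

population = 4  # np.random.choice(4, ...) samples from range(4)

# Drawing exactly `population` values without replacement works
# and yields a permutation of 0..3.
row = np.random.choice(population, size=population, replace=False)
print(sorted(row.tolist()))  # [0, 1, 2, 3]

# Asking for 3 * 4 = 12 values from a population of 4 cannot succeed
# without replacement, so NumPy raises ValueError.
error_message = None
try:
    np.random.choice(population, size=(3, 4), replace=False)
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)
```

The `size` argument is treated as the total number of draws, not as a number of independent rows, which is why a per-row approach is needed.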

I know that I can solve it by using a `for-loop`:

``````
>>> a = (np.random.choice(4,size=4,replace=False) for _ in range(3))
>>> np.vstack(a)
array([[3, 1, 2, 0],
       [1, 2, 0, 3],
       [2, 0, 3, 1]])
``````

But I wanted to know if there's a workaround without using any for-loops? (I'm kinda assuming that adding for-loops might make it slower if I have a number of rows greater than 1000. But as you can see I am actually creating a generator in `a` so I'm also not sure if it has an effect after all.)

One trick I have used often is generating a random array and using `argsort` to get unique indices as the required unique numbers. Thus, we could do -

``````
import numpy as np

def random_choice_noreplace(m, n, axis=-1):
    # m, n are the number of rows, cols of output
    # argsort of random floats gives a random permutation along `axis`
    return np.random.rand(m, n).argsort(axis=axis)
``````

Sample runs -

``````
In : random_choice_noreplace(3,7)
Out:
array([[0, 4, 3, 2, 6, 5, 1],
       [5, 1, 4, 6, 0, 2, 3],
       [6, 1, 0, 4, 5, 3, 2]])

In : random_choice_noreplace(5,7, axis=0) # unique nums along cols
Out:
array([[0, 2, 4, 4, 1, 0, 2],
       [1, 4, 3, 2, 4, 1, 3],
       [3, 1, 1, 3, 2, 3, 0],
       [2, 3, 0, 0, 0, 2, 4],
       [4, 0, 2, 1, 3, 4, 1]])
``````
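To confirm that outputs like the ones above really contain each value once per row, a small check can help. This is only a sketch; the helper name `rows_are_permutations` and the parameter `k` are made up for illustration:

```python
import numpy as np

def rows_are_permutations(arr):
    """Return True if every row of `arr` contains 0..n-1 exactly once."""
    n = arr.shape[1]
    # A row is a permutation of range(n) iff its sorted form equals arange(n).
    return bool(np.all(np.sort(arr, axis=1) == np.arange(n)))

out = np.random.rand(3, 7).argsort(axis=-1)
print(rows_are_permutations(out))  # True

# Keeping only the first k columns leaves k distinct values per row,
# similar to drawing k items from range(7) without replacement.
k = 4
subset = out[:, :k]
```

The check works because `argsort` always returns a permutation of the indices along the chosen axis, whatever the random values are.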

Runtime test -

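``````
# Original approach (one np.random.choice call per row)
def loopy_app(m, n):
    # A list is used instead of a generator: np.vstack expects a sequence,
    # and passing generators has been deprecated since NumPy 1.16.
    a = [np.random.choice(n, size=n, replace=False) for _ in range(m)]
    return np.vstack(a)
``````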

Timings -

``````
In : %timeit loopy_app(1000,100)
10 loops, best of 3: 20.6 ms per loop

In : %timeit random_choice_noreplace(1000,100)
100 loops, best of 3: 3.66 ms per loop
``````