Compare elements and return values larger than random number as true

February 01, 2018, at 9:38 PM

I'm trying to compare each unique value in one array, px, to a random number in another array, py. If the element in px is greater than or equal to its counterpart in py, then I want to mark that value as True.

Here's some code.

import numpy as np
import pandas as pd

px = np.array([0.360617, 0.360617, 0.360617, 0.989699, 0.989699, 0.989699, -1.020482])
# One independent draw per row, so repeated px values get different py values
py = np.random.uniform(low=0, high=1, size=len(px))
df = pd.DataFrame({'px': px, 'py': py, 'status': px >= py})

The resultant dataframe looks like this:

         px        py  status
0  0.360617  0.509826   False
1  0.360617  0.129870    True
2  0.360617  0.818778   False
3  0.989699  0.953721    True
4  0.989699  0.740662    True
5 -1.020482  0.302593   False

But I need it to look something like this. Imagine that each unique px has its own associated random value py between 0 and 1.

name         px        py  status
a      0.360617  0.509826   False
a      0.360617  0.509826   False
a      0.360617  0.509826   False
b      0.989699  0.953721    True
b      0.989699  0.953721    True
c     -1.020482  0.302593   False

I imagine this can be done with a for loop where each name is associated with a certain random value.

Answer 1

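Is this what you need?

# c: sorted unique values of px; n: how many times each one occurs
c, n = np.unique(px, return_counts=True)
# One random draw per unique value, repeated below to match the counts
py = np.random.uniform(low=0, high=1, size=len(n))
df = pd.DataFrame({'px': np.repeat(c, n), 'py': np.repeat(py, n), 'status': np.repeat(c, n) >= np.repeat(py, n)})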

df
Out[401]: 
         px        py  status
0 -1.020482  0.862371   False
1  0.360617  0.077589    True
2  0.360617  0.077589    True
3  0.360617  0.077589    True
4  0.989699  0.376675    True
5  0.989699  0.376675    True
6  0.989699  0.376675    True
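
Note that np.unique returns the values sorted, so the rows above come out in a different order than the original px. If the original order matters, a possible variation (a sketch only, not part of the answer above) is to ask np.unique for the inverse indices and use them to spread one draw per unique value back over the original rows. The seed 0 and the names uniq, inverse and draws are arbitrary choices made for this illustration.

```python
import numpy as np
import pandas as pd

px = np.array([0.360617, 0.360617, 0.360617, 0.989699, 0.989699, 0.989699, -1.020482])

# Seed chosen arbitrarily so the sketch is repeatable (an assumption, not from the question)
rng = np.random.default_rng(0)

# uniq holds the sorted unique values; inverse[i] is the position in uniq of px[i]
uniq, inverse = np.unique(px, return_inverse=True)

# One draw per unique value, then indexed back out to the original row order
draws = rng.uniform(low=0, high=1, size=len(uniq))
py = draws[inverse]

df = pd.DataFrame({'px': px, 'py': py, 'status': px >= py})
print(df)
```

Every row with the same px shares one py, and the rows stay in the order they had in px.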

Answer 2

Random numbers are generated pseudo-randomly, and the problem here is that every call to np.random.uniform(low=0, high=1, size=len(px)) produces brand-new numbers. To get the same pseudo-random numbers again, reset the seed with np.random.seed(number) immediately before each call (random.seed only affects Python's built-in random module, not NumPy). The number must be the same for every py call, but different from the seed used for other variables. Seeded this way, each call returns the same values for py.
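
As a minimal sketch of the seeding idea: because the question draws from np.random, it is NumPy's generator that has to be seeded. The seed value 42 and the names first and second are arbitrary choices for this illustration.

```python
import numpy as np

# Reset NumPy's global generator, then draw
np.random.seed(42)
first = np.random.uniform(low=0, high=1, size=3)

# Resetting to the same seed replays the same sequence
np.random.seed(42)
second = np.random.uniform(low=0, high=1, size=3)

print(np.array_equal(first, second))  # True
```

Seeding makes a whole sequence repeatable. It does not by itself give repeated px values the same py within a single call, so it would have to be combined with one draw per unique value.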

EDIT

Following the comments below, I've realized (thanks to @roganjosh) that another way to approach this is to use a dictionary to store randomly drawn values for specific variables:

First of all, I've created a new dictionary: seeds = {"py": random.uniform(0,1)}. Whenever you need the py value again, look it up with seeds.get("py") (you can swap py for any other key). I've also written a function so you can add as many keys, each with its own random value, as you like:

def raandomseed(key):
    # Store a fresh uniform draw under the given key
    seeds.update({key: random.uniform(0, 1)})

For further information about pseudo-random functions, check out the Python wiki.
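
Applied to the question's data, the dictionary approach from this answer might look like the sketch below. The helper name random_for is hypothetical and introduced only for this illustration; unlike the function above, it draws a value only the first time a key is seen, so later lookups return the stored number.

```python
import random

# Maps each key (here, a px value) to the random number drawn for it
seeds = {}

def random_for(key):
    """Return the stored draw for key, creating it on first use (hypothetical helper)."""
    if key not in seeds:
        seeds[key] = random.uniform(0, 1)
    return seeds[key]

px = [0.360617, 0.360617, 0.360617, 0.989699, 0.989699, 0.989699, -1.020482]

# Repeated px values hit the same dictionary entry, so they share one py
py = [random_for(x) for x in px]
status = [x >= y for x, y in zip(px, py)]

for row in zip(px, py, status):
    print(row)
```

This is the for-loop formulation the question had in mind. For large arrays, a vectorized NumPy version will usually be faster.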
