Calculating row-wise delta values in a DataFrame

March 13, 2017, at 7:24 PM

I'm trying to calculate what I am calling "delta values", meaning the amount that has changed between two consecutive rows.

For example

A  | delta_A
1  | 0
2  | 1
5  | 3
9  | 4

I managed to do that starting with this code (basically copied from a MATLAB program I had):

df = df.assign(delta_A=np.zeros(len(df.A)))
df['delta_A'][0] = 0  # start at 'no-change'
df['delta_A'][1:] = df.A[1:].values - df.A[:-1].values

This generates the DataFrame correctly and seems to have no further negative effects.

However, I think there is something wrong with that approach because I get these messages:

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
.../__main__.py:5: SettingWithCopyWarning

I didn't really understand what that link was trying to say, and I found this post:

Adding new column to existing DataFrame in Python pandas

The latest edit to the answer there says to use this code, but I have already used that syntax:

 df1 = df1.assign(e=p.Series(np.random.randn(sLength)).values)

So the question is: is .loc the way to go, or what is the more correct way to create that column?

Answer 1

It seems you need diff, and then to replace the NaN it leaves in the first row with 0:

df['delta_A'] = df.A.diff().fillna(0).astype(int)
   A  delta_A
0  0        0
1  4        4
2  7        3
3  8        1
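
For reference, here is a minimal self-contained sketch of the same one-liner applied to the values from the question (1, 2, 5, 9). The DataFrame construction is my assumption about how the data is set up; only the diff/fillna/astype chain comes from the answer above.

```python
import pandas as pd

# Sample data taken from the question's example table.
df = pd.DataFrame({"A": [1, 2, 5, 9]})

# diff() subtracts the previous row from each row; the first row has no
# predecessor and becomes NaN, so it is filled with 0 and cast back to int.
df["delta_A"] = df["A"].diff().fillna(0).astype(int)

print(df)  # delta_A holds 0, 1, 3, 4
```

The assignment goes through df['delta_A'] = ... on the DataFrame itself, so no chained indexing is involved.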

Alternative solution with assign

df = df.assign(delta_A=df.A.diff().fillna(0).astype(int))
   A  delta_A
0  0        0
1  4        4
2  7        3
3  8        1

Another solution, if you need to replace only the first NaN value (the column then stays float):

df['delta_A'] = df.A.diff()
df.loc[df.index[0], 'delta_A'] = 0
print(df)
   A  delta_A
0  0      0.0
1  4      4.0
2  7      3.0
3  8      1.0
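
The difference between the two variants only shows up when column A itself contains missing values. A small sketch, assuming a hypothetical column with one NaN in the middle (this data is not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data with a gap in the second row.
df = pd.DataFrame({"A": [1.0, np.nan, 5.0, 9.0]})

raw = df["A"].diff()  # NaN, NaN, NaN, 4.0

# Variant 1: every NaN becomes 0, including those caused by the gap.
all_filled = raw.fillna(0)

# Variant 2: only the first row is set to 0; the gap stays visible as NaN.
first_only = raw.copy()
first_only.loc[first_only.index[0]] = 0

print(pd.DataFrame({"all_filled": all_filled, "first_only": first_only}))
```

Which one is preferable depends on whether a missing measurement should count as "no change" or stay marked as unknown.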

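Your solution can also be kept if the assignment is done in a single indexing step, but I think it's better to use the diff function:

df['delta_A'] = 0  # initialize the whole column to 0
# select the rows and the column in one .loc call; chained forms such as
# df['delta_A'][1:] = ... or df['delta_A'].iloc[1:] = ... are the pattern
# that SettingWithCopyWarning is about
df.loc[df.index[1:], 'delta_A'] = df.A[1:].values - df.A[:-1].values
print(df)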
   A  delta_A
0  0        0
1  4        4
2  7        3
3  8        1
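
To answer the .loc part of the question directly: the warning concerns chained indexing, where df['delta_A'] first returns an intermediate object and [1:] = ... then writes into it. Indexing df once with .loc avoids that. A minimal sketch on the question's values, checking that the manual subtraction agrees with diff (the variable names values and expected are mine):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 5, 9]})  # values from the question

df["delta_A"] = 0  # start at 'no change'
values = df["A"].to_numpy()

# One indexing step on df itself: row labels and column name together.
df.loc[df.index[1:], "delta_A"] = values[1:] - values[:-1]

# Cross-check against the diff-based version.
expected = df["A"].diff().fillna(0).astype(int)
print((df["delta_A"] == expected).all())
```

Working on the raw NumPy values sidesteps index alignment, which is also why the original code used .values on both slices.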