Add new column to Pandas DataFrame and fill with first word from another column from same df

April 23, 2017, at 07:19 AM

I have a dataset of crimes reported by Gloucestershire Constabulary from 2011-16. It's a .csv file that I have imported to a Pandas dataframe. The data include a column stating the Lower Super Output Area (LSOA) in which the crime occurred, so for crimes in Tewkesbury, for instance, each record has the corresponding LSOA name, e.g. 'Tewkesbury 009D'; 'Tewkesbury 009E'.

I want to group these data by the town/city they relate to, e.g. 'Gloucester', 'Tewkesbury', ignoring the specific LSOAs within each conurbation. Ideally, I would append a new column to the dataframe, with just the place name copied across, and group on that. I am comfortable with how to do the grouping, just not the new column in the first place. Any advice on how to do this is gratefully received.

Answer 1

I am no Pandas expert, but I think you can use string slicing to strip off the last five characters (the space plus the four-character code). The .str accessor also supports regex, if I recall correctly, so you can do a proper 'search' if required.

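# x is the original dataframe; lsoa is the column containing the LSOA names
new_col = x['lsoa'].str[:-5].rename('town')    # drop the trailing space and code
x = pd.concat([x, new_col], axis=1)            # keep the result, with the new 'town' column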

The .str accessor applies string operations, such as slicing, to every value in the lsoa column of the dataframe.
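
The slice above only works if every LSOA name ends in a space followed by a four-character code. A minimal sketch of an alternative, assuming the column is called lsoa as in this answer and using made-up sample rows: split once from the right on whitespace and keep the left-hand part. This does not depend on the length of the code, keeps multi-word names whole, and leaves missing values as NaN.

```python
import pandas as pd

# Illustrative sample rows (an assumption); the real frame would come from the CSV.
x = pd.DataFrame(
    {"lsoa": ["Tewkesbury 009D", "Gloucester 004A", "Forest of Dean 001A", None]}
)

# Split once from the right on whitespace, then take the part before the code.
x["town"] = x["lsoa"].str.rsplit(n=1).str[0]

print(x)
```

Taking the first word instead would shorten a name such as 'Forest of Dean' to 'Forest', which is why the split is done from the right here.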

Answer 2

Something along these lines should work:

df['town'] = [x.split()[0] for x in df['LSOA']]

Answer 3

You can use a regex to extract the city name from the DataFrame and then join the result to the original DataFrame. If your initial DataFrame is df:

df = pd.DataFrame([ 'Tewkesbury 009D', 'Tewkesbury 009E'], columns=['LSOA'])
In [2]: df
Out[2]: 
              LSOA
0  Tewkesbury 009D
1  Tewkesbury 009E

Then you can extract the city name, and optionally the LSOA code, into a new DataFrame, df_new:

df_new = df['LSOA'].str.extract(r'(\w*)\s(\d+\w*)', expand=True)
In [10]: df_new
Out[10]: 
            0     1
0  Tewkesbury  009D
1  Tewkesbury  009E

If you want to discard the code and keep just the city name, remove the second capture group from the regex, i.e. r'(\w*)\s\d+\w*'. You can then join this result to the original DataFrame:

In [11]: df.join(df_new)
Out[11]: 
              LSOA           0     1
0  Tewkesbury 009D  Tewkesbury  009D
1  Tewkesbury 009E  Tewkesbury  009E
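
The join above leaves the new columns labelled 0 and 1. As a rough sketch of a variation, assuming the same LSOA column and an extra made-up row, a single capture group with expand=False returns a Series that can be assigned straight to a named column and then grouped on. The column name town and the pattern are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample frame; the 'Gloucester 004A' row is a hypothetical addition.
df = pd.DataFrame(
    {"LSOA": ["Tewkesbury 009D", "Tewkesbury 009E", "Gloucester 004A"]}
)

# Capture everything before the trailing code; one group with
# expand=False gives a Series rather than a DataFrame.
df["town"] = df["LSOA"].str.extract(r"^(.*?)\s+\d+\w*$", expand=False)

# Count records per town.
counts = df.groupby("town").size()
print(counts)
```

Rows that do not match the pattern get NaN in the town column, and groupby drops them by default.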