Replacing special characters in pandas dataframe

August 10, 2017, at 02:41 AM

So, I have this huge DF which is encoded in iso8859_15.

I have a few columns which contain names and places in Brazil, so some of them contain special characters such as "í" or "Ô".

I have the key to replace them in a dictionary {'í':'i', 'á':'a', ...}

I tried replacing them a couple of ways (below), but none of them worked.

df.replace(dictionary, regex=True, inplace=True)  # tried both with and without regex and inplace

Also:

df.update(pd.Series(dic))

None of them had the expected output, which would be for strings such as "NÍCOLAS" to become "NICOLAS".

Help?

Answer 1

The docs on pandas.DataFrame.replace say you have to provide a nested dictionary: the first level is the column name, and for each column you provide a second dictionary with the substitution pairs.

So, this should work:

>>> import pandas as pd
>>> df = pd.DataFrame({'a': ['NÍCOLAS', 'asdč'], 'b': [3, 4]})
>>> df
         a  b
0  NÍCOLAS  3
1     asdč  4
>>> df.replace({'a': {'č': 'c', 'Í': 'I'}}, regex=True)
         a  b
0  NICOLAS  3
1     asdc  4
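
As a side note on why the attempt without regex may have done nothing: as far as I know, without regex=True a dictionary replacement only matches cells whose whole value equals a key, not characters inside a longer string. A small sketch of the difference (the third row, 'Í' on its own, is made up for illustration and is not from the question):

```python
import pandas as pd

# Hypothetical data: the last row of 'a' is a cell consisting only of 'Í'.
df = pd.DataFrame({'a': ['NÍCOLAS', 'asdč', 'Í'], 'b': [3, 4, 5]})
mapping = {'a': {'č': 'c', 'Í': 'I'}}

# Default (regex=False): only cells whose entire value equals a key change.
whole_cell = df.replace(mapping)

# regex=True: each key is treated as a pattern and matched inside the strings.
substring = df.replace(mapping, regex=True)

print(whole_cell['a'].tolist())  # ['NÍCOLAS', 'asdč', 'I']
print(substring['a'].tolist())   # ['NICOLAS', 'asdc', 'I']
```

Column 'b' is left alone in both cases because the nested dictionary only names column 'a'.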

Answer 2

replace works out of the box in Python 3, without naming a specific column.

Load Data:

df=pd.read_csv('test.csv', sep=',', low_memory=False, encoding='iso8859_15')
df

Result:

      col1   col2
0       he  hello
1  Nícolas  shárk
2  welcome    yes

Create Dictionary:

dictionary = {'í':'i', 'á':'a'}

Replace:

df.replace(dictionary, regex=True, inplace=True)

Result:

      col1   col2
0       he  hello
1  Nicolas  shark
2  welcome    yes
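
If every key in the dictionary is a single character, a possible alternative (not what either answer uses) is to skip regex entirely and go through str.maketrans plus Series.str.translate. This is only a sketch: the frame below is rebuilt by hand from the output shown above rather than read from test.csv, and the column names are taken from that output.

```python
import pandas as pd

# Rebuilt by hand from the result shown above; stands in for the CSV load.
df = pd.DataFrame({'col1': ['he', 'Nícolas', 'welcome'],
                   'col2': ['hello', 'shárk', 'yes']})
dictionary = {'í': 'i', 'á': 'a'}

# str.maketrans converts a single-character mapping into a translation table.
table = str.maketrans(dictionary)

# Translate only the columns known to hold text; list them explicitly.
for col in ['col1', 'col2']:
    df[col] = df[col].str.translate(table)

print(df)
```

This only works for one-character keys. For longer keys, the replace call with regex=True shown above is still the way to go.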