The date 1900-01-01 is added to my 'Time' column after using phData['Time'] = pd.to_datetime(phData['Time'], format='%H:%M:%S')

July 20, 2017, at 07:17 AM

I am a self-taught coder (about a year in, so still new). Here is my data:

phData = pd.read_excel('phone call log & duration.xlsx')
   called from   called to        Date      Time  Duration in (sec)
0   7722078014  7722012013  2017-07-01  10:00:00                303
1   7722078014  7722052018  2017-07-01  10:21:00                502
2   7722078014  7450120521  2017-07-01  10:23:00                 56
The dtypes are:
called from                   int64
called to                     int64
Date                 datetime64[ns]
Time                         object
Duration in (sec)             int64
dtype: object
phData['Time'] = pd.to_datetime(phData['Time'], format='%H:%M:%S')

phData.head(2)
   called from   called to        Date                Time  Duration in (sec)
0   7722078014  7722012013  2017-07-01 1900-01-01 10:00:00                303
1   7722078014  7722052018  2017-07-01 1900-01-01 10:21:00                502

I've managed to change 'Time' to datetime64[ns], but somehow a date has been added to every value, and I have no idea where it comes from.

I want to analyse the Date and Time columns with Pandas: explore calls made between given dates and times, call frequency and so on. I think I will also be able to save the result so that it works in Orange3, but Orange3 won't recognise Time as a time format.

I've tried stripping out the 1900-01-01, but I get an error saying this can only be done on an object column. I think the original Time values are not datetime but datetime.time, and I'm not sure why that matters or how to end up with simply two columns, one Date and one Time, that Pandas will recognise so I can mine them. I have looked at countless posts, which is where I found pd.to_datetime and the hint that my issue might be datetime.time, but I'm stuck after this.

Answer 1

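Pandas has no dedicated time-of-day dtype. A column can have either a datetime or a timedelta dtype, and a datetime value always carries a date: when the format string contains only '%H:%M:%S', the missing date part defaults to 1900-01-01.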
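
A minimal sketch of where the date comes from, using made-up string values rather than the asker's Excel file. The names `times`, `parsed` and `only_time` are illustrative only.

```python
import datetime
import pandas as pd

# Illustrative time-of-day strings (not the asker's real file)
times = pd.Series(["10:00:00", "10:21:00", "10:23:00"])

# No date in the format string, so the date part falls back to 1900-01-01
parsed = pd.to_datetime(times, format="%H:%M:%S")
print(parsed.iloc[0])  # 1900-01-01 10:00:00

# .dt.time drops the date again, but yields datetime.time objects,
# so the column goes back to object dtype
only_time = parsed.dt.time
print(only_time.dtype)  # object
```

This is why stripping the date does not give a "time" column that Pandas treats as numeric time data: the result is a column of plain Python objects.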

Option 1: combine Date and Time into a single column:

In [23]: df['TimeStamp'] = pd.to_datetime(df.pop('Date') + ' ' + df.pop('Time'))
In [24]: df
Out[24]:
   called from   called to  Duration in (sec)           TimeStamp
0   7722078014  7722012013                303 2017-07-01 10:00:00
1   7722078014  7722052018                502 2017-07-01 10:21:00
2   7722078014  7450120521                 56 2017-07-01 10:23:00
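
The concatenation above assumes both columns hold strings. In the question, Date is already datetime64[ns] and Time appears to hold datetime.time objects (as read_excel typically produces), so a hedged variant is to convert both to strings first. The small frame below is constructed by hand to mimic that situation; it is an assumption, not the asker's file.

```python
import datetime
import pandas as pd

# Hand-built frame mimicking the dtypes shown in the question (assumed)
df = pd.DataFrame({
    "Date": pd.to_datetime(["2017-07-01", "2017-07-01"]),
    "Time": [datetime.time(10, 0), datetime.time(10, 21)],
})

# Render both parts as text, join them, and parse the result once
date_text = df["Date"].dt.strftime("%Y-%m-%d")
time_text = df["Time"].astype(str)  # datetime.time -> 'HH:MM:SS'
df["TimeStamp"] = pd.to_datetime(date_text + " " + time_text)

print(df["TimeStamp"])
```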

Option 2: convert Date to datetime and Time to timedelta dtype:

In [27]: df.Date = pd.to_datetime(df.Date)
In [28]: df.Time = pd.to_timedelta(df.Time)
In [29]: df
Out[29]:
   called from   called to       Date     Time  Duration in (sec)
0   7722078014  7722012013 2017-07-01 10:00:00                303
1   7722078014  7722052018 2017-07-01 10:21:00                502
2   7722078014  7450120521 2017-07-01 10:23:00                 56
In [30]: df.dtypes
Out[30]:
called from                    int64
called to                      int64
Date                  datetime64[ns]
Time                 timedelta64[ns]
Duration in (sec)              int64
dtype: object
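
A short sketch of how the Option 2 layout could be used for the kind of analysis the question describes. It assumes Time starts out as datetime.time objects, which pd.to_timedelta may not accept directly, so the values are converted to strings first. The frame, the 10:20 cutoff, and the names `late` and `stamp` are made up for illustration.

```python
import datetime
import pandas as pd

# Hand-built frame mimicking the question's data (assumed dtypes)
df = pd.DataFrame({
    "Date": pd.to_datetime(["2017-07-01"] * 3),
    "Time": [datetime.time(10, 0), datetime.time(10, 21), datetime.time(10, 23)],
    "Duration in (sec)": [303, 502, 56],
})

# datetime.time -> 'HH:MM:SS' text -> timedelta64[ns]
df["Time"] = pd.to_timedelta(df["Time"].astype(str))

# Calls made at or after 10:20, compared as an offset from midnight
late = df[df["Time"] >= pd.Timedelta(hours=10, minutes=20)]

# Date + Time rebuilds a full timestamp whenever one is needed
stamp = df["Date"] + df["Time"]
print(late)
print(stamp)
```

Keeping Time as a timedelta means comparisons and arithmetic work directly, and the full timestamp is still one addition away.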