Closed
Labels: Bug, Duplicate Report, Groupby, Testing, Timezones
Description
xref #12898 (same fix)
(cf. http://stackoverflow.com/questions/31617084/how-to-have-groupby-first-not-remove-timezone-info-from-datetime-columns)
Take a DataFrame with a column of tz-aware datetime.datetime objects, group it by a different column, and return the first row from each group. Some ways of doing this leave the datetimes as they are, but at least two convert them to tz-naive pandas Timestamp objects.
In [1]: import pandas as pd
In [2]: import datetime
In [3]: import pytz
In [4]: dates = [datetime.datetime(2015,1,i,tzinfo=pytz.timezone('US/Pacific')) for i in range(1,5)]
In [5]: df = pd.DataFrame({'A': ['a','b']*2,'B': dates})
In [6]: df
Out[6]:
A B
0 a 2015-01-01 00:00:00-08:00
1 b 2015-01-02 00:00:00-08:00
2 a 2015-01-03 00:00:00-08:00
3 b 2015-01-04 00:00:00-08:00
In [7]: grouped = df.groupby('A')
In [8]: grouped.nth(0) #B stays a datetime.datetime with timezone info
Out[8]:
B
A
a 2015-01-01 00:00:00-08:00
b 2015-01-02 00:00:00-08:00
In [9]: grouped.head(1) #B stays a datetime.datetime with timezone
Out[9]:
B
0 2015-01-01 00:00:00-08:00
1 2015-01-02 00:00:00-08:00
In [10]: grouped.first() # B becomes a tz-naive pd.Timestamp, shifted to UTC
Out[10]:
B
A
a 2015-01-01 08:00:00
b 2015-01-02 08:00:00
And apparently `grouped.apply(lambda x: x.iloc[0])` does the same as `.first()`.
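To check which of these methods keep the timezone on a given pandas version, a small script along these lines may help. It is only a sketch: the helper `tz_preserved` is a hypothetical name, and the column is built with `pd.date_range(..., tz="US/Pacific")` rather than from `datetime.datetime` objects, so the dtype starts out as tz-aware `datetime64`. It makes no claim about what any particular pandas release prints.

```python
import pandas as pd


def tz_preserved(result, column="B"):
    """Return True if `column` of `result` still has a tz-aware datetime dtype.

    Hypothetical helper for this report, not part of pandas.
    """
    return isinstance(result[column].dtype, pd.DatetimeTZDtype)


# Assumption: build the tz-aware column with pandas instead of pytz + datetime.
dates = pd.date_range("2015-01-01", periods=4, tz="US/Pacific")
df = pd.DataFrame({"A": ["a", "b"] * 2, "B": dates})
grouped = df.groupby("A")

# Compare the ways of taking the first row per group from the report above.
report = {
    "nth(0)": tz_preserved(grouped.nth(0)),
    "head(1)": tz_preserved(grouped.head(1)),
    "first()": tz_preserved(grouped.first()),
}

for name, ok in report.items():
    print(f"{name}: timezone preserved = {ok}")
```

Any entry reporting `False` indicates that the method dropped the timezone, which is the behaviour described in this issue.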
leonsas