includes NaN values; the resulting column is named 0 by default, so use .reset_index(name=<new_name>)
pd.Series.mode
nums, str
the statistical mode; returns the most common value(s)
.first()
nums, str
.rank()
“rank”
nums
returns a Series; assign it to a (new) column
.pct_change()
nums
% change from the previous entry
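The sketch below checks the `.size()` and mode notes above on a small hypothetical DataFrame (not the flights data). It shows that `.size()` counts NaN rows while `.count()` skips them, and that an unnamed `.size()` result turns into a column literally named `0` after `.reset_index()`. It also shows how to get a per-group mode through `.agg(pd.Series.mode)`. Each toy group has exactly one mode. Groups with tied values return several values, and pandas handles that case less predictably, so the example avoids it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value in year 2000
df = pd.DataFrame({
    "year": [2000, 2000, 2000, 2001, 2001, 2001],
    "passengers": [10.0, 10.0, np.nan, 20.0, 30.0, 30.0],
})

g = df.groupby("year")["passengers"]

# .size() counts rows (NaN included); .count() counts non-missing values only
sizes = g.size()    # 2000 -> 3, 2001 -> 3
counts = g.count()  # 2000 -> 2, 2001 -> 3

# Without a name, reset_index() labels the new column 0
print(df.groupby("year").size().reset_index().columns.tolist())  # ['year', 0]
size_df = df.groupby("year").size().reset_index(name="size")
print(size_df.columns.tolist())  # ['year', 'size']

# GroupBy has no .mode() method, so pass pd.Series.mode to .agg()
modes = g.agg(pd.Series.mode)
print(modes)
```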
df.groupby("year")["passengers"].mean()
df.groupby("year")["passengers"].min()
df.groupby("year")["passengers"].max()
df.groupby("year").count()
df.groupby("year").size().reset_index(name="size")
# there is no GroupBy.mode(); pass pd.Series.mode to .agg() instead
# in a notebook only the last expression (.first()) is displayed below
df.groupby("year").first()
      month  passengers
year
1949    Jan         112
1950    Jan         115
1951    Jan         145
1952    Jan         171
1953    Jan         196
1954    Jan         204
1955    Jan         242
1956    Jan         284
1957    Jan         315
1958    Jan         340
1959    Jan         360
1960    Jan         417
.rank() and .pct_change() both return a Series with one value per original row (not one per group), so each result needs to be assigned to its own column.
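Here is a minimal sketch of assigning group-wise `.rank()` and `.pct_change()` results back as new columns. The toy DataFrame only mimics the shape of the flights table, and the column names `rank_in_year` and `change_in_year` are hypothetical. Because both results share the original row index, plain column assignment lines them up correctly.

```python
import pandas as pd

# Hypothetical toy data shaped like the flights table (year, month, passengers)
df = pd.DataFrame({
    "year": [1949, 1949, 1949, 1950, 1950, 1950],
    "month": ["Jan", "Feb", "Mar", "Jan", "Feb", "Mar"],
    "passengers": [100, 120, 90, 110, 110, 132],
})

g = df.groupby("year")["passengers"]

# Both results are indexed like df (one value per row), so they become columns
df["rank_in_year"] = g.rank()          # ties get the average rank by default
df["change_in_year"] = g.pct_change()  # first row of each year has no previous entry -> NaN

print(df)
```

Ranking happens separately inside each year, so the tied 110 values in 1950 both get 1.5.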
Also, the best function to start with is .describe(), because it returns a table with count, mean, std, min, 25%, 50%, 75% and max for every group. When it is called on the whole grouped DataFrame, the columns form a MultiIndex of (column, statistic).
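The sketch below shows both shapes `.describe()` can return, using hypothetical toy data in place of the flights table. On a single selected column, the statistics are flat column labels. On the whole grouped frame, they become the second level of a MultiIndex.

```python
import pandas as pd

# Hypothetical toy data standing in for the flights table
df = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "passengers": [100, 120, 110, 130],
})

# SeriesGroupBy.describe(): one flat column per statistic
flat = df.groupby("year")["passengers"].describe()
print(flat.columns.tolist())
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

# DataFrameGroupBy.describe(): (column, statistic) MultiIndex columns
layered = df.groupby("year").describe()
print(layered.columns[:2].tolist())
# [('passengers', 'count'), ('passengers', 'mean')]
```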
# passing np.min / np.mean / np.max raises a FutureWarning in recent pandas;
# the string names keep the current behavior without the warning
multi = df.groupby("year").agg({"passengers": ["min", "mean", "max"]})
display(multi)
     passengers
            min        mean  max
year
1949        104  126.666667  148
1950        114  139.666667  170
1951        145  170.166667  199
1952        171  197.000000  242
1953        180  225.000000  272
1954        188  238.916667  302
1955        233  284.000000  364
1956        271  328.250000  413
1957        301  368.416667  467
1958        310  381.000000  505
1959        342  428.333333  559
1960        390  476.166667  622
Notice how the columns seem to be layered: multi.columns is a MultiIndex, so it gives a list of tuples instead of the usual list of strings. There are a few ways to flatten it, including .to_flat_index() (which leaves a single level whose labels are still tuples). My favorite way is to join the names with an underscore (_).
multi.columns = ["_".join(col) for col in multi.columns.values]
print(multi.columns)
display(multi)
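For comparison, here is a self-contained sketch of the flattening options side by side on hypothetical toy data. It covers `.to_flat_index()`, the underscore join used above, and named aggregation. Named aggregation (available since pandas 0.25) is an extra option not covered in these notes; it names the output columns up front, so no MultiIndex is ever created.

```python
import pandas as pd

# Hypothetical toy data standing in for the flights table
df = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "passengers": [100, 120, 110, 130],
})

# String names avoid the FutureWarning raised for np.min / np.mean / np.max
multi = df.groupby("year").agg({"passengers": ["min", "mean", "max"]})

# Option 1: one level, but every label is still a tuple like ('passengers', 'min')
flat_tuples = multi.columns.to_flat_index()

# Option 2: join each tuple into a single string label
joined = multi.copy()
joined.columns = ["_".join(col) for col in multi.columns]

# Option 3: named aggregation produces flat column names directly
named = df.groupby("year").agg(
    passengers_min=("passengers", "min"),
    passengers_mean=("passengers", "mean"),
    passengers_max=("passengers", "max"),
)
print(named)
```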