Data Analysis: Visualizing Per-User Repurchase Cycles (Pandas and Matplotlib)
The analysis covers the following dimensions:
- Repurchase count
- Repurchase cycle
- Repurchase cycle comparison
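To make these dimensions concrete, here is a minimal sketch on a hand-made table. The definitions used are assumptions for illustration: "repurchase count" is taken as a customer's number of orders after the first one, and "repurchase cycle" as the number of days between a customer's consecutive orders. The `toy` frame and the `cycle_days` column are hypothetical names, not part of the original data.

```python
import pandas as pd

# Hypothetical toy orders: three purchases by Mike, one by Tom.
toy = pd.DataFrame({
    "name": ["Mike", "Tom", "Mike", "Mike"],
    "time": pd.to_datetime(["2021-01-01", "2021-01-03", "2021-01-08", "2021-01-20"]),
})

# Sort so that each customer's orders are in chronological order.
toy = toy.sort_values(["name", "time"])

# Repurchase count (assumed definition): orders placed after the first one.
repurchase_count = toy.groupby("name").size() - 1

# Repurchase cycle (assumed definition): days since the same customer's previous order.
# The first order of each customer has no previous order, so its value is NaN.
toy["cycle_days"] = toy.groupby("name")["time"].diff().dt.days

print(repurchase_count)
print(toy)
```

Under these definitions Mike has two repurchases with cycles of 7 and 12 days, while Tom has none.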
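Building the test data
In this part you will learn:
- How to generate time-related data.
- How to generate random data from a list (any iterable).
- How to create a Pandas DataFrame by hand, including adding new fields.
- How to merge data with Pandas.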
(1) Building the data: time field:
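import numpy as np
import pandas as pd

# One timestamp per day from 2019-01-01 through 2021-12-31 (both ends included).
time_range = pd.date_range(start="2019/01/01", end="2021/12/31")
print("[Message] Time Range Built Through Pandas:")
print(time_range)
print("[Message] Time Length Built Through Pandas:")
print(len(time_range))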
Output:
[Message] Time Range Built Through Pandas:
DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
               '2019-01-05', '2019-01-06', '2019-01-07', '2019-01-08',
               '2019-01-09', '2019-01-10',
               ...
               '2021-12-22', '2021-12-23', '2021-12-24', '2021-12-25',
               '2021-12-26', '2021-12-27', '2021-12-28', '2021-12-29',
               '2021-12-30', '2021-12-31'],
              dtype='datetime64[ns]', length=1096, freq='D')
[Message] Time Length Built Through Pandas:
1096
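The length of 1096 follows from the calendar: 365 days in 2019, 366 in 2020 (a leap year), and 365 in 2021. The sketch below checks that and shows two other common ways to call `pd.date_range`; the variable names `daily`, `by_periods`, and `monthly` are illustrative and do not appear in the original code.

```python
import pandas as pd

# Daily range with an explicit frequency; both endpoints are included.
daily = pd.date_range(start="2019-01-01", end="2021-12-31", freq="D")
print(len(daily))  # 365 + 366 + 365 days

# Same range, described by a start date and a number of periods instead of an end date.
by_periods = pd.date_range(start="2019-01-01", periods=1096, freq="D")
print(by_periods.equals(daily))

# Month-start timestamps over the same three years ("MS" = month start).
monthly = pd.date_range(start="2019-01-01", end="2021-12-31", freq="MS")
print(len(monthly))
```

Specifying `periods` is convenient when the number of rows matters more than the end date, for example when the other columns are generated with a fixed size.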
(2) Building the data: fruit list:
# Fruit names: banana, apple, grape, orange, Hami melon, guava, pear, peach.
fruits = ["香蕉", "苹果", "葡萄", "橙子", "哈密瓜", "芭乐", "梨", "桃子"]
fruits_list = np.random.choice(fruits, size=len(time_range), replace=True)
print("[Message] Fruits List Built Through NumPy:")
print(fruits_list)
print("[Message] Length of Fruits List Built Through NumPy:")
print(len(fruits_list))
Output:
[Message] Fruits List Built Through NumPy:
['香蕉' '葡萄' '香蕉' ... '香蕉' '橙子' '桃子']
[Message] Length of Fruits List Built Through NumPy:
1096
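Because `np.random.choice` is unseeded here, every run produces a different list, so your output will not match the one shown. If reproducible test data is wanted, one option is NumPy's `Generator` API, sketched below. The seed value and the `weights` list are arbitrary assumptions made for illustration.

```python
import numpy as np

fruits = ["香蕉", "苹果", "葡萄", "橙子", "哈密瓜", "芭乐", "梨", "桃子"]

# Two generators created with the same seed produce the same sequence of draws.
rng_a = np.random.default_rng(seed=42)  # the seed value is arbitrary
rng_b = np.random.default_rng(seed=42)
sample_a = rng_a.choice(fruits, size=10, replace=True)
sample_b = rng_b.choice(fruits, size=10, replace=True)
print((sample_a == sample_b).all())

# Optional: make some fruits more likely than others.
# These weights are hypothetical; they must have one entry per fruit and sum to 1.
weights = [0.3, 0.2, 0.1, 0.1, 0.1, 0.1, 0.05, 0.05]
weighted = rng_a.choice(fruits, size=1000, replace=True, p=weights)
print(len(weighted))
```

With `replace=True` the same item can be drawn many times, which is what allows a list of 8 fruits to fill 1096 rows.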
(3) Building the data: customer list:
names = ["Mike", "Jhon", "Tom", "Xiaoming", "Jimmy", "Lym", "Michk"]
names_list = np.random.choice(names, size=len(time_range), replace=True)
print("[Message] Customer List Built Through NumPy:")
print(names_list)
print("[Message] Length of Customer List Built Through NumPy:")
print(len(names_list))
Output:
[Message] Customer List Built Through NumPy:
['Mike' 'Michk' 'Michk' ... 'Xiaoming' 'Jhon' 'Lym']
[Message] Length of Customer List Built Through NumPy:
1096
(4) Building the data: order data:
order = pd.DataFrame({
    "time": time_range,      # order date
    "fruit": fruits_list,    # fruit name
    "name": names_list,      # customer name
    # purchase quantity in kilograms, drawn at random from 50 to 99
    "kilogram": np.random.choice(list(range(50, 100)), size=len(time_range), replace=True),
})
print("[Message] Generate Order Data Through Pandas DataFrame:")
print(order)
Output:
[Message] Generate Order Data Through Pandas DataFrame:
time fruit name kilogram
0 2019-01-01 香蕉 Mike 63
1 2019-01-02 葡萄 Michk 69
2 2019-01-03 香蕉 Michk 51
3 2019-01-04 香蕉 Mike 69
4 2019-01-05 香蕉 Tom 64
... ... ... ... ...
1091 2021-12-27 葡萄 Lym 94
1092 2021-12-28 梨 Xiaoming 60
1093 2021-12-29 香蕉 Xiaoming 95
1094 2021-12-30 橙子 Jhon 90
1095 2021-12-31 桃子 Lym 93
[1096 rows x 4 columns]
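The DataFrame is built from a dict of equal-length columns, and further fields can be derived from existing ones afterward. The following is a small sketch of that pattern on a five-row table. The names `toy_order`, `year_month`, and `is_bulk`, as well as the 80 kg threshold, are hypothetical and only serve the illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the toy data is reproducible
dates = pd.date_range("2019-01-01", periods=5, freq="D")

# Every column must have the same length as the date range.
toy_order = pd.DataFrame({
    "time": dates,
    "name": rng.choice(["Mike", "Tom"], size=len(dates)),
    # integers(50, 100) draws from 50..99, the same values as range(50, 100)
    "kilogram": rng.integers(50, 100, size=len(dates)),
})

# New fields derived from existing columns.
toy_order["year_month"] = toy_order["time"].dt.strftime("%Y-%m")
toy_order["is_bulk"] = toy_order["kilogram"] >= 80  # hypothetical threshold

print(toy_order)
```

Assigning to a new column name is enough to add a field; the `.dt` accessor exposes date parts of a datetime column without an explicit loop.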
(5) Building the data: fruit information:
information = pd.DataFrame({
    "fruit": fruits,
    "price": [3.8, 8.9, 12.8, 6.8, 15.8, 4.9, 5.8, 7],
    # Regions: 华南 = South China, 华北 = North China, 西北 = Northwest China, 华中 = Central China.
    "region": ["华南", "华北", "西北", "华中", "西北", "华南", "华北", "华中"],
})
print("[Message] Building Fruits Information Through Pandas DataFrame:")
print(information)
Output:
[Message] Building Fruits Information Through Pandas DataFrame:
fruit price region
0 香蕉 3.8 华南
1 苹果 8.9 华北
2 葡萄 12.8 西北
3 橙子 6.8 华中
4 哈密瓜 15.8 西北
5 芭乐 4.9 华南
6 梨 5.8 华北
7 桃子 7.0 华中
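A table like this, with exactly one row per fruit, works as a lookup table: joining it to order rows on the shared `fruit` column attaches the price and region to every order. The sketch below shows how such a join behaves on toy data. The frames `toy_order` and `toy_info` and the `amount` column are hypothetical, and `validate` and `indicator` are optional `pd.merge` arguments used here only to make the behavior visible.

```python
import pandas as pd

toy_order = pd.DataFrame({
    "time": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"]),
    "fruit": ["香蕉", "葡萄", "香蕉"],
    "kilogram": [63, 69, 51],
})
toy_info = pd.DataFrame({
    "fruit": ["香蕉", "苹果", "葡萄"],
    "price": [3.8, 8.9, 12.8],
})

# Outer join: keeps keys from both sides.
# validate="many_to_one" raises an error if a fruit appears twice in toy_info.
merged = pd.merge(toy_order, toy_info, on="fruit", how="outer",
                  validate="many_to_one", indicator=True)

# 苹果 (apple) has no orders, so it shows up as a "right_only" row with missing order fields.
unordered = merged[merged["_merge"] == "right_only"]
print(unordered)

# Left join: keeps only actual orders; sort by time to restore chronological order.
left = pd.merge(toy_order, toy_info, on="fruit", how="left").sort_values("time")
left["amount"] = left["kilogram"] * left["price"]  # hypothetical derived field
print(left)
```

In the generated data every fruit is very likely to be ordered at least once, so an outer join and a left join would usually give the same rows; the difference matters only when a lookup key has no matching order.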
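(6) Building the data: merging the order data and the fruit information:
# Merge the order data and the fruit information into one complete DataFrame; this df is the test data used in the rest of the analysis.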
df = pd.merge(order, information, how="outer").sort_values(