Bus Arrival Prediction 2: Data Post-processing

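This post describes the data post-processing step in a bus arrival prediction project: records with missing values are deleted and only complete records are kept. It also mentions scikit-learn as a promising machine learning tool and hints that related tutorials may follow.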


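Previously we loaded the data from a CSV file, but our goal is to classify it with machine learning. The plan is to use the sklearn machine learning library, so the data has to be converted into a format that it accepts.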

import time
import numpy as np
from sklearn import cluster, datasets

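First, the modules we need. time is the time-handling module; here it converts strings that represent times into timestamps, so that we can do arithmetic on the times those strings stand for. numpy, as I understand it, gives Python basic MATLAB-like functionality, that is, it is the module for matrix operations and basic math. sklearn is the machine learning module, and it is very powerful.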
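As a rough illustration of the string-to-timestamp conversion described above, here is a minimal sketch. The format string `TIME_FORMAT`, the helper name `to_timestamp`, and the sample times are assumptions made for this example; the actual CSV may store times in a different layout.

```python
import time

# Assumed layout of the time strings; adjust to match the real CSV.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

def to_timestamp(text, fmt=TIME_FORMAT):
    """Convert a time string to seconds since the epoch (local time)."""
    return time.mktime(time.strptime(text, fmt))

# Hypothetical sample values, not taken from the dataset.
departure = to_timestamp("2015-01-15 08:15:30")
arrival = to_timestamp("2015-01-15 08:17:00")

# Once both values are numbers, the elapsed time is a plain subtraction.
travel_seconds = arrival - departure
print(travel_seconds)
```

Because `time.mktime` interprets the parsed time in the local time zone, the absolute values depend on the machine, but differences between nearby times are what matter here.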

def clearDic(dic,number):# delete the unsuitable records
    # note: `number` is not used; the expected field count is hard-coded as 20
    dicb = dic.copy()
    for k in dic:
        if len(dic[k].keys()) != 20:
            del dicb[k]
    return dicb# return the cleaned dictionary

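This removes the incomplete records from the dictionary of records built earlier, keeping only the records that have a complete set of data fields.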
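The same filtering idea can also be sketched with a dict comprehension, which builds the cleaned dictionary directly instead of copying and deleting. This is an alternative sketch and not the author's code: the name `keep_complete_records`, the `expected_fields` parameter, and the toy data are all assumptions for illustration.

```python
def keep_complete_records(records, expected_fields=20):
    """Return a new dict holding only records with the expected number of fields."""
    return {
        key: fields
        for key, fields in records.items()
        if len(fields) == expected_fields
    }

# Toy data (hypothetical): each record maps a field name to a list of values.
sample = {
    "bus_a": {"f1": [1], "f2": [2], "f3": [3]},  # complete for expected_fields=3
    "bus_b": {"f1": [1]},                        # incomplete, will be dropped
}

cleaned = keep_complete_records(sample, expected_fields=3)
print(sorted(cleaned))
```

Passing the expected field count as an argument avoids hard-coding it, and the input dictionary is left unmodified.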

def dic2list(dic):# convert the dict to a list for the convenience of the sklearn lib
    resultList = []# the list that stores the result
    tempList = []
    for k in dic:
        for seKey in dic[k]:
            tempList = tempList+dic[k][seKey]
        resultList.append(tempList)
        tempList