The first problem: if you declare the field as IntegerType in the schema, the import fails right away with an "IntegerType can not accept object ..." TypeError.
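A minimal sketch that reproduces this, assuming the documents come back from pymongo (the field name created_at and the sample value here are just illustrative):

from bson.int64 import Int64
from pyspark.sql import SparkSession
from pyspark.sql.types import StructField, StructType, IntegerType

spark = SparkSession.builder.appName("example").getOrCreate()

# What pymongo hands back: the numeric field is bson.int64.Int64,
# which is a subclass of int rather than int itself.
documents = [{"created_at": Int64(1609459200)}]

schema = StructType([StructField("created_at", IntegerType(), True)])

# PySpark verifies the exact type of each value against the schema,
# so the Int64 subclass is rejected here with a TypeError along the
# lines of "IntegerType() can not accept object ... in type
# <class 'bson.int64.Int64'>"
df = spark.createDataFrame(documents, schema=schema)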
If you instead declare the field as StringType and then cast it to IntegerType, the cast goes through but every value comes out as Null.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.sql.types import StructField, StructType, StringType

spark = SparkSession.builder.appName("example").getOrCreate()

# documents: the list of dicts fetched from MongoDB, where
# created_at holds bson.int64.Int64 values
schema = StructType([
    StructField("created_at", StringType(), True)
])
df = spark.createDataFrame(documents, schema=schema)

# The cast succeeds, but every created_at value comes out as Null
df = df.withColumn("created_at", col("created_at").cast("integer"))
Worse, as long as data of this type is present, even a plain df.show() will throw:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 55) (driver-7b9bff5d64-v94tb executor driver): net.razorvine.pickle.PickleException: expected zero arguments for construction of ClassDict (for bson.int64.Int64). This happens when an unsupported/unregistered class is being unpickled that requires construction arguments.
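The exception comes from the JVM-side unpickler, which does not know how to reconstruct bson.int64.Int64 objects that survive into the pickled rows. One way around it, as a minimal sketch: strip the Int64 wrappers on the Python side before the rows ever reach Spark. Here normalize_bson_ints is a hypothetical helper (not part of any library) that only handles top-level fields, and documents/schema are the ones from the snippets above:

from bson.int64 import Int64

# Hypothetical helper: Int64 subclasses int, so int(v) is a
# lossless conversion to a plain Python int
def normalize_bson_ints(doc):
    return {k: int(v) if isinstance(v, Int64) else v for k, v in doc.items()}

documents = [normalize_bson_ints(d) for d in documents]
df = spark.createDataFrame(documents, schema=schema)
df.show()  # no more PickleException

Once the values are plain Python ints, you can also declare the field as LongType directly and skip the StringType-then-cast detour entirely.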