Py4JJavaError when creating and using a DataFrame with pyspark on Windows
Test code
from pyspark.sql import SparkSession

# Start a local Spark session using all available cores
spark = SparkSession.builder.appName("myfirst_spark").master("local[*]").getOrCreate()

# Build a small DataFrame from a list of tuples plus column names
data_frame = spark.createDataFrame([
    (1, 134.5, 5.2, 23, 'A'),
    (2, 162.2, 5.5, 45, 'A'),
    (3, 114.1, 5.2, 53, 'B'),
    (4, 144.4, 5.4, 32, 'C'),
    (5, 122.2, 5.7, 24, 'B'),
    (3, 154.1, 5.1, 43, 'B'),
    (5, 122.5, 5.2, 56, 'D'),
], ['id', 'weight', 'height', 'age', 'score'])
data_frame.show()
Running this raised Py4JJavaError: An error occurred while calling o45.showString. Calling data_frame.count() afterwards produced a similar error, and I could not find a solution online. Since most write-ups are based on Linux environments, I suspected the Spark version might be the cause, so I downgraded from 2.4.0 to 2.3.2 and ran the code again:
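One way to perform the downgrade described above, assuming pyspark was installed via pip (if it came from a standalone Spark distribution, replace the unpacked directory and SPARK_HOME instead):

```shell
# Remove the 2.4.0 build first, then pin the working release
pip uninstall -y pyspark
pip install pyspark==2.3.2

# Confirm which version the interpreter actually picks up
python -c "import pyspark; print(pyspark.__version__)"
```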
+---+------+------+---+-----+
| id|weight|height|age|score|
+---+------+------+---+-----+
| 1| 134.5| 5.2| 23| A|
| 2| 162.2| 5.5| 45| A|
| 3| 114.1| 5.2| 53| B|
| 4| 144.4| 5.4| 32| C|
| 5| 122.2| 5.7| 24| B|
| 3| 154.1| 5.1| 43| B|
| 5| 122.5| 5.2| 56| D|
+---+------+------+---+-----+