1. Spark
1.1 Network
# list the existing Docker networks
docker network ls
# create a dedicated bridge network for the Hadoop/Spark containers
docker network create --driver=bridge hadoop-network
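Containers attached to this bridge network can resolve each other by container name, which is what the later steps rely on. A minimal sketch of attaching a container (run it after pulling the base image in the next step; the container name ubuntu-java8 is the one used throughout the rest of this article):
# start a container on the bridge network; name matches the one used below
docker run -itd --name ubuntu-java8 --network hadoop-network ubuntu /bin/bash
# verify that the container shows up under "Containers"
docker network inspect hadoop-network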
1.2 Installing Java 8
Download the base image (the rest of this walkthrough uses apt and a container named ubuntu-java8, so an Ubuntu base image is assumed):
docker pull ubuntu
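The JDK installation itself is not shown here, but spark-env.sh below assumes JAVA_HOME=/usr/java/jdk1.8.0_212. A minimal sketch of one way to get there, assuming a locally downloaded jdk-8u212-linux-x64.tar.gz (the archive name is an assumption):
# on the host: copy the JDK archive into the container
docker cp jdk-8u212-linux-x64.tar.gz ubuntu-java8:/usr/local
# inside the container (docker exec -it ubuntu-java8 /bin/bash):
mkdir -p /usr/java
tar -zxf /usr/local/jdk-8u212-linux-x64.tar.gz -C /usr/java
echo 'export JAVA_HOME=/usr/java/jdk1.8.0_212' >> ~/.bashrc
echo 'export PATH=$JAVA_HOME/bin:$PATH' >> ~/.bashrc
source ~/.bashrc && java -version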
1.3 Installing the Python Environment
Install Miniconda3:
root@hadoop:~# apt update && apt install -y wget
root@hadoop:~# wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
root@hadoop:~# bash Miniconda3-latest-Linux-x86_64.sh
root@hadoop:~# source .bashrc
(base) root@hadoop:~# python
Python 3.12.4 | packaged by Anaconda, Inc. | (main, Jun 18 2024, 15:12:24) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()
For reference: Miniconda installation tutorial
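Note that PySpark 2.4.x does not run on Python 3.8 or later (Python 3.8 support arrived in Spark 3.0), so the Python 3.12 shown in the transcript above cannot drive this Spark version. A conda environment pinned to Python 3.7 avoids this; the environment name pyspark is an arbitrary choice:
# create and activate a Python 3.7 environment for PySpark 2.4
conda create -n pyspark python=3.7 -y
conda activate pyspark
# point Spark at this interpreter
export PYSPARK_PYTHON=$(which python)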
1.4 Installing and Deploying Spark
Upload the installation packages to the container:
docker cp spark-2.4.0-bin-hadoop2.7.tgz ubuntu-java8:/usr/local
docker cp scala-2.11.12.tgz ubuntu-java8:/usr/local
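If you do not yet have the two archives on the host, they can be fetched from the Apache and Lightbend archives first (URLs correct at the time of writing):
wget https://archive.apache.org/dist/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz
wget https://downloads.lightbend.com/scala/2.11.12/scala-2.11.12.tgz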
Extract (inside the container, in /usr/local where the archives were copied):
docker exec -it ubuntu-java8 /bin/bash
cd /usr/local
# extract
tar -zxf spark-2.4.0-bin-hadoop2.7.tgz
tar -zxf scala-2.11.12.tgz
# remove the archives
rm -f spark-2.4.0-bin-hadoop2.7.tgz
rm -f scala-2.11.12.tgz
Modify the configuration (this can be skipped if you only run in local mode):
cd /usr/local/spark-2.4.0-bin-hadoop2.7/conf
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
# append the following
export JAVA_HOME=/usr/java/jdk1.8.0_212
export SCALA_HOME=/usr/local/scala-2.11.12
export SPARK_HOME=/usr/local/spark-2.4.0-bin-hadoop2.7
export HADOOP_INSTALL=/usr/local/hadoop-3.0.0
export HADOOP_CONF_DIR=$HADOOP_INSTALL/etc/hadoop
Note: the image does not ship with vi or vim by default; install one with: apt install vim
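For a standalone cluster (again, not needed in local mode), Spark 2.x also reads the worker list from conf/slaves. A minimal sketch, with hadoop1 and hadoop2 as hypothetical worker container hostnames on the hadoop-network:
cd /usr/local/spark-2.4.0-bin-hadoop2.7/conf
cp slaves.template slaves
# one worker hostname per line (hypothetical names)
echo "hadoop1" >> slaves
echo "hadoop2" >> slaves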
Environment variables:
root@hadoop:~# vim .bashrc
# SPARK_HOME
export SPARK_HOME=/usr/local/spark-2.4.0-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH
root@hadoop:~# source .bashrc
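Optionally, the Scala REPL can be put on the PATH the same way (paths match the install locations above):
# SCALA_HOME
export SCALA_HOME=/usr/local/scala-2.11.12
export PATH=$SCALA_HOME/bin:$PATH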
Test it with the SparkPi example:
spark-submit --master local[*] --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.0.jar
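A successful run prints the estimate to stdout, while Spark's default log4j configuration sends the INFO logging to stderr, so the result can be filtered out like this (the digits vary from run to run):
spark-submit --master local[*] --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.0.jar 2>/dev/null | grep "Pi is roughly"
# expected output, something like:
# Pi is roughly 3.14...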