Deploying PySpark on a CDH Cluster

https://docs.cloudera.com/documentation/enterprise/latest/topics/spark_python.html
The Python environment is version 3.7.2, installed and deployed through the Anaconda-5.3.1-el7.parcel.

 

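Configure the Python environment for Spark in CM (Cloudera Manager), then restart the affected services. The snippet below is the kind of content that goes into Spark's spark-env.sh, which in CM is typically edited through the Spark service's advanced configuration snippet (safety valve) for spark-env.sh: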


if [ -z "${PYSPARK_PYTHON}" ]; then
  export PYSPARK_PYTHON=/opt/cloudera/parcels/Anaconda-5.3.1/bin/python
  export PYSPARK_DRIVER_PYTHON=/opt/cloudera/parcels/Anaconda-5.3.1/bin/python
fi

Test the setup by launching the pyspark command.
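Once the pyspark shell is open, a quick check like the following can confirm which interpreter is actually in use. This is a minimal sketch, not part of the original setup: the helper name `interpreter_info` is made up for illustration, and the executor check assumes the `sc` SparkContext that the pyspark shell creates on startup.

```python
import sys


def interpreter_info(_=None):
    """Return (path, version) of the Python interpreter running this code.

    The unused argument lets the function be passed directly to RDD.map().
    """
    return (sys.executable, "%d.%d.%d" % sys.version_info[:3])


# Driver side: should point at the interpreter set in PYSPARK_DRIVER_PYTHON.
print("driver:", interpreter_info())

# Executor side: `sc` only exists inside the pyspark shell (assumption),
# so this part is skipped when the script is run with plain python.
if "sc" in globals():
    executors = (
        sc.parallelize(range(4), 4)  # 4 partitions to reach several executors
        .map(interpreter_info)
        .distinct()
        .collect()
    )
    print("executors:", executors)
```

If the configuration took effect, both the driver and the executors should report a path under /opt/cloudera/parcels/Anaconda-5.3.1/bin/. A different path on the executors usually suggests that the services were not restarted or that the parcel is not activated on every node.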
