Posting this so others can find it when searching for the same error:
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
py4j.protocol.Py4JError: An error occurred while calling
Full error log:
2024-11-06T14:15:57.638+0800: 1.362: [GC (Allocation Failure) [PSYoungGen: 209920K->13072K(279552K)] 209920K->13088K(2027520K), 0.0144828 secs] [Times: user=0.02 sys=0.01, real=0.02 secs]
2024-11-06T14:15:57.846+0800: 1.570: [GC (Metadata GC Threshold) [PSYoungGen: 43298K->6592K(489472K)] 43314K->6616K(2237440K), 0.0055518 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
2024-11-06T14:15:57.852+0800: 1.576: [Full GC (Metadata GC Threshold) [PSYoungGen: 6592K->0K(489472K)] [ParOldGen: 24K->6391K(309760K)] 6616K->6391K(799232K), [Metaspace: 20673K->20673K(1067008K)], 0.0246174 secs] [Times: user=0.06 sys=0.01, real=0.03 secs]
2024-11-06T14:15:59.520+0800: 3.244: [GC (Metadata GC Threshold) [PSYoungGen: 225839K->13730K(489472K)] 232230K->20201K(799232K), 0.0128483 secs] [Times: user=0.03 sys=0.00, real=0.02 secs]
2024-11-06T14:15:59.533+0800: 3.257: [Full GC (Metadata GC Threshold) [PSYoungGen: 13730K->0K(489472K)] [ParOldGen: 6471K->18273K(479744K)] 20201K->18273K(969216K), [Metaspace: 33753K->33753K(1079296K)], 0.0308338 secs] [Times: user=0.06 sys=0.01, real=0.03 secs]
xxxx/pyspark.zip/pyspark/context.py:340: RuntimeWarning: Failed to add file [EEE/anti.py] specified in 'spark.submit.pyFiles' to Python path:YYYTTTUUU.jarxxxxxxxx/__pyfiles__xxxx/pyspark.zipxxxx/py4j-0.10.9.5-src.zipxxxx/executor_user_py_env/lib/python39.zipxxxx/executor_user_py_env/lib/python3.9xxxx/executor_user_py_env/lib/python3.9/lib-dynloadxxxx/executor_user_py_env/lib/python3.9/site-packageswarnings.warn(
xxxx/pyspark.zip/pyspark/context.py:340: RuntimeWarning: Failed to add file [EEE/common_define.py] specified in 'spark.submit.pyFiles' to Python path:YYYTTTUUU.jarxxxxxxxx/__pyfiles__xxxx/pyspark.zipxxxx/py4j-0.10.9.5-src.zipxxxx/executor_user_py_env/lib/python39.zipxxxx/executor_user_py_env/lib/python3.9xxxx/executor_user_py_env/lib/python3.9/lib-dynloadxxxx/executor_user_py_env/lib/python3.9/site-packageswarnings.warn(
xxxx/pyspark.zip/pyspark/context.py:340: RuntimeWarning: Failed to add file [EEE/config.py] specified in 'spark.submit.pyFiles' to Python path:YYYTTTUUU.jarxxxxxxxx/__pyfiles__xxxx/pyspark.zipxxxx/py4j-0.10.9.5-src.zipxxxx/executor_user_py_env/lib/python39.zipxxxx/executor_user_py_env/lib/python3.9xxxx/executor_user_py_env/lib/python3.9/lib-dynloadxxxx/executor_user_py_env/lib/python3.9/site-packageswarnings.warn(
xxxx/pyspark.zip/pyspark/context.py:340: RuntimeWarning: Failed to add file [EEE/utils.py] specified in 'spark.submit.pyFiles' to Python path:YYYTTTUUU.jarxxxxxxxx/__pyfiles__xxxx/pyspark.zipxxxx/py4j-0.10.9.5-src.zipxxxx/executor_user_py_env/lib/python39.zipxxxx/executor_user_py_env/lib/python3.9xxxx/executor_user_py_env/lib/python3.9/lib-dynloadxxxx/executor_user_py_env/lib/python3.9/site-packageswarnings.warn(
================================
[INFO] SQL DATA READING STARTING |
================================
2024-11-06T14:16:03.676+0800: 7.399: [GC (Metadata GC Threshold) [PSYoungGen: 333446K->23988K(641024K)] 351720K->42270K(1120768K), 0.0201206 secs] [Times: user=0.04 sys=0.01, real=0.02 secs]
2024-11-06T14:16:03.696+0800: 7.419: [Full GC (Metadata GC Threshold) [PSYoungGen: 23988K->0K(641024K)] [ParOldGen: 18281K->31872K(658432K)] 42270K->31872K(1299456K), [Metaspace: 55449K->55446K(1101824K)], 0.0935589 secs] [Times: user=0.24 sys=0.02, real=0.09 secs]
2024-11-06T14:16:08.425+0800: 12.149: [GC (Metadata GC Threshold) [PSYoungGen: 436446K->42196K(670720K)] 468318K->74076K(1329152K), 0.0335286 secs] [Times: user=0.06 sys=0.02, real=0.03 secs]
2024-11-06T14:16:08.459+0800: 12.182: [Full GC (Metadata GC Threshold) [PSYoungGen: 42196K->0K(670720K)] [ParOldGen: 31880K->55897K(916480K)] 74076K->55897K(1587200K), [Metaspace: 93222K->93222K(1134592K)], 0.0877626 secs] [Times: user=0.18 sys=0.03, real=0.09 secs]
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "xxxx/py4j-0.10.9.5-src.zip/py4j/clientserver.py", line 516, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "xxxx/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1038, in send_command
    response = connection.send_command(command)
  File "xxxx/py4j-0.10.9.5-src.zip/py4j/clientserver.py", line 539, in send_command
    raise Py4JNetworkError(
py4j.protocol.Py4JNetworkError: Error while sending or receiving
Traceback (most recent call last):
  File "xxxx/main.py", line 142, in <module>
    df_spark.show(3, truncate=False)
  File "xxxx/pyspark.zip/pyspark/sql/dataframe.py", line 615, in show
  File "xxxx/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
  File "xxxx/pyspark.zip/pyspark/sql/utils.py", line 190, in deco
    def __init__(self, *args, **kwargs):
  File "xxxx/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 334, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o190.showString
Code that triggers the error:
from pyspark.sql import SparkSession

# Create a SparkSession
spark = (
    SparkSession.builder
    .appName("spark XX Demo")
    .master("local[1]")  # <----- removing this line fixed it
    .getOrCreate()
)
Cause:
When running PySpark locally, there is a difference between setting .master("local[1]") and not setting a master at all:
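.master("local[1]"):
- This setting runs Spark locally with a single worker thread, so all tasks execute sequentially, one at a time.
- The execution environment is very constrained in this mode. Work that relies on parallel execution can stall or run short of resources.
- With local[1], the driver and the executor run in the same JVM, and only one thread is available for tasks (the JVM itself still has other threads). This constraint may be what leads to the Py4J communication failure: "Answer from Java side is empty" means the Python side received no reply from the JVM.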
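No master specified:
- If the code sets no master, Spark uses the one supplied at launch (for example by spark-submit or spark-defaults.conf). When PySpark is started locally with nothing configured, that is local[*], i.e. one worker thread per available CPU core.
- In this case Spark can run tasks in parallel and make full use of a multi-core CPU, which is closer to how a cluster runs and less prone to resource contention.
- Using multiple threads avoids some of the blocking and resource contention that single-threaded execution can cause.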
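To make the thread counts above concrete, here is a minimal sketch of how the simple local master strings map to worker threads. The helper name local_worker_threads is hypothetical (it is not part of PySpark), it ignores forms such as local[N,F], and it approximates Spark's core count with os.cpu_count(), which can differ from what the JVM reports.

```python
import os
import re
from typing import Optional

# Matches "local", "local[N]" and "local[*]" only (an intentional simplification).
_LOCAL_RE = re.compile(r"^local(?:\[(\*|\d+)\])?$")


def local_worker_threads(master: str) -> Optional[int]:
    """Return the worker-thread count implied by a simple local master string.

    Hypothetical helper for illustration; returns None for anything that is
    not a simple local master (e.g. "yarn").
    """
    match = _LOCAL_RE.match(master.strip())
    if match is None:
        return None
    spec = match.group(1)
    if spec is None:
        return 1  # plain "local" means one worker thread
    if spec == "*":
        # Approximation: Spark asks the JVM for the number of cores.
        return os.cpu_count() or 1
    return int(spec)


for m in ("local", "local[1]", "local[4]", "local[*]", "yarn"):
    print(m, "->", local_worker_threads(m))
```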
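So after removing .master("local[1]"), Spark no longer ran on a single worker thread, and the Py4J network error went away. For development and testing, local[*] usually gives better performance and fewer runtime problems.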
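One way to avoid hardcoding the master is to set it only when a value is explicitly provided and otherwise leave the choice to however the job is launched. The sketch below assumes a hypothetical environment variable APP_SPARK_MASTER and hypothetical helpers resolve_master and build_session; none of these names come from PySpark or from the original code.

```python
import os
from typing import Optional


def resolve_master(explicit: Optional[str] = None) -> Optional[str]:
    """Pick a master only if one was explicitly requested.

    APP_SPARK_MASTER is a hypothetical variable name used for illustration.
    Returning None means "do not call .master() at all".
    """
    return explicit or os.environ.get("APP_SPARK_MASTER") or None


def build_session(app_name: str, master: Optional[str] = None):
    """Create a SparkSession, calling .master() only when a value is given."""
    # Imported lazily so the helper above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    if master:
        builder = builder.master(master)
    return builder.getOrCreate()
```

A call such as build_session("spark XX Demo", resolve_master()) would then behave like the fixed code when nothing is set. After the session is created, spark.sparkContext.master shows which master is actually in effect.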