Associate-Developer-Apache-Spark-3.5 Exam Free Question Set: "Databricks Certified Associate Developer for Apache Spark 3.5 - Python" Certification

A developer initializes a SparkSession:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("Analytics Application") \
    .getOrCreate()
Which statement describes the spark SparkSession?

A Spark application is experiencing performance issues in client mode because the driver is resource-constrained.
How should this issue be resolved?
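One common remedy for a resource-constrained driver in client mode is to submit the application in cluster mode, so the driver runs on a cluster node (on a cluster manager that supports it), optionally with more driver memory. The sketch below only assembles such a spark-submit command; the helper name, the application file name, and the memory value are hypothetical.

```python
import shlex


def build_submit_command(app_path, deploy_mode="cluster", driver_memory="4g"):
    """Assemble a spark-submit argument list (hypothetical helper)."""
    return [
        "spark-submit",
        "--deploy-mode", deploy_mode,      # where the driver process runs
        "--driver-memory", driver_memory,  # illustrative value, not a recommendation
        app_path,
    ]


# "analytics_app.py" is a placeholder application file.
cmd = build_submit_command("analytics_app.py")
print(shlex.join(cmd))
```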

A Spark DataFrame df is cached using the MEMORY_AND_DISK storage level, but the DataFrame is too large to fit entirely in memory.
What is the likely behavior when Spark runs out of memory to store the DataFrame?

A data engineer uses a broadcast variable to share a DataFrame containing millions of rows across executors for lookup purposes. What will be the outcome?

Given a CSV file with the content:

And the following code:
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

schema = StructType([
    StructField("name", StringType()),
    StructField("age", IntegerType())
])
spark.read.schema(schema).csv(path).collect()
What is the resulting output?

A data engineer is working on a real-time analytics pipeline using Apache Spark Structured Streaming. The engineer wants to process incoming data and ensure that triggers control when the query is executed. The system needs to process data in micro-batches with a fixed interval of 5 seconds.
Which code snippet could the data engineer use to fulfill this requirement?
A)

B)

C)

D)

Options:

A developer is trying to join two tables, sales.purchases_fct and sales.customer_dim, using the following code:

fact_df = purch_df.join(cust_df, F.col('customer_id') == F.col('custid'))

The developer has discovered that customers in the purchases_fct table that do not exist in the customer_dim table are being dropped from the joined table.
Which change should be made to the code to stop these customer records from being dropped?

A data engineer noticed improved performance after upgrading from Spark 3.0 to Spark 3.5. The engineer found that Adaptive Query Execution (AQE) was enabled.
Which operation is AQE implementing to improve performance?
