Associate-Developer-Apache-Spark-3.5 Exam Free Question Set (85 questions): "Databricks Certified Associate Developer for Apache Spark 3.5 - Python" certification

Question 1

A Spark application suffers from too many small tasks due to excessive partitioning. How can this be fixed without a full shuffle?
Options:

A. Use the repartition() transformation with a lower number of partitions

B. Use the sortBy() transformation to reorganize the data

C. Use the distinct() transformation to combine similar partitions

D. Use the coalesce() transformation with a lower number of partitions

Correct answer: D

Question 2

A data engineer needs to write a DataFrame df to a Parquet file, partitioned by the column country, and overwrite any existing data at the destination path.
Which code should the data engineer use to accomplish this task in Apache Spark?

A. df.write.mode("append").partitionBy("country").parquet("/data/output")

B. df.write.mode("overwrite").partitionBy("country").parquet("/data/output")

C. df.write.mode("overwrite").parquet("/data/output")

D. df.write.partitionBy("country").parquet("/data/output")

Correct answer: B

Question 3

What is the benefit of using Pandas on Spark for data transformations?
Options:

A. It executes queries faster using all the available cores in the cluster as well as provides Pandas's rich set of features.

B. It computes results immediately using eager execution, making it simple to use.

C. It is available only with Python, thereby reducing the learning curve.

D. It runs on a single node only, utilizing the memory with memory-bound DataFrames, and is hence cost-efficient.

Correct answer: A

Question 4

A data engineer needs to persist a file-based data source to a specific location. However, by default, Spark writes to the warehouse directory (e.g., /user/hive/warehouse). To override this, the engineer must explicitly define the file path.
Which line of code ensures the data is saved to a specific location?
Options:

A. users.write.saveAsTable("default_table").option("path", "/some/path")

B. users.write.saveAsTable("default_table", path="/some/path")

C. users.write(path="/some/path").saveAsTable("default_table")

D. users.write.option("path", "/some/path").saveAsTable("default_table")

Correct answer: D

Question 5

A Spark developer is building an app to monitor task performance. They need to track the maximum task processing time per worker node and consolidate it on the driver for analysis.
Which technique should be used?

A. Use an accumulator to record the maximum time on the driver

B. Configure the Spark UI to automatically collect maximum times

C. Broadcast a variable to share the maximum time among workers

D. Use an RDD action like reduce() to compute the maximum time

Correct answer: D

Question 6

An engineer has a large ORC file located at /file/test_data.orc and wants to read only specific columns to reduce memory usage.
Which code fragment will select the columns, i.e., col1 and col2, during the reading process?

A. spark.read.format("orc").load("/file/test_data.orc").select("col1", "col2")

B. spark.read.orc("/file/test_data.orc").filter("col1 = 'value' ").select("col2")

C. spark.read.orc("/file/test_data.orc").selected("col1", "col2")

D. spark.read.format("orc").select("col1", "col2").load("/file/test_data.orc")

Correct answer: A

Question 7

A developer runs:

What is the result?
Options:

A. It appends new partitions to an existing Parquet file.

B. It throws an error if there are null values in either partition column.

C. It creates separate directories for each unique combination of color and fruit.

D. It stores all data in a single Parquet file.

Correct answer: C

Question 8

Given the code fragment:

import pyspark.pandas as ps
psdf = ps.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
Which method is used to convert a Pandas API on Spark DataFrame (pyspark.pandas.DataFrame) into a standard PySpark DataFrame (pyspark.sql.DataFrame)?

A. psdf.to_pyspark()

B. psdf.to_dataframe()

C. psdf.to_spark()

D. psdf.to_pandas()

Correct answer: C
