Databricks-Certified-Data-Engineer-Professional Free Exam Questions: "Databricks Certified Data Engineer Professional Certification"

Which statement describes Delta Lake Auto Compaction?

Explanation: (Visible only to GoShiken members)
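For context, a minimal sketch of how auto compaction is typically enabled, assuming a Delta table named sales (the table name is illustrative, not from the question):

    # Enable auto compaction for one table via a Delta table property.
    spark.sql("""
        ALTER TABLE sales
        SET TBLPROPERTIES ('delta.autoOptimize.autoCompact' = 'true')
    """)

    # Or enable it for all Delta writes in the current session.
    spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")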
A data ingestion task requires a one-TB JSON dataset to be written out to Parquet with a target part-file size of 512 MB. Because Parquet is being used instead of Delta Lake, built-in file-sizing features such as Auto-Optimize & Auto-Compaction cannot be used.
Which strategy will yield the best performance without shuffling data?

Explanation: (Visible only to GoShiken members)
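One commonly discussed approach here is to control partition size at read time, since narrow transformations preserve the input partitioning and each task then writes roughly one part-file of the target size without any shuffle. A sketch with hypothetical paths:

    # Cap each input partition at ~512 MB when reading the JSON source.
    spark.conf.set("spark.sql.files.maxPartitionBytes", str(512 * 1024 * 1024))

    # With no wide transformations, each read partition maps to about one
    # 512 MB Parquet part-file on write (before compression effects).
    df = spark.read.json("/data/raw/events_json")            # hypothetical path
    df.write.mode("overwrite").parquet("/data/out/events")   # hypothetical path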
The data engineering team has configured a job to process customer requests to be forgotten (have their data deleted). All user data that needs to be deleted is stored in Delta Lake tables using default table settings.
The team has decided to process all deletions from the previous week as a batch job at 1am each Sunday. The total duration of this job is less than one hour. Every Monday at 3am, a batch job executes a series of VACUUM commands on all Delta Lake tables throughout the organization.
The compliance officer has recently learned about Delta Lake's time travel functionality. They are concerned that this might allow continued access to deleted data.
Assuming all delete logic is correctly implemented, which statement correctly addresses this concern?

Explanation: (Visible only to GoShiken members)
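For reference, VACUUM removes only those data files that are no longer referenced by the current table version and are older than the retention threshold, which defaults to 7 days. A sketch with a hypothetical table name:

    # Files deleted more recently than the retention window survive the
    # VACUUM and remain reachable through time travel until a later run.
    spark.sql("VACUUM customer_profiles RETAIN 168 HOURS")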
Incorporating unit tests into a PySpark application requires upfront attention to the design of your jobs, or a potentially significant refactoring of existing code.
Which statement describes a main benefit that offsets this additional effort?

Explanation: (Visible only to GoShiken members)
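As an illustration of the design this question alludes to, transformation logic can be factored into pure functions that accept and return DataFrames, which makes each piece testable against a small local SparkSession. All names below are hypothetical:

    from pyspark.sql import SparkSession, DataFrame
    import pyspark.sql.functions as F

    def add_total(df: DataFrame) -> DataFrame:
        # Pure transformation: no I/O, so it can be tested in isolation.
        return df.withColumn("total", F.col("price") * F.col("quantity"))

    def test_add_total():
        spark = SparkSession.builder.master("local[2]").getOrCreate()
        df = spark.createDataFrame([(2.0, 3)], ["price", "quantity"])
        assert add_total(df).first()["total"] == 6.0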
Which distribution format does Databricks support for installing custom Python code packages?

Explanation: (Visible only to GoShiken members)
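For context, custom Python code on Databricks is commonly packaged as a wheel and installed with a notebook-scoped pip command; a sketch with a hypothetical wheel path:

    %pip install /dbfs/packages/my_package-0.1.0-py3-none-any.whl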
Each configuration below is identical to the extent that each cluster has 400 GB total of RAM, 160 total cores, and only one Executor per VM.
Given an extremely long-running job for which completion must be guaranteed, which cluster configuration will guarantee completion of the job even if one or more VMs fail?

Explanation: (Visible only to GoShiken members)
The downstream consumers of a Delta Lake table have been complaining about data quality issues impacting performance in their applications. Specifically, they have complained that invalid latitude and longitude values in the activity_details table have been breaking their ability to use other geolocation processes.
A junior engineer has written the following code to add CHECK constraints to the Delta Lake table:

A senior engineer has confirmed the above logic is correct and the valid ranges for latitude and longitude are provided, but the code fails when executed.
Which statement explains the cause of this failure?

Explanation: (Visible only to GoShiken members)
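The code in this question appears only as a screenshot in the source. As a hedged sketch, CHECK-constraint logic of this kind typically looks like the following (range values assumed, not taken from the screenshot):

    # Delta Lake rejects ADD CONSTRAINT if any existing row violates the check.
    spark.sql("""
        ALTER TABLE activity_details
        ADD CONSTRAINT valid_latitude CHECK (latitude >= -90 AND latitude <= 90)
    """)
    spark.sql("""
        ALTER TABLE activity_details
        ADD CONSTRAINT valid_longitude CHECK (longitude >= -180 AND longitude <= 180)
    """)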
A Databricks job has been configured with 3 tasks, each of which is a Databricks notebook. Task A does not depend on other tasks. Tasks B and C run in parallel, with each having a serial dependency on task A.
If tasks A and B complete successfully but task C fails during a scheduled run, which statement describes the resulting state?

Explanation: (Visible only to GoShiken members)
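For context, a sketch of how this dependency structure is expressed in a Jobs API task list (task keys mirror the question; notebook paths are hypothetical):

    tasks = [
        {"task_key": "A", "notebook_task": {"notebook_path": "/Jobs/task_a"}},
        {"task_key": "B", "notebook_task": {"notebook_path": "/Jobs/task_b"},
         "depends_on": [{"task_key": "A"}]},
        {"task_key": "C", "notebook_task": {"notebook_path": "/Jobs/task_c"},
         "depends_on": [{"task_key": "A"}]},
    ]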
Two of the most common data locations on Databricks are the DBFS root storage and external object storage mounted with dbutils.fs.mount().
Which of the following statements is correct?

Explanation: (Visible only to GoShiken members)
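For reference, a sketch of the mount workflow the question refers to (bucket name and mount point hypothetical; credential configuration omitted):

    # Mount external object storage into the workspace's file namespace.
    dbutils.fs.mount(
        source="s3a://example-bucket",
        mount_point="/mnt/example-bucket",
    )
    dbutils.fs.ls("/mnt/example-bucket")  # mounted paths resolve like DBFS paths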
Each configuration below is identical to the extent that each cluster has 400 GB total of RAM, 160 total cores, and only one Executor per VM.
Given a job with at least one wide transformation, which of the following cluster configurations will result in maximum performance?

Explanation: (Visible only to GoShiken members)