DEA-C02 Free Exam Questions: "Snowflake SnowPro Advanced: Data Engineer (DEA-C02) Certification"
You are tasked with optimizing a Snowpipe Streaming pipeline that ingests data from Kafka into a Snowflake table named 'ORDERS'. You notice that while the Kafka topic has high throughput, data ingestion into Snowflake is lagging. The pipe definition is as follows:

```sql
CREATE OR REPLACE PIPE ORDERS_PIPE AS
  COPY INTO ORDERS
  FROM @KAFKA_STAGE
  FILE_FORMAT = (TYPE = JSON);
```

Which of the following actions, taken individually, would be MOST effective in improving the ingestion rate, assuming sufficient compute resources are available in your Snowflake virtual warehouse?
Correct answer: D
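For reference, a minimal diagnostic sketch for this kind of ingestion lag, assuming placeholder credentials: SYSTEM$PIPE_STATUS and INFORMATION_SCHEMA.COPY_HISTORY are the standard starting points. In practice the bottleneck is often many tiny files produced by the Kafka connector's buffer settings, since per-file overhead dominates.

```python
import json
import snowflake.connector

# Placeholder connection; substitute real account details.
conn = snowflake.connector.connect(account="my_account", user="my_user", password="***")
cur = conn.cursor()

# Backlog and execution state of the pipe (pendingFileCount, executionState, ...).
cur.execute("SELECT SYSTEM$PIPE_STATUS('ORDERS_PIPE')")
print(json.dumps(json.loads(cur.fetchone()[0]), indent=2))

# Per-file load results for the last hour: file sizes, row counts, errors.
cur.execute("""
    SELECT file_name, file_size, row_count, status, first_error_message
    FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
        TABLE_NAME => 'ORDERS',
        START_TIME => DATEADD('hour', -1, CURRENT_TIMESTAMP())))
""")
for row in cur.fetchall():
    print(row)

conn.close()
```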
You are tasked with managing a large Snowflake table called 'TRANSACTIONS'. Due to compliance requirements, you need to archive data older than one year to long-term storage (AWS S3) while ensuring that queries against the current 'TRANSACTIONS' table remain performant. What is the MOST efficient strategy, using Snowflake features, that minimizes the impact on query performance?
Correct answer: D
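For reference, a minimal archival sketch under assumptions: a hypothetical TXN_DATE column, an existing external stage ARCHIVE_STAGE pointing at S3, and placeholder credentials. Unloaded files are written immediately and are not undone by a rollback, so the unload should be verified before the delete.

```python
import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user", password="***")
cur = conn.cursor()

# Unload rows older than one year to S3 as Parquet via the external stage.
cur.execute("""
    COPY INTO @ARCHIVE_STAGE/transactions/
    FROM (SELECT * FROM TRANSACTIONS
          WHERE TXN_DATE < DATEADD('year', -1, CURRENT_DATE()))
    FILE_FORMAT = (TYPE = PARQUET)
""")

# Then trim the hot table so current queries scan fewer micro-partitions.
cur.execute("""
    DELETE FROM TRANSACTIONS
    WHERE TXN_DATE < DATEADD('year', -1, CURRENT_DATE())
""")
conn.close()
```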
You are developing a Snowpark Python application that needs to process data from a Kafka topic. The data is structured as Avro records. You want to leverage Snowpipe for ingestion and Snowpark DataFrames for transformation. What is the MOST efficient and scalable approach to integrate these components?
Correct answer: C
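For reference, one common shape for this pattern, as a sketch: Snowpipe lands the Avro payload in a VARIANT column, and Snowpark projects typed columns from it. The table and field names (RAW_AVRO, order_id, amount) are hypothetical, as are the connection parameters.

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col

session = Session.builder.configs({
    "account": "my_account", "user": "my_user", "password": "***",
    "warehouse": "ETL_WH", "database": "MY_DB", "schema": "PUBLIC",
}).create()

# 1) Snowpipe lands the Avro records untouched in a single VARIANT column.
session.sql("CREATE TABLE IF NOT EXISTS RAW_AVRO (V VARIANT)").collect()
session.sql("""
    CREATE PIPE IF NOT EXISTS KAFKA_AVRO_PIPE AUTO_INGEST = TRUE AS
    COPY INTO RAW_AVRO FROM @KAFKA_STAGE FILE_FORMAT = (TYPE = AVRO)
""").collect()

# 2) Snowpark reads the landing table and projects typed columns.
orders = session.table("RAW_AVRO").select(
    col("V")["order_id"].cast("string").alias("ORDER_ID"),
    col("V")["amount"].cast("double").alias("AMOUNT"),
)
orders.write.save_as_table("ORDERS_CLEAN", mode="append")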
You are responsible for monitoring the performance of a Snowflake data pipeline that loads data from S3 into a Snowflake table named 'SALES_DATA'. You notice that the COPY INTO command consistently takes longer than expected. You want to implement telemetry to proactively identify the root cause of the performance degradation. Which of the following methods, used together, provide the MOST comprehensive telemetry data for troubleshooting the COPY INTO performance?
Correct answers: A, C
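For reference, a sketch combining the two telemetry sources most often paired for COPY INTO troubleshooting: per-file results from COPY_HISTORY and per-statement timings from ACCOUNT_USAGE.QUERY_HISTORY. Credentials are placeholders.

```python
import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user", password="***")
cur = conn.cursor()

# Per-file view: sizes, row counts, and load errors for SALES_DATA.
cur.execute("""
    SELECT file_name, file_size, row_count, status, first_error_message
    FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
        TABLE_NAME => 'SALES_DATA',
        START_TIME => DATEADD('day', -1, CURRENT_TIMESTAMP())))
""")
print(cur.fetchall())

# Per-statement view: where the time goes (queueing vs. execution).
# ACCOUNT_USAGE lags by up to ~45 minutes but has long retention.
cur.execute("""
    SELECT query_id, warehouse_name, total_elapsed_time,
           execution_time, queued_overload_time
    FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
    WHERE query_text ILIKE 'COPY INTO SALES_DATA%'
    ORDER BY start_time DESC
    LIMIT 20
""")
print(cur.fetchall())
conn.close()
```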
A critical database, 'PRODUCTION_DB', in your Snowflake account was accidentally dropped. You need to restore it as quickly as possible, but you're unsure whether the Time Travel retention is sufficient. Which method guarantees restoration of the database even if it falls outside the Time Travel window?
Correct answer: A
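For reference, a sketch of the self-service path, which works only inside the Time Travel window; beyond it, the data sits in Fail-safe, which only Snowflake Support can recover, so only a pre-existing copy (a replica, clone, or unloaded backup) guarantees restoration on your own.

```python
import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user", password="***")
cur = conn.cursor()

# Confirm the drop is still within the Time Travel retention window;
# the history output includes a dropped_on column.
cur.execute("SHOW DATABASES HISTORY LIKE 'PRODUCTION_DB'")
print(cur.fetchall())

# Self-service restore, available only inside Time Travel retention.
cur.execute("UNDROP DATABASE PRODUCTION_DB")
conn.close()
```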
A Snowflake data warehouse contains a table 'WEB_EVENTS' with columns such as 'EVENT_ID', 'EVENT_TIMESTAMP', 'USER_ID', 'PAGE_URL', and 'SESSION_ID'. The data engineering team has enabled search optimization on 'PAGE_URL' because analysts frequently filter on specific URLs. However, they notice that queries filtering on multiple 'PAGE_URL' values (e.g., using WHERE PAGE_URL IN ('url1', 'url2', ...)) are not performing as well as expected. What are the potential reasons for this behavior, and what strategies can be used to improve performance in this scenario? Select all that apply:
Correct answers: B, C
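For reference, a sketch of enabling and verifying equality search optimization, which is the access path an IN list of URLs uses; queries only benefit once the background build is complete, which DESCRIBE lets you check.

```python
import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user", password="***")
cur = conn.cursor()

# Target the access path analysts actually use: equality/IN predicates on PAGE_URL.
cur.execute("ALTER TABLE WEB_EVENTS ADD SEARCH OPTIMIZATION ON EQUALITY(PAGE_URL)")

# Verify the configuration and the build progress of the search access path.
cur.execute("DESCRIBE SEARCH OPTIMIZATION ON WEB_EVENTS")
print(cur.fetchall())
conn.close()
```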
You are building a data pipeline that extracts data from a REST API, transforms it using Pandas DataFrames, and loads it into Snowflake. You need to implement error handling to gracefully handle network issues and API rate limits. Which of the following code snippets demonstrates the most robust approach to handle potential errors during data loading into Snowflake using the Python connector?
Correct answer: C
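For reference, a sketch of one robust pattern: exponential backoff around the API call (including HTTP 429 rate limits) and an explicit transaction around the load so a failure leaves no partial batch. The endpoint and the STAGING_ORDERS table are hypothetical.

```python
import json
import time

import requests
import snowflake.connector
from snowflake.connector.errors import OperationalError

def fetch_json(url, max_retries=5):
    """GET with exponential backoff on rate limits and network failures."""
    for attempt in range(max_retries):
        try:
            resp = requests.get(url, timeout=10)
            if resp.status_code == 429:  # API rate limit hit
                time.sleep(2 ** attempt)
                continue
            resp.raise_for_status()
            return resp.json()
        except requests.exceptions.RequestException:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)

conn = snowflake.connector.connect(account="my_account", user="my_user", password="***")
try:
    records = fetch_json("https://api.example.com/orders")  # placeholder endpoint
    cur = conn.cursor()
    cur.execute("BEGIN")
    cur.executemany(
        "INSERT INTO STAGING_ORDERS (ID, PAYLOAD) VALUES (%s, %s)",
        [(r["id"], json.dumps(r)) for r in records],
    )
    cur.execute("COMMIT")
except OperationalError:
    conn.rollback()  # transient connectivity problem: leave no partial batch
    raise
finally:
    conn.close()
```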
You are tasked with implementing a projection policy in Snowflake to restrict access to certain columns of the 'EMPLOYEE' table based on the user's role. The table contains columns such as 'EMPLOYEE_ID', 'NAME', 'SALARY', and 'DEPARTMENT'. Users with the 'HR_MANAGER' role should have access to all columns, while other users should only be able to see 'EMPLOYEE_ID', 'NAME', and 'DEPARTMENT'. The initial attempt to create the projection policy results in an error. What could be the reasons?
Correct answers: A, C
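For reference, a policy shape that does create successfully, as a sketch: a projection policy takes no input arguments and must return PROJECTION_CONSTRAINT, and it is attached per column. Deviating from that signature is a common reason CREATE PROJECTION POLICY fails.

```python
import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user", password="***")
cur = conn.cursor()

# Projection policies have a fixed signature: no arguments, returning
# PROJECTION_CONSTRAINT.
cur.execute("""
    CREATE OR REPLACE PROJECTION POLICY hide_salary
    AS () RETURNS PROJECTION_CONSTRAINT ->
    CASE WHEN CURRENT_ROLE() = 'HR_MANAGER'
         THEN PROJECTION_CONSTRAINT(ALLOW => true)
         ELSE PROJECTION_CONSTRAINT(ALLOW => false)
    END
""")

# Attach it to the column that must disappear from non-HR projections.
cur.execute("ALTER TABLE EMPLOYEE MODIFY COLUMN SALARY SET PROJECTION POLICY hide_salary")
conn.close()
```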
You're tasked with building a data pipeline using Snowpark Python to incrementally load data into a target table 'SALES_SUMMARY' from a source table 'RAW_SALES'. The pipeline needs to ensure that only new or updated records from 'RAW_SALES' are merged into 'SALES_SUMMARY', matched on a 'TRANSACTION_ID'. You want to use Snowpark's 'MERGE' operation for this, but you also need to handle potential conflicts and log any rejected records to an error table 'SALES_SUMMARY_ERRORS'. Which of the following approaches offers the MOST robust and efficient solution for handling errors and ensuring data integrity within the MERGE statement?
Correct answer: B
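For reference, a Snowpark sketch of this shape: rows that would break the merge (here, a NULL key) are diverted to the error table first, then the MERGE upserts on TRANSACTION_ID. The AMOUNT column stands in for the real payload columns; connection parameters are placeholders.

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import when_matched, when_not_matched

session = Session.builder.configs({
    "account": "my_account", "user": "my_user", "password": "***",
    "warehouse": "ETL_WH", "database": "MY_DB", "schema": "PUBLIC",
}).create()

source = session.table("RAW_SALES")
target = session.table("SALES_SUMMARY")

# Divert obviously bad rows to the error table before merging,
# so the MERGE itself stays deterministic.
rejects = source.filter(source["TRANSACTION_ID"].is_null())
rejects.write.save_as_table("SALES_SUMMARY_ERRORS", mode="append")
clean = source.filter(source["TRANSACTION_ID"].is_not_null())

# Upsert keyed on TRANSACTION_ID.
result = target.merge(
    clean,
    target["TRANSACTION_ID"] == clean["TRANSACTION_ID"],
    [
        when_matched().update({"AMOUNT": clean["AMOUNT"]}),
        when_not_matched().insert(
            {"TRANSACTION_ID": clean["TRANSACTION_ID"], "AMOUNT": clean["AMOUNT"]}
        ),
    ],
)
print(result.rows_inserted, result.rows_updated)
```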
A data engineer is using the Snowflake Spark connector to read a large table from Snowflake into a Spark DataFrame. The table contains a 'TIMESTAMP_NTZ' column. After loading the data, the engineer observes that the values in the 'TIMESTAMP_NTZ' column are not preserved accurately when retrieved from the DataFrame. What are the potential issues, and which configurations can be adjusted in Snowflake to improve the result?
Correct answers: B, E
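For reference, a PySpark sketch: Spark versions before 3.4 have no TIMESTAMP_NTZ type, so the connector converts NTZ values through a timezone. Pinning both sides to the same zone (the connector's sfTimezone option and Spark's session time zone, with Snowflake's TIMEZONE session parameter playing the matching role server-side) keeps wall-clock values stable. Connection values and the EVENTS table are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ntz-check").getOrCreate()
spark.conf.set("spark.sql.session.timeZone", "UTC")  # Spark-side zone

sf_options = {
    "sfURL": "my_account.snowflakecomputing.com",
    "sfUser": "my_user",
    "sfPassword": "***",
    "sfDatabase": "MY_DB",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "ETL_WH",
    "sfTimezone": "UTC",  # connector-side timezone handling
}

df = (
    spark.read.format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("query", "SELECT EVENT_TS FROM EVENTS")  # EVENT_TS: TIMESTAMP_NTZ
    .load()
)
df.show(truncate=False)
```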
A data engineer is tasked with creating a Listing to share a large dataset stored in Snowflake. The dataset contains sensitive Personally Identifiable Information (PII) that must be masked for certain consumer roles. The data engineer wants to use Snowflake's dynamic data masking policies within the Listing to achieve this. Which of the following approaches is the MOST secure and maintainable way to implement this requirement, assuming that the consumer roles are pre-defined and known?
Correct answer: C
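For reference, a sketch of a masking policy attached on the provider side, using hypothetical CUSTOMERS/EMAIL names. Because provider roles do not exist in a consumer account, share-facing conditions are typically written against CURRENT_ACCOUNT() or shared database roles rather than CURRENT_ROLE().

```python
import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user", password="***")
cur = conn.cursor()

# The policy travels with the shared object, so every consumer query
# is evaluated against it.
cur.execute("""
    CREATE OR REPLACE MASKING POLICY MASK_PII AS (val STRING)
    RETURNS STRING ->
    CASE WHEN CURRENT_ACCOUNT() IN ('TRUSTED_CONSUMER_ACCT')  -- placeholder account
         THEN val
         ELSE '***MASKED***'
    END
""")
cur.execute("ALTER TABLE CUSTOMERS MODIFY COLUMN EMAIL SET MASKING POLICY MASK_PII")
conn.close()
```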
You are designing a Snowflake data pipeline that continuously ingests clickstream data. You need to monitor the pipeline for latency and throughput, and trigger notifications if these metrics fall outside acceptable ranges. Which of the following combinations of Snowflake features and techniques would be MOST effective for achieving this goal?
Correct answers: B, E
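For reference, a sketch of a scheduled alert on ingestion freshness; the table, warehouse, and notification integration names are placeholders.

```python
import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user", password="***")
cur = conn.cursor()

# Every 5 minutes, check freshness and notify if no clickstream rows
# arrived in the last 10 minutes.
cur.execute("""
    CREATE OR REPLACE ALERT CLICKSTREAM_LAG_ALERT
      WAREHOUSE = MONITOR_WH
      SCHEDULE = '5 MINUTE'
      IF (EXISTS (
            SELECT 1 FROM CLICKSTREAM_EVENTS
            HAVING DATEDIFF('minute', MAX(EVENT_TS), CURRENT_TIMESTAMP()) > 10))
      THEN CALL SYSTEM$SEND_EMAIL(
            'MY_EMAIL_INTEGRATION',
            'oncall@example.com',
            'Clickstream pipeline latency breach',
            'No clickstream events loaded in the last 10 minutes.')
""")
cur.execute("ALTER ALERT CLICKSTREAM_LAG_ALERT RESUME")  # alerts start suspended
conn.close()
```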
You are tasked with building a data pipeline to process image metadata stored in JSON format from a series of URLs. The JSON structure contains fields such as 'image_url', 'resolution', 'camera_model', and 'location' (latitude and longitude). Your goal is to create a Snowflake table that stores this metadata along with a thumbnail of each image. Given the constraints that you want to avoid downloading and storing the images directly in Snowflake, and that Snowflake's native functions for image processing are limited, which of the following approaches would be most efficient and scalable?
Correct answers: A, D
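For reference, a sketch of the externalized approach: a remote service behind an API integration (for example, an AWS Lambda) produces thumbnail references, so no image bytes land in Snowflake. All object names and the endpoint URL are placeholders, and RAW_IMAGE_JSON is assumed to hold the parsed metadata in a VARIANT column V.

```python
import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user", password="***")
cur = conn.cursor()

# Delegate image work to a remote service via an external function.
cur.execute("""
    CREATE OR REPLACE EXTERNAL FUNCTION MAKE_THUMBNAIL(image_url STRING)
    RETURNS VARIANT
    API_INTEGRATION = MY_API_INTEGRATION
    AS 'https://example.execute-api.us-east-1.amazonaws.com/prod/thumbnail'
""")

# Project the JSON metadata fields plus a thumbnail reference per image.
cur.execute("""
    INSERT INTO IMAGE_METADATA
    SELECT V:image_url::STRING,
           V:resolution::STRING,
           V:camera_model::STRING,
           V:location,
           MAKE_THUMBNAIL(V:image_url::STRING)
    FROM RAW_IMAGE_JSON
""")
conn.close()
```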
You are developing a Snowpark Python application that reads data from a large Snowflake table, performs several transformations, and then writes the results back to a new table. You notice that the write operation is taking significantly longer than the read and transformation steps. The target table is not clustered. Which of the following actions, either individually or in combination, would likely improve the write performance most significantly?
Correct answer: A
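For reference, a sketch of keeping the write server-side and sizing the warehouse for the write-heavy step; all names are placeholders and the transformations are elided.

```python
from snowflake.snowpark import Session

session = Session.builder.configs({
    "account": "my_account", "user": "my_user", "password": "***",
    "warehouse": "ETL_WH", "database": "MY_DB", "schema": "PUBLIC",
}).create()

df = session.table("BIG_SOURCE")  # placeholder source table
transformed = df                  # ... transformations elided ...

# save_as_table compiles to a single server-side CTAS/INSERT on the
# warehouse, instead of pulling rows to the client and re-uploading them.
# Temporarily resizing the warehouse speeds up the write-heavy step.
session.sql("ALTER WAREHOUSE ETL_WH SET WAREHOUSE_SIZE = 'LARGE'").collect()
transformed.write.save_as_table("RESULTS", mode="overwrite")
session.sql("ALTER WAREHOUSE ETL_WH SET WAREHOUSE_SIZE = 'XSMALL'").collect()
```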
You are designing a data pipeline using Snowpipe to ingest data from multiple S3 buckets into a single Snowflake table. Each S3 bucket represents a different data source and contains files in JSON format. You want to use Snowpipe's auto-ingest feature and a single Snowpipe object for all buckets to simplify management and reduce overhead. However, each data source has a different JSON schema. How can you best achieve this goal while ensuring data is loaded correctly and efficiently into the target table?
Correct answer: B
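For reference, a sketch of the schema-on-read pattern this scenario usually points to, assuming the sources are reachable through a single stage location (a pipe copies from one stage): everything lands in a VARIANT column, tagged with its source file so per-source views can be layered on top. Object names are placeholders.

```python
import snowflake.connector

conn = snowflake.connector.connect(account="my_account", user="my_user", password="***")
cur = conn.cursor()

# Land every source's JSON untouched in a VARIANT column, tagging each row
# with its file path so downstream views can tell the schemas apart.
cur.execute("""
    CREATE TABLE IF NOT EXISTS EVENTS_RAW (
        SRC VARIANT, SOURCE_FILE STRING, LOADED_AT TIMESTAMP_NTZ)
""")
cur.execute("""
    CREATE OR REPLACE PIPE EVENTS_PIPE AUTO_INGEST = TRUE AS
    COPY INTO EVENTS_RAW (SRC, SOURCE_FILE, LOADED_AT)
    FROM (SELECT $1, METADATA$FILENAME, CURRENT_TIMESTAMP()
          FROM @MULTI_SOURCE_STAGE)
    FILE_FORMAT = (TYPE = JSON)
""")
conn.close()
```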