
[May 17, 2025] Associate-Data-Practitioner Question Bank PDF and Test Engine: Exam Questions
Verified Associate-Data-Practitioner practice questions and answers: 108 accurate questions with answers
Google Associate-Data-Practitioner certification exam coverage:
| Topic | Exam coverage |
|---|---|
| Topic 1 | |
| Topic 2 | |
| Topic 3 | |
Question # 19
You want to process and load a daily sales CSV file stored in Cloud Storage into BigQuery for downstream reporting. You need to quickly build a scalable data pipeline that transforms the data while providing insights into data quality issues. What should you do?
- A. Create a batch pipeline in Cloud Data Fusion by using a Cloud Storage source and a BigQuery sink.
- B. Load the CSV file as a table in BigQuery, and use scheduled queries to run SQL transformation scripts.
- C. Create a batch pipeline in Dataflow by using the Cloud Storage CSV file to BigQuery batch template.
- D. Load the CSV file as a table in BigQuery. Create a batch pipeline in Cloud Data Fusion by using a BigQuery source and sink.
Correct answer: A
Explanation:
Using Cloud Data Fusion to create a batch pipeline with a Cloud Storage source and a BigQuery sink is the best solution because:
Scalability: Cloud Data Fusion is a scalable, fully managed data integration service.
Data transformation: It provides a visual interface to design pipelines, enabling quick transformation of data.
Data quality insights: Cloud Data Fusion includes built-in tools for monitoring and addressing data quality issues during the pipeline creation and execution process.
Question # 20
Your organization consists of two hundred employees on five different teams. The leadership team is concerned that any employee can move or delete all Looker dashboards saved in the Shared folder. You need to create an easy-to-manage solution that allows the five different teams in your organization to view content in the Shared folder, but only be able to move or delete their team-specific dashboard. What should you do?
- A. 1. Change the access level of the Shared folder to View for the All Users group. 2. Create Looker groups representing each of the five different teams, and add users to their corresponding group. 3. Create five subfolders inside the Shared folder. Grant each group the Manage Access, Edit access level to their corresponding subfolder.
- B. 1. Move all team-specific content into the dashboard owner's personal folder. 2. Change the access level of the Shared folder to View for the All Users group. 3. Instruct each user to create content for their team in the user's personal folder.
- C. 1. Create Looker groups representing each of the five different teams, and add users to their corresponding group. 2. Create five subfolders inside the Shared folder. Grant each group the View access level to their corresponding subfolder.
- D. 1. Change the access level of the Shared folder to View for the All Users group. 2. Create five subfolders inside the Shared folder. Grant each team member the Manage Access, Edit access level to their corresponding subfolder.
Correct answer: A
Explanation:
Comprehensive and Detailed in Depth Explanation:
Why A is correct: Setting the Shared folder to "View" ensures everyone can see the content.
Creating Looker groups simplifies access management.
Subfolders allow granular permissions for each team.
Granting "Manage Access, Edit" allows teams to modify only their own content.
Why other options are incorrect: B: Moving content to personal folders defeats the purpose of sharing.
C: Grants View access only, so teams can't edit or manage their own dashboards.
D: Grants access to individual team members rather than to team groups, which is harder to manage as team membership changes.
Question # 21
You work for a global financial services company that trades stocks 24/7. You have a Cloud SQL for PostgreSQL user database. You need to identify a solution that ensures that the database is continuously operational, minimizes downtime, and will not lose any data in the event of a zonal outage. What should you do?
- A. Create a read replica in the same region but in a different zone.
- B. Create a read replica in another region. Promote the replica to primary if a failure occurs.
- C. Continuously back up the Cloud SQL instance to Cloud Storage. Create a Compute Engine instance with PostgreSQL in a different region. Restore the backup in the Compute Engine instance if a failure occurs.
- D. Configure and create a high-availability Cloud SQL instance with the primary instance in zone A and a secondary instance in any zone other than zone A.
Correct answer: D
Explanation:
Configuring a high-availability (HA) Cloud SQL instance ensures continuous operation, minimizes downtime, and prevents data loss in the event of a zonal outage. In this setup, the primary instance is located in one zone (e.g., zone A), and a standby instance is located in a different zone within the same region. Writes are replicated synchronously to the standby, so every committed transaction exists in both zones. If the primary zone fails, Cloud SQL automatically fails over to the standby, which becomes the new primary, with no data loss and minimal downtime. This is the recommended approach for mission-critical, highly available databases.
Question # 22
Your company is building a near real-time streaming pipeline to process JSON telemetry data from small appliances. You need to process messages arriving at a Pub/Sub topic, capitalize letters in the serial number field, and write results to BigQuery. You want to use a managed service and write a minimal amount of code for underlying transformations. What should you do?
- A. Use a Pub/Sub to Cloud Storage subscription, write a Cloud Run service that is triggered when objects arrive in the bucket, performs the transformations, and writes the results to BigQuery.
- B. Use a Pub/Sub push subscription, write a Cloud Run service that accepts the messages, performs the transformations, and writes the results to BigQuery.
- C. Use a Pub/Sub to BigQuery subscription, write results directly to BigQuery, and schedule a transformation query to run every five minutes.
- D. Use the "Pub/Sub to BigQuery" Dataflow template with a UDF, and write the results to BigQuery.
Correct answer: D
Explanation:
Using the "Pub/Sub to BigQuery" Dataflow template with a UDF (User-Defined Function) is the optimal choice because it combines near real-time processing, minimal code for transformations, and scalability. The UDF allows for efficient implementation of custom transformations, such as capitalizing letters in the serial number field, while Dataflow handles the rest of the managed pipeline seamlessly.
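As a rough illustration of the per-message logic such a UDF applies, here is a minimal Python sketch. It rests on assumptions: the Google-provided template typically takes its UDF as a JavaScript function stored in Cloud Storage, so this only mirrors the transformation, and the field name `serial_number` is hypothetical.

```python
import json


def transform(message: str) -> str:
    """Uppercase the serial number in one JSON telemetry message.

    "serial_number" is a hypothetical field name; use the name from your
    own telemetry schema. Other fields pass through unchanged.
    """
    record = json.loads(message)
    serial = record.get("serial_number")
    if isinstance(serial, str):
        record["serial_number"] = serial.upper()
    return json.dumps(record)


if __name__ == "__main__":
    print(transform('{"serial_number": "ab-123x", "temp_c": 21.5}'))
```

Dataflow applies the UDF to each message before writing the row to BigQuery, so the transformation code stays this small while the template handles reading, scaling, and writing.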
Question # 23
Your organization has several datasets in BigQuery. The datasets need to be shared with your external partners so that they can run SQL queries without needing to copy the data to their own projects. You have organized each partner's data in its own BigQuery dataset. Each partner should be able to access only their data. You want to share the data while following Google-recommended practices. What should you do?
- A. Use Analytics Hub to create a listing on a private data exchange for each partner dataset. Allow each partner to subscribe to their respective listings.
- B. Grant the partners the bigquery.user IAM role on the BigQuery project.
- C. Export the BigQuery data to a Cloud Storage bucket. Grant the partners the storage.objectUser IAM role on the bucket.
- D. Create a Dataflow job that reads from each BigQuery dataset and pushes the data into a dedicated Pub/Sub topic for each partner. Grant each partner the pubsub.subscriber IAM role.
Correct answer: A
Explanation:
Using Analytics Hub to create a listing on a private data exchange for each partner dataset is the Google-recommended practice for securely sharing BigQuery data with external partners. Analytics Hub allows you to manage data sharing at scale, enabling partners to query datasets directly without needing to copy the data into their own projects. By creating separate listings for each partner dataset and allowing only the respective partner to subscribe, you ensure that partners can access only their specific data, adhering to the principle of least privilege. This approach is secure, efficient, and designed for scenarios involving external data sharing.
Question # 24
Your organization is conducting analysis on regional sales metrics. Data from each regional sales team is stored as separate tables in BigQuery and updated monthly. You need to create a solution that identifies the top three regions with the highest monthly sales for the next three months. You want the solution to automatically provide up-to-date results. What should you do?
- A. Create a BigQuery table that performs a cross join across all of the regional sales tables. Use the rank() window function to query the new table.
- B. Create a BigQuery materialized view that performs a cross join across all of the regional sales tables. Use the row_number() window function to query the new materialized view.
- C. Create a BigQuery table that performs a union across all of the regional sales tables. Use the row_number() window function to query the new table.
- D. Create a BigQuery materialized view that performs a union across all of the regional sales tables. Use the rank() window function to query the new materialized view.
Correct answer: D
Explanation:
Comprehensive and Detailed in Depth Explanation:
Why D is correct: Materialized views in BigQuery are precomputed views that cache query results and are refreshed automatically, so results stay up to date without manual work.
A UNION is the correct operation to combine the data from multiple regional sales tables.
The RANK() function is the right way to rank the sales regions. ROW_NUMBER() assigns a unique number to each row even when sales amounts are tied, which is not the desired behavior.
Why other options are incorrect: A and C: Standard tables do not provide automatic updates.
B: A CROSS JOIN produces a Cartesian product, which is not appropriate for combining regional sales data.
A cross join returns every combination of rows from the tables, not an aggregation of the data.
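To show the RANK() versus ROW_NUMBER() difference on tied values, here is a small Python sketch that reproduces both semantics for a descending ordering. The region names and sales totals are hypothetical.

```python
def rank_desc(values):
    """RANK() over descending values: ties share a rank and the next rank is skipped."""
    return [1 + sum(1 for other in values if other > v) for v in values]


def row_number_desc(values):
    """ROW_NUMBER() over descending values: every row gets a distinct number.

    Ties are broken here by input order (Python's sort is stable); in
    BigQuery the order among tied rows is not guaranteed.
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    numbers = [0] * len(values)
    for position, index in enumerate(order, start=1):
        numbers[index] = position
    return numbers


regions = ["north", "south", "east", "west"]
monthly_sales = [500, 700, 700, 300]  # hypothetical totals with a tie
print(dict(zip(regions, rank_desc(monthly_sales))))
print(dict(zip(regions, row_number_desc(monthly_sales))))
```

With RANK(), filtering on rank <= 3 keeps south, east, and north, and the tie is reported as a tie. With ROW_NUMBER(), the two tied regions receive different numbers, so which one is listed first depends on tie-breaking.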
Question # 25
Your organization has several datasets in their data warehouse in BigQuery. Several analyst teams in different departments use the datasets to run queries. Your organization is concerned about the variability of their monthly BigQuery costs. You need to identify a solution that creates a fixed budget for costs associated with the queries run by each department. What should you do?
- A. Assign each analyst to a separate project associated with their department. Create a single reservation for each department by using BigQuery editions. Create assignments for each project in the appropriate reservation.
- B. Assign each analyst to a separate project associated with their department. Create a single reservation by using BigQuery editions. Assign all projects to the reservation.
- C. Create a custom quota for each analyst in BigQuery.
- D. Create a single reservation by using BigQuery editions. Assign all analysts to the reservation.
Correct answer: A
Explanation:
Assigning each analyst to a separate project associated with their department and creating a single reservation for each department using BigQuery editions allows for precise cost management. By assigning each project to its department's reservation, you can allocate fixed compute resources and budgets for each department, ensuring that their query costs are predictable and controlled. This approach aligns with your organization's goal of creating a fixed budget for query costs while maintaining departmental separation and accountability.
Question # 26
You manage an ecommerce website that has a diverse range of products. You need to forecast future product demand accurately to ensure that your company has sufficient inventory to meet customer needs and avoid stockouts. Your company's historical sales data is stored in a BigQuery table. You need to create a scalable solution that takes into account the seasonality and historical data to predict product demand. What should you do?
- A. Use the historical sales data to train and create a BigQuery ML time series model. Use the ML.FORECAST function call to output the predictions into a new BigQuery table.
- B. Use Colab Enterprise to create a Jupyter notebook. Use the historical sales data to train a custom prediction model in Python.
- C. Use the historical sales data to train and create a BigQuery ML linear regression model. Use the ML.PREDICT function call to output the predictions into a new BigQuery table.
- D. Use the historical sales data to train and create a BigQuery ML logistic regression model. Use the ML.PREDICT function call to output the predictions into a new BigQuery table.
Correct answer: A
Explanation:
Comprehensive and Detailed In-Depth Explanation:
Forecasting product demand with seasonality requires a time series model, and BigQuery ML offers a scalable, serverless solution. Let's analyze:
* Option A: BigQuery ML's time series models (e.g., ARIMA_PLUS) are designed for forecasting with seasonality and trends. The ML.FORECAST function generates predictions based on historical data, storing them in a table. This is scalable (no infrastructure) and integrates natively with BigQuery, ideal for ecommerce demand prediction.
* Option B: Colab Enterprise with a custom Python model (e.g., Prophet) is flexible but requires coding, maintenance, and potentially exporting data, reducing scalability compared to BigQuery ML's in-place processing.
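* Option C: Linear regression predicts continuous values but doesn't handle seasonality or time series patterns effectively, making it unsuitable for demand forecasting.
* Option D: Logistic regression is a classification model that predicts categorical outcomes, not continuous demand quantities, so it is unsuitable for forecasting demand.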
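To make "seasonality" concrete, the sketch below implements a seasonal-naive baseline in Python: each future step repeats the value observed one season earlier. This is not what ARIMA_PLUS does internally (it fits trend, seasonal, and holiday components); it is only a simple reference point, and the weekly season length and sales figures are hypothetical.

```python
def seasonal_naive_forecast(history, season_length, horizon):
    """Forecast each future step with the value from the same position one season earlier."""
    if season_length <= 0 or len(history) < season_length:
        raise ValueError("need a positive season length and at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[step % season_length] for step in range(horizon)]


# Two hypothetical weeks of daily unit sales with a weekend peak.
daily_sales = [10, 12, 11, 13, 20, 30, 25] * 2
print(seasonal_naive_forecast(daily_sales, season_length=7, horizon=3))
```

A trained time series model should beat a baseline like this; if it does not, the model is not capturing the seasonal pattern.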
Question # 27
Your company's ecommerce website collects product reviews from customers. The reviews are loaded as CSV files daily to a Cloud Storage bucket. The reviews are in multiple languages and need to be translated to Spanish. You need to configure a pipeline that is serverless, efficient, and requires minimal maintenance.
What should you do?
- A. Load the data into BigQuery using Dataproc. Use Apache Spark to translate the reviews by invoking the Cloud Translation API. Set BigQuery as the sink.
- B. Load the data into BigQuery using a Cloud Run function. Create a BigQuery remote function that invokes the Cloud Translation API. Use a scheduled query to translate new reviews.
- C. Load the data into BigQuery using a Cloud Run function. Use the BigQuery ML create model statement to train a translation model. Use the model to translate the product reviews within BigQuery.
- D. Use a Dataflow templates pipeline to translate the reviews using the Cloud Translation API. Set BigQuery as the sink.
Correct answer: B
Explanation:
Loading the data into BigQuery using a Cloud Run function and creating a BigQuery remote function that invokes the Cloud Translation API is a serverless and efficient approach. With this setup, you can use a scheduled query in BigQuery to invoke the remote function and translate new product reviews on a regular basis. This solution requires minimal maintenance, as BigQuery handles storage and querying, and the Cloud Translation API provides accurate translations without the need for custom ML model development.
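A BigQuery remote function sends batches of rows to an HTTP endpoint as JSON with a `calls` array (one argument list per row) and expects a `replies` array back, in the same order. The Python sketch below shows that request/response handling. `translate_to_spanish` is a hypothetical stub standing in for the Cloud Translation API call, and the web framework wiring is omitted.

```python
import json


def translate_to_spanish(text):
    """Hypothetical placeholder for a Cloud Translation API call.

    A real service would call the Translation API client here; this stub
    only tags the text so the request/response flow can be run locally.
    """
    return f"[es] {text}"


def handle_remote_function_request(body: str) -> str:
    """Process one BigQuery remote function request body.

    Each entry in "calls" holds the function arguments for one row; the
    response must contain one reply per call, in the same order.
    """
    request = json.loads(body)
    replies = []
    for call in request["calls"]:
        review_text = call[0]  # the remote function takes a single STRING argument
        replies.append(None if review_text is None else translate_to_spanish(review_text))
    return json.dumps({"replies": replies})


if __name__ == "__main__":
    sample = json.dumps({"calls": [["Great product"], [None]]})
    print(handle_remote_function_request(sample))
```

Because BigQuery batches many rows into one request, a real handler would usually send the whole batch to the Translation API in a single call rather than one call per review.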
Question # 28
Your company is migrating their batch transformation pipelines to Google Cloud. You need to choose a solution that supports programmatic transformations using only SQL. You also want the technology to support Git integration for version control of your pipelines. What should you do?
- A. Use Cloud Composer operators.
- B. Use Dataflow pipelines.
- C. Use Dataform workflows.
- D. Use Cloud Data Fusion pipelines.
Correct answer: C
Explanation:
Dataform workflows are the ideal solution for migrating batch transformation pipelines to Google Cloud when you want to perform programmatic transformations using only SQL. Dataform allows you to define SQL-based workflows for data transformations and supports Git integration for version control, enabling collaboration and version tracking of your pipelines. This approach is purpose-built for SQL-driven data pipeline management and aligns with your requirements.
The solution must use SQL for transformations and integrate with Git for version control, focusing on batch pipelines. Let's evaluate:
* Option A: Cloud Composer uses Python DAGs and operators, not SQL-only transformations. Git can be used but is not intrinsic to its workflow design.
* Option B: Dataflow pipelines are typically written with the Apache Beam SDKs (Java, Python, or Go), so they do not meet the SQL-only requirement.
* Option C: Dataform is a SQL-based workflow tool for BigQuery transformations that defines pipelines as SQLX files. It integrates natively with Git for version control, supporting batch ELT processes with minimal overhead.
* Option D: Cloud Data Fusion uses a visual UI with plugins rather than SQL-only transformations, and Git-based version control is not central to how its pipelines are built.
Question # 29
You are constructing a data pipeline to process sensitive customer data stored in a Cloud Storage bucket. You need to ensure that this data remains accessible, even in the event of a single-zone outage. What should you do?
- A. Store the data in Nearline storage.
- B. Set up a Cloud CDN in front of the bucket.
- C. Enable Object Versioning on the bucket.
- D. Store the data in a multi-region bucket.
Correct answer: D
Explanation:
Storing the data in a multi-region bucket ensures high availability and durability, even in the event of a single-zone outage. A multi-region bucket stores data redundantly across multiple regions within a large geographic area (such as the US or EU), so it is resilient to zone-level and even region-level failures, and the data remains accessible. This approach is particularly suitable for sensitive customer data that must remain available without interruptions.
Question # 30
Your organization uses a BigQuery table that is partitioned by ingestion time. You need to remove data that is older than one year to reduce your organization's storage costs. You want to use the most efficient approach while minimizing cost. What should you do?
- A. Set the table partition expiration period to one year using the ALTER TABLE statement in SQL.
- B. Create a view that filters out rows that are older than one year.
- C. Create a scheduled query that periodically runs an update statement in SQL that sets the "deleted" column to "yes" for data that is more than one year old. Create a view that filters out rows that have been marked deleted.
- D. Require users to specify a partition filter using the alter table statement in SQL.
Correct answer: A
Explanation:
Setting the table partition expiration period to one year using the ALTER TABLE statement is the most efficient and cost-effective approach. This automatically deletes data in partitions older than one year, reducing storage costs without requiring manual intervention or additional queries. It minimizes administrative overhead and ensures compliance with your data retention policy while optimizing storage usage in BigQuery.
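The Python sketch below builds the DDL statement for this option and models which daily partitions a one-year expiration would remove. The table name is hypothetical, and the expiry check is a simplified model (partitions dated more than `days` days before today) rather than BigQuery's exact internal timing.

```python
from datetime import date, timedelta


def build_expiration_ddl(table: str, days: int = 365) -> str:
    """Build the BigQuery DDL that sets a partition expiration on an existing table."""
    return f"ALTER TABLE `{table}` SET OPTIONS (partition_expiration_days = {days})"


def expired_partitions(partition_dates, today, days=365):
    """Simplified model: daily partitions dated before the cutoff are dropped."""
    cutoff = today - timedelta(days=days)
    return [d for d in partition_dates if d < cutoff]


if __name__ == "__main__":
    print(build_expiration_ddl("my-project.sales.daily_events"))  # hypothetical table
    partitions = [date(2024, 5, 16), date(2024, 5, 17), date(2025, 1, 1)]
    print(expired_partitions(partitions, today=date(2025, 5, 17)))
```

Once the option is set, BigQuery deletes old partitions on its own, so no scheduled job or query cost is involved.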
Question # 31
Your organization plans to move their on-premises environment to Google Cloud. Your organization's network bandwidth is less than 1 Gbps. You need to move over 500 TB of data to Cloud Storage securely, and only have a few days to move the data. What should you do?
- A. Connect to Google Cloud using VPN. Use the gcloud storage command to move the data to Cloud Storage.
- B. Connect to Google Cloud using Dedicated Interconnect. Use the gcloud storage command to move the data to Cloud Storage.
- C. Request multiple Transfer Appliances, copy the data to the appliances, and ship the appliances back to Google Cloud to upload the data to Cloud Storage.
- D. Connect to Google Cloud using VPN. Use Storage Transfer Service to move the data to Cloud Storage.
Correct answer: C
Explanation:
Using Transfer Appliances is the best solution for securely and efficiently moving over 500 TB of data to Cloud Storage within a limited timeframe, especially with network bandwidth below 1 Gbps. Transfer Appliances are physical devices provided by Google Cloud to securely transfer large amounts of data. After copying the data to the appliances, they are shipped back to Google, where the data is uploaded to Cloud Storage. This approach bypasses bandwidth limitations and ensures the data is migrated quickly and securely.
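The bandwidth limitation can be checked with simple arithmetic. This Python sketch uses decimal units (1 TB = 10^12 bytes) and assumes the link is fully utilized, which is a best case.

```python
def transfer_days(data_terabytes: float, bandwidth_gbps: float, utilization: float = 1.0) -> float:
    """Days needed to send data over a network link (decimal units, 1 TB = 10**12 bytes)."""
    bits = data_terabytes * 10**12 * 8
    seconds = bits / (bandwidth_gbps * 10**9 * utilization)
    return seconds / 86_400


if __name__ == "__main__":
    print(f"{transfer_days(500, 1.0):.1f} days")
```

Even at a full 1 Gbps, 500 TB takes about 46 days, and the scenario's link is slower than that, so any network-based option (VPN with gcloud storage or Storage Transfer Service) cannot finish in a few days.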
Question # 32
You are a Looker analyst. You need to add a new field to your Looker report that generates SQL that will run against your company's database. You do not have the Develop permission. What should you do?
- A. Create a custom field from the field picker in Looker, and add it to your report.
- B. Create a calculated field using the Add a field option in Looker Studio, and add it to your report.
- C. Create a table calculation from the field picker in Looker, and add it to your report.
- D. Create a new field in the LookML layer, refresh your report, and select your new field from the field picker.
Correct answer: A
Explanation:
Creating a custom field from the field picker in Looker allows you to add new fields to your report without requiring the Develop permission. Custom fields are created directly in the Looker UI, enabling you to define calculations or transformations that generate SQL for the database query. This approach is user-friendly and does not require access to the LookML layer, making it the appropriate choice for your situation.
Question # 33
You used BigQuery ML to build a customer purchase propensity model six months ago. You want to compare the current serving data with the historical serving data to determine whether you need to retrain the model. What should you do?
- A. Compare the two different models.
- B. Evaluate data drift.
- C. Compare the confusion matrix.
- D. Evaluate the data skewness.
Correct answer: B
Explanation:
Evaluating data drift involves analyzing changes in the distribution of the current serving data compared with historical serving data. If significant drift is detected, the data patterns have changed over time, which can degrade the model's performance. This analysis helps determine whether retraining is necessary to keep predictions accurate and relevant. By contrast, data skew (training-serving skew) compares serving data with the training data, which is not what this scenario asks for. Data drift evaluation is a standard approach for monitoring machine learning models over time.
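One common drift statistic for a categorical feature is the L-infinity distance: the largest absolute difference in any category's frequency between two samples. The Python sketch below computes it; the feature values are hypothetical, and any alerting threshold you compare against is your own choice.

```python
from collections import Counter


def category_distribution(values):
    """Relative frequency of each category in a sample."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


def l_infinity_distance(baseline, current):
    """Largest absolute change in category frequency between two samples."""
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    p = category_distribution(baseline)
    q = category_distribution(current)
    categories = set(p) | set(q)
    return max(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


if __name__ == "__main__":
    historical = ["web"] * 6 + ["app"] * 4  # hypothetical purchase channels six months ago
    current = ["web"] * 3 + ["app"] * 7     # hypothetical purchase channels today
    print(round(l_infinity_distance(historical, current), 3))
```

A large distance on important features suggests that the serving population has shifted and that the propensity model may need retraining.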
Question # 34
You work for a gaming company that collects real-time player activity data. This data is streamed into Pub/Sub and needs to be processed and loaded into BigQuery for analysis. The processing involves filtering, enriching, and aggregating the data before loading it into partitioned BigQuery tables. You need to design a pipeline that ensures low latency and high throughput while following a Google-recommended approach.
What should you do?
- A. Use Cloud Composer to orchestrate a workflow that reads the data from Pub/Sub, processes the data using a Python script, and writes it to BigQuery.
- B. Use Cloud Run functions to subscribe to the Pub/Sub topic, process the data, and write it to BigQuery using the streaming API.
- C. Use Dataproc to create an Apache Spark streaming job that reads the data from Pub/Sub, processes the data, and writes it to BigQuery.
- D. Use Dataflow to create a streaming pipeline that reads the data from Pub/Sub, processes the data, and writes it to BigQuery using the streaming API.
Correct answer: D
Explanation:
Comprehensive and Detailed in Depth Explanation:
Why D is correct: Dataflow is the recommended service for real-time stream processing on Google Cloud.
It provides scalable and reliable processing with low latency and high throughput.
Dataflow has built-in connectors for reading from Pub/Sub and for streaming writes to BigQuery.
Why other options are incorrect: A: Cloud Composer is for batch orchestration, not real-time streaming.
B: Cloud Run functions are designed for stateless, event-driven workloads, not continuous stream processing with aggregation.
C: Dataproc and Spark Streaming are more complex to operate and less efficient than Dataflow for this task.
Question # 35
You have created a LookML model and dashboard that shows daily sales metrics for five regional managers to use. You want to ensure that the regional managers can only see sales metrics specific to their region. You need an easy-to-implement solution. What should you do?
- A. Create separate Looker instances for each regional manager. Copy the LookML model and dashboard to each instance. Provision viewer access to the corresponding manager.
- B. Create separate Looker dashboards for each regional manager. Set the default dashboard filter to the corresponding region for each manager.
- C. Create five different Explores with the sql_always_filter Explore filter applied on the region_name dimension. Set each region_name value to the corresponding region for each manager.
- D. Create a sales_region user attribute, and assign each manager's region as the value of their user attribute. Add an access_filter Explore filter on the region_name dimension by using the sales_region user attribute.
Correct answer: D
Explanation:
Using a sales_region user attribute is the best solution because it allows you to dynamically filter data based on each manager's assigned region. By adding an access_filter Explore filter on the region_name dimension that references the sales_region user attribute, each manager sees only the sales metrics specific to their region. This approach is easy to implement, scalable, and avoids duplicating dashboards or Explores, making it both efficient and maintainable.
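Conceptually, an access_filter makes Looker add a condition requiring region_name to equal the querying user's sales_region attribute value. The Python below is a hypothetical model of that row-level behavior for illustration only; it is not Looker's API, and the row data is invented.

```python
def apply_access_filter(rows, user_attributes, field="region_name", attribute="sales_region"):
    """Keep only rows whose `field` matches the user's attribute value.

    In this simplified model, a user with no value for the attribute sees no rows.
    """
    allowed = user_attributes.get(attribute)
    if allowed is None:
        return []
    return [row for row in rows if row[field] == allowed]


if __name__ == "__main__":
    daily_sales = [
        {"region_name": "west", "sales": 120},
        {"region_name": "east", "sales": 95},
    ]
    print(apply_access_filter(daily_sales, {"sales_region": "west"}))
```

Because the filter is driven by a user attribute, one Explore and one dashboard serve all five managers; adding a region only means assigning a new attribute value.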
Question # 36
You created a curated dataset of market trends in BigQuery that you want to share with multiple external partners. You want to control the rows and columns that each partner has access to. You want to follow Google-recommended practices. What should you do?
- A. Create a separate project for each partner and copy the dataset into each project. Publish each dataset in Analytics Hub. Grant dataset-level access to each partner by using subscriptions.
- B. Publish the dataset in Analytics Hub. Grant dataset-level access to each partner by using subscriptions.
- C. Grant each partner read access to the BigQuery dataset by using IAM roles.
- D. Create a separate Cloud Storage bucket for each partner. Export the dataset to each bucket and assign each partner to their respective bucket. Grant bucket-level access by using IAM roles.
Correct answer: B
Explanation:
Comprehensive and Detailed in Depth Explanation:
Why B is correct: Analytics Hub allows you to share datasets with external partners while maintaining control over access.
Subscriptions let you control which partners can access each shared listing.
Why other options are incorrect: A: Creating a separate project for each partner and copying the dataset is complex and not scalable.
C: Granting dataset-level IAM roles alone does not provide granular row- and column-level control.
D: Cloud Storage stores exported files, not queryable BigQuery datasets, and per-partner exports are hard to maintain.
Question # 37
You are using your own data to demonstrate the capabilities of BigQuery to your organization's leadership team. You need to perform a one-time load of the files stored on your local machine into BigQuery using as little effort as possible. What should you do?
- A. Create a Dataproc cluster, copy the files to Cloud Storage, and write an Apache Spark job using the spark-bigquery-connector.
- B. Create a Dataflow job using the Apache Beam FileIO and BigQueryIO connectors with a local runner.
- C. Execute the bq load command on your local machine.
- D. Write and execute a Python script using the BigQuery Storage Write API library.
Correct answer: C
Explanation:
Using the bq load command is the simplest and most efficient way to perform a one-time load of files from your local machine into BigQuery. This command-line tool is easy to use, requires minimal setup, and supports direct uploads from local files to BigQuery tables. It meets the requirement for minimal effort while allowing you to quickly demonstrate BigQuery's capabilities to your organization's leadership team.
Question # 38
You have millions of customer feedback records stored in BigQuery. You want to summarize the data by using the large language model (LLM) Gemini. You need to plan and execute this analysis using the most efficient approach. What should you do?
- A. Create a BigQuery Cloud resource connection to a remote model in Vertex AI, and use Gemini to summarize the data.
- B. Query the BigQuery table from within a Python notebook, use the Gemini API to summarize the data within the notebook, and store the summaries in BigQuery.
- C. Use a BigQuery ML model to pre-process the text data, export the results to Cloud Storage, and use the Gemini API to summarize the pre-processed data.
- D. Export the raw BigQuery data to a CSV file, upload it to Cloud Storage, and use the Gemini API to summarize the data.
Correct answer: A
Explanation:
Creating a BigQuery Cloud resource connection to a remote model in Vertex AI and using Gemini to summarize the data is the most efficient approach. This method integrates BigQuery with the Gemini model via Vertex AI, avoiding the need to export data or perform manual steps. It scales to large datasets and minimizes data movement, because you can call the ML.GENERATE_TEXT function on the remote model to summarize the records directly in SQL and store the results in BigQuery.
Question # 39
You have a Dataproc cluster that performs batch processing on data stored in Cloud Storage. You need to schedule a daily Spark job to generate a report that will be emailed to stakeholders. You need a fully-managed solution that is easy to implement and minimizes complexity. What should you do?
- A. Use Dataproc workflow templates to define and schedule the Spark job, and to email the report.
- B. Use Cloud Scheduler to trigger the Spark job, and use Cloud Run functions to email the report.
- C. Use Cloud Run functions to trigger the Spark job and email the report.
- D. Use Cloud Composer to orchestrate the Spark job and email the report.
Correct answer: A
Explanation:
Using Dataproc workflow templates is a fully managed and straightforward solution for defining and scheduling your Spark job on a Dataproc cluster. Workflow templates allow you to automate the execution of Spark jobs with predefined steps, including data processing and report generation. You can integrate email notifications by adding a step to the workflow that sends the report using a Cloud Run function or an external email service. This approach minimizes complexity while leveraging Dataproc's managed capabilities for batch processing.
Question # 40
......
Complete free Google Associate-Data-Practitioner question bank (test engine and PDF): https://www.goshiken.com/Google/Associate-Data-Practitioner-mondaishu.html
Get the latest! Valid Associate-Data-Practitioner certification exam questions and answers: https://drive.google.com/open?id=10wKT_1ZCmeVzQiUTMsojDGUS789UHgyn