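[December 20, 2023] Latest Professional-Data-Engineer exam questions, updated for 2023
Freely updated Google Professional-Data-Engineer test engine questions: 270 questions and answers
The exam consists of multiple-choice and multiple-select questions and lasts 2 hours. Candidates must answer a total of 50 questions, and the passing score is 70%. The exam is available in English and Japanese and can be taken online or at a test center. The exam fee is $200, and the certification is valid for 2 years.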
Question # 57
Your company produces 20,000 files every hour. Each data file is formatted as a comma separated values (CSV) file that is less than 4 KB. All files must be ingested on Google Cloud Platform before they can be processed. Your company site has a 200 ms latency to Google Cloud, and your Internet connection bandwidth is limited to 50 Mbps. You currently deploy a secure FTP (SFTP) server on a virtual machine in Google Compute Engine as the data ingestion point. A local SFTP client runs on a dedicated machine to transmit the CSV files as is. The goal is to make reports with data from the previous day available to the executives by 10:00 a.m. each day. This design is barely able to keep up with the current volume, even though the bandwidth utilization is rather low. You are told that due to seasonality, your company expects the number of files to double for the next three months. Which two actions should you take? (Choose two.)
- A. Introduce data compression for each file to increase the rate of file transfer.
- B. Assemble 1,000 files into a tape archive (TAR) file. Transmit the TAR files instead, and disassemble the CSV files in the cloud upon receiving them.
- C. Contact your internet service provider (ISP) to increase your maximum bandwidth to at least 100 Mbps.
- D. Create an S3-compatible storage endpoint in your network, and use Google Cloud Storage Transfer Service to transfer on-premises data to the designated storage bucket.
- E. Redesign the data ingestion process to use gsutil tool to send the CSV files to a storage bucket in parallel.
Correct answer: D, E
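A rough worked calculation can make the bottleneck in this scenario concrete. The sketch below uses only the figures from the question (20,000 files per hour, under 4 KB each, 200 ms latency, 50 Mbps) plus one labeled assumption: that a sequential client spends about one latency period per file. Real SFTP transfers typically need more round trips than that, so treat the numbers as an optimistic bound, not a measurement.

```python
import math

FILE_SIZE_BYTES = 4 * 1024    # upper bound from the question: each CSV is under 4 KB
LATENCY_MS = 200              # latency to Google Cloud, from the question
BANDWIDTH_BPS = 50_000_000    # 50 Mbps link, from the question

# Assumption (not from the question): one latency period is spent per file
# when files are sent strictly one after another.
ASSUMED_ROUND_TRIPS_PER_FILE = 1


def bandwidth_utilization(files_per_hour: int) -> float:
    """Fraction of the link consumed by payload bytes alone."""
    bits_per_second = files_per_hour * FILE_SIZE_BYTES * 8 / 3600
    return bits_per_second / BANDWIDTH_BPS


def sequential_files_per_hour() -> float:
    """Upper bound on files per hour for a single sequential sender."""
    ms_per_file = LATENCY_MS * ASSUMED_ROUND_TRIPS_PER_FILE
    return 3600 * 1000 / ms_per_file


def parallel_streams_needed(files_per_hour: int) -> int:
    """Concurrent senders needed to keep up, under the same assumption."""
    return math.ceil(files_per_hour / sequential_files_per_hour())


if __name__ == "__main__":
    for volume in (20_000, 40_000):
        print(
            f"{volume} files/h: "
            f"link utilization {bandwidth_utilization(volume):.2%}, "
            f"streams needed {parallel_streams_needed(volume)}"
        )
```

Under these assumptions the link stays well below 1% utilized even at the doubled volume, while a single sequential sender tops out below the required file rate. That is consistent with the question's remark that bandwidth utilization is low, and it suggests why sending files in parallel helps more than buying bandwidth.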
Question # 58
You are building a data pipeline on Google Cloud. You need to prepare data using a causal method for a machine-learning process. You want to support a logistic regression model. You also need to monitor and adjust for null values, which must remain real-valued and cannot be removed. What should you do?
- A. Use Cloud Dataflow to find null values in sample source data. Convert all nulls to 'none' using a Cloud Dataprep job.
- B. Use Cloud Dataprep to find null values in sample source data. Convert all nulls to 0 using a Cloud Dataprep job.
- C. Use Cloud Dataprep to find null values in sample source data. Convert all nulls to 'none' using a Cloud Dataproc job.
- D. Use Cloud Dataflow to find null values in sample source data. Convert all nulls to using a custom script.
Correct answer: B
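The idea behind converting nulls to 0 can be sketched in plain Python. This is not Cloud Dataprep code; it is a minimal illustration, with a hypothetical `impute_nulls` helper and made-up column names, of replacing nulls with a real value while counting them per column so they can still be monitored.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]


def impute_nulls(
    rows: List[Row], fill: float = 0.0
) -> Tuple[List[Dict[str, float]], Dict[str, int]]:
    """Replace None with a real value and count nulls per column.

    The rows are kept (nothing is dropped), and every value stays
    real-valued, which is what a logistic regression input needs.
    """
    null_counts: Dict[str, int] = {}
    cleaned: List[Dict[str, float]] = []
    for row in rows:
        new_row: Dict[str, float] = {}
        for column, value in row.items():
            if value is None:
                null_counts[column] = null_counts.get(column, 0) + 1
                new_row[column] = fill
            else:
                new_row[column] = float(value)
        cleaned.append(new_row)
    return cleaned, null_counts


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    sample = [
        {"age": 31.0, "income": None},
        {"age": None, "income": 52000.0},
    ]
    cleaned, counts = impute_nulls(sample)
    print(cleaned)
    print(counts)
```

A string such as 'none' would not be real-valued, which is why the options that convert nulls to 'none' do not meet the stated requirement.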
Question # 59
Dataproc clusters contain many configuration files. To update these files, you will need to use the --properties option. The format for the option is: file_prefix:property=_____.
- A. value
- B. null
- C. id
- D. details
Correct answer: A
Explanation:
To make updating files and properties easy, the --properties option uses a special format to specify the configuration file and the property and value within the file that should be updated. The formatting is as follows: file_prefix:property=value.
Reference: https://cloud.google.com/dataproc/docs/concepts/cluster-properties#formatting
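The `file_prefix:property=value` format can be illustrated with a small string-building helper. The helper names below are hypothetical, and the `spark:spark.executor.memory=4g` entry is only an example value chosen for illustration; the sketch assumes that several properties are passed as a comma-separated list.

```python
from typing import Iterable, Tuple


def format_property(file_prefix: str, prop: str, value: str) -> str:
    """Build one entry in the file_prefix:property=value format."""
    if not file_prefix or not prop:
        raise ValueError("file_prefix and property name must be non-empty")
    return f"{file_prefix}:{prop}={value}"


def format_properties_flag(settings: Iterable[Tuple[str, str, str]]) -> str:
    """Join several entries into a single --properties argument.

    Assumption: multiple entries are separated by commas.
    """
    entries = [format_property(prefix, prop, value) for prefix, prop, value in settings]
    return "--properties=" + ",".join(entries)


if __name__ == "__main__":
    # Example values are illustrative, not taken from the question.
    print(format_property("spark", "spark.executor.memory", "4g"))
    print(
        format_properties_flag(
            [
                ("spark", "spark.executor.memory", "4g"),
                ("core", "io.file.buffer.size", "65536"),
            ]
        )
    )
```

The prefix names the configuration file, so the same property name can be targeted at different files without ambiguity.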
Question # 60
All Google Cloud Bigtable client requests go through a front-end server ______ they are sent to a Cloud Bigtable node.
- A. only if
- B. after
- C. before
- D. once
Correct answer: C
Explanation:
In a Cloud Bigtable architecture all client requests go through a front-end server before they are sent to a Cloud Bigtable node.
The nodes are organized into a Cloud Bigtable cluster, which belongs to a Cloud Bigtable instance, which is a container for the cluster. Each node in the cluster handles a subset of the requests to the cluster.
When additional nodes are added to a cluster, you can increase the number of simultaneous requests that the cluster can handle, as well as the maximum throughput for the entire cluster.
Reference: https://cloud.google.com/bigtable/docs/overview
Question # 61
You want to analyze hundreds of thousands of social media posts daily at the lowest cost and with the fewest steps.
You have the following requirements:
* You will batch-load the posts once per day and run them through the Cloud Natural Language API.
* You will extract topics and sentiment from the posts.
* You must store the raw posts for archiving and reprocessing.
* You will create dashboards to be shared with people both inside and outside your organization.
You need to store both the data extracted from the API to perform analysis as well as the raw social media posts for historical archiving. What should you do?
- A. Store the social media posts and the data extracted from the API in BigQuery.
- B. Store the raw social media posts in Cloud Storage, and write the data extracted from the API into BigQuery.
- C. Feed the social media posts into the API directly from the source, and write the extracted data from the API into BigQuery.
- D. Store the social media posts and the data extracted from the API in Cloud SQL.
Correct answer: B
Explanation:
Social media posts can contain images or videos, which are better kept in Cloud Storage than in BigQuery.
Question # 62
You are designing storage for very large text files for a data pipeline on Google Cloud. You want to support ANSI SQL queries. You also want to support compression and parallel load from the input locations using Google recommended practices. What should you do?
- A. Compress text files to gzip using the Grid Computing Tools. Use BigQuery for storage and query.
- B. Transform text files to compressed Avro using Cloud Dataflow. Use BigQuery for storage and query.
- C. Transform text files to compressed Avro using Cloud Dataflow. Use Cloud Storage and BigQuery permanent linked tables for query.
- D. Compress text files to gzip using the Grid Computing Tools. Use Cloud Storage, and then import into Cloud Bigtable for query.
Correct answer: B
Explanation:
Avro is a compressed format, Cloud Dataflow provides the parallel pipeline, and BigQuery provides the storage and ANSI SQL queries.
Question # 63
You're training a model to predict housing prices based on an available dataset with real estate properties.
Your plan is to train a fully connected neural net, and you've discovered that the dataset contains latitude and longitude of the property. Real estate professionals have told you that the location of the property is highly influential on price, so you'd like to engineer a feature that incorporates this physical dependency.
What should you do?
- A. Create a feature cross of latitude and longitude, bucketize it at the minute level and use L2 regularization during optimization.
- B. Create a feature cross of latitude and longitude, bucketize at the minute level and use L1 regularization during optimization.
- C. Provide latitude and longitude as input vectors to your neural net.
- D. Create a numeric column from a feature cross of latitude and longitude.
Correct answer: D
Explanation:
Reference: https://cloud.google.com/bigquery/docs/gis-data
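A feature cross of latitude and longitude can be sketched without any ML framework. The code below is an illustrative assumption, not the method of a specific library: it buckets each coordinate on a fixed grid and combines the two bucket indices into a single integer cell ID. The `buckets_per_degree` parameter and the function names are hypothetical; a value of 60 corresponds to the minute-level bucketing mentioned in options A and B.

```python
import math


def bucketize(value: float, lower: float, span_degrees: int, buckets_per_degree: int) -> int:
    """Map a coordinate to a bucket index on a uniform grid.

    The index is clamped so the upper boundary (for example latitude 90)
    falls into the last bucket instead of overflowing the grid.
    """
    last_bucket = span_degrees * buckets_per_degree - 1
    index = math.floor((value - lower) * buckets_per_degree)
    return max(0, min(index, last_bucket))


def cross_lat_lon(lat: float, lon: float, buckets_per_degree: int = 60) -> int:
    """Combine latitude and longitude buckets into one grid-cell ID."""
    lat_bucket = bucketize(lat, -90.0, 180, buckets_per_degree)
    lon_bucket = bucketize(lon, -180.0, 360, buckets_per_degree)
    lon_buckets_total = 360 * buckets_per_degree
    return lat_bucket * lon_buckets_total + lon_bucket


if __name__ == "__main__":
    # With one bucket per degree, nearby points share a cell ID.
    print(cross_lat_lon(37.7, -122.4, buckets_per_degree=1))
    print(cross_lat_lon(37.2, -122.9, buckets_per_degree=1))
    print(cross_lat_lon(38.1, -122.4, buckets_per_degree=1))
```

The point of the cross is that the cell ID captures location jointly: two properties only share an ID when both their latitude and longitude buckets match, which neither coordinate can express on its own.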
Question # 64
Cloud Dataproc charges you only for what you really use with _____ billing.
- A. hour-by-hour
- B. minute-by-minute
- C. month-by-month
- D. week-by-week
Correct answer: B
Explanation:
One of the advantages of Cloud Dataproc is its low cost. Dataproc charges for what you really use with minute-by-minute billing and a low, ten-minute-minimum billing period.
Reference: https://cloud.google.com/dataproc/docs/concepts/overview
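The billing rule described in this explanation (minute-by-minute billing with a ten-minute minimum) can be written as a short function. This sketch models only the rule as stated here; rounding partial minutes up is an assumption, and actual pricing terms should be checked against the current Dataproc pricing page.

```python
import math

MINIMUM_BILLED_MINUTES = 10  # ten-minute minimum, as stated in the explanation


def billed_minutes(runtime_seconds: float) -> int:
    """Minutes billed for a cluster that ran for runtime_seconds.

    Assumption: a partial minute is rounded up to a whole minute.
    """
    if runtime_seconds <= 0:
        return 0
    return max(MINIMUM_BILLED_MINUTES, math.ceil(runtime_seconds / 60))


if __name__ == "__main__":
    for seconds in (180, 600, 1501):
        print(f"{seconds} s of runtime -> {billed_minutes(seconds)} billed minutes")
```

Under this rule a three-minute job is billed as ten minutes, while a job that runs just over 25 minutes is billed as 26 minutes instead of a full hour.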
Question # 65
An organization maintains a Google BigQuery dataset that contains tables with user-level data. They want to expose aggregates of this data to other Google Cloud projects, while still controlling access to the user-level data. Additionally, they need to minimize their overall storage cost and ensure the analysis cost for other projects is assigned to those projects. What should they do?
- A. Create and share an authorized view that provides the aggregate results.
- B. Create and share a new dataset and view that provides the aggregate results.
- C. Create dataViewer Identity and Access Management (IAM) roles on the dataset to enable sharing.
- D. Create and share a new dataset and table that contains the aggregate results.
Correct answer: A
Explanation:
An authorized view protects the underlying data. It is created in a separate dataset with restricted access, so other projects can query the aggregates without being granted access to the user-level tables.
Question # 66
Your infrastructure includes a set of YouTube channels. You have been tasked with creating a process for sending the YouTube channel data to Google Cloud for analysis. You want to design a solution that allows your world-wide marketing teams to perform ANSI SQL and other types of analysis on up-to-date YouTube channels log data. How should you set up the log data transfer into Google Cloud?
- A. Use Storage Transfer Service to transfer the offsite backup files to a Cloud Storage Regional bucket as a final destination.
- B. Use BigQuery Data Transfer Service to transfer the offsite backup files to a Cloud Storage Multi-Regional storage bucket as a final destination.
- C. Use Storage Transfer Service to transfer the offsite backup files to a Cloud Storage Multi-Regional storage bucket as a final destination.
- D. Use BigQuery Data Transfer Service to transfer the offsite backup files to a Cloud Storage Regional storage bucket as a final destination.
Correct answer: A
Question # 67
You set up a streaming data insert into a Redis cluster via a Kafka cluster. Both clusters are running on Compute Engine instances. You need to encrypt data at rest with encryption keys that you can create, rotate, and destroy as needed. What should you do?
- A. Create a dedicated service account, and use encryption at rest to reference your data stored in your Compute Engine cluster instances as part of your API service calls.
- B. Create encryption keys in Cloud Key Management Service. Reference those keys in your API service calls when accessing the data in your Compute Engine cluster instances.
- C. Create encryption keys in Cloud Key Management Service. Use those keys to encrypt your data in all of the Compute Engine cluster instances.
- D. Create encryption keys locally. Upload your encryption keys to Cloud Key Management Service. Use those keys to encrypt your data in all of the Compute Engine cluster instances.
Correct answer: C
Explanation:
https://cloud.google.com/compute/docs/disks/customer-managed-encryption
Question # 68
You work for a mid-sized enterprise that needs to move its operational system transaction data from an on-premises database to GCP. The database is about 20 TB in size. Which database should you choose?
- A. Cloud Spanner
- B. Cloud SQL
- C. Cloud Bigtable
- D. Cloud Datastore
Correct answer: B
Question # 69
Business owners at your company have given you a database of bank transactions. Each row contains the user ID, transaction type, transaction location, and transaction amount. They ask you to investigate what type of machine learning can be applied to the data. Which three machine learning applications can you use?
(Choose three.)
- A. Supervised learning to predict the location of a transaction.
- B. Supervised learning to determine which transactions are most likely to be fraudulent.
- C. Reinforcement learning to predict the location of a transaction.
- D. Unsupervised learning to determine which transactions are most likely to be fraudulent.
- E. Unsupervised learning to predict the location of a transaction.
- F. Clustering to divide the transactions into N categories based on feature similarity.
Correct answer: C, D, F
Question # 70
You want to use Google Stackdriver Logging to monitor Google BigQuery usage. You need an instant notification to be sent to your monitoring tool when new data is appended to a certain table using an insert job, but you do not want to receive notifications for other tables. What should you do?
- A. Make a call to the Stackdriver API to list all logs, and apply an advanced filter.
- B. Using the Stackdriver API, create a project sink with advanced log filter to export to Pub/Sub, and subscribe to the topic from your monitoring tool.
- C. In the Stackdriver logging admin interface, enable a log sink export to Google Cloud Pub/Sub, and subscribe to the topic from your monitoring tool.
- D. In the Stackdriver logging admin interface, enable a log sink export to BigQuery.
Correct answer: D
Question # 71
You are working on a sensitive project involving private user data. You have set up a project on Google Cloud Platform to house your work internally. An external consultant is going to assist with coding a complex transformation in a Google Cloud Dataflow pipeline for your project. How should you maintain users' privacy?
- A. Create an anonymized sample of the data for the consultant to work with in a different project.
- B. Grant the consultant the Cloud Dataflow Developer role on the project.
- C. Grant the consultant the Viewer role on the project.
- D. Create a service account and allow the consultant to log on with it.
Correct answer: D
Question # 72
You want to archive data in Cloud Storage. Because some data is very sensitive, you want to use the "Trust No One" (TNO) approach to encrypt your data to prevent the cloud provider staff from decrypting your data. What should you do?
- A. Use gcloud kms keys create to create a symmetric key. Then use gcloud kms encrypt to encrypt each archival file with the key and unique additional authenticated data (AAD). Use gsutil cp to upload each encrypted file to the Cloud Storage bucket, and keep the AAD outside of Google Cloud.
- B. Use gcloud kms keys create to create a symmetric key. Then use gcloud kms encrypt to encrypt each archival file with the key. Use gsutil cp to upload each encrypted file to the Cloud Storage bucket. Manually destroy the key previously used for encryption, and rotate the key once.
- C. Specify customer-supplied encryption key (CSEK) in the .boto configuration file. Use gsutil cp to upload each archival file to the Cloud Storage bucket. Save the CSEK in Cloud Memorystore as permanent storage of the secret.
- D. Specify customer-supplied encryption key (CSEK) in the .boto configuration file. Use gsutil cp to upload each archival file to the Cloud Storage bucket. Save the CSEK in a different project that only the security team can access.
Correct answer: B
Question # 73
You are developing an application on Google Cloud that will automatically generate subject labels for users' blog posts. You are under competitive pressure to add this feature quickly, and you have no additional developer resources. No one on your team has experience with machine learning. What should you do?
- A. Build and train a text classification model using TensorFlow. Deploy the model using Cloud Machine Learning Engine. Call the model from your application and process the results as labels.
- B. Call the Cloud Natural Language API from your application. Process the generated Entity Analysis as labels.
- C. Call the Cloud Natural Language API from your application. Process the generated Sentiment Analysis as labels.
- D. Build and train a text classification model using TensorFlow. Deploy the model using a Kubernetes Engine cluster. Call the model from your application and process the results as labels.
Correct answer: C
Question # 74
You are deploying a new storage system for your mobile application, which is a media streaming service. You decide the best fit is Google Cloud Datastore. You have entities with multiple properties, some of which can take on multiple values. For example, in the entity 'Movie' the property 'actors' and the property 'tags' have multiple values but the property 'date released' does not. A typical query would ask for all movies with actor=<actorname> ordered by date_released or all movies with tag=Comedy ordered by date_released. How should you avoid a combinatorial explosion in the number of indexes?

- A. Option A
- B. Option B
- C. Option C
- D. Option D
Correct answer: A
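The option bodies for this question are not reproduced here, but the combinatorial explosion itself can be shown with a small count. The sketch below assumes that a composite index covering several multi-valued properties needs one entry per combination of their values, while separate indexes (one for actors with date_released, one for tags with date_released) need one entry per value. The function names and the sample movie are hypothetical.

```python
from math import prod
from typing import Dict, List


def composite_index_entries(entity: Dict[str, List[str]], indexed_properties: List[str]) -> int:
    """Entries one entity contributes to a single composite index.

    Assumption: each combination of values across the indexed
    properties produces its own index entry.
    """
    return prod(len(entity[name]) for name in indexed_properties)


def total_entries(entity: Dict[str, List[str]], indexes: List[List[str]]) -> int:
    """Total entries across several composite indexes."""
    return sum(composite_index_entries(entity, index) for index in indexes)


if __name__ == "__main__":
    # Hypothetical Movie entity: 10 actors, 5 tags, one release date.
    movie = {
        "actors": [f"actor_{i}" for i in range(10)],
        "tags": [f"tag_{i}" for i in range(5)],
        "date_released": ["2023-12-20"],
    }
    combined = total_entries(movie, [["actors", "tags", "date_released"]])
    separate = total_entries(
        movie,
        [["actors", "date_released"], ["tags", "date_released"]],
    )
    print(f"one combined index: {combined} entries")
    print(f"two separate indexes: {separate} entries")
```

Both queries in the question filter on only one multi-valued property and sort by date_released, so neither needs an index that combines actors and tags.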
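The Google Professional-Data-Engineer certification exam is offered by Google Cloud and gives individuals the opportunity to demonstrate proficiency in designing and building data processing systems on the Google Cloud platform. The certification is aimed at individuals who have data processing experience and have worked with Google Cloud Platform technologies. The exam measures a candidate's ability to design, build, operationalize, secure, and monitor data processing systems.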