NCP-ADS Exam Free Practice Questions (303 questions): "NVIDIA-Certified-Professional Accelerated Data Science" Certification

Question 1

A machine learning engineer is working with a financial dataset that contains multiple numerical features, including income, loan amount, and transaction frequency. Some features are normally distributed, while others have a highly skewed distribution with extreme outliers.
Which of the following approaches best ensures uniformity across features before training a model?

A. Apply log transformation to skewed features before standardizing them with z-score normalization

B. Remove outliers before applying standardization

C. Use one-hot encoding to transform numerical features into categorical representations

D. Scale all numerical features using min-max normalization

Correct answer: A
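The reasoning behind option A can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `loan_amounts` values are made up, and `math.log1p` is used as the log transform so that zeros would be handled safely. With a raw z-score the outlier dominates the standard deviation and the typical values collapse together; after the log transform they stay distinguishable.

```python
import math
import statistics


def zscore(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(x - mean) / std for x in values]


def log_then_zscore(values):
    """Compress the right tail with log1p, then standardize."""
    return zscore([math.log1p(v) for v in values])


# Hypothetical loan amounts with one extreme outlier (last value).
loan_amounts = [1_000, 1_200, 1_500, 2_000, 2_500, 250_000]

raw = zscore(loan_amounts)
scaled = log_then_zscore(loan_amounts)

# Spread between the smallest and largest "typical" value in each scaling.
raw_spread = raw[4] - raw[0]
scaled_spread = scaled[4] - scaled[0]
print(f"typical-value spread, raw z-score:   {raw_spread:.3f}")
print(f"typical-value spread, log + z-score: {scaled_spread:.3f}")
```

The same two steps carry over to array libraries; the pure-Python version is used here only so the sketch runs anywhere.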

Question 2

You are tasked with optimizing a data science workflow to scale across multiple GPUs using Dask.
Which of the following approaches would be most effective for implementing data parallelism in this scenario? (Select two)

A. Run the computation on a single GPU and let Dask handle CPU parallelism automatically

B. Use Dask's distributed scheduler along with dask_cuda to handle GPU resources for parallel processing

C. Use Dask's default scheduler to distribute work across GPUs without any specific configuration

D. Use dask_cuda's LocalCUDACluster to manage the multi-GPU setup and parallelize computations

E. Manually partition the dataset and assign each partition to a specific GPU using dask.delayed

Correct answer: B, D

Question 3

You are deploying an NVIDIA GPU-accelerated machine learning model in a Docker container and want to ensure that your application can leverage the GPU efficiently.
What is the best way to manage CUDA dependencies and avoid compatibility issues inside your Docker container?

A. Disable GPU acceleration in Docker and force computations on the CPU to avoid CUDA compatibility issues.

B. Use NVIDIA's official Docker images from NVIDIA GPU Cloud (NGC), which come with pre-installed CUDA and AI frameworks.

C. Manually install CUDA and cuDNN inside the container by downloading them from NVIDIA's website and setting environment variables.

D. Use a base Ubuntu image and install TensorFlow, PyTorch, and CUDA using pip install inside the container.

Correct answer: B

Question 4

You are working with a large dataset containing millions of rows, and you need to store it efficiently for fast read and write operations while maintaining compatibility with cuDF and pandas.
Which of the following file formats is the best choice for efficient columnar storage and GPU-accelerated processing?

A. Parquet

B. TXT

C. CSV

D. JSON

Correct answer: A

Question 5

You are working with a large dataset containing millions of high-resolution images for a deep learning project. The dataset needs to be processed efficiently on a GPU before training a model.
Which NVIDIA technology is best suited for preprocessing, augmenting, and efficiently loading the dataset into memory?

A. NVIDIA Nsight Compute to optimize image dataset processing.

B. NVIDIA Triton Inference Server to preprocess the dataset before model training.

C. NVIDIA DALI (Data Loading Library) to accelerate data loading and preprocessing on the GPU.

D. NVIDIA RAPIDS cuDF to transform image data into tabular format for analysis.

Correct answer: C

Question 6

Which of the following NVIDIA technologies is primarily used for benchmarking and optimizing GPU-accelerated deep learning workflows, with a focus on model training performance?

A. NVIDIA DeepStream SDK

B. NVIDIA Nsight Systems

C. NVIDIA Triton Inference Server

D. NVIDIA CUDA

Correct answer: B

Question 7

You are working with a large dataset that contains missing values in multiple columns. Your goal is to prepare this dataset for training a machine learning model on an NVIDIA GPU using RAPIDS.
Which of the following approaches is the most efficient method to handle missing values in this scenario?

A. Convert the dataset to a NumPy array and manually replace missing values with the mean

B. Drop all rows containing missing values using pandas before transferring data to the GPU

C. Use fillna() with a fixed value on the GPU using cuDF

D. Apply a deep learning-based imputation model before moving data to the GPU

Correct answer: C

Question 8

You are analyzing a transportation network where airports represent nodes and flight routes represent edges. You need to determine the most critical airports in the network based on how many shortest paths pass through them.
Which cuGraph centrality algorithm should you use for this task?

A. Triangle Count, as it determines the number of three-node cycles an airport is part of, indicating its importance in network structure.

B. Closeness Centrality, as it measures how close an airport is to all other airports, making it the best indicator of global connectivity.

C. Betweenness Centrality, as it identifies airports that act as major transit hubs by measuring how frequently they appear in shortest paths.

D. Degree Centrality, as it measures the number of direct connections an airport has, indicating its importance in the network.

Correct answer: C
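To make the difference between betweenness and degree concrete, here is a minimal pure-Python sketch of Brandes' algorithm for an unweighted, undirected graph. It is not cuGraph's implementation (cuGraph exposes its own `betweenness_centrality` function that runs on the GPU); the airport codes and routes below are hypothetical and chosen so that the bridge airport has a low degree but the highest betweenness.

```python
from collections import deque


def betweenness_centrality(adj):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}


# Hypothetical routes: two triangles of airports joined only through DEN.
routes = [
    ("SEA", "SFO"), ("SEA", "LAX"), ("SFO", "LAX"),
    ("LAX", "DEN"), ("DEN", "ATL"),
    ("ATL", "JFK"), ("ATL", "BOS"), ("JFK", "BOS"),
]
adj = {}
for a, b in routes:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

bc = betweenness_centrality(adj)
degree = {v: len(neighbors) for v, neighbors in adj.items()}
for airport in sorted(bc, key=bc.get, reverse=True):
    print(airport, "betweenness:", bc[airport], "degree:", degree[airport])
```

In this toy network DEN has only two routes, yet every shortest path between the two clusters passes through it, which is exactly the property the question asks about.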

Question 9

You are training a deep learning model on a large dataset. Initially, you train the model on a single GPU and achieve a training time of 10 hours. To speed up training, you switch to a multi-GPU setup with four GPUs. However, after testing, you notice that the training time is only reduced to 3.5 hours instead of the expected 2.5 hours (a linear speedup).
What is the most likely reason for this sublinear speedup?

A. The optimizer is not designed for multi-GPU training, causing inefficiency.

B. Data transfer and communication overhead between GPUs limit scalability.

C. Increased memory bandwidth usage causes a bottleneck in the system.

D. The learning rate is too low, causing slower convergence despite multiple GPUs.

Correct answer: B
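The arithmetic in this scenario can be checked with a short sketch. The helper names below are illustrative, not part of any library; the numbers are the ones given in the question (10 hours on one GPU, 3.5 hours on four).

```python
def speedup(t_single, t_multi):
    """How many times faster the multi-GPU run is than the single-GPU run."""
    return t_single / t_multi


def parallel_efficiency(t_single, t_multi, n_gpus):
    """Fraction of the ideal linear speedup that was actually achieved."""
    return speedup(t_single, t_multi) / n_gpus


def overhead_hours(t_single, t_multi, n_gpus):
    """Time spent beyond the ideal t_single / n_gpus."""
    return t_multi - t_single / n_gpus


t1, t4, gpus = 10.0, 3.5, 4

s = speedup(t1, t4)                        # about 2.86x instead of 4x
eff = parallel_efficiency(t1, t4, gpus)    # about 0.71
extra = overhead_hours(t1, t4, gpus)       # 1.0 hour beyond the ideal 2.5
print(f"speedup: {s:.2f}x, efficiency: {eff:.0%}, overhead: {extra:.1f} h")
```

The one extra hour is the cost that does not shrink with more GPUs; in data-parallel training it is typically attributed to gradient synchronization and data movement between devices, which is what option B describes.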

Question 10

You are working with large datasets in cuDF and have noticed significant performance bottlenecks due to repeated computation and excessive shuffling in your workflow. You want to use data caching to optimize the execution plan and reduce redundant operations.
Which of the following is the best way to implement data caching in cuDF to avoid repeated recomputation and excessive shuffling?

A. Store intermediate DataFrames as temporary CSV files on disk and reload them when needed to simulate a cache.

B. Use the .to_pandas() method to convert the cuDF DataFrame into a pandas DataFrame, leveraging CPU caching instead of GPU caching.

C. Use the .persist() method on a Dask-cuDF (dask_cudf) DataFrame to keep intermediate results in GPU memory, reducing redundant calculations.

D. Use the .reset_index() method after performing transformations to force materialization of the dataset and cache results.

Correct answer: C

Question 11

You are implementing a Dask-based solution for distributed data parallelism across a multi-GPU system.
Which configuration steps would ensure effective use of GPUs for parallel computation? (Select two)

A. Use dask_cudf to convert DataFrame computations into GPU-accelerated operations using cuDF

B. Create a LocalCUDACluster and manually specify the GPUs you want to use for each Dask worker

C. Use dask_cuda's LocalCUDACluster with proper GPU memory management to handle multiple GPUs

D. Use dask_cuda's LocalCUDACluster and let Dask automatically allocate GPUs without any configuration

E. Use Dask's Cluster class with the distributed scheduler and specify CPU cores only for GPU workloads

Correct answer: A, C

Question 12

You are processing a multi-terabyte dataset in cuDF and want to optimize query performance and storage efficiency.
Which approach should you follow to ensure that the dataset remains efficiently partitioned and easily accessible?

A. Split the dataset into multiple files based on a logical partitioning key, such as a date column.

B. Convert all numeric columns to float64 for higher precision, even if float32 is sufficient.

C. Use pandas for large-scale processing instead of cuDF since pandas is more stable for big data processing.

D. Store the entire dataset in a single large file to minimize the number of files in the directory.

Correct answer: A
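The idea behind option A can be illustrated with a small standard-library sketch. It writes one file per distinct value of a partitioning key into `key=value` directories, so a query for a single date opens only that partition. CSV is used here purely to keep the example dependency-free; a real pipeline would more likely write Parquet (for example via a DataFrame library's partitioned Parquet writer). The helper names, the `part-0.csv` file name, and the sample rows are all hypothetical.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def write_partitioned(rows, key, root):
    """Write one CSV per distinct value of `key` under root/<key>=<value>/."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    paths = {}
    for value, group in groups.items():
        part_dir = Path(root) / f"{key}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        path = part_dir / "part-0.csv"
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(group[0].keys()))
            writer.writeheader()
            writer.writerows(group)
        paths[value] = path
    return paths


def read_partition(root, key, value):
    """Read only the partition for one key value, skipping all other files."""
    path = Path(root) / f"{key}={value}" / "part-0.csv"
    with path.open(newline="") as f:
        return list(csv.DictReader(f))


# Hypothetical transaction rows keyed by date.
rows = [
    {"date": "2024-01-01", "amount": "10.5"},
    {"date": "2024-01-01", "amount": "7.0"},
    {"date": "2024-01-02", "amount": "3.2"},
]

root = tempfile.mkdtemp()
paths = write_partitioned(rows, "date", root)
jan_first = read_partition(root, "date", "2024-01-01")
print(len(paths), "partitions written;", len(jan_first), "rows for 2024-01-01")
```

Choosing a key that queries commonly filter on, such as a date, lets readers skip whole partitions instead of scanning one large file.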
