NCA-GENM Exam Free Question Set (403 Questions): NVIDIA Generative AI Multimodal Certification

Question 1

You are developing a multimodal model that combines time-series data from sensor readings with natural language descriptions of events. The time-series data has varying sampling rates and the text descriptions are often vague and ambiguous. How would you best address the challenge of aligning and fusing these two modalities to improve model performance?

A. Use a dynamic time warping (DTW) algorithm to align the time-series data with the text descriptions and then use a cross-modal attention mechanism for fusion.

B. Resample the time-series data to a uniform sampling rate and directly concatenate it with the text embeddings.

C. Ignore the time-series data and train the model only on the text descriptions.

D. Average the time-series data over a fixed time window and concatenate it with the text embeddings.

E. Train separate models for time-series and text and average their predictions.

Correct answer: A
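
Note on option A: the alignment step can be sketched with a plain dynamic time warping routine. This is a minimal illustration, not a full solution. It assumes two numeric 1-D sequences, so text descriptions would first need to be mapped to a comparable numeric representation, and the names `dtw_distance`, `slow`, and `fast` are hypothetical. It shows only that DTW tolerates differing sampling rates.

```python
from math import inf

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping between two 1-D sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b[j-1] is reused
                cost[i][j - 1],      # b advances, a[i-1] is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]

# Hypothetical sensor traces: same shape, but `fast` is sampled twice as often.
slow = [0.0, 1.0, 2.0, 1.0, 0.0]
fast = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 0.0, 0.0]
print(dtw_distance(slow, fast))  # 0.0, the warping absorbs the rate difference
```

A point-by-point comparison of the two lists would not even be defined because their lengths differ, which is why a warping-based alignment is used before fusion.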

Question 2

You're using Stable Diffusion with a custom prompt to generate images of landscapes. You notice that the generated images consistently lack detail and appear blurry, despite increasing the number of inference steps. Which of the following prompt engineering techniques, combined with appropriate parameter tuning, is MOST likely to address this issue and improve the image's sharpness and detail?

A. Using completely unrelated keywords to encourage the model to create something unique.

B. Using a very short and general prompt to allow the model more freedom.

C. Specifying 'oil painting' or another artistic style to mask the lack of detail.

D. Adding keywords like 'photorealistic', 'high resolution', '8k', 'detailed', and adjusting the 'clip_skip' parameter.

E. Decreasing the 'guidance_scale' to allow for more creative freedom.

Correct answer: D

Question 3

You are fine-tuning a pre-trained multimodal model for a new task. You have limited computational resources. Which of the following fine-tuning strategies would be the MOST computationally efficient while still achieving good performance?

A. Freeze the lower layers of the model and fine-tune the upper layers and the classification head.

B. Freeze all layers except the classification head and fine-tune only the classification head.

C. Randomize the model to train, if it improves the training rate.

D. Train a new random model from scratch for the task, which will avoid the need to load the pre-trained model.

E. Fine-tune all the layers of the model.

Correct answer: A

Question 4

Consider the following Python code snippet using PyTorch. What does this code do in the context of data preprocessing for a Generative AI model?

Correct answer: E

Question 5

Given the following code snippet using NVIDIA Triton Inference Server for deploying a multimodal model:

What does 'format: FORMAT_NCHW' signify for the 'image_input'?

A. The image data is in a channel-first format (Number of Images, Channels, Height, Width).

B. The image data is normalized to a range between 0 and 1.

C. The image data is in a channel-last format (Number of Images, Height, Width, Channels).

D. The image data is in a compressed JPEG format.

E. The image data is represented as a NumPy array.

Correct answer: A
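
Note on options A and C: the difference between channel-first and channel-last layouts is easiest to see in how a flat buffer is indexed. The sketch below is illustrative only. The helper names and the 3-channel 2x2 image are assumptions, not part of the question's configuration.

```python
def offset_nchw(n, c, h, w, C, H, W):
    """Flat index of element (n, c, h, w) in a row-major NCHW buffer."""
    return ((n * C + c) * H + h) * W + w

def offset_nhwc(n, c, h, w, C, H, W):
    """Flat index of the same element in a row-major NHWC buffer."""
    return ((n * H + h) * W + w) * C + c

C, H, W = 3, 2, 2  # hypothetical 3-channel 2x2 image

# Same logical element (image 0, channel 1, row 0, column 1) in both layouts.
print(offset_nchw(0, 1, 0, 1, C, H, W))  # 5
print(offset_nhwc(0, 1, 0, 1, C, H, W))  # 4

# Moving to the next channel jumps a whole H*W plane in NCHW, but one slot in NHWC.
print(offset_nchw(0, 1, 0, 0, C, H, W) - offset_nchw(0, 0, 0, 0, C, H, W))  # 4
print(offset_nhwc(0, 1, 0, 0, C, H, W) - offset_nhwc(0, 0, 0, 0, C, H, W))  # 1
```

In NCHW each channel is stored as a contiguous plane, whereas in NHWC the channel values of one pixel sit next to each other.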

Question 6

You are deploying a Riva-based speech-to-text service in a production environment. You observe high latency and CPU utilization on your server. Which of the following actions would be most effective in optimizing the performance of your Riva service?

A. Increasing the audio chunk size sent to the Riva server to reduce the number of requests.

B. Disabling automatic punctuation and capitalization to simplify the ASR process.

C. Enabling batching and concurrency in the Riva server configuration to process multiple requests simultaneously.

D. Deploying the Riva server on a CPU-only instance to reduce cost.

E. Switching to a smaller, less accurate ASR model to reduce computational load.

Correct answer: C

Question 7

You have a multimodal model that takes video and audio as input for activity recognition. You want to evaluate the impact of different fusion strategies (early fusion, late fusion, intermediate fusion) on the model's accuracy and computational cost. Which of the following statements is generally TRUE regarding these fusion strategies?

A. Early fusion typically has the lowest computational cost but may limit the model's ability to capture modality-specific features.

B. Intermediate fusion is always superior to both early and late fusion in terms of accuracy.

C. Early fusion is always the best choice for real-time applications due to its low latency.

D. Late fusion typically has the highest computational cost but allows for the most effective interaction between modalities.

E. Late fusion is generally easier to implement than early fusion, as it doesn't require modification to the individual modality encoders.

Correct answer: A
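
To make the terminology concrete, here is a toy contrast between early and late fusion. It is a sketch under strong assumptions: the "models" are hypothetical linear scorers, and the feature vectors and weights are made up for illustration.

```python
def linear_score(features, weights):
    """Stand-in for a trained model: a plain dot product."""
    return sum(f * w for f, w in zip(features, weights))

def early_fusion(video_feat, audio_feat, joint_weights):
    # Concatenate the raw features first, then run ONE joint model.
    return linear_score(video_feat + audio_feat, joint_weights)

def late_fusion(video_feat, audio_feat, video_weights, audio_weights):
    # Run one model PER modality, then combine the per-modality scores.
    video_score = linear_score(video_feat, video_weights)
    audio_score = linear_score(audio_feat, audio_weights)
    return 0.5 * (video_score + audio_score)

video = [1.0, 2.0]  # hypothetical video features
audio = [3.0]       # hypothetical audio features

print(early_fusion(video, audio, [0.5, 0.5, 1.0]))   # 4.5
print(late_fusion(video, audio, [0.5, 0.5], [1.0]))  # 2.25
```

Early fusion runs a single model over the joined input, while late fusion keeps a separate model per modality and only merges their outputs. Intermediate fusion would merge hidden representations somewhere between these two extremes.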

Question 8

You are working on a generative AI model that creates descriptions of images. During experimentation, you notice the model consistently generates descriptions that are factually incorrect about objects in the image, despite the image quality being high. For example, it might describe a 'cat' as a 'dog'. What is the MOST critical step to address this issue?

A. Use a more complex model architecture.

B. Increase the training data size with more diverse images.

C. Implement a mechanism to verify the generated descriptions against an external knowledge base or object recognition system.

D. Apply image sharpening filters to the input images.

E. Fine-tune the model using a smaller learning rate.

Correct answer: C
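
Note on option C: one very simple form of such a verification step is to compare the object words in a generated caption with the labels returned by an object detector. The sketch below assumes the detector output is already available as a list of labels. `unsupported_objects` and the small vocabulary are hypothetical.

```python
def unsupported_objects(caption, detected_labels, vocabulary):
    """Return object words that the caption mentions but the detector did not find."""
    words = {w.strip(".,!?").lower() for w in caption.split()}
    mentioned = words & {v.lower() for v in vocabulary}
    detected = {label.lower() for label in detected_labels}
    return sorted(mentioned - detected)

vocabulary = ["cat", "dog", "sofa"]  # hypothetical set of object words to check
caption = "A dog sleeping on a sofa."
detected = ["cat", "sofa"]           # hypothetical detector output

print(unsupported_objects(caption, detected, vocabulary))  # ['dog']
```

A non-empty result flags the caption for regeneration or correction. A production system would need proper noun extraction and synonym handling, which this sketch leaves out.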

Question 9

A research team is developing a multimodal model to predict stock prices using financial news articles, company filings (text), historical stock prices (time-series), and executive interviews (audio). They are experiencing significant performance issues due to inconsistent data quality across modalities. What specific strategies would you recommend to address these data quality challenges?

A. All of C, D, and E.

B. Focus exclusively on improving the quality of the most readily available data source.

C. Implement audio transcription and sentiment analysis on executive interviews to extract key information and emotional tone.

D. Normalize and scale historical stock prices to a consistent range to avoid dominance by high-magnitude values.

E. Apply Named Entity Recognition (NER) to financial news and company filings to standardize company names and financial terms.

Correct answer: A
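
Note on option D: scaling prices to a consistent range can be done with min-max normalization, shown here as a minimal sketch. The price values are made up, and in practice the minimum and maximum should be computed on training data only.

```python
def min_max_scale(values):
    """Linearly rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no range information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

prices = [100.0, 150.0, 200.0]  # hypothetical closing prices
print(min_max_scale(prices))    # [0.0, 0.5, 1.0]
```

After scaling, a stock trading near 2000 and one trading near 20 contribute features of comparable magnitude.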

Question 10

You are tasked with visualizing the performance of a generative AI model across different categories of input data. You need to show both the accuracy and the number of data points in each category. Which visualization technique would be MOST effective for this purpose?

A. A bar chart showing the accuracy for each category, with error bars indicating the sample size.

B. A pie chart showing the accuracy for each category.

C. A combination chart (e.g., bar and line) with bars showing the accuracy and a line showing the sample size.

D. A scatter plot showing the relationship between accuracy and sample size for each category.

E. A table showing the accuracy and sample size for each category.

Correct answer: C

Question 11

You are fine-tuning a large pre-trained language model for a specific downstream task using a limited amount of training data. Which of the following techniques is MOST likely to prevent overfitting and improve the model's generalization performance?

A. Increasing the batch size as much as possible to maximize GPU utilization.

B. Training the entire model from scratch using the limited training data.

C. Applying aggressive weight decay and dropout regularization.

D. Removing all regularization techniques to allow the model to perfectly fit the training data.

E. Using a very large learning rate during fine-tuning.

Correct answer: C

Question 12

You are developing a system that generates 3D models from text descriptions. The system currently produces models that are geometrically accurate but lack fine-grained surface details and realistic textures. Which of the following steps would be MOST effective in improving the visual realism of the generated 3D models?

A. Use a simpler text encoder to focus on geometric information.

B. Train a separate texture generation model conditioned on the text description and the generated 3D geometry.

C. Reduce the size of the training dataset.

D. Increase the number of polygons used to represent the 3D models.

E. Rely solely on procedural generation techniques.

Correct answer: B

Question 13

You are tasked with optimizing a multimodal AI model that processes both images and text. You observe significant latency during the image encoding phase using a pre-trained ResNet50 model. Which of the following techniques would be MOST effective in reducing latency while preserving accuracy, considering energy efficiency?

A. Use full precision floating point operations throughout the ResNet50 model.

B. Replace ResNet50 with a larger, more complex model like ResNeXt101.

C. Increase the batch size for image processing.

D. Disable GPU acceleration for image processing to reduce power consumption.

E. Apply knowledge distillation, training a smaller, faster model to mimic the ResNet50 output.

Correct answer: E
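
Note on option E: in knowledge distillation the smaller student is commonly trained to match the teacher's temperature-softened output distribution. The following is a minimal sketch of that soft-target term only. The logits are made up, and a real training loop would typically combine this term with the ordinary loss on ground-truth labels.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

teacher = [4.0, 1.0, 0.5]  # hypothetical teacher (e.g. ResNet50) logits
student = [2.0, 1.5, 0.5]  # hypothetical student logits

print(distillation_loss(student, teacher) > 0.0)   # True, the student still disagrees
print(distillation_loss(teacher, teacher) == 0.0)  # True, identical outputs give zero loss
```

Minimizing this quantity pushes the student toward the teacher's full output distribution rather than only its top prediction.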

Question 14

Which of the following techniques is most appropriate for mitigating the vanishing gradient problem in very deep neural networks, particularly when training generative models?

A. Early stopping

B. Residual connections (skip connections)

C. Dropout

D. Data augmentation

E. Weight decay

Correct answer: B
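
Note on option B: a scalar toy model shows why skip connections help. For a plain layer y = f(x) the local derivative is f'(x), while for a residual layer y = x + f(x) it is 1 + f'(x). The sketch below simply multiplies these local derivatives across layers. The depth of 20 and the local derivative of 0.1 are arbitrary assumptions.

```python
def plain_gradient(local_derivs):
    """Gradient through a stack of plain layers: product of f'(x) terms."""
    g = 1.0
    for d in local_derivs:
        g *= d        # y = f(x)     -> dy/dx = f'(x)
    return g

def residual_gradient(local_derivs):
    """Gradient through a stack of residual layers: product of (1 + f'(x)) terms."""
    g = 1.0
    for d in local_derivs:
        g *= 1.0 + d  # y = x + f(x) -> dy/dx = 1 + f'(x)
    return g

derivs = [0.1] * 20  # 20 layers, each with a small local derivative

print(plain_gradient(derivs) < 1e-15)     # True, the signal has effectively vanished
print(residual_gradient(derivs) >= 1.0)   # True, the identity path keeps it alive
```

The identity term contributes a gradient path that does not shrink, which is the intuition behind residual connections in very deep networks.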

Question 15

You are working on a project that analyzes customer reviews. The dataset contains the following fields: 1. customer_id (categorical) 2. customer_review (text) 3. product_image (image) 4. video_of_product_usage (video). What is the best way to address skewness across the modalities?

A. Balance the dataset by oversampling under-represented data points within each modality independently.

B. Treat all modalities with equal weights during model training, ignoring potential skewness issues.

C. Do nothing about the skewness, as the model will learn to adapt to the imbalanced data distribution.

D. Design a loss function that explicitly penalizes the model for being biased towards dominant modalities.

E. Apply modality-specific weighting schemes that assign higher weights to modalities with less representation.

Correct answer: A, D, E
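
Note on option E: one common way to build such a weighting scheme is inverse-frequency weighting, where a modality's weight grows as its sample count shrinks. This is a sketch under assumptions: the sample counts are invented, and `inverse_frequency_weights` is a hypothetical helper.

```python
def inverse_frequency_weights(counts):
    """Map each modality to total / (num_modalities * its_sample_count)."""
    total = sum(counts.values())
    k = len(counts)
    return {name: total / (k * n) for name, n in counts.items()}

# Hypothetical number of usable samples per modality.
counts = {"text": 8000, "image": 1500, "video": 500}
weights = inverse_frequency_weights(counts)

for name, w in weights.items():
    print(f"{name}: {w:.3f}")

# The rarest modality receives the largest weight.
print(max(weights, key=weights.get))  # video
```

These weights could then multiply each modality's loss term so that the dominant modality does not drown out the others.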

Question 16

Consider a multimodal generative AI model that produces images based on textual prompts. The model is prone to generating images that are similar to those in the training data, resulting in a lack of novelty. Which hyperparameter adjustment would be MOST effective in increasing the diversity of the generated images?

A. Increase the temperature parameter in the decoding process.

B. Decrease the number of layers in the decoder network.

C. Reduce the learning rate during fine-tuning.

D. Increase the weight decay during training.

E. Decrease the batch size during inference.

Correct answer: A
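
Note on option A: temperature rescales the logits before sampling, so a higher value flattens the distribution and makes less likely outputs more probable. The sketch assumes a decoder that samples from a categorical distribution, as in autoregressive token-based generation. The logits are made up.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Numerically stable softmax over logits divided by the temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; higher means a more spread-out distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

logits = [2.0, 1.0, 0.1]  # hypothetical decoder logits
cold = softmax_with_temperature(logits, 0.5)
hot = softmax_with_temperature(logits, 2.0)

print(max(cold) > max(hot))          # True, low temperature concentrates probability
print(entropy(hot) > entropy(cold))  # True, high temperature spreads it out
```

Raising the temperature trades some fidelity for diversity, so it is usually tuned rather than maximized.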

Question 17

You are developing a system to automatically generate image descriptions for visually impaired users. The system uses a combination of object detection, attribute recognition, and relationship extraction. However, the generated descriptions often lack detail and fail to capture the nuances of the image content. Which of the following strategies would MOST effectively address this limitation?

A. Use a more powerful transformer-based model (e.g., GPT-3) to generate the image descriptions from the extracted object, attribute, and relationship information.

B. Combine A and D.

C. Increase the size of the training dataset for the object detection model.

D. Incorporate visual attention mechanisms that allow the description generation model to focus on the most salient regions of the image.

E. Manually rewrite a subset of descriptions to be more in line with the requirements.

Correct answer: B

Question 18

You're developing a multimodal AI system that takes image data, text descriptions, and user interaction data (clicks, dwell time) to generate personalized product recommendations. To effectively combine these modalities and capture complex relationships, which model architecture would be most suitable?

A. A k-nearest neighbors (KNN) algorithm.

B. A Naive Bayes classifier.

C. A deep learning architecture incorporating attention mechanisms and cross-modal fusion layers, with separate embedding layers for each modality, followed by a shared representation layer for joint learning and prediction.

D. A decision tree-based model.

E. A simple linear regression model.

Correct answer: C

Question 19

You are fine-tuning a pre-trained language model for a specific task. You notice that the model performs well on the training data but poorly on the validation data. Which of the following techniques can help mitigate this overfitting problem? (Select TWO)

A. Apply weight decay (L2 regularization).

B. Use dropout regularization.

C. Increase the size of the training data.

D. Increase the learning rate.

E. Decrease the batch size.

Correct answer: A, B
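
Notes on options A and B: both regularizers are small modifications to ordinary training. The sketch below shows one SGD step with an L2 penalty and an inverted-dropout mask on a list of activations. It is framework-free for illustration, and the function names and hyperparameter values are assumptions.

```python
import random

def sgd_step_with_weight_decay(weights, grads, lr=0.1, weight_decay=0.01):
    """One SGD step where the L2 penalty adds weight_decay * w to each gradient."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]

def inverted_dropout(activations, p=0.5, rng=random):
    """Zero each activation with probability p and rescale survivors by 1 / (1 - p)."""
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

# With a zero task gradient, weight decay alone shrinks the weight toward zero.
print(sgd_step_with_weight_decay([1.0], [0.0], lr=0.1, weight_decay=0.5))

# Each activation is either dropped (0.0) or scaled up (2.0 for p=0.5).
rng = random.Random(0)  # seeded only to make the demo repeatable
print(inverted_dropout([1.0, 1.0, 1.0, 1.0], p=0.5, rng=rng))
```

Weight decay discourages large weights, and dropout prevents units from co-adapting. Both reduce the model's capacity to memorize a small training set. Dropout is applied during training only and disabled at evaluation time.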

Question 20

You are building a multimodal generative AI system that creates 3D models from text descriptions. The system produces accurate shapes but struggles to generate realistic textures and surface details. What approach would BEST address this limitation?

A. Increase the batch size during the 3D model generation phase.

B. Train a separate texture generation network conditioned on the generated 3D shape.

C. Reduce the resolution of the generated 3D models to simplify the texture generation process.

D. Add more layers to the shape decoder.

E. Increase the number of parameters in the text encoder.

Correct answer: B
