C1000-185 Exam Free Practice Questions: "IBM watsonx Generative AI Engineer - Associate" Certification

You are preparing a dataset for fine-tuning a model to classify customer complaints by category. The dataset is imbalanced, with 70% of the data representing complaints about billing, 20% representing complaints about technical issues, and 10% representing complaints about product quality.
Which of the following actions would help address the imbalance while preparing the dataset for fine-tuning? (Select two)
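As a rough illustration of two common ways to handle this kind of imbalance (not necessarily the answer options the exam intends), the sketch below computes inverse-frequency class weights and randomly oversamples the minority classes. The function names `class_weights` and `oversample` and the toy data are hypothetical; only the 70/20/10 split comes from the question.

```python
import random
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def oversample(examples, labels, seed=0):
    """Duplicate randomly chosen minority examples until every class
    has as many examples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out = []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out.extend((x, y) for x in xs + extra)
    rng.shuffle(out)
    return out


# Toy data mirroring the 70% / 20% / 10% split in the question.
labels = ["billing"] * 70 + ["technical"] * 20 + ["quality"] * 10
examples = [f"complaint {i}" for i in range(100)]

weights = class_weights(labels)          # rarer classes get larger weights
balanced = oversample(examples, labels)  # 70 examples per class
```

Weights are typically passed to the loss function, while oversampling changes the dataset itself; oversampling should be applied to the training split only, so duplicated examples do not leak into validation data.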

You are using InstructLab to fine-tune a large language model (LLM) for generating technical documentation. The model's output is inconsistent, sometimes too verbose and other times lacking critical details.
Which of the following actions within InstructLab will best help customize the model to consistently produce balanced, concise, yet informative outputs?

You are tasked with building a Retrieval-Augmented Generation (RAG) system to assist users in retrieving relevant documents from a vast knowledge base. The first step in this process is to generate vector embeddings for the documents using a pre-trained model. After generating embeddings, you notice that the model is sometimes failing to retrieve semantically similar documents.
Which of the following is the most appropriate approach to ensure that semantically similar documents are retrieved effectively?
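For context, semantic retrieval usually ranks documents by cosine similarity between the query embedding and each document embedding, which depends on vector direction rather than magnitude. The following is a minimal pure-Python sketch under that assumption; the three-dimensional vectors and document names are made up for illustration, and a real system would use embeddings from the chosen model and a vector store.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine(a, b):
    """Cosine similarity: dot product of the unit-length vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def top_k(query, docs, k=2):
    """Return the names of the k documents most similar to the query."""
    ranked = sorted(docs.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical embeddings; "reset_password_long" points in the same
# direction as "reset_password" but has ten times the magnitude.
docs = {
    "reset_password": [0.9, 0.1, 0.0],
    "billing_faq": [0.1, 0.9, 0.1],
    "reset_password_long": [9.0, 1.0, 0.0],
}
query = [1.0, 0.1, 0.0]
results = top_k(query, docs, k=2)
```

Because both vectors are normalized first, the long and short password documents score the same against the query, which is the behavior one generally wants when document length should not affect relevance.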

You are working on a generative AI model for a wide variety of language generation tasks. Your team is debating whether to use soft prompts for optimizing the model's performance.
Which of the following is an accurate benefit of using soft prompts, and what is a potential drawback?

Which prompt engineering strategy is most effective for reducing the risk of generating biased content in a generative AI model?

You have just finished a prompt tuning experiment for a large language model (LLM) to optimize its output for generating customer support summaries. The tuning results show that while the accuracy of the generated summaries is high (95%), the response time for generating them has significantly increased. The experiment data suggests that increasing the maximum token length during tuning led to better quality summaries but with slower generation.
Which parameter should you adjust to improve the model's response time without sacrificing too much summary quality?

You are working as a generative AI engineer and have developed a custom large language model (LLM) optimized for a specific use case. You are tasked with deploying this model on the IBM watsonx platform.
Which of the following steps is most essential to ensure the successful deployment of your custom model, given that the model uses a third-party transformer architecture?

You are designing a customer support chatbot using watsonx.ai as the primary generative model. You want to enhance the chatbot's capabilities by integrating it with IBM Watson Assistant to handle structured conversations while allowing watsonx.ai to generate responses for open-ended queries.
Which integration approach would most effectively combine both services while maintaining optimal performance and accuracy?

During the fine-tuning of a large language model (LLM) with InstructLab for a legal document classification task, you notice that the model performs exceptionally well on the training set but poorly on the validation set.
What could be done to address the overfitting issue and improve the model's generalization? (Select two)

In a Retrieval-Augmented Generation (RAG) system designed for technical document retrieval, you are tasked with implementing text chunking techniques using the LangChain library. The technical documents are large and contain numerous tables, figures, and bullet points.
What is the most effective way to handle text splitting to ensure high-quality retrieval?
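To make the idea of structure-aware splitting concrete, here is a small pure-Python sketch. It does not use LangChain; it only illustrates the general principle of splitting on structural boundaries (blank lines here) so that a table or bullet list is never cut in half. The function name `chunk_blocks`, the `max_chars` limit, and the sample document are assumptions for illustration.

```python
def chunk_blocks(text, max_chars=200):
    """Greedily pack blank-line-separated blocks into chunks.

    A block (paragraph, table, or bullet list) is never split, so a
    block longer than max_chars becomes a chunk of its own.
    """
    blocks = [b.strip() for b in text.split("\n\n") if b.strip()]
    chunks, current = [], ""
    for block in blocks:
        candidate = f"{current}\n\n{block}" if current else block
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = block
    if current:
        chunks.append(current)
    return chunks


doc = (
    "Intro paragraph about the pump.\n\n"
    "| Part | Torque |\n| A1 | 12 Nm |\n| B2 | 15 Nm |\n\n"
    "- Check seals\n- Check valves\n\n"
    "Closing notes."
)
chunks = chunk_blocks(doc, max_chars=60)
```

In LangChain itself, a comparable effect is usually pursued with a splitter configured with a separator hierarchy and some chunk overlap; the sketch above only shows why respecting block boundaries keeps table rows together.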

You are tasked with generating synthetic data for a fine-tuning task on an IBM watsonx model. The goal is to mimic the distribution of existing training data while ensuring the synthetic data maintains its statistical similarity to the original. You are provided with two algorithms, Algorithm A (Kolmogorov-Smirnov Test) and Algorithm B, to assess the similarity between the original and synthetic data distributions.
Which of the following best describes how you should implement synthetic data generation using the User Interface and choose the correct algorithm?
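For reference, the two-sample Kolmogorov-Smirnov statistic named in the question is the largest absolute gap between the empirical cumulative distribution functions of two samples: 0 means the empirical distributions coincide, and 1 means they do not overlap at all. The sketch below computes only the statistic (no p-value) in plain Python; `ks_statistic` and the sample values are hypothetical, and in practice a library routine such as SciPy's `ks_2samp` would normally be used.

```python
def ks_statistic(a, b):
    """Two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)| over all x."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare ECDFs.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


original = [1, 2, 3, 4, 5]
synthetic = [1, 2, 3, 4, 6]
gap = ks_statistic(original, synthetic)  # small gap: distributions are close
```

A smaller statistic suggests the synthetic column tracks the original more closely; the test applies to one continuous numeric feature at a time.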

In planning the deployment of a generative AI model that relies on a large corpus of data, you need to organize and version the data repository used for training prompts.
Which of the following approaches best ensures efficient data versioning, integrity, and easy rollback during prompt refinement?

Which of the following techniques can be most effectively used to mitigate the generation of hate speech, abuse, and profanity in generative AI models when applying prompt engineering?