D-DS-FN-23 Exam Free Question Set: "EMC Dell Data Science Foundations Certification"

What are challenges presented by Big Data?

Match each task to its description.
Correct answer:

You are given 10,000,000 user profile pages of an online dating site in XML files, and they are stored in HDFS. You are assigned to divide the users into groups based on the content of their profiles.
You have been instructed to try K-means clustering on this data. How should you proceed?

Consider these itemsets:
(hat, scarf, coat)
(hat, scarf, coat, gloves)
(hat, scarf, gloves)
(hat, gloves)
(scarf, coat, gloves)
What is the confidence of the rule (gloves -> hat)?
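As a worked illustration, the confidence of a rule X -> Y is the number of itemsets containing both X and Y divided by the number containing X. The following is a minimal Python sketch, assuming the five itemsets listed above are the complete transaction list; the names `transactions` and `confidence` are illustrative and do not come from the exam material.

```python
from fractions import Fraction

# The five itemsets from the question, treated as the full transaction list (assumption).
transactions = [
    {"hat", "scarf", "coat"},
    {"hat", "scarf", "coat", "gloves"},
    {"hat", "scarf", "gloves"},
    {"hat", "gloves"},
    {"scarf", "coat", "gloves"},
]


def confidence(antecedent, consequent, transactions):
    """Return confidence(antecedent -> consequent) as an exact fraction."""
    antecedent, consequent = set(antecedent), set(consequent)
    # Transactions that contain every item of the antecedent.
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    # Transactions that contain every item of both sides of the rule.
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    return Fraction(n_both, n_antecedent)


conf = confidence({"gloves"}, {"hat"}, transactions)
print(conf, float(conf))  # 3/4 0.75
```

Gloves appear in four of the itemsets and gloves together with hat in three, so under these assumptions the ratio is 3/4.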

When building a K-means clustering model, you notice that the clusters did not segment on variables that you expected. What should you do?

You have been assigned to perform a study of the daily revenue effect of a pricing model of online transactions. All the data currently available has been loaded into an analytics database.
This data includes revenue data, pricing data, and online transaction data. You have completed a thorough univariate analysis of all data and have decided that there are three different models you want to test.
Preliminary results show that all models have equally effective results.
What is the next step?

What does a leaf node represent in a decision tree?

After which phase of the data analytics lifecycle should you determine if the model needs any recalibration?

A Data Scientist is assigned to build a model from a reporting data warehouse. The warehouse contains data collected from many sources and transformed through a complex, multi-stage ETL process.
What is a concern the data scientist should have about the data?

Consider the following itemsets:
(hat, scarf, coat)
(hat, scarf, coat, gloves)
(hat, scarf, gloves)
(hat, gloves)
(scarf, coat, gloves)
If the minimum support is 50%, what represents the complete list of frequent 2-itemsets?
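One way to check a candidate list is to count every pair of items directly. The sketch below assumes the five itemsets above are the whole dataset and that "minimum support is 50%" means a pair must appear in at least half of them (here, at least 3 of 5); the variable names are illustrative only.

```python
from itertools import combinations

# The five itemsets from the question, treated as the full transaction list (assumption).
transactions = [
    {"hat", "scarf", "coat"},
    {"hat", "scarf", "coat", "gloves"},
    {"hat", "scarf", "gloves"},
    {"hat", "gloves"},
    {"scarf", "coat", "gloves"},
]
min_support = 0.5

# Every distinct item, in alphabetical order so the output is deterministic.
items = sorted(set().union(*transactions))

frequent_pairs = []
for pair in combinations(items, 2):
    # Support = fraction of transactions containing both items of the pair.
    count = sum(1 for t in transactions if set(pair) <= t)
    if count / len(transactions) >= min_support:
        frequent_pairs.append(pair)

print(frequent_pairs)
# [('coat', 'scarf'), ('gloves', 'hat'), ('gloves', 'scarf'), ('hat', 'scarf')]
```

The pairs (hat, coat) and (coat, gloves) each appear in only two itemsets (40%), so they fall below the threshold in this sketch.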

A study was run to identify general dietary patterns among the residents of a small town. Twelve thousand people were surveyed and the data was subjected to K-means clustering.
In one of the iterations, there were six clusters formed with 38, 1560, 1799, 2560, 2893, and 3150 respondents.
What should be the next step in identifying optimal clusters?

What is one modeling or descriptive statistical function in MADlib that is typically not provided in a standard relational database?

Consider the following text:
"Stop!" he shouted. "Don't go there!"
What set of words results from using a tokenizer for punctuation on the text?
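The question does not specify which tokenizer is meant, so the following is only one plausible reading: a tokenizer that treats every punctuation mark and every space as a separator and keeps runs of letters. It is sketched here with Python's standard `re` module; the pattern is an assumption for illustration, not the exam's definition.

```python
import re

text = '"Stop!" he shouted. "Don\'t go there!"'

# Keep runs of ASCII letters; quotes, "!", ".", the apostrophe, and spaces all act as separators.
tokens = re.findall(r"[A-Za-z]+", text)

print(tokens)
# ['Stop', 'he', 'shouted', 'Don', 't', 'go', 'there']
```

Note that this reading splits "Don't" into "Don" and "t". Tokenizers that preserve apostrophes inside words, or that lowercase their output, would produce a different set.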
