D-DS-FN-23 Exam Free Question Set: "EMC Dell Data Science Foundations Certification"

Consider these itemsets:
(hat, scarf, coat)
(hat, scarf, coat, gloves)
(hat, scarf, gloves)
(hat, gloves)
(scarf, coat, gloves)
What is the confidence of the rule (gloves -> hat)?
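
As a worked check on the question above: the confidence of a rule X -> Y is conventionally defined as count(X and Y together) / count(X). The following is a minimal Python sketch, assuming the five itemsets listed are the entire transaction set; the helper name `confidence` is illustrative and not from the source.

```python
# The five itemsets from the question, treated as the full transaction set (assumption).
transactions = [
    {"hat", "scarf", "coat"},
    {"hat", "scarf", "coat", "gloves"},
    {"hat", "scarf", "gloves"},
    {"hat", "gloves"},
    {"scarf", "coat", "gloves"},
]

def confidence(antecedent, consequent, transactions):
    """Confidence(X -> Y) = count(X and Y together) / count(X)."""
    x = set(antecedent)
    xy = x | set(consequent)
    n_x = sum(1 for t in transactions if x <= t)    # transactions containing X
    n_xy = sum(1 for t in transactions if xy <= t)  # transactions containing X and Y
    if n_x == 0:
        raise ValueError("antecedent never occurs in the transactions")
    return n_xy / n_x

# gloves appears in 4 itemsets, and 3 of those also contain hat.
print(confidence({"gloves"}, {"hat"}, transactions))  # 0.75
```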

You submit a MapReduce job to a Hadoop cluster. Although the job was successfully submitted, you notice that it is not completing.
What should be done?

Which component of a final presentation provides a succinct overview of the business situation that was the impetus to initiate the project?

Which activity might be performed in the Operationalize phase of the Data Analytics Lifecycle?

You do a Student's t-test to compare the average test scores of sample groups from populations A and B.
Group A averaged 10 points higher than group B.
You find that this difference is significant, with a p-value of 0.03.
What does that mean?

A business colleague who is new to Hadoop approaches you with a question. The colleague wants to know the best approach to access their data. The colleague has previously worked extensively with SQL and databases.
Which query interface should be recommended?

Refer to the exhibit.

In association rules, for itemsets X and Y, which expression defines leverage?
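
For reference, leverage is commonly defined as the difference between the observed joint support of X and Y and the joint support that would be expected if X and Y were statistically independent. This is the standard textbook form, written here as a sketch rather than quoted from the exam material:

```latex
\mathrm{Leverage}(X \rightarrow Y) = \mathrm{Support}(X \wedge Y) - \mathrm{Support}(X)\,\mathrm{Support}(Y)
```

Under this definition a leverage of 0 corresponds to independence, and larger positive values indicate that X and Y co-occur more often than independence would predict.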

You are performing a market basket analysis using the Apriori algorithm.
Which measure is a ratio describing how many more times two items are present together than would be expected if those two items were statistically independent?
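
The ratio described above is usually called lift: Support(X and Y) / (Support(X) * Support(Y)). The following is a minimal Python sketch of that calculation; the transactions and the helper names `support` and `lift` are hypothetical and chosen only for illustration.

```python
# Hypothetical transactions, not taken from the question.
transactions = [
    {"bread", "butter"},
    {"bread", "butter"},
    {"tea"},
    {"tea"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def lift(x, y, transactions):
    """Lift(X -> Y) = Support(X and Y) / (Support(X) * Support(Y))."""
    s_x = support(x, transactions)
    s_y = support(y, transactions)
    if s_x == 0 or s_y == 0:
        raise ValueError("lift is undefined when either itemset never occurs")
    return support(set(x) | set(y), transactions) / (s_x * s_y)

# 0.5 / (0.5 * 0.5): the pair occurs twice as often as independence would predict.
print(lift({"bread"}, {"butter"}, transactions))  # 2.0
```

A lift of 1 indicates that the two itemsets occur together exactly as often as independence would predict.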

What action occurs during feature selection in the model building phase of the data analytics lifecycle?

Which two tools support query languages that can be used with Hadoop?

You are assigned the task of creating customer profiles for your company. In your database, you have 25 key input variables that come together to define 2,500 customers. You decide to run a K-means cluster analysis on the 25 input variables based on k=4 to build your profiles.
Your analysis resulted in four cluster populations:
Cluster A=1,000 customers
Cluster B=560 customers
Cluster C=925 customers
Cluster D=15 customers
What should be attempted first to more evenly distribute the customer population across clusters?

Which method is used to solve for coefficients b0, b1, ..., bn in your linear regression model: Y = b0 + b1x1 + b2x2 + ... + bnxn?
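
Coefficients of a linear regression model are typically estimated by ordinary least squares (OLS), which chooses the values that minimize the sum of squared residuals. The following is a minimal Python sketch for the single-predictor case only (Y = b0 + b1x1), using the closed-form solution; the function name `ols_simple` and the sample data are assumptions made for illustration.

```python
def ols_simple(xs, ys):
    """Fit y = b0 + b1*x by minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Closed-form least-squares estimates for one predictor.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Made-up data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
b0, b1 = ols_simple(xs, ys)
print(b0, b1)  # 1.0 2.0
```

With several predictors the same criterion is usually solved in matrix form, which this sketch does not cover.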

Which visualization technique should be avoided?

What is the output of the K-means clustering algorithm?
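
To make the question concrete: a K-means run generally produces k centroids plus an assignment of each observation to its nearest centroid. The following is a minimal Python sketch of Lloyd's algorithm, assuming a naive initialization from the first k points; the function name `kmeans` and the sample points are hypothetical.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    # Naive deterministic initialization (assumption): use the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels

# Made-up 2-D points forming two obvious groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)  # [0, 1, 0, 0, 1, 1]
```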

You have two tables of customers in your database. Customers in cust_table_1 were sent an e-mail promotion last year, and customers in cust_table_2 received a newsletter last year.
Customers can only be entered in once per table. You want to create a table that includes all customers, and any of the communications they received last year.
Which type of join would you use for this table?
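
Keeping every customer from both tables, including those who appear in only one of them, corresponds to what SQL calls a full outer join. The following is a minimal Python sketch that emulates that behavior with dictionaries; the customer IDs and values are hypothetical, and keying by ID reflects the statement that a customer appears at most once per table.

```python
# Hypothetical rows: customer_id -> communication received last year.
cust_table_1 = {101: "e-mail promotion", 102: "e-mail promotion"}
cust_table_2 = {102: "newsletter", 103: "newsletter"}

def full_outer_join(left, right):
    """One row per customer found in either table; a missing side is None."""
    all_ids = sorted(set(left) | set(right))
    return [(cid, left.get(cid), right.get(cid)) for cid in all_ids]

joined = full_outer_join(cust_table_1, cust_table_2)
for row in joined:
    print(row)
```

Customer 102 appears with both communications, while 101 and 103 each carry `None` for the table they are absent from.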

In a t-test with unknown variance, what values are used to calculate the t-statistic?
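
When the population variance is unknown, the t-statistic is computed from sample quantities. For the one-sample case this is t = (sample mean - hypothesized mean) / (sample standard deviation / sqrt(n)). The following is a minimal Python sketch of that one-sample form; the function name `one_sample_t` and the data are assumptions made for illustration.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t = (sample mean - mu0) / (sample standard deviation / sqrt(n))."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least 2 observations")
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("t is undefined when the sample has zero variance")
    return (mean - mu0) / (s / math.sqrt(n))

# Made-up scores and a hypothesized population mean of 14.
sample = [12.0, 14.0, 16.0, 18.0, 20.0]
t_stat = one_sample_t(sample, mu0=14.0)
print(t_stat)  # about 1.414
```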