D-DS-FN-23 Exam Free Practice Questions (358 questions): "EMC Dell Data Science Foundations Certification"

Question: 1

Consider these itemsets:
(hat, scarf, coat)
(hat, scarf, coat, gloves)
(hat, scarf, gloves)
(hat, gloves)
(scarf, coat, gloves)
What is the confidence of the rule (gloves -> hat)?

A. 66%

B. 80%

C. 75%

D. 60%

Correct answer: C
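
The confidence of a rule X -> Y is the share of itemsets containing X that also contain Y. The arithmetic can be checked with a minimal Python sketch over the five itemsets above; the helper name `confidence` is illustrative and not part of the exam material.

```python
# The five itemsets from the question.
transactions = [
    {"hat", "scarf", "coat"},
    {"hat", "scarf", "coat", "gloves"},
    {"hat", "scarf", "gloves"},
    {"hat", "gloves"},
    {"scarf", "coat", "gloves"},
]

def confidence(lhs, rhs, itemsets):
    """Confidence of lhs -> rhs: count(lhs and rhs) / count(lhs)."""
    lhs, rhs = set(lhs), set(rhs)
    with_lhs = [t for t in itemsets if lhs <= t]
    if not with_lhs:
        raise ValueError("left-hand side never occurs")
    with_both = [t for t in with_lhs if rhs <= t]
    return len(with_both) / len(with_lhs)

# gloves appears in 4 itemsets, 3 of which also contain hat.
print(confidence({"gloves"}, {"hat"}, transactions))  # 0.75
```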

Question: 2

You submit a MapReduce job to a Hadoop cluster. Although the job was successfully submitted, you notice that it is not completing.
What should be done?

A. Ensure that the NameNode is running

B. Ensure that the TaskTracker is running

C. Ensure that the JobTracker is running

D. Ensure that a DataNode is running

Correct answer: B

Question: 3

Which component of a final presentation provides a succinct overview of the business situation that was the impetus to initiate the project?

A. Approach

B. Model description

C. Recommendations

D. Project goals

Correct answer: D

Question: 4

Which activity might be performed in the Operationalize phase of the Data Analytics Lifecycle?

A. Try different analytical techniques

B. Try different variables

C. Transform existing variables

D. Run a pilot

Correct answer: D

Question: 5

You do a Student's t-test to compare the average test scores of sample groups from populations A and B.
Group A averaged 10 points higher than group B.
You find that this difference is significant, with a p-value of 0.03.
What does that mean?

A. The difference in scores between a sample from population A and a sample from population B will tend to be within 3% of 10 points.

B. There is a 97% chance that a sample group from population A will score 10 points higher than a sample group from population B.

C. There is a 3% chance that you have identified a difference between the populations when in reality there is none.

D. There is a 3% chance that a sample group from population A will score 10 points higher than a sample group from population B.

Correct answer: C

Question: 6

A business colleague who is new to Hadoop approaches you with a question. The colleague wants to know the best approach to access their data. The colleague has previously worked extensively with SQL and databases.
Which query interface should be recommended?

A. Hive

B. Howl

C. HBase

D. Pig

Correct answer: A

Question: 7

Refer to the exhibit.

In association rules, for itemsets X and Y, which expression defines leverage?

A. b

B. a

C. c

D. d

Correct answer: B

Question: 8

You are performing a market basket analysis using the Apriori algorithm.
Which measure is a ratio describing how many more times two items are present together than would be expected if those two items were statistically independent?

A. Confidence

B. Leverage

C. Lift

D. Support

Correct answer: C
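
Lift is the ratio Support(X and Y) / (Support(X) * Support(Y)). Leverage, one of the other options, is the difference Support(X and Y) - Support(X) * Support(Y). A minimal Python sketch of both follows. The sample itemsets and the helper names `support`, `lift`, and `leverage` are assumptions chosen for illustration, and exact fractions are used to avoid floating-point noise.

```python
from fractions import Fraction

# Illustrative itemsets (an assumption, not data from this question).
transactions = [
    {"hat", "scarf", "coat"},
    {"hat", "scarf", "coat", "gloves"},
    {"hat", "scarf", "gloves"},
    {"hat", "gloves"},
    {"scarf", "coat", "gloves"},
]

def support(items, itemsets):
    """Fraction of itemsets that contain every item in `items`."""
    items = set(items)
    return Fraction(sum(1 for t in itemsets if items <= t), len(itemsets))

def lift(x, y, itemsets):
    """Ratio of observed co-occurrence to that expected under independence."""
    joint = support(set(x) | set(y), itemsets)
    return joint / (support(x, itemsets) * support(y, itemsets))

def leverage(x, y, itemsets):
    """Difference between observed and expected co-occurrence."""
    joint = support(set(x) | set(y), itemsets)
    return joint - support(x, itemsets) * support(y, itemsets)

print(lift({"hat"}, {"gloves"}, transactions))      # 15/16
print(leverage({"hat"}, {"gloves"}, transactions))  # -1/25
```

A lift of 1 (or a leverage of 0) corresponds to independence. Here the lift is slightly below 1, so hat and gloves co-occur a little less often than independence would predict.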

Question: 9

What action occurs during feature selection in the model building phase of the Data Analytics Lifecycle?

A. Identify the most useful input variables

B. Create new combinations of attributes

C. Select a superset of variables to shorten training times

D. Overfit the model to improve prediction accuracy

Correct answer: A

Question: 10

Which two tools support query languages that can be used with Hadoop?

A. Hive and Pig

B. Hive and HBase

C. HBase and Howl

D. Pig and Howl

Correct answer: A

Question: 11

You are assigned the task of creating customer profiles for your company. In your database, you have 25 key input variables that come together to define 2,500 customers. You decide to run a K-means cluster analysis on the 25 input variables based on k=4 to build your profiles.
Your analysis resulted in four cluster populations:
Cluster A=1,000 customers
Cluster B=560 customers
Cluster C=925 customers
Cluster D=15 customers
What should be attempted first to more evenly distribute the customer population across clusters?

A. Increase K from 4 to 5

B. Remove some of the input variables from the analysis

C. Reduce K from 4 to 3

D. Remove the 15 customers in Cluster D from the population

Correct answer: C

Question: 12

Which method is used to solve for the coefficients b0, b1, ..., bn in the linear regression model Y = b0 + b1x1 + b2x2 + ... + bnxn?

A. Apriori Algorithm

B. Integer programming

C. Ridge and Lasso

D. Ordinary least squares

Correct answer: D
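
Ordinary least squares picks the coefficients that minimize the sum of squared residuals. For the single-predictor case Y = b0 + b1x, the solution has a simple closed form, sketched below in plain Python. The function name `fit_ols` and the data points are assumptions for illustration. The general n-variable model would typically be solved with a linear algebra library.

```python
def fit_ols(xs, ys):
    """Closed-form OLS fit of y = b0 + b1 * x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
b0, b1 = fit_ols(xs, ys)
print(b0, b1)  # 1.0 2.0
```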

Question: 13

Which visualization technique should be avoided?

A. Achieving a high data-ink ratio

B. Using visuals to illustrate key points

C. Using tables of numbers to present all of the data visually

D. Using a small number of contrasting colors to draw distinctions

Correct answer: C

Question: 14

What is the output of the K-means clustering algorithm?

A. Centroid positioning and entropy of each record in each cluster

B. Center of each discovered cluster and mapping of each record to a cluster

C. Intercept and coefficients for each input variable in the dataset

D. Two-dimensional representation of the data and the clusters

Correct answer: B
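
The two outputs named in the answer, cluster centers and a record-to-cluster mapping, can be seen in a minimal plain-Python sketch of the standard K-means iteration (Lloyd's algorithm). The function name `kmeans`, the sample points, and the choice to initialize centroids from the first k points are all assumptions made for a deterministic illustration. Real implementations usually use random or k-means++ initialization.

```python
def kmeans(points, k, max_iter=100):
    """Return (centroids, labels) for a list of numeric tuples."""
    # Assumption: naive deterministic init from the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: map each point to its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels

# Hypothetical 2-D records forming two obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids)  # [(1.0, 1.5), (9.5, 9.0)]
print(labels)     # [0, 0, 1, 1]
```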

Question: 15

You have two tables of customers in your database. Customers in cust_table_1 were sent an e-mail promotion last year, and customers in cust_table_2 received a newsletter last year.
Each customer can be entered only once per table. You want to create a table that includes all customers, and any of the communications they received last year.
Which type of join would you use for this table?

A. Cross join

B. Full outer join

C. Left outer join

D. Inner join

Correct answer: B
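
A full outer join keeps every key from both tables and fills the missing side with nulls. The behavior can be emulated with a minimal Python sketch using dictionaries keyed by customer. The customer names and the helper `full_outer_join` are hypothetical, and the two dictionaries only stand in for cust_table_1 and cust_table_2.

```python
# Hypothetical contents: customer -> communication received last year.
cust_table_1 = {"alice": "e-mail promotion", "bob": "e-mail promotion"}
cust_table_2 = {"bob": "newsletter", "carol": "newsletter"}

def full_outer_join(left, right):
    """Emulate a full outer join on the dictionary keys."""
    rows = []
    # The union of keys keeps customers that appear in either table.
    for key in sorted(left.keys() | right.keys()):
        # .get returns None where a customer is missing from one side.
        rows.append((key, left.get(key), right.get(key)))
    return rows

for row in full_outer_join(cust_table_1, cust_table_2):
    print(row)
# ('alice', 'e-mail promotion', None)
# ('bob', 'e-mail promotion', 'newsletter')
# ('carol', None, 'newsletter')
```

An inner join would keep only bob, and a left outer join would drop carol, which is why neither covers all customers.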

Question: 16

In a t-test with unknown variance, what values are used to calculate the t-statistic?

A. Mean, standard deviation, and population size

B. Mean, sample standard deviation, and population size

C. Sample mean, standard deviation, and sample size

D. Sample mean, sample standard deviation, and sample size

Correct answer: D
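
For the one-sample case, the t-statistic is (sample mean - hypothesized mean) / (sample standard deviation / sqrt(sample size)), which uses exactly the three quantities in the answer. The following is a minimal Python sketch. The function name `t_statistic`, the scores, and the hypothesized mean `mu0` are assumptions for illustration.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """One-sample t-statistic for the null hypothesis mean == mu0."""
    n = len(sample)                       # sample size
    sample_mean = statistics.mean(sample)
    sample_sd = statistics.stdev(sample)  # sample SD (n - 1 denominator)
    return (sample_mean - mu0) / (sample_sd / math.sqrt(n))

scores = [12, 14, 16, 18]  # hypothetical sample
t = t_statistic(scores, mu0=13)
print(round(t, 3))  # 1.549
```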
