DEA-7TT2 Exam Free Question Set: "EMC Associate - Data Science and Big Data Analytics v2 Certification"

Which word or phrase completes the statement: "Excessive emphasis color is to bar chart as __________________"?
Response:


You have created a scatter plot using R from household income and education data as shown in the graphic. What can be done to improve the visualization?
Response:

Which characteristic applies only to Business Intelligence as opposed to Data Science?
Response:

Refer to the exhibit, which shows pairwise counts for items purchased together.

Consider the following association rules:
- Milk -> Eggs
- Eggs -> Milk
- Bread -> Milk
- Milk -> Bread
Which rule has a confidence higher than 70%?
Response:

Which SQL OLAP extension provides all possible grouping combinations?
Response:

You have an automotive database containing numeric characteristics such as engine size, horsepower, and top speed. Which technique could you use to group similar cars together?
Response:

Consider a database with 4 transactions:
Transaction 1: {cheese, bread, milk}
Transaction 2: {soda, bread, milk}
Transaction 3: {cheese, bread}
Transaction 4: {cheese, soda, juice}
The minimum support is 25%. Which rule has a confidence equal to 50%?
Response:
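The support and confidence arithmetic behind this question can be checked with a short sketch. It assumes the four transactions above are the whole database and, for brevity, only enumerates rules with a single item on each side; the helper names `support` and `confidence` are illustrative, not taken from any library.

```python
from fractions import Fraction
from itertools import permutations

# The four transactions from the question.
transactions = [
    {"cheese", "bread", "milk"},
    {"soda", "bread", "milk"},
    {"cheese", "bread"},
    {"cheese", "soda", "juice"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence(lhs, rhs):
    """confidence(lhs -> rhs) = support(lhs and rhs together) / support(lhs)."""
    return support(set(lhs) | set(rhs)) / support(lhs)

items = sorted(set().union(*transactions))
min_support = Fraction(1, 4)  # the 25% threshold from the question

# Single-item rules that meet the support threshold with confidence exactly 50%.
half_rules = [
    (a, b)
    for a, b in permutations(items, 2)
    if support({a, b}) >= min_support and confidence({a}, {b}) == Fraction(1, 2)
]

for a, b in half_rules:
    print(f"{a} -> {b}")
```

Using `Fraction` keeps the comparison with 50% exact instead of relying on floating-point equality.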

A call center for a large electronics company handles an average of 35,000 support calls a day. The head of the call center would like to optimize the staffing of the call center during the rollout of a new product due to recent customer complaints of long wait times.
You have been asked to create a model to optimize call center costs and customer wait times. The goals for this project include:
1. Relative to the release of a product, how does the call volume change over time?
2. How to best optimize staffing based on the call volume for the newly released product, relative to old products.
3. Historically, what time of day does the call center need to be most heavily staffed?
4. Determine the frequency of calls by both product type and customer language.
Which goals are suitable to be completed with MapReduce?
Response:

Refer to the exhibit.

An analyst is searching a corpus of documents for the topic "solid state disk".
In the exhibit, Table A provides the inverse document frequency for each term across the corpus. Table B provides each term's frequency in four documents selected from the corpus.
Which of the four documents is most relevant to the analyst's search?
Response:
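The relevance ranking described here is commonly done by summing term frequency times inverse document frequency over the query terms. The sketch below illustrates that calculation only: the exhibit's tables are not reproduced in this text, so every number and document name in it is hypothetical, and using raw counts as the term frequency is an assumption.

```python
# HYPOTHETICAL values: these do not come from the exhibit's Table A or Table B.
idf = {"solid": 1.5, "state": 0.5, "disk": 2.0}

# HYPOTHETICAL raw term counts for two made-up documents.
term_counts = {
    "doc1": {"solid": 2, "state": 4, "disk": 0},
    "doc2": {"solid": 1, "state": 1, "disk": 3},
}

def tfidf_score(counts):
    """Sum of (term frequency * inverse document frequency) over the query terms."""
    return sum(counts.get(term, 0) * weight for term, weight in idf.items())

scores = {doc: tfidf_score(counts) for doc, counts in term_counts.items()}
best = max(scores, key=scores.get)

print(scores)
print("most relevant:", best)
```

With the real tables, the same function would be applied to each of the four documents and the highest score taken as the most relevant.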

Consider these itemsets:
(hat, scarf, coat)
(hat, scarf, coat, gloves)
(hat, scarf, gloves)
(hat, gloves)
(scarf, coat, gloves)
What is the confidence of the rule (hat, scarf) => gloves?
Response:
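As a worked calculation, assuming the five itemsets listed are the complete set of transactions: three of them contain both hat and scarf, and two of those three also contain gloves. Using the usual definition of confidence:

```latex
\[
\operatorname{conf}(X \Rightarrow Y)
  = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}
\]
\[
\operatorname{conf}\bigl(\{\text{hat},\text{scarf}\} \Rightarrow \{\text{gloves}\}\bigr)
  = \frac{2/5}{3/5} = \frac{2}{3} \approx 0.67
\]
```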

Refer to the exhibit.

You have plotted the distribution of savings account sizes for your bank. How would you proceed, based on this distribution?
Response:

In a fitted ARIMA(1,2,3) model, how many differences are applied?
Response:
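In the ARIMA(p, d, q) notation, the middle value d is the order of differencing, so the question comes down to reading off d. A minimal sketch of what differencing d times does to a series, using a made-up quadratic series and a hypothetical helper named `difference`:

```python
def difference(series, d=1):
    """Apply first-order differencing d times and return the resulting list."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

# Order of the model in the question: p AR terms, d differences, q MA terms.
p, d, q = 1, 2, 3

# Hypothetical series with a quadratic trend: 0, 1, 4, 9, ...
series = [t * t for t in range(8)]

# Each round of differencing shortens the series by one observation.
stationary = difference(series, d)
print(stationary)  # [2, 2, 2, 2, 2, 2]
```

The quadratic trend disappears after two rounds of differencing, which is the situation where d = 2 is typically chosen.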

Refer to the exhibit.

You are using k-means clustering to discover groupings within a data set. You plot within-sum-of-squares (wss) of multiple cluster sizes. Based on the exhibit, how many clusters should you use in your analysis?
Response:


You have created a Linear Regression model to predict total sales based on variables M, N, P and Q as shown in the graphic. You originally expected all variables to have positive coefficients. Which action would you take?
Response:

What is the reason for using LOESS?
Response:

You have just completed the Discovery phase of a project and finished interviewing the main stakeholders. You have identified the necessary data feeds and are now beginning to set up the analytic sandbox. What is the next step?
Response:

Consider a database with 4 transactions:
Transaction 1: {cheese, bread, milk}
Transaction 2: {soda, bread, milk}
Transaction 3: {cheese, bread}
Transaction 4: {cheese, soda, juice}
You decide to run the association rules algorithm where minimum support is 50%. Which rule has a confidence of at least 50%?
Response:
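With a minimum support threshold, rules are normally generated only from itemsets that are themselves frequent. The sketch below assumes the four transactions above are the whole database and uses brute-force enumeration rather than a real Apriori implementation, which is only reasonable at this tiny scale; all names are illustrative.

```python
from itertools import combinations

# The four transactions from the question.
transactions = [
    {"cheese", "bread", "milk"},
    {"soda", "bread", "milk"},
    {"cheese", "bread"},
    {"cheese", "soda", "juice"},
]
MIN_SUPPORT = 0.5
MIN_CONFIDENCE = 0.5

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    needed = set(itemset)
    return sum(1 for t in transactions if needed <= t) / len(transactions)

items = sorted(set().union(*transactions))

# Step 1: keep every itemset whose support meets the threshold.
frequent = [
    frozenset(candidate)
    for size in range(1, len(items) + 1)
    for candidate in combinations(items, size)
    if support(candidate) >= MIN_SUPPORT
]

# Step 2: split each frequent itemset into lhs -> rhs and keep confident rules.
rules = {}
for itemset in frequent:
    for k in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), k):
            rhs = tuple(sorted(itemset - set(lhs)))
            conf = support(itemset) / support(lhs)
            if conf >= MIN_CONFIDENCE:
                rules[(lhs, rhs)] = conf

for (lhs, rhs), conf in sorted(rules.items()):
    print(f"{set(lhs)} -> {set(rhs)}: confidence {conf:.2f}")
```

Raising the threshold from 25% to 50% removes juice and every pair that occurs in only one transaction, so far fewer candidate rules survive to the confidence check.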

You are provided with the following list. Which window function is missing?
cume_dist()
dense_rank()
rank()
percent_rank()
first_value()
last_value()
lag()
lead()
ntile()
Response:

You have been assigned to run a linear regression model for each of 5,000 distinct districts, and all the data is currently stored in a PostgreSQL database. Which tool/library would you use to produce these models with the least effort?
Response:

What describes a true limitation of a Logistic Regression method?
Response: