A. Worker node
B. Discretized streams (DStreams)
C. Driver node
D. Kafka topics
E. HDFS storage
A. Guaranteeing there is no personally identifiable information in the extracted data
B. Making the data volume consistent, or at least easily predictable
C. Ensuring that proper backups are available in the event of data corruption
D. Monitoring is accomplished simply by end users reporting any data quality issues
A. Apache Spark
B. Relational database schema
C. MongoDB
D. Append-only logs
A. Spark SQL and GraphX
B. Spark SQL and RDD
C. Spark NoSQL and RDD
D. Spark NoSQL and GraphX
A. Meta information is generated based on the client-server communication protocol
B. Data is shared among many computers to enable fault tolerance and data accessibility
C. The developer copies and pastes the data into multiple databases
D. A client-server communication protocol is used, whereby every client replicates data from the server
A. Processing real-time data streams
B. Storing historical data
C. Running machine learning algorithms
D. Running complex SQL queries
A. Enforcing referential integrity
B. Increasing data security
C. Reducing data redundancy
D. Improving query performance
A. Set
B. List
C. UUID
D. Map
A. Minimize the bias-variance trade-off
B. Reduce misuse of data and facilitate its transparency
C. Decrease trust between humans and machines
D. Minimize transparency at the point of data collection