Table of contents
Multiple Choice Questions
Fill in the Blanks
True or False
Short Answer Questions
Long Answer Questions
Multiple Choice Questions
Q.1: What is one of the major applications of data science?
a) Game development
b) Targeted advertising
c) Mobile app creation
d) Hardware design
Ans: b) Targeted advertising
Explanation: Targeted advertising is a major application of data science, used across digital marketing to achieve higher click-through rates by targeting ads based on user behavior.
Q.2: What type of machine learning model is used in the restaurant food waste prediction project?
a) Clustering
b) Regression
c) Classification
d) Reinforcement learning
Ans: b) Regression
Explanation: The restaurant project uses a regression model, a supervised learning model suited to continuous data. It uses 30 days of data to predict the next day's food quantity.
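As a rough illustration of the idea, the sketch below fits a straight line to 30 days of data and extrapolates to day 31. The quantities are made up, and ordinary least squares via `np.polyfit` is an assumption here; the source does not say which regression technique the project uses.

```python
import numpy as np

# Hypothetical data: quantity (kg) of one dish consumed on each of 30 days.
# The values follow a perfectly linear trend purely to keep the example easy to check.
days = np.arange(1, 31)
quantity_kg = 20 + 0.5 * days

# Fit a degree-1 polynomial (a straight line) by least squares.
slope, intercept = np.polyfit(days, quantity_kg, deg=1)

# Predict the quantity needed on day 31.
next_day_prediction = slope * 31 + intercept
print(round(float(next_day_prediction), 2))
```

Real consumption data would be noisier, so the fitted line would only approximate the trend.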
Q.3: Which Python library is primarily used for numerical and logical operations on arrays?
a) Pandas
b) Matplotlib
c) NumPy
d) Scikit-learn
Ans: c) NumPy
Explanation: NumPy, or Numerical Python, is the fundamental package for mathematical and logical operations on arrays in Python.
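A minimal sketch of what "mathematical and logical operations on arrays" looks like in practice; the array values are arbitrary.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

total = a + b                          # element-wise addition
scaled = b / 10                        # every element divided by 10
mask = a > 2                           # element-wise comparison gives a boolean array
both = np.logical_and(a > 1, b < 40)   # element-wise logical AND

print(total, scaled, mask, both)
```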
Q.4: What type of data does Pandas handle efficiently?
a) Image data
b) Tabular data with heterogeneously-typed columns
c) Audio data
d) Unstructured text data
Ans: b) Tabular data with heterogeneously-typed columns
Explanation: Pandas is designed to handle tabular data efficiently, particularly with Series and DataFrame structures that support heterogeneously-typed columns.
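The sketch below builds a small DataFrame whose columns hold different types. The column names and values are hypothetical.

```python
import pandas as pd

# Each column can have its own data type: text, floats, integers.
df = pd.DataFrame({
    "dish": ["soup", "rice", "curry"],
    "quantity_kg": [12.5, 30.0, 18.2],
    "served_days": [30, 30, 28],
})

# A single column of a DataFrame is a Series.
quantity = df["quantity_kg"]

# Map each column name to the name of its data type.
column_types = df.dtypes.astype(str).to_dict()
print(column_types)
```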
Q.5: Which type of plot is used to represent the frequency distribution of a single continuous variable?
a) Scatter plot
b) Bar chart
c) Histogram
d) Pie plot
Ans: c) Histogram
Explanation: A histogram groups the values of a single continuous variable into bins and shows how many values fall in each bin, revealing the variation in that one entity.
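To make the idea of bins concrete, this sketch counts how many values land in each bin using `np.histogram`. The values and the choice of four bins over the range 0 to 20 are assumptions for illustration.

```python
import numpy as np

# Hypothetical measurements of a single variable.
values = np.array([2, 3, 5, 7, 8, 12, 13, 14, 18, 19])

# Split the range 0..20 into 4 equal-width bins and count values per bin.
counts, bin_edges = np.histogram(values, bins=4, range=(0, 20))

print(counts)     # how many values fall in each bin
print(bin_edges)  # the boundaries of the bins
```

Plotting these counts as adjacent bars gives the histogram itself.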
Q.6: What does the K-Nearest Neighbor (KNN) algorithm primarily rely on to make predictions?
a) Random sampling
b) Surrounding points or neighbors
c) Predefined rules
d) Statistical averages
Ans: b) Surrounding points or neighbors
Explanation: KNN relies on the surrounding points or neighbors to determine the class or group of an unknown point, based on proximity.
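A minimal sketch of the neighbor-based prediction in plain Python. It assumes Euclidean distance as the measure of proximity, and the function name `knn_predict` and the sample points are hypothetical.

```python
from collections import Counter
import math

def knn_predict(points, labels, query, k):
    """Return the majority label among the k points closest to query."""
    # Pair each distance with its label, then sort so the nearest come first.
    distances = sorted(
        (math.dist(p, query), label) for p, label in zip(points, labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # Majority vote among the k nearest neighbors.
    return Counter(nearest_labels).most_common(1)[0][0]

# Two hypothetical groups of points, labelled "A" and "B".
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]

prediction = knn_predict(points, labels, query=(2, 2), k=3)
print(prediction)
```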
Q.7: What is the purpose of the interquartile range (IQR) in a box plot?
a) To show the mean of the data
b) To represent the spread between the 25th and 75th percentiles
c) To identify the mode of the data
d) To calculate the variance
Ans: b) To represent the spread between the 25th and 75th percentiles
Explanation: The IQR in a box plot covers the data between the 25th and 75th percentiles, indicating the spread of the middle 50% of the data.
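The IQR can be computed directly from the two percentiles. The sketch below uses `np.percentile` with its default linear interpolation on an arbitrary sample.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# 25th and 75th percentiles (the edges of the box in a box plot).
q1, q3 = np.percentile(data, [25, 75])

# The interquartile range is the distance between them.
iqr = q3 - q1
print(q1, q3, iqr)
```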
Q.8: Which of the following is a source of online data collection?
a) Manual record-keeping
b) Open-sourced government portals
c) Physical surveys
d) Direct observations
Ans: b) Open-sourced government portals
Explanation: Open-sourced government portals are a source of online data collection, alongside reliable websites like Kaggle.
Q.9: What type of data issue involves incorrect values like a decimal in a phone number column?
a) Missing data
b) Outliers
c) Erroneous data
d) Null values
Ans: c) Erroneous data
Explanation: Erroneous data includes incorrect values, such as a decimal in a phone number column, which do not match the expected data type.
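One simple way to flag such values is to check each entry against the expected format. The sketch below assumes a valid phone number is exactly 10 digits; that rule, the helper name `is_valid_phone`, and the sample values are hypothetical.

```python
# Hypothetical phone number column containing two erroneous entries.
phone_numbers = ["9876543210", "98765.4321", "9123456780", "91234abcde"]

def is_valid_phone(value):
    """Assumed rule: a valid phone number is exactly 10 digits."""
    return value.isdigit() and len(value) == 10

# Collect every entry that breaks the expected format.
erroneous = [value for value in phone_numbers if not is_valid_phone(value)]
print(erroneous)
```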
Q.10: In the KNN algorithm, why is K often chosen as an odd number in classification problems?
a) To reduce computation time
b) To ensure a tiebreaker in majority voting
c) To increase model complexity
d) To eliminate outliers
Ans: b) To ensure a tiebreaker in majority voting
Explanation: K is often an odd number in classification so that majority voting has a tiebreaker. With two classes, an odd K means the vote among the neighbors cannot end in a tie, which avoids ambiguous predictions.
Fill in the Blanks
Q.1: Data science combines Python with mathematical concepts like __________, data analysis, and probability.
Ans: Statistics
Explanation: Data science integrates Python with mathematical concepts such as statistics, data analysis, and probability to analyze and process data.
Q.2: The __________ canvas in problem scoping identifies who is experiencing the problem.
Ans: Who
Explanation: The "Who" canvas in the 4Ws problem scoping framework identifies the stakeholders experiencing the problem, such as restaurants in the given scenario.
Q.3: In the restaurant project, the dataset includes the __________ of dish produced daily.
Ans: Quantity
Explanation: The dataset for the restaurant project includes the quantity of dish produced per day, critical for predicting food requirements.
Q.4: The Python library __________ is used for creating visualizations like bar graphs and scatter plots.
Ans: Matplotlib
Explanation: Matplotlib is a visualization library in Python for creating plots like scatter plots, bar charts, and histograms.
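A short sketch of a bar chart and a scatter plot drawn side by side. The data is made up, and the non-interactive `Agg` backend is chosen only so the example runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen, no window needed
import matplotlib.pyplot as plt

# Hypothetical data.
dishes = ["soup", "rice", "curry"]
quantity_kg = [12, 30, 18]

fig, (ax_bar, ax_scatter) = plt.subplots(1, 2, figsize=(8, 3))

# Bar chart: one bar per category.
bars = ax_bar.bar(dishes, quantity_kg)
ax_bar.set_title("Quantity per dish (kg)")

# Scatter plot: one point per (x, y) pair.
points = ax_scatter.scatter([1, 2, 3, 4], [10, 12, 9, 15])
ax_scatter.set_title("Scatter of two parameters")

# Write the figure to an in-memory buffer as a PNG image.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

In an interactive session, `plt.show()` would display the figure instead of saving it.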
Q.5: The statistical measure __________ represents the most frequent value in a sequence.
Ans: Mode
Explanation: Mode is the most frequent value in a sequence, a key statistical measure used in data analysis.
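Python's standard `statistics` module can compute the mode directly. The scores below are arbitrary.

```python
from statistics import mode, multimode

scores = [4, 7, 7, 2, 9, 7, 4]

# The value that occurs most often.
most_frequent = mode(scores)

# When several values tie for most frequent, multimode returns all of them.
tied = multimode([1, 1, 2, 2])

print(most_frequent, tied)
```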
True or False
Q.1: Data science algorithms are not used in search engines like Google.
Ans: False
Explanation: Search engines like Google use data science algorithms to deliver results for queries, processing vast amounts of data daily.
Q.2: NumPy arrays can contain multiple data types, unlike Python lists.
Ans: False
Explanation: NumPy arrays are homogeneous, containing only one data type, while Python lists can contain multiple data types.
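The difference shows up as soon as mixed values are converted to an array. In this sketch NumPy converts the integers to floats so that every element shares one type.

```python
import numpy as np

# A Python list keeps each element's own type.
mixed_list = [1, 2.5, 3]
list_types = [type(x).__name__ for x in mixed_list]

# A NumPy array stores a single type, so the integers are upcast to float.
arr = np.array(mixed_list)

print(list_types)
print(arr.dtype, arr)
```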
Q.3: Pandas is built on top of NumPy for enhanced data manipulation.
Ans: True
Explanation: Pandas is built on top of NumPy, integrating well for scientific computing and data manipulation.
Q.4: A histogram is used to represent discontinuous data with gaps.
Ans: False
Explanation: Histograms represent continuous data. Values are grouped into adjacent bins with no gaps between them, and each bar shows how many values fall in that bin. Plots such as bar charts, which have separated bars, are used for discontinuous data.
Q.5: In the KNN algorithm, a lower K value (e.g., K=1) makes predictions more stable.
Ans: False
Explanation: A lower K value (e.g., K=1) makes KNN predictions less stable, as they rely on fewer neighbors, increasing the risk of incorrect predictions.
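The sketch below shows how a single noisy point can flip a K=1 prediction while K=3 outvotes it. The points, labels, and helper name `knn_predict` are hypothetical and are not taken from the source's own example; Euclidean distance is assumed.

```python
from collections import Counter
import math

def knn_predict(points, labels, query, k):
    """Majority label among the k points nearest to query."""
    ranked = sorted(
        (math.dist(p, query), label) for p, label in zip(points, labels)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# A "sweet" cluster near (1, 1) containing one mislabelled point,
# and a "not sweet" cluster near (5, 5).
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1),
          (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
labels = ["sweet", "sweet", "sweet", "not sweet",
          "not sweet", "not sweet", "not sweet"]

query = (1.1, 1.15)

# K=1 trusts only the single closest point, which is the mislabelled one.
pred_k1 = knn_predict(points, labels, query, k=1)

# K=3 lets the two correctly labelled neighbors outvote it.
pred_k3 = knn_predict(points, labels, query, k=3)

print(pred_k1, pred_k3)
```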
Short Answer Questions
Q.1: What is the goal of the restaurant food waste prediction project?
Ans: The goal is to predict the quantity of each dish to prepare for daily consumption in restaurant buffets, so that little or no food is left unconsumed and the restaurant's financial losses from waste are reduced.
Q.2: Name two benefits of using data science in airline route planning.
Ans: Data science helps airlines predict flight delays and decide which class of airplanes to buy. Both benefits improve scheduling and fleet decisions, which optimizes operations and reduces losses.
Q.3: How does Pandas handle missing data in datasets?
Ans: Pandas represents missing data as NaN, which makes missing values easy to identify and then remove or manipulate during data cleaning.
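A minimal sketch of identifying, removing, and filling missing values. The DataFrame is hypothetical, and filling with the column mean is just one possible choice.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with one missing quantity.
df = pd.DataFrame({
    "dish": ["soup", "rice", "curry"],
    "quantity_kg": [12.5, np.nan, 18.0],
})

# Identify: count the missing values in the column.
missing_count = int(df["quantity_kg"].isna().sum())

# Remove: drop every row that contains a missing value.
dropped = df.dropna()

# Manipulate: replace the missing value with the column mean.
filled = df.fillna({"quantity_kg": df["quantity_kg"].mean()})

print(missing_count)
print(dropped)
print(filled)
```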
Q.4: Explain the difference between a scatter plot and a histogram in data visualization.
Ans: A scatter plot uses points to show the relationship between two or more parameters and suits discontinuous data, which has gaps. A histogram represents continuous data: it groups the values of a single variable into bins and shows how frequently values fall in each bin.
Q.5: What is the purpose of the K-Nearest Neighbor (KNN) algorithm in data science?
Ans: KNN is a supervised learning algorithm that predicts the class or value of an unknown point from the properties of its nearest neighbors, for example by assigning it the majority class among the points closest to it.
Long Answer Questions
Q.1: Describe the 4Ws problem canvas and how it is applied to the restaurant food waste problem.
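Ans: The 4Ws problem canvas includes:
Who: identifies the stakeholders experiencing the problem, which here are restaurants.
What: describes the problem, which here is food prepared for buffets being left unconsumed.
Where: describes the context in which the problem arises, which here is the daily preparation of dishes for restaurant buffets.
Why: describes the benefit of solving it, which here is less food waste and lower financial losses.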
Q.2: Explain the steps involved in evaluating the regression model for the restaurant food waste prediction project.
Ans: The evaluation steps are:
Q.3: Discuss the differences between NumPy arrays and Python lists, highlighting their key features.
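Ans: NumPy arrays and Python lists differ as follows:
Data types: NumPy arrays are homogeneous and contain only one data type, while Python lists can contain multiple data types.
Operations: NumPy arrays support mathematical and logical operations on the whole array at once, while a list has to be processed element by element.
Availability: NumPy arrays require the NumPy package to be imported, while lists are built into Python.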
Q.4: Explain how Matplotlib is used for data visualization, including the types of plots it can create.
Ans: Matplotlib is a Python library for creating 2D visualizations of arrays, enabling visual access to large datasets. It supports plots such as scatter plots, bar charts, histograms, pie plots, and box plots.
Q.5: Describe the K-Nearest Neighbor (KNN) algorithm and explain how the choice of K affects its predictions, using the fruit sweetness example.
Ans: The K-Nearest Neighbor (KNN) algorithm is a supervised learning method that predicts an unknown point’s class or value based on the majority properties of its K nearest neighbors, calculated by distance. In the fruit sweetness example: