Worksheet Solutions: Data Science | Artificial Intelligence for Class 10

Multiple Choice Questions

Q.1: What is one of the major applications of data science?
a) Game development
b) Targeted advertising
c) Mobile app creation
d) Hardware design

Ans: b) Targeted advertising

Explanation: Targeted advertising is a major application of data science, used across digital marketing to achieve higher click-through rates by targeting ads based on user behavior.

Q.2: What type of machine learning model is used in the restaurant food waste prediction project?
a) Clustering
b) Regression
c) Classification
d) Reinforcement learning

Ans: b) Regression

Explanation: The restaurant project uses regression, a supervised learning technique that predicts continuous values. The model works with continuous data collected over 30 days to predict the next day's food quantity.

Q.3: Which Python library is primarily used for numerical and logical operations on arrays?
a) Pandas
b) Matplotlib
c) NumPy
d) Scikit-learn

Ans: c) NumPy

Explanation: NumPy, or Numerical Python, is the fundamental package for mathematical and logical operations on arrays in Python.

Q.4: What type of data does Pandas handle efficiently?
a) Image data
b) Tabular data with heterogeneously-typed columns
c) Audio data
d) Unstructured text data

Ans: b) Tabular data with heterogeneously-typed columns

Explanation: Pandas is designed to handle tabular data efficiently, particularly with Series and DataFrame structures that support heterogeneously-typed columns.
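As a minimal sketch of what "heterogeneously-typed columns" means, the DataFrame below holds text, decimal numbers, and whole numbers side by side. The column names and values are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical records: each column holds a different type of value.
menu = pd.DataFrame({
    "dish": ["Dal", "Rice", "Paneer"],
    "quantity_kg": [12.5, 20.0, 8.0],
    "servings": [50, 80, 32],
})

# A single column of a DataFrame is a Series.
servings = menu["servings"]

# dtypes reports one type per column: text, decimal numbers, whole numbers.
print(menu.dtypes)
```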

Q.5: Which type of plot is used to represent the frequency of a variable over time?
a) Scatter plot
b) Bar chart
c) Histogram
d) Pie plot

Ans: c) Histogram

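Explanation: A histogram groups the values of a single variable into bins and shows how many values fall into each bin. When the variable is recorded over a period of time, this shows how often each range of values occurred for that single entity.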
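The binning idea can be sketched in plain Python, without any plotting library. The leftover quantities, the bin width, and the number of bins below are made-up assumptions chosen to keep the example small.

```python
# Hypothetical daily leftover quantities in kg (made-up values).
leftovers = [1.2, 3.8, 2.5, 0.4, 4.9, 2.1, 3.3, 1.7, 2.9, 0.9]

bin_width = 1.0   # each bin covers a 1 kg range
num_bins = 5      # bins: 0-1, 1-2, 2-3, 3-4, 4-5 kg

counts = [0] * num_bins
for value in leftovers:
    index = int(value // bin_width)   # which bin the value falls into
    counts[index] += 1

# Print a simple text histogram: one '#' per value in the bin.
for i, count in enumerate(counts):
    low = i * bin_width
    print(f"{low:.0f}-{low + bin_width:.0f} kg: {'#' * count}")
```

The height of each bar in a plotted histogram corresponds to one entry of `counts`.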

Q.6: What does the K-Nearest Neighbor (KNN) algorithm primarily rely on to make predictions?
a) Random sampling
b) Surrounding points or neighbors
c) Predefined rules
d) Statistical averages

Ans: b) Surrounding points or neighbors

Explanation: KNN relies on the surrounding points or neighbors to determine the class or group of an unknown point, based on proximity.

Q.7: What is the purpose of the interquartile range (IQR) in a box plot?
a) To show the mean of the data
b) To represent the spread between the 25th and 75th percentiles
c) To identify the mode of the data
d) To calculate the variance

Ans: b) To represent the spread between the 25th and 75th percentiles

Explanation: The IQR in a box plot covers the data between the 25th and 75th percentiles, indicating the spread of the middle 50% of the data.
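A small sketch of the calculation, using Python's standard `statistics` module on made-up numbers. Note that several conventions exist for computing quartiles, so other tools may report slightly different values for the same data; the "inclusive" method here is just one choice.

```python
import statistics

# Hypothetical sample (made-up values).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1  # spread of the middle 50% of the data
print(q1, median, q3, iqr)
```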

Q.8: Which of the following is a source of online data collection?
a) Manual record-keeping
b) Open-sourced government portals
c) Physical surveys
d) Direct observations

Ans: b) Open-sourced government portals

Explanation: Open-sourced government portals are a source of online data collection, alongside reliable websites like Kaggle.

Q.9: What type of data issue involves incorrect values like a decimal in a phone number column?
a) Missing data
b) Outliers
c) Erroneous data
d) Null values

Ans: c) Erroneous data

Explanation: Erroneous data includes incorrect values, such as a decimal in a phone number column, which do not match the expected data type.
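One way to spot such values is to check each entry against the format the column is supposed to have. The sketch below assumes, purely for illustration, that a valid phone number is exactly 10 digits; the sample entries and the helper name `is_valid_phone` are hypothetical.

```python
# Hypothetical phone-number column; some entries are deliberately wrong.
phone_column = ["9876543210", "98765.4321", "9123456780", "phone", ""]

def is_valid_phone(value: str) -> bool:
    """Assumed rule for this sketch: exactly 10 digits and nothing else."""
    return len(value) == 10 and value.isdigit()

# Empty entries are missing data; non-empty entries that fail the rule are erroneous.
erroneous = [v for v in phone_column if v != "" and not is_valid_phone(v)]
missing = [v for v in phone_column if v == ""]

print("Erroneous entries:", erroneous)
print("Missing entries:", len(missing))
```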

Q.10: In the KNN algorithm, why is K often chosen as an odd number in classification problems?
a) To reduce computation time
b) To ensure a tiebreaker in majority voting
c) To increase model complexity
d) To eliminate outliers

Ans: b) To ensure a tiebreaker in majority voting

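Explanation: K is often an odd number in classification so that majority voting has a built-in tiebreaker. With two classes, an odd K guarantees that one class receives more votes than the other, which avoids ambiguous predictions.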

Fill in the Blanks

Q.1: Data science combines Python with mathematical concepts like __________, data analysis, and probability.
Ans: Statistics

Explanation: Data science integrates Python with mathematical concepts such as statistics, data analysis, and probability to analyze and process data.

Q.2: The __________ canvas in problem scoping identifies who is experiencing the problem.
Ans: Who

Explanation: The "Who" canvas in the 4Ws problem scoping framework identifies the stakeholders experiencing the problem, such as restaurants in the given scenario.

Q.3: In the restaurant project, the dataset includes the __________ of dish produced daily.
Ans: Quantity

Explanation: The dataset for the restaurant project includes the quantity of dish produced per day, critical for predicting food requirements.

Q.4: The Python library __________ is used for creating visualizations like bar graphs and scatter plots.
Ans: Matplotlib

Explanation: Matplotlib is a visualization library in Python for creating plots like scatter plots, bar charts, and histograms.

Q.5: The statistical measure __________ represents the most frequent value in a sequence.
Ans: Mode

Explanation: Mode is the most frequent value in a sequence, a key statistical measure used in data analysis.
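For illustration, the standard `statistics` module can compute the mode alongside the mean and median. The servings figures are made up.

```python
import statistics

# Hypothetical number of servings of a dish sold each day (made-up values).
servings = [40, 42, 40, 45, 40, 42, 50]

mean_value = statistics.mean(servings)      # arithmetic average
median_value = statistics.median(servings)  # middle value when sorted
mode_value = statistics.mode(servings)      # most frequent value

print("Mean:", mean_value)
print("Median:", median_value)
print("Mode:", mode_value)
```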

True or False

Q.1: Data science algorithms are not used in search engines like Google.
Ans: False

Explanation: Search engines like Google use data science algorithms to deliver results for queries, processing vast amounts of data daily.

Q.2: NumPy arrays can contain multiple data types, unlike Python lists.
Ans: False

Explanation: NumPy arrays are homogeneous, containing only one data type, while Python lists can contain multiple data types.
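A quick way to see this, sketched with made-up values: a list keeps each item's own type, while NumPy converts the same values to one common type when building an array.

```python
import numpy as np

mixed_list = [1, 2.5, 3]          # a list keeps each item's own type
array = np.array(mixed_list)      # NumPy converts everything to one type

list_types = [type(x).__name__ for x in mixed_list]
print(list_types)                 # one type name per list item
print(array.dtype)                # a single dtype for the whole array
```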

Q.3: Pandas is built on top of NumPy for enhanced data manipulation.
Ans: True

Explanation: Pandas is built on top of NumPy, integrating well for scientific computing and data manipulation.

Q.4: A histogram is used to represent discontinuous data with gaps.
Ans: False

Explanation: Histograms represent continuous data, showing how often the values of a variable fall into each bin, whereas scatter plots and bar charts are used for discontinuous data.

Q.5: In the KNN algorithm, a lower K value (e.g., K=1) makes predictions more stable.
Ans: False

Explanation: A lower K value (e.g., K=1) makes KNN predictions less stable, as they rely on fewer neighbors, increasing the risk of incorrect predictions.

Short Answer Questions

Q.1: What is the goal of the restaurant food waste prediction project?
Ans: The goal is to predict the quantity of each dish to prepare for daily consumption in restaurant buffets, so that little or no food is left unconsumed and the restaurant's financial losses are reduced.

Q.2: Name two benefits of using data science in airline route planning.
Ans: Data science helps predict flight delays and decide which class of airplanes to buy. These benefits enable airlines to optimize operations and reduce losses by improving scheduling and fleet decisions.

Q.3: How does Pandas handle missing data in datasets?
Ans: Pandas represents missing data as NaN, which makes missing values easy to identify and then remove or otherwise handle during data cleaning.
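A minimal sketch of these operations on a made-up column. Whether to drop missing rows or fill them (here with the column mean) depends on the dataset; both choices below are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical leftovers column with one missing reading.
leftover_kg = pd.Series([2.0, np.nan, 3.5, 1.5])

missing_mask = leftover_kg.isna()                # True where a value is missing
dropped = leftover_kg.dropna()                   # remove the missing rows
filled = leftover_kg.fillna(leftover_kg.mean())  # or replace them with the mean

print(missing_mask.tolist())
print(len(dropped), filled.tolist())
```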

Q.4: Explain the difference between a scatter plot and a histogram in data visualization.
Ans: A scatter plot displays discontinuous data (data with gaps), showing relationships between two or more parameters using points. A histogram represents continuous data, grouping the values of a single variable into bins and showing how often values fall into each bin.

Q.5: What is the purpose of the K-Nearest Neighbor (KNN) algorithm in data science?
Ans: KNN is a supervised learning algorithm that predicts the class or value of an unknown point from its nearest neighbors. It finds the points closest to the unknown point and assigns the class held by the majority of them.

Long Answer Questions

Q.1: Describe the 4Ws problem canvas and how it is applied to the restaurant food waste problem.
Ans: The 4Ws problem canvas includes:  

  • Who: Identifies the stakeholders, such as restaurants serving buffet food.  
  • What: Defines the problem, which is the large amount of unconsumed food left daily, leading to losses.  
  • Where: Specifies the context, such as restaurants at the end of the day when no further consumption is possible.  
  • Why: Explains why the problem is worth solving, as predicting food quantities reduces waste and financial losses.

In the restaurant scenario, the canvas identifies restaurants as stakeholders facing losses due to unconsumed buffet food, occurring daily, and solving it improves efficiency and profitability.

Q.2: Explain the steps involved in evaluating the regression model for the restaurant food waste prediction project.
Ans: The evaluation steps are:  

  1. Feed the trained model data on the dish name and quantity produced.  
  2. Input data on the quantity of food left unconsumed for the same dish previously.  
  3. The model processes these inputs based on its training.  
  4. The model predicts the quantity of food to be prepared for the next day.  
  5. Compare the prediction to the testing dataset’s ideal value (total quantity minus unconsumed quantity).  
  6. Test the model on 10 days of reserved testing data.  
  7. Compare predicted values to actual values.  
  8. If predictions are similar to actual values, the model is accurate; otherwise, adjust the model or train with more data.
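The comparison in the steps above can be sketched as follows. All numbers are made up, `predicted_kg` is only a stand-in for the output of a trained model, and the mean absolute error with a 1 kg tolerance is an assumed way of deciding whether predictions are "similar" to the actual values; the worksheet does not prescribe a specific metric.

```python
# Hypothetical test data for one dish (made-up values, 10 reserved days).
produced_kg = [20, 22, 21, 19, 20, 23, 22, 20, 21, 19]
unconsumed_kg = [3, 4, 2, 1, 2, 5, 3, 2, 3, 1]

# Ideal value per day: total quantity minus unconsumed quantity.
ideal_kg = [p - u for p, u in zip(produced_kg, unconsumed_kg)]

# Stand-in for the trained model's predictions on the same 10 days (assumed).
predicted_kg = [17.5, 18.0, 18.5, 18.0, 18.0, 18.5, 19.0, 18.0, 18.5, 17.5]

# Mean absolute error: average gap between prediction and ideal value.
errors = [abs(pred - ideal) for pred, ideal in zip(predicted_kg, ideal_kg)]
mae = sum(errors) / len(errors)

TOLERANCE_KG = 1.0  # assumed threshold for "similar enough"
print(f"MAE = {mae:.2f} kg")
print("Accurate enough" if mae <= TOLERANCE_KG else "Adjust model or add data")
```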

Q.3: Discuss the differences between NumPy arrays and Python lists, highlighting their key features.
Ans: NumPy arrays and Python lists differ as follows:  

  • Homogeneity: NumPy arrays are homogeneous, containing only one data type (e.g., numbers), while Python lists can contain multiple data types (e.g., numbers, strings).  
  • Functionality: NumPy arrays support advanced mathematical and logical operations, making them ideal for numerical computations, whereas lists are more general-purpose.  
  • Structure: NumPy arrays can be multi-dimensional (N-dimensional arrays), while lists are typically one-dimensional, though they can be nested.  
  • Performance: NumPy arrays are optimized for performance in scientific computing, unlike lists, which are less efficient for large datasets.

NumPy arrays are used for efficient array-based computations, while lists are flexible for varied data storage.
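A short sketch of the functionality and structure differences, using made-up values: the same `*` operator repeats a list but scales every element of an array, and an array reports its dimensions directly.

```python
import numpy as np

prices_list = [10, 20, 30]
prices_array = np.array(prices_list)

# Multiplying a list repeats it; multiplying an array scales each element.
repeated = prices_list * 2
doubled = prices_array * 2

# Arrays can be N-dimensional and report their shape directly.
grid = np.array([[1, 2, 3], [4, 5, 6]])

print(repeated)
print(doubled)
print(grid.shape, grid.ndim)
```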

Q.4: Explain how Matplotlib is used for data visualization, including the types of plots it can create.
Ans: Matplotlib is a Python library for creating 2D visualizations of arrays, enabling visual access to large datasets. It supports:  

  • Scatter Plots: For discontinuous data, showing relationships between up to four parameters using points, colors, and sizes.  
  • Bar Charts: For discontinuous data, depicting parameters with bars, including single or double bar charts for comparisons (e.g., men vs. women).  
  • Histograms: For continuous data, grouping the values of a variable into bins and showing the frequency of each bin.  
  • Box Plots: For displaying data distribution across quartiles, highlighting outliers and the interquartile range.

Matplotlib allows customization of plots for better clarity and communication of trends and patterns.
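As a rough sketch, the four plot types listed above can be drawn side by side with a few calls. The data values and the output file name `plots.png` are made up for illustration, and the off-screen "Agg" backend is chosen only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical values (made up for illustration).
days = [1, 2, 3, 4, 5, 6, 7, 8]
leftover_kg = [2.0, 3.5, 1.5, 4.0, 2.5, 3.0, 9.0, 2.0]

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].scatter(days, leftover_kg)   # one point per (day, value) pair
axes[0, 0].set_title("Scatter plot")

axes[0, 1].bar(days, leftover_kg)       # one bar per day
axes[0, 1].set_title("Bar chart")

axes[1, 0].hist(leftover_kg, bins=4)    # frequency of values per bin
axes[1, 0].set_title("Histogram")

axes[1, 1].boxplot(leftover_kg)         # quartiles, IQR box, outliers
axes[1, 1].set_title("Box plot")

fig.tight_layout()
fig.savefig("plots.png")
```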

Q.5: Describe the K-Nearest Neighbor (KNN) algorithm and explain how the choice of K affects its predictions, using the fruit sweetness example.
Ans: The K-Nearest Neighbor (KNN) algorithm is a supervised learning method that predicts an unknown point’s class or value based on the majority properties of its K nearest neighbors, calculated by distance. In the fruit sweetness example:  

  • For K=1, the algorithm considers the single nearest neighbor. If it’s a non-sweet fruit (blue dot), the prediction is non-sweet, but this is less stable if surrounded by sweet fruits (green dots).  
  • For K=2, two neighbors are considered. If one is sweet and one is non-sweet, the prediction is ambiguous, making it unreliable.  
  • For K=3, three neighbors are considered. If two are sweet and one is non-sweet, the prediction is sweet, based on the majority.

A lower K (e.g., K=1) makes predictions less stable, as they depend on fewer points, risking errors. A higher K increases stability through majority voting but may introduce errors if K is too large. Odd K values are preferred in classification to avoid ties.
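The effect of K can be sketched in plain Python. The fruit coordinates, the two unnamed features, and the helper name `knn_votes` are hypothetical; they were chosen so that the three cases (K=1, K=2, K=3) play out as in the example, with straight-line (Euclidean) distance assumed as the measure of proximity.

```python
import math
from collections import Counter

# Hypothetical fruits: (feature_1, feature_2) -> label. Values are made up.
fruits = [
    ((1.0, 1.0), "non-sweet"),
    ((1.5, 2.0), "sweet"),
    ((2.0, 1.5), "sweet"),
    ((3.0, 3.0), "sweet"),
    ((0.0, 3.0), "non-sweet"),
]

def knn_votes(query, k):
    """Return the label counts among the k points nearest to query."""
    by_distance = sorted(fruits, key=lambda item: math.dist(query, item[0]))
    return Counter(label for _, label in by_distance[:k])

query = (1.1, 1.2)  # the unknown fruit
for k in (1, 2, 3):
    votes = knn_votes(query, k)
    ranked = votes.most_common()
    # A tie occurs when the two leading labels have the same number of votes.
    tie = len(ranked) > 1 and ranked[0][1] == ranked[1][1]
    print(k, dict(votes), "tie" if tie else ranked[0][0])
```

With this data, K=1 follows the single nearest (non-sweet) point, K=2 splits one vote each, and K=3 gives a two-to-one majority for sweet.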

FAQs on Worksheet Solutions: Data Science - Artificial Intelligence for Class 10

1. What is Data Science, and why is it important in today's world?
Ans. Data Science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data. It plays a crucial role in decision-making across various sectors, including business, healthcare, and technology, by providing data-driven insights that can lead to improved efficiency, innovation, and understanding of complex phenomena.
2. What are some common tools and programming languages used in Data Science?
Ans. Common tools and programming languages used in Data Science include Python, R, SQL, and software such as Tableau, Apache Hadoop, and Microsoft Excel. Python and R are particularly popular due to their extensive libraries and frameworks that facilitate data manipulation and analysis, while SQL is essential for database management and querying data.
3. What are the key components of the Data Science process?
Ans. The key components of the Data Science process typically include data collection, data cleaning, data exploration and analysis, model building, and deployment. Each step is crucial for ensuring that the data is accurate, relevant, and effectively utilized to generate insights and predictions.
4. How does machine learning fit into Data Science?
Ans. Machine learning is a branch of artificial intelligence that is widely used in Data Science. It involves algorithms and statistical models that let computers learn to perform tasks from data rather than from explicitly programmed instructions. It allows Data Scientists to make predictions or decisions based on data patterns, enhancing the ability to analyze large data sets and derive actionable insights.
5. What ethical considerations should be taken into account in Data Science?
Ans. Ethical considerations in Data Science include data privacy, bias in algorithms, and the responsible use of data. It is essential to ensure that data is collected and used in compliance with regulations, that algorithms do not reinforce existing biases, and that the implications of data-driven decisions are considered for their social impact.