Table of Contents
- Multiple Choice Questions
- Fill in the Blanks
- True or False
- Short Answer Questions
- Long Answer Questions

Multiple Choice Questions
Q.1: Which of the following is an application of computer vision in smart homes?
a) Weather forecasting
b) Facial recognition for security
c) Voice command processing
d) Temperature control
Ans: b) Facial recognition for security
Explanation: Computer vision enables facial recognition for security purposes in smart homes, such as guest recognition or visitor log maintenance.
Q.2: What is the primary function of Google’s Search by Image feature?
a) To edit images
b) To compare image features with a database for search results
c) To translate text in images
d) To generate new images
Ans: b) To compare image features with a database for search results
Explanation: Google’s Search by Image compares features of an input image to a database of images to provide relevant search results.
Q.3: Which computer vision task involves assigning a single label to an input image?
a) Object Detection
b) Instance Segmentation
c) Image Classification
d) Classification + Localisation
Ans: c) Image Classification
Explanation: Image Classification assigns a single label from a fixed set of categories to an input image, used in various practical applications.
Q.4: What is the smallest unit of information in a digital image?
a) Byte
b) Pixel
c) Bit
d) Kernel
Ans: b) Pixel
Explanation: A pixel, or picture element, is the smallest unit of information in a digital image, typically arranged in a 2D grid.
Q.5: What is the pixel value range for an 8-bit grayscale image?
a) 0 to 100
b) 0 to 255
c) 0 to 512
d) 0 to 1024
Ans: b) 0 to 255
Explanation: An 8-bit grayscale image has pixel values ranging from 0 (black) to 255 (white), as each pixel is represented by 8 bits.
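To make the pixel-value range concrete, here is a minimal NumPy sketch (assuming NumPy is available) of a tiny 8-bit grayscale image stored as a 2D array; the pixel values are chosen purely for illustration.

```python
import numpy as np

# A tiny 3x3 8-bit grayscale "image": each pixel is one value in [0, 255].
gray = np.array([[  0,  64, 128],
                 [ 32, 200, 255],
                 [ 10,  90, 180]], dtype=np.uint8)

print(gray.shape)              # (3, 3) -> height x width, single channel
print(gray.min(), gray.max())  # values stay within 0 (black) to 255 (white)
```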
Q.6: Which library is used for image processing tasks like resizing and cropping?
a) NumPy
b) Pandas
c) OpenCV
d) Matplotlib
Ans: c) OpenCV
Explanation: OpenCV is used for image processing tasks such as resizing, cropping, and analyzing images or videos for objects and features.
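A brief OpenCV sketch of resizing and cropping; the filenames and the crop coordinates are placeholders for illustration, not files referenced by the material.

```python
import cv2

# Load an image ("photo.jpg" is just a placeholder filename).
img = cv2.imread("photo.jpg")        # returns a NumPy array in BGR channel order

# Resize to 200x100 pixels (width, height).
resized = cv2.resize(img, (200, 100))

# Crop by plain array slicing: rows 50-150, columns 100-300.
cropped = img[50:150, 100:300]

cv2.imwrite("resized.jpg", resized)
cv2.imwrite("cropped.jpg", cropped)
```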
Q.7: What is the purpose of the convolution operation in image processing?
a) To resize the image
b) To extract features from the image
c) To convert the image to grayscale
d) To classify the image
Ans: b) To extract features from the image
Explanation: Convolution multiplies an image array with a kernel to extract features like edges, used for further processing in applications like CNNs.
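The multiply-and-sum described above can be illustrated with a short NumPy sketch; the `convolve2d` helper and the vertical edge-detection kernel are illustrative choices (no padding is used here, so the output is smaller than the input).

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image, multiplying and summing at each position."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.array([[10, 10, 10, 0, 0],
                  [10, 10, 10, 0, 0],
                  [10, 10, 10, 0, 0],
                  [10, 10, 10, 0, 0]], dtype=float)

# A simple vertical edge-detection kernel.
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

print(convolve2d(image, kernel))  # large values where the bright/dark boundary lies
```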
Q.8: Which layer in a Convolutional Neural Network (CNN) is responsible for reducing the spatial size of the feature map?
a) Convolution Layer
b) Rectified Linear Unit (ReLU) Layer
c) Pooling Layer
d) Fully Connected Layer
Ans: c) Pooling Layer
Explanation: The Pooling Layer reduces the spatial size of the feature map while retaining important features, making the image more manageable.
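A minimal NumPy sketch of 2x2 max pooling, a common way a Pooling Layer halves each spatial dimension while keeping the strongest responses; the helper name and example values are illustrative.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Reduce spatial size by keeping the maximum of each non-overlapping 2x2 block."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % 2, :w - w % 2]   # drop an odd edge row/column if any
    blocks = trimmed.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 6, 1, 2],
               [7, 2, 9, 5],
               [3, 1, 4, 8]])

print(max_pool_2x2(fm))
# [[6 2]
#  [7 9]]
```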
Q.9: What does the ReLU layer do in a CNN?
a) Reduces image size
b) Removes negative values from the feature map
c) Classifies the image
d) Extracts high-level features
Ans: b) Removes negative values from the feature map
Explanation: The ReLU layer removes negative values from the feature map, introducing non-linearity and making feature changes more abrupt.
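A one-line NumPy illustration of ReLU applied to a toy feature map (the values are made up for the example).

```python
import numpy as np

feature_map = np.array([[-3.0,  2.0],
                        [ 1.5, -0.5]])

# ReLU: negative values become 0, positive values pass through unchanged.
relu = np.maximum(feature_map, 0)
print(relu)
# [[0.  2. ]
#  [1.5 0. ]]
```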
Q.10: Which image feature is considered the easiest to locate due to its distinct appearance?
a) Flat surfaces
b) Edges
c) Corners
d) Blobs
Ans: c) Corners
Explanation: Corners are the easiest to locate because they appear different at specific locations, unlike flat surfaces or edges which look similar across areas.
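As an illustration of locating corners in practice, here is a hedged OpenCV sketch using Harris corner detection; the filename and threshold are placeholder choices and not part of the original material.

```python
import cv2
import numpy as np

# "scene.jpg" is a placeholder filename.
img = cv2.imread("scene.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.float32)

# Harris response is high where the neighbourhood changes in all directions (i.e. corners).
response = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)

# Mark strong corners in red on the original image.
img[response > 0.01 * response.max()] = [0, 0, 255]
cv2.imwrite("corners.jpg", img)
```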
Fill in the Blanks
Q.1: Computer vision enables machines to process and analyze ______ data.
Ans: Image
Explanation: Computer vision focuses on processing and analyzing image data to perform tasks like object detection and facial recognition.
Q.2: The process of ______ is used in Google Translate to recognize text in images.
Ans: Optical Character Recognition
Explanation: Google Translate uses optical character recognition to identify text in images for translation into a preferred language.
Q.3: In an RGB image, each pixel is represented by three values corresponding to the ______, ______, and ______ channels.
Ans: Red, Green, Blue
Explanation: Each pixel in an RGB image has three values, one for each Red, Green, and Blue channel, determining its color.
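A small NumPy sketch (values chosen for illustration) showing how an RGB image stores three values per pixel.

```python
import numpy as np

# A 2x2 RGB image: each pixel holds three values (Red, Green, Blue), each 0-255.
rgb = np.array([[[255,   0,   0], [  0, 255,   0]],
                [[  0,   0, 255], [255, 255, 255]]], dtype=np.uint8)

print(rgb.shape)   # (2, 2, 3): height x width x 3 channels
print(rgb[0, 0])   # [255   0   0] -> a pure red pixel
```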
Q.4: A ______ is a matrix slid across an image to perform convolution for feature extraction.
Ans: Kernel
Explanation: A kernel is a matrix used in convolution, slid across an image to multiply pixel values and extract features like edges.
Q.5: The ______ layer in a CNN flattens the convolution/pooling output into a single vector for classification.
Ans: Fully Connected
Explanation: The Fully Connected Layer flattens the convolution/pooling output into a vector, assigning probabilities to classify the image into a label.
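A toy NumPy sketch of flattening a pooled output and mapping it to class probabilities; the random weights and the three classes are illustrative only, not a trained network.

```python
import numpy as np

pooled = np.array([[6.0, 2.0],
                   [7.0, 9.0]])      # e.g. the output of a pooling layer

flat = pooled.flatten()              # [6. 2. 7. 9.] -> one long vector

# A toy fully connected layer for 3 made-up classes (weights are random here).
rng = np.random.default_rng(0)
weights = rng.normal(size=(3, flat.size))
bias = np.zeros(3)
scores = weights @ flat + bias

# Softmax turns the scores into class probabilities that sum to 1.
probs = np.exp(scores - scores.max())
probs /= probs.sum()
print(probs)
```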
True or False
Q.1: Computer vision was first introduced in the 1990s.
Ans: False
Explanation: Computer vision was first introduced in the 1970s, with significant advancements in recent years making it widely accessible.
Q.2: Grayscale images have three channels representing red, green, and blue.
Ans: False
Explanation: Grayscale images have a single channel with shades of gray, ranging from black (0) to white (255), unlike RGB images with three channels.
Q.3: The convolution operation multiplies an image array with a kernel to produce a new array.
Ans: True
Explanation: Convolution involves element-wise multiplication of an image array with a kernel, followed by summation to produce a new array.
Q.4: The Pooling Layer in a CNN increases the spatial size of the feature map.
Ans: False
Explanation: The Pooling Layer reduces the spatial size of the feature map while retaining important features, making processing more efficient.
Q.5: Corners are considered good features in image processing because they are easy to locate.
Ans: True
Explanation: Corners are unique and distinct at specific locations, making them easier to locate compared to edges or flat surfaces.
Short Answer Questions
Q.1: What is the role of computer vision in inventory management for retail?
Ans: Computer vision analyzes camera images to estimate the number of items available, track how shelf space is used, and suggest optimal item placement, helping retailers manage inventory accurately and improve the customer experience.
Q.2: Explain the difference between Object Detection and Instance Segmentation.
Ans: Object Detection identifies and locates multiple objects in an image or video, while Instance Segmentation detects objects, assigns categories, and labels each pixel within those objects.
Q.3: How is resolution defined in the context of digital images?
Ans: Resolution is the number of pixels in an image, expressed as width by height (e.g., 1280x1024) or as a total pixel count (e.g., 1.31 megapixels).
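A quick check of the arithmetic behind the example figures:

```python
width, height = 1280, 1024
total_pixels = width * height        # 1,310,720 pixels
print(total_pixels / 1_000_000)      # ~1.31 megapixels
```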
Q.4: What is the purpose of the ReLU layer in a Convolutional Neural Network?
Ans: The ReLU layer removes negative values from the feature map, introducing non-linearity and making feature changes more abrupt for better processing.
Q.5: Why are corners considered better features than edges in image processing?
Ans: Corners are unique and distinct at specific locations, making them easier to locate, while edges look similar along their length, complicating precise identification.
Long Answer Questions
Q.1: Describe three applications of computer vision mentioned and explain how they utilize image processing.
Ans: 1. Facial Recognition: Used in smart homes and schools for security and attendance, it processes images to identify facial features and match them against a database.
2. Google Translate App: Translates text in images by using optical character recognition to detect text and augmented reality to overlay translations.
3. Self-Driving Cars: Employs image processing to identify objects, determine navigational routes, and monitor the environment for autonomous driving.
Q.2: Explain the process of how the Google Translate app uses computer vision to translate text in images.
Ans: The Google Translate app uses a phone’s camera to capture an image of text in a foreign language. It applies optical character recognition to identify the text within the image. Then, augmented reality overlays the translated text in the user’s preferred language almost instantly, aligning it with the original text’s position.
Q.3: Discuss the structure of an RGB image and how it differs from a grayscale image in terms of storage and pixel values.
Ans: An RGB image consists of three channels (Red, Green, Blue), each storing pixel intensity values from 0 to 255. Each pixel has three values, combining to form the visible color. A grayscale image has a single channel with pixel values from 0 (black) to 255 (white), representing shades of gray. RGB images require three times the storage of grayscale images due to the multiple channels, and their pixel values represent color combinations rather than a single intensity.
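A small NumPy sketch comparing the storage of a grayscale and an RGB image of the same resolution; the 1280x1024 size follows the earlier example and is otherwise arbitrary.

```python
import numpy as np

h, w = 1024, 1280
gray = np.zeros((h, w), dtype=np.uint8)      # one byte per pixel
rgb = np.zeros((h, w, 3), dtype=np.uint8)    # three bytes per pixel

print(gray.nbytes)   # 1,310,720 bytes
print(rgb.nbytes)    # 3,932,160 bytes -> three times the grayscale size
```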
Q.4: Describe the convolution operation and how it is used to extract features from an image, including the role of the kernel.
Ans: Convolution is an element-wise multiplication of an image array with a kernel, followed by summation to produce a new array. The kernel, a small matrix, slides across the image, multiplying its values with overlapping pixel values to extract features like edges or corners. Different kernels produce varied effects, enhancing specific image aspects. To maintain output size, edge pixels are extended with zero values. This process is crucial for feature extraction in applications like Convolutional Neural Networks.
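A hedged NumPy sketch of the zero-padded ("same" size) variant described above; the helper name and the sharpening kernel are illustrative choices.

```python
import numpy as np

def convolve2d_same(image, kernel):
    """Convolution that keeps the output the same size by zero-padding the edges."""
    kh, kw = kernel.shape
    pad_h, pad_w = kh // 2, kw // 2
    padded = np.pad(image, ((pad_h, pad_h), (pad_w, pad_w)), mode="constant")
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i+kh, j:j+kw] * kernel)
    return out

image = np.eye(5) * 10                   # a toy 5x5 image with a bright diagonal line
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]], dtype=float)
print(convolve2d_same(image, sharpen))   # same 5x5 shape as the input
```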
Q.5: Explain the layers of a Convolutional Neural Network (CNN) and their roles in processing an input image for classification.
Ans: A CNN consists of four main layers:
1. Convolution Layer: slides kernels over the input image to extract feature maps such as edges and corners.
2. Rectified Linear Unit (ReLU) Layer: removes negative values from the feature maps, introducing non-linearity.
3. Pooling Layer: reduces the spatial size of the feature maps while retaining the important features, making processing more efficient.
4. Fully Connected Layer: flattens the convolution/pooling output into a single vector and assigns probabilities to classify the image into a label.
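For orientation, here is a minimal Keras sketch stacking these layers (assuming TensorFlow is installed; the 64x64 RGB input size and the 10 output classes are arbitrary choices, not taken from the material).

```python
# A minimal CNN sketch with Keras; input size and class count are arbitrary.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 3)),  # convolution + ReLU
    layers.MaxPooling2D((2, 2)),                                            # pooling layer
    layers.Flatten(),                                                       # flatten for the dense part
    layers.Dense(64, activation="relu"),                                    # fully connected layer
    layers.Dense(10, activation="softmax"),                                 # class probabilities
])
model.summary()
```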