All questions of Computer Architecture & Organisation (CAO) for Computer Science Engineering (CSE) Exam

Which of the following is not involved in a memory write operation?
  • a)
    MAR
  • b)
    PC
  • c)
    MDR
  • d)
    data bus
Correct answer is option 'B'. Can you explain this answer?

Ravi Singh answered
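The program counter (PC) holds the address of the next instruction to be fetched, so it plays no part in a memory write. A write operation uses the MAR (which holds the target address), the MDR (which holds the data to be written) and the data bus.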

In a C program, an array is declared as float A[2048]. Each array element is 4 Bytes in size, and the starting address of the array is 0x00000000. This program is run on a computer that has a direct mapped data cache of size 8 Kbytes, with block (line) size of 16 Bytes.
a. Which elements of the array conflict with element A[0] in the data cache? Justify your answer briefly.
b. If the program accesses the elements of this array one by one in reverse order i.e., starting with the last element and ending with the first element, how many data cache misses would occur? Justify your answer briefly. Assume that the data cache is initially empty and that no other data or instruction accesses are to be considered.
Correct answer is '(a)'. Can you explain this answer?

Ravi Singh answered
(a)
Data cache size = 8 KB.
Block (line) size = 16 B.
Since each array element occupies 4 B, four consecutive array elements fit in one cache block (the elements are block-aligned because the starting address is 0).
Number of cache blocks = 8 KB / 16 B = 512. Number of cache blocks needed for the array = 2048 / 4 = 512. So every block of the array maps to its own cache block, and no element conflicts with A[0]. (A[1], A[2] and A[3] share A[0]'s block, which is sharing rather than a conflict.)
We can also explain this in terms of addresses. The starting address is 0x00000000 (32 zero bits) and the ending address is 0x00001FFF, since the array occupies 4 * 2048 = 8192 = 2^13 bytes.
Here the lowest 4 bits are the OFFSET bits and the next 9 bits are the SET (index) bits. Since no address in the array extends beyond these 13 bits, every block of the array maps to a different set.
(b) If the last element is accessed first, its cache block is fetched. That block also contains the previous 3 elements of the array, since each cache block holds 4 elements and 2048 is an exact multiple of 4. Thus every 4th access is a cache miss, so 2048 accesses cause 2048 / 4 = 512 cache misses. (The count would be the same if the array were accessed in forward order.)
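The reasoning above can be checked with a small simulation. This is a minimal sketch of a direct-mapped cache using the sizes given in the question; the helper names (line_and_tag, count_misses) are illustrative and not part of the original answer.

```python
CACHE_SIZE = 8 * 1024                  # bytes
BLOCK_SIZE = 16                        # bytes per cache line
ELEM_SIZE = 4                          # bytes per float
NUM_ELEMS = 2048
NUM_LINES = CACHE_SIZE // BLOCK_SIZE   # 512 lines


def line_and_tag(index):
    """Return (cache line, tag) for array element A[index], array based at address 0."""
    block = (index * ELEM_SIZE) // BLOCK_SIZE
    return block % NUM_LINES, block // NUM_LINES


def count_misses(indices):
    """Count misses for a sequence of element accesses, starting from an empty cache."""
    cache = {}                         # line number -> tag currently stored
    misses = 0
    for i in indices:
        line, tag = line_and_tag(i)
        if cache.get(line) != tag:
            misses += 1
            cache[line] = tag
    return misses


# Elements that map to A[0]'s line but belong to a different memory block.
line0, tag0 = line_and_tag(0)
conflicts_with_a0 = [
    i for i in range(NUM_ELEMS)
    if line_and_tag(i)[0] == line0 and line_and_tag(i)[1] != tag0
]

reverse_misses = count_misses(range(NUM_ELEMS - 1, -1, -1))
forward_misses = count_misses(range(NUM_ELEMS))
print(conflicts_with_a0, reverse_misses, forward_misses)
```

Under these assumptions the conflict list is empty and both traversal orders miss once per block.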

The correct matching for the following pairs is:
(A) DMA I/O (1) High speed RAM
(B) Cache (2) Disk
(C) Interrupt I/O (3) Printer
(D) Condition Code Register (4) ALU
  • a)
    A-4 B-3 C-1 D-2
  • b)
    A-2 B-1 C-3 D-4
  • c)
    A-4 B-3 C-2 D-1
  • d)
    A-2 B-3 C-4 D-1
Correct answer is option 'B'. Can you explain this answer?

Explanation:

DMA I/O: Direct Memory Access (DMA) transfers data between main memory and an I/O device without the CPU handling each word. It suits devices that move large blocks of data at high speed, such as disks. Therefore, DMA I/O is associated with disk.

Cache: Cache is a small, fast memory that stores frequently used data to speed up access to data held in main memory. It is built from high-speed RAM. Therefore, cache is associated with high-speed RAM.

Interrupt I/O: Interrupt I/O is a method of input/output in which a device sends a signal to the CPU to request its attention. The CPU then interrupts its current task and services the request. Therefore, Interrupt I/O is associated with printers.

Condition Code Register: The Condition Code Register (CCR) is a register that contains information about the status of the last arithmetic or logical operation performed by the CPU. It is used to determine the outcome of the operation. Therefore, the CCR is associated with the Arithmetic Logic Unit (ALU).

Hence, the correct matching of computer terms is:

A-2 B-1 C-3 D-4

The stage delays in a 4-stage pipeline are 800, 500, 400 and 300 picoseconds. The first stage (with delay 800 picoseconds) is replaced with a functionally equivalent design involving two stages with respective delays 600 and 350 picoseconds. The throughput increase of the pipeline is ___________ percent.
Correct answer is '= 33.33%'. Can you explain this answer?

In an ideal pipeline, CPI = 1, so one instruction completes every clock cycle.
Throughput is the number of instructions completed per unit time.
In the original pipeline, cycle time = maximum stage delay = 800 ps.
So one instruction completes every 800 ps, and the throughput of the original pipeline is 1/800 instructions per picosecond.
The new pipeline has stage delays 600, 350, 500, 400 and 300 ps, so its cycle time is 600 ps and its throughput is 1/600 instructions per picosecond.
Throughput increase in percentage
= ((1/600 - 1/800) / (1/800)) * 100 = ((800 - 600) / 600) * 100
= 33.33%
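As a quick check of the arithmetic, here is a minimal sketch that takes the cycle time as the slowest stage delay and compares the two pipelines. The function name is illustrative, and ideal pipelining (CPI = 1, no latch overhead) is assumed.

```python
def throughput_increase_percent(old_delays, new_delays):
    """Percentage throughput gain when cycle time equals the slowest stage delay."""
    old_cycle = max(old_delays)
    new_cycle = max(new_delays)
    # Throughput is 1 / cycle time, so the ratio of throughputs is old_cycle / new_cycle.
    return (old_cycle / new_cycle - 1) * 100


old_stages = [800, 500, 400, 300]        # picoseconds
new_stages = [600, 350, 500, 400, 300]   # first stage split into two
increase = throughput_increase_percent(old_stages, new_stages)
print(f"{increase:.2f}%")
```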

______ is a piece of hardware that executes a set of machine-language instructions.
  • a)
    Controller
  • b)
    Bus
  • c)
    Processor
  • d)
    Motherboard
Correct answer is option 'C'. Can you explain this answer?

A processor, or processing unit, is an electronic circuit that performs operations on data from some external source, usually memory or a data stream. It executes a set of machine-language instructions.

A pipelined processor uses a 4-stage instruction pipeline with the following stages: Instruction fetch (IF), Instruction decode (ID), Execute (EX) and Writeback (WB). The arithmetic operations as well as the load and store operations are carried out in the EX stage. The sequence of instructions corresponding to the statement X = (S - R * (P + Q))/T is given below. The values of variables P, Q, R, S and T are available in the registers R0, R1, R2, R3 and R4 respectively, before the execution of the instruction sequence.
ADD R5, R0, R1 ; R5 ← R0 + R1
MUL R6, R2, R5 ; R6 ← R2 * R5
SUB R5, R3, R6 ; R5 ← R3 - R6
DIV R6, R5, R4 ; R6 ← R5/R4
STORE R6, X ; X ← R6
The IF, ID and WB stages take 1 clock cycle each. The EX stage takes 1 clock cycle each for the ADD, SUB and STORE operations, and 3 clock cycles each for MUL and DIV operations. Operand forwarding from the EX stage to the ID stage is used. The number of clock cycles required to complete the sequence of instructions is
  • a)
    10
  • b)
    12
  • c)
    14
  • d)
    16
Correct answer is option 'B'. Can you explain this answer?

Pallabi Sharma answered
Understanding the Pipelined Execution
The processor has four stages: Instruction Fetch (IF), Instruction Decode (ID), Execute (EX) and Writeback (WB). IF, ID and WB take 1 clock cycle each. EX takes 1 cycle for ADD, SUB and STORE, and 3 cycles for MUL and DIV.
Dependencies
- Every instruction after ADD uses the result of the instruction just before it (MUL needs R5, SUB needs R6, DIV needs R5, STORE needs R6).
- With operand forwarding, a result can be used in the cycle right after the producer's EX finishes, without waiting for its WB.
- There is a single EX stage, so an instruction cannot enter EX until the previous one has left it.
Cycle-by-Cycle Schedule
1. ADD R5, R0, R1
- IF in cycle 1, ID in cycle 2, EX in cycle 3, WB in cycle 4.
2. MUL R6, R2, R5
- IF in cycle 2, ID in cycle 3, EX in cycles 4-6, WB in cycle 7.
3. SUB R5, R3, R6
- IF in cycle 3, ID in cycle 4, then waits in ID for cycles 5-6 until MUL leaves EX. EX in cycle 7, WB in cycle 8.
4. DIV R6, R5, R4
- IF in cycle 4, then waits for cycles 5-6 because ID is occupied by SUB. ID in cycle 7, EX in cycles 8-10, WB in cycle 11.
5. STORE R6, X
- IF in cycle 7, ID in cycle 8, then waits for cycles 9-10 until DIV leaves EX. EX in cycle 11, WB in cycle 12.
Total Clock Cycle Count
- Without stalls, 5 instructions in a 4-stage pipeline need 4 + (5 - 1) = 8 cycles.
- MUL and DIV each hold EX for 2 extra cycles, which adds 2 + 2 = 4 stall cycles.
- Total = 8 + 4 = 12 cycles.
Thus, the correct answer is 12 cycles (option b).
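The cycle count can also be reproduced with a small timing model. This is a sketch under stated assumptions, not a general pipeline simulator: instructions issue in order, there is a single EX unit, and forwarding makes a result usable in the cycle after the producer's EX ends. The function name total_cycles is illustrative.

```python
def total_cycles(ex_latencies):
    """Cycles to finish an in-order IF-ID-EX-WB sequence with one EX unit and forwarding."""
    ex_end = 0                                   # last EX cycle of the previous instruction
    for position, latency in enumerate(ex_latencies, start=1):
        earliest_ex = position + 2               # IF in cycle `position`, ID in the next cycle
        ex_start = max(earliest_ex, ex_end + 1)  # wait until the EX unit is free
        ex_end = ex_start + latency - 1
    return ex_end + 1                            # WB of the last instruction takes one cycle


sequence = [("ADD", 1), ("MUL", 3), ("SUB", 1), ("DIV", 3), ("STORE", 1)]
cycles = total_cycles([latency for _, latency in sequence])
print(cycles)
```

Because each instruction here depends on the one just before it, waiting for the EX unit to become free also covers the data dependence, which is why the model needs no separate hazard check.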

More than one word are put in one cache block to
  • a)
    exploit the temporal locality of reference in a program
  • b)
    exploit the spatial locality of reference in a program
  • c)
    reduce the miss penalty
  • d)
    none of the above
Correct answer is option 'B'. Can you explain this answer?

Krish Datta answered
Placing more than one word in a cache block exploits the spatial locality of reference in a program: if a neighbouring address is accessed soon afterwards, it will already be in the cache.
Consider a scenario similar to cooking: when you take an ingredient from the cupboard, you also take the nearby ingredients along with it, hoping that they will be needed in the near future.

Which of the following are NOT true in a pipelined processor?
I. Bypassing can handle all RAW hazards
II. Register renaming can eliminate all register carried WAR hazards
III. Control hazard penalties can be eliminated by dynamic branch prediction
  • a)
    I and II only
  • b)
    I and III only
  • c)
    II and III only
  • d)
    I, II and III
Correct answer is option 'B'. Can you explain this answer?

Aman Menon answered
Pipelined Processor and Hazards

In a pipelined processor, instructions are divided into multiple stages and executed concurrently. This allows for improved performance by overlapping the execution of multiple instructions. However, this concurrent execution can introduce hazards, which are situations that prevent instructions from executing as intended. There are three types of hazards: data hazards, control hazards, and structural hazards.

I. Bypassing can handle all RAW hazards
Explanation:
- A RAW (Read After Write) hazard occurs when an instruction needs the result of a previous instruction that has not yet completed. This can lead to incorrect results if not handled properly.
- Bypassing, also known as forwarding, is a technique that passes a result directly from one pipeline stage to another without waiting for it to be written to the register file. It resolves many RAW hazards by supplying the data to the instruction that needs it.
- However, bypassing cannot handle all RAW hazards. For example, when a load is immediately followed by an instruction that uses the loaded value, the data is not available until the memory access completes, so the pipeline must still stall.
- Therefore, statement I is NOT true.

II. Register renaming can eliminate all register carried WAR hazards
Explanation:
- A WAR (Write After Read) hazard occurs when a later instruction writes to a register before an earlier instruction has read its old value. This can lead to incorrect results if not handled properly.
- Register renaming maps each new write of a logical register to a different physical register. The earlier instruction keeps reading the old physical register while the later instruction writes to a new one, so the write can no longer overwrite the value being read.
- WAR hazards carried through registers are name dependences rather than true data dependences, so renaming removes all of them.
- Therefore, statement II is true.

III. Control hazard penalties can be eliminated by dynamic branch prediction
Explanation:
- Control hazards occur when the control flow of the program is altered due to branches, jumps, or other control transfer instructions. This can lead to pipeline stalls and reduced performance.
- Dynamic branch prediction is a technique that predicts the outcome of branch instructions based on the behavior of previous branches. It allows the pipeline to continue executing instructions speculatively based on the predicted outcome.
- While dynamic branch prediction can significantly reduce the penalties associated with control hazards, it cannot eliminate them entirely. There are cases where the prediction may be incorrect, leading to incorrect execution and the need to flush the pipeline.
- Therefore, the statement III is NOT true.

Conclusion
- In summary, the correct answer is option 'B' because statements I and III are NOT true in a pipelined processor. Bypassing cannot handle all RAW hazards, and dynamic branch prediction can reduce control hazard penalties but cannot eliminate them entirely. Statement II is true, since register renaming can eliminate all register-carried WAR hazards.

If M denotes the number of memory locations and N denotes the word size, then an expression that denotes the storage capacity is ______________
  • a)
    M * N
  • b)
    M + N
  • c)
    2M + N
  • d)
    2M - N
Correct answer is option 'A'. Can you explain this answer?

Sudhir Patel answered
Storage capacity is the product of the number of memory locations (the number of words) and the word size (the number of bits stored per location), i.e. M * N bits. Storage capacity should be as large as possible.

Which of the following addressing modes permits relocation without any change whatsoever in the code?
  • a)
    Indirect addressing
  • b)
    Indexed addressing
  • c)
    Base register addressing
  • d)
    PC relative addressing
Correct answer is option 'D'. Can you explain this answer?

Swara Dasgupta answered
PC relative addressing mode:
The effective memory address is obtained by computing an offset from the current PC. No matter where the code is loaded in memory, the offsets are always the same so the code needs no modification. In this addressing, there is absolutely no change in code needed.
Therefore, the answer is option (d), i.e. PC relative addressing.

Consider a processor with 64 registers and an instruction set of size twelve. Each instruction has five distinct fields, namely, opcode, two source register identifiers, one destination register identifier, and twelve-bit immediate value. Each instruction must be stored in memory in a byte-aligned fashion. If a program has 100 instructions, the amount of memory (in bytes)
consumed by the program text is _________.
Correct answer is '=> 500 bytes'. Can you explain this answer?

Ameya Goyal answered
Answer => 500 bytes
Number of registers = 64
Number of bits to address a register = ⌈log2 64⌉ = 6 bits
Number of instructions = 12
Opcode size = ⌈log2 12⌉ = 4 bits
Instruction format: Opcode(4) src1(6) src2(6) dest(6) Immediate(12)
Total bits per instruction = 4 + 6 + 6 + 6 + 12 = 34
Total bytes per instruction = 34 / 8 = 4.25
Due to byte alignment we cannot store 4.25 bytes, so each instruction is rounded up to 5 bytes, wasting 0.75 bytes.
So total bytes per instruction = 5
Total instructions = 100
Total size = number of instructions x size of an instruction = 100 x 5 = 500 bytes
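The same calculation as a minimal sketch; the function and parameter names are illustrative. Field widths use integer arithmetic (bit_length) instead of a floating-point logarithm so that the ceiling is exact.

```python
def bits_needed(count):
    """Smallest number of bits that can encode `count` distinct values (count >= 1)."""
    return (count - 1).bit_length()


def program_text_bytes(num_registers, num_opcodes, immediate_bits, num_instructions):
    """Bytes used by the program text when every instruction is byte-aligned."""
    reg_bits = bits_needed(num_registers)        # 6 bits for 64 registers
    opcode_bits = bits_needed(num_opcodes)       # 4 bits for 12 opcodes
    # Opcode + two source registers + one destination register + immediate.
    bits_per_instruction = opcode_bits + 3 * reg_bits + immediate_bits
    bytes_per_instruction = (bits_per_instruction + 7) // 8   # round up to whole bytes
    return bytes_per_instruction * num_instructions


size = program_text_bytes(num_registers=64, num_opcodes=12,
                          immediate_bits=12, num_instructions=100)
print(size)
```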

The main memory of a computer has 2cm blocks while the cache has 2c blocks. If the cache uses the set associative mapping scheme with 2 blocks per set, then block k of the main memory maps to the set
  • a)
    (k mod m) of the cache
  • b)
    (k mod c) of the cache
  • c)
    (k mod 2c) of the cache
  • d)
    (k mod 2cm) of the cache
Correct answer is option 'B'. Can you explain this answer?

Janani Joshi answered
Main Memory and Cache Structure
In a computer system, the organization of memory is crucial for performance. The main memory consists of blocks, and in this scenario, it has 2cm blocks. The cache, which is faster and smaller, contains 2c blocks.
Cache Mapping Scheme
The cache employs a set associative mapping scheme with 2 blocks per set. This means that the cache can hold multiple blocks from the main memory in each set, allowing for more flexibility and faster access.
Understanding Block Mapping
To determine where a block from main memory maps in the cache, we can use the formula for set indexing. The formula typically involves the total number of cache blocks and the number of sets. Here’s how to analyze it:
- Total cache blocks: 2c
- Number of sets: Since there are 2 blocks per set, the number of sets in the cache = (2c) / 2 = c.
Now, to find out which set a block k from main memory maps to:
- Mapping formula: Block k will map to a set in the cache based on the modulo operation with the number of sets.
Calculating the Set Index
Since we have c sets, the formula to find the set for block k is:
- Set index = k mod c
This is because c is the number of sets in the cache, and using modulo c will provide the correct set index for the block.
Conclusion
Thus, the correct answer to where block k of the main memory maps in the cache is option b) (k mod c) of the cache. Understanding this mapping is crucial for optimizing cache performance in computer systems.
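The mapping rule can be sketched in a few lines. The values of c and m below are hypothetical, chosen only to give concrete numbers, and the function name is illustrative.

```python
def set_index(block_k, cache_blocks, blocks_per_set):
    """Set that main-memory block k maps to in a set-associative cache."""
    num_sets = cache_blocks // blocks_per_set
    return block_k % num_sets


c, m = 8, 4                  # hypothetical values, not given in the question
cache_blocks = 2 * c         # the cache has 2c blocks
memory_blocks = 2 * c * m    # main memory has 2cm blocks

# With 2 blocks per set there are c sets, so block k maps to set k mod c.
mapping = [set_index(k, cache_blocks, 2) for k in range(memory_blocks)]
print(mapping[19])           # block 19 with c = 8
```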

Which of the following is the fastest memory?
  • a)
    Cache Memory
  • b)
    DVD ROMs
  • c)
    Static RAM
  • d)
    Hard Disk
Correct answer is option 'A'. Can you explain this answer?

G.K Academy answered
  • Cache memory is a special very high-speed memory.
    • It is used to speed up and synchronize with the high-speed CPU. Cache memory is more expensive than main memory or disk memory but economical as compared to CPU registers.
    • Cache memory is an extremely fast memory type that acts as a buffer between the RAM and the CPU.
    • It holds frequently requested data and instructions so that they are immediately available to the CPU when needed.
    • Cache memory is used to reduce the average time to access data from the main memory.

Cache memory is the onboard storage.
  • a)
    True
  • b)
    False
  • c)
    Can't be Determined
  • d)
    None of Above
Correct answer is option 'A'. Can you explain this answer?

Sudhir Patel answered
Cache Memory is the memory closest to the CPU. Registers, Cache and the main memory are the means of onboard storage in the computer system.

Comparing the time T1 taken for a single instruction on a pipelined CPU with time T2 taken on a non-pipelined but identical CPU, we can say that
  • a)
    T1 ≤ T2
  • b)
    T1 ≥ T2
  • c)
    T1 < T2
  • d)
    T1 is T2 plus the time taken for one instruction fetch cycle
Correct answer is option 'B'. Can you explain this answer?

Aarav Malik answered
Here we are comparing the execution time of only a single instruction. Pipelining in no way reduces the execution time of a single instruction (the time from its start to its end). It improves overall throughput by splitting execution into multiple pipeline stages so that following instructions can use the stages that earlier instructions have finished with. In doing so, pipelining adds overheads such as pipeline register delays and stalls due to hazards, which can slow individual instructions. So T1 ≥ T2, and (B) is the answer.

 The CPU initializes the DMA by sending ______ .
  • a)
    The starting address of the memory blocks where data is available or where data is to be stored.
  • b)
    The word count.
  • c)
    Control for mode and start the transfer.
  • d)
    All of these
Correct answer is option 'D'. Can you explain this answer?

Soumya Dey answered
The CPU initializes the DMA by sending:
The starting address of the memory block where data is available (for a read) or where data is to be stored (for a write).
The word count, which is the number of words to be transferred.
Control to specify the mode of transfer, such as read or write, and to start the transfer.

Which among the following memories is used in Digital camera?
  • a)
    Flash memory 
  • b)
    Virtual memory
  • c)
    Main memory 
  • d)
    Cache memory 
Correct answer is option 'A'. Can you explain this answer?

G.K Academy answered
  • Digital Camera
    • It is a camera that captures photographs in flash memory.
    • The basis for digital camera image sensors is metal-oxide-semiconductor technology.
    • Steven Sasson invented the world's first digital camera in 1975.
  • Flash Memory
    • It is a type of nonvolatile memory that erases data in units called blocks and rewrites data at the byte level.
    • It is widely used for storage and data transfer in consumer devices and industrial applications.
    • Flash memory is a non-volatile computer storage chip.
    • It is a type of EEPROM.
    • EEPROM is the acronym of electrically erasable programmable read-only memory.
    • EEPROM is a programmable ROM that can be erased and reprogrammed using an electrical charge.
    • EEPROM was first developed by George Perlegos.
    • It allows the user to update the computer BIOS without having to open the computer or remove any chips.
    • It is organized as arrays of floating-gate transistors.
    • USB flash drives, Memory cards are examples of flash memory storage.
    • BIOS is stored in a Flash memory chip.

Consider two cache organizations. The first one is 32 KB 2-way set associative with 32 byte block size; the second is of the same size but direct mapped. The size of an address is 32 bits in both cases. A 2-to-1 multiplexer has a latency of 0.6 ns while a k-bit comparator has a latency of k/10 ns. The hit latency of the set associative organization is h1 while that of the direct mapped one is h2.
The value of h1 is:
  • a)
    2.4 ns
  • b)
    2.3 ns
  • c)
    1.8 ns
  • d)
    1.7 ns
Correct answer is option 'A'. Can you explain this answer?

Cache size is 32 KB and cache block size is 32 B. So,
number of sets = cache size / (block size * associativity) = 32 KB / (32 B * 2) = 512.
So, number of index bits needed = 9 (since 2^9 = 512). Number of offset bits = 5 (since 2^5 = 32 B is the block size, assuming byte addressing). So, number of tag bits = 32 - 9 - 5 = 18 (as the memory address is 32 bits).
So, hit latency h1 = time to compare the tag + time to select the block within the set = 18/10 + 0.6 ns = 2.4 ns.
(A tag comparison is needed for each of the two blocks in a set, but the two comparisons are carried out in parallel, and their result drives the multiplexer that selects the output.)
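The bit-field split and the latency can be sketched as follows. This is a simplified model, with illustrative names: it assumes power-of-two sizes, one 2-to-1 multiplexer on the hit path for the 2-way cache, and no multiplexer for the direct-mapped cache. The direct-mapped figure h2 is computed under that assumption and is not derived in the answer above.

```python
def hit_latency_ns(cache_bytes, block_bytes, ways, address_bits=32, mux_ns=0.6):
    """Return (tag bits, hit latency in ns) for a cache with power-of-two sizes."""
    num_sets = cache_bytes // (block_bytes * ways)
    index_bits = num_sets.bit_length() - 1       # log2 of a power of two
    offset_bits = block_bytes.bit_length() - 1
    tag_bits = address_bits - index_bits - offset_bits
    compare_ns = tag_bits / 10                   # k-bit comparator takes k/10 ns
    select_ns = mux_ns if ways > 1 else 0.0      # assumption: direct mapped needs no mux
    return tag_bits, compare_ns + select_ns


tag1, h1 = hit_latency_ns(32 * 1024, 32, ways=2)
tag2, h2 = hit_latency_ns(32 * 1024, 32, ways=1)
print(tag1, round(h1, 1), tag2, round(h2, 1))
```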

The principle of locality justifies the use of
  • a)
    Interrupts
  • b)
    DMA
  • c)
    Polling
  • d)
    Cache Memory
Correct answer is option 'D'. Can you explain this answer?

Mahesh Pillai answered
It is D.
Locality of reference is the tendency of a program to access the same storage locations repeatedly (temporal locality) or locations near recently accessed ones (spatial locality) over a short period. In simple language, data that is used frequently or is likely to be used soon is worth keeping close to the CPU, and cache memory exists for this purpose.

An instruction pipeline has five stages where each stage take 2 nanoseconds and all instruction use all five stages. Branch instructions are not overlapped. i.e., the instruction after the branch is not fetched till the branch instruction is completed. Under ideal conditions,
a. Calculate the average instruction execution time assuming that 20% of all instructions executed are branch instruction. Ignore the fact that some branch instructions may be conditional.
b. If a branch instruction is a conditional branch instruction, the branch need not be taken. If the branch is not taken, the following instructions can be overlapped. When 80% of all branch instructions are conditional branch instructions, and 50% of the conditional branch instructions are such that the branch is taken, calculate the average instruction execution time.
Correct answer is 'Each stage is 2 ns.'. Can you explain this answer?

Jatin Joshi answered
Each stage is 2 ns. So, after 5 time units of 2 ns each (i.e., after 10 ns), the first instruction finishes, and every 2 ns after that a new instruction finishes. This assumes no branch instructions. Once the pipeline is full, we can assume that the initial fill time does not matter in our calculations, so the average execution time per instruction is 2 ns assuming no branch instructions.
(a) We are given that 20% of instructions are branches (like JMP), and that when a branch instruction is executed no further instruction enters the pipeline until it completes. So we can assume every 5th instruction is a branch. With this assumption, the total time to finish 5 instructions is 5 * 2 + 8 = 18 ns (while a branch instruction moves through the pipeline, the stages behind it stay empty for 4 cycles, totaling 4 * 2 = 8 ns, since the next instruction fetch starts only when the branch completes). This is the same for every set of 5 instructions, and hence the average instruction execution time = 18/5 = 3.6 ns.
(b) The statement is long, but all we need is the percentage of instructions that cause a branch to be taken, as the others have no effect on the pipeline flow.
20% of instructions are branch instructions, and 80% of branch instructions are conditional.
That means 0.2 * 0.8 = 16% of instructions are conditional branches, and it is given that 50% of those result in the branch being taken.
So, 8% of instructions are conditional branches that are taken, and the remaining 20% of the 20% = 4% of instructions are unconditional branches, which are always taken.
So, the percentage of instructions where a branch is taken is 8 + 4 = 12%, instead of 20% in part (a).
So, in 100 instructions there will be 12 taken branches. We do the calculation differently from (a) here because 12 is not a divisor of 100. Each taken branch causes a pipeline delay of 4 * 2 = 8 ns, so 12 of them cause a delay of 12 * 8 = 96 ns. For 100 instructions we need 100 * 2 = 200 ns without any delay, and 200 + 96 = 296 ns with the delay.
So, average instruction execution time = 296/100 = 2.96 ns. (We can also use this method for part (a), which gives 100 * 2 + 20 * 8 = 360 ns for 100 instructions, i.e. 3.6 ns per instruction.)
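Both parts follow the same formula: base time per instruction plus the stall penalty weighted by the fraction of instructions that stall the pipeline. A minimal sketch, assuming a full pipeline and ignoring the initial fill time; the function name is illustrative.

```python
def average_instruction_time_ns(stage_ns, num_stages, stalling_fraction):
    """Average time per instruction when a fraction of instructions block the next fetch."""
    stall_ns = (num_stages - 1) * stage_ns       # 4 empty cycles behind each such branch
    return stage_ns + stalling_fraction * stall_ns


# Part (a): every branch (20% of instructions) blocks the pipeline.
part_a = average_instruction_time_ns(2, 5, 0.20)

# Part (b): only taken branches block it.
# 80% of branches are conditional and half of those are taken; the other 20% are unconditional.
taken_fraction = 0.20 * (0.80 * 0.50 + 0.20)
part_b = average_instruction_time_ns(2, 5, taken_fraction)

print(f"{part_a:.2f} ns, {part_b:.2f} ns")
```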

The computer can execute 1,000,000 instructions per second: A program running on this computer performs on average a one sector read and one sector write for every 200 instructions that it executes. The disk drive handling the I/O transfers requires 0.00010 seconds each to perform the read and write operations. Assuming no overlap of these operations, the percent of CPU time spent in the wait state is
  • a)
    12%
  • b)
    39%
  • c)
    57%
  • d)
    91%
Correct answer is option 'D'. Can you explain this answer?

Subhankar Shah answered
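• Executing 200 instructions takes 200 / 1,000,000 = 0.0002 s of CPU time, after which the CPU waits for one sector read and one sector write with no overlap.
• Option 'D' follows if each disk operation takes 0.001 s: the wait is 2 * 0.001 = 0.002 s, so the fraction of time spent waiting is 0.002 / (0.002 + 0.0002) ≈ 0.909, i.e. about 91%. With 0.0001 s per operation, as printed in the question, the same calculation gives 0.0002 / 0.0004 = 50%, which is not among the options.
• The response times of I/O devices are orders of magnitude slower than that of the CPU, hence direct interfacing is impossible.
• It is always better to offload the I/O processing to a secondary processor on the I/O controller board than to depend on the primary CPU for I/O processing.
• The variety of I/O devices on the market requires a separate I/O controller for each device.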
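The wait-state percentage for this question can be worked out with a short calculation. This is a sketch with illustrative names. The 0.001 s disk time is an assumption: it is the value for which option 'D' (91%) comes out, whereas the 0.00010 s printed in the question gives 50%.

```python
def wait_fraction(instructions_per_second, instructions_per_io_pair, seconds_per_disk_op):
    """Fraction of time the CPU waits when each batch is followed by one read and one write."""
    compute_s = instructions_per_io_pair / instructions_per_second
    wait_s = 2 * seconds_per_disk_op             # one read plus one write, no overlap
    return wait_s / (wait_s + compute_s)


assumed = wait_fraction(1_000_000, 200, 0.001)       # assumed disk time, matches option 'D'
as_printed = wait_fraction(1_000_000, 200, 0.0001)   # disk time as printed in the question
print(f"{assumed:.1%} vs {as_printed:.1%}")
```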
