Computer Architecture

October 27, 2012

Assume a GPU architecture that contains 10 SIMD processors. Each SIMD instruction has a width of 32, and each SIMD processor contains 8 lanes for single-precision arithmetic and load/store instructions, meaning that each non-diverged SIMD instruction can produce 32 results every 4 cycles. Assume a kernel with divergent branches that cause, on average, 80% of threads to be active. Assume that 70% of all SIMD instructions executed are single-precision arithmetic and 20% are load/store. Since not all memory latencies are covered, assume an average SIMD instruction issue rate of 0.85. Assume that the GPU has a clock speed of 1.5 GHz.

a. Compute the throughput, in GFLOP/sec, for this kernel on this GPU.

Everyone in my study group is having problems with this one. The few pages of the book that cover this topic do not go into nearly this much detail, so we are lost. If anyone could point us in the right direction on how to approach this problem, we would greatly appreciate it.

November 13, 2012

I am also having difficulty solving this problem. Does anyone know how to proceed?
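
One possible way to set up part (a), offered as a sketch rather than an official solution: treat peak throughput as clock rate times the number of SIMD processors times the lanes per processor (32 results every 4 cycles is 8 results per cycle), then scale it by each derating factor given in the problem. This assumes that each active lane executing a single-precision arithmetic instruction counts as one FLOP (no fused multiply-add), that load/store instructions contribute no FLOPs, and that the 0.85 issue rate applies uniformly to every SIMD processor. The function and parameter names are made up for illustration.

```python
def kernel_gflops(clock_ghz, simd_processors, lanes, active_fraction,
                  fp_fraction, issue_rate):
    """Estimate sustained throughput in GFLOP/sec.

    Assumption: one FLOP per active lane per single-precision
    arithmetic instruction; load/store instructions add nothing.
    """
    # Peak results per nanosecond across the whole chip (G results/sec).
    peak = clock_ghz * simd_processors * lanes
    # Derate for divergence, instruction mix, and issue stalls.
    return peak * active_fraction * fp_fraction * issue_rate


result = kernel_gflops(
    clock_ghz=1.5,        # 1.5 GHz clock
    simd_processors=10,   # 10 SIMD processors
    lanes=8,              # 32-wide instruction over 4 cycles = 8 lanes
    active_fraction=0.80, # 80% of threads active after divergence
    fp_fraction=0.70,     # 70% of instructions are SP arithmetic
    issue_rate=0.85,      # average SIMD instruction issue rate
)
print(f"{result:.2f} GFLOP/sec")  # prints "57.12 GFLOP/sec"
```

Under these assumptions the product is 1.5 × 10 × 8 × 0.80 × 0.70 × 0.85 = 57.12 GFLOP/sec. The 20% load/store share does not enter the product directly; it only matters in that those instructions occupy issue slots without producing floating-point results.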
