Assume a GPU architecture that contains 10 SIMD processors. Each SIMD instruction has a width of 32, and each SIMD processor contains 8 lanes for single-precision arithmetic and load/store instructions, meaning that each non-diverged SIMD instruction can produce 32 results every 4 cycles. Assume a kernel that has divergent branches that cause, on average, 80% of threads to be active. Assume that 70% of all SIMD instructions executed are single-precision arithmetic and 20% are load/store. Since not all memory latencies are covered, assume an average SIMD instruction issue rate of 0.85. Assume that the GPU has a clock speed of 1.5 GHz.

a. Compute the throughput, in GFLOP/s, for this kernel on this GPU.

 

Everyone in my study group is having problems with this one. The few pages of the book that cover this topic do not go into nearly this much detail, so we are lost. If anyone could point us in the right direction on how to approach this problem, we would greatly appreciate it.
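One possible way to approach part (a) is to start from the peak rate (clock speed × number of SIMD processors × lanes per processor) and then scale it down by each efficiency factor given in the problem. The sketch below does that arithmetic. It rests on a few assumptions that the problem statement does not spell out: each single-precision arithmetic instruction counts as one FLOP per active thread, load/store instructions contribute no FLOPs, and the 0.85 issue rate applies per cycle on every SIMD processor. The function and parameter names are made up for illustration.

```python
def kernel_gflops(
    clock_hz=1.5e9,               # 1.5 GHz clock
    simd_processors=10,           # SIMD processors on the GPU
    lanes_per_processor=8,        # 32 results every 4 cycles = 8 per cycle
    active_thread_fraction=0.80,  # threads active after branch divergence
    fp_instruction_fraction=0.70, # share of instructions that are SP arithmetic
    issue_rate=0.85,              # average SIMD instruction issue rate
):
    """Estimate sustained single-precision throughput in GFLOP/s.

    Assumes one FLOP per active lane per arithmetic instruction
    (an assumption, not stated in the problem).
    """
    peak_flops = clock_hz * simd_processors * lanes_per_processor
    sustained_flops = (
        peak_flops
        * active_thread_fraction
        * fp_instruction_fraction
        * issue_rate
    )
    return sustained_flops / 1e9


if __name__ == "__main__":
    print(f"{kernel_gflops():.2f} GFLOP/s")  # prints 57.12 GFLOP/s
```

Under those assumptions the peak rate is 1.5 × 10 × 8 = 120 GFLOP/s, and the three factors 0.80 × 0.70 × 0.85 reduce it to about 57.12 GFLOP/s. The 20% load/store figure does not enter the FLOP count directly; it only matters if a later part of the exercise asks about memory bandwidth.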
