I hope this isn't too over-the-top for this forum. I have some comments on the following example from a book I have on parallel computing:
Consider a computation running on a machine with a 1 GHz clock, 4-word cache line, single cycle access to the cache, and 100 ns latency to DRAM. The computation has a cache hit ratio of 25% at 1 KB and 90% at 32 KB.
Consider two cases: first, a single threaded execution in which the entire cache is available to the serial context, and second, a multithreaded execution with 32 threads where each thread has a cache residency of 1 KB. If the computation makes one data request in every cycle of 1 ns, in the first case the bandwidth requirement to DRAM is one word every 10 ns since the other words come from the cache (90% cache hit ratio). This corresponds to a bandwidth of 400 MB/s. In the second case, the bandwidth requirement to DRAM increases to three words every four cycles of each thread (25% cache hit ratio). Assuming that all threads exhibit similar cache behavior, this corresponds to 0.75 words/ns, or 3 GB/s.
My basic comment is on the initial information about the computer in the example. The authors include details that the calculation never uses, such as the cache line size and the memory latency, but forget to say how many bytes are in a word! 0.75 words/ns = 0.75 gigawords/s, or 750 megawords/s. If 750 megawords/s is equivalent to 3 GB/s, then each word must be 4 bytes. The first case agrees: one word every 10 ns is 100 megawords/s, which is 400 MB/s only if a word is 4 bytes. Judging by this example, do you think this book is weird? I think it is. I will post another weird example from this book after I get some feedback here.
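For anyone who wants to check the arithmetic, here is a quick Python sketch. It is only my reading of the example, not anything from the book: the function names are made up, and the 4-byte word is an assumption inferred from the book's answers, since the book never states it. I am also taking 1 MB as 10^6 bytes.

```python
# Assumed word size in bytes. The book never states it; 4 is inferred
# from the 400 MB/s and 3 GB/s figures it reports.
WORD_BYTES = 4


def dram_words_per_ns(hit_percent, requests_per_ns=1.0):
    """Words per ns that miss the cache and must be fetched from DRAM."""
    return requests_per_ns * (100 - hit_percent) / 100


def dram_bandwidth_mb_per_s(hit_percent, word_bytes=WORD_BYTES):
    """DRAM bandwidth in MB/s, taking 1 MB = 10**6 bytes."""
    words_per_s = dram_words_per_ns(hit_percent) * 1e9
    return words_per_s * word_bytes / 1e6


# Case 1: single thread, 32 KB of cache, 90% hit ratio.
single = dram_bandwidth_mb_per_s(90)
# Case 2: 32 threads with 1 KB of cache each, 25% hit ratio.
multi = dram_bandwidth_mb_per_s(25)

# Word size implied by the book's 3 GB/s answer for case 2.
implied_word_bytes = 3e9 / (dram_words_per_ns(25) * 1e9)

print(f"single-threaded: {single:.0f} MB/s")
print(f"multithreaded:   {multi:.0f} MB/s")
print(f"implied word size: {implied_word_bytes:.0f} bytes")
```

With a 4-byte word this gives roughly 400 MB/s and 3000 MB/s, matching the book. Change WORD_BYTES to 8 and both figures double, which is why leaving the word size out of the problem statement matters.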