
Hi there, I am using Gaussian to run a calculation that calls the BLAS subroutine DGEMM for matrix multiplication, and I am sure this DGEMM is parallelized. When I ran a test job on a node with 4 processors, the speedup was very good, about 3.6x compared to the serial run. I then assumed that switching to 8 processors (the node's limit is 8) would be faster still. However, the output confused me: with 8 processors the job runs at about the same speed as the serial run. I am really confused. Does anyone know what could cause this and how to solve it? Thank you so much.
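One way to narrow this down is to time a bare DGEMM call at different thread counts, outside of Gaussian. Below is a minimal C sketch, assuming a CBLAS header and a threaded BLAS (such as OpenBLAS or MKL) whose thread count is controlled by the OMP_NUM_THREADS environment variable; the file name, matrix size, and build command are only illustrative, not anything Gaussian itself uses.

```c
/* dgemm_scale.c - hypothetical standalone DGEMM timing sketch.
 * Build example (OpenBLAS): gcc dgemm_scale.c -o dgemm_scale -lopenblas
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <cblas.h>

int main(void)
{
    const int n = 2000;                      /* n x n square matrices */
    double *a = malloc(sizeof(double) * n * n);
    double *b = malloc(sizeof(double) * n * n);
    double *c = malloc(sizeof(double) * n * n);
    if (!a || !b || !c) { fprintf(stderr, "allocation failed\n"); return 1; }

    /* Fill A and B with arbitrary values; C starts at zero. */
    for (long i = 0; i < (long)n * n; ++i) {
        a[i] = (double)rand() / RAND_MAX;
        b[i] = (double)rand() / RAND_MAX;
        c[i] = 0.0;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    /* C := 1.0 * A * B + 0.0 * C */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0, a, n, b, n, 0.0, c, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs   = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    double gflops = 2.0 * n * n * (double)n / secs / 1e9;
    printf("n = %d: %.3f s  (%.2f GFLOP/s)\n", n, secs, gflops);

    free(a); free(b); free(c);
    return 0;
}
```

Running it as `OMP_NUM_THREADS=1 ./dgemm_scale`, then with 4 and with 8, would show whether the BLAS library itself stops scaling past 4 threads on that node, or whether the slowdown only appears inside the Gaussian job.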
