loveislonely
Posted August 7, 2008

Hi there,

I am using Gaussian for a calculation that calls the DGEMM subroutine to perform matrix multiplication, and I am sure DGEMM has been parallelized. When I ran a test job on a node with 4 processors, the speedup was very good: about 3.6 times faster than the serial run. I expected that going to 8 processors (the node's limit) would make it faster still. However, the result confused me: with 8 processors, the job runs at about the same speed as the serial run. I am really confused. Does anyone know how to solve this? Thank you so much.
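To put numbers on how unexpected the 8-processor result is, here is a rough sketch that runs the figures from the post through Amdahl's law. Amdahl's law is an idealized model assumed here for illustration, not something Gaussian or DGEMM reports, and the function names are hypothetical. The only inputs taken from the post are the 3.6x speedup on 4 processors and the roughly 1x speedup on 8.

```python
# Rough sanity check with Amdahl's law (an assumed, idealized model).
# Function names are illustrative only.

def efficiency(speedup: float, nproc: int) -> float:
    """Parallel efficiency: fraction of ideal linear speedup achieved."""
    return speedup / nproc


def parallel_fraction(speedup: float, nproc: int) -> float:
    """Invert Amdahl's law S = 1 / ((1 - p) + p / n) to solve for p."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / nproc)


def predicted_speedup(p: float, nproc: int) -> float:
    """Amdahl's law: speedup expected for parallel fraction p on nproc processors."""
    return 1.0 / ((1.0 - p) + p / nproc)


if __name__ == "__main__":
    measured_4 = 3.6  # speedup on 4 processors, from the post
    measured_8 = 1.0  # "about the same as serial" on 8 processors, from the post

    p = parallel_fraction(measured_4, 4)
    print(f"efficiency on 4 processors: {efficiency(measured_4, 4):.2f}")     # 0.90
    print(f"implied parallel fraction:  {p:.3f}")                             # 0.963
    print(f"Amdahl estimate for 8:      {predicted_speedup(p, 8):.2f}x")      # 6.35x
    print(f"measured on 8:              {measured_8:.2f}x")
```

Under this model the 4-processor run implies an estimate of roughly 6x on 8 processors, so a drop back to serial speed is not ordinary scaling loss; it points to something about how the 8-processor job is set up or run on the node rather than to DGEMM itself.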