Jump to content

Recommended Posts

Posted

Concurrency is perhaps the foremost issue today's programmers must learn to cope with. CPU designers have exhausted the level of optimization they can provide to programmers working with a single thread of execution, and more modern designs take transistors which would've ordinarily gone to improve sequential performance and have instead thrown it at more cores. Intel demoed a CPU with 80 cores, yet one third of the transistors seen in a Core 2 Duo. This is the future of processors: more cores that do less. CPU designers are going to devote fewer transistors to each individual core while trying to pack more and more cores onto a single chip. Some crazy people are predicting that the number of CPU cores we see on a single chip is going to grow exponentially. I don't know about that, but it's a reasonable enough guess as any at this point, and it seems like it's where the trends are going.

 

This is posing a big problem for programmers. Now that all of these cores are available, how do you write programs that can make use of them all? The typical way most programmers are used to dealing with this problem is with threads. Threads let you spread your program's execution across multiple CPU cores, which is great! But threads are error-prone. The synchronization is as hard to get right as manual memory management. Pay attention and use good strategies and it's manageable. Put it in the hands of mediocre programmers and things can quickly fall apart. And once they do it can be incredibly hard to debug.

 

But it gets worse. Let's assume you've managed to use threads correctly in your program. Great! How well can it scale across an increasing number of CPU cores? The synchronization model employed by most programs uses a locking mechanism to synchronize state. To get a threaded program to scale well, locks must be scrutinized, benchmarked, and made increasingly granular in order for threaded programs to scale well. While this is all well and doable, granular locking is complicated, and the reality is most programs trying to pull it off aren't going to do it right, and for that reason many programmers may avoid it entirely.

 

So how well will programmers be able to leverage these new multicore CPUs? That question remains up in the air. Threaded approaches are difficult at best, and often hard to scale across an increasing number of CPU cores, especially when complex data structures are involved. For that reason some perceive that programmers are in a sort of crisis in regard to how they will leverage an increasing number of CPU cores.

 

Some solutions to this problem have emerged. One that gets a lot of attention is software transactional memory. This is a model which looks at memory in a manner similar to a database, using transactions to control concurrent access to shared state. Unfortunately this model has failed to achieve widespread popularity. Theories abound as to what exactly the problem is, generally surrounding its difficulty to apply to real-world problems. So far, this approach to concurrency remains in the collective unconscious as a theoretical one which is rarely applied.

 

Another approach which has received a considerable amount of hype in certain communities is "Erlang style-concurrency." Erlang is a programming language originally developed for telephony applications which uses lots of small processes that each do one thing and do it well, and talk to each other over message queues. Some liken this to the Unix philosophy: "Write programs that do one thing and do it well. Write programs to work together." However the actual approach of using processes which communicate with messages is known as the Actor model which grew out of Smalltalk.

 

Erlang processes have lots of nice properties which make concurrency easy. Like OS processes they run simultaneously and the Erlang runtime can distribute them across all available CPU cores with no additional work on the programmer's part. They are pre-emptable: if a process is doing something computationally intensive then after a certain number of "reductions" (e.g. function calls) the Erlang scheduler will pre-empt it and let another process to run. Erlang processes don't share state and can be garbage collected independent of one another. This means there's no "stop the world" condition with the garbage collector. But also, by not sharing state synchronization gets much easier. Now you don't have to synchronize access to shared state, instead you need to synchronize the way your program behaves, which is a far easier problem.

 

What I would like to see is an object model which works like the Erlang process and messaging model. This approach would mark a rapid departure from most other object oriented languages. Most object oriented languages have a concurrency primitive like threads which are completely divorced from objects. Threads give you no guarantees about how state is managed or synchronized, and threads have no knowledge of method invocation. All objects are concurrent and synchronize around the method dispatch protocol. You can think of each object as being its own thread, except each object's state is independent of all of the others, so there's no need for semaphores or mutexes or critical sections. All synchronization is handled through message dispatch itself, and thanks to the shared-nothing process architecture you only need to worry about synchronizing behavior, not state.

Posted

Interesting post, a good summary of the current situation and your ideas about Erlang sound intriguing.

 

You know, if you whipped together some sort of project to test some concurrency ideas, I'm sure we could find some volunteers here to run your program on a set of test data and report back their calculation times and hardware configurations for comparison. You could have the program run a set of single-core and multi-core benchmarks.

Posted
You know, if you whipped together some sort of project to test some concurrency ideas, I'm sure we could find some volunteers here to run your program on a set of test data and report back their calculation times and hardware configurations for comparison. You could have the program run a set of single-core and multi-core benchmarks.

 

I have a project in the works, but it's nowhere near ready for performance benchmarks

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.