PhDP Posted July 18, 2008 I'm looking for a good introduction to C/C++, either a book, an ebook, a website, anything. I already know Java and Python, so I'm not completely new to programming.
bascule Posted July 18, 2008 First you should really decide whether you want to learn C or C++. They're pretty different languages, and trying to use C features in C++ programs (namely pointers) is quite likely to make them unstable and buggy. But before going down the C++ route, I strongly recommend reading Linus Torvalds' rant on the matter: http://thread.gmane.org/gmane.comp.version-control.git/57643/focus=57918 All that said, I would suggest either using C (for a smaller program that needs to interface with other C/C++ libraries) or using something more modern than C++ such as Java or C#. But back to your original question! I'd recommend the book co-written by C's creator, The C Programming Language. The style of C the book teaches (as in its visual formatting style, known as K&R after the authors) is considered passé nowadays, but everything else it teaches is still as applicable as it was 30 years ago.
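To give you an idea of what K&R formatting looks like, here's a small illustrative program (my own sketch, not taken from the book): function braces go on their own line, control-structure braces stay on the same line, and the code is kept terse.

#include <stdio.h>

/* K&R formatting: the function's opening brace gets its own line,
   while the braces of for/if/while stay on the same line. */
int main(void)
{
    int i;

    for (i = 1; i <= 5; i++) {
        if (i % 2 == 0)
            printf("%d is even\n", i);
        else
            printf("%d is odd\n", i);
    }
    return 0;
}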
PhDP (Author) Posted July 18, 2008 I read Linus' post, but I don't know enough about computer science to say whether what he says is true, and anyway, it doesn't seem relevant to me. I'm only interested in scientific computing (simulations, solving math problems), and I'm especially concerned with relatively simple programs built around for/while loops. I want to use Python as my primary language, but I would also like to know how to handle a faster language. I know C and C++ are different, but they share a very similar syntax, and for what I want to do they are basically the same. C/C++ seems to be the way to go (although Fortran is a serious alternative): it's widely used, fast, and it can be used with my beloved Python (go Cython!). AFAIK, neither Java nor C# is widely used for scientific computing.
Pangloss Posted July 18, 2008 I'm not sure I understand why managed-code languages aren't used for at least general/casual scientific programming, but it's my understanding that that's typically the case as well. Perhaps it's just the fact that there's not all that much in the managed code libraries to make up for the stigma of using managed code.
bascule Posted July 18, 2008 I worked on two different projects when I was employed in scientific computing. The main project (which was actually a set of several different projects we were trying to hook together) was primarily Fortran 90. The other project was written in C++. Both projects were fairly nightmarish from a maintenance standpoint. With the Fortran project I found the main intractable problems stemmed from the compilers. They had odd internal limits on array sizes and would often produce invalid code or crash (at compile time) if these limits were exceeded. We eventually moved to better compilers, which was a rather painful process as it required much revamping of the build toolchain. C++ was similarly nightmarish due to its weirdness. I remember spending an entire week helping one of our programmers work his way through a template bug in his code. Tracing the problem was extraordinarily difficult as the error was occurring in the STL after going through layer upon layer of the modeling framework. I'm incredibly surprised Java hasn't become more popular for scientific computing. While I typically stick my nose up at Java, it does represent a massive improvement over C, C++, and Fortran in terms of maintainability and ease of use, while sacrificing relatively little speed. Java typically runs at about half the speed of equivalent C code. That may seem like a big deal until you realize that how fast the model runs is irrelevant until you can actually run it. I found with our projects that development delays cost substantially more time than the models' execution speed ever did. Perhaps the main difference is that these were atmospheric models and were constantly being added to. All that said, I think it's really terrible how a language as bad as Fortran has a virtual stranglehold on the scientific community, with C++ a close runner-up. One of the things I learned rather quickly was that scientists (at least the ones I worked with) are not programmers and only learned programming to accomplish the science they were interested in. This meant that mundane issues like stack overflows, compiler limits/bugs, and linking errors were often intractable problems for them which got in the way of the science. These are the sorts of problems I ended up dealing with on a regular basis, and the ones that made me long for a higher-level language. If I were starting a brand new scientific computing project today, with scientists who had never programmed before, I would absolutely pick a functional language. Functional languages like Haskell and OCaml are very fast, certainly on par with Java. But the real win, in my mind, is that these languages do a much better job of matching the mental model of scientists than imperative languages do. Functions written in these languages much more closely match the mathematical descriptions of the same systems on paper, so there's less mental translation a scientist must do to express in a computer what they can already do on paper. Furthermore, the issues of distribution and concurrency (typically solved with pathetically low-level packages like MPI in Fortran/C programs; see the sketch at the end of this post) are handled at a substantially higher level in these languages. MPI was an enormous source of issues at my job... we tried many different MPI libraries, such as mpich and lam-mpi, and all of them had nasty issues, not to mention that if any one node in our cluster crashed it took down the entire model.
This is a terrible problem for any distributed computing system, because the harsh reality is that computers break all the time, and the more computers you have in your cluster, the greater the chance that one of them fails. Distributed programs like scientific models should really be fault-tolerant; otherwise you'll find yourself wasting enormous amounts of time rerunning your model after each and every hardware failure.
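To give a sense of just how low-level MPI programming is, here's a rough sketch in C of the kind of boilerplate I mean (the boundary exchange between neighbouring nodes is invented for illustration; real models have dozens of these):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, i;
    double boundary[128];   /* data to ship to the neighbouring node */
    double incoming[128];   /* data arriving from the other neighbour */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (i = 0; i < 128; i++)
        boundary[i] = rank;  /* stand-in for real model data */

    /* Every send must be matched by a receive with the right count, type
       and tag; get any of them wrong and the model deadlocks or corrupts
       data, and if one node dies mid-exchange the rest block forever. */
    MPI_Sendrecv(boundary, 128, MPI_DOUBLE, (rank + 1) % size, 0,
                 incoming, 128, MPI_DOUBLE, (rank + size - 1) % size, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}

All of this is bookkeeping that has nothing to do with the science of the model, and there's no fault tolerance anywhere in it.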
PhDP (Author) Posted July 19, 2008 ...but scientific computing is mostly about two things: clarity and speed. If you're looking for clarity, you will likely end up with MATLAB/Octave, Python, or some similar very high-level language. If you're looking for speed, you're going to use either C/C++ or Fortran. For most of its history, Java was much slower than C/C++/Fortran, and considering that it's not much clearer, it's easy to understand why it was never really popular with the scientific community. Now that it's getting faster, it might become more popular, but personally I prefer to stick with Python and use Pyrex/Cython when I need more speed.
bascule Posted July 19, 2008 "If you're looking for clarity, you will likely end up with MATLAB/Octave, Python, or some similar very high-level language. If you're looking for speed, you're going to use either C/C++ or Fortran." This isn't an either/or situation. CPython is incredibly slow... roughly 20x slower than C. Even Psyco only gets you down to about 6x slower than C, and it's not exactly ready for prime time. This is just the nature of dynamic languages... they're incredibly hard to optimize for performance. Functional languages like Haskell and OCaml are arguably higher level than Python, and also faster... both sit in the roughly 2x-slower-than-C range, more or less on par with Java. Haskell and OCaml both employ static typing with type inference, a happy medium between the ease of use of dynamic languages and the performance of static ones. So you can have your cake and eat it too...
PhDP (Author) Posted July 20, 2008 (edited) It depends on how you look at it. Python IS slow compared to C, but it's about as fast as one of the most popular "languages": MATLAB. Also, with NumPy/SciPy, Python is very similar to MATLAB, which is a very important advantage. And with Cython (or Pyrex), which is basically a way to use C within Python, you can get about as fast as C. I don't want to start one of those "my language is stronger than yours" debates, but for scientists like me, Haskell is not very attractive. The language is very different from everything I've seen, it's not as flexible as Python (which supports functional programming, btw) and, most importantly, it's not very popular, so I would not be able to send the source code to my colleagues, and I would certainly not find wonderful libraries like this one. Edited July 20, 2008 by PhDP
bascule Posted July 20, 2008 "Also, with Cython (or Pyrex), which is basically a way to use C within Python, you can get about as fast as C." Manually hotspotting in C is certainly a technique I use, but I must warn you that it's a particularly error-prone one. Interfacing between a language like Python and C is fraught with the potential for errors, even more so than a pure C program, as the toolchain available for the latter is substantially more comprehensive. Tracking down bugs in C extensions for dynamic languages is just asking for trouble, so be sure you're prepared before going down this road (I've sketched what the glue looks like at the end of this post). The main argument against manual hotspotting, of course, is that languages with better compilers/runtimes can do this automatically without forcing the programmer to drop down to the C level and introduce hard-to-debug errors into their programs. In a CPU-bound application, given the choice between automatic and manual hotspotting, it really seems silly to me to try to do it by hand. That will eat up considerable development cycles which are better spent elsewhere, such as getting the model correct to begin with. "Haskell is not very attractive. The language is very different from everything I've seen." I'm curious whether you've looked at R. R is a language which has gained considerable popularity in the scientific computing community, and it bears a number of uncanny similarities to Haskell (particularly lazy evaluation). Still, this is certainly a fairly common complaint against Haskell. In that regard, I'd recommend OCaml. While it's generally slightly slower than Haskell, it incorporates a number of imperative idioms which make it easier for programmers who are familiar with imperative languages to make the transition, and in many cases OCaml will actually come out on top in regard to performance. "it's not as flexible as Python (which supports functional programming, btw)" If you're looking for a flexible language, Python would be one of my last choices, although I agree with your point: Haskell is an even more inflexible language than Python. Python is a language which has been designed with a single idiomatic style in mind, and in that regard flexibility has been completely tossed out the window. In areas where there are multiple approaches to solving the same problem, Guido, the language's creator, has been fairly vocal about doing away with redundant approaches and picking a single idiomatic style. In that regard, the language is extremely inflexible. While Python originally began with quite a bit of functional ornamentation, the push by Guido lately has been to eliminate it. Furthermore, in scientific computing, using Python's functional style may gain you programs which more closely resemble the underlying mathematics of the science they're expressing, but at a cost in performance. Your best bet for high-performance Python, short of reimplementing your hotspots in C, is to write your code in the most imperative manner possible. That said, by using a language like OCaml (or Haskell) you can retain the clarity of a functional approach while actually gaining performance. Again, the performance of OCaml or Haskell is literally an order of magnitude above strictly imperative, performance-oriented Python (2x slower than C vs 20x slower than C), and functional Python is generally going to perform much worse than 20x slower than C.
"most importantly, it's not very popular, so I would not be able to send the source code to my colleagues, and I would certainly not find wonderful libraries like this one." This is certainly a valid criticism, and perhaps the main reason why it's probably not pragmatic to push functional languages within a scientific computing environment. It would require scientists to learn a new set of idioms, idioms which, had they been exposed to them from the start, would probably have saved everyone a lot of time and grief. Sadly that's not the case, and moving to a functional approach would cause more harm than good, at least in the short term. The financial sector is the only place where I have seen this transition happen. Languages like OCaml (for analysis) and Erlang (for messaging) have seen considerable use on Wall Street (Jane Street being one of the biggest success stories). I think, perhaps, the financial sector is better at doing a cost-benefit analysis of switching to a functional environment and has determined that the long-term benefits outweigh the short-term costs of transitioning. All that said, don't get me wrong: modern dynamic scripting languages like Python are by far my favorite language family. However, in my present job I don't typically deal with the kind of primarily CPU-bound problems I did when I was in scientific computing. Our problems are largely I/O- or database-bound, and in that regard a dynamic scripting language is wonderful. I also have great hope that better compiler algorithms can speed up dynamic language execution substantially. Static type inferencing could bring languages like Python to the same levels of performance as languages like Haskell, OCaml, or Java. Check out Starkiller, a static type inferencing engine for Python: http://web.mit.edu/msalib/www/urop/
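To show what I mean about the Python/C interface being error-prone, here's roughly what the glue for a hand-written CPython extension looks like (Python 2 C API; the module name and the toy function are made up for illustration):

#include <Python.h>

/* Hypothetical hotspot moved to C: one step of logistic growth, r*n*(1 - n/k). */
static PyObject *
logistic_step(PyObject *self, PyObject *args)
{
    double r, n, k;

    /* If this format string doesn't match the C variables exactly,
       you get silent memory corruption, not a Python exception. */
    if (!PyArg_ParseTuple(args, "ddd", &r, &n, &k))
        return NULL;   /* and you must remember to propagate the error */

    return Py_BuildValue("d", r * n * (1.0 - n / k));
}

static PyMethodDef methods[] = {
    {"logistic_step", logistic_step, METH_VARARGS, "One logistic-growth step."},
    {NULL, NULL, 0, NULL}
};

/* The init symbol has to be named init<module>; get it wrong and the
   import simply fails at runtime. */
PyMODINIT_FUNC
initgrowth(void)
{
    Py_InitModule("growth", methods);
}

None of this glue is hard, exactly, but every line of it is a chance to crash the interpreter rather than get a nice traceback, which is why I'd rather let a compiler do the specialization for me.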
PhDP (Author) Posted July 22, 2008 (edited) First, thank you very much for all the info you've given me, I appreciate it. I must admit I had never seriously considered functional languages such as Haskell, but you've convinced me to take a serious look at them (later, though; for now C/C++ is at the top of my list). "Manually hotspotting in C is certainly a technique I use, but I must warn you that it's a particularly error-prone one. Interfacing between a language like Python and C is fraught with the potential for errors, even more so than a pure C program, as the toolchain available for the latter is substantially more comprehensive." Honestly, I'm not even sure I will use C within Python; I mostly want to learn C/C++ because of the speed and the wonderful libraries available for mathematics. Hotspotting is certainly something I'm considering, but I don't know much about it. Also, the structure of the programs I have to write is generally simple, so it would probably be easier to write the entire program in C/C++. Something is certain: most of the simulations I have to write involve a few lines where the program spends 95%+ of its time. For example:

for i in range(0, y):
    f, t = a0, 0
    while (f > 0) & (f < N):
        a = 0
        p = ((1 + s) * f) / ((1 + s) * f + N - f)
        # the loop below was highlighted in red: this is the hotspot
        for k in range(0, N):
            if r() < p:
                a += 1
        f = a
        t += 1
    if f == N:
        fix += 1
        F = hstack((F, t))
    elif f == 0:
        ext += 1
        E = hstack((E, t))

(hstack comes from NumPy, and r() is just a uniform random-number draw.) It's a small Python program to estimate the % and time of fixation/extinction of an allele with drift + selection, using the Wright-Fisher model. Very often, both N and i are much greater than 1000 (and the while loop can easily be repeated more than 1000 times per simulation). I would be curious to see how fast this code would be if everything remained in Python BUT the few lines in red. "I'm curious whether you've looked at R." I've heard about it, but I've never tried it. "If you're looking for a flexible language, Python would be one of my last choices, although I agree with your point: Haskell is an even more inflexible language than Python." Flexible... at least from the perspective of someone 'raised' on Maple and MATLAB. Edited July 22, 2008 by PhDP
bascule Posted July 22, 2008 "I would be curious to see how fast this code would be if everything remained in Python BUT the few lines in red." One thing's certain... it wouldn't be building lists every iteration solely for the purpose of iterating over them (only to have them garbage collected afterwards). I've wondered why Python does it that way, or whether there's some magic when using ranges with for loops. When I talk about hotspotting I mean finding hotspots in your code, such as the one you pointed out in red. These are generally tight loops that, if compiled to C code, can live completely in your CPU's cache. There are lots of tools for speeding up hotspots, such as Pyrex, as you've mentioned. I'm not sure how complex your model is, but in ours we had tens of thousands of these sorts of loops scattered across dozens of modules. If you can avoid using pointers, C++ may not be a bad choice, depending on the size of your model.
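For what it's worth, here's a rough sketch of what that red inner loop looks like as a plain C function (the names are placeholders, and I'm using the C library's rand() just to keep the sketch self-contained; you'd want a better RNG):

#include <stdlib.h>

/* Uniform draw in [0, 1) -- stands in for the r() in your Python code. */
static double uniform(void)
{
    return rand() / (RAND_MAX + 1.0);
}

/* One Wright-Fisher generation: count how many of the N offspring carry
   the selected allele, given sampling probability p.  This is the tight
   loop that can sit entirely in the CPU cache once compiled. */
int next_generation(int N, double p)
{
    int k, a = 0;

    for (k = 0; k < N; k++)
        if (uniform() < p)
            a++;
    return a;
}

Whether you call this from Python through Pyrex (or a hand-written extension) or move the surrounding loops into C as well, this is the piece worth timing first, since it's where the 95% of the runtime you mentioned lives.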
Severian Posted August 12, 2008 I was surprised to read Linus Torvalds's rant against C++, mainly because I completely agree with him but have been a bit shy to say so simply because of the weight of all the specialists who love to use C++. I must say, I really hate classes with a vengeance because they are the antithesis of the way I program. I need my codes to be very modular, not so other people can use them, but purely for my own sanity. Whenever I made a class in C++, I would find that I wrote the class in a way which helped solve the current problem I was working on. But then when I wanted to use the class for another problem I would find that I had to alter the class considerably since I was looking at the objects from a different angle. In fundamental physics there is no way around this since you often don't appreciate the other angle until you are in the research. As a result, I have abandoned classes, and tend to program entirely in C instead of C++. The thing I hate most about C is the way it handles arrays as pointers. I actually really like Python - it seems to force structures that are very much aligned with the way scientists think. But I tend not to use it, mainly for compatibility issues.
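To show what I mean about arrays and pointers, here's a minimal example (not from any of my codes, just an illustration): the moment an array is passed to a function it decays to a pointer, so the function has no idea how long it is, and sizeof quietly changes meaning.

#include <stdio.h>

/* The "[10]" here is ignored by the compiler: inside the function,
   x is just a double*, so sizeof x is the size of a pointer. */
void print_len(double x[10])
{
    printf("inside:  sizeof x = %zu\n", sizeof x);
}

int main(void)
{
    double x[10];

    printf("outside: sizeof x = %zu\n", sizeof x);   /* 10 * sizeof(double) */
    print_len(x);
    return 0;
}

So the array length has to be passed around separately by hand, and nothing in the language stops you from walking off the end.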
bascule Posted August 12, 2008 "I must say, I really hate classes with a vengeance because they are the antithesis of the way I program. I need my codes to be very modular, not so other people can use them, but purely for my own sanity. Whenever I made a class in C++, I would find that I wrote the class in a way which helped solve the current problem I was working on. But then when I wanted to use the class for another problem I would find that I had to alter the class considerably since I was looking at the objects from a different angle." This is a fairly frequent experience in OOP, and among other things it has led to "design patterns" for building object-oriented systems. While there are some design patterns I find helpful, many seem to work around the absence of language features, and those I find rather annoying. I also find myself refactoring quite often, especially when building systems from the ground up without using frameworks. This involves changing the test cases for my APIs to use the new interface I have in mind, then changing the existing interface so the test cases pass again. It's just one of those things you have to do when building object-oriented systems. That said, you might find functional languages nice: they probably do a much better job of letting you break down problems in the ways you're familiar with. "I actually really like Python - it seems to force structures that are very much aligned with the way scientists think. But I tend not to use it, mainly for compatibility issues." I've noticed Python has grown in popularity in the scientific computing community, and that's rather nice to see.