Recommended Posts

Posted

Eventually, I want to be able to do some computationally-heavy modeling work, which obviously requires programming knowledge.

 

Is there a specific language that would be good for this type of interest? Where can a newb start learning to build the skills to develop stochastic-type models?

 

Thanks!

Posted

If it requires significant speed to be useful, C is the obvious choice. C is also, in my experience, rather painful to use.

 

If you want something relatively simple that can do the modeling (just not as quickly), try Python. It's fairly simple to learn and you can probably find pre-existing Python code that'll help you do what you want. O'Reilly's Learning Python is a good place to start.

Posted

I'd recommend R. We have several very complicated modelling codes written in it... It's also quite an easy place to start.

Posted

I should explain that I don't have any modeling project in mind, per se. I just want to learn something that would be adaptable to future use. I've heard of C and Python, and got a recommendation of Java from a friend of mine...

What's R, klaynos?

Posted

I still like Java as a language to start with when learning. It covers a lot of the core concepts, and its syntax is similar to that of C++, on which a number of languages are based.

It's simplified in terms of memory management, but gives you a decent foundation in OOP, multithreaded applications, and primitive datatypes. The APIs seem somewhat "over-engineered", but the documentation is all quite excellent. It's a fairly strict language (it enforces one way of doing things), but that can help when you are learning a new language, because you don't have 10 different flavors of "shortcuts" to absorb while reading example code and trying to master the basics.

Posted (edited)
I should explain that I don't have any modeling project in mind, per se.

 

Learn Python. It provides a much more powerful model for describing most types of problems than C does. C requires you to restate your problem in terms the CPU can understand, and "what the CPU can do" is typically an overcomplicated and error-prone model for describing most problems.

 

If you feel Python is too slow, there are lots of alternatives to C that still afford you high-level, declarative descriptions of problems. OCaml is a fast, compiled, mixed-paradigm language which is used for all sorts of performance-critical modeling.

 

I would recommend learning C after learning a higher level language. I think too many people start out with a language like C and end up with a "C shaped brain" that tries to model every program directly around the Von Neumann architecture. Learning a higher level language (particularly a functional one) will essentially require you to relearn everything you know. I started out with a very heavy C background and it took me many years to move on. Nowadays I never write "for" loops (although they have their place in Python) and can't believe I wasted so much time writing them over and over again in C. There are better ways!

Edited by bascule
Posted
Learn Python. It provides a much more powerful model for describing most types of problems than C does. C requires you to restate your problem in terms the CPU can understand, and "what the CPU can do" is typically an overcomplicated and error-prone model for describing most problems.

 

I'm curious about Python, but I've been skeptical so far - what does it do that is difficult in other languages? From what I can tell, it appears to have a lot of keywords and built-in functions to handle various complex data types and has shorthand syntax to perform tasks you would use functions for otherwise - but nothing you couldn't do with a library if so inclined.

So far it sort of reminds me of Perl, which is great for writing short, cryptic code but for the most part is just an idiosyncratic way of doing with syntax what function patterns would do generally. Personally, I have been drawn to languages that have the fewest possible keywords and syntactic rules and put most of the functionality in well-designed libraries written in the language.

I will say syntax does make a difference (passing functions as parameters, for instance, can't be overcome with a library), but how does Python make a difference?

 

I am genuinely curious because I've heard it recommended many times. On the surface it looks like an obscure syntax structure with a lot of top-level datatypes that would be suited to an STL implementation.

Posted
I'm curious about Python, but I've been skeptical so far - what does it do that is difficult in other languages?

 

Python has a real, full-featured object model, which includes metaclasses which not only provide a nice API for singletons but are also excellent for reflection. You can define methods in the metaclass which can be invoked in the body of subclasses, allowing you to generate code on-the-fly declaratively.

 

What do you use that for? The same sort of thing you might use the factory pattern for in languages like Java. Rather than creating a FooFactory you can just call a method of the Foo metaclass which builds an appropriate object. Compare this Python pseudocode:

 

obj = Foo.create_from_bar(bar)

 

as opposed to some Java pseudocode:

 

FooFromBarFactory factory = new FooFromBarFactory(bar);

Foo obj = factory.newInstance();

 

The code is both clearer and more concise.
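
To make that concrete, here is a minimal runnable sketch of the metaclass approach (modern Python 3 syntax, which postdates this thread; the names FooMeta, bar, and so on are illustrative, not from any real library):

class FooMeta(type):
    # methods defined on the metaclass become class-level factories
    # on Foo and on every subclass of Foo
    def create_from_bar(cls, bar):
        obj = cls()          # cls is Foo or whichever subclass was used
        obj.bar = bar
        return obj

class Foo(metaclass=FooMeta):
    pass

class FancyFoo(Foo):         # subclasses inherit the factory for free
    pass

obj = Foo.create_from_bar("some bar")
fancy = FancyFoo.create_from_bar("another bar")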

 

From what I can tell, it appears to have a lot of keywords and built-in functions to handle various complex data types and has shorthand syntax to perform tasks you would use functions for otherwise - but nothing you couldn't do with a library if so inclined.

 

Again, having first-class syntax for types like maps/dicts makes code dramatically clearer.
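
A quick illustration with made-up data:

ages = {"alice": 34, "bob": 27}   # a dict literal: first-class syntax
ages["carol"] = 41                # versus constructing a HashMap and calling put() in Java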

 

So far it sort of reminds me of Perl, which is great for writing short, cryptic code but for the most part is just an idiosyncratic way of doing with syntax what function patterns would do generally.

 

Shorter code should improve clarity, not diminish it. Perl was a particularly egregious case of overly terse and confusing syntax. However, in a discussion about Python it's a red herring.

 

Using more code typically makes programs harder to read and more difficult to maintain. Java programs are typically riddled with tons of boilerplate code because the language lacks the features to abstract it away.

 

Personally, I have been drawn to languages that have the fewest possible keywords and syntactic rules and put most of the functionality in well-designed libraries written in the language.

 

Sounds like you would love Lisp. Personally I prefer languages with rich, expressive grammars. I feel they make code both easier to write and read.

 

I am genuinely curious because I've heard it recommended many times. On the surface it looks like an obscure syntax structure with a lot of top-level datatypes that would be suited to an STL implementation.

 

Aieeeeeeeee! The STL is TERRIBLE! C++ is the only popular language which has to embed a Turing-complete functional templating language, because without it the amount of boilerplate code you would have to write would boggle the mind. Templates make the language terribly confusing, and belie an inability to perform real metaprogramming like you can in a language like Python (or Ruby, Lisp, Smalltalk, Haskell, and many others).

 

"Throw more code at the problem" is a terrible attitude. Every line of code I write for work is a line I have to maintain. More lines of code means more work for myself. I would rather create as little work for myself as possible. This doesn't have to come at the price of clarity, on the contrary, fewer lines of code should improve clarity in most cases.

 

Your typical program in C++ or Java is going to be full of far more lines of code which don't express a solution to the problem but rather are just thunking around deficiencies of the language. This makes programs harder to read as you have to mentally skip over this code to read the actual solution to the problem.

 

Python gets rid of all the boilerplate. It also has a dynamic type system which further reduces boilerplate needed for type thunking. Dynamic type systems make interfaces less brittle and allow you to preserve backwards compatibility in cases which simply aren't possible with static type systems.
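
A small sketch of that last point (the names are made up): any object with a compatible method works, and nothing has to be redeclared when a new type shows up.

import io

def head(source, n=5):
    # duck typing: anything with a .read() method is acceptable -
    # a file, a socket wrapper, or an in-memory buffer
    return source.read()[:n]

print(head(io.StringIO("hello world")))   # prints "hello", no interface declared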

Posted
Python has a real, full-featured object model, which includes metaclasses which not only provide a nice API for singletons but are also excellent for reflection. You can define methods in the metaclass which can be invoked in the body of subclasses, allowing you to generate code on-the-fly declaratively.

 

Reflection is definitely a place where extra syntax can help.

What do you use that for? The same sort of thing you might use the factory pattern for in languages like Java. Rather than creating a FooFactory you can just call a method of the Foo metaclass which builds an appropriate object. Compare this Python pseudocode:

 

obj = Foo.create_from_bar(bar)

 

as opposed to some Java pseudocode:

 

FooFromBarFactory factory = new FooFromBarFactory(bar);

Foo obj = factory.newInstance();

 

The code is both clearer and more concise.

Fair enough. I suspect that I may be jaded on anything that allows for more "relaxed" definitions because of the amount of time I spend working with existing code bases - open source projects that choke any IDE's ability to deduce class types due to poor standards or strange techniques by the original programmers.

 

What do you recommend for an IDE when working with Python?

Shorter code should improve clarity, not diminish it. Perl was a particularly egregious case of overly terse and confusing syntax. However, in a discussion about Python it's a red herring.

Which is fair. I drew the Perl comparison because it reminded me of that on the surface, but I haven't delved into Python enough for a real investigation.

Using more code typically makes programs harder to read and more difficult to maintain. Java programs are typically riddled with tons of boilerplate code because the language lacks the features to abstract it away.

 

Sounds like you would love Lisp. Personally I prefer languages with rich, expressive grammars. I feel they make code both easier to write and read.

It depends on the project. I find Java is the easiest to understand when I have to jump into something that was coded by a dozen monkeys that used a copy of "Design Patterns" to line their habitat and started banging on keyboards.

Java is excessive on boilerplate and definitely makes you jump through a lot of hoops to do what is generally simple in a lot of languages.

Aieeeeeeeee! The STL is TERRIBLE! C++ is the only popular language which has to embed a Turing-complete functional templating language, because without it the amount of boilerplate code you would have to write would boggle the mind. Templates make the language terribly confusing, and belie an inability to perform real metaprogramming like you can in a language like Python (or Ruby, Lisp, Smalltalk, Haskell, and many others).

The template syntax is very complex - a lot more so than I like to deal with - but it is a pretty powerful feature that makes a lot of things possible. Do you find that templates themselves are terrible (the way they were designed), or the STL code base itself?

 

When it comes to grammar I like simpler ones, and I accept additions if they improve the power of the language, but I find the grammar often gets cluttered with what ends up being hardwired function calls.

"Throw more code at the problem" is a terrible attitude. Every line of code I write for work is a line I have to maintain. More lines of code means more work for myself. I would rather create as little work for myself as possible. This doesn't have to come at the price of clarity, on the contrary, fewer lines of code should improve clarity in most cases.

I'll have to play with it to learn more. I do want to try new languages that break from the classical mold (expand how I think in terms of programming) but I also don't want to just learn an idiosyncratic way of doing the same thing in something else.

 

Is Python a good language for that? If so, what "philosophy" should I approach it with? You've mentioned some languages in the past with novel approaches to issues such as concurrency; is Python primarily a novel approach to code reduction in terms of reducing the need for loops, strong typing, and other boilerplate code?

 

I just want to investigate it without "missing the point" by trying to learn it as "another C++/Java/etc." and missing the goals of the language.

Posted
It depends on the project. I find Java is the easiest to understand when I have to jump into something that was coded by a dozen monkeys that used a copy of "Design Patterns" to line their habitat and started banging on keyboards.

Java is excessive on boilerplate and definitely makes you jump through a lot of hoops to do what is generally simple in a lot of languages.

 

Lots of people turn to GoF patterns when trying to deal with problems that arise because their language lacks the features to address them directly. The Factory pattern is perhaps the most infamous in Java, and there are some comically bad usage examples to be found, like "Factory Factories":

 

http://ws.apache.org/xmlrpc/apidocs/org/apache/xmlrpc/server/RequestProcessorFactoryFactory.html?rel=html

 

It's not like the Factory pattern doesn't have its uses. I use it in Ruby, particularly when writing test cases (I use factories to build my fixture data). Java forces you to use it far more often than you should, both because it lacks metaclasses and because constructors can't be inherited or overridden.

 

The template syntax is very complex - a lot more so than I like to deal with - but it is a pretty powerful feature that makes a lot of things possible. Do you find that templates themselves are terrible (the way they were designed), or the STL code base itself?

 

Templates can be... ok. Boost tries to do them right. The STL is absolutely horrid.

 

For declarative code generation, which is what templates try to do to a certain extent, I think it's much nicer to use a language with first-class metaprogramming. Python certainly has this. Rather than having a wacky "template" language which is its own animal, you can use Python to generate Python.
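
For example, here is a minimal sketch of runtime class generation with the built-in type() (all names here are invented for illustration):

def make_record(name, fields):
    # build an __init__ closed over the field list...
    def __init__(self, **kwargs):
        for field in fields:
            setattr(self, field, kwargs.get(field))
    # ...then generate the class itself at runtime
    return type(name, (object,), {"__init__": __init__, "fields": fields})

Point = make_record("Point", ["x", "y"])
p = Point(x=1.0, y=2.0)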

 

I do want to try new languages that break from the classical mold (expand how I think in terms of programming) but I also don't want to just learn an idiosyncratic way of doing the same thing in something else.

 

Is Python a good language for that? If so, what "philosophy" should I approach it with?

 

Python is certainly a conceptual playground for a lot of features you haven't been exposed to. I'd suggest experimenting with a functional approach, trying to use features like list comprehensions and map/filter/reduce. Avoid using for loops when you don't have to, for there are better ways!
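
For instance, all of these avoid an explicit for loop (standard library only):

from functools import reduce

xs = range(1, 11)

squares = [x * x for x in xs]              # list comprehension
evens = [x for x in xs if x % 2 == 0]      # comprehension with a filter clause
total = reduce(lambda a, b: a + b, map(lambda x: x * x, xs))   # map + reduce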

 

You've mentioned some languages in the past with novel approaches to issues such as concurrency; is Python primarily a novel approach to code reduction in terms of reducing the need for loops, strong typing, and other boilerplate code?

 

Yes to the latter; however, Python has a sequential memory model. There is Stackless Python to address this, but that won't help you scale across multiple CPU cores, and Stackless Python is both slow and not a "first class" Python platform.

 

I just want to investigate it without "missing the point" by trying to learn it as "another C++/Java/etc." and missing the goals of the language.

 

If you really want to break out of the Java/C++ mold you should try to learn Lisp. You probably won't ever write anything useful in Lisp, but it will change your perspective on programming, and many of the ideas you pick up will translate into languages like Python (or Ruby, or any functional language).

 

I'd suggest downloading PLT Scheme:

 

http://download.plt-scheme.org/drscheme/

 

And following along with the SICP videos:

 

http://groups.csail.mit.edu/mac/classes/6.001/abelson-sussman-lectures/

 

You can just watch the first one, and Abelson will teach you Lisp in about an hour:

 

http://groups.csail.mit.edu/mac/classes/6.001/abelson-sussman-lectures/videos/Lecture-1a.mpg

 

Python is another language you can pick up quickly.

Posted (edited)

I learned Lisp twenty-plus years ago. It changed my life, literally. That I knew Lisp, could teach others how to use it properly, and could apply it to solve some rather complex problems were motivating factors in my then-employer moving me to my current locale. (That I brought Symbolics Lisp machine #2 with me as my dowry didn't hurt ...) Lisp teaches you to think differently, and better. The best thing to do is to learn a bunch of very different languages.

 

 

I do not particularly like Python (I loathe Python), even though you can do neat things with it ...

[image: python.png]

 

My bias is a bit personal. A coworker has attuned me to accessibility issues. I have a few Python modules that I just modified to add open/close braces as stupid comments, because the Python developers reject requests to adopt braces as "overmydeadbody". I'd dump those Python modules for a more sane language if I possibly could. I absolutely loathe the whitespace indentation in Python. It's one of those coffee-stains-on-the-tray-tables issues for me.

Edited by D H
Posted

Personally I'm a Ruby fan. I like languages with a purely expression-based grammar (like Lisp). Python breaks things apart into expressions and statements, which is a bit weird and very imperative-feeling to me. Ruby also has these things called "blocks", which are basically syntactic sugar for passing an anonymous function as an argument to another method. They're completely awesome, but something you really have to use to "get".

Posted
Eventually, I want to be able to do some computationally-heavy modeling work, which obviously requires programming knowledge.

 

Is there a specific language that would be good for this type of interest? Where can a newb start learning to build the skills to develop stochastic-type models?

 

Thanks!

 

Most computationally-heavy modeling is done in C, C++, or Fortran. Now... both C and C++ are quite messy languages, but ultimately, if you're interested in scientific computing, you'll have to deal with them. C++ isn't that bad with good libraries such as Blitz++, and since Fortran 95, Fortran is actually quite a nice language with a perfect syntax for array manipulations... on the other hand, it's nearly dead as a general-purpose programming language. Yet, to be honest, I prefer managed languages such as Java or C#; they're just much safer and much more fun to use. For most tasks Java does just fine. From my experience, Java and C# are between 0 and 200% slower than C++, which isn't a problem.

 

Also, once you understand the basic concepts of C++, you'll easily understand Java and C# and many other languages. The opposite is also true to some extent: with some basic knowledge of C (which is quite a minimalistic language) and Java, you'll be able to read the code of most projects.

 

Personally, I think the best deal in science is Python + an object-oriented C-like language (either C++, Java, or C#). Python 2.x is very elegant, very simple, and with libraries such as matplotlib and numpy/scipy you'll be able to do a lot, including publication-quality graphics with a few lines of code. Also, with SciPy, Python has a very MATLAB-ish syntax for arrays, a syntax which is both very simple and will be recognized by many.
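
To give a flavor of that workflow, a minimal sketch (standard numpy/matplotlib calls; the data is made up):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 200)              # MATLAB-like vectorized array
y = np.sin(x) + 0.1 * np.random.randn(x.size)   # noisy signal, no explicit loop

plt.plot(x, y)
plt.xlabel("x")
plt.ylabel("sin(x) + noise")
plt.savefig("signal.png", dpi=300)              # publication-quality output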

 

But! Even CPython (the fastest implementation of Python as far as I know) is ridiculously slow compared to C/C++/Java/C#. For example, I wrote a simple Wright-Fisher simulation which needs 1,000,000,000 random numbers generated by the Mersenne Twister. Very simple; it takes a few lines of code. With Java, C++, or C# it takes about 20-40 s; it takes about 20 minutes with Python!
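
For reference, here is a minimal Wright-Fisher sketch along those lines (a reconstruction for illustration, not the original code; the population size and generation count are made up):

import random   # CPython's random module uses the Mersenne Twister

def wright_fisher(n, p0, generations):
    # track one allele's frequency in a population of n haploids:
    # each generation, every individual draws its allele at random
    # according to the previous generation's frequency
    p = p0
    for _ in range(generations):
        count = sum(1 for _ in range(n) if random.random() < p)
        p = float(count) / n
        if p == 0.0 or p == 1.0:   # fixation or loss
            break
    return p

print(wright_fisher(n=10000, p0=0.5, generations=1000))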

 

I like the Python + C++/Java/C# approach because you have both a simple language for exploration and for generating graphics (Python), and a language for the heavy stuff. Another thing: popularity. If you look at the TIOBE index, you'll see how dominant the languages I named are. It matters for a scientist; it matters a lot. I would be seriously annoyed if I had to review an article with code written in Ruby, OCaml, Haskell, or Lisp, because nobody uses these languages in science. I think Lisp is common in AI and Haskell in some areas of math, just as R is widely used in statistics/phylogenetics, but outside their niches the C++ beast dominates.

 

They might be great languages, and I was actually impressed by Haskell and F# (basically an implementation of OCaml), but how many scientists will be able to understand the code?

Posted
Most computationally-heavy modeling is done in C, C++, or Fortran. Now... both C and C++ are quite messy languages, but ultimately, if you're interested in scientific computing, you'll have to deal with them.

That's the same boat I'm in. "Modern" computer languages seem to have forgotten that some of us still use computers for computing.

 

From my experience, Java and C# are between 0 and 200% slower than C++, which isn't a problem.

I've never used C# (we don't do Windows), and while Java is nice, it makes Matlab look fast. My experience: Implement the same algorithm in Fortran, C, C++, Matlab, Java, and Python. Fortran and C are about the same, with Fortran just a tad faster. C++, if you are careful, and if you cache a lot of things as pointers, can be almost as fast as C (but making that happen makes the code darned ugly). Without paying attention to speed, I find C++ to be about half as fast as C. Matlab and Java are, in my experience, well over an order of magnitude slower than C, and can be much, much worse. Python is just pathetic.

 

An order of magnitude or more increase in computation time means that the overnight ten-thousand-case Monte Carlo simulation would take a week or more to accomplish, or (more likely) that pathetically weak arguments would have to be given for reducing the number of cases to a few hundred. The week-long machine-learning analysis I once did, spread over a boatload of machines to boot: forget it.

 

Efficiency is usually the last thing I worry about in scientific computing. That nasty slowdown caused by using C++ in lieu of C or Fortran can usually be mitigated by hacking at a small portion of a very small number of computationally-expensive algorithms. I'll take C++ over C any day. There is no hacking around an order of magnitude or more slowdown.

Posted
They might be great languages, and I was actually impressed by Haskell and F# (basically an implementation of OCaml), but how many scientists will be able to understand the code?

 

Fortran, C, and C++ SUCK. Why on earth would you ever argue these languages are easier for scientists to understand?

 

Having supported an atmospheric model developed in Fortran and C for 5 years of my life, I can personally attest to the nightmares these languages bring to scientists. Scientists don't care about incompatibilities between Fortran compilers and are quite confused when they try to crank up the resolution of their grids and suddenly find their programs no longer run because they've exceeded some hardcoded maximums for array sizes in the compilers we were using.

 

As scientists are well-versed in math, and scientific models are essentially translations of mathematical functions, I would think functional languages would in fact be easier for them to understand.

 

Functional languages like OCaml have been immensely successful on Wall Street, where they're used for financial modeling. I think cultural reasons are to blame for why they aren't used more often in scientific computing: scientists aren't typically programmers by nature, but rather programming is something they must learn to do science.

 

Scientists don't often develop models from scratch, but work with legacy, decades-old models written in Fortran, C, and C++, so there's immense inertia to stick with low level imperative languages, even though the way these languages model problems is much, much different from how scientists use math to model them, and the languages themselves are difficult to use, error prone, and require substantially more code.

 

Honestly, I think if scientists were exposed to functional programming first, they'd have a lot easier time. The only reason scientists would have a hard time understanding functional languages is because they were taught a language like C or Fortran first.

Posted

Modern computer languages SUCK when it comes to performance and when it comes to mathematical descriptions of physical processes such as the atmosphere. Trying to force scientists and engineers who model physical systems to think functionally is just wrong. That is not how they think. They think procedurally. FORTRAN was written for the scientific community and has evolved with continual oversight by the scientific community because it works well with the way they think.

 

I personally do not like FORTRAN. However, in my opinion, it is better suited to how engineers and scientists, and particularly atmospheric scientists, think than any "modern" language.

Posted

D H,

 

I'm on Linux (openSUSE) 90% of the time and I still use C# a lot with MonoDevelop; it has many of the features Java lacks: operator overloading, structs (light 'classes' created on the stack), and good generics. It's actually my favorite language; it's very fast on Windows and is getting faster with Mono on Linux/Mac. Still... I'm sure I'll spend most of my Ph.D. with C/C++/Fortran (hopefully Fortran).

 

Except for GUIs, from my experience, Java is never an order of magnitude slower than C++, but it certainly depends on what you're doing. For example, the Wright-Fisher simulation I tried with several languages ran much faster in Java than in MATLAB, which is optimized for array operations but tends to suck with loops.

 

Fortran, C, and C++ SUCK. Why on earth would you ever argue these languages are easier for scientists to understand?

 

Simple: because we're all using these languages. ecoli asked for advice and I think he would be disappointed if he learned a functional language. He would be stuck with a language very few fellow scientists would understand, and he wouldn't have the programming skills so many supervisors like (i.e., being able to deal with C/C++). I think Java is just fine to start in computer science and it's quite easy to get from Java to C++, while OCaml and Haskell are completely different languages based on a radically different paradigm which isn't widely used in science. He would be even more disappointed by Ruby, because nobody uses that in science, and it's slow as hell. We're talking about the best language to learn for a scientist, not the best general-purpose language, and I think the two most important things for a scientist are speed and popularity within the field.

 

Python is just pathetic.

 

In terms of speed, yes, it is. It's still a nice language for exploration and graphics, though. I doubt any scientist can get away with Python 100% of the time.

 

I personally do not like FORTRAN

 

Can I ask why? I don't know that much about Fortran, but it seems to have a nice syntax for array manipulations (much better than C/C++).

Posted
ecoli asked for advice and I think he would be disappointed if he learned a functional language. ... We're talking about the best language to learn for a scientist, not the best general-purpose language ...
Exactly. Scientists and engineers, at least the ones I work with, do not think functionally. They tend to think procedurally -- even the ones who have never touched a line of code.

 

... and I think the two most important things for a scientist are speed and popularity within the field

Speed is important because we do some very computationally intensive calculations. Just imagine watching the nightly news, where the weather forecaster says "Our new Ruby-based atmospheric model finally churned out an answer. We had a 50% chance of rain six months ago." Popularity is important because we have been using computers to solve problems for fifty years. We have a lot of existing solutions, some written a long, long time ago. Switching to a different language is an extremely expensive undertaking. Doing so is only justified when (a) the new language offers a *lot* of improvements and (b) it is almost impossible to hire skilled people with knowledge of the archaic language used in the legacy systems.

 

I personally do not like FORTRAN.

Can I ask why? I don't know that much about Fortran, but it seems to have a nice syntax for array manipulations (much better than C/C++).

Most of the FORTRAN code I encounter (e.g., atmospheric models) is anything but modern Fortran. There is a lot of FORTRAN IV code out there in the scientific world. Admit knowledge of FORTRAN (FORTRAN became Fortran with Fortran 90) and you might well be the stuckee in interfacing with (or, horrors, maintaining) that incredibly poorly written FORTRAN code.

Posted (edited)
Modern computer languages SUCK when it comes to performance

 

It depends which ones you're talking about. Functional languages like Haskell and OCaml perform just fine:

 

http://shootout.alioth.debian.org/u32/benchmark.php?test=all&lang=all&box=1

 

Neither language is interpreted: both compile to native code, and both provide unboxed integers and floats, making them great for numerical computing.

 

I'm not particularly a fan of Scala as a language but I've certainly heard of it being used in scientific computing.

 

...and when it comes to mathematical descriptions of physical processes such as the atmosphere.

 

Having worked with a scientific model, I can say its operation followed this basic pattern:

 

[math]t_1 = F(t_0)[/math]

[math]t_2 = F(t_1)[/math]

[math]t_3 = F(t_2)[/math]

 

Or, that is to say, the state at each timestep is the result of a functional transformation of the previous timestep, an approach which maps quite well to functional languages.
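
Sketched in Python for concreteness (F here is a toy stand-in for the real step function):

from itertools import islice

def iterate(F, t):
    # yields t, F(t), F(F(t)), ... - each timestep is a pure
    # transformation of the previous state
    while True:
        yield t
        t = F(t)

F = lambda t: 0.9 * t                        # toy stand-in dynamics
t0, t1, t2, t3 = islice(iterate(F, 1.0), 4)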

 

Trying to force scientists and engineers who model physical systems to think functionally is just wrong. That is not how they think. They think procedurally.

 

They think procedurally because they were taught to think procedurally. Again, it's a cultural thing. And moreover, the systems being modeled are described in terms of pure functions, are they not? Is it honestly easier to take a declarative description in the form of pure functions and translate it into an imperative procedure? Or is that just what everyone is used to...

 

FORTRAN was written for the scientific community

 

Yes, and COBOL was written for the business community, but even they eventually figured out it's retarded.


Merged post follows:

Consecutive posts merged
Except for GUIs, from my experience, Java is never an order of magnitude slower than C++, but it certainly depends on what you're doing.

 

Yes; depending on what you're doing, Java can be faster than C++ by virtue of its ability to automatically profile and recompile itself at runtime.

 

Simple: because we're all using these languages. ecoli asked for advice and I think he would be disappointed if he learned a functional language. He would be stuck with a language very few fellow scientists would understand

 

And that's exactly what I said above: scientists are stuck in the 1980s when it comes to programming languages, simply because there's so much inertia dragging them down into low level imperative land.

 

I think Java is just fine to start in computer science

 

Fortunately I've seen gradual movement away from Java. My university was very much a Java school in the past but has since moved on to become far more polyglot-oriented. MIT is switching to Python for its introductory programming classes.

 

Java is, sadly, in an "uncanny valley" of abstraction: not high level enough to model problems well, and not low level enough that students actually understand what's going on behind the scenes. This makes Java a particularly bad language to start out with.

 

Speed is important because we do some very computationally intensive calculations.

 

It's easy to think of modeling speed in terms of solely execution performance, but there are many hidden time costs to consider.

 

Can your model resume where it left off if it crashes? If not, all the time that was spent running the model is wasted every time it crashes.
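
A minimal checkpoint/resume sketch, for illustration (the file name and state layout are invented):

import os
import pickle

CKPT = "model_state.pkl"

def save_checkpoint(step, state):
    # persist the run periodically so a crash doesn't waste the whole run
    with open(CKPT, "wb") as f:
        pickle.dump((step, state), f)

def load_checkpoint():
    if not os.path.exists(CKPT):
        return 0, None    # fresh start
    with open(CKPT, "rb") as f:
        return pickle.load(f)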

 

How long is the debugging cycle? When you buy a new cluster, how much time is spent getting scientists up-to-speed on it? How often do you switch compilers, and how much time does it take to work out the kinks whenever making a compiler change?

 

How long does it take to add new features to the model, or merge existing codebases?

 

Where are the bottlenecks in your model, and how difficult is it to address them? One model I worked on had sinusoidal CPU usage because it would go between alternating periods of being CPU bound and I/O bound. Nobody could fix this because nobody knew how: the codebase was a gigantic mound of spaghetti and trying to make changes to core behavior was horrible black voodoo.

 

This is the traditional "I'll write it in C... FOR SPEED!" trap, when really the amortized speed is a lot slower thanks to the amount of time it takes to address problems stemming from the language's low-level nature.

 

"Our new Ruby-based atmospheric model finally churned out an answer. We had a 50% chance of rain six months ago."

 

You can laugh at Ruby if you want and try to argue it has no place in scientific computing, but really it's an issue of right-tool-for-the-job. While I would never advocate it or any other interpreted language for numerical computing (or even a JITed language with boxed integers/floats), Ruby is an excellent automation language.

 

These maps are generated using a system which was automated using Ruby:

 

http://www.ngdc.noaa.gov/dmsp/night_light_posters.html

 

I used Ruby frequently at my previous (scientific computing) job as an automation language for everything from scheduling jobs and letting scientists automate parameter tweaking for their runs to serving the data we generated through our web site.

 

Again, right tool for the job.

 

Popularity is important because we have been using computers to solve problems for fifty years. We have a lot of existing solutions, some written a long, long time ago. Switching to a different language is an extremely expensive undertaking. Doing so is only justified when (a) the new language offers a *lot* of improvements and (b) it is almost impossible to hire skilled people with knowledge of the archaic language used in the legacy systems.

 

Yep, and that's exactly what I said before:

 

Scientists don't often develop models from scratch, but work with legacy, decades-old models written in Fortran, C, and C++, so there's immense inertia to stick with low level imperative languages, even though the way these languages model problems is much, much different from how scientists use math to model them, and the languages themselves are difficult to use, error prone, and require substantially more code.

 

Understand that this is the only good argument for using Fortran. You use Fortran because it's what everyone else is and has been using and what your legacy code is written in.

 

It's not as if you have magical requirements that are different from anywhere else in the computing world which Fortran as a language fulfills. You're just doing numerical computing, and for that there are any number of languages that fit the bill. Financial analysts do numerical computing too, and guess what they're using? C++ and OCaml. Fortran is an old, archaic language riddled with a number of problems. No computer scientist working outside of scientific computing would ever even consider Fortran for new development. I think most competent computer scientists would do everything in their power to avoid Fortran.

 

All that said:

 

I can understand the rationale for Fortran (culture and legacy code). I can understand the rationale for C++ (higher level identity modeling and avoiding pointer hell). But if you're a scientist programming directly in C, sorry, you're just being silly. Find a better language, like this one perhaps:

 

http://www.sac-home.org/

Edited by bascule
Consecutive posts merged.
Posted
It depends which ones you're talking about. Functional languages like Haskell and OCaml perform just fine:

 

http://shootout.alioth.debian.org/u32/benchmark.php?test=all&lang=all&box=1

Surely you jest. Those gross statistics include several programs that are at best peripheral to scientific computing. Haskell even beat C/C++ on a few of those, particularly those that are heavy on threads. When you look at the benchmarks that are not peripheral to scientific computing you get a completely different picture. Take one that is near and dear to me: the n-body problem. C++ beats Haskell by nearly a factor of 3, Ruby by a factor of 58, Python by a factor of 73. To boot, this is using the very simple symplectic Euler integration scheme with spherical gravity models. If you look at the Haskell and Ruby and Pascal code, it looks, *gasp*, procedural. How much more procedural would the Haskell code look if you had to use a non-simplistic integration scheme and spherical harmonics to represent gravity? How much slower would they be compared to a language suited to scientific computing?

 

They think procedurally because they were taught to think procedurally. Again, it's a cultural thing.

Exactly, and you are not going to fight that culture. Scientists and engineers who never touch a line of code in their lives think procedurally. Scientists and engineers who program are first and foremost scientists and engineers.

 

My employers over the past 30 years have uniformly found that it is generally a bad idea to hire computer scientists for anything but computer-science-type work, because computer science majors, for the most part, are incapable of thinking like a physical scientist or engineer. It is a cultural thing.

Posted (edited)
When you look at the benchmarks that are not peripheral to scientific computing you get a completely different picture. Take one that is near and dear to me: the n-body problem.

 

http://shootout.alioth.debian.org/u32/benchmark.php?test=nbody&lang=all

 

C++ beats Haskell by nearly a factor of 3

 

Interesting to see that Fortran only barely ekes out a lead over Scala... Haskell is only a little more than half as fast as Fortran.

 

Ruby by a factor of 58, Python by a factor of 73.

 

To reiterate from my previous post:

 

I would never advocate [Ruby] or any other interpreted language for numerical computing (or even a JITed language with boxed integers/floats)

 

To boot, this is using the very simple symplectic Euler integration scheme with spherical gravity models. If you look at the Haskell and Ruby and Pascal code, it looks, *gasp*, procedural.

 

Well, Haskell was the only language you listed which I was advocating for numerical computing, so let's look at some of that:

 

do
   m <- foldr (.+.) (Vec 0 0 0)
            `fmap` (mapM momentum
                  . take (nbodies - 1)
                  . iterate next $ next planets)

   setVec (vel planets) $ (-1/solar_mass) *. m
 where
   momentum !p = liftM2 (*.) (mass p) (getVec (vel p))

 

Hmm, maps and folds... not particularly procedural at all. In fact, that's pretty much some quintessential functional programming right there.

 

Colloquially (thanks to Google) this approach is typically described as a "MapReduce" and is an extremely easy operation to parallelize.
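
A rough Python rendering of the same shape (multiprocessing.Pool for the map, functools.reduce for the fold; the momentum kernel is a made-up stand-in):

from functools import reduce
from multiprocessing import Pool
from operator import add

def momentum(body):
    mass, velocity = body          # made-up per-body kernel: m * v
    return mass * velocity

if __name__ == "__main__":
    bodies = [(1.0, 2.5), (3.0, -1.0), (0.5, 4.0)]    # (mass, velocity) pairs
    with Pool() as pool:
        momenta = pool.map(momentum, bodies)          # the parallel map
    total = reduce(add, momenta, 0.0)                 # the fold/reduce
    print(total)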

 

There are ways to write imperative/procedural type code in Haskell, using things like the state monad. This program does not use them.


Merged post follows:

Consecutive posts merged

Here's an interesting success story using Haskell for scientific modeling:

 

http://www.haskell.org/pipermail/glasgow-haskell-users/2009-April/017050.html

Edited by bascule
Posted

Probably not. Bascule hates useful languages. He has been fully indoctrinated.

 

A few questions to help you narrow things down, ecoli:

  • What domain are you interested in? I do not mean stochastic modeling; that is far too broad. I mean something like atmospheric modeling, chemical modeling, biological systems, ...
  • Academia or industry?
  • Do one, maybe two, languages dominate in that field? If so, you know what you eventually need to learn.

 

It would also behoove you to learn something of the art of computer programming. Scientists and engineers for the most part are quite lousy at programming because they have either learned it on their own or have learned it with the aid of other (equally inept) scientists and engineers.

Posted
Probably not. Bascule hates useful languages.

 

I'm a fan of writing correct programs quickly.

 

Utility is in the eye of the beholder. Ever use YouTube? You can thank Python for that one.

 

One of the more interesting apps I've seen demonstrated at our local Ruby group is a web application for controlling the power grid. That's some utility... literally.

 

He has been fully indoctrinated.

 

I'm a polyglot who doesn't fall victim to the Blub Paradox:

 

As long as our hypothetical Blub programmer is looking down the power continuum, he knows he's looking down. Languages less powerful than Blub are obviously less powerful, because they're missing some feature he's used to. But when our hypothetical Blub programmer looks in the other direction, up the power continuum, he doesn't realize he's looking up. What he sees are merely weird languages. He probably considers them about equivalent in power to Blub, but with all this other hairy stuff thrown in as well. Blub is good enough for him, because he thinks in Blub.
