Tuesday, November 10, 2009

Mark Chu-Carroll on Haskell

Simon Peyton JonesImage via Wikipedia

Always a sign of "programming maturity" when a programmer can readily admit the serious short-comings of their preferred language. It raises the respect for the programmer and the language discussed.
Philosophizing about Programming; or "Why I'm learning to love functional programming" : Good Math, Bad Math: "

But it's not all good. Haskell has some serious problems. In particular, it's got two issues that worry me enough that I'm still a bit hesitant to recommend it for a lot of applications. Those two are what I call lazy confusion, and monad complexity.

By lazy confusion, I mean that it's often extremely difficult to predict what's going to happen in what order in a Haskell program. You can say what the result will be, but you can't necessarily say what order the steps will happen in. That's because Haskell uses lazy evaluation, which means that no computation in Haskell is really evaluated until its result is used. You can write Haskell programs that generate infinitely long lists -but it's not a problem, because no element of the list is ever evaluated until you try to use it, and you'll never use more that a finite number of elements. But lazy evaluation can be very confusing: even Haskell experts - even people who've implemented Haskell compilers! - sometimes have trouble predicting what code will be executed in what order. In order to figure out the computational complexity of algorithms or operations on data structures, people often wind up basically treating the program as if it were going to be evaluated eagerly - because analyzing the laziness is just too difficult. Laziness is not a bad thing; in fact, I'm pretty convinced that very frequently, it's a good thing, which can make code much cleaner and clearer. But the difficulty of analyzing it is a major concern.


Wired plug board for an IBM 402 Accounting Mac...Image via Wikipedia

Monad complexity is a very different problem. In Haskell, most code is completely stateless. It's a pure functional language, so most code can't possibly have side effects. There's no assignments, no I/O, nothing but pure functions in most Haskell code. But state is absolutely essential. To quote Simon Peyton-Jones, one of the designers of Haskell: "In the end, any program must manipulate state. A program that has no side effects whatsoever is a kind of black box. All you can tell is that the box gets hotter." The way that Haskell gets around that is with a very elegant concept called a monad. A monad is a construct in the program that allows you to create an element of state, and transparently pass it through a sequence of computations. This gives you functional semantics for a stateful computation, without having to write tons of code to pass the state around.


The reason that that's a problem is that there are multiple different monads, to represent different kinds of state. There are monads for mutable arrays - so that you can write efficient matrix code. There are monads for parsing, so that you can write beautiful parsers. There are monads for IO, so that you can interact with the outside world. There are monads for interacting with external libraries written in non-functional libraries. There are monads for building graphical UIs. But each of them has a packet of state that needs to be passed between the steps. So if you want to be able to do more than one monadic thing - like, say, write a program with a GUI that can also read and write files - you need to be able to combine monads. And the more monads you need to combine, the more complicated and confusing things can get.

Python took the idea of "generators", added the easy syntax of the "yield" statement, and, most importantly, let the generator be a first-class object. It ended up more powerful than the language that generators was inspired from, the Icon programming language. Maybe a multi-paridigm language like Python - where the ideas of "state handling/managing", "across-process global/intra-process global state handling/managing", and ways of expressing different forms of lazy evaluation are added - and none of these are required and all of these are available - and all able to be wrapped up in a first class object - would be the correct approach.

Reblog this post [with Zemanta]

No comments: