Showing posts with label programming. Show all posts

Saturday, June 26, 2010

PyFilesystem and "filepath": abstracting the file-system in Python

PyFilesystem 0.3 released

Will McGugan: I am pleased to announce a new version of PyFilesystem (0.3), which is a Python module that provides a common interface to many kinds of filesystem. Basically it provides a way of working with files and directories that is exactly the same, regardless of how and where the file information is stored. Even if you don't plan on working with anything other than the files and directories on your hard-drive, PyFilesystem can simplify your code and reduce the potential of error.

PyFilesystem is a joint effort by myself and Ryan Kelly, who has created a number of new FS implementations such as Amazon S3 support and Secure FTP, and some pretty cool features such as FUSE support and Django storage integration.
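To make the "same interface, different storage" idea concrete, here is a minimal sketch using only the standard library. The class and method names (MemoryStore, DiskStore, write_text, read_text, listdir) are hypothetical and are not PyFilesystem's actual API; the point is only that the calling code need not know where the data lives.

```python
import os
import tempfile


class MemoryStore(object):
    """Hypothetical in-memory backend: files live in a dict."""

    def __init__(self):
        self._files = {}

    def write_text(self, name, text):
        self._files[name] = text

    def read_text(self, name):
        return self._files[name]

    def listdir(self):
        return sorted(self._files)


class DiskStore(object):
    """Hypothetical on-disk backend rooted at one directory."""

    def __init__(self, root):
        self._root = root

    def write_text(self, name, text):
        with open(os.path.join(self._root, name), "w") as handle:
            handle.write(text)

    def read_text(self, name):
        with open(os.path.join(self._root, name)) as handle:
            return handle.read()

    def listdir(self):
        return sorted(os.listdir(self._root))


def save_report(store, lines):
    """Works with any backend that offers the three methods above."""
    store.write_text("report.txt", "\n".join(lines))
    return store.listdir()


memory_listing = save_report(MemoryStore(), ["a", "b"])
disk_store = DiskStore(tempfile.mkdtemp())
disk_listing = save_report(disk_store, ["a", "b"])
```

save_report() never checks which backend it was given, which is the property the announcement describes.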

[MMMG: Compare this to "filepath 0.1" from Jp Calderone]

http://jcalderone.livejournal.com/56137.html

Jp Calderone: I'm happy to announce the initial release of filepath.

filepath is an abstract interface to the filesystem. It provides APIs for path name manipulation and for inspecting and modifying the filesystem (for example, renaming files, reading from them, etc). filepath's APIs are intended to be easier than those of the standard library os.path module to use correctly and safely.

filepath is a re-packaging of the twisted.python.filepath module independent from Twisted (except for the test suite which still depends on Twisted Trial).

The low number of this release reflects the newness of this packaging. The implementation is almost entirely mature and well tested in real-world situations from its time as part of Twisted.

You can find the package on PyPI or Launchpad:

http://pypi.python.org/pypi/filepath/0.1
https://launchpad.net/filepath
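One example of what "correctly and safely" can mean: joining an untrusted name onto a base directory with os.path.join will happily follow ".." out of that directory. The sketch below uses only the standard library; safe_child is a hypothetical helper written for illustration, not a function from filepath or Twisted, and the paths are made up.

```python
import os


def safe_child(base, name):
    """Join name onto base, refusing results that escape base."""
    base = os.path.abspath(base)
    candidate = os.path.abspath(os.path.join(base, name))
    # os.path.commonpath is available in Python 3.5 and later.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("%r escapes %r" % (name, base))
    return candidate


inside = safe_child("/srv/data", "reports/june.txt")

try:
    safe_child("/srv/data", "../../etc/passwd")
    escaped = True
except ValueError:
    escaped = False
```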

MMMG: This is all great stuff. From what I saw, the API of PyFilesystem seems like the winner, at least to my eyes. I will steal the best code from "filepath" to augment my personal version of PyFilesystem, then I will see what I can contribute back to these two wonderful projects.

Saturday, March 27, 2010

A very well thought out Git branching model

Vincent Driessen nvie.com: A successful Git branching model

A successful Git branching model
In this post I present the development model that I’ve introduced for all of my projects (both at work and private) about a year ago, and which has turned out to be very successful. I’ve been meaning to write about it for a while now, but I’ve never really found the time to do so thoroughly, until now. I won’t talk about any of the projects’ details, merely about the branching strategy and release management.

Very nice, and well thought out, with developers working and collaborating naturally and productively, but still keeping the high quality main releases flowing out steadily.


Tuesday, January 5, 2010

Optimize python functions by marking certain promises about its behavior : Python

promise: bytecode optimisation using staticness assertions.


This is a module for applying some simple optimizations to function bytecode. By promising that a function doesn't do certain things at run-time, it's possible to apply optimizations that are not legal in the general case.


Reddit Comments: Optimize python functions by marking certain promises about its behavior : Python: "
I gave a talk on this at our local users group a few weeks ago, the slides are online if anyone's interested:

http://wiki.python.org/moin/MelbournePUG?action=AttachFile&do=get&target=promise.odp

"

The above presentation is really nice - great, simple example of the power of this technique.

I was thinking along these lines, with code where we specify two "speeds":

(1) flexibility/expressiveness/global-mutability/side-effects-happen-globally-immediately are important (at cost to throughput and low-latency)

i.e. dispatch on pattern matching ASTs on global mutable list of patterns, global mutable generic functions, global mutable generic methods

Allow global mutable objects in general. All mutable state is handled like distributed-revision-control & write-on-change, with all communication going through a key-hole (enforcing low latency by throttling large data transfers) as asynchronous messaging as transactions (and building transactions).

Side effects are GO!  Allow global state to change, allow outside communication, all happening ASAP, might pre-calculate while waiting for reply, but wait to the bitter end for reply none-the-less.

(2) throughput and low-latency (performance) are important (at cost to flexibility/expressiveness/global-mutability) - no side effects (all "side-effect messages" are stored, to be returned as a group when function returns, maybe each "side-effect message" is paired with a continuation)


i.e. dispatch on low level bytecode (think: LLVM), preferred data structures are immutable and have the "shape" of directed acyclic graph (possibly a much more restricted form of directed acyclic graph, where each node has an immutable index, and parent indexes are always less than child indexes), the limited explicitly mutable state is local to OS-thread or green-thread.  No traditional asynchronous side-effects allowed.
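A rough sketch of what the second "speed" might look like in plain Python: the function touches no shared state and hands back its queued "side-effect messages" together with its result, leaving the caller to apply them as a group. The names pure_total and apply_effects and the message format are invented for this illustration.

```python
def pure_total(prices):
    """Compute a total without side effects; queue messages instead."""
    effects = []
    total = 0
    for price in prices:
        if price < 0:
            # Instead of logging right away, record the intent.
            effects.append(("warn", "negative price: %r" % (price,)))
            continue
        total += price
    effects.append(("store", total))
    return total, tuple(effects)


def apply_effects(effects, log, state):
    """The only place where shared state is actually mutated."""
    for kind, payload in effects:
        if kind == "warn":
            log.append(payload)
        elif kind == "store":
            state["total"] = payload


log, state = [], {}
total, effects = pure_total([3, -1, 4])
apply_effects(effects, log, state)
```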

And there are speeds in the middle, by being explicit about what you are relaxing and what you are constraining. Typically, you expect to pay a "compiling" cost to get down to the low level bytecode. If you are doing a lot of this "compiling", you can again improve matters by being explicit about what you are relaxing and what you are constraining.

Low-latency (ability to best react to asynchronous signals) will always be preferred over throughput. If you want to put throughput over low-latency, you have to explicitly say so, with the knowledge that you have very little influence over that isolated code (isolated so able to make no compromise in throughput)

[Edit]

Hey, cool.  People are trying "Promise" out on code in the wild.  Here is creator Ryan Kelly explaining how to make use of the bytecode improvements:

http://panela.blog-city.com/the_promise_of_faster_python_1.htm#1

1. Ryan Kelly left...
2010.01.05 Tue 4:25 pm :: http://www.rfk.id.au/
Matt, thanks for taking the time to put this together. The optimizations applied by promise are certainly not in the same league as something like psyco - they have to be quite well targeted to have any measurable effect.
Some clarifications: promising a function pure() doesn't optimise that function at all, but it can speed up things that call that function by inlining its bytecode at the call site. To get this to work, you have to use constant() to promise that references to the pure function won't change. Example:

@promise.pure()
def calculate(a,b):
    return a + 2*b
@promise.constant(("calculate",))
def aggregate(pairs):
    return

In this scenario, the bytecode for "calculate" will be inlined directly into the "aggregate" function and will save the overhead of many function calls.
By far the biggest speedup that can currently be obtained using promise is inlining pure functions that get called in a loop.
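To see why inlining a pure function helps inside a loop, here is a hand-written sketch in plain Python. It does not use promise at all: aggregate_inlined is a manual stand-in for the effect that bytecode inlining aims at, and the loop bodies are assumptions made for illustration.

```python
import timeit


def calculate(a, b):
    return a + 2 * b


def aggregate_calls(pairs):
    """One Python-level function call per pair."""
    total = 0
    for a, b in pairs:
        total += calculate(a, b)
    return total


def aggregate_inlined(pairs):
    """Same arithmetic with the body of calculate() pasted in by hand."""
    total = 0
    for a, b in pairs:
        total += a + 2 * b
    return total


pairs = [(i, i + 1) for i in range(1000)]
same_answer = aggregate_calls(pairs) == aggregate_inlined(pairs)

# Timings vary by machine and interpreter, so none are quoted here.
time_calls = timeit.timeit(lambda: aggregate_calls(pairs), number=200)
time_inlined = timeit.timeit(lambda: aggregate_inlined(pairs), number=200)
```

Comparing time_calls with time_inlined on your own machine gives a feel for how much of the loop's cost is call overhead.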

Friday, December 18, 2009

Aggressive-Competence in Software Development : Titus Brown

Wow, this post is great stuff!


Aggressive-Competence in Software Development : Titus Brown: "
At the end of the day, there are things you can control, and things you can't control. You can't control what other people think of you, and you can't control how other people (including project leaders and professors) evaluate you. But you can visibly work hard, and defend yourself based upon that evidence.
I call the general approach of throwing energy at a project 'aggressive competence', and I think it's a necessary component of effective team software development. Everyone has days, or weeks, or even months where they look incompetent or ineffective; often that's because outsiders don't understand or appreciate the work that you've done. Tough on you, but I don't think it's reasonable to expect your boss, or colleagues, to look hard at your work to find reasons to praise you. Fundamentally, it's your responsibility to 'manage up' and communicate your progress to others effectively.
...

This is where I think there were mismatched expectations. The students expected that they were going to be managed, helped, and given clear expectations. They weren't. So they got bad evaluations.
What do I plan to do? Well, assuming that UCOSP + MSU goes forward next term, I will be communicating my expectations quite clearly to the students. And I will be asking for regular progress reports, sent to me and CCed to the project leaders. And I'll be sending them this blog post. And I'll be failing the ones that don't listen.
I'll end with a paraphrase of one of my favorite sci-fi authors: 'every new developer has problems on a new project. The extent of our sympathy for those problems, however, will be dictated by the efforts made to overcome them.'
"

The extent of our sympathy for those problems, however, will be dictated by the efforts made to overcome them. - David Weber, The Short Victorious War

Tuesday, November 10, 2009

The Limits of Test Driven Design in Software Development (TDD)


Great example; it shows the real promise and realities of Test Driven Design (TDD).
The Limits of Test Driven Design in Software Development (TDD): "
  • Complaint: TDDed tests are prescriptive
  • Response: This is a feature. Stating our assumptions up front exposes misunderstandings.
  • Complaint: Choosing tests is hard
  • Response: This is also a feature. It tells us that our design is bad or that we don't understand the problem.
  • Complaint: The code you TDDed was bad!
  • Response: TDD does not free us from thinking. TDD is not magic.
  • Complaint: It's too much typing.
  • Response: Typing is not the bottleneck.

Many complaints about TDD are complaints that it doesn't solve some problem. These are not problems with TDD – it's not supposed to solve every problem!


Dynamic languages don't make coffee, continuous integration doesn't shine shoes, and TDD doesn't make code scale. It's simply the basis of a solid, disciplined process for building software – a beginning, not an end.

"

Scientific software quality and Global Warming


Very well written response by Michael Tobis to this question from Jon Pipitone "Do likely bugs in the software of climate models cast doubt on the scientific consensus on human factors in global warming?"
jon pipitone: Scientific software quality: what would it take to convince software engineers?: "
Michael Tobis said...

regebro asks what looks like a reasonable question, but it's based on a fundamental misunderstanding of a question that is roughly equivalent to what the difference between weather and climate is. We have very little skill predicting one year out, even assuming no volcanoes and such. Most of the predictability of the detailed state atmosphere vanishes in three weeks or so. But at a multidecadal time scale the problem changes character. Technically, we are no longer dealing with an initial value problem but with a boundary value problem, even though the underlying dynamics are the same. We are not in that case looking for details in any specific year, but for the statistics over an extended period. In a mathematical sense that is an 'easier' problem; it is more constrained by energy balances than by nonlinear fluid dynamics. The messy stuff basically averages out and the residual is what we try to predict. In fact, maybe I'll use that as a definition of climate. It's 'what you can say about the system after the messy unpredictable part gets averaged out'. That's the basis for climate change modeling.


As for Jon's question, the models are wretched pieces of engineering. No commercial shop would release anything nearly as balky, hard to deploy, or prone to failure. It makes open source look good. And that has almost no bearing on whether they are suitable for the purpose. In fact, sometimes models are used well and sometimes they are used badly. This in turn is a scientific, not a software question. All this said, I desperately wish the software were better, and I think we could address many more scientifically meaningful problems much more effectively if it were. Finally, if you think the question is 'global warming, yes or no' the large models in question are much less relevant than many people would have you believe. The answer to that question is yes, to the extent of about 3 degrees per CO2 doubling. The idea that such a conclusion comes from complex models is wrong.
"
I will forgive Mr. Tobis for the dig against open source software ;) I maintain the difference in quality between the very best and the very worst engineered open source software projects makes it very difficult to say anything sensible about the totality.
Also, metrics in software development cannot predict the quality of output of any particular group working on a particular problem - too many confounding issues. For example, the developer with the highest bug count tends to be the best developer on the team - nobody else is trusted to tackle the hardest coding issues.


Another issue is making scientific computations reproducible. Even more basic than whether a particular computation is correct is making sure that the computation can be reproduced by another group.
Matthias Schwab, Martin Karrenbach, Jon Claerbout: "Making scientific computations reproducible", Computing in Science and Engineering, Volume 2, Issue 6 (November 2000), pages 61-67, ISSN 1521-9615.

Mark Chu-Carroll on Haskell


It is always a sign of "programming maturity" when programmers can readily admit the serious shortcomings of their preferred language. It raises respect for both the programmer and the language discussed.
Philosophizing about Programming; or "Why I'm learning to love functional programming" : Good Math, Bad Math: "

But it's not all good. Haskell has some serious problems. In particular, it's got two issues that worry me enough that I'm still a bit hesitant to recommend it for a lot of applications. Those two are what I call lazy confusion, and monad complexity.

By lazy confusion, I mean that it's often extremely difficult to predict what's going to happen in what order in a Haskell program. You can say what the result will be, but you can't necessarily say what order the steps will happen in. That's because Haskell uses lazy evaluation, which means that no computation in Haskell is really evaluated until its result is used. You can write Haskell programs that generate infinitely long lists - but it's not a problem, because no element of the list is ever evaluated until you try to use it, and you'll never use more than a finite number of elements. But lazy evaluation can be very confusing: even Haskell experts - even people who've implemented Haskell compilers! - sometimes have trouble predicting what code will be executed in what order. In order to figure out the computational complexity of algorithms or operations on data structures, people often wind up basically treating the program as if it were going to be evaluated eagerly - because analyzing the laziness is just too difficult. Laziness is not a bad thing; in fact, I'm pretty convinced that very frequently, it's a good thing, which can make code much cleaner and clearer. But the difficulty of analyzing it is a major concern.

...


Monad complexity is a very different problem. In Haskell, most code is completely stateless. It's a pure functional language, so most code can't possibly have side effects. There's no assignments, no I/O, nothing but pure functions in most Haskell code. But state is absolutely essential. To quote Simon Peyton-Jones, one of the designers of Haskell: "In the end, any program must manipulate state. A program that has no side effects whatsoever is a kind of black box. All you can tell is that the box gets hotter." The way that Haskell gets around that is with a very elegant concept called a monad. A monad is a construct in the program that allows you to create an element of state, and transparently pass it through a sequence of computations. This gives you functional semantics for a stateful computation, without having to write tons of code to pass the state around.

...

The reason that that's a problem is that there are multiple different monads, to represent different kinds of state. There are monads for mutable arrays - so that you can write efficient matrix code. There are monads for parsing, so that you can write beautiful parsers. There are monads for IO, so that you can interact with the outside world. There are monads for interacting with external libraries written in non-functional languages. There are monads for building graphical UIs. But each of them has a packet of state that needs to be passed between the steps. So if you want to be able to do more than one monadic thing - like, say, write a program with a GUI that can also read and write files - you need to be able to combine monads. And the more monads you need to combine, the more complicated and confusing things can get.

"
Python took the idea of "generators", added the easy syntax of the "yield" statement, and, most importantly, let the generator be a first-class object. The result ended up more powerful than generators in Icon, the programming language that inspired them. Maybe a multi-paradigm language like Python would be the correct approach: one that adds ways of "state handling/managing", "across-process global/intra-process global state handling/managing", and ways of expressing different forms of lazy evaluation, where none of these are required, all of them are available, and all can be wrapped up in a first-class object.
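A small sketch of what "lazy" and "first-class" mean for Python generators, using only the standard library. The function names naturals and squares are made up for this example.

```python
import itertools


def naturals(start=0):
    """An endless generator; nothing runs until a value is requested."""
    n = start
    while True:
        yield n
        n += 1


def squares(numbers):
    """Generators are ordinary objects, so they can be passed around."""
    for n in numbers:
        yield n * n


lazy_squares = squares(naturals(1))  # no work has been done yet
first_five = list(itertools.islice(lazy_squares, 5))
next_one = next(lazy_squares)  # the generator object keeps its own state
```

The "infinite list" never exists in memory; only the six values actually requested are ever computed.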


Monday, October 26, 2009

More female programmers

From Recycled Knowledge
by John Cowan
Hilarious pair of comments:
hotaru said...
> We can't afford to constrain ourselves on what size or shape or color the bodies are that house those brains.
you say this, but you think we should prefer female programmers over male programmers?
jcowan said...
That comment is both accusatory and inaccurate (I think and say no such thing). I can take accusation or inaccuracy, but the combination is poison.


Not enough is said about the time spent unlearning bad habits in computer science training.
More female programmers: "

I tried to post this comment to a public site, but failed repeatedly. The topic of the original post isn't relevant to my comment, which was in response to a comment that read, in its entirety:

Why would we would want more female programmers?

My answer:

The world needs more effectively mobilized brains. We can't afford to constrain ourselves on what size or shape or color the bodies are that house those brains. Also, diversity is good in itself: it improves flexible response, and it's silly to throw away a cheap source of diversity.

A major U.S. university with a strong CS program (I am contractually prevented from naming it) that had female CS undergraduate admissions in the single digits year after year was able to raise their admission to the same rate as other engineering programs by changing just one thing: they no longer gave people who already had programming experience preferential admission. There have been no changes in the overall performance of the student body in the years since.

"

Saturday, October 24, 2009

Python News: PSF adopts Diversity Statement

What do I think about diversity statements? The Python Software Foundation has published and endorsed one:
The Python Software Foundation and the global Python community welcome and encourage participation by everyone. Our community is based on mutual respect, tolerance, and encouragement, and we are working to help each other live up to these principles. We want our community to be more diverse: whoever you are, and whatever your background, we welcome you.
Also see: For additional resources on efforts to promote diversity within the Python community, please see the main diversity page.


Diversity/multiculturalism is a means to an end, not an end in itself.
I am for diversity/multiculturalism, now, in western culture, because there are too few natural experiments in effective living. More natural experiments by different cultural, national, religious, social-sexual groups lead to better outcomes.
Society would give more people happiness, fulfilment, health, reproductive opportunities in the absence of pain and anxiety if more people copied the best of other cultural, national, religious, social-sexual groups.
I take exception to the use of the word "tolerance". To tolerate one thing, you must be intolerant of forces in opposition to that thing. "Tolerance", taken by itself, is meaningless unless the form of the corresponding intolerance is specified.
As a human, I have a built-in bias against other cultural, national, religious, social-sexual groups. I strive to rise above these biases, and act accordingly. I am intolerant of those who would act against other cultural, religious, social-sexual groups, for their own benefit or comfort. I also take exception to the use of the word "diverse". There are some forms of diversity I value, and others I do not. I would not appreciate diversity that welcomes motivated stupidity, motivated sloth, motivated controversy, and motivated egoistic criticism. I feel strongly that those who practice motivated stupidity, motivated sloth, motivated controversy, and motivated egoistic criticism should feel unwelcome. ("Motivated" is important here: I don't mean those who do so out of understandable ignorance, or who are incapable of constructive communication because of situational stress.)
The kinds of diversity I value are, again, cultural, national, religious, and social-sexual, for the reasons stated above. Not that anyone should care, I just wanted the exercise of making my thoughts more precise.


I have a terrible temper, and act like a lout, many times. I hate communities that rate politeness over technical correctness, but I cannot defend my loutish stupidity. I don't mind getting called out - I deserve it.
Python News: PSF adopts Diversity Statement: "

On October 12, 2009, the board of the Python Software Foundation voted to adopt a Diversity Statement.

"

Thursday, September 24, 2009

The Duct Tape Programmer vs. The Test Infected Programmer


Joel likes JWZ. They both don't care for unit tests. Joel points to JWZ as a guy who ships. You could argue that JWZ was a guy who shipped, and became a burn-out poster child. The Netscape stuff just wasn't good enough to withstand the Microsoft onslaught. If you take JWZ + (Some) Unit Test + (Some) Test Infected Attitude, you really got something. I am happy to compete against coders who religiously avoid writing unit tests, so I am glad there is considerable variation in opinion. So, keep it up, Joel! The Duct Tape Programmer:
"Zawinski didn’t do many unit tests. They “sound great in principle. Given a leisurely development pace, that’s certainly the way to go. But when you’re looking at, ‘We’ve got to go from zero to done in six weeks,’ well, I can’t do that unless I cut something out. And what I’m going to cut out is the stuff that’s not absolutely critical. And unit tests are not critical. If there’s no unit test the customer isn’t going to complain about that.”


Remember, before you freak out, that Zawinski was at Netscape when they were changing the world. They thought that they only had a few months before someone else came along and ate their lunch. A lot of important code is like that.

...

Duct tape programmers have to have a lot of talent to pull off this shtick. They have to be good enough programmers to ship code, and we’ll forgive them if they never write a unit test, or if they xor the “next” and “prev” pointers of their linked list into a single DWORD to save 32 bits, because they’re pretty enough, and smart enough, to pull it off."
32 bits saved on a linked list? I'm convinced! Duct tape ahoy! OK, what is the alternative?

[Edited 10/23/09] Found a very cute comment thread on jwz's own website:
It seems to me more like you use foresight and pessimism to avoid getting into situations where you need to demonstrate exceptional programming ability. Absolutely no offense, even by faint praise, intended.
...
"Use foresight and pessimism" This needs to become some sort of meme metric.
...
The formulation owes something to a pilot's maxim: "a superior pilot uses his superior judgment to avoid having to exercise his superior skill."

This sums up Spolsky's mistake - confounding hard-won pessimism about fruity techniques with making a conscious decision to accrue short term technical debt. [exhaustive definition of technical debt here by Brad Appleton http://bradapp.blogspot.com/2009/06/technical-debt-definition-and-resources.html ]
If practically 100% of the value of a piece of code comes from the speed of publishing, and you are skilled enough to take on some technical debt, confident you can pay it back later, by all means, go ahead. Publish the code, and forget the unit-tests and forget test-driven-development. But only if you earned that confidence. Here is an excellent diagram about who can and who cannot make this judgement, by Martin Fowler: http://martinfowler.com/bliki/TechnicalDebtQuadrant.html

[Edited 10/26/09 ] I found this comment from Reddit by terror406: http://www.reddit.com/r/programming/comments/9nipa/joel_on_software_the_duct_tape_programmer/c0dj2dt
Architecture Astronauts and Duct Tape programmers are fictional characters. There are only competent and incompetent programmers. And competent programmers love and understand elegant architecture, but also know when to go into 'duct tape mode' in order to get the job done and ship the damn product. Edit: There is an easy way to tell the two apart. Ask them about their Technical Debt. A competent programmer will start talking about all the stuff that could have been done better, the incompetent programmer will just give you a blank stare.

Again, the issue is having the skill and judgment to take on short-term technical debt.

Thursday, August 27, 2009

Dave Beazley writes more on Python's GIL

Inside the "Inside the Python GIL" Presentation
My comment: Thanks to Dave Beazley for doing some of the very best writing and research on Python's GIL. It removed much of my own stupidity on the subject.


What seems to be the answer:

*** I/O bound threaded code - no risk of context switches swamping the CPU:
For CPython's implementation, implementing the GIL using both a condition variable and a pthreads/OS mutex lock is the way to go, so that code can be developed on a single core machine, and not blow up on a multiple core machine (and vice-versa, to a degree).

*** I/O bound threaded code - possibility of context switches swamping the CPU:
As the number of "concurrent" events grows, there is a possibility of the CPU being swamped by the context switches involved in OS threading. The solution is, in pure Python, using event-driven multi-threading and deferred objects, using the Twisted library or rolling your own. You have the problem of avoiding writing blocking code, such as loops. And code that has the possibility of blocking must regularly check/pause to handle queued events.

*** CPU intensive threaded code:
1) event-driven and deferred objects (using Twisted or rolling your own) - being careful to avoid loops and other long running code
2) "green" threads, implemented in pure Python - maintain your own stack of tuples of ("functionname", arg0, arg1) and ("continuationname", statevar0, statevar1), and dispatch on name from that stack - being careful to avoid loops and other long running code - if you have some code with a loop, break it up into more than one "continuationname0" "continuationname1" etc.
3) message passing between Python processes
4) Python module multiprocessing - threading work-alike
5) combination of the above four
6) Why not modern shared mutable state threading like in Java and C#: the implementation in the virtual machines and the language and library constructs? Sweep away all the complexities of the above 5 with a single broom?

I am prejudiced against general shared mutable state threading because it is brittle and non-deterministic. That makes it a non-starter: you are never able to make ANY guarantees about low-latency and performance after ANY change in the code, no matter how small. And to regain adequate low-latency and performance, your implementation could get very hairy very quickly. Of course, the penalty paid by my suggested approach is a hairy implementation right off the bat - I have to be honest about that.

It seems to me: any techniques adequate to handle CPU intensive multi-threaded code would be overkill for I/O bound multi-threaded code. So best to deal with the cases separately. [ Use Google books to find out about "shared mutable state threading in Java" http://books.google.com/books?q=shared+mutable+state+threading+java&btnG=Search+Books ]

The biggest missing piece: in a long running high availability application (a candidate for multi-threaded code), code reloading on the fly. Right now, a terrible solution is using Erlang as a thin layer of supervisor code, where the real work is farmed out to Python. The only advantage of this approach is avoiding predictable failure modes. Armin Ronacher blogged about this problem: http://lucumr.pocoo.org/2009/7/24/singletons-and-their-problems-in-python
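A minimal sketch of the "green threads" idea from option 2, in pure Python. Everything here is an assumption made for illustration: the handler names, the (name, state) tuple layout, the chunk size of 3, and the use of a deque as a round-robin queue instead of a strict stack. The long-running loop (summing a list) is broken into short continuation steps, so no single step can hog the interpreter.

```python
from collections import deque

results = []


def sum_chunk(state):
    """Continuation: add up to 3 numbers, then reschedule the remainder."""
    total, remaining = state
    chunk, rest = remaining[:3], remaining[3:]
    total += sum(chunk)
    if rest:
        return [("sum_chunk", (total, rest))]
    return [("report", ("sum", total))]


def report(state):
    """Final step of a task: record the result."""
    results.append(state)
    return []


HANDLERS = {"sum_chunk": sum_chunk, "report": report}


def run(tasks):
    """Round-robin dispatch on name; each handler runs one short step."""
    queue = deque(tasks)
    steps = 0
    while queue:
        name, state = queue.popleft()
        queue.extend(HANDLERS[name](state))
        steps += 1
    return steps


steps = run([
    ("sum_chunk", (0, list(range(10)))),  # the numbers 0..9
    ("sum_chunk", (0, [10, 20])),
])
```

The shorter task finishes first even though it was queued second, because the two tasks take turns one step at a time.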

Tuesday, July 7, 2009

Why Dynamic Languages? Why Strongly Typed Languages?

From Reddit commenter "munificent", the best analogy I have ever seen for why productive coders use both dynamic languages and strongly typed languages:
> > So, the problem is that software developers have poor foresight and a complete lack of self-awareness?
No, the problem is that the success of most software projects can not be predicted.


When you go camping, you just pitch a tent. When you're moving, you build a house. With many software projects, it's impossible to tell beforehand if you're just going camping or will be taking permanent residence. It doesn't make sense to spend a month building a new house every time you go somewhere, if 90% of the time you end up leaving after a couple of days. What does make sense, and is becoming a common pattern is this:
1. Stake out a new territory and pitch a tent (v1 in a dynamic language)
2. If it turns out to be a hospitable place, start building a new house next door (migrate back-end services to more strongly-typed languages).
3. Once that's done, ditch the tent and move in (move the production system over to the new back-end).
See: twitter (Ruby -> Scala), Facebook (PHP -> Erlang), etc.

Yes, a lot of academic computer science completely ignores the economic constraints of software engineering.

Thursday, July 2, 2009

Using an inner function for breaking out of nesting

From Fuzzyman, a blog post on different ways of handling breaking out of nesting. Here is a snippet:
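def find_match():
    # Returning from the inner function leaves both loops at once.
    for x in range(max_x):
        for y in range(max_y):
            if match(x, y):
                return x, y
    return None  # no match found

result = find_match()
if result is None:
    pass  # match not found; handle that case here
else:
    x, y = result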
Yup, looks clean to me. The names of the result variable and the inner function (here "result" and "find_match") also give an opportunity for self-documentation: pick names that describe what is being searched for.
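
As a self-contained sketch of the same pattern: the grid, the target value, and the names describe_match and locate below are all made up for illustration and do not come from Fuzzyman's post. The inner function exists only so that a single return can leave both loops, and the outer function then deals with the "found" and "not found" cases.

```python
def describe_match(grid, target):
    """Report where target first appears in a list of rows."""

    def locate():
        # `return` leaves both loops at once; a bare `break` would
        # only exit the inner loop.
        for row_index, row in enumerate(grid):
            for col_index, value in enumerate(row):
                if value == target:
                    return row_index, col_index
        return None

    position = locate()
    if position is None:
        return "%r not found" % (target,)
    row_index, col_index = position
    return "%r found at row %d, column %d" % (target, row_index, col_index)


grid = [[1, 2, 3],
        [4, 5, 6]]
print(describe_match(grid, 5))
print(describe_match(grid, 9))
```

Because locate closes over grid and target, it needs no parameters of its own, which keeps the call site short.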

Great Blue Heron pair preparing a nest (bird),...Image by mikebaird via Flickr

For your aesthetic enrichment, a picture of a nest!

Friday, June 12, 2009

Jp Calderone knows how to override __eq__. Do you?

Took me a while to find this, so let me blog it now, for posterity:

How to override comparison operators in Python

Jp Calderone goes into much more detail than just how to write proper "__eq__" and "__ne__" methods for your own Python classes, but it is surprising how well hidden the details of implementing "__eq__" and "__ne__" correctly are. I believe the issue is less critical in Python 3, because "__ne__" defaults to the inverse of "__eq__" when only "__eq__" is implemented. Here is the sample code:
class A(object):
    def __init__(self, foo):
        self.foo = foo
    def __eq__(self, other):
        if isinstance(other, A):
            return self.foo == other.foo
        return NotImplemented
    def __ne__(self, other):
        result = self.__eq__(other)
        if result is NotImplemented:
            return result
        return not result
If you want an immutable object that can be used as a dictionary key, you will want to implement "__hash__", along with "__eq__" and "__ne__".

If you are implementing inequality comparisons - Be Careful - supply the full complement of inequality comparisons and take care when using "NotImplemented". In Python 2, the default implementations of less-than "__lt__", less-than-or-equals "__le__", greater-than "__gt__" and greater-than-or-equals "__ge__" aren't very useful - they compare by address using id(). This default inequality comparison can introduce intermittent bugs in your comparison code. If there is no meaningful comparison between different types or classes, raise a TypeError, so there is no risk of falling back on the terrible default inequality comparison implementation. This problem is fixed in Python 3, where ordering objects that have no meaningful order raises a TypeError.

The fastest and most complete solution is this code from Raymond Hettinger - Python Cookbook recipe 576685: Total ordering class decorator.
def total_ordering(cls):
    'Class decorator that fills-in missing ordering methods'    
    convert = {
        '__lt__': [('__gt__', lambda self, other: other < self),
                   ('__le__', lambda self, other: not other < self),
                   ('__ge__', lambda self, other: not self < other)],
        '__le__': [('__ge__', lambda self, other: other <= self),
                   ('__lt__', lambda self, other: not other <= self),
                   ('__gt__', lambda self, other: not self <= other)],
        '__gt__': [('__lt__', lambda self, other: other > self),
                   ('__ge__', lambda self, other: not other > self),
                   ('__le__', lambda self, other: not self > other)],
        '__ge__': [('__le__', lambda self, other: other >= self),
                   ('__gt__', lambda self, other: not other >= self),
                   ('__lt__', lambda self, other: not self >= other)]
    }
    roots = set(dir(cls)) & set(convert)
    assert roots, 'must define at least one ordering operation: < > <= >='
    root = max(roots)       # prefer __lt__ to __le__ to __gt__ to __ge__
    for opname, opfunc in convert[root]:
        if opname not in roots:
            opfunc.__name__ = opname
            opfunc.__doc__ = getattr(int, opname).__doc__
            setattr(cls, opname, opfunc)
    return cls
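
A usage sketch follows. It assumes a hypothetical Version class (not taken from the recipe or from Jp Calderone's post) and uses functools.total_ordering, the standard-library decorator available since Python 2.7 and 3.2 that serves the same purpose as the recipe above. The class supplies only "__eq__", "__ne__", "__lt__" and "__hash__"; the decorator fills in the remaining ordering methods.

```python
import functools


@functools.total_ordering
class Version(object):
    """Hypothetical value object that can be ordered and used as a dict key."""

    def __init__(self, major, minor):
        self._key = (major, minor)

    def __eq__(self, other):
        if isinstance(other, Version):
            return self._key == other._key
        return NotImplemented

    def __ne__(self, other):
        # Needed on Python 2; Python 3 derives this from __eq__.
        result = self.__eq__(other)
        if result is NotImplemented:
            return result
        return not result

    def __lt__(self, other):
        if isinstance(other, Version):
            return self._key < other._key
        return NotImplemented

    def __hash__(self):
        # Equal objects must hash equal, so hash the key __eq__ compares.
        return hash(self._key)


assert Version(1, 2) <= Version(1, 3)   # __le__ supplied by the decorator
assert Version(2, 0) > Version(1, 9)    # __gt__ supplied by the decorator
assert {Version(1, 0): "first"}[Version(1, 0)] == "first"
```

Returning NotImplemented for foreign types, rather than guessing, lets Python try the other operand's methods before giving up.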
For a lower-tech solution, consider using this Mixin class for inequality comparison special methods [from Fuzzyman: http://www.voidspace.org.uk/python/articles/comparison.shtml]:
class RichComparisonMixin(object):

    def __eq__(self, other):
        raise NotImplementedError("Equality not implemented")

    def __lt__(self, other):
        raise NotImplementedError("Less than not implemented")

    def __ne__(self, other):
        return not self.__eq__(other)

    def __gt__(self, other):
        return not (self.__lt__(other) or self.__eq__(other))

    def __le__(self, other):
        return self.__eq__(other) or self.__lt__(other)

    def __ge__(self, other):
        return not self.__lt__(other)

[Aside & Plug] Let me take this opportunity to give a plug to the book IronPython in Action, by Michael Foord (Fuzzyman) and Christian Muirhead. The publisher, Manning, has a great service to Python Programmers on the book's website:

Python Magic Methods

I was a little disappointed (and surprised) that this great Python magic methods reference didn't give more tips about "__eq__" and "__ne__". But otherwise this is all great material, and all of it new, not just a re-hash of the original on-line Python docs. It is the best summary I have seen; even better than Alex Martelli's Python in a Nutshell.

Thursday, March 26, 2009

Adding Google's google-code-prettify to Blogger


Wanted to be able to add highlighted code snippets, so copied how Python's Guido does it (nice example), as a stop-gap.

Blogger's support for programming code is embarrassing: there is none. So this is at best a partial fix. It makes use of Google's own google-code-prettify JavaScript code, which is easy to use; the whole of its docs fits easily on a single page:


Using "Blogger Control Panel | Layout | Edit HTML": just inside the "<head>" block, add these CSS and JavaScript links:
<link
href='http://google-code-prettify.googlecode.com/svn/trunk/src/prettify.css'
rel='stylesheet' type='text/css'/>
<script
src='http://google-code-prettify.googlecode.com/svn/trunk/src/prettify.js'
type='text/javascript'></script>
Then add "onload='prettyPrint()'" to the opening "<body>" tag:
<body onload='prettyPrint()'>
You will need to HTML-escape the code if it contains any HTML syntax characters (&, ", ', >, <). A good writeup: http://wiki.python.org/moin/EscapingHtml. [Don't be confused like I was: this is different functionality from the xml.sax.saxutils function quoteattr!] You can also use an online HTML escape tool like htmlescape.net.

Then put whatever code you wish between these HTML tags:
<pre class="prettyprint"> ... </pre>
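
The escaping step can also be scripted. The sketch below assumes Python 3, whose standard library provides html.escape; the helper name to_prettyprint_block is made up for this illustration. It escapes a snippet and wraps it in the pre tags shown above, ready to paste into the "Edit Html" tab.

```python
import html


def to_prettyprint_block(code):
    """Escape code for HTML and wrap it in a prettify <pre> block."""
    # html.escape replaces &, < and >; with quote=True (the default)
    # it also replaces double and single quotes.
    escaped = html.escape(code, quote=True)
    return '<pre class="prettyprint">' + escaped + '</pre>'


print(to_prettyprint_block('if a < b and s == "x":\n    pass'))
```

Newlines and leading spaces pass through untouched, so the indentation survives as long as the result stays inside the pre element.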
Here is a tiny snippet of Python highlighted code:
def hello(i):
    print "hello"
    print i
For a whitespace sensitive language like Python, it pays to take a little caution, to compensate for the typical laxness of whitespace preservation in HTML and Blogger's basic editing tools. Type or cut-and-paste the code in the "Edit Html" tab - you will get good and predictable results.

In general, when using Blogger's in-browser WYSIWYG editing tools, check the "Edit Html" tab last, just to make sure that Blogger's editing tools did not strip extra spaces from your Python code.

Wednesday, February 18, 2009

Python's user class implementation, and GIT

Comment to Guido's post "The History of Python: Adding Support for User-defined Classes"
I like dictionaries/namespaces better than objects. For me, your accomplishment with Python's implementation of user classes is on a par with Linus's insight to base a DVCS on a "mere" filesystem that only tracks content based on cryptographic hashes. A single idea, very simple but not _too_ simple, implies everything else, to great success.

Computer science is too important to leave in the hands of people who are not implementors.