Showing posts with label Causality. Show all posts

Thursday, March 25, 2010

Andrew Gelman said something significant, above my head

A new post by Andrew Gelman, with quite a wordy title:

The single most useful piece of advice I can give you, along with a theory as to why it isn't better known, all embedded in some comments on a recent article that appeared in the Journal of the American College of Cardiology


I would summarize, but I am embarrassed to say I understand very little of it.  In a comment, I made an attempt:
Hello Prof. Gelman,

Are you saying "model building" will naturally lead to applying fruitful transformations that will lead to statistics that do more than only prove "a formally statistically significant difference for a trivial effect"?

(By "model building" you mean the scientist taking responsibility for an abstraction that goes beyond statistics, i.e. causality and value judgments about what is more than a trivial effect.)

I am having trouble translating your description into something I can understand, so I would appreciate your help if I made a hash of things with my little summary.
I will add edits to this as I learn more.

I feel Pearl's causality graphs (directed acyclic graphs, to be specific) are the appropriate format to present any model.  If you want to allow the possibility of "no true zeros", then use multiple models, and "collapse" all the points where you wish to use statistics to show the possibility of "no true zeros", maybe even "collapsing" everything into a single point!  The multiple models you then have will now compete in different uses - based on predictive power, accuracy, ability to calculate meaningful error ranges, cost of collecting data, cost of computation, cost of comparison, ability to predict outcomes from interventions, cost of understanding, etc.
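One way to read "collapse" is as node contraction on a DAG: the points you don't want to treat as independent get merged into a single node, and the arrows touching them are redirected to it. Below is a minimal sketch, assuming that reading. The function names and the example graph (loosely modeled on Pearl's smoking/tar/cancer example) are hypothetical. Collapsing can create a loop when a path leaves the group and comes back in, so the result has to be re-checked to confirm it is still a DAG. Collapsing everything into a single point is always safe.

```python
from graphlib import TopologicalSorter, CycleError  # Python 3.9+

def collapse(edges, group, merged):
    """Merge every node in `group` into one node named `merged`.

    edges maps each node to the set of nodes it points to. Arrows between two
    members of the group disappear; arrows into or out of the group now attach
    to the merged node.
    """
    def rename(node):
        return merged if node in group else node

    out = {}
    for src, dsts in edges.items():
        a = rename(src)
        out.setdefault(a, set())
        for dst in dsts:
            b = rename(dst)
            out.setdefault(b, set())
            if a != b:
                out[a].add(b)
    return out

def is_acyclic(edges):
    """True if the arrows form no loops, i.e. the graph is still a DAG."""
    predecessors = {node: set() for node in edges}
    for src, dsts in edges.items():
        for dst in dsts:
            predecessors.setdefault(dst, set()).add(src)
    try:
        list(TopologicalSorter(predecessors).static_order())
        return True
    except CycleError:
        return False

# Hypothetical example graph.
EDGES = {
    "genotype": {"smoking", "cancer"},
    "smoking": {"tar"},
    "tar": {"cancer"},
    "cancer": set(),
}

print(is_acyclic(EDGES))                                           # True
print(is_acyclic(collapse(EDGES, {"smoking", "cancer"}, "body")))  # False: body -> tar -> body
print(collapse(EDGES, set(EDGES), "everything"))                   # {'everything': set()}
```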

"No true zeros" - see Andrew Gelman's Review Essay "Causality and Statistical Learning" Section Heading: "There are (almost) no true zeroes: difficulties with the research program of learning causal structure" http://www.stat.columbia.edu/~cook/movabletype/archives/2010/03/causality_and_s.html

I also have a hard time understanding all this in isolation from a model of a rational being who works under a motivating sense of responsibility to make a decision about an action (or about remaining inactive). I have an especially hard time with statistical analysis divorced from its usefulness in making a decision. Comprehension is nice, but comprehension that cannot play a part in any morally motivated decision is valueless.

Friday, March 5, 2010

Statistics versus Causality - A predictable impasse

My undignified reply to Andrew Gelman's take on Causality and Statistical Learning


The causality people and the statistics people are talking past each other, your [Andrew Gelman's] 12-page magnum opus included.

Point 0) Sense of responsibility → decision → commitment to action/inaction → action/inaction ⇒ implies you possess a general description of reality, unless you are limiting yourself to a very narrow sphere of responsibility.

Point 1) Statistics cannot be the basis for a general description of reality because of Simpson's Paradox. When it arises, the paradox can only be eliminated by an appeal to plausible causality, directly or indirectly. Also, no statistical test exists that, from data on a static situation, can predict what relationships would prevail if conditions changed -- again, only an appeal to causality can do that. (See Judea Pearl's book Causality, chapter 6.)

Point 2) Causality cannot be the basis for a general description of reality because reality violates the assumption of independent variables needed for effective causal analysis ("no true zeroes", as you put it). Reality doesn't even adhere to the laws of conditional probability [ http://www.stat.columbia.edu/~cook/movabletype/archives/2009/09/the_laws_of_con.html ], much less the structure of independence needed for causal analysis.

Point 3) There are no other contenders for general descriptions of reality besides statistics or causality.

Conclusion) SOL
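To make Point 1 concrete, here is a minimal sketch using the widely reproduced kidney-stone data (Charig et al., 1986). Within each stone size, treatment A succeeds more often, yet B looks better when the two sizes are pooled. The numbers alone cannot say which comparison to trust. Under the causal assumption that stone size influences both which treatment a patient gets and whether it succeeds (stone size as the only confounder), Pearl's back-door adjustment, P(success | do(t)) = Σ_z P(success | t, z) P(z), picks out the stratified answer. The data layout and function names are illustrative.

```python
from fractions import Fraction

# Kidney-stone data: (successes, patients) for each (treatment, stone size).
DATA = {
    ("A", "small"): (81, 87),
    ("A", "large"): (192, 263),
    ("B", "small"): (234, 270),
    ("B", "large"): (55, 80),
}
SIZES = ("small", "large")
TREATMENTS = ("A", "B")

def stratum_rate(t, z):
    """P(success | treatment t, stone size z)."""
    successes, patients = DATA[(t, z)]
    return Fraction(successes, patients)

def pooled_rate(t):
    """P(success | treatment t), ignoring stone size."""
    successes = sum(DATA[(t, z)][0] for z in SIZES)
    patients = sum(DATA[(t, z)][1] for z in SIZES)
    return Fraction(successes, patients)

def adjusted_rate(t):
    """Back-door adjustment: sum over z of P(success | t, z) * P(z)."""
    total = sum(patients for _, patients in DATA.values())
    p_size = {z: Fraction(sum(DATA[(u, z)][1] for u in TREATMENTS), total) for z in SIZES}
    return sum(stratum_rate(t, z) * p_size[z] for z in SIZES)

for t in TREATMENTS:
    per_size = ", ".join(f"{z}: {float(stratum_rate(t, z)):.3f}" for z in SIZES)
    print(f"{t}  {per_size}  pooled: {float(pooled_rate(t)):.3f}  adjusted: {float(adjusted_rate(t)):.3f}")
```

A wins within each stone size, B wins when the sizes are pooled, and A wins again after adjustment (about 0.83 versus 0.78). The adjustment is only as good as the causal assumption behind it.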

So people, under the burden of responsibility, must maintain several models of reality, over smaller and larger domains of applicability: some statistical, some causal, some based on symmetry & curve fitting, some based on the laws of probability, some based on scientific laws, some based on economic laws, some based on rules of thumb, some based on multiple simulation runs, some hybrids. These models compete against each other. The costs are maintenance, data collection, computation, and comparison. The benefit is either correct probabilistic predictions of the consequences of action/inaction, or a demonstration that the range of uncertainty is so broad that it swamps any difference in effect between the decisions.
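The competition described above can be sketched as a net score. The benefit term measures how well each model's probabilistic predictions matched what actually happened; here that is the average log score on observed yes/no outcomes. The cost term stands in for maintenance, data collection, computation, and comparison. Everything in this sketch is hypothetical: the three toy models, their costs, and the outcome data are placeholders, and the log score is one common choice of benefit, not the only one.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class Model:
    name: str
    predict: Callable[[float], float]  # maps an input x to P(outcome = 1)
    cost: float  # hypothetical cost of keeping the model, in score units

def log_score(model: Model, xs: Sequence[float], ys: Sequence[int]) -> float:
    """Average log probability the model assigned to what actually happened."""
    eps = 1e-12  # guard against log(0)
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(model.predict(x), eps), 1 - eps)
        total += math.log(p if y == 1 else 1 - p)
    return total / len(xs)

def rank(models, xs, ys):
    """Sort models by benefit (log score) minus cost, best first."""
    return sorted(models, key=lambda m: log_score(m, xs, ys) - m.cost, reverse=True)

# Hypothetical data and models, for illustration only.
xs = [0.1, 0.4, 0.6, 0.9, 0.2, 0.8]
ys = [0, 0, 1, 1, 0, 1]
models = [
    Model("rule of thumb", lambda x: 0.5, cost=0.0),
    Model("threshold", lambda x: 0.8 if x > 0.5 else 0.2, cost=0.05),
    Model("logistic", lambda x: 1 / (1 + math.exp(-10 * (x - 0.5))), cost=0.2),
]
for m in rank(models, xs, ys):
    print(f"{m.name}: net = {log_score(m, xs, ys) - m.cost:.3f}")
```

With these made-up numbers, the logistic model predicts best. The cheaper threshold model still ranks first once cost is counted, which is the trade-off the paragraph describes.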

And the sense of responsibility is made of shifting sands, and human values and goals are not static.  So you could pay all the costs for a model, just to dispense with it.

But all this *still* can be done for individuals or small groups.  Once you get past 30 members, what is rewarded are techniques for rubber stamping decisions already taken by the politically powerful, under the name of "objective analysis" for political cover.

So "small" decisions can be made quite well, with effort. And "large" decisions are made quite poorly, because evidence of a cold, calculated analysis would be blood on the hands of the politically powerful (besides, the ability to perform such analysis is in opposition to dumb loyalty, which is the most prized character trait of the privileged in-group). But these "large", lousy decisions possess notoriety, and thus human appeal. So a thousand theories, each described over a thousand pages, chase after a relatively small number of very poor decision-making processes.

The consequences of all this may dim my sparkling optimism, so I must leave that as an exercise for others.




Thursday, January 7, 2010

Math - could slack off, and still get good grades

To be honest, what I liked best about math was that I could slack off and still be at or near the top of the class. The harder the math got, the better I did, relative to the rest of the class.

Ultimately, I got a pretty decent undergrad math education from UCLA, and took some graduate-level algebra as an undergrad. I never had to rise above a loping slacker's pace.

Lacked and would have appreciated: category theory (zip zero zilch taught), better differential equations (I got a decent grade with no need to understand the subject, which is lame), continuous distributions and how they relate to the Fourier transform (if this was taught to me, I don't remember it).

It is shocking how much of what I need to learn now was invented or developed after I graduated from college. Before 1993: no distributed revision control, no distributed operating systems, no decent theory of just-in-time compiling, no decent published real-world examples of Bayesian probability, Judea Pearl's theory of causality wasn't invented yet, no decent cryptographic hashes, no Bloom filters, no decent introductory texts on decision analysis. My wife says that is why she doesn't want our daughter to go into computers - you have to keep learning and forget the crap that just doesn't matter anymore. Whatever. I think it is inescapable that you have to keep learning just to keep somebody from eating your lunch.


Frankly, I am glad I didn't do well enough, overall, to attempt a post-graduate degree in math or computers, back in 1993. A lot of what passes in academia today is pretty weak sauce, in applied math and computer science. God bless the IntarWebs. I can just grab the info I need and go. One problem: the tempo has increased.

Tuesday, December 8, 2009

Everything about causal inference in 40 pages

I will go over this with a fine-tooth comb. I am looking forward to Judea Pearl's 2nd edition of _Causality_. I will have to transfer a whole bunch of margin notes!


My take: causality is a hack. The structure of the universe doesn't guarantee anything that would make causality a foolproof technique. But it is very effective, and it is a hack built into human brains. Humans handle causal relationships quite naturally and usually correctly in simple (and not so simple) situations. But, very effective or not, it is still a hack.

Hack or not, you have to use causality to understand the world, and make decisions about the world, and make rational actions inside of the world.


I would make the cycle-free directed graph the _definition_ of causality. Then there is the "do" operator on the graph, where a node is constrained to a particular value, chosen by analysis or chosen by a randomizer (from a range suggested by analysis).
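Below is a minimal sketch of that definition. It assumes a tiny structural causal model in which each node is computed from its parents by a function. A topological order, which exists only when the graph is cycle-free, tells us what to compute first. `do(node, value)` replaces the node's mechanism with a constant, cutting the arrows into it while leaving everything downstream intact. The value can come from analysis or from a randomizer drawing within a range. The price/demand model and the names `evaluate`, `do`, and `randomized_do` are hypothetical.

```python
import random
from graphlib import TopologicalSorter  # Python 3.9+; raises CycleError on loops

# Hypothetical model: node -> (parents, mechanism computing the node from its parents' values)
MODEL = {
    "price": ([], lambda: 10.0),
    "advertising": ([], lambda: 2.0),
    "demand": (["price", "advertising"], lambda price, ad: 100 - 4 * price + 5 * ad),
    "revenue": (["price", "demand"], lambda price, demand: price * demand),
}

def evaluate(model):
    """Compute every node, parents first (possible only for a cycle-free graph)."""
    graph = {node: parents for node, (parents, _) in model.items()}
    values = {}
    for node in TopologicalSorter(graph).static_order():
        parents, mechanism = model[node]
        values[node] = mechanism(*(values[p] for p in parents))
    return values

def do(model, node, value):
    """Return a copy of the model with `node` held at `value`; the arrows into it are cut."""
    new_model = dict(model)
    new_model[node] = ([], lambda: value)
    return new_model

def randomized_do(model, node, low, high, rng):
    """Same as do(), but the value is drawn at random from a range suggested by analysis."""
    return do(model, node, rng.uniform(low, high))

print(evaluate(MODEL)["revenue"])                      # 700.0
print(evaluate(do(MODEL, "demand", 50.0))["revenue"])  # 500.0
print(evaluate(randomized_do(MODEL, "price", 8.0, 12.0, random.Random(0))))
```

Because `do` returns a modified copy, the original graph stays available for comparison with each intervened version.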

In this causal graph, you can have "nodes" that contain, inside them:

** relations from statistical properties of historical data
** relations from statistical properties of simulations, however the simulation is constructed
** relations from scientific laws
** relations from non-scientific laws - like "rules of thumb" and other relations that no one would defend as being laws of nature
** relations suggested by symmetry
** relations suggested by "smoothness" - filling in gaps or smoothing errors with "simple" equations
** relations suggested by naive Bayesian - using the Bayesian equation to add missing information, possibly quite naively
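Two of the relation types in that list can be sketched as node mechanisms that could sit inside any causal graph. The first is a "smoothness" relation that fills a gap in historical data by linear interpolation. The second is a "naive Bayesian" relation that uses Bayes' theorem to supply a missing conditional probability. The data points and probabilities are invented, the function names are hypothetical, and linear interpolation is just one simple choice of smoother.

```python
from bisect import bisect_left

def interpolate(points, x):
    """'Smoothness' relation: fill a gap between known (x, y) points, sorted by x, with a straight line."""
    xs = [px for px, _ in points]
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the observed range; extrapolation is a separate assumption")
    i = bisect_left(xs, x)
    if xs[i] == x:
        return points[i][1]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def bayes(p_b_given_a, p_a, p_b_given_not_a):
    """'Naive Bayesian' relation: recover P(A | B) from P(B | A), P(A), and P(B | not A)."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b

# Made-up numbers: a gap in historical readings, and the chance it rained given a wet sidewalk.
history = [(0, 0.0), (10, 5.0), (20, 8.0)]
print(interpolate(history, 15))        # 6.5
print(round(bayes(0.9, 0.3, 0.2), 3))  # 0.659: P(rain | wet) if P(wet | rain)=0.9, P(rain)=0.3, P(wet | no rain)=0.2
```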

The analysis begins with 3 or more of these causality graphs - each chosen from a different discipline of analysis, and each chosen as the "simplest" thing that could possibly work, to start.

The graphs suggest experiments to carry out, and the graphs can be scrutinized.

The graphs change, more are added, some are deleted, just keeping 3 or more, with examples from different effective domains of analysis (one graph may be suggested by the laws of physics, another by the laws of economics, another strictly statistical from historical data, another from political science storytelling, etc.)

Some would be quite optimistic (borrowing the analysis of the most active participants) and some quite pessimistic (so pessimistic that the model would advise against a majority of current activity).

Each is much simpler than reality, and thus suitable to run against many, many different future eventualities. Keep them simple so they are nimble and responsive tools.

None would really be defended as complete, because each is so simple. The cost of any complexity is considered to be great. Simple models, easily manipulated, quickly manipulated.
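The pruning rule above can be sketched in a few lines: keep three or more simple graphs, drawn from different disciplines, and let the rest go. Each candidate is run against a batch of possible futures. The best model from each discipline survives, and the top `keep` of those are retained. The candidate names, disciplines, scenario distribution, and scoring rule are all hypothetical placeholders; each "model" here is reduced to a single growth forecast.

```python
import random
from statistics import mean

def prune(candidates, scenarios, score, keep=3):
    """Keep the best model from each discipline, then the `keep` best of those.

    candidates: list of (name, discipline, model) tuples
    scenarios:  possible futures to run every model against
    score:      function (model, scenario) -> number, higher is better
    """
    best = {}  # discipline -> (average score, name)
    for name, discipline, model in candidates:
        s = mean(score(model, sc) for sc in scenarios)
        if discipline not in best or s > best[discipline][0]:
            best[discipline] = (s, name)
    if len(best) < keep:
        raise ValueError("too few distinct disciplines; add more candidate graphs")
    ranked = sorted(best.values(), reverse=True)
    return [name for _, name in ranked[:keep]]

# Hypothetical candidates: each "model" is just a forecast of next year's growth rate (%).
candidates = [
    ("growth stays at 2%", "economics", 2.0),
    ("boom", "economics", 4.0),
    ("energy limits", "physics", 1.0),
    ("historical average", "statistics", 2.5),
    ("collapse", "political storytelling", -1.0),
]
rng = random.Random(1)
futures = [rng.gauss(2.0, 1.0) for _ in range(1000)]  # made-up distribution of eventualities
print(prune(candidates, futures, score=lambda forecast, actual: -abs(forecast - actual)))
```

Ranking purely on score throws away the pessimistic "collapse" graph. The post argues for deliberately keeping both optimistic and pessimistic models, so a version closer to that idea would reserve slots for them rather than rank on accuracy alone.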

We would have a personal standard of rationality, and a lower standard for semi-rational actions in a group (economic/social groups are demonstrably loath to hold themselves to the highest disciplines of rationality). For the personal standard of rationality - think Socrates. For the lower standard of group semi-rationality - think Rush Limbaugh and Glenn Beck. Neither is better or worse than the other; each is to be used in its own domain. The real-world economic/social group would immediately feel paralyzing anxiety when presented with arguments from people who hold themselves to the highest disciplines of rationality - so it is irrational to present such arguments to them.

Everything about causal inference in 40 pages:
"Judea Pearl describes his new article Causal inference in statistics: An Overview as 'a recent submission to Statistics Survey which condenses everything I know about causality in only 40 pages.' That seemed like a bold claim, but after reading it I'm sold. I don't come from Pearl's 'camp' per se, but I found this a really impressive overview of his approach to causation. His overtures to folks like me who use the potential outcomes framework were much appreciated, although it is clear throughout that there is still intense debate on some of the issues. The bottom line: if you've ever wondered what the structural equation modeling approach to causal inference is all about, this is your one-stop, must-read introduction (and an insightful, engaging, and thorough one at that)."

Wednesday, July 8, 2009

This started as a posting about Statistical Causality...

Interesting blog posts back and forth between Don Rubin, Judea Pearl, and Andrew Gelman. I cannot understand the original point of contention between Don Rubin and Judea Pearl. I have no idea what "M-bias" or "controlling for pre-treatment versus post-treatment variables" mean. http://www.stat.columbia.edu/~cook/movabletype/archives/2009/07/disputes_about.html


Happily, the fundamental point of disagreement seems to be getting clear with a comment by Philip Dawid (who offers a very nice on-line document "Principles of Statistical Causality" which covers a lot of concepts in only 94 pages). This is a very clear summary of what is meant by "causal inference":
Like Pearl, I like to think of "causal inference" as the task of inferring what would happen under a hypothetical intervention, say F_E = e, that sets the value of the exposure E at e, when the data available are collected, not under the target "interventional regime", but under some different "observational regime". We could code this regime as F_E = idle. ... It should be obvious that, even to begin to think about the task of using data collected under one regime to infer about the properties of another, we need to make (and should attempt to justify!) assumptions as to how the regimes are related.
(Emphasis on the turn of phrase that caught Andrew Gelman's eye.)

What is the nature of the discipline of "statistical causal modeling"? Beats me. I have the idea of several kinds of creatures: several DAGs (gaps filled in with plausible causal contributors), several multi-dimensional collected sample distributions (gaps filled in with smoothing or naive-Bayesian techniques), and several simulations based on plausible natural or un-natural laws (the simulations providing multi-dimensional distributions). Creatures of all three types are tested against historical data, future data, and intervention experiments. They fight against each other, and only a few remain.

I have the idea of defining a distribution as the opportunity to open a sample portal to an alternate universe, every portal drawing on the same sample stream. You can open sample portals at a certain rate, so, if each open portal keeps delivering samples at a steady rate, the total number of samples available to you grows as the square of time. You want all the sample portals to have the same, correct driving distribution. Now, obviously, you cannot actually create sample portals to alternate universes. So instead you create a number of fake sample portals. Each has the possibility of one or more failure modes. Taken all together, these form a pessimistic analysis.

Human action is inherently optimistic, so take the opportunity to make explicit the biases of personality, and the real-world population studies of which life outcomes are consistent with living a happy and fulfilled life. The third leg of the stool is a model of human effectiveness:
  • rationality,
  • choice,
  • free will,
  • changes in capability,
  • changes in habitual action,
  • goal-directed action,
  • effectiveness,
  • consciousness,
  • self-knowledge/introspection,
  • knowledge of the world,
  • consistency between behavior and professed mental abstractions of desired behavior,
  • and morality -
leading to the outcome of a happy and fulfilled life. That is what I got right now. The way to study how "human action is inherently optimistic" is with happiness research (covered in Daniel Gilbert's Stumbling On Happiness, and the sections on happy outcomes in Dan Ariely's Predictably Irrational). The model of human effectiveness is based on there being only a few chances over time for exercising free-will. We greatly limit the role of free-will, because human behavior is better explained by:
  • habitual actions,
  • personality,
  • IQ,
  • daily exercised capability,
  • daily exercised responses,
  • environment,
  • etc.
We expect to see meaningful change caused by free-will events over the course of 10 years - just because there are so few meaningful free-will events during those many years.


Julian Jaynes argues in The Origin of Consciousness in the Breakdown of the Bicameral Mind that consciousness only came into existence 3000 years ago, from the stresses on growing populations searching and competing for sustaining resources. In the same spirit, I don't think free-will has been a capability of all humans since Homo sapiens originated 200,000 years ago. I think meaningful free-will is also a recent phenomenon. And I don't think it is expressed in all people - it only gets exercised under a peculiar stress of personality and environment, when the capability exists.

I reject the idea that free-will is simply random behavior, or the absence of a mechanism for deterministically predicting behavior. It cannot be meaningfully separated from rationality, goal-directed action, effectiveness, consciousness, self-knowledge/introspection, consistency between behavior and professed mental abstractions of desired behavior, morality, and other issues. The reason we need to consider time periods on the scale of a decade is that free-will events are the residue of actions and behaviors that cannot be explained by simpler means. Because it is a residue, we are only interested if it shows an overriding consistency and progression - we don't want to give the label of "free-will" to an odd-ball collection of junk. That's enough for now.

Added: Andrew Gelman explores the issues further in "More on Pearl's and Rubin's frameworks for causal inference". He describes using Minimal Rubin, Full Rubin, Minimal Pearl, and Full Pearl as techniques for statistical causality, and adds this note on the benefit of Rubin's approach:
Be explicit about data collection. For example, if you're interested in the effect of inflation on unemployment, don't just talk about using inflation as a treatment; instead, specify specific treatments you might consider (adding these to the graphs, in keeping with Pearl's principles). This also goes for missing data. ...
I don't understand the section on "Controlling for intermediate outcomes". Pearl then contributes a long and difficult comment. Gelman and Pearl argue over this: "the correct thing to do is to ignore the subgroup identity" - I just don't understand this at all.
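Dawid's point, quoted earlier in this post, can be illustrated with a toy simulation. In the observational regime (F_E = idle), people end up exposed or not on their own. In an interventional regime (F_E = e), the exposure E is set for everyone. When a hidden common cause U drives both E and the outcome Y, the two regimes disagree. Only an assumption about how the regimes are related (here, the causal structure built into the simulation) lets one be inferred from the other. The variables, coefficients, and the function name `draw` are invented for illustration.

```python
import random
from statistics import mean

def draw(rng, regime):
    """One unit. regime is 'idle' (E arises naturally) or a value e, meaning F_E = e."""
    u = rng.random() < 0.5                                  # hidden common cause
    if regime == "idle":
        e = 1 if rng.random() < (0.8 if u else 0.2) else 0  # U pushes people toward exposure
    else:
        e = regime                                          # exposure set by intervention
    y = 1.0 * e + 2.0 * u + rng.gauss(0, 1)                 # true effect of E on Y is 1.0
    return e, y

rng = random.Random(42)
observational = [draw(rng, "idle") for _ in range(20000)]
naive = (mean(y for e, y in observational if e == 1)
         - mean(y for e, y in observational if e == 0))
interventional = (mean(draw(rng, 1)[1] for _ in range(20000))
                  - mean(draw(rng, 0)[1] for _ in range(20000)))
print(f"idle-regime contrast: {naive:.2f}, F_E = e contrast: {interventional:.2f}")
```

With these made-up numbers, the observational contrast comes out near 2.2, while the interventional contrast recovers the built-in effect of about 1.0.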

Wednesday, June 10, 2009

[Public Service Message] Jury Nullification: Please inform yourself about this fundamental right


First a public service message... Please inform yourself of the right of Jury Nullification. This is a fundamental right you possess as a juror, one that doesn't get talked about much. If the law is unjust, you have the right to acquit - to say "Not Guilty" - even when the defendant has violated the letter of the law.


You can read more about Jury Nullification here: http://www.law.umkc.edu/faculty/projects/ftrials/zenger/nullification.htmlt and http://en.wikipedia.org/wiki/Jury_nullification

I will not lie - exercising your right to Jury Nullification takes guts. I have never yet practiced jury nullification. In my jury duty career, the most I have had the guts to do is admit to the prosecuting attorney during jury selection that I could not find any defendant guilty of a drug offense, even if the defendant was a drug dealer (which was the nature of that trial). I was excused, and all I did was make jury selection for that particular drug trial slightly harder to complete. I felt a little conspicuous back in the jury duty waiting room when I returned, because the majority of people in my area are very conservative, and I don't welcome people pointing at me behind my back. But in the end I didn't feel too bad, because through the whole process I was simply being completely truthful. If I were the twelfth juror, I am not sure I would have the guts to be the lone holdout and have the whole weight of the trial proceedings on my shoulders. So what I did was a compromise: being truthful to the prosecuting attorney during jury selection. But no matter what your personal values, it is important not to throw away your right to Jury Nullification without some consideration first. After consideration, you can proceed as you see fit. OK, done with the public service message...

The book I would take if I had jury duty today would be Judea Pearl's _Causality: Models, Reasoning, and Inference_, because I have to review the book again (and review the notes I wrote in the margins of my own copy). It is a very difficult book - I was only able to read about six pages a day. The book is about causality -- effect following cause. We take the concept for granted, but it has been on shaky ground since David Hume in the 18th century. But now, because of researchers like Judea Pearl, it is on solid footing.

Here is the 5-minute summary. Write down all the possible causes and effects on a piece of paper - we will call these "points". Draw arrows from the things you think are direct causes to the things you think are their direct effects. Now you have a bunch of points, and arrows between those points. Look for loops: search for any loop that can be made by tracing your finger from point to point, always going from arrow tail to arrow head. Are there zero loops? If so - good! - you have a description of causality, and you can use this diagram to understand the causal effects and interactions.

Now, how can you tell whether your diagram is, in fact, representative of the real world? Well, each point of the graph has a "DO operation" linked with a manipulation you can do in the real world. Manipulations can be like fixing different variables in a scientific experiment, or like carefully isolating the object of interest from anything that might mess things up.

Consider these points: (A) lawn sprinkler near sidewalk; (B) rain cloud over sidewalk; (C) wetness of sidewalk; (D) slipperiness of sidewalk; (E) number of people falling on slippery sidewalk. Consider these arrows: (A -> C); (B -> C); (C -> D -> E). Draw it, run your finger over the graph, and you cannot find any loops - good! - ready to begin. We will try the DO operation on (C).
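The finger-tracing loop check is a cycle search in a directed graph. Below is a minimal sketch using the sidewalk points and arrows, with letters as in the text. It runs a depth-first search and reports a loop if it ever reaches a point that is still on the current path. The function name `find_loop` is hypothetical, and the extra arrow E -> C in the usage line is made up only to show a loop being caught.

```python
EDGES = {
    "A": ["C"],  # lawn sprinkler -> wetness
    "B": ["C"],  # rain cloud -> wetness
    "C": ["D"],  # wetness -> slipperiness
    "D": ["E"],  # slipperiness -> people falling
    "E": [],
}

def find_loop(edges):
    """Return one loop as a list of points, or None if the graph has no loops."""
    WHITE, GREY, BLACK = 0, 1, 2  # not yet visited, on the current path, finished
    color = {node: WHITE for node in edges}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in edges.get(node, []):
            if color.get(nxt, WHITE) == GREY:  # back to a point on the current path: a loop
                return path[path.index(nxt):] + [nxt]
            if color.get(nxt, WHITE) == WHITE:
                loop = visit(nxt)
                if loop:
                    return loop
        path.pop()
        color[node] = BLACK
        return None

    for start in edges:
        if color[start] == WHITE:
            loop = visit(start)
            if loop:
                return loop
    return None

print(find_loop(EDGES))                  # None: no loops, ready to begin
print(find_loop({**EDGES, "E": ["C"]}))  # ['C', 'D', 'E', 'C']
```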
On the graph, I just set "wetness" to "very wet" or "very dry" or whatever, and make a prediction. In the real world, I can either water down the sidewalk with a garden hose, or shield the sidewalk from the sprinklers with a tarp, shield it from rain with a canopy, and dry it with a towel and then a hairdryer. If the graph's predictions match the number of people actually falling on the slippery sidewalk, I feel good that I have a graph representative of the real world. (This is a dumb example, because humans can handle the causal analysis of this situation with no problem. It only gets interesting as the graph gets more complicated.)

Check the book out of your nearest university library (there is a 2nd edition coming out soon, but I wouldn't worry about it - no major revisions). Read only the last chapter, which is the text and slides from an informal lecture Judea Pearl gave. It is enough. The rest of the book is very difficult, or at least it was for me. I will probably have to read it two or three more times, on top of already reading it once carefully, to fully understand everything developed. My copy is heavily marked up, and I think I made a lot of mistakes, because I needed to think through it all more. So you probably don't want to borrow my marked-up copy (not that I would actually lend it to you - the guy who lends out my books is sick today :P )
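The graph side of the sidewalk comparison can be sketched as a small simulation. All of the probabilities below are invented for illustration; none come from the post or the book. `do_wetness` forces point (C) to a value regardless of what the sprinkler and the rain cloud are doing. That is what the garden hose, or the tarp-canopy-towel-hairdryer routine, does in the real world. The function names are hypothetical.

```python
import random

def simulate_day(rng, do_wetness=None):
    """One day on the sidewalk graph A -> C <- B, C -> D -> E (made-up probabilities)."""
    sprinkler = rng.random() < 0.3         # (A)
    rain = rng.random() < 0.2              # (B)
    if do_wetness is None:
        wet = sprinkler or rain            # (C) determined by its parents
    else:
        wet = do_wetness                   # (C) set by intervention; arrows A -> C, B -> C cut
    slippery = wet and rng.random() < 0.8  # (D)
    # (E): each of 100 walkers falls with a higher chance when the sidewalk is slippery
    return sum(rng.random() < (0.05 if slippery else 0.005) for _ in range(100))

def expected_falls(days, rng, do_wetness=None):
    """Average number of people falling per day, with or without the DO operation on (C)."""
    return sum(simulate_day(rng, do_wetness) for _ in range(days)) / days

rng = random.Random(0)
print("do(C = wet):", expected_falls(2000, rng, do_wetness=True))
print("do(C = dry):", expected_falls(2000, rng, do_wetness=False))
print("no intervention:", expected_falls(2000, rng))
```

With these numbers, do(C = wet) should give about 100 × (0.8 × 0.05 + 0.2 × 0.005) = 4.1 falls a day, and do(C = dry) about 0.5. Comparing such predictions against hose-and-tarp experiments is the check the post describes.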
