Saturday, October 30, 2010

statistical inference and racism

Up-front disclaimer: None of the following should be construed as supporting racism. If anything, my point is that racism is as shaky intellectually as it is morally. It's simply wrong to unfairly treat an individual based on indirect inferences from their similarities to a particular race (or ethnicity, gender, religion, etc.). I also apologize up-front for offending statisticians.

It occurred to me, as it doubtless has to others, that racism has some connections to faulty statistical inferences. Of course, I don't mean that the typical racist is computing actual statistical regressions, but that part of their error consists of an "intuitive" misunderstanding that would appear unsound if it were quantified and written out. And naturally, the inability to reason correctly about probability and statistics is present in any untrained person, not just the racist one.

the instrumental variable

I've heard people aver that, since racial discrimination has continued to decrease, race should no longer be considered a determining factor in people's lives. While this is a happy thought, it naively misses a crucial aspect: (direct) discrimination is not the only way that racism acts.

Assume a statistical model or equation in which race and discrimination aren't present. It could be for any personal outcome, like educational attainment, as long as race isn't included among the model's variables. Essentially, the equation asserts no obvious relevance or causal link between race and the outcome. Discrimination doesn't even enter the equation, as we posited.

Now run a thought experiment on our model. Suppose we apply the model to randomly selected people, i.e. we grab the info for each person and then calculate (not measure) the outcome with the model. Further suppose that we also record each person's race. Then we find, as in reality, that when we partition the calculated outcomes by race, the distributions differ drastically and consistently. How can this be? Race wasn't one of our variables, because without outright discrimination it had no direct mechanism of causation.

The answer is that, in this hypothetical model and thought experiment, race could still act as an instrumental variable. Roughly speaking, race is correlated with the outcome through the model's variables. For instance, in a model of personal income, it's highly reasonable to have the parents' income as a variable. Although the model may leave out "race", race can nevertheless affect personal income (the outcome) as long as it affects the parents' income (a modeled variable). Thus race can be an indirect cause of an outcome despite never appearing as a "cause" in the model.
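
To make the thought experiment concrete, here is a minimal simulation sketch. It is not part of the original argument, and every name, number, and coefficient in it is invented for illustration; the only point is that the hypothetical income_model() never sees race, yet the calculated outcomes still split along racial lines because race influences one of the modeled inputs (the parents' income).

    import random

    random.seed(0)

    def income_model(parents_income, education_years):
        # Hypothetical model of personal income: race is NOT a variable here.
        return 5000 + 0.4 * parents_income + 2000 * education_years

    population = []
    for _ in range(10000):
        race = random.choice(["A", "B"])  # recorded, but never fed to the model
        # Illustrative assumption: past barriers left group B's parents
        # with a lower average income.
        mean_parents_income = 60000 if race == "A" else 40000
        parents_income = random.gauss(mean_parents_income, 10000)
        education_years = random.gauss(14, 2)
        population.append((race, income_model(parents_income, education_years)))

    for group in ("A", "B"):
        outcomes = [o for r, o in population if r == group]
        print(group, round(sum(outcomes) / len(outcomes)))
    # The group averages differ even though income_model() never mentions race.

Partition the calculated outcomes by the recorded race and the averages come out different, exactly as in the thought experiment, even though deleting "race" from the model's variables changed nothing.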

Moreover, race is likely to be an instrumental variable for an array of models, and all the models are likely to be related too. Bad (or missing) education tends to lead to low-paying employment or unemployment, which tends to lead to poverty, which tends to lead to crime, which tends to lead to low property values and taxes, which tends to lead to low school funding, which tends to lead back to bad education. Regardless of whether a blatant racial barrier is presently at work, the long-lasting "fingerprints" of past racial barriers may still be active (e.g., systematic segregation sometime in the past that restricted access to the same set of opportunities).

the overeager Bayesian

Stereotypes are simplifications by nature. Sometimes a stereotype can be a helpful mental shortcut, but often it doesn't accurately reflect a "typical" sample from the population. This is notoriously true of racial stereotypes that highlight the worst representatives.

Compared to a proper Bayesian analysis, "reasoning" by a stereotype is overeager to apply shaky conditional probabilities. Consider a very hypothetical population with only two races, each making up 50% of the population. 10% of the entire population has an undesirable characteristic (left unnamed). Of the people who have the undesirable characteristic, 75% are of race A. Given that someone in this population is of race A, what is the probability that they have the undesirable characteristic?

When it comes to a stereotype for the undesirable characteristic, chances are the stereotype attaches to race A, since race A is overrepresented among those who have it. So the stereotype would tend to override the right Bayesian answer, 15%, with a much greater probability that overlooks the facts that 1) only 10% of the entire population has the undesirable characteristic at all and 2) therefore 85% of race A does not have it.
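
As a sanity check, the 15% falls straight out of Bayes' theorem with the numbers given above (a throwaway calculation, nothing more):

    # P(char)          = 0.10  (10% of everyone has the characteristic)
    # P(race A)        = 0.50  (race A is half the population)
    # P(race A | char) = 0.75  (75% of those with the characteristic are race A)
    p_char, p_a, p_a_given_char = 0.10, 0.50, 0.75

    # Bayes' theorem: P(char | A) = P(A | char) * P(char) / P(A)
    p_char_given_a = p_a_given_char * p_char / p_a
    print(p_char_given_a)  # 0.15, i.e. 85% of race A does not have it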

Friday, October 29, 2010

peeve no. 261 is Obi and programming language mentality

The Force can have a strong influence on the weak-minded. (--Obi-Wan)

Similarly, a programming language can have a strong influence on weak-minded programmers. If your programming language "changed your life", "made you a better coder", "expanded your mind", etc., then I suggest that you may be going about this programming thing the wrong way. If a lack of language support for OO results in you producing a tangled web of dependencies among procedural functions, or easy access to global/file scope results in you spraying program state everywhere, then the programming language is your crutch. It's the guide-rail for your brain.

I'm glad that you found a programming language that you enjoy, but consider me unimpressed by rather impractical claims about the ways that it alters your consciousness.