A Mathematical Analysis Of The Iranian Elections

larryfreeman · on June 21, 2009

The article is very interesting in my view. It states that a sampling of the numbers does not follow expected random patterns.

But let's talk about the real issue. When you forbid independent monitors, announce results before the votes get counted, and claim absolute certainty in the results before there is any statistical analysis, it is hard to believe in a fair election.

WilliamLP · on June 21, 2009

> The article is very interesting in my view. It states that a sampling of the numbers does not follow expected random patterns.

If you try a whole bunch of fairly arbitrary metrics, e.g. how many end in 7, how many end in 5, how many have consecutive increasing digits, and you mine the data until you find two arbitrary ones that random chance would produce only 5% of the time, does that really tell you anything about the data? I find this article a stunning example of the dangers of mathematical ignorance.
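A rough way to see the size of this effect is to simulate it. The sketch below is not the article's method: it assumes 120 vote totals per election (a made-up figure), draws perfectly fair final digits, and then tries ten after-the-fact metrics of the form "does digit k appear unusually often or rarely?". All names are hypothetical.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

n_counts <- 120   # assumed number of vote totals per simulated election
n_sims   <- 1000  # number of simulated fair elections

flagged <- replicate(n_sims, {
  # Fair data: every final digit is equally likely.
  digits <- sample(0:9, n_counts, replace = TRUE)
  # Ten after-the-fact metrics, one per digit k, each an exact binomial test
  # of "the share of totals ending in k is 10%".
  p_vals <- sapply(0:9, function(k) {
    binom.test(sum(digits == k), n_counts, p = 0.1)$p.value
  })
  # Did at least one metric come out "significant" at the 5% level?
  any(p_vals <= 0.05)
})

# Share of fair elections in which at least one metric looks suspicious.
share_flagged <- mean(flagged)
share_flagged
```

Because the ten counts must add up to the same total, these tests are not independent, which is why the share is estimated by simulation rather than with a closed formula.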

smanek · on June 21, 2009

Go read up on Benford's law. (See http://www.journalofaccountancy.com/Issues/1999/May/nigrini.... for an example of how it was used)

I'm not familiar with the digit-distribution law the article mentioned, but on its face it doesn't seem entirely unreasonable. I'd prefer to see some study applying this same analysis to thousands of other elections, of course.

But, prima facie, using digit distributions to detect fraud is not at all unreasonable.

scscsc · on June 21, 2009

I don't get why you are modded down. Mathematically, the article is a piece of junk (which doesn't mean the elections were fair -- I don't think there's anyone in the democratic parts of Earth who thinks they were fair).

You are of course right: why choose exactly those two metrics? Also, the two metrics are probably not independent, so taking their product is bogus.

The article was written by a 'PhD candidate' in political science. I think he should get his degree. He learned how to convincingly misuse statistics.

jimfl · on June 21, 2009

Weird. The article is invoking Benford's Law without identifying it.

http://en.wikipedia.org/wiki/Benford%27s_law

Here's an article that concludes that the election results do not run afoul of Benford's Law:

http://www.jgc.org/blog/2009/06/benfords-law-and-iranian-ele...

jules · on June 21, 2009

The article is about the last digits of the numbers. Benford's law is about the first digits.

jimfl · on June 21, 2009

http://en.wikipedia.org/wiki/Benford%27s_law#Generalization_...

jules · on June 22, 2009

Right, that's why I said digits. The law pretty much says "uniform" for the last digits, like this article.

jimfl · on June 21, 2009

And an even more in-depth analysis [PDF]:

http://arxiv.org/pdf/0906.2789

itistoday · on June 21, 2009

Would anyone be so kind as to translate that into plain English?

WilliamLP · on June 21, 2009

The real question is: given any set of data, after the fact, what is the chance that a numerology detective can find some metric that will indicate suspicious activity? (Pretty much 100%)

I'm calling bullshit.

earl · on June 21, 2009

It's not unreasonable to expect that the last two digits of the voting counts should be distributed uniformly. Given that we also know that the wiring in human brains keeps us from properly sampling a uniform distribution by hand, the article employed a simple strategy of looking at the last two digits, assuming they were drawn from a uniform distribution, and asking if that were so, what is the probability that this would be our draw?

You gave no reason to be skeptical, and simply illustrated your lack of familiarity with basic tools of probability. Funny that you would below speak of a "stunning example [...] of mathematical ignorance." If you wanted to substantively contribute to the discussion, perhaps you should try explaining why this was an invalid test instead of just labeling the authors as "numerology detectives?"

bilbo0s · on June 21, 2009

To be fair, it should be noted that US and UK elections have far larger variances than the elections in Iran. I suspect it is because incumbent politicians are allowed to engage in 're-districting'. I would need more data to make a definitive statement on the 'why'. In my view though, when Republicans carve out new districts in Mississippi and Louisiana, and when Democrats carve out new districts in Chicago, it is a form of election tampering. A very effective form of election tampering. Only the evidence of having tampered with the elections via re-districting shows up via Benford's law. Again, that part is just a theory. The explanation could be as simple as tamperers stuffing ballot boxes in the US as well.

kvh · on June 21, 2009

I think his point is that if a hundred such political scientists used a hundred different methods of evaluating election results, one of them would be all but guaranteed to find evidence of fraud. The other 99 would not report their results, not having found anything exciting. I'm not convinced there ARE that many tests you could perform, but the general point is interesting...

itistoday · on June 21, 2009

Calling bullshit on your own post might be a smarter idea. Just because it's not intuitive to you doesn't mean it's not mathematically sound.

Your post amounts to "I don't understand statistics or what the hell this article is talking about, therefore I'm going to be suspicious of it."

WilliamLP · on June 21, 2009

It's not that difficult an idea, even if the trend of comment voting here indicates a disturbing ignorance of how statistics can be manipulated.

What would be mathematically sound would be to agree on a standard set of tests, perhaps including Benford's law, digit distribution, and two-digit combinations shown to be popular when humans pick numbers at random. And to do so _BEFORE THE DATA HAVE BEEN AVAILABLE_.

How would I trick people into believing a false conclusion? One way would be to find 30 or so measures of what could suggest suspicious patterns. Then I'd test the data with them, and perhaps there would be 2 which have a 5% chance of occurring purely by chance. Then I say "Aha! Look at this! The chance of these two things happening together is 0.05 * 0.05, or practically impossible!"

Apparently I could fool some pretty smart people this way, since they obviously aren't immune to basic statistical fallacy.

As another poster said, none of this suggests that the data weren't manipulated, but just that the stats in the article, as given, are purely bullshit.

kvh · on June 21, 2009

For instance, the same authors, in a previous paper analyzing Nigerian election results, state the following: "lab experiments indicate that individuals tend to favor small numbers, even when subjects have incentives to properly randomize. Second, individuals underestimate the likelihood of digit repetition in sequences of random integers, so we should observe relatively fewer instances of repeated numbers in manipulated vote tallies." There is no mention of either of these statistics in the Iran analysis.

itistoday · on June 21, 2009

You raise an interesting point, but I'm not sure I buy this argument. What is the chance that your scenario is realistic? In other words, what is the chance that given a set of legitimate vote counts, and a set of 30 known legitimate authenticity tests, that 2 of those tests would report a value of 5%?

It could turn out that if the set of 30 tests are all "good tests", then the chance of your being able to find 2 tests that report low numbers like that approaches 0%, in which case you would simply be wrong.

I am not a statistician, and statistics is notoriously difficult for humans to deal with [1], so until I find an answer to the question above I'm going to maintain a neutral position in this argument. Unless you happen to have a doctorate in statistics, in which case I will take your word for it. ;-)

[1] TED talk: http://www.ted.com/talks/peter_donnelly_shows_how_stats_fool...

jibiki · on June 21, 2009

It's easier to compute the probability that less than 2 tests report a value of 5%. This is:

  (30 choose 0)*(.95^30) + (30 choose 1)*(.95^29)*(.05^1)

  = 0.553542075

So there's about a 50-50 chance (obviously I'm assuming that the tests are independent of each other.)

The relevant Wikipedia article:

http://en.wikipedia.org/wiki/Binomial_distribution
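For anyone who wants to check the arithmetic, here is a minimal sketch using R's built-in binomial functions. It keeps the same assumption as the calculation above (30 independent tests, each with a 5% chance of flagging fair data); the variable names are made up for illustration.

```r
n_tests <- 30    # number of tests tried
alpha   <- 0.05  # chance that a single test flags fair data

# P(0 or 1 tests flag), assuming the tests are independent.
p_fewer_than_two <- pbinom(1, size = n_tests, prob = alpha)

# The same sum written out term by term, as in the comment above.
by_hand <- choose(30, 0) * 0.95^30 + choose(30, 1) * 0.95^29 * 0.05^1

# Complement: P(at least 2 tests flag).
p_at_least_two <- 1 - p_fewer_than_two

c(fewer_than_two = p_fewer_than_two,
  by_hand        = by_hand,
  at_least_two   = p_at_least_two)
```

The first two entries should agree with the 0.553542075 quoted above, so under the independence assumption the chance of at least two flags is a little under one half.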

earl · on June 22, 2009

You are, frankly, speaking nonsense. As you pointed out, the tests will assuredly not be independent of each other.

Second, what you have calculated is... I don't know. You are calculating the cumulative distribution function for a binomial distribution with n=30, p=0.95. How on earth does that relate to the previous post? Are you somehow confusing a p-value -- which is a statement about the minimal alpha for a rejection region such that given a set of data X and a test, we would reject our null hypothesis H_0 -- and a probability? Because they have almost nothing to do with each other.

jibiki · on June 22, 2009

Ok, here's my null hypothesis: the election is not rigged. We are going to assume that this null hypothesis is true. Now, we run 30 tests on the data. What is the probability that at least two tests reject the null hypothesis with 95% confidence? That's the question I'm answering.

earl · on June 22, 2009

I would say no:

Because if the tests are independent, then something is well and truly broken! Your bernoulli CDF is for independent draws.

Further, consider a hypothesis test. You feed it a desired chance of type I error -- say 0.05 -- and it gives you rejection regions for your test statistic. That is, it picks some subset of the domain of the test statistic and labels that as "reject H0" and labels the complement as "accept H0." Claiming that these 30 acceptance / rejection regions will somehow be independent of each other isn't true -- since our tests are accurate, or at least reasonably so, the accept / reject regions will be the same, or nearly so.

Could it be possible to find tests with totally conflicting accept / reject regions? I'd say that's highly unlikely -- the regions would then be complements of each other. What is possible is that you will find paired tests with AR regions that are slightly different. But now you are asking the probability that your test statistic lands in conflicting areas of at least 2 pairs of these test regions. Again, this can be modeled -- try MCMC, perhaps -- but it will not be anything like bernoulli.

Further,

Does this make any sense?
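One way to probe the dependence question without MCMC is plain Monte Carlo simulation. The sketch below is only an illustration under an assumed model: each of the 30 test statistics is a standard normal built from one shared factor plus independent noise, with pairwise correlation rho. That model, and every name in the code, is hypothetical rather than anything from the article.

```r
set.seed(3)  # arbitrary seed for reproducibility

n_tests <- 30
alpha   <- 0.05
n_sims  <- 20000

# Estimate P(at least 2 of the 30 tests reject) when H0 is true and the
# test statistics share a common factor (pairwise correlation rho).
at_least_two <- function(rho) {
  shared <- rnorm(n_sims)                                   # one factor per simulated data set
  noise  <- matrix(rnorm(n_sims * n_tests), nrow = n_sims)  # test-specific noise
  # 'shared' has one entry per row, so it is recycled down each column.
  z <- sqrt(rho) * shared + sqrt(1 - rho) * noise
  # Two-sided rejection at level alpha for each statistic.
  rejects <- abs(z) > qnorm(1 - alpha / 2)
  mean(rowSums(rejects) >= 2)
}

p_two_or_more <- sapply(c(independent = 0, moderate = 0.5, strong = 0.9),
                        at_least_two)
p_two_or_more
```

With rho = 0 the estimate should land near the binomial figure of roughly 0.45; as the tests become more alike the estimate moves away from it, which is the sense in which the independent-draws formula stops applying.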

jibiki · on June 22, 2009

> Claiming that these 30 acceptance / rejection regions will somehow be independent of each other isn't true -- since our tests are accurate, or at least reasonably so, the accept / reject regions will be the same, or nearly so.

Ok, here's an example. Test 1 tests the following hypothesis: the last two digits of polling data are uniformly distributed amongst the values 00, 01, ..., 99. Test 2 tests the following hypothesis: the first digit is distributed according to Benford's law. For large data (like vote totals) these tests are (nearly) independent, no?
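This claim can be checked by simulation, at least for made-up data. The sketch below assumes vote totals spread log-uniformly between 1,000 and 1,000,000 (an assumption chosen so that the first digits follow Benford's law), runs both tests on many simulated data sets, and looks at how the two p-values relate. All names and sizes are hypothetical.

```r
set.seed(2)  # arbitrary seed for reproducibility

benford  <- log10(1 + 1 / (1:9))  # Benford's first-digit probabilities
n_counts <- 600                   # assumed number of vote totals per data set
n_sims   <- 1000

p_vals <- replicate(n_sims, {
  # Assumed model: totals are log-uniform over three full decades.
  counts <- as.integer(floor(10^runif(n_counts, min = 3, max = 6)))
  first  <- as.integer(substr(as.character(counts), 1, 1))  # leading digit
  last2  <- counts %% 100                                   # last two digits
  # Test 2 from the comment: first digit follows Benford's law.
  p_first <- chisq.test(table(factor(first, levels = 1:9)), p = benford)$p.value
  # Test 1 from the comment: last two digits uniform on 00..99.
  p_last  <- chisq.test(table(factor(last2, levels = 0:99)),
                        p = rep(0.01, 100))$p.value
  c(first = p_first, last = p_last)
})

# Correlation between the two tests' p-values across simulated data sets.
p_cor <- cor(p_vals["first", ], p_vals["last", ])
p_cor
```

A correlation close to zero is consistent with the two tests behaving nearly independently for data like this, though it says nothing about other pairs of tests.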

itistoday · on June 22, 2009

Thanks earl for this post, I thought the calculation looked off... but I don't have the statistical background to say much. :-)

itistoday · on June 21, 2009

"less than 2"? Assuming you're doing the math correctly, it means that there's a 50-50 chance that "0 or 1" of the tests "report a value of 5%" (and does that mean less than or equal to 5%?). Which, again, doesn't say very much...

The question was: What is the chance of being able to find exactly 2 tests?

jibiki · on June 21, 2009

> The question was: What is the chance of being able to find exactly 2 tests

Sure thing. That's:

  (30 choose 2) * (.95^28) * (.05^2)

  = 0.258636738

I thought you meant "at least two tests", for obvious reasons. (If 4 tests showed vote rigging, the researchers would still report vote rigging...)
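As a sanity check, the "exactly 2" figure and its neighbours can be read off the binomial probability mass function. This is a small sketch under the same independence assumption; the variable names are illustrative only.

```r
# Number of false alarms among 30 independent tests at the 5% level.
k   <- 0:5
pmf <- dbinom(k, size = 30, prob = 0.05)
names(pmf) <- k
round(pmf, 4)

# P(exactly 2 tests flag); the comment above reports 0.258636738.
p_exactly_two <- dbinom(2, size = 30, prob = 0.05)
p_exactly_two
```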

> (and does that mean less than or equal to 5%?)

Yes.

itistoday · on June 22, 2009

Thanks, looks like I'll have to brush up on my stats to analyze this further... But even without direct knowledge of how this works, shouldn't the "goodness" of the tests play a role in the numbers you're using? It doesn't seem to...
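On whether the "goodness" of a test should enter the calculation: one possible way to look at it is that a valid test rejects a true null hypothesis about 5% of the time no matter how sensitive it is, and sensitivity only matters when the null is false. The sketch below illustrates this with a one-sample t-test on made-up normal data, using a small and a large sample as stand-ins for a "weak" and a "strong" test. It is an illustration only, and tests on discrete data can behave more conservatively.

```r
set.seed(4)  # arbitrary seed for reproducibility

alpha  <- 0.05
n_sims <- 2000

# Share of simulated data sets in which a t-test of H0: mean = 0 rejects.
reject_rate <- function(n_obs, true_mean) {
  mean(replicate(n_sims,
                 t.test(rnorm(n_obs, mean = true_mean))$p.value <= alpha))
}

rates <- rbind(
  null_true  = c(weak_test   = reject_rate(10, 0),
                 strong_test = reject_rate(500, 0)),
  null_false = c(weak_test   = reject_rate(10, 0.2),
                 strong_test = reject_rate(500, 0.2))
)
rates
```

In the `null_true` row both tests should sit near 0.05, which is why only the 5% level appears in the binomial calculation; the `null_false` row is where the two tests differ.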

> I thought you meant "at least two tests", for obvious reasons.

Yes, sorry, I did, I guess you meant to say I should subtract 0.553542075 from 1 to get my answer.

(just saw earl's post above...)

scscsc · on June 21, 2009

Negative. It is intuitive (and wrong). It is not mathematically sound. He does understand statistics. The point he's making is that the author of the article chose the metrics after the act. There are tons of such metrics to use; if you are not honest, you can search for the ones that prove your point.

Dilpil · on June 21, 2009

Is there a link to a more technical version of this somewhere? That might settle a lot of the debates going on (on this website, that is).

earl · on June 21, 2009

You can go straight to their website, but here is an alternate analysis. My apologies for not having time to do both their analyses, I'll try when I get home, but I have to run.

Grab their data: Iran_2009.csv Then: (apparently I can't embed preformatted text. Sorry.) Try this: http://img.skitch.com/20090621-xrmw8yrxaec15jku414ffb2cjq.pn... -or- http://earlh.com/HN/iran.R , and http://earlh.com/HN/Iran_2009.csv

Here's a histogram: http://img.skitch.com/20090621-ewdbtr9q2cw5hyemn64fg3kbqd.pn...

The p-value I get is 0.07685.

NB: there is something weird going on with their data: it doesn't sum properly. http://img.skitch.com/20090621-dx6nqws4u9rc5p3nghsnuy24j3.pn...

---------

So, the way this works is that I look at the last digit in each of the totals, then ask what is the probability that we would observe such a distribution of final digits if the final digit were distributed as discrete uniform on {0, ..., 9}. I use a chi2 goodness of fit test, then calculate the p-value with H0 = the final digits are distributed uniformly. With p = 0.07685, we would reject H0 and say the election was rigged at an alpha level of 0.10 (90% confidence), but not at 0.05 (95% confidence). Take of this what you will, but I'd hope to have a higher confidence level for elections. : shrug :

mattyb · on June 21, 2009

http://news.ycombinator.com/formatdoc

earl · on June 22, 2009

Oh thanks! I suck for missing that...

Code for above:

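   # Load the vote totals (the path is local to my machine)
   d <- read.csv(file='~/stuff/earlh/Iran_2009.csv', header=TRUE, sep=',')
   
   # Last decimal digit of each count
   lastDigit <- function(v){
   	v - 10*floor(v/10)
   }
   
   digits <- lastDigit( c(d$Ahmadinejad, d$Karroubi, d$Mousavi, d$Rezaee))
   # One bin per digit; breaks=10 would lump the 0s and 1s into one bin
   hist(digits, breaks=seq(-0.5, 9.5, by=1))
   
   # chi2 gof, H0: final digits are uniform on 0..9
   # factor() keeps a zero count for any digit that never appears
   tab <- table(factor(digits, levels=0:9))
   n <- length(digits)
   
   model <- chisq.test(x=tab, p=rep(0.1, 10))
   model
   
   # hand generated -- check our work above
   ts <- 0
   for(i in 1:length(tab)){
   	ts <- ts + ( tab[[i]] - 0.1*n)^2 / (0.1*n)
   }
   ts                                   # should match model$statistic
   pchisq(ts, df=9, lower.tail=FALSE)   # should match model$p.value
   qchisq(p=1-0.076, df=9)              # critical value; should be close to ts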

ars · on June 21, 2009

Again, people fall into the fallacy that statistical averages must be followed exactly. It is not so.

Edit: Very surprised at the -3 mod. Are people here really so bad at statistics that they think this?

Example: Just because the average rainfall in a spot is 10 inches, does NOT mean it has to be 10 inches every year. If you have 6 inches one year that is NOT an anomaly. That is perfectly normal. Sure it might be unlikely - but unlikely things still happen! In fact they must happen, just not as often as the more likely things.

So the analysis of the votes, sure it might not match the average, but that doesn't mean a whole lot. It could simply mean that you happened to get the 10% chance.

PS. Not that the vote fraud is in question. It's not, it's obviously fake. But not because of this analysis.

pmjordan · on June 21, 2009

A lot of scientists would be very pleased to have results with a 99.5% confidence. The fact that the results have a 0.5% chance of being real, combined with the fact that Ahmadinejad's votes have a significantly different distribution from the other candidates' makes this a pretty strong argument.

robryan · on June 21, 2009

He wasn't claiming, though, that it definitely pointed to the election being rigged, just pointing out the statistical likelihood based on that method.

TweedHeads · on June 21, 2009

"It's not the people who vote that count. It's the people who count the votes." (Josef Stalin)

Whether the elections are rigged or not, 49% of the population cannot be subjected to the will of the ruling 51%.

We need to change electoral systems, especially "winner takes all" ideologies, the tyranny of the majority.

We need to learn to live together and share power together.

And to separate if we cannot. Nothing should be imposed.

quizbiz · on June 21, 2009

I think calling electoral systems a tyranny of the majority is a gross exaggeration. A multiparty system, one with an empowered opposition where bills require 2/3 rather than merely 1/2 of the votes, encourages power sharing within an electoral system.

In Iran, trust in the system was lost. This is as much about Iran's supreme leader as it is about the election for president.

bilbo0s · on June 21, 2009

This is the first sensible statement on political systems I have heard in a long time.

You are absolutely correct, the problem in Iran is not a lack of democracy, it is tyranny through democratic means. Just because you win an election, does that mean you should be able to do something that the other 49% of the people find intolerable? It depends on whether you believe in democracy as a guiding principle, or in democracy as the letter of the law. The belief in democracy as the letter of the law, is, in my opinion, democracy taken to the extreme. It is the definition proffered by those who would use democracy as a cover for wanting, winning, and wielding power over those who may, legitimately, disagree with them.

Democracy is a beautiful idea. Now, thanks to the efforts of politicians in democracies of all stripes, not just those in Iran, it has become the principal endorser of divisiveness in the world of ideas. It is being used to give violently divisive ideas an air of legitimacy that they do not deserve.

mixmax · on June 21, 2009

In Denmark we have a multi-party system, and the government is a coalition of the parties that are represented in parliament. Often the government is a minority government and relies upon one or more of the other parties to get laws passed. Sometimes coalitions with similar interests appear, and as long as they have a majority they can pass laws.

This system has the advantage that minority views are not only heard but also have votes. And since politics is a game of agreements, coalitions, and "I'll scratch your back if you scratch mine", even small parties have a chance of getting legislation passed if they are willing to compromise on something else. It leads to more consensus-seeking legislation, and an often vocal minority that actually has political influence.