The article is very interesting in my view. It states that a sampling of the numbers do not follow expected random patterns.
But let's talk about the real issue. When you forbid independent monitors, announce results before the votes get counted, and declare absolute certainty in the results before there is a statistical analysis, it is hard to believe in a fair election.
> The article is very interesting in my view. It states that a sampling of the numbers do not follow expected random patterns.
If you try a whole bunch of fairly arbitrary metrics, e.g. how many end in 7, how many end in 5, how many have consecutive increasing digits, and you mine the data into finding two arbitrary ones that random chance would say occur 5% of the time, does that really tell you anything about the data? I find this article a stunning example of the dangers of mathematical ignorance.
I'm not familiar with the digit-distribution law the article mentioned, but on its face it doesn't seem entirely unreasonable. I'd prefer to see some study applying this same analysis to thousands of other elections, of course.
But, prima facie, using digit distributions to detect fraud is not at all unreasonable.
I don't get why you are modded down. Mathematically, the article is a piece of junk (which doesn't mean the elections were fair -- I don't think there's anyone in the democratic parts of Earth who thinks they were fair).
You are of course right, why choose exactly those two metrics? Also, the two metrics are probably not independent, so taking their product is bogus.
The article was written by a 'PhD candidate' in political sciences. I think he should get his degree. He learned how to convincingly misuse statistics.
The real question is: given any set of data, after the fact, what is the chance that a numerology detective can find some metric that will indicate suspicious activity? (Pretty much 100%)
It's not unreasonable to expect that the last two digits of the voting counts should be distributed uniformly. Given that we also know that the wiring in human brains keeps us from properly sampling a uniform distribution by hand, the article employed a simple strategy of looking at the last two digits, assuming they were drawn from a uniform distribution, and asking if that were so, what is the probability that this would be our draw?
You gave no reason to be skeptical, and simply illustrated your lack of familiarity with basic tools of probability. Funny that you would below speak of a "stunning example [...] of mathematical ignorance." If you wanted to substantively contribute to the discussion, perhaps you should try explaining why this was an invalid test instead of just labeling the authors as "numerology detectives?"
To be fair, it should be noted that US and UK elections have far larger variances than the elections in Iran. I suspect it is because incumbent politicians are allowed to engage in 're-districting'. I would need more data to make a definitive statement on the 'why'. In my view though, when Republicans carve out new districts in Mississippi and Louisiana, and when Democrats carve out new districts in Chicago, it is a form of election tampering. A very effective form of election tampering. Only the evidence of having tampered with the elections via re-districting shows up via Benford's law. Again, that part is just a theory. The explanation could be as simple as tamperers stuffing ballot boxes in the US as well.
I think his point is that if a hundred such political scientists used a hundred different methods of evaluating election results, one of them would be guaranteed to find evidence of fraud. The other 99 would not report their results, not having found anything exciting. I'm not convinced there ARE that many tests you could perform, but the general point is interesting...
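To put a rough number on "guaranteed": here is a minimal sketch, assuming every test has a 5% false-positive rate and that the tests are independent of each other (a strong assumption, and real digit tests on the same data may well be correlated). The function name `chance_of_false_alarm` is made up for illustration.

```python
def chance_of_false_alarm(num_tests, alpha=0.05):
    """Probability that at least one of `num_tests` independent tests
    rejects a true null hypothesis, each at significance level `alpha`."""
    # Chance that no test fires is (1 - alpha) ** num_tests; take the complement.
    return 1.0 - (1.0 - alpha) ** num_tests


for k in (1, 30, 100):
    print(k, round(chance_of_false_alarm(k), 4))
```

Under those assumptions the chance is about 79% with 30 tests and about 99% with 100, so "very likely" rather than strictly guaranteed.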
It's not that difficult an idea, even if the trend of comment voting here indicates a disturbing ignorance of how statistics can be manipulated.
What would be mathematically sound would be to agree on a standard set of tests, perhaps including Benford's law, digit distribution, and two-digit combinations shown to be popular when humans pick numbers at random. And to do so _BEFORE THE DATA ARE AVAILABLE_.
How would I trick people into believing a false conclusion? One way would be to find 30 or so measures of what could suggest suspicious patterns. Then I'd test the data with them, and perhaps there would be 2 which have a 5% chance of occurring purely by chance. Then I say "Aha! Look at this! The chance of these two things happening is 0.05 * 0.05, or practically impossible!"
Apparently I could fool some pretty smart people this way, since they obviously aren't immune to basic statistical fallacy.
As another poster said, none of this suggests that the data weren't manipulated, but just that the stats in the article, as given, are purely bullshit.
For instance, the same authors, in a previous paper analyzing Nigerian election results, state the following: "lab experiments indicate that individuals tend to favor small numbers, even when subjects have incentives to properly randomize. Second, individuals underestimate the likelihood of digit repetition in sequences of random integers, so we should observe relatively fewer instances of repeated numbers in manipulated vote tallies."
There is no mention of either of these statistics in the Iran analysis.
You raise an interesting point, but I'm not sure I buy this argument. What is the chance that your scenario is realistic? In other words, what is the chance that given a set of legitimate vote counts, and a set of 30 known legitimate authenticity tests, that 2 of those tests would report a value of 5%?
It could turn out that if the set of 30 tests are all "good tests", then the chance of your being able to find 2 tests that report low numbers like that approaches 0%, in which case you would simply be wrong.
I am not a statistician, and statistics is notoriously difficult for humans to deal with [1], so until I find an answer to the question above I'm going to maintain a neutral position in this argument. Unless you happen to have a doctorate in statistics, in which case I will take your word for it. ;-)
You are, frankly, speaking nonsense. As you pointed out, the tests will assuredly not be independent of each other.
Second, what you have calculated is... I don't know. You are calculating the cumulative distribution function for a binomial distribution with n=30, p=0.95. How on earth does that relate to the previous post? Are you somehow confusing a p-value -- which is a statement about the minimal alpha for a rejection region such that given a set of data X and a test, we would reject our null hypothesis H_0 -- and a probability? Because they have almost nothing to do with each other.
Ok, here's my null hypothesis: the election is not rigged. We are going to assume that this null hypothesis is true. Now, we run 30 tests on the data. What is the probability that at least two tests reject the null hypothesis with 95% confidence? That's the question I'm answering.
Because if the tests are independent, then something is well and truly broken! Your Bernoulli CDF is for independent draws.
Further, consider a hypothesis test. You feed it a desired chance of type I error -- say 0.05 -- and it gives you rejection regions for your test statistic. That is, it picks some subset of the domain of the test statistic and labels that as "reject H0" and labels the complement as "accept H0." Claiming that these 30 acceptance / rejection regions will somehow be independent of each other isn't true -- since our tests are accurate, or at least reasonably so, the accept / reject regions will be the same, or nearly so.
Could it be possible to find tests with totally conflicting accept / reject regions? I'd say that's highly unlikely -- the regions would then be complements of each other. What is possible is that you will find paired tests with AR regions that are slightly different. But now you are asking the probability that your test statistic lands in conflicting areas of at least 2 pairs of these test regions. Again, this can be modeled -- try MCMC, perhaps -- but it will not be anything like Bernoulli.
> Claiming that these 30 acceptance / rejection regions will somehow be independent of each other isn't true -- since our tests are accurate, or at least reasonably so, the accept / reject regions will be the same, or nearly so.
Ok, here's an example. Test 1 tests the following hypothesis: the last two digits of polling data are uniformly distributed amongst the values 00, 01, ..., 99. Test 2 tests the following hypothesis: the first digit is distributed according to Benford's law. For large data (like vote totals) these tests are (nearly) independent, no?
"less than 2"? Assuming you're doing the math correctly, it means that there's a 50-50 chance that "0 or 1" of the tests "report a value of 5%" (and does that mean less than or equal to 5%?). Which, again, doesn't say very much...
The question was: What is the chance of being able to find exactly 2 tests?
Thanks, looks like I'll have to brush up on my stats to analyze this further... But even without direct knowledge of how this works, shouldn't the "goodness" of the tests play a role in the numbers you're using? It doesn't seem to...
> I thought you meant "at least two tests", for obvious reasons.
Yes, sorry, I did, I guess you meant to say I should subtract 0.553542075 from 1 to get my answer.
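For reference, the 0.5535 figure can be reproduced with a plain binomial calculation. This is only a sketch under the assumption disputed elsewhere in this thread: 30 independent tests, each rejecting a true null 5% of the time. The function name `prob_at_least` is mine.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    # Sum the probabilities of 0 .. k-1 successes, then take the complement.
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below


p_two_or_more = prob_at_least(2, 30, 0.05)
print(round(1.0 - p_two_or_more, 4))  # chance of 0 or 1 rejections
print(round(p_two_or_more, 4))        # chance of at least 2 rejections
```

So "at least two" comes out near 45% under independence. If the tests are correlated, as argued above, this number does not apply.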
Negative. It is intuitive (and wrong). It is not mathematically sound. He does understand statistics. The point he's making is that the author of the article chose the metrics after the act. There are tons of such metrics to use; if you are not honest, you can search for the ones that prove your point.
You can go straight to their website, but here is an alternate analysis. My apologies for not having time to do both their analyses, I'll try when I get home, but I have to run.
---------
So, the way this works is that I look at the last digit in each of the totals, then ask what is the probability that we should observe such a distribution of final digits if the final digit were distributed as discrete uniform U[0,9]. I use a chi2 goodness of fit test, then calculate the p-value with H0 = the final digits are distributed uniformly. We would reject H0 and say the election was rigged at the 90% confidence level (alpha = 0.10), but not at the 95% level (alpha = 0.05). Make of this what you will, but I'd hope to have a stricter standard of evidence for elections. : shrug :
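A minimal sketch of that last-digit test, not the code behind the numbers above. It avoids external libraries by comparing the statistic against standard table critical values for 9 degrees of freedom instead of computing an exact p-value. The function names are made up, and the vote totals have to be supplied by the reader.

```python
from collections import Counter

# Upper-tail chi-square critical values for 9 degrees of freedom
# (10 digit categories minus 1), taken from standard tables.
CHI2_CRITICAL_DF9 = {0.10: 14.684, 0.05: 16.919, 0.01: 21.666}


def last_digit_chi2(totals):
    """Chi-square statistic of final digits against a uniform U[0,9]."""
    totals = list(totals)
    if not totals:
        raise ValueError("need at least one total")
    counts = Counter(abs(int(t)) % 10 for t in totals)
    expected = len(totals) / 10  # same expected count for every digit
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))


def rejects_uniform(totals, alpha=0.05):
    """True if H0 (uniform final digits) is rejected at level `alpha`."""
    return last_digit_chi2(totals) > CHI2_CRITICAL_DF9[alpha]
```

Usage would be `rejects_uniform(vote_totals, alpha=0.10)` versus `alpha=0.05`. Keep in mind the chi-square approximation is usually considered shaky when the expected count per digit is below about 5.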
Again, people fall into the fallacy that statistical averages must be followed exactly. It is not so.
Edit: Very surprised at the -3 mod. Are people here really so bad at statistics that they think this?
Example: Just because the average rainfall in a spot is 10 inches does NOT mean it has to be 10 inches every year. If you have 6 inches one year that is NOT an anomaly. That is perfectly normal. Sure it might be unlikely - but unlikely things still happen! In fact they must happen, just not as often as the more likely things.
So the analysis of the votes, sure it might not match the average, but that doesn't mean a whole lot. It could simply mean that you happened to get the 10% chance.
PS. Not that the vote fraud is in question. It's not, it's obviously fake. But not because of this analysis.
A lot of scientists would be very pleased to have results with a 99.5% confidence. The fact that the results have a 0.5% chance of being real, combined with the fact that Ahmadinejad's votes have a significantly different distribution from the other candidates' makes this a pretty strong argument.
I think calling electoral systems a tyranny of the majority is a gross exaggeration. A multiparty system, one with an empowered opposition where bills require 2/3s not merely 1/2, encourages this within an electoral system.
In Iran, trust in the system was lost. This is as much about Iran's supreme leader as it is about the election for president.
This is the first sensible statement on political systems I have heard in a long time.
You are absolutely correct: the problem in Iran is not a lack of democracy, it is tyranny through democratic means. Just because you win an election, does that mean you should be able to do something that the other 49% of the people find intolerable? It depends on whether you believe in democracy as a guiding principle, or in democracy as the letter of the law. The belief in democracy as the letter of the law is, in my opinion, democracy taken to the extreme. It is the definition proffered by those who would use democracy as a cover for wanting, winning, and wielding power over those who may, legitimately, disagree with them.
Democracy is a beautiful idea. Now, thanks to the efforts of politicians in democracies of all stripes, not just those in Iran, it has become the principal endorser of divisiveness in the world of ideas. It is being used to give violently divisive ideas an air of legitimacy that they do not deserve.
In Denmark we have a multi-party system, and the government is a coalition of the parties that are represented in parliament. Often the government is a minority government and relies upon one or more of the other parties to get laws passed. Sometimes coalitions with similar interests appear, and as long as they have a majority they can pass laws.
This system has the advantage that minority views are not only heard but also have votes. And since politics is a game of agreements, coalitions, and "I'll scratch your back if you scratch mine", even small parties have a chance of getting legislation passed if they are willing to compromise on something else. It leads to more consensus-seeking legislation, and an often vocal minority that actually has political influence.