Russian Election Interference
Voting Precinct-Level Analysis of the 2008 Russian Presidential Election
In this paper we consider a number of statistical methods for detecting fraud in an election that we were unable to observe and may know little about, so that it can be studied long after it occurs. Electoral fraud can take many forms. The forms we are capable of detecting, and thus interested in measuring, are those in which final vote tallies do not reflect the number of valid ballots submitted.
Some examples of this kind of fraudulent activity are ballot box stuffing or falsification of poll returns. Examples of external influence on an election that would not be detectable by our statistical methods include: gerrymandering, disenfranchisement, vote buying, and campaign promises.
We choose to focus on the recent 2008 Russian Presidential Election due to the interesting political situation it presents as a previous two-term president selects his successor and proceeds to stay in a position of power as Prime Minister. Given the widespread culture of corruption in Russia, and accusations that Putin has been particularly prone to fraud, we strongly expect that there will be fraud in this dataset for our methods to detect. We have also been provided with rayon-level data regarding Russian national elections between 1995 and 2004. As this data does not have the same resolution as that for the 2008 election nor does it contain any labels uniquely identifying the rayons, we are unable to compare many of our methods or data across datasets.
The significance of electoral fraud
The intent of electoral fraud is to substitute the public opinion with one’s own and have it carry the force of law, which is of course directly contrary to the intent of an electoral system. The purpose of instituting a voting process is to have a defined means by which a populace can inform the government of the preferred course of action. These voting procedures are quite varied, even within a given country, and basing a system of fraud detection upon an understanding of the voting system would inevitably be quite complex and likely require very close experience with its operations. Hence, we desire approaches that are agnostic to
the actual voting procedures and depend solely upon the final tallies. However, an understanding of the political climate of a country can help explain why the fraud is widespread or localized or why it is blatant or subtle in a given region. Fraud is committed when it is believed that the voting process will lead to an adverse result and identifying where it was committed can help indicate the influence of the party behind the fraud.
A young democracy, such as a former member of the Soviet Bloc, will likely have very distinct perceptions of fraud in comparison to established democracies like the United States and Great Britain. This is due to a variety of reasons. The country may have had periodic votes whose results were completely fraudulent in order to bolster the perceived power of the government. Elections may have been performed with a clear message of a “correct” way to vote or the votes may have even been identifiable so as to locate dissidents. In the transition to democratic voting, it is possible that the distinction between the old and new systems of voting was not made apparent, or the claim of more “honest” elections was distrusted due to historical precedent. In such a country, it is likely that an individual would have very low expectations that their vote would actually be counted and accusations of fraud would not come as surprising. At the same time, the officials in charge of the election in the new democracy may very well be the same as those who ran the fraudulent elections previously and, if committing fraud, would likely resort to tactics that were acceptable in the old regime, such as forging poll returns. These officials may also be ignorant of pre election polling techniques that can be very accurate in detecting fraud once given sufficient elections for calibration; however, such techniques require substantial resources to perform and cannot be done for elections that have already occurred. Additionally, newer democracies may experience substantial logistical issues, intentional or accidental, that prevent large numbers of people from being able to vote. As this country went through a recent change of government, it is possible that evidence of fraud would be grounds for a revolution; the distinction is largely due to the relative strength of the government when the fraud is committed.
An established democracy, such as the United States, is more likely to have the expectation that an individual’s vote will be counted ingrained into the public consciousness, yet will be tolerant to other forms of voter manipulation. While a young democracy may have the expectation of fraud in its elections, the concept of a falsified election becomes progressively more repugnant as a democracy becomes more established. While this does not protect against fraud, it does provide a political incentive to those committing fraud to use more subtle methods than falsifying returns or stuffing ballot boxes. Such a country will likely have anonymity guarantees, in order to prevent the possibility of retaliation for voting in a particular manner. It will also have established political parties, which may collude to hide evidence of fraud both sides are committing. An established democracy is also less likely to experience serious logistical issues regarding the election due to decades of experience running polls. Any fraud in an established democracy is likely to be sophisticated and highly targeted for the legal and political situation of that country; we therefore do not expect our statistical methods to be of much use in detecting fraud in these countries.
The ability to identify a fraudulent election has far-reaching political effects. Not only would one be able to identify governments that are misrepresenting themselves to their nation and the international community but the mere knowledge that the capability to detect fraud exists would serve as a deterrent to some degree. Approaches such as ballot stuffing would yield to more subtle approaches; while a politician buying votes by paving the roads in front of people’s homes with public money may not be more palatable, such manipulation does not strip the voter of the actual ability to choose.
In the following sections, we will be pursuing several methods in an attempt to identify fraudulent data in the poll returns from the 2008 Russian Presidential Election. Those methods are Turnout Histogram Analysis, Linear Regression Analysis, and Digit Analysis.
In this section we analyze the distribution of turnout in each oblast. We can detect several types of fraud via this analysis because for a homogeneous distribution of people, we should expect a Gaussian distribution of turnout. This Gaussian can be centered at different locations depending upon the characteristics of the oblast, but we take strictly non-Gaussian turnout to be an indication of either inhomogeneity or outright fraud. We expect more populous and more diverse oblasts to have wider Gaussians and more homogeneous and smaller oblasts to have narrow Gaussians.
We will be specifically looking at ratios of valid ballots to registered voters for a precinct in each oblast, with each data point being a particular precinct. We subdivide the interval zero to one into one hundred bins, and plot the number of precincts in each bin, creating a histogram for each oblast.
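The binning step just described can be sketched as follows. This is a minimal illustration rather than the code used for the paper: the input layout (pairs of valid ballots and registered voters) and the function name `turnout_histogram` are assumptions, as is the choice to skip precincts with no registered voters.

```python
def turnout_histogram(precincts, n_bins=100):
    """Count precincts per turnout bin on the interval [0, 1].

    `precincts` is an iterable of (valid_ballots, registered_voters)
    integer pairs for one oblast. Rows with no registered voters are
    skipped (an assumption; the text does not say how they were handled).
    """
    counts = [0] * n_bins
    for valid, registered in precincts:
        if registered <= 0:
            continue
        # Integer arithmetic avoids floating-point edge effects at bin
        # boundaries; turnout of 100% (or more) goes into the last bin.
        index = min(valid * n_bins // registered, n_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical toy data, not taken from the election returns.
example = [(450, 1000), (455, 1000), (700, 1000), (1000, 1000), (0, 0)]
hist = turnout_histogram(example)
print(hist[45], hist[70], hist[99])
```

Each oblast gets its own list of counts, which can then be plotted or passed to further checks.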
We then divide the histograms into two groups: those that seem “legitimate” and those that seem “suspicious”. We place a histogram in the suspicious group if its distribution falls into one of five categories: Egregiously Low, Ascending, Double Peak, Egregiously High and Narrow, or High Tail/Bump.
We categorize a histogram into the Egregiously Low category if there is a sharp peak at a turnout percentage near zero, or if the center of mass of the distribution is unusually close to zero. The type of fraud this could indicate is the outright discarding of ballots, since the quantity we are looking at is valid ballots divided by the number of registered voters. Low statistics could mimic this distribution just by chance, so we must be careful to check the population size. Since the voting so heavily favors Medvedev in all oblasts, we don’t expect to see any histograms in this category: tossing out his challengers’ votes would in many cases not shift the mass by a statistically significant amount, and tossing out Medvedev’s votes would be too obviously fraudulent and too easily caught for the pollsters to attempt.
We categorize a histogram into the Ascending category if the number of precincts reporting a turnout within a particular bin seems to steadily increase with turnout. The type of fraud that this would possibly suggest is a distributed effort across many different precincts to add extra ballots for voters who didn’t actually show up to vote. This type of distribution could also be mimicked by non-homogeneous political activism if there is a “blurry” transition from the number of people living in rural areas to the number of people living in urban areas, since rural populations in a young democracy might have lower turnout ratios. By itself this is not a definitive and foolproof method for distinguishing between non-homogeneous political activism and ballot fraud, but in the context of other factors it can provide extra evidence for fraud.
We categorize a histogram into the Double Peak category if the distribution looks like a superposition of two Gaussian peaks. The type of fraud that this could indicate is coordinated ballot stuffing in a select area. It would show up as a double peak because the natural turnout would appear as a Gaussian with a certain width, alongside a sharp “stuffing” peak. The two peaks may be completely distinct, in which case the distribution could also be mimicked by a sharp population disparity (such as a large rural and a large urban population). Alternatively, there may be one very wide peak with a much sharper peak superimposed on it. This might indicate either a coordinated (but geographically scattered) tossing out of large numbers of ballots in areas where there could have been significant turnout for Medvedev’s opposition and high turnout in general, or ballot stuffing in low-turnout areas. Either would non-uniformly shift part of a Gaussian so that it is superimposed on part of itself, creating an unnatural “peak within a peak”. By itself this category doesn’t definitively determine fraud, but the peak within a peak more strongly suggests fraud than two separate peaks do, especially if either is observed in what is supposed to be a homogeneous population.
We categorize a histogram into the Egregiously High category if there is a sharp peak at near one hundred percent turnout. We have a subcategory for peaks which are exceptionally narrow, which would indicate pure unadulterated ballot fiction, i.e. simply falsifying the numerical results of the votes. Such fraud would almost certainly have to be coordinated if it is distributed, or conducted at the oblast level if it is falsified. We can differentiate between the two based on numerical last digit analyses of the oblast. (We might suspect that repeated digits would be more strongly avoided if they were done by one person rather than if they were done by many people in a distributed fashion). This is by far the strongest indicator of fraud and even on its own would suggest massive fraud. It would only be further strengthened by a non-homogeneous population or a large population, but for small populations and for extremely homogeneous ones, we must be more careful in our analyses.
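As a rough aid to the by-eye judgment described above, one could summarize how tall and how high the sharpest peak is. The sketch below is only a rule of thumb: the function names and the thresholds `min_turnout` and `min_share` are hypothetical choices made for illustration, not values from our analysis.

```python
def sharpest_peak(counts):
    """Return (lower turnout edge of the tallest bin, its share of precincts)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("empty histogram")
    peak_bin = max(range(len(counts)), key=lambda i: counts[i])
    return peak_bin / len(counts), counts[peak_bin] / total


def looks_egregiously_high(counts, min_turnout=0.9, min_share=0.2):
    """Hypothetical rule of thumb: the tallest bin sits at or above
    `min_turnout` and holds at least `min_share` of all precincts."""
    turnout, share = sharpest_peak(counts)
    return turnout >= min_turnout and share >= min_share


# Hypothetical histograms with 100 bins.
spiked = [0] * 100
for i in range(60, 90):
    spiked[i] = 20          # broad background of 600 precincts
spiked[93] = 600            # one bin holding another 600 precincts

broad = [0] * 100
for i in range(50, 70):
    broad[i] = 10           # flat, moderate turnout, no spike

print(sharpest_peak(spiked), looks_egregiously_high(spiked))
print(sharpest_peak(broad), looks_egregiously_high(broad))
```

A summary like this cannot replace inspection of the histogram, since small or very homogeneous populations can produce tall bins by chance.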
We categorize a histogram into the High Tail/Bump category if there is a “tail” of turnout that’s larger than background near one hundred percent turnout levels, or if there is a bump which has a peak near one hundred percent turnout levels such that the bump is larger than the statistical fluctuation in the data at similar turnout amplitudes. The type of fraud that this would suggest is local, possibly coordinated, ballot stuffing where at the precinct level the pollsters might just make up numbers near one hundred percent for a particular candidate. It would be most suggestive if the opposing candidate had a strong showing in that oblast suggesting that this might be a last minute attempt to steal the oblast for a particular candidate. A case where this strongly suggests fraud is if the bump is a significant fraction of the size of the peak of the actual data. Once again we must watch out for low statistics in this analysis, but also pockets of super-homogeneity: for example a military base within the oblast might cause this type of histogram, so the histogram alone won’t absolutely differentiate between super-homogeneous pockets and fraud, but this does serve as another useful indicator, which when coupled with more information can help one come to a conclusion about fraud in the oblast.
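The comparison of a bump near one hundred percent against the statistical fluctuation can be made concrete in several ways. One possible sketch, assuming Poisson counting noise and using the window just below the tail as the background (both assumptions of this example, as is the name `tail_excess`), is:

```python
import math


def tail_excess(counts, tail_bins=5):
    """Compare the top `tail_bins` turnout bins with the window just below.

    Returns (tail, background, z), where z is the excess of the tail over
    the background in units of sqrt(tail + background), the Poisson
    uncertainty on the difference of the two counts.
    """
    if len(counts) < 2 * tail_bins:
        raise ValueError("histogram too short for the requested window")
    tail = sum(counts[-tail_bins:])
    background = sum(counts[-2 * tail_bins:-tail_bins])
    spread = math.sqrt(tail + background)
    z = (tail - background) / spread if spread > 0 else 0.0
    return tail, background, z


# Hypothetical 100-bin histograms.
bumped = [0] * 90 + [4] * 5 + [20] * 5   # bump in the 95-100% window
flat = [10] * 100                        # no excess anywhere

print(tail_excess(bumped))
print(tail_excess(flat))
```

A large positive z only says that the tail is unlikely to be a fluctuation; it does not distinguish fraud from a super-homogeneous pocket such as a military base.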
We realize that there are some overlaps between the categories, and that there is some subjectivity in these categorizations, but each histogram’s classification is chosen by which category seems to dominate, and in the in-depth analysis we consider the overlaps in categories.
The following is a list of Oblasts found to be “legitimate” under the previously defined histogram categorization scheme, where legitimate means not obviously falling in one of the “Suspicious” categories:
| Altai | Altai Krai | Ivanovo Oblast | Kaliningrad Oblast |
| Kalmykia Oblast | Khakassia | Komi | Kostroma Oblast |
| Krasnoyarsk Krai | Kurgan Oblast | Kursk Oblast | Moscow Oblast |
| Nenets AO | Novgorod Oblast | Orenburg Oblast | Primorsky Krai |
| Rostov Oblast | Ryazan Oblast | Sakha-Yakutiya | Smolensk Oblast |
| Sverdlovsk Oblast | Tomsk Oblast | Udmurtia | Vladimir Oblast |
| | Vologda Oblast | Yaroslavl Oblast | |
A histogram that typifies the “Legitimate” group is that of Ryazan Oblast.
As we can see, we have a peak at a “reasonable” turnout percentage, where reasonable is a qualifier in comparison to the “egregious” Oblasts; in truth, the turnout levels in general suggest that Russians absolutely love democracy, their right to vote, and Medvedev, which seems slightly unrealistic. While we have a tail, its fluctuations are on the order of or smaller than the fluctuations throughout the distribution, which suggests that they are just outlier demographic pockets. Since we have a relatively large population in the moderately sized Ryazan Oblast (ranked 44th in population), it is not unusual for there to be some diversity, depending on the rural and urban distributions of the population. There is no clear double peak, the distribution is clearly not ascending, and the average is not egregiously high or narrow. The relative unremarkableness of this type of histogram is for the most part what classifies it into the “legitimate” category, along with the other 26 Oblasts. We show a few examples here of the similarity of the histograms:
One might argue that Kalmykia and Nenets AO seem out of place, but their low statistics unfortunately cannot let us conclude anything about their distributions. One might also argue that Yaroslavl Oblast belongs in High Tail, or Bump, but the size of the Tail and Bump near one hundred percent is about the size of the natural fluctuations in the distribution, though it does appear to have a tail.
No histograms were categorized into the Egregiously Low category. This is as expected: with the vote so heavily favoring Medvedev, tossed-out challenger ballots would be dwarfed as simple statistical fluctuations, and tossing out Medvedev votes would be relatively obvious and dangerous, as he was supported by Putin.
The following is a list of Oblasts found to be “Ascending”:
| Amur Oblast | Belgorod Oblast |
| Buryatia | Kirov Oblast |
| Krasnodar Krai | Lipetsk Oblast |
| Omsk Oblast | Saratov Oblast |
| Tuva | Tambov Oblast |
| Yamalo-Nenets AO | |
The typifying example of the Ascending histogram is the one for Saratov Oblast.
Its steady linear ascension is clearly visible from forty percent of the vote all the way to one hundred percent (up to statistical fluctuations). Omsk Oblast is also placed in the Ascending category; while it could arguably be classified as a “Double Peak”, since the apparent peaks are in fact larger and wider than the fluctuations, the steady ascending growth dominates as the trend. Tuva is unfortunately plagued by low statistics, as is Yamalo-Nenets AO to some extent, but the relative compactness of Yamalo-Nenets AO seems to suggest that it’s not due merely to statistical fluctuation.
The outstanding and most suspicious candidates are Belgorod, Buryatia, Krasnodar, Lipetsk, and Yamalo-Nenets.
Ethnic Russians make up nearly 87 percent of Saratov Oblast’s population, and every other ethnic group each comprises less than 3 percent which indicates a slightly diverse population. Saratov’s histogram’s slope is quite pronounced and uniform. Such a homogeneous population would hardly indicate such a vast spread of turnout ratios in the first place.
There doesn’t appear to be a smooth transition between populous urban and rural areas: according to the 2002 census there are only three major cities, Saratov, Balakovo, and Engels, with populations of 870,000, 200,000, and 193,000, and most other cities have approximately 35,000 people each. The precincts within them could be far more diversely distributed, however, allowing for the spread. By itself Saratov’s ascension does not prove fraud for Medvedev, but it does suggest that it’s possible, since there is a large distributed base of ethnic Russians.
Kirov’s histogram, populations, and fraud possibilities are similar to Saratov. Amur Oblast similarly has a 2 to 1 urban to rural population ratio which would allow for the type of ascending turnout distribution. Omsk has similar demographics to Saratov, but has a much more worrisome histogram, which pushes towards a bolder possibility of fraud.
Belgorod Oblast has two major cities, Belgorod City and Stary Oskol, which together comprise a third of the total oblast population, the rest of which is rural; this could explain the two-bump structure present within the ascension. The possibility of fraud still asserts itself, however, based on the incline in the second half of the distribution and on the strong double peak, which wasn’t present in the other ascension histograms. The Oblast is 93 percent ethnic Russian and 4 percent Armenian, which would not account for such a strong double peak, but the 66 percent urban population vs rural might explain the double bump, as the relative area under the first bump does seem to conform to approximately a third of the total area.
Buryatia has a similar demographic distribution to Yamalo-Nenets (roughly 60 percent ethnic Russian and 30 percent Buryat), which wouldn’t suggest the type of ascending distribution, nor the (perhaps not statistically significant) peak near 100 percent turnout. Accordingly we interpret it as evidence of fraud.
Krasnodar Krai is far less diverse than Buryatia or Yamalo-Nenets, with 86 percent ethnic Russians, 5 percent Armenians, and 2.5 percent Ukrainians, but with half its population rural and half urban, one might not expect such a narrow ascension, especially in such a large region, which suggests possible fraud.
Lipetsk has a similar urban vs. rural population distribution to Belgorod Oblast, a weaker ascension, but has a rather significant peak within a peak, which could suggest fraud.
Yamalo-Nenets is a rather diverse area with 58.8 percent Russians, 13.3 percent Ukrainians, 5.4 percent Tatars, and 5.2 percent Nenets, according to the 2002 census, which would hardly justify such uniform turnout. Accordingly we interpret this as a pointer to electoral fraud.
The following is a list of Oblasts found to be “Double Peaked”:
| Adygea | Bryansk Oblast | Chuvashia |
| Jewish AO | Kaluga Oblast | Kamchatka Krai |
| Karelia | Magadan Oblast | Nizhny Novgorod Oblast |
| Novosibirsk Oblast | Oryol Oblast | Penza Oblast |
| Pskov Oblast | Samara Oblast | Ulyanovsk Oblast |
| | Voronezh Oblast | |
Of these, Adygea, Jewish AO, Kamchatka Krai, and Magadan Oblast all have statistics too low to support any reasonable or definitive claim. Kamchatka Krai:
In this category there are two typifying examples of how histograms could possibly look. One example type of histogram of this category would be given by Novosibirsk’s.
Two clearly separated peaks of approximately similar widths are evident. Most of the examples are of this type.
The other type of histogram in this category is exemplified by Nizhny Novgorod’s histogram:
We see two clear peaks of very different widths, one overlaid on top of the other. The histograms which most indicate fraud are those of Karelia, Novosibirsk, Oryol, and Penza. In all of these histograms the ratio of the areas under the peaks does not match the ratios for disparities in demographics.
In the case of Karelia’s histogram, we see a sharp dip around 60 percent turnout and a sudden spike at around 90 percent turnout. The sizes of the dip and spike are such that if one were to shift the spike to the left it would form a much smoother Gaussian. The peak is too small to be representative of the 75-25 split in either the urban-rural population or the ethnic majority-minority population, which strongly indicates fraud.
In the case of Novosibirsk there is a 75-25 urban-rural split, whereas the ethnic majority-minority split is virtually nonexistent, with 93 percent of the population ethnically Russian and the rest minority. The histogram clearly shows two large peaks of approximately comparable area. Since they aren’t explained by the demographics, this somewhat pointedly suggests fraud.
Oryol’s histogram suggests possible fraud, but not as clearly as Novosibirsk, due to lack of ethnic information. We see that the peak sizes don’t match up to the urban-rural ratio, and as Oryol Oblast was created from what was originally Kursk, West, and Voronezh Oblast, of which Kursk has a very uniform, low diversity population, this evidence is not complete enough to claim fraud.
Penza’s histogram also suggests possible fraud, as the small peak’s size is not even close to the 35 percent of area it would have to take up in order to represent the 65-35 urban rural ratio, but as we do not have demographic information, the evidence is not complete enough to claim fraud.
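The comparison between peak areas and demographic ratios used above can be sketched as below. The split point between the two peaks is chosen by eye, and the helper names `area_fractions` and `mismatch` are hypothetical. Note also that the histogram counts precincts rather than people, so agreement with a population ratio is only approximate even in an honest election.

```python
def area_fractions(counts, split_bin):
    """Split a histogram at `split_bin`; return the share of precincts
    below the split and the share at or above it."""
    total = sum(counts)
    if total == 0:
        raise ValueError("empty histogram")
    low = sum(counts[:split_bin])
    return low / total, (total - low) / total


def mismatch(counts, split_bin, expected_minor_share):
    """Absolute gap between the smaller peak's area share and the
    demographic share it would have to represent."""
    low, high = area_fractions(counts, split_bin)
    return abs(min(low, high) - expected_minor_share)


# Hypothetical two-peak histogram: a main peak holding 900 precincts and
# a small high-turnout peak holding 100 precincts.
two_peaks = [0] * 100
for i in range(50, 70):
    two_peaks[i] = 45
for i in range(88, 93):
    two_peaks[i] = 20

print(area_fractions(two_peaks, split_bin=80))
print(mismatch(two_peaks, split_bin=80, expected_minor_share=0.35))
```

In this toy case the small peak holds a tenth of the precincts, well short of the 35 percent that a 65-35 split would require.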
The other Oblasts are not so clear: for example, in Chuvashia there is a 60-40 urban rural split, and a 74 percent ethnic Chuvash, Tatar, and Mordovin versus 26 percent Ethnic Russian split, yet the peak on the left, which would presumably represent Ethnic Russian turnout is a bit too small, but it’s fair to assume that since this was originally the Chuvash Republic, the Russians would be mostly in the city, since typically one doesn’t emigrate to rural areas, which would be dominated by the Ethnic Chuvash. So this is somewhat split in between fraudulent and legitimate, since the small peak has more of a 1 out of 5 ratio than a 1 out of 4 ratio, but this could be accounted for through the urban rural split.
Similarly, in the cases of Bryansk, Nizhny Novgorod, Pskov, Samara, Ulyanovsk, Voronezh, the population ratios and urban ratios are either close enough to matching up to the apparent turnout distribution, or have countervailing effects in the ethnic vs urban distributions, leaving us unable to claim election fraud from these histograms and ethnic data alone.
Egregiously High and Narrow
The following is a list of the histograms classified under the “Egregiously High and Narrow” category.
| Bashkortostan | Chechnya |
| Chukotka AO | Dagestan |
| Ingushetia | Kabardino-Balkar |
| Karachay-Cherkessia | Mordovia |
| Overseas Territory | Tatarstan |
| Tyumen Oblast | |
Of these Chukotka AO has statistics that are too low to make any claim based on the histogram. All the rest have unnaturally sharp peaks at close to 100 percent turnouts which cannot be explained based on the Ethnic and Urban distribution.
Consider the above histogram of Dagestan. Dagestan has nearly a 50-50 split between the urban and rural population and has huge population diversity, with Avars making up 30 percent, Dargins making up 17 percent, Kumyks making up 14 percent, Lezgyns making up 13.1 percent, Russians making up 7 percent, Laks making up 5 percent, Tabasarans making up 4 percent, and Azeris making up 4 percent. These figures cannot possibly account for the astronomically high peak at 93 percent turnout. This pattern is similar in the cases of all the others in this category with the exception of possibly Russia’s overseas territories, which could include military bases, which would explain abnormally high turnout.
This category by far most strongly suggests fraud, and can even be taken on its own to be strong evidence of fraud, considering that in cases like Bashkortostan there are roughly 600 precincts within a single bin. These are by far the strongest candidates for fraud generated by this particular histogram analysis.
The following is a list of the histograms classified under the “High Tail or High Bump” category.
| Arkhangelsk Oblast | Astrakhan Oblast | Chelyabinsk Oblast |
| Irkutsk Oblast | Kemerovo Oblast | Khabarovsk Krai |
| Khanty-Mansi AO | Leningrad Oblast | Mari El |
| Murmansk Oblast | North Ossetia-Alania | Perm Krai |
| Saint Petersburg | Sakhalin Oblast | Stavropol Krai |
| Tula Oblast | Tver Oblast | Volgograd Oblast |
| | Zabaykalsky Krai | |
Within these, there aren’t any that have statistics too low to analyze. There appear to be three levels of significance in these histograms: In the cases of Irkutsk Oblast, Kemerovo Oblast, Mari El, Murmansk Oblast, Saint Petersburg, Volgograd Oblast, and Zabaykalsky Krai, the peaks and tails near 100 percent turnout are too unbelievably large to be realistic. Consider Volgograd Oblast,
which has nearly a uniform distribution throughout the interval of 50 percent to 100 percent turnout, or consider Zabaykalsky Krai, which has a large and significant sudden bump from 95 percent to 100 percent, which is wider than statistical fluctuations and is higher than the average statistical fluctuation.
And consider Murmansk Oblast which, after its main peak, at around 70 percent turnout, proceeds to have a steady ascension with increasingly large fluctuations up till 100 percent.
All these most strongly suggest election fraud within this category.
Now consider Arkhangelsk Oblast, Chelyabinsk Oblast, Khabarovsk Krai, Khanty-Mansi AO, Leningrad Oblast, Perm Krai, Stavropol Krai, and Tula Oblast. Their peaks and tails near 100 percent aren’t unimaginably large, but are somewhat suspicious, especially a uniform tail such as Chelyabinsk Oblast’s or Leningrad Oblast’s:
The somewhat lower statistics of Khanty-Mansi’s histogram are the reason we put it in the second category instead of the first, and the high variance of Astrakhan Oblast’s histogram is why we don’t consider its peak at 100 percent to be significant, since there’s comparable fluctuation elsewhere.
The rest of the oblasts do have peaks or tails near 100 percent, but are not significant enough compared to the statistical fluctuations to fairly claim evidence of fraud.
Review of Histogram Analysis
Lastly, we remind ourselves that none of these histogram breakdowns or analysis methods is foolproof. Each gives only one piece of the puzzle, and only with the appropriate circumstantial details and an agglomeration of consistent evidence can we move from evidence suggesting fraud to a conclusion of fraud. Though the Egregiously High and Narrow category is an exceptionally strong indicator of fraud, it too can possibly be fooled, for example by military bases overseas, so we must keep all this in mind before we draw any premature conclusions based on these histograms alone.
Linear Regression Analysis
We seek to consider metrics that are agnostic to the specifics of the voting method and are sensitive to manipulations on the output. One such output is the turnout percentage for a given region; this is simply the percentage of potential voters that voted in a given election. We can predict some simple trends in this metric; we expect more voters to show up to important elections than unimportant ones. In the United States, elections are held every two years for Representatives, every four years for President, and every six years for Senators. Given the relative importance of the Presidential election, we could correctly guess that there would be a higher turnout in those years that had an election for President rather than just one for Representatives.
If we consider a region with many voting precincts, we can consider whether there should be a correlation between the turnout in a given precinct and a particular candidate’s percent of the vote in that precinct. If the region is rather homogeneous, we expect a slope of approximately zero. Let us suppose that the region is populated by members of two parties: one with enthusiastic members, and the other with apathetic members. If party membership is non-uniformly distributed, we expect the candidate of the enthusiastic party to do extremely well in areas with more of his party’s members, as well as pull a very high turnout. However, in areas populated by the other party, he will do poorly and turnout will be much lower. Thus, we have developed a strongly non-zero correlation between turnout and performance.
In this model we can predict that there will be regions that frequently oscillate between parties due to fluctuations in mean turnout and with no change to the population itself. This model may be appropriate for shorter timescales, while we must account for population drift on longer timescales (∼10 years). Further, we can put bounds on the behavior of the correlation. In higher turnout precincts, we never expect turnout for a party to decrease relative to lower turnout precincts. Such behavior would give the opposing party more than one voter per additional vote cast; thus, we find linear regressions that yield slopes with magnitude greater than 1 extremely suspicious. This assumes that voters are not discouraged from voting due to a system similar to the electoral college due to living in a region strongly aligned to the opposing party; we assume this effect is negligible if voting is aggregated at a much higher level than the voting precinct.
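The bound on the slope can be written out under one reading of the model. The notation below is introduced only for this illustration, and it assumes that a candidate's performance is measured as votes divided by registered voters, which the text above does not state outright.

```latex
% Assumed notation: in precinct i, t_i is turnout, a_i is the candidate's
% votes and b_i is all other candidates' votes, each divided by the number
% of registered voters. Invalid ballots are neglected.
\begin{align}
  a_i + b_i &= t_i \\
  a_i &\approx \alpha + \beta\, t_i \\
  b_i &\approx -\alpha + (1 - \beta)\, t_i
\end{align}
```

Under these assumptions, a slope $\beta > 1$ forces $1 - \beta < 0$, so the opposition would lose votes in absolute terms as turnout rises, while $\beta < 0$ would hand the opposition more than one vote per additional ballot cast. Neither is expected, which confines the unsuspicious range to $0 \le \beta \le 1$.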
The data from the 2008 Presidential Election in Russia was available at the voting precinct level and a linear regression was performed to obtain the slopes for a given rayon. Of 2724 rayons with at least three voting precincts, 142 rayons were identified as having a slope with magnitude greater than 1. A table with the frequency per region is shown below:
| Region | Flagged rayons |
| Bashkortostan | 15 |
| Tatarstan | 14 |
| Tyumen Oblast | 11 |
| Mordovia | 7 |
| Belgorod Oblast | 6 |
| Dagestan | 5 |
| Oryol Oblast | 5 |
| Chuvashia | 4 |
| Kemerovo Oblast | 4 |
| Penza Oblast | 4 |
| Saratov Oblast | 4 |
| Mari El | 3 |
| Voronezh Oblast | 3 |
| Adygea | 2 |
| Tuva | 2 |
| Arkhangelsk Oblast | 1 |
| Chechnya | 1 |
| Chelyabinsk Oblast | 1 |
| Kamchatka Krai | 1 |
| Karachay-Cherkessia | 1 |
| Khabarovsk Krai | 1 |
| Kirov Oblast | 1 |
| Kostroma Oblast | 1 |
| Lipetsk Oblast | 1 |
| Omsk Oblast | 1 |
| Orenburg Oblast | 1 |
| Rostov Oblast | 1 |
| Sakhalin Oblast | 1 |
| Samara Oblast | 1 |
| Yamalo-Nenets AO | 1 |
Approximately 26% of the rayons in Bashkortostan, 23% of the rayons in Tatarstan, 54% of the rayons in Tyumen Oblast, 48% of the rayons in Mordovia, and 19% of the rayons in Dagestan were implicated by this metric. The remainder of the rayons either had too few precincts to seriously consider or were within the expected parameters of this test.
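The per-rayon regression and the slope cut can be sketched as follows. This is an illustration under stated assumptions, not the code behind the numbers above: the row layout and function names are hypothetical, and the candidate's performance is taken to be votes divided by registered voters, which is one possible reading of "percent of the vote".

```python
from collections import defaultdict


def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x, or None if x has no spread."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def suspicious_rayons(rows, min_precincts=3, max_abs_slope=1.0):
    """Flag rayons whose slope of performance on turnout exceeds the cut.

    `rows` holds one (rayon, registered, valid_ballots, candidate_votes)
    tuple per precinct (an assumed layout).
    """
    by_rayon = defaultdict(list)
    for rayon, registered, valid, votes in rows:
        if registered > 0:
            by_rayon[rayon].append((valid / registered, votes / registered))
    flagged = {}
    for rayon, points in by_rayon.items():
        if len(points) < min_precincts:
            continue  # too few precincts to fit a meaningful line
        slope = ols_slope([t for t, _ in points], [v for _, v in points])
        if slope is not None and abs(slope) > max_abs_slope:
            flagged[rayon] = slope
    return flagged


# Hypothetical precinct rows for three rayons.
rows = [
    ("A", 1000, 500, 300), ("A", 1000, 600, 450), ("A", 1000, 700, 600),
    ("B", 1000, 500, 350), ("B", 1000, 600, 420), ("B", 1000, 700, 490),
    ("C", 1000, 500, 400), ("C", 1000, 900, 100),
]
print(suspicious_rayons(rows))
```

Here rayon A gains 1.5 votes per additional ballot and is flagged, rayon B has a slope of 0.7 and is not, and rayon C is skipped for having fewer than three precincts.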
If we consider the non-outlying rayons, we find the following distribution of slopes for Medvedev:
[Figure: distribution of regression slopes for Medvedev; horizontal axis: slope, from -1.5 to 1.5]
This distribution is remarkably Gaussian (albeit somewhat skewed); the nonzero mean is not surprising considering that Medvedev received approximately 80% of the vote. By contrast, the same plot for Zyuganov is far more sharply peaked:
[Figure: Slopes for Zyuganov. Horizontal axis: slope, from -1.5 to 1.5.]
While it would be reasonable to guess that these distributions would be approximately Gaussian, we have no theory on which to interpret the significance of a Gaussian fit to the data. However, if we consider news reports of workers and students being compelled to turn up to vote in favor of the incumbent party, we can see that Medvedev would indeed benefit from increased turnout.
It is well known that the last digit, and usually even the second-to-last digit, of vote counts within a region should be approximately uniformly distributed. As a result, one of the tests we perform is a χ² test on this distribution. This test has problems detecting fraud in massively aggregated data (for example, at the region level it detects nothing). When testing the valid ballots field, however, we found some very clear discrepancies in the data. In fact, if we use a 1% significance level, many rayons have statistically improbable last digits. In most cases, the high χ² value was caused almost exclusively by the digits 0 and 5, further suggesting human involvement. Even though this very clearly indicates that a human probably adjusted these numbers, it is not immediately clear whether this was due to true election fraud or just laziness: poor bookkeeping on these columns, which might seem unimportant to election officials, could involve a lot of rounding and would produce similar figures. However, when we bring in the candidates' totals as well, we begin to see a pattern. As an appendix, we have included a table of all of the regions with at least one p-value under the 1% significance level. It is interesting to note that Benford's law did almost nothing to detect this fraud, and that second-to-last digit analysis was decidedly weaker than last digit analysis (in terms of the number of potentially fraudulent cases reported).
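As an illustration of the last-digit test, the following Python sketch computes the χ² statistic against a uniform distribution over the ten digits and converts it to a p-value. It is a sketch under stated assumptions rather than the program used for this analysis: the function names are hypothetical, the input is simply a list of per-precinct counts for one field in one rayon, and the p-value uses the closed-form upper tail of the χ² distribution with 9 degrees of freedom so that no external library is needed.

```python
import math
from collections import Counter

def chi2_sf_df9(x):
    """Upper-tail probability of a chi-squared variable with 9 degrees of freedom."""
    if x <= 0:
        return 1.0
    c = math.sqrt(x)
    # Closed form for odd degrees of freedom (here 9): a normal tail term
    # plus a finite series multiplied by the standard normal density at c.
    series = c + c**3 / 3 + c**5 / 15 + c**7 / 105
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(c / math.sqrt(2)) + 2 * density * series

def last_digit_test(counts):
    """Return (chi2 statistic, p-value) for uniformity of the last digits."""
    digits = Counter(n % 10 for n in counts)
    expected = len(counts) / 10
    stat = sum((digits.get(d, 0) - expected) ** 2 / expected for d in range(10))
    return stat, chi2_sf_df9(stat)

# Invented inputs: counts with evenly spread last digits, and counts that
# were all rounded to a multiple of 5.
evenly_spread = list(range(100, 200))
rounded = [100 + 5 * k for k in range(100)]

print(last_digit_test(evenly_spread))
print(last_digit_test(rounded))
```

The rounded counts put all of the excess on the digits 0 and 5, which is the signature described above. The χ² approximation is only reasonable when a rayon has enough precincts for each digit to be expected several times.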
Obvious and Expected
When the data indicated a high chance of something being amiss, the anomaly usually occurred in the Zyuganov, Medvedev, or valid ballots column. Additionally, when only one or two of these three entries had a p-value below the significance line, the remaining entries were frequently low as well (usually below 40%). All of this suggests that there was some amount of vote fraud going back and forth between Zyuganov and Medvedev.
One clean example, with p-values for Benford's Law, second-to-last digit, and last digit going from left to right:

| Field | Benford's Law | Second-to-last digit | Last digit |
|---|---|---|---|
| Ballots | 0.9972437050283071 | 0.9999926722297033 | 0.776707772041324 |
| Valids | 0.9992652801893581 | 0.9999979600353261 | 0.5825459573127367 |
| Boganov | 0.9999999999999656 | 1.0 | NaN |
| Zyuganov | 0.9999197683528138 | 0.9983769328819282 | 2.992957383857075e-3 |
| Medvedev | 0.9972201343582339 | 0.9996425431802614 | 0.15447292238055008 |
We see here that Zyuganov has enough votes that the last digit should have looked uniform, and there are enough precincts for this kind of analysis to be valid. However, Zyuganov's last-digit p-value is very low (about 0.003), and in this particular area Medvedev swept with essentially 30 times Zyuganov's vote. This suggests that votes were shifted to Medvedev here.
There is also some support for the idea that the major candidates were taking votes from the minor candidates. For example, there were occasional incidents involving Zhirinovsky. These incidents were noticeable for an especially low p-value on his numbers (on the order of 10⁻⁴), likely because it is harder to cover this up with fewer votes. Many of them include a lower than normal, though not statistically significant, p-value for one or the other main candidate. Boganov also has some events like this, though not as many.
”Pskov Oblast::0846088108870884088608900887093108860929088608760943 0879089009280890088009290820
“Zyuganov” [0.9999999999887418,0.9999999999927812,0.669824544679129] “Medvedev” [1.0,0.9999999649134609,0.13828327088238312]
Here, we have an atypically high ratio of Zyuganov votes to Medvedev votes. Medvedev's p-value is lowered, though not quite statistically significant, while Boganov, who here finally has enough votes to be considered, has a very low p-value. This suggests that Zyuganov may have received some share of Boganov's votes here: Boganov would likely be considered an easy target, and he finally has enough votes for it to be worth taking them from him.
Some data points were so unusual that our first reaction was that our program must be broken: ”Moscow::08560890088708820876088908840889089008780929088608840885 09280876088508900889”
| Field | Benford's Law | Second-to-last digit | Last digit |
|---|---|---|---|
| Ballots | 0.36288059283060103 | 0.5627258110861324 | 0.5627258110861324 |
| Valids | 0.13888352659425512 | 0.5627258110861324 | 0.5627258110861324 |
| Boganov | 0.36288059283060103 | 0.5627258110861324 | 0.5627258110861324 |
| Zhirinovsky | 0.13888352659425512 | 0.5627258110861324 | 0.5627258110861324 |
| Zyuganov | 0.36288059283060103 | 0.5627258110861324 | 0.5627258110861324 |
| Medvedev | 5.624973021675013e-4 | 0.5627258110861326 | 0.5627258110861324 |
However, these are actually just places with only one precinct, so the test statistics are not meaningful. Other than in situations like this, Benford's law was never violated.
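For reference, a first-digit test against Benford's law can be sketched as follows. Benford's law predicts that a leading digit d occurs with probability log10(1 + 1/d). The function names and the χ² comparison with 8 degrees of freedom are assumptions made for this sketch, which is not necessarily how the p-values reported here were computed.

```python
import math
from collections import Counter

# Benford's law: probability that the leading digit equals d, for d = 1..9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def chi2_sf_df8(x):
    """Upper-tail probability of a chi-squared variable with 8 degrees of freedom."""
    if x <= 0:
        return 1.0
    h = x / 2
    # Closed form for even degrees of freedom (here 8).
    return math.exp(-h) * (1 + h + h**2 / 2 + h**3 / 6)

def benford_test(counts):
    """Return (chi2 statistic, p-value) for the leading digits of positive counts."""
    positive = [n for n in counts if n > 0]
    first = Counter(int(str(n)[0]) for n in positive)
    total = len(positive)
    stat = sum(
        (first.get(d, 0) - total * p) ** 2 / (total * p)
        for d, p in BENFORD.items()
    )
    return stat, chi2_sf_df8(stat)

# Invented inputs: powers of two follow Benford's law closely, while a
# column whose entries all begin with 9 does not.
powers_of_two = [2**k for k in range(100)]
all_nines = [900 + k for k in range(100)]

print(benford_test(powers_of_two))
print(benford_test(all_nines))
```

With a single precinct the expected count for every digit is below one, so the statistic says little about the data, which is consistent with the one-precinct cases discussed above.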
When each vote tally, number of ballots cast, and validated ballots were incorporated into a joint distribution, no P values below 99% were found. All this really says is that there weren’t any rayons or regions where there was evidence of miscounting everywhere. Indeed, it would have been troubling if the joint distributions showed problems, as it would require much more coordination for the joint distributions to show things we did not find in the individual distributions.
Overall, we have about 73 rayons where it is exceedingly clear that some form of incorrect vote reporting is occurring. A good proportion of these appear fraudulent in some way. There are too many rayons with apparent vote fraud to list exhaustively here; areas flagged as probable vote fraud are included in a separate text file delivered with this report.
Additionally, all of the fraud we detected occurred at the rayon level. There were no regions with clear systematic patterns of fraud after aggregation. Bashkortostan had a reasonably high number of probably fraudulent rayons (3); Dagestan had at least 2, and possibly 3 or 4 if a joint distribution on second-to-last and last digits is used; Irkutsk Oblast, Kirov Oblast, Moscow, Pskov Oblast, and Tatarstan had 3 each; and Volgograd Oblast had 4.
Dagestan was interesting because there were 2 clear fraud cases, while two more rayons had fairly low p-values whose product fell far below the significance threshold, almost as if the fraud had stopped a little before the significance line. However, that is pure speculation.
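One caveat on taking a product of p-values: the product of two p-values is not itself a p-value, so it should not be compared directly with the 1% threshold. A standard way to combine independent tests is Fisher's method, which for two tests reduces to a simple closed form. The sketch below is offered as an illustration only; it is not the procedure used in this analysis, it assumes the two tests are independent, and the input values are invented.

```python
import math

def fisher_two(p1, p2):
    """Combine two independent p-values with Fisher's method.

    Under the null hypothesis -2 * ln(p1 * p2) follows a chi-squared
    distribution with 4 degrees of freedom, whose upper tail at that
    point simplifies to product * (1 - ln(product)).
    """
    product = p1 * p2
    if product <= 0:
        return 0.0
    return product * (1 - math.log(product))

# Invented values: neither test alone falls below 1%, and the raw product
# (0.0012) overstates the evidence compared with the combined p-value.
combined = fisher_two(0.03, 0.04)
print(combined)
```

The combined value is always at least as large as the raw product, so a product well below the threshold can correspond to a combined p-value only marginally below it.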
We assert that by analyzing voting data at the precinct level our methods have detected substantial amounts of electoral fraud in the 2008 Russian Presidential Election. We histogrammed turnout distribution, linearly regressed on percentage of a vote for a candidate versus turnout, and histogrammed digit frequency in voting data to obtain candidate fraudulent regions. Certain regions were identified by two or more analyses; we assert that electoral fraud most likely occurred in those regions and list them here:
|Bashkortostan|Belgorod Oblast|Chechnya|
|Chuvashia|Dagestan|Ingushetia|
|Irkutsk Oblast|Kabardino-Balkar|Karachay-Cherkessia|
|Karelia|Kemerovo Oblast|Kirov Oblast|
|Mari El|Mordovia|Moscow|
|Novosibirsk|Oryol Oblast|Penza Oblast|
|Pskov Oblast|Saratov Oblast|Tatarstan|
|Tyumen Oblast||Voronezh Oblast|
While the histogram analysis partly depended upon demographic data, none of the analysis included knowledge of events and circumstantial conditions related to the election. By examining news articles regarding the election, we find that many of the regions we identified as fraudulent have also been identified by the media as regions in which the fairness of the election results was contested. For instance, two opposition candidates for Dagestan’s MP in the Russian Duma were murdered prior to the election.¹ There were also many reports of fraudulent activity, ranging from ballot box stuffing, to ferrying voters to multiple voting stations, to compulsory voting for college students and government employees.² Such reports support our data-generated conclusions. Notably, the pervasiveness of the fraud prevents us from being able to estimate how much Medvedev benefited from it, and we are thus unable to determine whether the fraud ultimately changed the result of the election.
Medvedev Slopes with Magnitude ≥ 1³

| Rayon | N (precincts) | Slope | Intercept | R² | p-value |
|---|---|---|---|---|---|
| Dagestan – Laki | 34 | 7.18 | -5.77 | 0.6 | 8.67E-008 |
| Tuva – Sut-Holskaya | 7 | 5.17 | -4.17 | 0.92 | 0 |
| Arkhangelsk Oblast – Novaya Zemlya | 8 | 4.54 | -3.6 | 0.23 | 0.23 |
| Yamalo-Nenets AO – Krasnoselkupsky | 3 | 3.68 | -2.6 | 0.99 | 0.06 |
| Samara Oblast – Isaklinskaya | 25 | 2.29 | -1.42 | 0.17 | 0.04 |
| Bashkortostan – Chishminskaya | 60 | 2.27 | -1.27 | 0.72 | 1.62E-017 |
| Bashkortostan – Gafuriyskaya | 43 | 2.24 | -1.28 | 0.5 | 1.18E-007 |
| Tyumen Oblast – Tobolsk | 40 | 1.97 | -1.02 | 0.34 | 8.26E-005 |
| Tatarstan – Kaybitskaya | 35 | 1.95 | -0.98 | 0.1 | 0.07 |
| Chuvashia – Shemurshinskij | 22 | 1.92 | -0.92 | 0.59 | 3.38E-005 |
| Voronezh Oblast – Verhnemamonskaya | 19 | 1.87 | -0.98 | 0.24 | 0.04 |
| Mordovia – Staroshaygovskaya | 25 | 1.81 | -0.82 | 0.72 | 8.56E-008 |
| Mordovia – Romodanovskaya | 28 | 1.8 | -0.82 | 0.79 | 3.26E-010 |
| Tyumen Oblast – Abatskij | 54 | 1.76 | -0.78 | 0.29 | 2.80E-005 |
| Mordovia – Bolshebereznikovskaya | 25 | 1.73 | -0.74 | 0.76 | 1.28E-008 |
| Mordovia – Insar | 24 | 1.67 | -0.67 | 0.58 | 1.75E-005 |
| Penza Oblast – Bessonovskaya | 37 | 1.66 | -0.77 | 0.46 | 3.81E-006 |
| Voronezh Oblast – Anninsky | 57 | 1.65 | -0.68 | 0.5 | 6.77E-010 |
| Kamchatka Krai – Petro.-Kam. city (ship) | 147 | 1.64 | -0.95 | 0.15 | 1.08E-006 |
| Dagestan – Kulinskii | 17 | 1.61 | -0.53 | 0.23 | 0.05 |
| Belgorod Oblast – Prokhorovskaya | 47 | 1.6 | -0.62 | 0.4 | 1.90E-006 |
| Lipetsk Oblast – Dobrovskij | 43 | 1.58 | -0.68 | 0.21 | 0 |
| Oryol Oblast – Krasnozorenskaya | 17 | 1.58 | -0.67 | 0.68 | 4.27E-005 |
| Tyumen Oblast – Yalutorovsk | 29 | 1.56 | -0.57 | 0.31 | 0 |
| Adygea – Shovgenovskaya | 17 | 1.53 | -0.66 | 0.46 | 0 |
| Tatarstan – Menzelinsk | 51 | 1.52 | -0.56 | 0.39 | 1.10E-006 |
| Dagestan – Khasavyurt | 64 | 1.51 | -0.5 | 0.31 | 2.20E-006 |
| Belgorod Oblast – Krasnenskaye | 24 | 1.51 | -0.57 | 0.4 | 0 |
| Bashkortostan – Nurimanovskaya | 35 | 1.5 | -0.53 | 0.63 | 1.15E-008 |
| Saratov Oblast – Bazarno-Karabulakskiy | 30 | 1.49 | -0.55 | 0.18 | 0.02 |
| Omsk Oblast – Krutinskaya | 45 | 1.48 | -0.74 | 0.1 | 0.03 |
| Tyumen Oblast – Golyshmanovskoe | 30 | 1.47 | -0.52 | 0.09 | 0.12 |
| Bashkortostan – Aurgazinskaya | 48 | 1.44 | -0.47 | 0.63 | 2.27E-011 |
| Belgorod Oblast – Ivnyanskaya | 23 | 1.41 | -0.42 | 0.78 | 2.18E-008 |
| Mari El – Novotoryalskaya | 21 | 1.41 | -0.48 | 0.25 | 0.02 |
| Dagestan – Kizlyar City | 17 | 1.38 | -0.39 | 0.24 | 0.05 |
| Tatarstan – Laishevsky | 47 | 1.37 | -0.39 | 0.47 | 1.01E-007 |
| Tatarstan – Aznakaevsky | 42 | 1.36 | -0.39 | 0.56 | 1.16E-008 |
| Chuvashia – Yalchikskaya | 42 | 1.36 | -0.45 | 0.46 | 8.16E-007 |
| Belgorod Oblast – Krasnoyaruzhskaya | 17 | 1.34 | -0.46 | 0.4 | 0.01 |
| Tatarstan – Tukaevskaya | 43 | 1.33 | -0.36 | 0.46 | 5.89E-007 |
| Tyumen Oblast – Vagayskaya | 44 | 1.32 | -0.33 | 0.36 | 1.96E-005 |
| Oryol Oblast – Korsakov | 10 | 1.29 | -0.33 | 0.89 | 4.80E-005 |
| Tatarstan – Yutazinskaya | 32 | 1.29 | -0.31 | 0.52 | 3.18E-006 |
| Bashkortostan – Alsheevskaya | 54 | 1.27 | -0.29 | 0.72 | 5.40E-016 |
| Mordovia – Kochkurovskaya | 14 | 1.27 | -0.27 | 0.55 | 0 |
| Tatarstan – Kama-Ustinskiy | 39 | 1.26 | -0.3 | 0.28 | 0 |
| Tyumen Oblast – Vikulovskaya | 34 | 1.26 | -0.29 | 0.32 | 0 |
| Bashkortostan – Bakalinskij | 55 | 1.25 | -0.27 | 0.49 | 2.59E-009 |
| Bashkortostan – Bizhbulyakskaya | 46 | 1.25 | -0.28 | 0.52 | 1.28E-008 |
| Tatarstan – Mendeleev | 37 | 1.24 | -0.25 | 0.65 | 1.43E-009 |
| Kemerovo Oblast – Tyazhinskaya | 44 | 1.24 | -0.32 | 0.52 | 3.56E-008 |
| Penza Oblast – Neverkinskaya | 26 | 1.24 | -0.3 | 0.49 | 7.33E-005 |
| Kemerovo Oblast – Chebulinskaya | 24 | 1.24 | -0.38 | 0.32 | 0 |
| Adygea – Teuchezhskaya | 21 | 1.22 | -0.26 | 0.75 | 3.79E-007 |
| Tatarstan – Cheremshanskaya | 34 | 1.22 | -0.23 | 0.43 | 2.75E-005 |
| Tatarstan – Nizhnekamsk | 34 | 1.22 | -0.25 | 0.68 | 1.85E-009 |
| Bashkortostan – Kuyurgazinskaya | 52 | 1.22 | -0.25 | 0.28 | 4.93E-005 |
| Bashkortostan – Arkhangelsk | 36 | 1.21 | -0.25 | 0.44 | 9.51E-006 |
| Tyumen Oblast – Sladkovsky | 21 | 1.21 | -0.25 | 0.61 | 2.69E-005 |
| Khabarovsk Krai – Komsomolskaya | 26 | 1.21 | -0.4 | 0.54 | 1.69E-005 |
| Tatarstan – Agryzskaya | 50 | 1.21 | -0.23 | 0.69 | 6.49E-014 |
| Oryol Oblast – Kolpnyanskaya | 28 | 1.2 | -0.25 | 0.75 | 2.81E-009 |
| Tyumen Oblast – Nizhnetavdinskaya | 37 | 1.2 | -0.2 | 0.49 | 1.44E-006 |
| Chukotka Autonomous Okrug – Beringovskij | 4 | 1.18 | -0.27 | 0.96 | 0.02 |
| Tatarstan – Atninskaya | 25 | 1.18 | -0.18 | 0.89 | 2.53E-012 |
| Saratov Oblast – Tatischevskaya | 30 | 1.17 | -0.2 | 0.6 | 5.81E-007 |
| Karachay-Cherkessia – Prikubanskaya | 22 | 1.16 | -0.16 | 0.62 | 1.24E-005 |
| Tyumen Oblast – Berdyuzhskaya | 33 | 1.15 | -0.2 | 0.15 | 0.02 |
| Sakhalin Oblast – Nevelskaya ship | 51 | 1.14 | -0.52 | 0.23 | 0 |
| Mordovia – Lyambir | 35 | 1.13 | -0.16 | 0.55 | 3.25E-007 |
| Kemerovo Oblast – Prokopievsk | 37 | 1.13 | -0.21 | 0.82 | 1.71E-014 |
| Kirov Oblast – ZATO Pervomajskij | 4 | 1.13 | -0.03 | 0.93 | 0.04 |
| Dagestan – Sergokalinskaya | 29 | 1.12 | -0.12 | 0.26 | 0.01 |
| Mari El – Kilemarsky | 19 | 1.11 | -0.13 | 0.26 | 0.02 |
| Orenburg Oblast – Ponomarevskaya | 29 | 1.1 | -0.24 | 0.41 | 0 |
| Bashkortostan – Sterlibashevskaya | 36 | 1.09 | -0.11 | 0.39 | 4.85E-005 |
| Krasnodar Krai – Tuapse city | 28 | 1.08 | -0.18 | 0.36 | 0 |
| Bashkortostan – Kaltasinskaya | 37 | 1.08 | -0.11 | 0.71 | 8.21E-011 |
| Mari El – Orsha | 17 | 1.07 | -0.07 | 0.3 | 0.02 |
| Chechnya – Itum-Kali | 12 | 1.06 | -0.11 | 0.04 | 0.51 |
| Bashkortostan – Haybullinskaya | 58 | 1.06 | -0.1 | 0.36 | 7.43E-007 |
| Oryol Oblast – Znamenskaya | 12 | 1.06 | -0.12 | 0.38 | 0.03 |
| Bashkortostan – Birskaya | 45 | 1.06 | -0.07 | 0.58 | 1.57E-009 |
| Tatarstan – Drozhzhanovsky | 38 | 1.05 | -0.06 | 0.52 | 3.58E-007 |
| Tyumen Oblast – Yalutorovsk City | 18 | 1.05 | -0.1 | 0.32 | 0.01 |
| Penza Oblast – moksha | 38 | 1.05 | -0.13 | 0.64 | 1.55E-009 |
| Oryol Oblast – Soskovskaya | 17 | 1.04 | -0.11 | 0.71 | 2.12E-005 |
| Rostov Oblast – Milyutinskij | 28 | 1.04 | -0.1 | 0.47 | 5.13E-005 |
| Chuvashia – Yantikovskij | 29 | 1.03 | -0.17 | 0.5 | 2.01E-005 |
| Tuva – Ovyurskaya | 6 | 1.03 | -0.04 | 0.98 | 9.59E-005 |
| Kemerovo Oblast – Topkinsky City | 53 | 1.02 | -0.19 | 0.33 | 7.62E-006 |
| Chuvashia – Batyrevsky | 53 | 1.02 | -0.09 | 0.33 | 6.05E-006 |
| Belgorod Oblast – Volokonovsky | 39 | 1.02 | -0.18 | 0.31 | 0 |
| Penza Oblast – Lopatinsky | 31 | 1.02 | -0.05 | 0.61 | 2.45E-007 |
| Voronezh Oblast – Bobrovskaya | 44 | 1.02 | -0.16 | 0.37 | 1.22E-005 |
| Tatarstan – Almetyevsk | 53 | 1.02 | -0.11 | 0.42 | 1.61E-007 |
| Mordovia – Kovylkino | 59 | 1.02 | -0.04 | 0.39 | 1.55E-007 |
| Bashkortostan – October | 40 | 1.01 | -0.05 | 0.48 | 7.22E-007 |
| Tyumen Oblast – Uvatsk | 17 | 1.01 | -0.1 | 0.4 | 0.01 |
| Bashkortostan – Tujmazinskij | 83 | 1.01 | -0.06 | 0.27 | 4.92E-007 |
| Saratov Oblast – Dergachivska | 28 | 1.01 | -0.07 | 0.23 | 0.01 |
| Saratov Oblast – Piter | 21 | 1 | -0.04 | 0.46 | 0 |
| Belgorod Oblast – Novooskolskaya | 46 | 1 | -0.09 | 0.6 | 2.46E-010 |

³ Translations of rayon names were provided by Google Translate and are thus, at best, approximations.