
Predictive Modeling of Political Contributions

December 2025 | 7 min read
Python, Machine Learning, K-Means, KNN, EDA, Supervised Learning
Predictive Political Contributions - ML Dashboard

Project Overview

This project analyzes Federal Election Commission (FEC) individual contribution records to build predictive models of political donations. Using real-world campaign finance data, the goal was to explore patterns in political giving, build models that predict donation behavior and party affiliation from contribution patterns, and cluster donors into groups with similar giving behavior.

The FEC dataset includes detailed records of individual contributions to political campaigns, committees, and organizations across the United States. Its size and record-level detail make it a strong foundation for applying machine learning to political behavior.
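As a starting point, a streaming reader for the raw records might look like the sketch below. It assumes the pipe-delimited, header-less layout of the FEC bulk individual-contributions file, using the 21 column names listed in `FEC_INDIV_COLUMNS`. That list and the helper names are assumptions for illustration, so check them against the header file FEC publishes alongside the data. Rows are yielded one at a time, so the full file never has to fit in memory.

```python
import csv
import io
from datetime import date, datetime
from typing import Dict, Iterator, Optional, TextIO

# ASSUMED column order for the FEC bulk individual-contributions file;
# verify it against the header file distributed with the data.
FEC_INDIV_COLUMNS = [
    "CMTE_ID", "AMNDT_IND", "RPT_TP", "TRANSACTION_PGI", "IMAGE_NUM",
    "TRANSACTION_TP", "ENTITY_TP", "NAME", "CITY", "STATE", "ZIP_CODE",
    "EMPLOYER", "OCCUPATION", "TRANSACTION_DT", "TRANSACTION_AMT",
    "OTHER_ID", "TRAN_ID", "FILE_NUM", "MEMO_CD", "MEMO_TEXT", "SUB_ID",
]
_DT = FEC_INDIV_COLUMNS.index("TRANSACTION_DT")
_AMT = FEC_INDIV_COLUMNS.index("TRANSACTION_AMT")


def parse_date(raw: str) -> Optional[date]:
    """Convert an MMDDYYYY string to a date; return None if blank or invalid."""
    try:
        return datetime.strptime(raw, "%m%d%Y").date()
    except ValueError:
        return None


def parse_fec_rows(handle: TextIO) -> Iterator[Dict[str, object]]:
    """Yield one cleaned dict per well-formed record, skipping malformed lines."""
    # QUOTE_NONE: treat quote characters inside names as ordinary text.
    reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if len(fields) != len(FEC_INDIV_COLUMNS):
            continue  # wrong field count: skip instead of crashing mid-file
        row: Dict[str, object] = dict(zip(FEC_INDIV_COLUMNS, fields))
        try:
            row["TRANSACTION_AMT"] = float(fields[_AMT])
        except ValueError:
            continue  # unparseable amount
        row["TRANSACTION_DT"] = parse_date(fields[_DT])
        yield row


if __name__ == "__main__":
    sample = "|".join(["x"] * 13 + ["01152024", "250"] + ["x"] * 6)
    for record in parse_fec_rows(io.StringIO(sample + "\nmalformed|line\n")):
        print(record["TRANSACTION_DT"], record["TRANSACTION_AMT"])
```

A generator keeps memory flat regardless of file size. The same rows can feed the aggregation and modeling steps described below.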

Donation distribution analysis (multi-panel EDA) across 11.5 million contribution records
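Donation amounts tend to be heavily right-skewed, so one way to summarize a distribution like this is to count contributions per order of magnitude. The helpers below are hypothetical and not taken from the project. They bucket positive amounts by their base-10 exponent and count zero or negative amounts (such as refunds) separately.

```python
import math
from collections import Counter
from typing import Iterable, Optional


def magnitude(amount: float) -> Optional[int]:
    """Return floor(log10(amount)) for positive amounts, None otherwise."""
    if amount <= 0:
        return None  # zero/negative amounts (e.g. refunds) get their own bucket
    exp = math.floor(math.log10(amount))
    # Guard against floating-point error right at powers of ten.
    if 10 ** (exp + 1) <= amount:
        exp += 1
    elif 10 ** exp > amount:
        exp -= 1
    return exp


def magnitude_histogram(amounts: Iterable[float]) -> Counter:
    """Count amounts per order of magnitude: key 2 means [$100, $1,000)."""
    return Counter(magnitude(a) for a in amounts)


if __name__ == "__main__":
    hist = magnitude_histogram([5, 25, 99, 100, 250, 2800, -50])
    for exp in sorted(k for k in hist if k is not None):
        print(f"[${10 ** exp:,}, ${10 ** (exp + 1):,}): {hist[exp]}")
    print("non-positive:", hist[None])
```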

Key Features

Geographic analysis of donors and donation amounts by state (top 20 states)
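A per-state summary like this one reduces to a group-by over the parsed records. The sketch below assumes each record is a dict with `STATE`, `NAME`, `ZIP_CODE`, and `TRANSACTION_AMT` keys. The raw records do not include a dedicated donor identifier, so it approximates a unique donor as a (name, 5-digit ZIP) pair. That is a simplifying assumption, not the project's confirmed method.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def top_states(rows: Iterable[dict], n: int = 20) -> List[Tuple[str, int, float]]:
    """Return (state, approx. unique donors, total amount) for the n highest-total states."""
    totals: Dict[str, float] = defaultdict(float)
    donors: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for row in rows:
        state = row["STATE"]
        if not state:
            continue  # skip records with no state
        totals[state] += row["TRANSACTION_AMT"]
        # Hypothetical donor key: contributor name plus 5-digit ZIP code.
        donors[state].add((row["NAME"], row["ZIP_CODE"][:5]))
    ranked = sorted(totals, key=totals.__getitem__, reverse=True)[:n]
    return [(s, len(donors[s]), totals[s]) for s in ranked]
```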

Technical Architecture

The project follows a structured data science pipeline built entirely in Python:

Random Forest feature importance for donation likelihood prediction

Approach & Methodology

The analysis takes a two-pronged approach. First, supervised learning models are trained to predict party affiliation and donation likelihood from contribution features. This helps answer questions like: given a donor's contribution history, can we predict which party they support?
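A minimal version of the supervised step could use scikit-learn's `RandomForestClassifier`, whose `feature_importances_` attribute gives the kind of ranking shown in the feature-importance figure. The feature names and synthetic data below are hypothetical stand-ins. The project's actual features, labels, and hyperparameters are not specified here, and the target could be either a party label or a did-donate flag.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def fit_and_rank(X, y, feature_names, seed=0):
    """Fit a random forest; return held-out accuracy and features sorted by importance."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)  # mean accuracy on the held-out split
    # Impurity-based importances; scikit-learn normalizes them to sum to 1.
    ranking = sorted(
        zip(feature_names, model.feature_importances_), key=lambda p: p[1], reverse=True
    )
    return accuracy, ranking


if __name__ == "__main__":
    # Synthetic demo: the label depends only on the first (hypothetical) feature.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 3))
    y = (X[:, 0] > 0).astype(int)
    names = ["log_total_amount", "n_contributions", "days_since_last_gift"]
    acc, ranking = fit_and_rank(X, y, names)
    print(f"held-out accuracy: {acc:.2f}")
    for name, weight in ranking:
        print(f"{name:>22}: {weight:.3f}")
```

Impurity-based importances can favor high-cardinality features. For a stricter check, scikit-learn's permutation importance on the held-out split is a common alternative.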

Second, unsupervised clustering using K-Means with KNN indexing identifies natural groupings in the donor population. These clusters reveal patterns that aren't immediately obvious, such as distinct donor archetypes based on contribution behavior.
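The clustering step can be illustrated with a plain-Python version of Lloyd's K-Means algorithm plus a brute-force k-nearest-neighbor lookup. The write-up does not say exactly how K-Means and the KNN index were combined. The pairing here (cluster the donors, then query a donor's nearest neighbors) is an assumption, and the two-dimensional donor features are hypothetical. At millions of records, a library implementation and a real spatial index would be needed; this sketch only shows the logic.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, max_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm: return (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]  # random initial centroids
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


def k_nearest(query: Sequence[float], points: List[Point], k: int) -> List[int]:
    """Indices of the k points closest to query (brute force over all points)."""
    return sorted(range(len(points)), key=lambda i: sq_dist(query, points[i]))[:k]


if __name__ == "__main__":
    # Hypothetical donor features: (log10 total given, number of contributions).
    donors = [(1.0, 1.0), (1.2, 2.0), (0.9, 1.0), (3.5, 20.0), (3.8, 25.0), (3.6, 22.0)]
    centroids, labels = kmeans(donors, k=2)
    print(labels)
    print(k_nearest((1.1, 1.5), donors, k=2))
```

In practice the features would be standardized first, since K-Means uses Euclidean distance and a raw count would otherwise dominate a log-scaled amount.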

Sankey diagram showing how the top 10 donors' contributions flow to political committees
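Behind a Sankey diagram like this sits a list of weighted donor-to-committee links. The sketch below assumes parsed records with `NAME`, `CMTE_ID`, and `TRANSACTION_AMT` keys and, as a simplification, treats the contributor name as the donor identity. It returns node labels plus parallel `source`, `target`, and `value` lists. That is the shape a plotting library such as Plotly's `go.Sankey` expects for its links; the plotting call itself is omitted.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def sankey_links(
    rows: Iterable[dict], top_n: int = 10
) -> Tuple[List[str], List[int], List[int], List[float]]:
    """Build node labels and (source, target, value) link lists for the top_n donors."""
    donor_totals: Counter = Counter()
    flows: Counter = Counter()
    for row in rows:
        donor, committee, amount = row["NAME"], row["CMTE_ID"], row["TRANSACTION_AMT"]
        donor_totals[donor] += amount
        flows[(donor, committee)] += amount
    top = [donor for donor, _ in donor_totals.most_common(top_n)]
    top_set = set(top)
    committees = sorted({c for (d, c) in flows if d in top_set})
    # Donors occupy node indices 0..len(top)-1; committees follow after them.
    donor_idx = {d: i for i, d in enumerate(top)}
    cmte_idx = {c: len(top) + j for j, c in enumerate(committees)}
    source: List[int] = []
    target: List[int] = []
    value: List[float] = []
    for (donor, committee), amount in flows.items():
        if donor in top_set:
            source.append(donor_idx[donor])
            target.append(cmte_idx[committee])
            value.append(amount)
    return top + committees, source, target, value
```

Keeping donor and committee indices in separate maps avoids collisions if a donor name ever matches a committee ID string.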

Insights

Working with real FEC data provided valuable experience in handling messy, large-scale government datasets. The parallel processing pipeline was essential for managing the volume of individual contribution records, and the combination of supervised and unsupervised approaches offered complementary perspectives on political giving patterns.
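The write-up does not describe how the parallel pipeline was built. One common pattern, sketched here as an assumption using only the standard library, is map-reduce over chunks: split the records into fixed-size batches, aggregate each batch in a worker process, then merge the partial results. The example totals amounts per state from (state, amount) pairs. `parallel_totals`, `chunked`, and their parameters are hypothetical names.

```python
from collections import Counter
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from itertools import islice
from typing import Iterable, Iterator, List, Optional, Tuple, Type

Record = Tuple[str, float]  # (state, amount)


def chunked(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Yield consecutive lists of at most `size` records."""
    it = iter(records)
    while batch := list(islice(it, size)):
        yield batch


def aggregate_chunk(chunk: List[Record]) -> Counter:
    """Map step: total the amounts per state within one chunk."""
    totals: Counter = Counter()
    for state, amount in chunk:
        totals[state] += amount
    return totals


def parallel_totals(
    records: Iterable[Record],
    chunk_size: int = 100_000,
    executor_cls: Type[Executor] = ProcessPoolExecutor,
    workers: Optional[int] = None,
) -> Counter:
    """Aggregate chunks in parallel, then merge (reduce) the partial Counters."""
    merged: Counter = Counter()
    with executor_cls(max_workers=workers) as pool:
        # Note: Executor.map submits every chunk up front before results are consumed.
        for partial in pool.map(aggregate_chunk, chunked(records, chunk_size)):
            merged.update(partial)  # unlike "+", update() keeps zero/negative totals
    return merged


if __name__ == "__main__":
    demo = [("CA", 10.0), ("NY", 5.0), ("CA", -2.0), ("TX", 1.0)]
    print(parallel_totals(demo, chunk_size=2))  # worker processes
    print(parallel_totals(demo, chunk_size=2, executor_cls=ThreadPoolExecutor))  # threads
```

Worker processes sidestep the GIL for CPU-bound aggregation. Passing `ThreadPoolExecutor` is a cheap way to check the map and merge logic without spawning processes.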
