Inside Baseball

I’ve just begun the process of gathering and organizing the necessary data to do some predictive and prescriptive 2020 election analytics.  The questionable key word here is “necessary”.  As I piece together a data set from a multitude of sources (that are often not easily consumable), I constantly find additional data sets that might also prove useful.  Is that additional data necessary?  I have no clue.  I find it fascinating, though, and I’d rather have data I don’t eventually use than miss something important.  It’s a geeky FOMO.  Help me.

Anyway, as previously noted, the 2020 Presidential election will obviously be all about Electoral College votes.  Hence, the data I’m currently gathering is organized by state.

At the moment, I’m accumulating the mostly obvious state-level data related to such things as population, population growth, voter counts (eligible, registered, & participating) by election year, military population, # of Electors, Elector voting history by party, Governor’s party, Senators’ parties, US House party split, cumulative US House voting history by party by year, State Legislatures’ party control, concurrent 2020 Governor and US Senator races, region, % of US GDP, federal tax per capita, federal aid as a % of state revenue, military $s as a % of state GDP, … you get the idea.
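For the fellow geeks, here is a minimal sketch of how one state-level row might be held in code, with a couple of ratios derived from the voter counts.  The `StateRecord` class, its field names, and the numbers below are all hypothetical placeholders for illustration, not my actual data set or real figures.

```python
from dataclasses import dataclass


@dataclass
class StateRecord:
    """One row of a state-level data set (all field names are hypothetical)."""
    state: str
    electors: int
    eligible_voters: int
    registered_voters: int
    ballots_cast: int

    def registration_rate(self) -> float:
        # Share of eligible voters who are registered.
        return self.registered_voters / self.eligible_voters

    def turnout_rate(self) -> float:
        # Share of eligible voters who actually voted.
        return self.ballots_cast / self.eligible_voters


# Placeholder values only; these are NOT real state figures.
records = [
    StateRecord("Example A", 10, 4_000_000, 3_200_000, 2_400_000),
    StateRecord("Example B", 3, 500_000, 450_000, 300_000),
]

for r in records:
    print(f"{r.state}: registered {r.registration_rate():.1%}, "
          f"turnout {r.turnout_rate():.1%}")
```

Keeping one record per state makes it easy to bolt on more columns later, which, given my FOMO, is pretty much guaranteed.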

I really want to add state-level voter estimates by age, sex, race, religion, education, etc., but that’s going to require a ton of work and I may decide to only pull that data for states that I otherwise deem to be “in-play”.  Or I may find myself on my couch at 3am with my second bag of Cheetos, an empty bottle of bourbon, and 50 state websites open on my laptop.  We’ll see.

Through the haze, I can begin to see patterns emerging even within the embryonic version of the collected data.  A few of the patterns have surprised me; some appear to support early conjectures by a few of the better analysts who do this professionally (e.g., the Cook Political Report, 270toWin).  While I’m not interested in simply duplicating others’ work, it is nice to reach some of the same conclusions, if only as validation.

There is so much data available in so many relevant arenas that every analysis must define its own scope limitations.  My scope will likely be limited by my patience and alcohol budget.  However, I will eventually try to interpret, from what I hope will be a fresh perspective, whatever data I ultimately assemble.