Analyzing Federal
Campaign Contributions
       Dave Fauth




About Me
 Full time consultant
 Playing with NoSQL and Neo4J
 for a couple of years
• My Blog: http://
  www.intelliwareness.org
• Find me on Twitter: @davefauth
• Email me: dsfauth@gmail.com
Agenda
•   Graph Database Overview
•   The Data
•   Data Layout
•   Data Prep
•   Analysis
•   Q&A
Graph Database Primer




A Graph Database
๏ no: not for charts & diagrams, or vector artwork
๏ yes: for storing data that is structured as a graph
   • remember linked lists, trees?
   • graphs are the general-purpose data structure
๏ “A relational database may tell you the average age of everyone in the United States, but a graph database will tell you who is most likely to buy you a beer.”
You know relational
[Figure: tables foo and bar, linked through foo_bar]
now consider relationships...
We're talking about a
Property Graph

     Nodes
     Relationships
     Properties (each a key+value)
     + Indexes (for easy look-ups)

[Figure: a graph of people (Emil, Johan, Allison, Tobias, Lars, Andreas, Andrés, Peter, Mica, Ian, Delia, Michael) connected by "knows" relationships]
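The four ingredients above can be sketched in a few lines of Python. This is only an illustration of the property graph idea, not how Neo4J stores data: the class names, the `since` property, and the choice of who knows whom are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    node_id: int
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    start: int          # id of the node the relationship leaves
    rel_type: str       # e.g. "knows"
    end: int            # id of the node the relationship points to
    properties: dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph: nodes, typed relationships, key+value properties, one index."""

    def __init__(self) -> None:
        self.nodes: dict[int, Node] = {}
        self.relationships: list[Relationship] = []
        self.name_index: dict[str, int] = {}  # index on the "name" property

    def add_node(self, **properties: Any) -> Node:
        node = Node(len(self.nodes), dict(properties))
        self.nodes[node.node_id] = node
        if "name" in properties:
            self.name_index[properties["name"]] = node.node_id
        return node

    def relate(self, start: Node, rel_type: str, end: Node, **properties: Any) -> Relationship:
        rel = Relationship(start.node_id, rel_type, end.node_id, dict(properties))
        self.relationships.append(rel)
        return rel

    def neighbours(self, node: Node, rel_type: str) -> list[Node]:
        # Follow outgoing relationships of the given type.
        return [
            self.nodes[r.end]
            for r in self.relationships
            if r.start == node.node_id and r.rel_type == rel_type
        ]


graph = PropertyGraph()
emil = graph.add_node(name="Emil")
johan = graph.add_node(name="Johan")
graph.relate(emil, "knows", johan, since=2010)  # hypothetical relationship and property
```

Looking a person up through `name_index` and then calling `neighbours` mirrors the "index look-up, then traverse" style used by the queries later in the deck.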
Datasets
• FEC Data
  – Candidates
  – Committees
  – Contributions
• Sunlight Labs
  – SuperPac Contributions and Expenditures
• And still more could be had…
FEC Data
• In 1975, Congress created the Federal
  Election Commission (FEC) to
  administer and enforce the Federal
  Election Campaign Act (FECA),
  the statute that governs the financing of
  federal elections.
• The duties of the FEC, which is an
  independent regulatory agency, include
  disclosing campaign finance information.
FEC Data
• Detailed files about Candidates,
  Committees and Individual
  Contributions
  – http://www.fec.gov/finance/disclosure/ftpdet.shtml
• 10 years of data
  – Updated every Sunday
FEC Files
• Committees
   – The committee master file contains one record for each
     committee registered with the Federal Election
     Commission.
• Candidates
   – The candidate master file contains one record for each
     candidate who has either registered with the Federal
     Election Commission or appeared on a ballot list prepared
     by a state elections office.
• Individual Contributions
   – The individual contributions file contains each contribution
     from an individual to a federal committee if the
     contribution was at least $200.
The Data
Data
• Committee
  – C00000059|HALLMARK CARDS PAC|GREG SWARENS|2501 MCGEE|MD#288|KANSAS CITY|MO|64108|U|Q|UNK|M|C| |

• Candidate
  – 4372|P80003338|OBAMA, BARACK|DEM|2012|US|P|0|I|C|C00431445|PO BOX 8102| |CHICAGO|IL|60680|NA|NA
  – 4373|P80003346|HONEYCHURCH, JOE|REP|2008|US|P|0|O|N|C00431155|5401 LENNOX AVE #40 B| |BAKERSFIELD|CA|93309|NA|NA
  – 4374|P80003353|ROMNEY, MITT|REP|2012|US|P|0|C|C|C00431171|585 COMMERCIAL ST.| |BOSTON|MA|2109|NA|NA
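A pipe-delimited candidate record like the ones above can be split into named fields as sketched below. The property names are borrowed from the "More Details on the Data" slide; which column maps to which property is an assumption made from the sample rows, and `rowNumber` is a hypothetical name for the leading counter.

```python
# Assumed meaning of the first seven columns of a candidate record.
CANDIDATE_FIELDS = [
    "rowNumber",              # hypothetical name for the leading counter
    "candidateID",
    "candidateName",
    "candidateParty",
    "candidateElectionYear",
    "candidateOfficeState",
    "candidateOffice",
]


def parse_candidate(line: str) -> dict[str, str]:
    """Split one pipe-delimited record and keep only the columns named above."""
    values = [value.strip() for value in line.split("|")]
    return dict(zip(CANDIDATE_FIELDS, values))  # zip stops at the shorter list


record = parse_candidate(
    "4372|P80003338|OBAMA, BARACK|DEM|2012|US|P|0|I|C|C00431445|"
    "PO BOX 8102| |CHICAGO|IL|60680|NA|NA"
)
```

The remaining columns (status codes, committee ID, address) are dropped here only to keep the sketch short.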
Data
• Individual Contributors
  – AJA, BAYYOGAL55283|AJA, BAY|YOG|AL|55283|LAW|MANAGE|na|na
  – AXEL, ANNNYNY10025|AXEL, ANN|NY|NY|10025|COLUMBIA UNIVERSITY|SOCIAL WORKER|na|na
  – DYSITETULSAOK74145|DYSITE|TULSA|OK|74145|N/A/WEB DESIGN|WEB DESIGN|na|na

• Contributions
  – NASSOUR, TAMMYMALVERNPA19355|C00410266|01052011|750.0|15|AA24A9CC9439D46D7A6A|714732
  – NORRIS, ELIZABETH AMALVERNPA19355|C00410266|01132011|4000.0|15|AD2BAAB7CE6234AA585D|714732
  – NORRIS, JAMESMALVERNPA19355|C00410266|01042011|4000.0|15|A1B7AADE1665546EB8AF|714732
  – POLLOCK, RANDALLMALVERNPA19355|C00410266|01122011|250.0|15|AC184E2EB48E14A4991F|714732
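In the samples above, the first column of each contributor row reads as name + city + state + zip run together, and the same string opens each contribution row. A minimal sketch of building that key and grouping contributions under it follows; the function name and the meaning assigned to the contribution columns are assumptions.

```python
from collections import defaultdict


def contributor_key(name: str, city: str, state: str, zip_code: str) -> str:
    """Concatenate the identifying fields, as the sample contributor rows appear to do."""
    return f"{name}{city}{state}{zip_code}"


contributions = [
    "NASSOUR, TAMMYMALVERNPA19355|C00410266|01052011|750.0|15|AA24A9CC9439D46D7A6A|714732",
    "NORRIS, JAMESMALVERNPA19355|C00410266|01042011|4000.0|15|A1B7AADE1665546EB8AF|714732",
]

# One contributor key maps to many contribution amounts (1 to many).
amounts_by_contributor: dict[str, list[float]] = defaultdict(list)
for line in contributions:
    # Assumed column order: key, committee ID, date (MMDDYYYY), amount, then other fields.
    key, committee_id, date, amount, *_rest = line.split("|")
    amounts_by_contributor[key].append(float(amount))
```

A concatenated key is fragile (two donors with the same name in the same zip code collide), so treat it as a convenience for this dataset rather than a true identifier.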
Data
• SuperPAC Contributors
  – SA11AI Individual/Corporation|WINNING OUR FUTURE|C00507525| |Adelson|Miriam|Las Vegas|NV|Physician|Adelson Clinic|5000000|1/24/12|5000000|SA11AI.9595|787720

• SuperPAC Expenditures
  – UNITED STEELWORKERS POLITICAL ACTION FUND|C00003590|N|G|"Obama, Barack"|Support|P80003338|DEM|P|0|NA|780.52|DC|9/5/12|G|"Printing, PDQ"|Inv. # 84734 - Hard Hat Stickers - Obama Campaign|E118F9A6FA8B048EF8E4|808740
  – GOVERNMENT IS NOT GOD|C00297531|N|G|"OBAMA, BARACK"|Oppose|P80003338|DEM|P|0|NA|10800| |9/5/12|G|"Envision Marketing, Inc."|postage and mail preparation|SE.10416|808752
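The expenditure rows wrap some fields in double quotes, so a plain split on "|" would leave the quote characters in the values. One way to read them, sketched with Python's standard `csv` module; the column positions used at the end are read off the sample row and are assumptions about the file layout.

```python
import csv
import io

sample = (
    'GOVERNMENT IS NOT GOD|C00297531|N|G|"OBAMA, BARACK"|Oppose|P80003338|DEM|P|0|NA|'
    '10800| |9/5/12|G|"Envision Marketing, Inc."|postage and mail preparation|'
    'SE.10416|808752\n'
)

# delimiter="|" switches the reader from commas to pipes; quoted fields are unwrapped.
reader = csv.reader(io.StringIO(sample), delimiter="|", quotechar='"')
row = next(reader)

committee_name = row[0]
candidate_name = row[4]       # quotes removed by the csv reader
support_or_oppose = row[5]
amount = float(row[11])
payee = row[15]
purpose = row[16]
```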
Data Prep
• Extract and Transform
  – Stored files on S3
  – Used MortarData to run Hadoop jobs to
    prepare data (@MortarData)
• Load
  – Used Neo4J BatchInserter to load
    • Thanks to Michael Hunger (@mesirii)
    • Loaded 2M+ nodes in <5 minutes
Data Prep Pipeline
[Figure: Download data -> Use S3 Storage -> Process with Hadoop/Pig -> Java BatchInsert -> Created Neo4J DB]
Develop
Get it perfect. Mortar shows you what your job will do before you run it. Mortar uses real Python, so NumPy, SciPy, and NLTK work perfectly.
Run
Mortar creates a private Apache Hadoop cluster for your job. Pig + Python create optimized jobs for execution.
• Read from cloud storage (S3)
• Process in a private, on-demand Apache Hadoop cluster
• Write to cloud storage (S3)
Process Text Files Using PIG
• Name Standardization (Last, First ->
  First,Last)
• Date Reversal (MMDDYYYY ->
  YYYYMMDD)
• Create unique individual contributors
  with multiple contributions
  – 1 to Many
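The first two transformations are small string operations. The talk ran them as Pig jobs; the Python below is only a sketch of the same logic, with hypothetical function names and an assumed space between first and last name in the output.

```python
from datetime import datetime


def standardize_name(name: str) -> str:
    """Turn 'LAST, FIRST' into 'FIRST LAST'; names without a comma pass through."""
    last, comma, first = name.partition(",")
    if not comma:
        return name.strip()
    return f"{first.strip()} {last.strip()}"


def reverse_date(mmddyyyy: str) -> str:
    """Turn MMDDYYYY into YYYYMMDD, validating the date along the way."""
    return datetime.strptime(mmddyyyy, "%m%d%Y").strftime("%Y%m%d")
```

Storing dates as YYYYMMDD means that plain string or numeric comparison sorts them chronologically, which keeps date filters in queries simple.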
Why PIG/Hadoop
• Cloud storage and cloud-based
  Hadoop
• Easy to use
• Fast
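Financial Data Example
[Figure: example graph. Donors Troy Smith and Craig Stull are linked by Gives relationships to contribution nodes; Receives relationships connect the contributions and the committees Bachman for Congress and DAVID SCHWEIKERT FOR CONGRESS; Supports relationships connect the committees and the candidates Michelle Bachman and David Schweikert; an Expends relationship is also shown.]
Contribution properties shown in the figure:
• contribAmt: 300, contribDate: 20120604
• contribAmt: 200, contribDate: 20120501
• contribAmt: 830, contribDate: 20110506
• contribAmt: 400, contribDate: 20110815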
Data Model
• Individual GIVES Contribution
• Contribution FUNDS Committee
• Committee SUPPORTS Candidate
• Expenditures and superPac Contributions attach to Committee
More Details on the Data
• Committee: committeeID, committeeName, committeeTreasurer, committeeCity, committeeState, committeeZip
• Candidate: candidateID, candidateName, candidateParty, candidateElectionYear, candidateOfficeState, candidateOffice
• Contribution: commID, contribDate, contribAmt, contribType
• Donor: indivName, indivCity, indivState, indivZip, idivEmp, indivOccupation
• Relationships: SUPPORTS (Committee, Candidate), FUNDS (Contribution, Committee), GIVES (Donor, Contribution)
More Details on the Data
• Expenditures: commID, contribDate, contribAmt, contribType
• Committee: committeeID, committeeName, committeeTreasurer, committeeCity, committeeState, committeeZip
• superPac Contributions: commID, donorName, donorCity, donorState, donorZip, donorAmt, donorDate
• Relationships: SPENDS_MONEY (Expenditures, Committee), SUPERPAC_GIVES (superPac Contributions, Committee)
Cypher



• Developed for and by
  Neo4J
Can I Draw Patterns
[Figure: three nodes A, B, and C]
Sure, with ASCII Art!
   A --> B --> C, A --> C
   A-->B-->C<--A
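Directed Relationship
[Figure: node A pointing to node B]
   (A) --> (B)
Labeled Directed Relationship
[Figure: node A pointing to node B, relationship labeled Gives]
   (A) -[:Gives]-> (B)
Variable Length Path
[Figure: A connected to B through paths of increasing length]
   (A) -[*]-> (B)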
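A variable length pattern asks whether B can be reached from A by following one or more directed relationships. The idea can be sketched as a breadth-first search over an adjacency map; the graph below and the function name are hypothetical, and unlike Cypher (which returns the matching paths) this sketch only reports whether any path exists.

```python
from collections import deque


def has_path(edges: dict[str, list[str]], start: str, goal: str) -> bool:
    """Return True if goal is reachable from start in one or more directed hops."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in edges.get(node, []):
            if neighbour == goal:
                return True
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


# Hypothetical directed graph: A -> X -> Y -> B
edges = {"A": ["X"], "X": ["Y"], "Y": ["B"]}
```

Because the relationships are directed, `has_path(edges, "A", "B")` succeeds while the reverse direction does not.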
Let’s take a look….
GitHub for Data and Code

https://github.com/akollegger/FEC_GRAPH
All Candidates in Minnesota
START n=node:candidates('candidateOfficeState:MN*')
RETURN n
Simple Contribution
start a = node(5058)
match p = a-[:RECEIVES]-(b)<-[:GAVE]-(c)
where b.contribAmt > 1
return c
Bachmann Contributions
// Bachmann Contributions
start a = node(3946)
match p = a-[:RECEIVES]-(b)
where b.contribAmt > 1000
return b
Real World Examples
• Cash pours into state campaigns
  from afar
  – http://www.startribune.com/printarticle/?id=162672646
  – A Star Tribune analysis of Bachmann’s
    most recent financial reports
    shows that up to 80 percent of her
    reported contributions came from
    individuals outside Minnesota.
Bachmann Contributions
start a = node(3946)
match p = a-[:RECEIVES]-(b)<-[:GAVE]-(c)
where (c.INDIVSTATE? = "MN")
and b.contribAmt > 1
return sum(b.contribAmt)
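The query sums contribution amounts whose donor lives in Minnesota; comparing that sum with the overall total gives the out-of-state share that the Star Tribune article describes. The arithmetic is sketched below on made-up rows: the amounts, the states, and the `donorState` key are hypothetical and are not results from the FEC data.

```python
# Hypothetical rows: each pairs a contribution amount with the donor's state.
rows = [
    {"contribAmt": 300.0, "donorState": "MN"},
    {"contribAmt": 200.0, "donorState": "MN"},
    {"contribAmt": 830.0, "donorState": "AZ"},
    {"contribAmt": 400.0, "donorState": "CA"},
]


def total_from_state(rows: list[dict], state: str) -> float:
    """Sum contribAmt over rows whose donor state matches; rows without a state are skipped."""
    return sum(r["contribAmt"] for r in rows if r.get("donorState") == state)


in_state = total_from_state(rows, "MN")
total = sum(r["contribAmt"] for r in rows)
outside_share = (total - in_state) / total  # fraction of money from outside the state
```

Skipping rows that lack a state is a choice made for this sketch; how missing properties should be counted is worth deciding explicitly when running the real query.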
Julie Waddle

START person = node(191468)
MATCH person-[*]->c<-[*]-d
RETURN c.contribAmt, c.contribDate, d.name
SuperPac Examples

START n=node:committees('commName:1911*')
MATCH n-[r:SUPERPACEXPEND]->c
RETURN c.PURPOSE, count(c.PURPOSE)
ORDER BY count(c.PURPOSE) DESC
More SuperPac Examples
START n=node:committees('commName:*')
MATCH n-[r:SUPERPACEXPEND]->c
WHERE c.PURPOSE! <> "ZOO"
RETURN c.PURPOSE, count(c.PURPOSE)
ORDER BY count(c.PURPOSE) DESC

START n=node:committees('commName:*')
MATCH n-[r:SUPERPACEXPEND]->c
WHERE c.PURPOSE! = "Rental Van"
RETURN c.PURPOSE, count(c.PURPOSE), c.CANDIDATE, sum(c.expendAmt)
ORDER BY count(c.PURPOSE) DESC
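Both queries are group-and-count aggregations: count expenditures per PURPOSE, most frequent first, and in the second case also total the amount spent. The same aggregation is sketched below over made-up expenditure rows; the values are hypothetical, and only the PURPOSE and expendAmt names come from the queries.

```python
from collections import Counter, defaultdict

# Hypothetical expenditure rows; the last one has no PURPOSE and is ignored.
expenditures = [
    {"PURPOSE": "Rental Van", "expendAmt": 120.0},
    {"PURPOSE": "Postage", "expendAmt": 80.0},
    {"PURPOSE": "Rental Van", "expendAmt": 95.5},
    {"expendAmt": 10.0},
]

# count(c.PURPOSE) grouped by purpose
purpose_counts = Counter(e["PURPOSE"] for e in expenditures if "PURPOSE" in e)

# sum(c.expendAmt) grouped by purpose
amount_by_purpose: dict[str, float] = defaultdict(float)
for e in expenditures:
    if "PURPOSE" in e:
        amount_by_purpose[e["PURPOSE"]] += e["expendAmt"]

# ORDER BY count DESC
ranked = purpose_counts.most_common()
```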
Other Resources
• https://github.com/akollegger/FEC_GRAPH
Thank you!
   @davefauth

