Matching Lists of Victims: Experiences with Semi-Supervised Record Linkage to Identify Unique, Documented Victims of Homicide in Syria and Other Conflicts

Wednesday, September 16, 2015

3:30 pm - 4:30 pm
Gross Hall 330


Megan Price, Human Rights Data Analysis Group

12 noon: informal learner lunch with grad students and postdocs
3:30 pm: seminar

In this talk I will present an overview of how the Human Rights Data Analysis Group (HRDAG) uses various machine learning methods to identify unique, documented victims of violent conflict. I will discuss the evolution of our approach, from relying on Weka implementations of random forests to a competing-classifier approach using the scikit-learn module in Python. Along the way I will discuss specific challenges, including the sparsity of the data and the cost of developing a training set. I will close with current open questions, including how best to implement data reduction and how to incorporate uncertainty from record linkage into subsequent analyses.