“ChatGPT can give entirely wrong answers and present misinformation as fact, writing plausible-sounding but incorrect or nonsensical answers”, The Guardian. “ChatGPT may be coming for our jobs”, Insider.
Owing to the wide availability of text data from websites and social networks, text generative models and automated text analysis have become increasingly popular in recent years, spurring the interest (and skepticism) of scholars from different disciplines. Both are promising approaches for analyzing large databases of social and political texts. Text databases have been used to infer political behavior and policy preferences, to predict electoral outcomes, and to generate plausible-sounding texts (ChatGPT). This course introduces a variety of automated text analysis tools and models, presenting their applications in social research and demystifying what they can and, most importantly, what they cannot (yet) do.
The course combines :
Some basic knowledge of R is required (or completion of the “Fundamentals of R” workshop). The coding session will be hands-on, dealing with practical issues that arise in research, such as collecting and pre-processing text data, and interpreting, validating and visualizing the outputs of an analysis.
Learning Objectives:
Students will learn how to collect, prepare and process text-as-data in R; how to identify and apply, based on the research question, the most appropriate text analysis tool to describe text databases (e.g. word clouds, word distributions); and how to identify and apply the most appropriate text analysis model to infer causal relationships between variables of interest (e.g. topic models).
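As a rough illustration of the "word distributions" objective, the sketch below counts word frequencies in a toy corpus using base R only. The three sentences and the tiny stop-word list are invented for this example and are not course material; the workshop itself may rely on dedicated text analysis packages instead.

```r
# Toy corpus: three invented sentences standing in for real documents.
docs <- c(
  "Climate policy needs public support.",
  "Public support for climate policy is growing.",
  "Energy policy and climate policy overlap."
)

# Pre-process: lower-case, replace punctuation with spaces, split on whitespace.
clean <- gsub("[[:punct:]]", " ", tolower(docs))
tokens <- unlist(strsplit(clean, "[[:space:]]+"))
tokens <- tokens[tokens != ""]

# Drop a tiny, hand-picked stop-word list (illustrative only).
stop_words <- c("for", "is", "and")
tokens <- tokens[!tokens %in% stop_words]

# Word distribution: counts sorted from most to least frequent.
word_freq <- sort(table(tokens), decreasing = TRUE)
head(word_freq, 3)
```

The resulting `word_freq` table can be passed to a plotting function such as `barplot()` to visualise the distribution.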
Valentina Baiamonte has a PhD in International Relations/Political Science from the Graduate Institute. In her PhD thesis, she analysed a database of 218 position papers submitted by interest groups during an EU consultation on energy and climate change, to assess the diversity of the EU consultation process. She currently works at the World Business Council for Sustainable Development (WBCSD), managing projects on ESG-related risks and disclosure. She is also a freelance consultant and a passionate geek in her spare time.
PhD students of the Graduate Institute will be informed of each workshop by email. For any questions regarding registration to the workshop, please contact: emma.cranfield@graduateinstitute.ch
R is a programming language and open-source software that allows users to import, transform, and analyse diverse types of data. Academics, governments, and industry use R for data collection, data visualisation, and data analysis.
This summer school is a hands-on introduction to R, starting from scratch. In separate blocks, the summer school covers fundamental tasks in R such as how to import different types of data; how to clean and manipulate objects; how to create beautiful visualisations; and how to export reports and high-resolution figures. Each block is matched with a topical case study that illustrates a practical application of the fundamentals of R to key social science questions related to the environment, conflict, and democracies.
By the end of this summer school, participants should be able to (1) perform simple data analysis, (2) communicate findings with visualisations, and (3) produce integrated reports using R.
Participants can bring their own project and discuss their design and feasibility with instructors in office hours or one-on-one sessions.
Schedule:
Classes will take place during the week, with 3 hours of in-person lectures each morning. In the afternoon, participants can work independently on exercises that review the material and join office hours with the instructors from 15:00 to 17:00.
| | Monday 4 Sep 2023 | Tuesday 5 Sep 2023 | Wednesday 6 Sep 2023 | Thursday 7 Sep 2023 | Friday 8 Sep 2023 |
|---|---|---|---|---|---|
| 09:30-12:30 | Object, class and data structure | Cleaning and wrangling data | Principles and practices of data visualisation | Independent work on (i) case study or (ii) own project | Making shareable reports with R Markdown |
| 12:30-13:30 | Lunch | Lunch | Lunch | Lunch | |
| 13:30-15:00 | Individual Consultations | | | | |
| 15:00-17:00 | Office Hours | Office Hours | Office Hours | | |
Henrique Sposito
Henrique is currently a fourth-year PhD candidate at the International Relations and Political Science Department at the Graduate Institute. His dissertation leverages advanced text analysis techniques in R, such as Natural Language Processing (NLP) and supervised machine learning, to investigate how authenticity, problem construction, and urgency appear and change over time and across settings in discursive politics. Henrique is also a Research Assistant in the "PANARCHIC: Power and Network and the Rate of Change in Institutional Complexes" project at the Center for International Environmental Studies (CIES). For the project, he develops, contributes to, and helps maintain several R packages that assist researchers dealing with multiple, overlapping, and uncertain datasets across various issue domains of Global Governance.
Livio Silva-Muller
Livio is a fourth-year PhD candidate in Anthropology and Sociology at the Graduate Institute, working on the intersection of climate change, policy effectiveness, and transnational finance. His dissertation utilises longitudinal grant-level data, textual data, and in-depth interviews to answer how governments adopt effective climate mitigation policies. Livio also works as a research assistant at the SNF Elites & Inequality project, which relies on survey data to estimate elites’ support for redistributive projects and the cultural processes that enable this support. Finally, Livio provides data-related consulting services to organisations based in Geneva.
PhD students of the Graduate Institute will be informed of each workshop by email. For any questions regarding registration to the workshop, please contact: emma.cranfield@graduateinstitute.ch
References you need for this workshop
Web scraping can help you extract data and content from a website. The morning session will cover an introduction to R (DS15) for those who have not used R before (but ideally have some programming knowledge) or who need a refresher. The afternoon session will be dedicated to web scraping (DS16).
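To give a flavour of what extracting content from a web page looks like in R, here is a minimal sketch. It assumes the `rvest` package, which the workshop description does not name, and it parses a small inline page whose headings, links, and paths are invented for illustration, so no network access is needed.

```r
library(rvest)

# A small inline page stands in for a real website (all content is made up).
page <- minimal_html('
  <h1>Workshops</h1>
  <ul>
    <li><a href="/ds15">Introduction to R</a></li>
    <li><a href="/ds16">Web scraping</a></li>
  </ul>
')

# Select every link inside a list item with a CSS selector,
# then pull out its visible text and its target.
links <- html_elements(page, "li a")
workshops <- data.frame(
  title = html_text2(links),
  href  = html_attr(links, "href")
)
workshops
```

For a live site, `read_html()` with the page's URL would take the place of `minimal_html()`; the selection and extraction steps stay the same.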
Emma is a senior data analyst at the Health Foundation (London), where she works on quantitative evaluations of healthcare interventions. Prior to joining the Health Foundation, Emma worked as a senior data analyst for the Sentinel Stroke National Audit Programme (SSNAP) at the Royal College of Physicians. Before that, she worked as an economic researcher consultant for the World Intellectual Property Organization. Emma is a part-time PhD student at UCL Institute of Child Health. She is interested in using electronic health care records to measure and improve antimicrobial drug use in children. She is an organiser for R-Ladies London.
This workshop will take place on Friday 22 October 2021 online and will be divided into two sessions with independent registration:
For any questions regarding registration to the workshop, please contact: emma.cranfield@graduateinstitute.ch