
Digital Skills Workshops for PhD Students (UNIGE and Geneva Graduate Institute)

Automated Text Analysis with R - 16 & 17 November 2023, 14:15-18:00

DS42 - Automated Text Analysis with R

Valentina Baiamonte

“ChatGPT can give entirely wrong answers and present misinformation as fact, writing plausible-sounding but incorrect or nonsensical answers”, The Guardian.

“ChatGPT may be coming for our jobs”, Insider.

Due to the wide availability of text data from websites and social networks, generative text models and automated text analysis have become increasingly popular in recent years, spurring the interest (and skepticism) of scholars from different disciplines. Both are promising approaches for analyzing large databases of social and political texts. Text databases have been used to infer political behavior and policy preferences, predict electoral outcomes, and generate plausible-sounding texts (ChatGPT). This course introduces a variety of automated text analysis tools and models, presenting their applications in social research and demystifying what they can and, most importantly, what they cannot do (yet).

The course combines:

  • Theoretical lecture: 3 hours - Thursday 16 November, 14:15 - 17:15
  • Coding session in R: 4 hours - Friday 24 November, 14:15 - 18:00

Some basic knowledge of R is required (or completion of the “Fundamentals of R” workshop). The coding session will be hands-on, dealing with practical issues that arise in research, such as collecting and pre-processing text data, and interpreting, validating and visualizing the outputs of an analysis.

Learning Objectives: 

Students will learn how to collect, prepare and process text-as-data in R; identify and apply, based on the research question, the most appropriate text analysis tool to describe text databases (e.g. word clouds, word distributions); and identify and apply the most appropriate text analysis model to infer causal relationships between variables of interest (e.g. topic models).

Valentina Baiamonte has a PhD in International Relations/Political Science from the Graduate Institute. In her PhD thesis, she analysed a database of 218 position papers submitted by interest groups during an EU consultation on energy and climate change, to assess the diversity of the EU consultation process. She currently works at the World Business Council for Sustainable Development (WBCSD), managing projects on ESG-related risks and disclosure. She is also a freelance consultant and, in her spare time, a passionate geek.

PhD students of the Graduate Institute will be informed of each workshop by email.  For any questions regarding registration to the workshop, please contact: 


Fundamentals of R, From 4 to 8 September 2023

DS40 - Fundamentals of R
Livio Silva-Muller & Henrique Sposito



R is a programming language and open-source software that allows users to import, transform, and analyse diverse types of data. Academics, governments, and industry use R for data collection, data visualisation, and data analysis.

This summer school is a hands-on introduction to R, starting from scratch. In separate blocks, the summer school covers fundamental tasks in R such as how to import different types of data; how to clean and manipulate objects; how to create beautiful visualisations; and how to export reports and high-resolution figures. Each block is matched with a topical case study that illustrates a practical application of the fundamentals of R to key social science questions related to the environment, conflict, and democracies.

By the end of this summer school, participants should be able to (1) perform simple data analysis, (2) communicate findings with visualisations, and (3) produce integrated reports using R.

Participants can bring their own project and discuss their design and feasibility with instructors in office hours or one-on-one sessions.


Classes will take place during the week, with 3 hours of in-person lectures each morning. In the afternoon, participants can work independently on exercises that review the material and join office hours with the instructors from 15:00 to 17:00.



  • Monday 4 Sep 2023, Room P3 506: Object, class and data structure
  • Tuesday 5 Sep 2023, Room S12: Cleaning and wrangling data
  • Wednesday 6 Sep 2023, Room S12: Principles and practices of data visualisation
  • Thursday 7 Sep 2023, No class/Project Work: Independent work on (i) case study or (ii) own project
  • Friday 8 Sep 2023, Room S12: Making shareable reports with R Markdown








Individual Consultations


Office Hours

Office Hours

Office Hours

Henrique Sposito
Henrique is currently a fourth-year PhD candidate in the International Relations and Political Science Department at the Graduate Institute. His dissertation leverages advanced text analysis techniques in R, such as Natural Language Processing (NLP) and supervised machine learning, to investigate how authenticity, problem construction, and urgency appear and change over time and across settings in discursive politics. Henrique is also a Research Assistant in the "PANARCHIC: Power and Network and the Rate of Change in Institutional Complexes" project at the Center for International Environmental Studies (CIES). For the project, he develops, contributes to, and helps maintain several R packages that assist researchers dealing with multiple, overlapping, and uncertain datasets across various issue domains of Global Governance.

Livio Silva-Muller
Livio is a fourth-year PhD candidate in Anthropology and Sociology at the Graduate Institute, working at the intersection of climate change, policy effectiveness, and transnational finance. His dissertation utilises longitudinal grant-level data, textual data, and in-depth interviews to answer how governments adopt effective climate mitigation policies. Livio also works as a research assistant at the SNF Elites & Inequality project, which relies on survey data to estimate elites’ support for redistributive projects and the cultural processes that enable this support. Finally, Livio provides data-related consulting services to organisations based in Geneva.

PhD students of the Graduate Institute will be informed of each workshop by email.  For any questions regarding registration to the workshop, please contact: 


References you need for this workshop

Introduction to Web Scraping Using R

DS15 & DS16 - Introduction to Web Scraping Using R

(Emma Vestesson, 2x 4h)


Web scraping can help you extract data and content from a website.

In this workshop, you will learn the basics of web scraping, including when web scraping is appropriate and how to access both the information visible on a website and the information stored in its HTML code.

The morning session will cover an introduction to R (DS15) for those who have not used R before (but ideally have some programming knowledge) or who need a refresher.

The afternoon session will be dedicated to web scraping (DS16).

Emma is a senior data analyst at the Health Foundation (London), where she works on quantitative evaluations of healthcare interventions. Prior to joining the Health Foundation, Emma worked as a senior data analyst for the Sentinel Stroke National Audit Programme (SSNAP) at the Royal College of Physicians. Before that, she worked as an economic researcher consultant for the World Intellectual Property Organization. Emma is a part-time PhD student at UCL Institute of Child Health. She is interested in using electronic health care records to measure and improve antimicrobial drug use in children. She is an organiser for R-Ladies London.

This workshop will take place online on Friday 22 October 2021 and will be divided into two sessions with independent registration:

  • DS15: 09:00-12:30 (Introduction to R)
  • DS16: 14:00-17:30 (Web Scraping)

For any questions regarding registration to the workshop, please contact: