Digital research methods for textual data

Methodology courses and philosophy of science

Course information

ECTS: 2.5
Number of sessions: 2
Hours of session: 3
Course fee:

  • free for PhD candidates of the Graduate School
  • €265 for non-members
  • Consult our enrolment policy for more information


Telephone: +31 (0)10 4082607 (Graduate School).

Session 1
September 27 (Friday) 2019
Sanders Building, room Sanders 1-13

Session 2
October 4 (Friday) 2019
Tinbergen Building, room HT-204


This course introduces a set of digital research methods (DRM). With these innovative methods, it is possible to analyse large textual datasets from social media, news articles, interviews, and other sources. In virtually all disciplines in the social sciences and humanities, these techniques are becoming increasingly popular.

The course is specifically designed for people who do not feel comfortable with programming. We will focus on how DRM can be applied using accessible software with user-friendly interfaces.

The first class will introduce basic approaches to scraping social media content (namely Twitter) and news articles (LexisNexis), and will also cover the steps for cleaning textual data. Additionally, some text analysis approaches will be introduced, and there will be an in-depth exploration of topic modelling, a powerful but easy-to-use text analysis method for uncovering hidden themes in large collections of text documents.

In the second class, we will explore additional social media scraping tools (for Facebook and YouTube) and investigate how topic modelling results can be visualized as networks. Finally, we will do some exercises that use network analysis to visualize and analyse qualitative content coding. Network depictions of textual content can reveal new perspectives and lead to enhanced interpretations.

Working method

  • There will be two 3-hour sessions.  Each session will include a mix of lectures (15%), demonstrations (5%), and in-class exercises (80%).
  • Participants can work with the text data supplied for the course or with text data of their own.

Learning objectives

After completion of this workshop, you will be able to:

  • scrape and clean textual data from social media and news articles;
  • apply digital research methods, particularly topic modelling; and
  • visualize and interpret the results of the analysis.

How to prepare

In order to actively participate in the course, you are required to read the following literature:

The first two readings are very short introductions and are applicable to domains beyond business.

You should also familiarise yourself with the instructor’s Digital Research Methods Step-by-Step Guide, particularly the sections on topic modelling (5.8), topic networks (7.9), and data scraping: Mozdeh (3.8), LexisNexis (4.2), GetOldTweets (4.8), and Netvizz (4.3, 4.6):

Bring your own laptop for the in-class exercises. Note that you need administrator rights to install the software. The following software programs need to be installed:

These tools may be acquired from either the hosts’ websites or from the course instructor’s Digital Research Methods Dropbox ‘tools’ folder (see below).

Session description

Session 1
Basic scraping and cleaning of data and basic topic modelling

  • In this session, you will learn to scrape data from Twitter and LexisNexis using several online and offline tools, extract the textual elements, and carry out basic but necessary cleaning of the data in the ConText text analysis software.
  • You will also learn how topic models operate and how they are applied, and you will then run and interpret topic models on the acquired data.
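
For readers curious about what basic cleaning involves, the following is a minimal Python sketch of typical steps applied to a social media post: lower-casing, removing links and @mentions, stripping punctuation, and dropping common stop words. It is an illustration only and not a description of how ConText works internally; the stop-word list, the patterns, and the example post are all assumptions made up for this sketch.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "the", "is", "to", "of", "in", "on", "for", "at"}

URL_PATTERN = re.compile(r"https?://\S+")
MENTION_PATTERN = re.compile(r"@\w+")
NON_LETTER_PATTERN = re.compile(r"[^a-z\s]")


def clean_text(text):
    """Return a list of lower-case word tokens with common noise removed."""
    text = text.lower()
    text = URL_PATTERN.sub(" ", text)         # drop web links first
    text = MENTION_PATTERN.sub(" ", text)     # drop @user mentions
    text = NON_LETTER_PATTERN.sub(" ", text)  # drop digits, punctuation, '#'
    return [word for word in text.split() if word not in STOP_WORDS]


# Made-up example post, for illustration only.
example = "Reading about #TopicModelling at https://example.org with @someone!"
print(clean_text(example))
```

Links are removed before punctuation because stripping punctuation first would break a link into stray word fragments that then end up in the topic model.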

Session 2
Further scraping and advanced topic modelling

  • In this session, we will cover other approaches to social media scraping (namely, Netvizz) and more rigorous cleaning through Excel.
  • You will learn about more advanced approaches to topic modelling and become familiar with a more precise tool for topic modelling (MALLET).
  • We will explore visual interpretation of topic models through network representations (in Gephi).
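
One possible way to turn topic modelling output into a network is sketched below in Python: two topics are linked whenever both are prominent in the same document, and the result is written as an edge list with Source, Target, and Weight columns, a CSV layout that Gephi can import. This is a sketch under stated assumptions and not necessarily the approach taken in class; the document-topic proportions and the 0.25 threshold are made-up values for illustration.

```python
import csv
import io
from itertools import combinations

# Hypothetical document-topic proportions (made-up numbers for illustration).
doc_topics = {
    "doc1": {"T0": 0.60, "T1": 0.30, "T2": 0.10},
    "doc2": {"T0": 0.45, "T1": 0.05, "T2": 0.50},
    "doc3": {"T0": 0.10, "T1": 0.50, "T2": 0.40},
    "doc4": {"T0": 0.50, "T1": 0.40, "T2": 0.10},
}

THRESHOLD = 0.25  # assumed cut-off for "topic is present in the document"


def topic_edges(doc_topics, threshold):
    """Count, for each pair of topics, the documents in which both are present."""
    weights = {}
    for proportions in doc_topics.values():
        present = sorted(t for t, p in proportions.items() if p >= threshold)
        for pair in combinations(present, 2):
            weights[pair] = weights.get(pair, 0) + 1
    return weights


def to_edge_csv(weights):
    """Render the edges as CSV text with Source, Target, Weight columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(weights.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()


edges = topic_edges(doc_topics, THRESHOLD)
print(to_edge_csv(edges))
```

A higher threshold yields a sparser network that keeps only the strongest topic overlaps, while a lower one links more topics and can make the picture harder to read.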

About the instructor

Ju-Sung (Jay) Lee is assistant professor of digital research methods at the Department of Media and Communication of Erasmus University Rotterdam (EUR). 

His research focuses on various digital, network, and statistical methodologies and their application to online and offline discourse and interactions, recently in the context of the refugee crisis and artist communities. Jay holds a PhD in sociology from Carnegie Mellon University (USA) and has a background in computer science, organisation and decision sciences, and quantitative sociology.