Digital research methods for textual data

Methodology courses and philosophy of science


Course information

ECTS: 2.5
Number of sessions: 4
Hours of session: 3
Course fee:

  • free for PhD candidates of the Graduate School
  • €525 for non-members
  • Consult our enrolment policy for more information

Contact:

Telephone: +31 (0)10 4082607 (Graduate School).


In the academic year 2021-2022 this course will take place online.

Session 1
September 24 (Friday) 2021
09.30-12.30

Session 2
October 1 (Friday) 2021
09.30-12.30

Session 3
October 8 (Friday) 2021
09.30-12.30

Session 4
October 15 (Friday) 2021
09.30-12.30


Introduction

This course introduces a set of digital research methods (DRM). With these methods, it is possible to analyse large textual datasets from social media, news articles, interviews, and other sources, and to render them as networks, which offers an alternative analytical perspective. These techniques are becoming increasingly popular in virtually all disciplines in the social sciences and humanities.

The course is specifically designed for people who do not feel comfortable using technical programming software. We will focus on how DRM can be applied with accessible software based on user-friendly interfaces. However, those who are more inclined to learn or use programming are welcome to do so, as the course material also includes instructions for executing DRM using R (a statistical programming language).

The first class will introduce the concepts and structure of digital data. We will cover some basic approaches to scraping social media content (namely Twitter) and news articles (LexisNexis), followed by the steps for cleaning textual data and conducting basic text analysis.

In the second class, more advanced text analysis approaches will be introduced. These include topic modelling - a powerful but easy-to-use text analytic method for uncovering hidden themes in large collections of text documents - and sentiment analysis, a method for assessing polarity in texts.

In the third class, we will explore additional social media scraping tools (for Facebook, YouTube, and Instagram) and also introduce network analysis, a relational perspective that can also be applied to text data. We will examine topic models rendered as networks. Network depictions of textual content can reveal new perspectives and lead to enhanced interpretations.

The fourth class continues the exploration of text-as-networks, including entity and semantic networks. We will also work through some steps for using network analysis approaches to visualise and analyse qualitative content coding.


Working method

› There will be four 3-hour sessions. Each session will include a mix of lectures (40%), demonstrations (5%), and in-class exercises (55%).

› Participants can work with the text and network data supplied for the course OR they can explore text/network data of their own.


Learning objectives

After completion of this workshop, you will be able to:

› Scrape and clean textual data from social media and news articles.

› Apply several digital research methods, particularly text analysis, topic modelling, sentiment analysis and network analysis.

› Visualise and interpret results of the analysis.


How to prepare

In order to actively participate in the course, you are required to read the following literature:

› Levallois, C. (2017). A primer on text mining for business. (https://seinecle.github.io/mk99/generated-pdf/text-mining-for-business.pdf)

› Levallois, C. (2017). A primer on network analysis for business. (https://seinecle.github.io/mk99/generated-pdf/network-analysis-for-business.pdf)

› Blei, D. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77-84. (Focus on sections leading up to ‘LDA and probabilistic models’.) (https://cacm.acm.org/magazines/2012/4/147361-probabilistic-topic-models/fulltext)

› Thelwall, M. (2017). Heart and soul: Sentiment strength detection in the social web with SentiStrength. Cyberemotions: Collective Emotion in Cyberspace, 119-134. (Paper available on SentiStrength website; focus on sections Introduction, Using, Core, Additional, Sarcasm, Application; you may skim the rest.)

› Lee, J. (2021). Digital methods and tools: A Step-by-Step Guide, Erasmus University Rotterdam (URL will be emailed to participants)

The first two readings are very short introductions and are applicable to domains beyond business.

You should also familiarise yourself with the instructor’s Digital Research Methods Step-by-Step Guide, particularly the sections on topic modelling (4.8), topic networks (6.9), and data scraping: Mozdeh (3.8), LexisNexis (4.1), SNScrape (3.9), and Netvizz (for YouTube, 2.4).

If the course is not held in a PC lab, please bring your own laptop for the in-class exercises. Note that you may need Administrator rights on your laptop in order to install some of the software. The following software programs need to be installed:

› ConText 1.2 or 2.0: http://context.lis.illinois.edu/

› Gephi 0.9.2: https://gephi.org/

› Mozdeh (Big Data Text Analysis, Windows only): http://mozdeh.wlv.ac.uk/

› SNScrape (for Twitter scraping; available only through the DRM Dropbox ‘tools/Extra’ folder: URL to be emailed to participants)

These tools may be acquired from either the course instructor’s Digital Research Methods Dropbox ‘tools’ folder (see below) or the original websites.

› DRM Dropbox ‘tools’ folder: (URL to be emailed to participants)


Session description

Session 1
Digital data, basic scraping, cleaning of data and text analysis

  • This session introduces you to the world of digital data, including text data.
  • You will learn to scrape data from Twitter and LexisNexis using several online and offline tools, extract their textual elements, and carry out basic, but necessary, cleaning of the data in the ConText text analysis software.
  • Finally, you will learn to conduct basic text analysis.
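
For participants who would rather see these steps as code than in the ConText interface, the cleaning and basic text analysis described above can be sketched in a few lines of Python. This is only a minimal illustration and not the procedure taught in the course: the stop-word list, the sample posts, and the function names are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "and", "is", "are", "to", "of", "in", "for", "on"}

def clean_text(text):
    """Lowercase the text, drop URLs and @mentions, and return non-stop-word tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove @mentions
    text = text.replace("#", " ")              # keep hashtag words, drop the symbol
    tokens = re.findall(r"[a-z]+", text)       # keep alphabetic tokens only
    return [t for t in tokens if t not in STOP_WORDS]

def word_frequencies(documents):
    """Count cleaned tokens across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(clean_text(doc))
    return counts

# Made-up sample posts, for illustration only.
posts = [
    "Refugees are welcome in the city! https://example.org/news #refugees",
    "@newsdesk The city council debates housing for refugees",
]
freq = word_frequencies(posts)
print(freq.most_common(2))  # [('refugees', 3), ('city', 2)]
```

A word frequency table of this kind is the simplest form of text analysis; the same cleaned tokens are also the usual starting point for the more advanced methods in the later sessions.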

Session 2
Advanced text analysis: Topic modelling and sentiment analysis

  • In this session, you will learn how topic models operate and how they are applied, and you will then perform and interpret topic modelling on the acquired data.
  • We will cover other approaches to social media scraping (for Facebook, YouTube, and Instagram) and more rigorous text cleaning through Excel.
  • You will also learn about automated sentiment analysis, which can detect polarity of text segments.
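
As a rough illustration of what "detecting polarity" means, the sketch below scores a text against a small word list and reports the strongest positive and the strongest negative word it finds. It is an assumption-laden toy: the lexicon and its scores are invented for the example, and although it reports two scores in the manner of dual-scale tools, it does not reproduce the lexicon or rules of SentiStrength or any other tool used in the course.

```python
import re

# Hypothetical mini-lexicon: word -> strength (positive > 0, negative < 0).
LEXICON = {"love": 3, "great": 2, "good": 1, "bad": -1, "awful": -3, "hate": -4}

def polarity(text):
    """Return (strongest positive score, strongest negative score) found in the text.

    A score of 0 means no word of that polarity was found.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    scores = [LEXICON[t] for t in tokens if t in LEXICON]
    positive = max((s for s in scores if s > 0), default=0)
    negative = min((s for s in scores if s < 0), default=0)
    return positive, negative

print(polarity("I love the film but the ending was awful"))  # (3, -3)
```

Reporting the two scores separately, rather than a single sum, keeps mixed texts such as the example sentence from being mistaken for neutral ones.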

Session 3
Network analysis

  • In this session, you will learn about (social) network analysis, a relational perspective on data analysis.
  • You will learn how textual data can be viewed as networks, specifically topic model networks, through the Gephi program.

Session 4
Advanced text-as-networks

  • This session extends the network treatment of textual data and covers various semantic networks.
  • The network approach to qualitative coding/analysis will also be investigated.
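
One simple way to picture text as a network is a word co-occurrence network, in which two words are linked whenever they appear in the same document. The sketch below builds such a network from already-cleaned token lists and writes it as an edge list with Source, Target and Weight columns, a layout that Gephi's spreadsheet import can read. It is a minimal illustration under stated assumptions (made-up documents, hypothetical function names) and not the specific semantic network procedures covered in the session.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(documents):
    """Count, for each unordered word pair, the number of documents containing both."""
    edges = Counter()
    for tokens in documents:
        # sorted(set(...)) gives each pair once per document, in a stable order
        for pair in combinations(sorted(set(tokens)), 2):
            edges[pair] += 1
    return edges

def to_edge_list_csv(edges):
    """Render the edges as CSV text with Source, Target and Weight columns."""
    lines = ["Source,Target,Weight"]
    for (source, target), weight in sorted(edges.items()):
        lines.append(f"{source},{target},{weight}")
    return "\n".join(lines)

# Made-up, already-cleaned documents, for illustration only.
docs = [
    ["refugees", "city", "housing"],
    ["refugees", "city", "council"],
    ["artists", "city"],
]
edges = cooccurrence_edges(docs)
print(to_edge_list_csv(edges))
```

In this toy example "city" is linked to every other word, which is the kind of structural observation (here, a central node) that a network view makes visible and a frequency table does not.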

About the instructor

Ju-Sung (Jay) Lee is assistant professor of digital research methods at the Department of Media and Communication of Erasmus University Rotterdam (EUR).

His research focuses on various digital, network, and statistical methodologies and their application to online and offline discourse and interactions, recently in the context of the refugee crisis and artist communities. Jay holds a PhD in sociology from Carnegie Mellon University (USA) and has a background in computer science, organisation and decision sciences, and quantitative sociology.