In the academic year 2023-2024 this course will take place online.
May 7 (Tuesday) 2024
May 14 (Tuesday) 2024
May 21 (Tuesday) 2024
May 28 (Tuesday) 2024
Analysis of textual data, such as interview transcripts, policy documents, and social media content, is common practice within the social sciences, humanities, and other faculties. In this course, you will learn how to do such analysis with the program MATLAB. We will cover both the pre-processing and the analysis of quantitative and qualitative textual data.
Learning and using MATLAB has several valuable advantages:
- The program’s Text Analytics Toolbox offers a pleasant and effective tool for analysis of textual data.
- The field of text analysis is rapidly evolving, with strong connections to data-intensive science and machine learning applications. MATLAB helps you master these advanced methods by providing a versatile learning platform.
- MATLAB also makes it easier to work with the heterogeneous, multimodal, and large datasets that are becoming increasingly important in the social sciences and humanities.
- With MATLAB you can produce high-quality visualizations and build a strong Open Science visibility record.
Learning MATLAB is a long-term investment; it is worth your time if you want to prepare for a ‘data-intensive’ career potentially using various methods, including text analysis.
This online course follows a learning-by-doing approach with practical hands-on examples and interactive notebooks. After an introduction to MATLAB’s fundamentals, you will learn to work with the Text Analytics Toolbox. For instance, you will learn to work with tokenized and labeled datasets and to apply various methods for text analysis research, such as TF-IDF, bagOfWords, bagOfNgrams, text search, word embeddings, and sentiment analysis. Students are encouraged to bring their own dataset(s) to work on.
By completing this course participants will:
- Acquire essential data engineering skills to organize, structure and prepare text data for qualitative and quantitative analysis in MATLAB;
- Be able to work independently with the MATLAB Text Analytics Toolbox, and to apply various text analysis research methods and functions in this program;
- Visualise text-analysis results and produce high-quality graphics with MATLAB.
In advance of session 1 students should have completed the MATLAB Onramp (2 hours, self-paced course online).
Further, students should install MATLAB and both the Text Analytics Toolbox and the Statistics and Machine Learning Toolbox before the course starts. More information on how to install MATLAB and MATLAB Toolboxes can be found here and at the EUR employee work support page.
Students can install the MATLAB R2023a software directly from the EUR Software Center or download the latest MATLAB version from MathWorks. A MathWorks account is needed to download the latest version of the MATLAB software (choose MATLAB individual as license type). Click here to register for a MathWorks account. A MathWorks account is also required to make use of MATLAB Drive, where all course materials will be shared.
MATLAB’s minimum system requirements are described here. A minimum of 8 GB of RAM is advised.
Participating in this course does not require any previous programming experience. The course can be attended by researchers who are not yet experienced with text analysis. Students should plan for 2 hours of homework per session.
The course is useful for students who have no prior knowledge of or experience with MATLAB or text analysis. Some familiarity with a statistical package (SPSS, Stata, R, SAS) and/or a programming language (Python, R) is recommended but not required.
In the first session, students are familiarized with the MATLAB user interface, working with interactive notebooks, MATLAB Drive and installing toolboxes (i.e., the Text Analytics Toolbox). Through hands-on examples students learn to work with (among other things) chars, strings, tokenized documents, text search and simple regular expressions. Students will be introduced to visualizing text data in MATLAB using basic 2D scatter plots. Home exercises are provided for further exploration and deepening of working with text in MATLAB.
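The kind of exercise described for the first session might look like the following minimal sketch. The example sentence and variable names are hypothetical and are not taken from the course materials; `tokenizedDocument` and `tokenDetails` require the Text Analytics Toolbox.

```matlab
% Hypothetical example sentence; not taken from the course materials.
txt = "The interview took place on 7 May 2024 in Rotterdam.";

% A string scalar and a char vector hold the same text in different types.
chr = char(txt);

% Simple text search with base MATLAB string functions.
hasInterview = contains(txt, "interview");   % true if the substring occurs
numParts     = numel(split(txt));            % number of whitespace-separated parts

% A simple regular expression that extracts a four-digit number (the year).
yearMatch = regexp(txt, "\d{4}", "match");

% Tokenize the text (requires the Text Analytics Toolbox).
doc = tokenizedDocument(txt);
details = tokenDetails(doc);                 % table with one row per token
disp(details(:, ["Token" "Type"]))
```

Unlike `split`, which only cuts at whitespace, `tokenizedDocument` also separates punctuation into its own tokens, which is why it is usually the better starting point for further text analysis.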
In the second session, students will master various pre-processing methods for text analysis, such as frequency counts, TF-IDF and custom labeled datasets, and learn to make use of MATLAB’s supporting methods for text analysis, such as bagOfWords, bagOfNgrams and word embeddings. The second session will also cover practical data management skills to handle and organize (large) collections of text data. Students are encouraged to bring their own dataset to work on.
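As a rough illustration of the pre-processing steps named above, the sketch below builds a bag-of-words model, TF-IDF weights, and a bag-of-n-grams model from a tiny corpus. The three sentences are invented for illustration only, and the choice and order of cleaning steps is an assumption rather than the course's prescribed workflow. All functions shown are part of the Text Analytics Toolbox.

```matlab
% Hypothetical mini-corpus, for illustration only.
rawText = [
    "The policy document discusses housing policy."
    "Interview transcripts mention housing and rent."
    "Social media posts discuss rent increases."];

% Pre-processing: tokenize, lowercase, drop punctuation and stop words.
docs = tokenizedDocument(rawText);
docs = lower(docs);
docs = erasePunctuation(docs);
docs = removeStopWords(docs);

% Bag-of-words model: per-document word counts.
bag = bagOfWords(docs);
disp(topkwords(bag, 5))

% TF-IDF weights: one row per document, one column per vocabulary word.
weights = tfidf(bag);
disp(size(weights))

% Bag-of-n-grams model (bigrams by default).
ngrams = bagOfNgrams(docs);
disp(topkngrams(ngrams, 5))
```

The TF-IDF matrix is sparse, so it scales to larger collections of text, and its rows can serve as numeric features for later analysis steps.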
In the third session, students will explore data structures relevant to text analysis (e.g., graphs) and learn to apply clustering, classification, and data reduction methods to text data. Students will also learn to use MATLAB’s advanced capabilities for visualizing text data.
In the fourth session, each student presents a case study and shares lessons learned from working with text data and text analysis in MATLAB. A comparison of MATLAB with other software for text analysis (ATLAS.ti, R) and programming languages (e.g., Python) will be discussed where relevant.
Rob Grim has held positions as a Data Analyst, as a Research Data Specialist and as Head of Research Support. He currently works as Business/Economics & Data Librarian at the EUR and as a member of the Erasmus Data Service Centre (EDSC) team. Rob is a Carpentries teaching instructor and has extensive experience with data pre-processing and data analytics in various science disciplines. He has an interest in statistics, cognitive science, and machine learning. Rob has a background in Psychology.