Methods for the Classification of Data from Open-Ended Questions in Surveys

Disputation
16 April 2024

Camille Landesvatter

University of Mannheim

Research Questions

Which methods can we use to classify data from open-ended survey questions?
Can we leverage these methods to make empirical contributions to different research areas?

Motivation

1️⃣ The increase in methods for collecting natural language (e.g., smartphone surveys with voice technologies) requires evaluating the available classification methods (i.e., fully manual, semi-automated, and fully automated methods).

2️⃣ The special structure of open-ended survey answers (e.g., brevity, lack of context) requires testing machine learning methods specifically for the survey context.

3️⃣ Open answers have the potential to equip researchers with rich data that is useful for various research areas and debates.

Overview of Studies

  • Study 1: “How valid are trust survey measures? New insights from open-ended probing data and supervised machine learning” (research area/debate: measurement equivalence)
  • Study 2: “Open-ended survey questions: A comparison of information content in text and audio response formats” (research area/debate: questionnaire design)
  • Study 3: “Asking Why: Is there an Affective Component of Political Trust Ratings in Surveys?” (research area/debate: emotion analysis)

Study 1:
“How valid are trust survey measures?
New insights from open-ended probing data and supervised machine learning”
(Published in Sociological Methods & Research)

Study 1: Characteristics

  • Background: debate about whether we are measuring the same type of trust across respondents (i.e., the equivalence debate; cf. Bauer & Freitag 2018)

  • Research Question: How valid are traditional trust survey measures?

  • Questionnaire Design: 5 open-ended questions per respondent, block-randomized order

  • Data: U.S. non-probability sample; \(n\)=1,500 with 7,497 open answers

Study 1: Methodology

Figure 1: Supervised Classification for a Trust Question.

Supervised classification approach:

    1. manual labeling of randomly sampled documents (n = 1,000/1,500)
    2. fine-tuning the weights of two BERT models, using the manually coded data as training data, to classify the remaining n = 6,500/6,000 answers
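
The fine-tuning step can be illustrated with the Hugging Face Transformers and Datasets libraries; the sketch below rests on that assumption, and the checkpoint (bert-base-uncased), column names, hyperparameters, and placeholder rows (answers taken from Table 2) are illustrative rather than the study's replication code.

```python
# Minimal sketch of the fine-tuning step, assuming the Hugging Face
# Transformers and Datasets libraries; checkpoint, column names,
# hyperparameters, and placeholder rows are illustrative, not the study's code.
import numpy as np
import pandas as pd
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Placeholder training rows; in the study, 1,000/1,500 manually coded answers
# serve as training data (the answers here are taken from Table 2).
labeled = pd.DataFrame({
    "text": ["I was thinking of people I don't know personally.",
             "A former neighbor of mine who was a single father."],
    "label": [0, 1],  # e.g., association with known others: 0 = No, 1 = Yes
})

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

def tokenize(batch):
    # Open answers are short, so a small max_length keeps fine-tuning cheap.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=64)

train_ds = Dataset.from_pandas(labeled).map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-trust-probing",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=train_ds,
)
trainer.train()

# Classify the remaining (here: placeholder) unlabeled answers.
unlabeled = pd.DataFrame({"text": ["Tourists that come to our little village."]})
test_ds = Dataset.from_pandas(unlabeled).map(tokenize, batched=True)
predicted = np.argmax(trainer.predict(test_ds).predictions, axis=-1)
print(predicted)
```

In the study, two BERT models are fine-tuned, presumably one per coding shown in Table 2 (associations with known others, and their sentiment); the sketch shows a single binary task.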

Study 1: Results

ID   | Measure              | Trust | Probing Answer                                                                      | Associations (known others) | Associations (sentiment)
123  | Most people          | 0.33  | I was thinking of people I don’t know personally.                                   | 0 (No)                      | 0 (neutral/positive)
3139 | Most people          | 0.17  | Tourists that come to our little village. I tend to be very wary of them.           | 0 (No)                      | 1 (negative)
2980 | Stranger             | 0     | No one in particular, but I don’t think I could trust anyone ever again.            | 0 (No)                      | 1 (negative)
4286 | Watching a loved one | 0     | A former neighbor of mine who was a single father with a son close to my son’s age. | 1 (Yes)                     | 0 (neutral/positive)

Table 2: Illustration of exemplary data. Note: n = 7,497.

Figure 2: Associations and Trust Scores. Note. CIs are 95% and 90%.

Study 2:
“Open-ended survey questions: A comparison of information content in text and audio response formats”
(Under Review at Public Opinion Quarterly)

Study 2: Characteristics

  • Background: requests for spoken answers are assumed to trigger open narration and thus more intuitive and spontaneous answers (e.g., Gavras et al. 2022)

  • Research Question: Are there differences in information content between responses given in audio and text formats?

  • Experimental Design: random assignment into either the text or audio condition

Study 2: Methodology

  • Operationalization of information content in open answers via

    • response length (# of words)
    • number of topics
    • response entropy (a computational sketch of these measures follows below)
  • Questionnaire Design: 9 open-ended questions per respondent, block-randomized order, SVoice tool (Höhne, Gavras and Qureshi 2021)

  • Data: U.S. non-probability sample; \(n\)=1,461 with \(n_{text}\)=800 and \(n_{audio}\)=661

    • average item non-response rate text: 1%
    • average item non-response rate audio: 53%
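
Two of the three information-content measures can be computed directly from an answer's text, as in the following rough sketch; defining response entropy as the Shannon entropy of the within-response word distribution is an assumption here, and the number of topics would additionally require a topic model.

```python
# Rough sketch of two information-content measures for a single open answer.
# The entropy definition (Shannon entropy over the word distribution within
# the answer) is an assumption, not necessarily the study's exact formula.
import math
from collections import Counter

def response_length(answer: str) -> int:
    """Response length as the number of whitespace-separated words."""
    return len(answer.split())

def response_entropy(answer: str) -> float:
    """Shannon entropy (in bits) of the word distribution within the answer."""
    words = answer.lower().split()
    if not words:
        return 0.0
    counts = Counter(words)
    total = len(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

answer = "Tourists that come to our little village. I tend to be very wary of them."
print(response_length(answer), round(response_entropy(answer), 2))
```

Longer and lexically more diverse answers yield higher entropy, which is why the measure complements the raw word count.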

Study 2: Results

Figure 3: Information Content Measures across Questions.
Note. CIs are 95%, n_vote-choice: 830 (audio: 225, text: 605), n_future-children: 1,337 (audio: 389, text: 748)

Study 3:
“Asking Why: Is there an Affective Component of Political Trust Ratings in Surveys?”
(Submitted to American Political Science Review)

Study 3: Characteristics

  • Background: conventional notion of rational trust is challenged by the idea of an “affect-based” form of political trust (e.g., Theiss-Morse and Barton 2017)

  • Research Question: Are individual trust judgments in surveys driven by affective components?

  • Questionnaire Design: audio condition only, SVoice tool (Höhne, Gavras and Qureshi 2021)

  • Data: U.S. non-probability sample; \(n\)=1,474 with 491 audio open answers

Study 3: Methodology

Figure 4: Methods for Sentiment and Emotion Analysis.
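
For illustration, the two toolkits listed in the references can be combined roughly as follows: pysentimiento (Pérez et al. 2023) for the sentiment of transcribed answers and SpeechBrain (Ravanelli et al. 2021) for emotions in the raw audio. The specific pre-trained models, the example sentence (from Table 2), and the audio file name are assumptions, not necessarily the study's configuration.

```python
# Hedged illustration, not the study's exact pipeline.
from pysentimiento import create_analyzer
from speechbrain.inference.interfaces import foreign_class  # SpeechBrain >= 1.0;
# older versions: from speechbrain.pretrained.interfaces import foreign_class

# Sentiment of a transcribed open answer (text-based analysis).
sentiment = create_analyzer(task="sentiment", lang="en")
print(sentiment.predict("I don't think I could trust anyone ever again.").output)  # e.g., "NEG"

# Emotion in the recorded audio answer, using a pre-trained wav2vec2 model
# fine-tuned on IEMOCAP (labels: neutral, anger, happiness, sadness).
emotion = foreign_class(
    source="speechbrain/emotion-recognition-wav2vec2-IEMOCAP",
    pymodule_file="custom_interface.py",
    classname="CustomEncoderWav2vec2Classifier",
)
out_prob, score, index, label = emotion.classify_file("answer_0001.wav")  # hypothetical file
print(label)
```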

Study 3: Results

Figure 5: Emotions in Speech Data from SpeechBrain.
Note. n_neutral=408, n_anger=44, n_sadness=18, n_happiness=21. Reference category (right): neutral.

Summary and Conclusions

  • Web surveys make it possible to collect narrative answers that provide valuable insights into survey responses
    • think aloud, associations, emotions, tonal cues, additional info, etc.
  • New technologies (e.g., speech recognition) allow innovative data collection
  • Analyzing natural language can inform various debates, e.g.:
    • Study 1: equivalence debate in trust research (cf. Bauer & Freitag 2018)
    • Study 2: audio response formats in web surveys (cf. Gavras et al. 2022)
    • Study 3: cognitive-versus-affective debate in political trust research (cf. Theiss-Morse and Barton 2017)
    • Studies 1-3: item and data quality

Summary and Conclusions

Semi-automated methods for open survey answers

  • Traditional supervised methods need extensive labeled datasets
  • Language Models (LMs) allow modeling with less labeled data and can incorporate domain-specific knowledge (via fine-tuning and prompting techniques)
    • E.g., Study 1: Random Forest vs. BERT with n=1,500 (a minimal baseline sketch follows below)
  • But: LMs suffer from high complexity and limited transparency
    • start with simple methods and evaluate (e.g., dictionary approach → deep learning in Study 3)
    • trade-off between accuracy and explainability
  • Additionally consider: task difficulty, sample size, structure of the answers, state of previous research, available resources, etc.
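
As a minimal illustration of the Random Forest baseline contrasted with BERT above, a traditional supervised pipeline might look like the following scikit-learn sketch; the example answers come from Table 2, and the features (TF-IDF) and hyperparameters are illustrative assumptions.

```python
# Hedged sketch of a traditional supervised baseline (not the study's code):
# a Random Forest on TF-IDF features of the open answers.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

texts = [
    "I was thinking of people I don't know personally.",   # coded 0 (No known others)
    "A former neighbor of mine who was a single father.",  # coded 1 (Yes)
]
labels = [0, 1]

baseline = make_pipeline(
    TfidfVectorizer(lowercase=True),
    RandomForestClassifier(n_estimators=500, random_state=1),
)
baseline.fit(texts, labels)
print(baseline.predict(["Tourists that come to our little village."]))
```

Unlike BERT, such a baseline carries no pre-trained language knowledge, which is one reason LMs can work with less labeled data.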

Thank you for your attention!

References

Bauer, P. C., and M. Freitag. 2018. “Measuring Trust.” Pp. 1–27 in The Oxford Handbook of Social and Political Trust, edited by E. M. Uslaner. Oxford University Press.

Gavras, K. et al. 2022. “Innovating the collection of open-ended answers: The linguistic and content characteristics of written and oral answers to political attitude questions.” Journal of the Royal Statistical Society. Series A, 185(3):872-890.

Höhne, J. K., K. Gavras, and D. Qureshi. 2021. SurveyVoice (SVoice): A Comprehensive Guide for Collecting Voice Answers in Surveys. Zenodo. Available from: https://doi.org/10.5281/zenodo.4644590.

Pérez, J. et al. 2023. “Pysentimiento: A Python Toolkit for Opinion Mining and Social NLP Tasks.” arXiv.

Ravanelli, M. et al. 2021. “SpeechBrain: A General-Purpose Speech Toolkit.” arXiv.

Theiss-Morse, E., and D. Barton. 2017. “Emotion, Cognition, and Political Trust.” Pp. 160–75 in Handbook on Political Trust. Edward Elgar Publishing.