I participate in user research every day in my position at Harvard, and I'm always searching for ways to make data analysis easier for myself and my team. This includes staying up to date on best practices and standards in user research and design, and learning about the latest tools professionals are using. With all the latest hubbub around ChatGPT, I began to think about ways AI could be used to inform UX. What about transcript analysis from user interviews?
I decided to run a little test using ChatGPT to summarize a very large transcript from a user interview. The interview is 43 minutes long and was transcribed by Zoom. In it, the participant is asked about their experience with and opinion of a website-building platform called Open Scholar. Open Scholar was born within Harvard University as a tool to help showcase faculty research; today, it is the primary tool used to create websites across the entire university (for now).
Preparation, Trial, & Error
Before heading over to ChatGPT, I removed any identifying traits about the participant, including their name, title, and affiliation. After that, I was ready to get the show on the road. You may be wondering: if this was a Zoom transcript, surely the interview was plagued by misspellings, grammatical errors, run-on sentences, and a variety of other inaccuracies. And you would be right! Zoom is terrible at transcription, especially for ESL speakers. For this little experiment, I decided to include the transcript in its raw form to see how ChatGPT would handle it.
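If you want to do the same kind of scrub before pasting a transcript into a third-party tool, a few lines of Python are enough. This is a minimal sketch; the names, title, and affiliation below are hypothetical stand-ins, not the real participant's details.

```python
# Minimal redaction sketch: replace known identifying strings with
# placeholder tags before sharing a transcript with a third-party tool.
# All values here are hypothetical examples.

REDACTIONS = {
    "Jane Doe": "[PARTICIPANT]",                # participant's name (hypothetical)
    "Senior Web Developer": "[TITLE]",          # job title (hypothetical)
    "Harvard Kennedy School": "[AFFILIATION]",  # affiliation (hypothetical)
}

def redact(text: str) -> str:
    """Return a copy of `text` with identifying strings replaced."""
    for needle, placeholder in REDACTIONS.items():
        text = text.replace(needle, placeholder)
    return text

raw = "Jane Doe, Senior Web Developer at Harvard Kennedy School, joined the call."
print(redact(raw))
# → [PARTICIPANT], [TITLE] at [AFFILIATION], joined the call.
```

A simple string replacement like this only catches the identifiers you know to look for, so a manual read-through afterward is still worthwhile.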
The first thing I did was ask ChatGPT if it could summarize a transcript. To my delight, it responded with a confirmation. What I found most interesting is that it also confirmed it can transcribe video! (Since this was just a quick experiment, I decided not to test the AI's ability to transcribe video just yet, but this is something I would like to test in the near future.)
Once I received the green light, I decided to copy and paste the entire transcript into ChatGPT to see how it would respond (time stamps and all!). Unsurprisingly, it was not capable of processing text at that length in one interaction.
I decided to proceed with the only logical next step: I broke the transcript into parts for ChatGPT to summarize individually. It took some trial and error to find the sweet spot for the word count ChatGPT could handle at once; each batch ended up at roughly 850 words. In total, 10 batches of text were processed individually but consecutively. I wasn't strategic about where I cut off each section of the transcript. This was a quick experiment, after all. Once I confirmed its limitations on word count, I was pleased to see what ChatGPT came up with. It generated the following summary:
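The chunking step above can be sketched in a few lines of Python. Like my experiment, this splits purely by word count with no regard for sentence or speaker boundaries; the transcript length used in the example is illustrative.

```python
def chunk_words(text: str, max_words: int = 850):
    """Split `text` into consecutive batches of at most `max_words` words.

    Splits purely by word count, ignoring sentence and speaker-turn
    boundaries, just as I did when pasting batches into ChatGPT.
    """
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

# A 43-minute interview might run roughly 8,500 words (illustrative),
# which at 850 words per batch yields 10 chunks to paste in one at a time.
transcript = "word " * 8500
batches = chunk_words(transcript)
print(len(batches))  # → 10
```

Cutting on sentence or speaker-turn boundaries instead would likely help the model, since a chunk that ends mid-thought invites it to guess at the missing context.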
The transcript is a conversation between an interviewer and a participant who works at Harvard in academic technologies. The participant talks about their background in the private sector and their role at Harvard, which involves making sure that faculty, students, and administrators efficiently use digital tools for teaching and learning. They also discuss the importance of user experience and accessibility for all users, regardless of their physical or mental status. The interviewer then asks about the participant's technology setup for using Open Scholar, and the participant talks about using whatever CMS is available. The conversation is approximately 2 minutes and 54 seconds long.
So cool!! After that, I came up with the idea of running two instances of ChatGPT summarizing the same transcript. In the first, I started the prompt by asking ChatGPT to summarize the transcript. In the second instance, I gave it no instruction. I just threw that bad boy into the chat box in bunches and clicked enter.
The Results
After copying and pasting the same 10 broken-up sections of the same transcript into both instances of ChatGPT, I waited with bated breath to uncover the fruits of my labor. Ok! Maybe I'm exaggerating just a little bit, but I was super excited. Here are the results from my little experiment.
Technical Issues
Sometimes ChatGPT ran into processing issues. For example, I would receive the same message it gives when the text is too long, even when it wasn't. On one occasion, the AI couldn't process a specific chunk of the transcript for reasons I don't know. It could be related to the block of text starting or ending at an odd place in the grand scheme of the entire transcript, or something totally unrelated.
Level of Accuracy
The whole purpose of my little experiment was to find out whether ChatGPT could accurately summarize a 40-something-minute transcript produced by Zoom. My conclusion is: yes...and no. Let me explain.
I'm not going to sit here and lie by telling you that I compared the entire original transcript against both AI summaries. This is a short experiment, after all. However, I did compare four individually summarized chunks across both summaries produced by the two instances of ChatGPT and the original transcript. The level of detail and accuracy was pretty remarkable to me, but it varied. Sometimes the instance where I specifically asked the AI to summarize was more accurate and detailed, while other times the instance where I just threw the transcript into the chat with no rhyme or reason came out ahead. For example, here are two summaries of the same portion of the transcript generated by ChatGPT. This one was generated by the instance where I did not ask it to summarize:
The speaker is discussing two different situations. In the first situation, a new faculty member is joining Harvard and has a website with content that they want to merge into Harvard's website. The content is unique and innovative and cannot be found elsewhere. The speaker needs to merge the content into Harvard's website using Open Scholar. In the second situation, a faculty member has been at Harvard for some time, but their website has not been maintained for a few years. The backlog of information needs to be updated and ported to the website. This is also an opportunity to redesign the website. The speaker needs to use Open Scholar to update the website.
This summary is from the instance where I did ask ChatGPT to summarize the transcript:
The participant is discussing two situations in which they need to transfer content to the Open Scholar platform. In the first situation, a new faculty member is joining Harvard from a different university, and they want to merge their website content into Harvard's website. The content includes 650 publications with links to events, projects, and web pages that are all interlinked together. The participant needs to merge this content from the new faculty member's website into Open Scholar. In the second situation, an existing faculty member at Harvard needs their website updated. The participant needs to port the backlog of information to the website and redesign it if necessary, as the website looks outdated.
Notice in this example that the second summary includes information the participant shared about the type of content the faculty member wanted to merge; it gives the number of publications and the types of content. The first summary just says the content is unique, innovative, and cannot be found elsewhere. After reading the original transcript, I was able to confirm that the participant did say all of these things, so both summaries are technically correct. However, the level of detail differed, and it flip-flopped back and forth between the two instances.
Ad Lib
Something else I noticed was that ChatGPT likes to ad-lib from time to time. At the beginning of the interview, the participant was asked if they use any assistive technology. In response, the participant doesn't really talk about it. This is what ChatGPT came up with (in the instance I did not ask to summarize):
As for assistive technologies, I personally do not require any, but we always keep accessibility in mind when designing and developing features for Open Scholar. This includes things like using proper HTML markup, providing alt text for images, and ensuring that the platform is compatible with screen readers and other assistive technologies.
At no point in the original transcript does the participant talk about accessibility features they use. What's funny is that I happened to cut off the transcript right at this point when I copied and pasted it into the chat. In actuality, the participant confirms that they use whatever accessibility features Open Scholar provides. ChatGPT, however, fills in the blanks for us with examples of standard digital accessibility features, such as alt text and screen reader compatibility. In case you were wondering, the other summary did not do this.
For good measure, I decided to use an entirely different AI tool to compare against ChatGPT's accuracy (or lack thereof). A good colleague of mine told me about Chat PDF, another tool that can analyze text in any language, but in PDF form. It wasn't awful, but it wasn't great either. When I asked it what the participant does, among other questions, the responses provided very little information. Sometimes it couldn't answer questions that I knew the PDF held the answer to. It appeared less capable than ChatGPT. There is also the conundrum of having to come up with the right question to ask, which makes it difficult to surface important pieces of information the user provided.
Caveats and Causes for Concern
In addition to the aforementioned concern, there are other issues ChatGPT poses when being used for user research (and not just for transcripts). There are real concerns regarding data privacy and how OpenAI, the company behind ChatGPT, uses that data.
In an attempt to ease these concerns, OpenAI has introduced a new feature that lets users turn off their chat history. When chat history is enabled, OpenAI uses that information to train and improve the AI. When it's disabled, OpenAI claims the conversation is retained for 30 days to monitor for abuse and then permanently deleted. Unfortunately, they don't go into detail on what that monitoring means. Do I believe that with just a click of a button OpenAI won't use my data? Not really, but I'm a pessimist, so take that as you will.
What I found most interesting is OpenAI's rollout of their business subscription plan for ChatGPT. With this subscription, users' data won't be used to train their models by default; users have to opt in. If only you could see my eyes roll as I write this. However, the 30-day holding period for abuse monitoring still applies.
Briefly pivoting back to Chat PDF: their privacy policy states they do not share your files with anyone and that files are stored in secure cloud storage, which you can delete at any time. I'm still a pessimist.
My final concern is AI generated bias. This is an issue that cannot be overlooked and should always be taken into consideration when using any AI tool for user research.
Conclusion
I must confess, this was a fun little experiment and I learned a lot. Looking purely at its capabilities, I do think ChatGPT has the potential to be a great tool for user research analysis. Nothing beats manual analysis, but I believe it could help with developing themes for coding research and with gaining a broad overview of the results. However, there are just too many caveats that one must consider at all times.