ChatGPT vs. Superusers – What Our Data Showed
Over the past month, we’ve been putting ChatGPT to the test to answer two questions.
1) How might ChatGPT threaten the work of community professionals?
2) How might ChatGPT be useful to the work of community professionals?
The Limitations of ChatGPT
Before we provide answers, let’s highlight the limitations of ChatGPT.
I like Benedict Evans’s description of AI as: “a ten-year-old that’s read every book in the library and can repeat stuff back to you, but a little garbled and with no idea that Jonathan Swift wasn’t actually proposing, modestly, a new source of income for the poor.”
Specifically, ChatGPT suffers from the following:
1) It’s not able to understand anything. It essentially makes predictions. When you give it a prompt, it gives a set of words and sentences which matches the input with the highest probability.
2) It’s simply attempting to give the most “plausible” answer. This doesn’t mean it’s the right answer – just the one which best matches the pattern.
3) It can’t evaluate competing sources of information. The lack of subject matter expertise means it lacks the ability to evaluate the quality of sources.
4) It doesn’t acknowledge uncertainty. ChatGPT will provide a statement of fact without acknowledging any degree of uncertainty. At present, you can’t find the sources of information to evaluate.
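To make the first two points concrete, here is a deliberately tiny sketch of “most plausible next word” prediction. It is a toy bigram counter over an invented corpus, not how ChatGPT is actually built; the corpus and the function names are assumptions made up for illustration. The point is only that the output is whatever most often followed the input, with no check on whether it is right.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration only (not real training data).
CORPUS = (
    "restart the app . restart the phone . update the app . "
    "restart the app and update the app ."
).split()


def build_bigram_counts(tokens):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for current_word, next_word in zip(tokens, tokens[1:]):
        counts[current_word][next_word] += 1
    return counts


def most_plausible_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


counts = build_bigram_counts(CORPUS)
# "the" is followed by "app" more often than by "phone" in this corpus,
# so "app" is the "plausible" continuation, whether or not it is correct.
print(most_plausible_next(counts, "the"))
```

Nothing in this sketch understands apps or phones. It simply repeats the most common pattern it has seen.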
Let’s take a look at some examples.
To help answer these questions, we undertook a few simple tests.
1) Comparing existing answers. We found questions with an accepted solution and posted those same questions to ChatGPT to evaluate if it could provide a comparable, if not better, answer.
2) Creating new answers. We posted unanswered questions into ChatGPT and published the answer in the community. We then let members highlight if they found the answer useful or not.
Before we explore the results, it’s useful to highlight the methodological problems.
Comparing ChatGPT responses to accepted solutions suffers from a verification problem. We don’t have the expertise to evaluate whether one response is better than another – but we can look at the nature and structure of the response. Yet, this might suffer from subjective bias. We’re also, by definition, comparing ChatGPT to the ‘best’ answers – not the typical member response.
Creating and publishing answers to unanswered questions in the community also has issues. By nature, if the question is unanswered, it might be a more difficult question to answer. This creates a bias against ChatGPT.
However, those issues aside, we feel we can draw a few specific conclusions from our experiments which are worth sharing.
Does ChatGPT Provide Better Answers Than Top Members?
It’s not always easy to be definitive, but here are some of our experiments.
Take this question on the Spotify Community as an example.
This is a very precise question – but its explanation is somewhat long-winded.
We compared the Accepted Solution with the ChatGPT response below.
This answer perhaps best showcases the challenge in comparing the quality of answers.
The accepted solution suggests using private mode to prevent influencing music recommendations. The author also suggests training the algorithm by liking/disliking songs and the Spotify Kids app. The use of a video is useful. However, the explicit content part of the question is ignored.
The ChatGPT answer suggests removing previously listened-to songs from playlists – which seems like a lot more work. It also wouldn’t immediately affect future playlists as much as past playlists. However, it does include specific steps to filter explicit content for kids.
On balance, we judge the accepted solution better because the author understood the need to train the algorithm for the future.
ChatGPT Can Provide Better Answers Than Top Members
Let’s try this example from the SAP community.
We can compare an answer from ChatGPT against an answer from the community below:
In this example, ChatGPT is giving a different answer to that provided by the solution. It’s not easy to independently check if it’s better, but at a glance, it seems to be a better answer than the one marked as Best Answer (also by the OP).
Thus it is possible for ChatGPT to provide better answers than top members.
ChatGPT Doesn’t Presume Expertise
Let’s use one more example from the Tableau community.
The question above is looking to create a shape based upon whether the table has been refreshed within the past four hours or not.
The question includes images (which ChatGPT can’t read) and a workbook (which it potentially can). I copied the workbook into a Dropbox folder and linked to it. This was the result.
Again we have two slightly different answers here. The code for both seems to be accurate (although each used a slightly different approach). But the ChatGPT answer offers more information – especially explaining how to create shapes.
The difference here is a presumption of knowledge.
The community answer presumes the person asking the question has a certain level of expertise; the ChatGPT answer doesn’t. The former might be better for a member more familiar with Tableau, the latter for someone who isn’t.
ChatGPT Can’t Share Experiences
Let’s try an example from Mayo Clinic.
The question is asking for opinions on two different medications.
There isn’t a ‘best answer’ on the platform – so we went with an answer near the top which had the most reactions (i.e. the one most people are likely to see if they visit the question).
The community answer is the clear-cut winner. Not only that, but there are 48 pages(!) of responses to review the aggregated experiences of many, many people. As noted in the ChatGPT answer, it’s only capable of giving a generic answer about the drugs. It can’t share the experience of taking either of them.
ChatGPT Could Serve As A Good Assistant To Superusers
So, does ChatGPT provide better answers than top members?
It can certainly provide more complete answers and alternative approaches to solving problems. But it can’t understand anything related to the context.
The best use of ChatGPT right now is perhaps to encourage superusers to use it to generate an answer which they can then edit to the specifics of the question.
Does ChatGPT Provide Comparable Answers to Members?
For this test, we found 20 unanswered questions in the Tableau community over several weeks.
We put the questions into ChatGPT and copied/pasted the results without any editing.
Then we waited to see how members would respond and engage with the questions.
You can see the results in this table.
Several of our answers were marked as best answers and it was clear the OP (original poster) received the response they needed.
This is shown below.
However, there were also occasions when ChatGPT seemed to provide the wrong answer (see below) or an answer which didn’t contain enough context.
This highlights an important detail.
Many members don’t provide enough information in the question for someone to be able to provide a response. ChatGPT will provide an answer based upon whatever prompt is given, but it won’t ask for clarifying information as a human would. That’s a severe limitation.
For example, if someone says “The [software] app on my iPhone crashes”, ChatGPT will churn out the best possible answer with whatever information is provided. This usually means a generic list of things to try (this is the ChatGPT equivalent of a support agent advising someone to turn it off and on again).
However, a human might first ask for more information to provide the right answer (‘What iPhone, what software, what version, what happened before that?’ etc.). This means a human is more likely to deliver the right response eventually.
It’s a little like someone asking for the best holiday destination. A human would ask a clarifying question (‘Well, what do you like?’), whereas ChatGPT might blurt out “Thailand has great beaches!”.
These are admittedly very primitive metrics from a TINY dataset.
By our count, as you can see in the table shared above, ChatGPT gave:
- Two best answers.
- Six incomplete answers (which may have been helpful)
- Two false answers.
Out of 20 responses, we might consider just two best answers (a ‘Best Answer’ rate of 10%) to be extremely poor. But that really depends on what we compare it to.
If we look at the best answer rate of the leaderboard of the top 10 members, ChatGPT’s 10% ‘best answer’ rate doesn’t make the top 10 – but it’s not too far away from the typical best answer rate of 10% to 16%. One more ‘best answer’ and ChatGPT would have been in the middle of the pack (and one fewer would have left it far away – that’s the problem with a small dataset).
Realistically, we would need a much bigger test with 100+ answers to get a detailed picture. But responding to that many questions raises some ethical concerns.
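To show how fragile a rate based on 20 answers is, here is a rough sketch of the arithmetic. The counts (one, two, or three best answers out of 20) come from the discussion above; the Wilson score interval is our own addition as one common way to put a plausible range around a small-sample rate, and the function names are hypothetical.

```python
import math


def best_answer_rate(best_answers, total_answers):
    """Share of answers that were marked as 'Best Answer'."""
    return best_answers / total_answers


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


TOTAL = 20  # number of ChatGPT answers we published
for best in (1, 2, 3):  # one fewer, our actual result, one more
    rate = best_answer_rate(best, TOTAL)
    low, high = wilson_interval(best, TOTAL)
    print(f"{best}/{TOTAL} best answers: rate {rate:.0%}, "
          f"plausible range {low:.0%} to {high:.0%}")
```

A single answer moves the rate by five percentage points, and the range around our actual result is wide enough to cover both “extremely poor” and “middle of the pack”. That is why a larger test would be needed before drawing firm conclusions.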
Where Can ChatGPT Be Useful?
Let’s refresh where community fits into the modern customer support experience.
We mentioned communities are great at highlighting the ‘in between’ questions. These are the questions where the answer isn’t easy to find in the FAQ, yet don’t require a member’s personal data to answer.
Yet even within the in-between questions, there are several different needs members have.
When a member asks a question, they typically fall into one of four categories:
- The Lazy Member. These members know the information probably exists, but they don’t have the time or inclination to search through so much documentation to find it. They might not know what terms to use.
- The ‘Tried Everything’ Member. These members have tried to resolve their problems, but suffer from a unique edge case. They’re looking for a workaround to their challenge.
- The ‘Optimizer’. These members want to achieve the best outcome. They don’t care about speed, they just want to know how to do something better.
- The Experientialist. These members care about the experiences others have had and want to understand how they felt about it.
ChatGPT’s capabilities are fantastic at helping the ‘lazy member’ find documented information a lot faster, but it’s far less effective when members come up against unique edge cases, need the best possible option, or want to know how an experience was for others.
You can see this here:
In short, ChatGPT is excellent at supporting members who don’t have the time or inclination to search for answers in the knowledge base or documentation. Even if you don’t know the exact terms to use, ChatGPT is essentially a high-powered chatbot which can resolve problems which are documented in the FAQ.
This could help a lot of members save a lot of time and effort. It could theoretically answer the majority of questions the moment they are posted.
However, at present, it’s not possible for ChatGPT to know if it is answering a ‘time-saving’ question (where it should respond as quickly as possible) or whether it is answering any of the other three (in which case it should leave it for the community).
This is the problem. ChatGPT will give whatever answer best matches the prompt it’s given. Even if it has little data to work from, it will still give an answer which matches the prompt. In our tests, ChatGPT frequently fabricated links and statistics.
The consequences of this could range from a minor annoyance (‘this didn’t work!’) to extremely problematic (‘your financial advice cost me a fortune!’).
The Opportunities of ChatGPT
The biggest immediate opportunity is to use it to answer questions. I’d suspect up to 50% of questions asked in support communities are questions which do have a documented solution somewhere. ChatGPT is the ultimate retrieval system.
Yet, it’s too early to automate responses to questions.
The biggest benefit right now is to integrate it as an assistant. You can train and encourage moderators and superusers to use it. These audiences can use their judgement to decide how likely it is that the question has a documented solution. If it does, they can quickly generate and publish an answer. If not, they can carry on with whatever they do today.
Another benefit of ChatGPT is for community professionals who constantly have to ask their colleagues for help answering questions. This was never a great use of anyone’s time. Hopefully it can save a lot of time here.
A longer-term benefit is to use it to screen questions prior to reaching the community. A member can write a question, ChatGPT can serve up an answer (clearly marked as such), and the member can highlight if they are happy with the answer or not. If they are, the answer is posted and documented in the community (for others to find). If not, it is published as a question in the community for others to answer.
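The screening flow described above can be sketched in a few lines. This is only an outline under stated assumptions: `generate_answer` stands in for whatever AI tool you use, `member_accepts` stands in for the member clicking “this helped” or “this didn’t help”, and the `Post` fields and the label text are hypothetical names, not part of any community platform’s API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Post:
    question: str
    answer: Optional[str]  # None when the question is left for the community
    answered_by_ai: bool
    status: str            # "answered" or "open"


def screen_question(
    question: str,
    generate_answer: Callable[[str], str],  # hypothetical AI call
    member_accepts: Callable[[str], bool],  # hypothetical member feedback
) -> Post:
    """Offer an AI draft first; fall back to the community if it is rejected."""
    draft = generate_answer(question)
    labelled = f"[AI-generated answer] {draft}"  # always clearly marked
    if member_accepts(labelled):
        # Member is happy: publish question and answer for others to find.
        return Post(question, labelled, answered_by_ai=True, status="answered")
    # Member is not happy: publish as an open question for members to answer.
    return Post(question, None, answered_by_ai=False, status="open")


# Stubbed usage: a canned answer and a member who rejects it.
post = screen_question(
    "How do I filter explicit content?",
    generate_answer=lambda q: "Try the settings menu.",
    member_accepts=lambda answer: False,
)
print(post.status)
```

Keeping the member’s verdict as the deciding step means a wrong or generic AI answer never blocks the question from reaching the community.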
Is ChatGPT A Threat To Communities?
ChatGPT will clearly have a bigger impact in some communities than others.
Communities based around a feeling (sense of community, professional connections, sharing expertise in private) won’t be affected. Theoretically, it can be used to welcome and greet members – but I’m not sure that’s a game-changer.
But communities centred around finding answers to questions are likely to experience change.
For example, it won’t take people long to realise they can become a superuser in a community by pumping questions into ChatGPT and publishing answers. I’m not quite sure how to prevent that yet (or whether it’s a bad thing).
The major dangers I can see are:
a) Huge decline in search traffic reaching a community. As more people turn to ChatGPT and similar tools, this could cause a significant decline in search traffic. This in turn could reduce the reason for having a community.
b) Huge decline in engagement. It’s potentially going to help a lot of people get answers without having to ask questions in the community. This should save everyone’s time to answer more challenging questions – but given how many people are measured by engagement alone (bad idea), this is likely to be a problem.
c) Helping members ask and verify answers. If ChatGPT can provide good answers, the scarcity shifts to helping members ask better questions and verify the answers provided. We don’t have great tools for doing either yet. Begging people to mark answers as ‘Best Answer’ has failed miserably. We need a better method.
Start Using ChatGPT Today
ChatGPT is close to the peak of its hype cycle. But unlike web3, it clearly has plenty of immediate, practical uses. In our experiments, it was able to create answers almost as good as members – but at a much faster speed. This is going to create opportunities I don’t think we can ignore.
I’d strongly recommend exploring its use to answer questions in a community and seeing what kinds of questions it can and can’t answer effectively. I’d begin by training superusers and your staff to use it and then decide if a technical integration would offer much help.