Community AI, Community Management

“I run a messy community with 100k+ questions – how can I clean this up?”

Richard Millington

Founder of FeverBee

16 February 2026

A variation of the ‘100k+ posts’ question has arisen about five times in the past four weeks.

I suspect it’s one that many organizations with large enterprise communities are struggling with.

It’s one we’re going to tackle in the ‘AI Ready Communities’ program, but I’ll show you how we help clients tackle it here too.

(Aside: There are only 3 places and 2 weeks remaining to join our AI Ready Communities program. Once all places are taken, we won’t be enrolling any other organisation.)

The Worst Thing You Can Do Is Nothing

It’s best to present the ‘100k+ threads’ problem as a continuum with two ends.

At one extreme, you can do nothing. You simply leave them as they are and hope AI tools gradually learn how and when to utilise them.

At the other extreme, you manually check each and every thread to ensure it’s accurate and up to date.

I think we can both agree that neither is acceptable, so the answer lies somewhere between the two extremes.

The 101 Guide To Cleaning Up Community Data

The basic steps to clean up community data for LLMs (e.g., Claude, ChatGPT, Gemini) aren’t much different from those for internal AI tools.

You essentially need to dive into discussions and ensure:

The question title is clean and specific.
The question has an accepted, up-to-date answer.
The answer date is clearly labelled.
Long paragraphs are broken up into bullet points or paragraphs of 150 to 200 words.
The question is properly tagged and categorised.
Terminology used in the question and answer is aligned with the rest of the organisation.
The question is placed in a lifecycle queue, indicating whether and when it will be reviewed.
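
A few of these checks are mechanical enough to automate before a human looks at a thread. The sketch below is only an illustration: it assumes a hypothetical exported thread record with `title`, `accepted_answer`, `answer_date`, and `tags` fields (none of these names come from a specific platform), uses the 200-word limit from the list above, and treats a very short title as a crude proxy for a vague one. Checks such as terminology alignment and answer accuracy still need a person.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Thread:
    # Hypothetical export format; every field name here is an assumption.
    title: str
    accepted_answer: Optional[str] = None
    answer_date: Optional[date] = None
    tags: list[str] = field(default_factory=list)


MAX_PARAGRAPH_WORDS = 200  # upper bound taken from the checklist


def audit_thread(thread: Thread) -> list[str]:
    """Return the mechanical checklist items this thread fails."""
    issues = []
    if len(thread.title.split()) < 4:
        # Crude proxy: titles under four words are rarely specific.
        issues.append("title may be too vague")
    if not thread.accepted_answer:
        issues.append("no accepted answer")
    else:
        if thread.answer_date is None:
            issues.append("answer date not labelled")
        paragraphs = [p for p in thread.accepted_answer.split("\n\n") if p.strip()]
        if any(len(p.split()) > MAX_PARAGRAPH_WORDS for p in paragraphs):
            issues.append("paragraph longer than 200 words")
    if not thread.tags:
        issues.append("missing tags")
    return issues


# Made-up example: a vague title and one 250-word paragraph with no date or tags.
messy = Thread(title="Help!!", accepted_answer="word " * 250)
print(audit_thread(messy))
```

Running a check like this across the export gives you a rough list of which threads fail which items, which is useful input for deciding what to review first.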

You can see a good example of this from the HP community here:

Now that we know what to do, we need to determine when to do it.

And this is where things become difficult.

“Just Focus On The Top 10% of Discussions!”

You hear this advice a lot, and it’s not bad. But it misses the key reason we’re doing this in the first place.

Many people currently identify the top 5%-20% of discussions by views and focus on optimising them using the principles above.

That’s better than doing nothing (and something you should already be doing), but it’s not the best thing you can do.

The problem with this advice is that the most-viewed questions in most communities are beginner-level, where the OP (original poster) didn’t want to invest the effort to find the information in official sources.

This means a better answer exists in official sources, and any self-respecting AI system will retrieve it from official sources when a user asks.

You can see a breakdown of question types and best source below:

So optimising these discussions isn’t a bad idea, but it’s not the best use of time either.

Remember, the unique value of a community lies in generating knowledge that isn’t available elsewhere.

That means edge cases, contextual discussions, and ephemeral discussions.

By definition, these won’t be the most viewed questions in your community. But they will be the ones that collectively offer the most unique value to your community.

Remember, the community’s new goal is to provide a comprehensive knowledge index to support every customer channel.

The Ideal Approach: Extract, Tag, Prioritise

We’ll continue to target the top 5%-25% of discussions, but we won’t use views alone as the metric. We’ll target discussions by views plus knowledge type.

This means we’re going to categorise our discussions by knowledge type and then update them.

In practice, this means the following:

Export questions to an Excel spreadsheet or a JSON file.
Use AI to tag questions into five categories (canonical, procedural, experiential/edge cases, contextual, ephemeral). Ensure the knowledge types are tightly defined and a confidence score is assigned to each tag.
Filter out those that are canonical, procedural, and ephemeral (what’s happening right now is less likely to be relevant in the future).
Rank the remaining discussions by the number of views in the past three months.
Update 5+ discussions per day (vs. setting an arbitrary fixed target).
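
The filtering and ranking steps above can be sketched in a few lines of Python. This assumes the export has already been tagged and that each row carries hypothetical `knowledge_type` and `views_90d` fields; the field names, the `daily_target` parameter, and the sample rows are all illustrative and not taken from any real export.

```python
# Knowledge types filtered out before ranking.
EXCLUDED_TYPES = {"canonical", "procedural", "ephemeral"}


def build_review_queue(discussions, daily_target=5):
    """Drop excluded knowledge types, rank the rest by recent views,
    and return the first `daily_target` discussions to update."""
    remaining = [d for d in discussions if d["knowledge_type"] not in EXCLUDED_TYPES]
    ranked = sorted(remaining, key=lambda d: d["views_90d"], reverse=True)
    return ranked[:daily_target]


# Made-up rows for illustration only.
sample = [
    {"id": 1, "knowledge_type": "canonical", "views_90d": 9000},
    {"id": 2, "knowledge_type": "experiential", "views_90d": 420},
    {"id": 3, "knowledge_type": "contextual", "views_90d": 610},
    {"id": 4, "knowledge_type": "ephemeral", "views_90d": 1500},
]

queue = build_review_queue(sample)
print([d["id"] for d in queue])  # [3, 2]
```

Note that the most-viewed row (id 1) never reaches the queue because it is canonical, which is the point of ranking by views plus knowledge type rather than views alone.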

Of course, 5+ discussions is an arbitrary metric that depends entirely on the resources you have available. In practice, it might be much higher or much lower than that. You can adjust.

The key thing is you create momentum, avoid vanity metrics, and scale with your team size.

You’ll notice none of this is a perfect system. It never will be. But it’s an excellent starting point for determining which items require human review and which don’t, based on the data provided.

(Also, note that we have a confidence level assigned to tags, which we can review as needed.)
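
One way to act on those confidence levels is to route low-confidence tags to a person before they influence the queue. A minimal sketch, assuming a hypothetical `confidence` value between 0 and 1 on each tagged row and an arbitrary threshold of 0.8 that you would tune against your own data:

```python
CONFIDENCE_THRESHOLD = 0.8  # arbitrary assumption; tune to your own data


def split_by_confidence(tagged, threshold=CONFIDENCE_THRESHOLD):
    """Separate rows whose AI-assigned tag is trusted from those needing a human check."""
    trusted = [t for t in tagged if t["confidence"] >= threshold]
    needs_review = [t for t in tagged if t["confidence"] < threshold]
    return trusted, needs_review


# Made-up rows for illustration only.
tagged_rows = [
    {"id": 10, "knowledge_type": "contextual", "confidence": 0.93},
    {"id": 11, "knowledge_type": "procedural", "confidence": 0.55},
]
trusted, needs_review = split_by_confidence(tagged_rows)
print(len(trusted), len(needs_review))
```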

This isn’t a one-time program you complete; it’s now part of your revised community management process.

Note: How to use AI to tag discussion types is covered in our AI Ready Communities Program.

Don’t Try to Be Perfect

The best advice I can give to anyone facing the 100k+ threads problem is to stop trying to be perfect.

Doing anything will put you in a better place than you are today.

The great thing here is that effort compounds over time. Doing just a small number of posts each day, or even each week, will soon build up to a huge number of posts.

Better yet, you’re targeting the most important discussions first.

As you work your way down the list, the work is still important, but the biggest return on your effort comes from the first discussions you update.

And if you want help, join our ‘AI Ready Communities’ Program.

Learn How To Overcome The Support Search Problem

Our AI Ready Communities program will teach you how to overcome the critical challenges preventing your community from being included in your knowledge index.

In my opinion, this is the biggest single win for most community professionals today.

The program combines:

Live working sessions
Structured frameworks and templates
Guided analysis of your own community
Peer discussion and comparison
Direct input from FeverBee
Guest experts

What’s Included?

3 months of guided cohort sessions
12 personal guidance sessions
Access to FeverBee’s AI readiness frameworks and tools
Peer learning with a small group of comparable organisations
Direct feedback and facilitation from FeverBee
Applied work focused on your community

Investment: $4,000 per organisation (and you can enrol unlimited staff members)
