How to Calculate the ROI of Online Communities

By Richard Millington

This is where we get to the difficult phase in understanding community value. As we have mentioned, collecting accurate data is one of the biggest impediments to calculating the ROI of a community. When a full data set is available, it is possible to analyze the data and calculate the ROI with a high degree of accuracy. However, organizations are complex, and specific data for comparable periods is often hard to find.

Calculating the return of a community is hard even when you are able to gather all the data. If you have no background in data or statistics, parts of the process will be difficult to follow. However, this is also what makes the ability to calculate the return so valuable: few people can do it today.

Calculating the return of a community (remember, not profit or ROI, as we haven't reached that stage yet) requires a combination of four methods:

    1. Proxy metrics.
    2. Sampling.
    3. Experiments.
    4. Direct analysis.

We will quickly explain each.

1) Proxy Metrics

A proxy metric is an indirect data point assumed to represent a direct one. For example, you might know that, on average, 10% of web visitors make a purchase. If you're unable to track purchases directly, you might instead track web visitors, which you can measure, and estimate purchases from them. Proxy metrics assume a close or direct correlation with a return on investment metric. However, they have some clear problems.

The further removed a proxy is from the figure it stands in for, the less accurate it becomes. For example, the quality of traffic from the community might be superior or inferior to the traffic the organization presently receives, so applying the organization's average conversion rate to it can lead to wildly inaccurate calculations. Proxy metrics also make it easy to introduce bias by cherry-picking the metrics that suit an agenda, rather than measuring the real value generated.
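A minimal Python sketch of the proxy calculation described above, using hypothetical figures: it estimates purchases from community-referred visitors at the assumed 10% average conversion rate, then shows how far the estimate moves if community traffic actually converts at 6% or 14%. The function name, visitor count, and alternative rates are illustrative assumptions, not data.

```python
def estimate_purchases(visitors: int, conversion_rate: float) -> float:
    """Estimate purchases from a proxy metric (web visitors)."""
    return visitors * conversion_rate

# Hypothetical: 20,000 visitors referred by the community.
community_visitors = 20_000

# Assumption: community traffic converts at the 10% site-wide average.
baseline = estimate_purchases(community_visitors, 0.10)

# If community traffic converts at 6% or 14%, the estimate moves by 40% either way.
low = estimate_purchases(community_visitors, 0.06)
high = estimate_purchases(community_visitors, 0.14)

print(f"Assumed: {baseline:.0f} purchases (plausible range {low:.0f} to {high:.0f})")
```

Nothing in the proxy itself tells you which conversion rate is correct, which is why the gap between the proxy and the real figure matters.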

You're probably using proxy metrics today. You might use the number of members, posts, likes, or votes received as a proxy for success. You might also track mentions of the community or product elsewhere, or customer sentiment. Most KPIs (key performance indicators) are proxy metrics. They are not direct revenue generated or saved. They are metrics which, we assume, correlate with revenue generated or cost saved.

Unfortunately, these kinds of proxy metrics can lead us astray. For example, call deflection is a proxy metric used to estimate the number of issues resolved in the community that would otherwise have reached the customer service line. It assumes that, as call deflection increases, the organization can reduce its customer service costs by ending customer service staff contracts. Can you spot the problem with this proxy metric?

In the real world, organizations are either unable or unwilling to let go of staff. It can be hard to fire hard-working and long-serving customer service staff (and pay severance). This means the cost savings don’t really materialize.

Instead, the cost per call actually increases (the same number of staff handle fewer calls because the community takes on more). Yet this cost per call is the very figure we use to value each deflected call, and therefore the community. As a result, the community appears to save increasing sums of money while actually incurring additional costs.
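The call-deflection trap can be made concrete with a small worked example. All figures are hypothetical (a fixed 1,000,000 support payroll, annual calls falling from 100,000 to 80,000), and `cost_per_call` is an illustrative helper rather than a standard formula from any tool.

```python
def cost_per_call(staff_cost: float, calls_handled: int) -> float:
    """Average cost of each call the support team still handles."""
    return staff_cost / calls_handled

# Hypothetical: the support payroll stays fixed because no staff are let go.
annual_staff_cost = 1_000_000.0
calls_before = 100_000
calls_after = 80_000  # 20,000 issues now resolved in the community

before = cost_per_call(annual_staff_cost, calls_before)  # 10.0 per call
after = cost_per_call(annual_staff_cost, calls_after)    # 12.5 per call

# Valuing each deflected call at the (now higher) cost per call...
claimed_savings = (calls_before - calls_after) * after

# ...while the real staff saving is zero, because the payroll is unchanged.
staff_cost_after = annual_staff_cost
actual_staff_savings = annual_staff_cost - staff_cost_after

print(f"Cost per call: {before:.2f} -> {after:.2f}")
print(f"Claimed savings: {claimed_savings:,.0f}; actual: {actual_staff_savings:,.0f}")
```

The more calls the community deflects, the higher `after` climbs, so the claimed savings grow even though the real support cost has not fallen at all.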

Proxy metrics should be used as a tool of last resort, when it is not possible to gather the required data by any other method. In practice, their availability means this tool of last resort is often the very tool we use to calculate ROI.

2) Sampling Method

A sampling method is used when it is impractical to study the behavior of the complete population.

For example, it is impractical to poll every individual in the United States about their voting intention. Instead, polls sample groups of 1000 to 2000 people using quota systems designed to represent the broader population. We mentioned this briefly earlier.

Very often, we can calculate the return by measuring the behavior of a small group (a sample) of members over a period of time to determine whether the community had an impact on a specific behavior. It might not be possible to measure whether the buying habits of the entire community have changed, but it might be possible to measure whether the buying habits of 100 members who joined a year ago have increased (and by how much).
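A minimal sketch of that before-and-after comparison, assuming you can pair each sampled member's spend in the year before joining with their spend in the year after. The five members and their figures are hypothetical; in practice you would run this over the whole sample.

```python
from statistics import fmean

def average_change(before: list[float], after: list[float]) -> float:
    """Mean per-member change in spend, pairing each member's before and after."""
    if len(before) != len(after):
        raise ValueError("before and after must cover the same members")
    return fmean(a - b for b, a in zip(before, after))

# Hypothetical annual spend for five sampled members,
# in the year before joining and the year after.
spend_before = [100.0, 80.0, 120.0, 60.0, 90.0]
spend_after = [130.0, 85.0, 150.0, 55.0, 110.0]

print(average_change(spend_before, spend_after))
```

Pairing each member with themselves removes differences between members, but a rise on its own does not prove the community caused it; spending may have grown anyway.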

The downside of sampling methods is that they are prone to considerable inaccuracies owing to the size and composition of the sample. Strictly, the results can only be generalized to the population the sample represents. However, they are often generalized across an entire community, and a small or unrepresentative sample produces a wide margin of error (a wide confidence interval) around the estimate.
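To see how sample size drives the margin of error, the sketch below uses the standard normal-approximation formula for the 95% margin of error of a proportion, 1.96 × sqrt(p(1 − p) / n), assuming a simple random sample. The figure of 30% of sampled members increasing their spending is hypothetical. This captures random sampling error only; it does nothing to correct a sample that is unrepresentative.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion p from a random sample of size n.

    z = 1.96 corresponds to 95% confidence under the normal approximation.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical: 30% of sampled members increased their spending.
for n in (100, 400, 1000):
    print(n, round(margin_of_error(0.30, n), 3))
```

Quadrupling the sample from 100 to 400 members halves the margin of error, from roughly ±9 to ±4.5 percentage points.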

Larger samples generally produce more precise results, provided they remain representative of the entire group. Sampling methods include random sampling (picking people at random), systematic sampling (picking people at a fixed interval), stratified sampling (picking people at random from within specific groups), cluster sampling (picking entire groups, such as everyone who joined in a given month), and quota sampling (selecting people until specific quotas are filled). Quota sampling is often considered the most representative, but it also raises questions about how to decide the quotas.

For example, in a community, should quotas be based on members' levels of activity? Their demographics? Gender? Age? Each might influence behavior patterns. This is why random, systematic, or stratified sampling is often used to represent the entire community instead. Systematic sampling, for instance, means selecting every {x}th member who joined the community (e.g. every 10th member who joined during this period).
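A minimal sketch of random, systematic, and stratified selection, assuming members are held as (member ID, activity level) pairs in join order. The member list and the "active"/"lurker" labels are hypothetical. The stratified version takes a fixed number from each group; proportional allocation is an equally valid choice. Quota sampling is left out because it depends entirely on how the quotas are set.

```python
import random
from collections import defaultdict

def random_sample(members, k, seed=0):
    """Pick k members at random."""
    return random.Random(seed).sample(members, k)

def systematic_sample(members, interval, start=0):
    """Pick every {interval}th member, in the order they joined."""
    return members[start::interval]

def stratified_sample(members, stratum_of, per_stratum, seed=0):
    """Pick up to per_stratum members at random from within each group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for m in members:
        groups[stratum_of(m)].append(m)
    sample = []
    for key in sorted(groups):
        group = groups[key]
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample

# Hypothetical members: (member_id, activity_level), ordered by join date.
members = [(i, "active" if i % 3 == 0 else "lurker") for i in range(1, 101)]

print(len(random_sample(members, 10)))
print([m[0] for m in systematic_sample(members, 10, start=9)])  # every 10th member
print(len(stratified_sample(members, lambda m: m[1], 5)))
```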

3) Conducting An Experiment (changing a variable and measuring impact)

A more accurate approach is to conduct an experiment: withhold or add one variable for a group of similar people and assess the impact. For example, the organization may withhold access to its community from a randomly selected group of customers (by hiding the link to join) and compare their behavior with that of a group who did see the link to join.

You can then study the behavior of the two groups chosen at random and see what impact the community had upon the behavior.
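A minimal sketch of that comparison, assuming customers are randomly split into a group that sees the join link and one that doesn't. The spend figures are hypothetical, and the permutation test is one reasonable way to check whether the gap could be chance (a t-test is a common alternative); it is not a method the text prescribes.

```python
import random
from statistics import fmean

def assign_groups(customer_ids, seed=0):
    """Randomly split customers into a group shown the join link and a control group."""
    ids = list(customer_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]

def permutation_test(treated, control, n_iter=10_000, seed=0):
    """Difference in mean spend plus a two-sided permutation p-value."""
    rng = random.Random(seed)
    observed = fmean(treated) - fmean(control)
    pooled = list(treated) + list(control)
    n = len(treated)
    extreme = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # relabel customers at random
        diff = fmean(pooled[:n]) - fmean(pooled[n:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_iter

shown_link, hidden_link = assign_groups(range(1, 13))

# Hypothetical spend over the test period for each group.
spend_shown = [120, 135, 150, 110, 160, 140]
spend_hidden = [100, 95, 120, 105, 110, 98]

difference, p_value = permutation_test(spend_shown, spend_hidden)
print(f"Difference in mean spend: {difference:.2f}, p = {p_value:.3f}")
```

A small p-value suggests the gap is unlikely to be chance alone, but only if nothing else (such as discounts sent to one group) differs between the groups.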

While experiments allow you to test the community as a whole, they are difficult to set up (how, for example, would an organization hide the community from one group of customers and not others?). They need to be carefully designed to avoid corrupting the results through natural biases. It would be tempting, for example, to simply compare members with non-members. But the people who opted in to the community are also the most likely to buy from the organization.

Imagine, too, that members of the community received discount offers or marketing material that non-members didn't. Are additional purchases attributable to the community or to the discounts? It is difficult in experiments to control for these extraneous variables.

Yet, when conducted using statistically valid principles, an experiment represents a powerful method to establish causation. It helps uncover where the community is generating the benefits the organization seeks.

4) Direct Analysis

The final method to establish the ROI of a community is through a direct analysis. This is when it is possible to gain access to a complete data set of all member behavior (clicks, buying habits, etc.) and, thus, remove the need for proxy metrics, sampling, or experiments.

A complete data set is often available when the organization has access to a database linked to an individual member ID (name, email, address, loyalty card number, etc.), which is also linked to member behavior within the community. When this data is available, it is possible to quickly compare the behavior of members and non-members and isolate the impact of specific variables.
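A minimal sketch of the first step in a direct analysis, assuming purchase records and community membership share the same customer ID. The IDs and amounts are hypothetical, and `spend_by_membership` is an illustrative helper. The raw gap it produces is a starting point, not the community's impact: as noted for experiments, people who join may already be the biggest buyers, which is why the data must still be normalized.

```python
from statistics import fmean

# Hypothetical records keyed by the same customer ID in both systems.
purchases = {"c1": 250.0, "c2": 90.0, "c3": 310.0, "c4": 60.0, "c5": 120.0}
community_members = {"c1", "c3", "c5"}

def spend_by_membership(purchases, members):
    """Average spend of customers who are community members vs. those who are not."""
    member_spend = [v for cid, v in purchases.items() if cid in members]
    other_spend = [v for cid, v in purchases.items() if cid not in members]
    return fmean(member_spend), fmean(other_spend)

member_avg, non_member_avg = spend_by_membership(purchases, community_members)
print(f"Members: {member_avg:.2f}, non-members: {non_member_avg:.2f}")
```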

However, gaining access to a full data set covering all customer behavior and all member behavior (in which customers use the same ID for both) is extremely rare. It also requires an advanced understanding of statistics, both to normalize the data and to identify the impact of the community on the ROI objectives.

Yet, when this data set is available, it represents the most accurate source of data for determining the ROI of an online community.


  1. There are four key tools we can use to measure the return generated by a community: proxy metrics, sampling, experiments, and direct analysis.
  2. Proxy metrics are assumed to represent a real value figure. They are easy to use but can lead to wild inaccuracies.
  3. Sampling can make measurement easier, but it's often hard to generalize the results across the community.
  4. Experiments are a powerful way to establish causation, but it is often hard to design one that controls for external variables.
  5. Direct analysis is the most accurate method, but the complete data set it requires is exceedingly rare.
