The motivation is simple: some reviewers (e.g., Joyce) tend to give high scores, so a product that happens to be reviewed by them gets a higher averaged score than it should. Likewise, some reviewers tend to give lower scores (e.g., Precogvision). It gets more complex if we look further: some reviewers, like super-review, use a different scale entirely, and some seem to rarely give very low scores (e.g., Jays). Of course, I know some scores are AI-analysed from review videos.
Proposal. For each reviewer r: compute the reviewer's mean and standard deviation (call them mu, sigma), then compute the mean and standard deviation (call them mu0, sigma0) of the site scores (currently averaged over all reviewers) of all the products reviewed by r. Now, for each score s given by reviewer r, apply the normalization s_normalized = sigma0*((s - mu)/sigma) + mu0.
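To make the formula concrete, here is a minimal sketch of the normalization for a single reviewer. All names and numbers are made up for illustration; it assumes the reviewer has at least two scores and sigma > 0.

```python
from statistics import mean, stdev

# Hypothetical data: one reviewer's raw scores, and the current site scores
# for the same products.
reviewer_scores = {"A": 9.0, "B": 8.5, "C": 9.5, "D": 8.0}
site_scores     = {"A": 7.0, "B": 6.5, "C": 8.0, "D": 6.0}

# Reviewer's own mean / standard deviation.
mu, sigma = mean(reviewer_scores.values()), stdev(reviewer_scores.values())

# Site mean / standard deviation, restricted to the products this reviewer covered.
covered = [site_scores[p] for p in reviewer_scores]
mu0, sigma0 = mean(covered), stdev(covered)

def normalize(s):
    # s_normalized = sigma0 * ((s - mu) / sigma) + mu0
    return sigma0 * (s - mu) / sigma + mu0

normalized = {p: normalize(s) for p, s in reviewer_scores.items()}
```

Since the transform is affine, the normalized scores have mean mu0 and standard deviation sigma0 by construction: the reviewer keeps their relative ranking of products, but their scale is mapped onto the site's scale for the products they actually covered.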
Notes:
- The core assumption is that each reviewer's scores follow a Gaussian distribution. This is generally false, but a common working assumption.
- The core idea is to work only on the products reviewed by the reviewer. This matters. For example, at first glance Smirk tends to give higher scores, but if you compare product by product, it is clear they in fact give lower scores. This is because Smirk focuses on reviewing high-end products.
- At a high level, this requires a 3-stage procedure: 1) compute the plain site scores as now, 2) compute the normalized scores for each reviewer (as proposed here), 3) recompute the site scores from the normalized scores. Maybe there are cleverer ways, idk.
- Improving the LLM review-to-score analysis is of course important (maybe more so), but the idea above applies after the LLM analysis.
- As for reviewers who publish their own explicit scores, I think we should respect those and show them directly. The proposal applies only to the site scores.
- For those who like math: this procedure should reduce the variance between reviewers on a given product, but I guess only to a limited extent.
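The 3-stage procedure from the notes can be sketched end-to-end. Everything here is toy data with hypothetical reviewer/product names; it assumes every reviewer has at least two scores and a nonzero spread.

```python
from statistics import mean, stdev

# Toy data: reviews[reviewer][product] = raw score (made-up numbers).
reviews = {
    "r1": {"A": 9.0, "B": 8.5, "C": 9.5},
    "r2": {"A": 6.0, "B": 5.0, "C": 7.0, "D": 6.5},
    "r3": {"B": 7.5, "C": 8.0, "D": 7.0},
}

def site_scores(reviews):
    # Stages 1 and 3: plain per-product average over whoever reviewed it.
    products = {p for scores in reviews.values() for p in scores}
    return {p: mean(s[p] for s in reviews.values() if p in s) for p in products}

def normalize_reviewer(scores, site):
    # Stage 2: map this reviewer's distribution onto the distribution of the
    # site scores of the products they covered (assumes sigma > 0).
    mu, sigma = mean(scores.values()), stdev(scores.values())
    covered = [site[p] for p in scores]
    mu0, sigma0 = mean(covered), stdev(covered)
    return {p: sigma0 * (s - mu) / sigma + mu0 for p, s in scores.items()}

plain = site_scores(reviews)                                                # stage 1
normalized = {r: normalize_reviewer(s, plain) for r, s in reviews.items()}  # stage 2
final = site_scores(normalized)                                             # stage 3
```

On this toy input the spread of the final site scores across products comes out smaller than the plain averages, matching the expectation in the last note; whether one pass is enough, or the stages should be iterated until the site scores stabilize, is left open here.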
