Overview

Rigorous evaluation of adaptive gamification systems requires both spatial statistical methods and system-level stress testing. The methods described here form the core evaluation toolkit used across GAME-based research.

The guiding principle is reproducibility: every metric must be computable from a fixed input log and produce identical results across runs.

Spatial Analysis: Getis-Ord Gi*

Getis-Ord Gi* (pronounced G-i-star) is a local measure of spatial association used to identify statistically significant hot spots and cold spots in geographic participation data.

In the context of citizen science, it identifies:

  1. hot spots: clusters of spatial cells where participation is significantly higher than expected by chance (large positive z-scores)
  2. cold spots: clusters of spatial cells where participation is significantly lower than expected by chance (large negative z-scores)

This metric is central to the spatial equity evaluation. Before and after applying equity-aware incentive strategies, Getis-Ord Gi* is computed on the participation density surface to measure whether underrepresented areas have improved coverage.

A statistically significant shift in cold spot zones toward neutrality (z-score approaching zero) constitutes evidence of effective equity intervention.
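The z-score computation behind this test can be sketched in plain NumPy. This is an illustrative implementation of the standard Gi* formula, not GAME's; the five-cell transect, the counts, and the binary adjacency weights below are invented for the example.

```python
import numpy as np

def getis_ord_gi_star(values, weights):
    """Gi* z-score per cell.

    values  : 1-D array of participation counts per spatial cell
    weights : n x n binary spatial-weights matrix with w[i, i] = 1,
              since Gi* includes the focal cell in its own neighbourhood
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    x_bar = x.mean()
    s = np.sqrt((x ** 2).mean() - x_bar ** 2)
    wi = w.sum(axis=1)            # sum_j w_ij
    s1 = (w ** 2).sum(axis=1)     # sum_j w_ij^2
    num = w @ x - x_bar * wi
    den = s * np.sqrt((n * s1 - wi ** 2) / (n - 1))
    return num / den

# Five cells along a transect; each cell neighbours its immediate
# predecessor and successor (plus itself).
counts = np.array([10.0, 12.0, 11.0, 1.0, 0.0])
w = np.array([[1, 1, 0, 0, 0],
              [1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0],
              [0, 0, 1, 1, 1],
              [0, 0, 0, 1, 1]], dtype=float)
z = getis_ord_gi_star(counts, w)
# z is positive at the high-participation end (hot spot) and negative
# at the low-participation end (cold spot).
```

An equity intervention that succeeds in the sense described above would pull the negative z-scores at the cold-spot end toward zero on recomputation.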

Participation Overlap: Jaccard Index

The Jaccard similarity index measures the overlap between two sets of participating geographic areas across time periods or strategy conditions.

For two sets A and B of tasks or spatial cells, the index is the size of their intersection divided by the size of their union:

  J(A, B) = |A ∩ B| / |A ∪ B|

J = 1 indicates identical participation sets; J = 0 indicates no overlap.

This metric is used to assess how much participant diversity changes between control and treatment conditions. A low Jaccard index between baseline and gamified conditions indicates that the incentive mechanism has attracted a meaningfully different participant population.
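The computation is a one-liner over sets of cell identifiers. The cell names below are invented for illustration:

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of cell identifiers."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty sets are identical
    return len(a & b) / len(a | b)

baseline = {"cell_01", "cell_02", "cell_03", "cell_04"}
gamified = {"cell_03", "cell_04", "cell_05", "cell_06"}
overlap = jaccard(baseline, gamified)  # 2 shared cells of 6 total = 1/3
```

A value this low between conditions would, per the criterion above, suggest the incentive mechanism reached a substantially different set of areas.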

Centroid Displacement Analysis

Participation centroid displacement measures the geographic shift of the mean participation center between two conditions or time windows.

The centroid of a set of participation events is computed as the mean latitude/longitude weighted by event count. Displacement is then measured as the geodesic distance between centroids.
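The two steps above can be sketched as follows. The haversine great-circle distance stands in for a full geodesic computation (a common, close approximation at these scales), and the coordinates and counts are invented for the example:

```python
import math

def weighted_centroid(events):
    """events: iterable of (lat, lon, count); returns count-weighted mean point."""
    total = sum(c for _, _, c in events)
    lat = sum(la * c for la, _, c in events) / total
    lon = sum(lo * c for _, lo, c in events) / total
    return lat, lon

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    la1, lo1, la2, lo2 = map(math.radians, (*p, *q))
    a = (math.sin((la2 - la1) / 2) ** 2
         + math.cos(la1) * math.cos(la2) * math.sin((lo2 - lo1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Participation shifts from the first site toward the second between windows.
before = [(52.52, 13.40, 120), (52.48, 13.35, 40)]
after_ = [(52.52, 13.40, 60), (52.48, 13.35, 110)]
shift_km = haversine_km(weighted_centroid(before), weighted_centroid(after_))
```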

This metric complements Getis-Ord Gi* by providing a single scalar summary of directional spatial change — useful when comparing multiple strategy configurations.

System Stress Testing

GAME’s deterministic properties are validated through protocol-based load and concurrency tests.

Load Testing Protocol

Concurrency Correctness Testing

Beyond performance, concurrency correctness is validated by:

  1. injecting N simultaneous score events for the same player
  2. verifying that the final wallet balance matches the sum of all valid events exactly
  3. confirming no duplicate credits or phantom deductions

Any discrepancy constitutes a scoring anomaly and fails the determinism test.
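The three steps above can be exercised with a small threaded harness. The `Wallet` class here is a hypothetical stand-in for GAME's scoring component, written only to show the shape of the test: deduplication by event ID under a lock, then an exact-balance assertion after all threads join.

```python
import threading

class Wallet:
    """Illustrative event-deduplicating wallet; not GAME's implementation."""
    def __init__(self):
        self._lock = threading.Lock()
        self._seen = set()
        self.balance = 0

    def credit(self, event_id, amount):
        with self._lock:
            if event_id in self._seen:   # duplicate delivery: ignore
                return
            self._seen.add(event_id)
            self.balance += amount

wallet = Wallet()
events = [(i, 10) for i in range(100)]   # 100 valid score events, 10 points each
events += events[:20]                    # 20 duplicate deliveries injected

threads = [threading.Thread(target=wallet.credit, args=e) for e in events]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Determinism check: final balance equals the sum of valid events exactly,
# with no duplicate credits and no phantom deductions.
assert wallet.balance == 100 * 10
```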

Reproducibility Standards

All evaluations follow a fixed reproducibility protocol:

  1. Seed control — any random sampling uses fixed seeds documented in the methods section
  2. Frozen datasets — evaluation datasets are versioned and archived before analysis
  3. Strategy snapshots — strategy configurations are serialized and archived alongside results
  4. Log-based re-computation — results must be re-derivable from the raw event log alone

This protocol ensures that third parties can reproduce published results independently.
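Item 1 of the protocol is the easiest to get subtly wrong, since iterating over an unordered set makes even a seeded draw irreproducible. A minimal sketch of seed-controlled sampling (function and cell names are illustrative, not from GAME):

```python
import random

def sample_cells(cell_ids, k, seed):
    """Draw a reproducible sample of k cells using a documented fixed seed."""
    rng = random.Random(seed)          # local generator: no global-state leakage
    return rng.sample(sorted(cell_ids), k)  # sort first: set order is unstable

cells = {f"cell_{i:03d}" for i in range(50)}
run_a = sample_cells(cells, 5, seed=42)
run_b = sample_cells(cells, 5, seed=42)
assert run_a == run_b   # identical across runs, as the protocol requires
```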
