Research Methods
Evaluation protocols, spatial metrics, and reproducibility standards used in adaptive gamification research
Overview
Rigorous evaluation of adaptive gamification systems requires both spatial statistical methods and system-level stress testing. The methods described here form the core evaluation toolkit used across GAME-based research.
The guiding principle is reproducibility: every metric must be computable from a fixed input log and produce identical results across runs.
Spatial Analysis: Getis-Ord Gi*
Getis-Ord Gi* (pronounced "G-i-star") is a local spatial autocorrelation statistic used to identify statistically significant hot spots and cold spots in geographic participation data.
In the context of citizen science, it identifies:
- hot spots — areas with statistically significant clustering of high participation
- cold spots — areas with statistically significant clustering of low participation
This metric is central to the spatial equity evaluation. Before and after applying equity-aware incentive strategies, Getis-Ord Gi* is computed on the participation density surface to measure whether underrepresented areas have improved coverage.
A statistically significant shift in cold spot zones toward neutrality (z-score approaching zero) constitutes evidence of effective equity intervention.
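The Gi* computation described above can be sketched as follows. This is a minimal NumPy implementation of the standard Gi* z-score formula on a toy one-dimensional grid; the weight matrix, cell values, and neighbourhood rule are illustrative, not the project's actual spatial configuration.

```python
import numpy as np

def getis_ord_gi_star(values, weights):
    """Getis-Ord Gi* z-scores (simplified sketch).

    values  : (n,) participation counts per spatial cell
    weights : (n, n) spatial weights; row i includes cell i itself
              (a self-weight), as Gi* requires.
    Returns an (n,) array of z-scores: large positive values mark
    hot spots, large negative values mark cold spots.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    x_bar = x.mean()
    s = np.sqrt((x ** 2).mean() - x_bar ** 2)
    wx = w @ x                      # sum_j w_ij * x_j
    w_sum = w.sum(axis=1)           # sum_j w_ij
    w_sq = (w ** 2).sum(axis=1)     # sum_j w_ij^2
    num = wx - x_bar * w_sum
    den = s * np.sqrt((n * w_sq - w_sum ** 2) / (n - 1))
    return num / den

# Toy 1-D grid: cells within distance 1 (plus self) get weight 1.
coords = np.arange(6)
w = (np.abs(coords[:, None] - coords[None, :]) <= 1).astype(float)
z = getis_ord_gi_star(np.array([1, 1, 9, 9, 1, 1]), w)
```

On this toy surface the two high-count cells in the middle receive positive z-scores (a hot spot) while the low-count edges receive negative ones; an equity intervention would show up as the cold-spot z-scores moving toward zero between runs.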
Participation Overlap: Jaccard Index
The Jaccard similarity index measures the overlap between two sets of participating geographic areas across time periods or strategy conditions.
For a given set of tasks or spatial cells:
- Jaccard = 1.0 — identical participation footprint
- Jaccard = 0.0 — no overlap (completely disjoint participation sets)
This metric is used to assess how much participant diversity changes between control and treatment conditions. A low Jaccard index between baseline and gamified conditions indicates that the incentive mechanism has attracted a meaningfully different participant population.
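The set-overlap computation is straightforward; the sketch below uses hypothetical cell identifiers to show the baseline-versus-gamified comparison described above.

```python
def jaccard(cells_a, cells_b):
    """Jaccard similarity between two sets of participating cells."""
    a, b = set(cells_a), set(cells_b)
    if not a and not b:
        return 1.0  # two empty footprints are trivially identical
    return len(a & b) / len(a | b)

# Hypothetical spatial-cell IDs for two conditions.
baseline = {"c1", "c2", "c3", "c4"}
gamified = {"c3", "c4", "c5", "c6"}
overlap = jaccard(baseline, gamified)  # 2 shared / 6 total = 1/3
```

A value of 1/3 here would indicate that the gamified condition reached a substantially different set of cells than the baseline.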
Centroid Displacement Analysis
Participation centroid displacement measures the geographic shift of the mean participation center between two conditions or time windows.
The centroid of a set of participation events is computed as the mean latitude/longitude weighted by event count. Displacement is then measured as the geodesic distance between centroids.
This metric complements Getis-Ord Gi* by providing a single scalar summary of directional spatial change — useful when comparing multiple strategy configurations.
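A minimal sketch of the centroid displacement computation, using the haversine great-circle distance as a stand-in for a full geodesic solver (adequate for the scalar summary described above; the event tuples are illustrative):

```python
import math

def weighted_centroid(events):
    """events: iterable of (lat, lon, count). Returns the count-weighted
    mean lat/lon. A plain arithmetic mean is fine for small extents;
    continent-scale data would need a spherical mean instead."""
    total = sum(c for _, _, c in events)
    lat = sum(la * c for la, _, c in events) / total
    lon = sum(lo * c for _, lo, c in events) / total
    return lat, lon

def haversine_km(p, q):
    """Great-circle distance in km (haversine, spherical Earth)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

# Hypothetical (lat, lon, event_count) tuples for two conditions.
before = [(45.80, 15.90, 10), (45.82, 16.00, 5)]
after = [(45.80, 15.90, 4), (45.90, 16.10, 12)]
displacement = haversine_km(weighted_centroid(before),
                            weighted_centroid(after))
```

The resulting scalar (in km) can be tabulated per strategy configuration to rank how strongly each one shifts the participation center.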
System Stress Testing
GAME’s deterministic properties are validated through protocol-based load and concurrency tests.
Load Testing Protocol
- Tool: Locust (Python-based distributed load testing)
- Scenarios: concurrent score submissions, simultaneous wallet updates, parallel strategy evaluations
- Metrics: response time (p50, p95, p99), error rate, throughput (requests/second)
- Threshold: the system must sustain 500 concurrent users with p95 response time under 400 ms and zero scoring anomalies
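A minimal locustfile matching this protocol might look as follows. The endpoint paths, payload fields, and task weights are hypothetical placeholders, not the platform's actual API:

```python
from locust import HttpUser, task, between

class PlayerUser(HttpUser):
    """Simulated player issuing score submissions and wallet reads.
    All routes and payloads below are illustrative assumptions."""
    wait_time = between(0.5, 2.0)

    @task(3)
    def submit_score(self):
        self.client.post("/api/scores",
                         json={"player": "p1", "points": 10})

    @task(1)
    def read_wallet(self):
        self.client.get("/api/wallet/p1")
```

Run against a staging host with, for example, `locust -f locustfile.py --users 500 --spawn-rate 50`, then read p50/p95/p99 latencies, error rate, and throughput from Locust's statistics output.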
Concurrency Correctness Testing
Beyond performance, concurrency correctness is validated by:
- injecting N simultaneous score events for the same player
- verifying that the final wallet balance matches the sum of all valid events exactly
- confirming no duplicate credits or phantom deductions
Any discrepancy constitutes a scoring anomaly and fails the determinism test.
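The correctness check above can be sketched against an in-memory wallet standing in for the real store. This is a simplified harness, not the project's test suite; the lock models whatever atomicity the production store provides.

```python
import threading

def run_concurrency_check(n_events=100, points_per_event=5):
    """Inject n_events simultaneous credits for one player and
    verify the final balance matches the sum of all events."""
    balance = 0
    lock = threading.Lock()

    def credit():
        nonlocal balance
        with lock:  # removing this lock is how the anomaly would appear
            balance += points_per_event

    threads = [threading.Thread(target=credit) for _ in range(n_events)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    expected = n_events * points_per_event
    return balance, expected

balance, expected = run_concurrency_check()
anomaly = balance != expected  # any mismatch fails the determinism test
```

The same pattern extends to duplicate-credit and phantom-deduction checks: replay the injected events, compute the expected balance offline, and compare.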
Reproducibility Standards
All evaluations follow a fixed reproducibility protocol:
- Seed control — any random sampling uses fixed seeds documented in the methods section
- Frozen datasets — evaluation datasets are versioned and archived before analysis
- Strategy snapshots — strategy configurations are serialized and archived alongside results
- Log-based re-computation — results must be re-derivable from the raw event log alone
This protocol ensures that third parties can reproduce published results independently.
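The log-based re-computation requirement can be illustrated with a small replay sketch. The event schema and digest scheme here are assumptions for illustration; the point is that identical logs must always produce identical results.

```python
import hashlib
import json

def replay(event_log):
    """Recompute wallet balances from the raw event log alone."""
    balances = {}
    for ev in event_log:
        if ev["type"] == "score":
            balances[ev["player"]] = (
                balances.get(ev["player"], 0) + ev["points"])
    return balances

def result_digest(balances):
    """Stable digest of the result: canonical JSON, then SHA-256, so
    two independent re-computations can be compared byte-for-byte."""
    canonical = json.dumps(balances, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

# Hypothetical frozen event log.
log = [
    {"type": "score", "player": "a", "points": 10},
    {"type": "score", "player": "b", "points": 7},
    {"type": "score", "player": "a", "points": 3},
]
balances = replay(log)
```

Publishing the digest alongside results lets a third party verify their re-computation without exchanging the full output.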
Related Publications
- Borge et al. (2025). Stress-Testing Citizen Science Platforms Under High Concurrent Load. SpliTech 2025.
- Vergara-Borge (2025). Gamifying Engagement in Spatial Crowdsourcing: A Deterministic Approach. Systems.