Research Methods
Evaluation protocols, spatial metrics, and reproducibility standards used in adaptive gamification research
Overview
Rigorous evaluation of adaptive gamification systems requires both spatial statistical methods and system-level stress testing. The methods described here form the core evaluation toolkit used across GAME-based research.
The guiding principle is reproducibility: every metric must be computable from a fixed input log and produce identical results across runs.
Spatial Analysis: Getis-Ord Gi*
Getis-Ord Gi* (pronounced "G-i-star") is a local spatial autocorrelation statistic used to identify statistically significant hot spots and cold spots in geographic participation data.
In the context of citizen science, it identifies:
- hot spots — areas with statistically significant clustering of high participation
- cold spots — areas with statistically significant clustering of low participation
This metric is central to the spatial equity evaluation. Before and after applying equity-aware incentive strategies, Getis-Ord Gi* is computed on the participation density surface to measure whether underrepresented areas have improved coverage.
A statistically significant shift in cold spot zones toward neutrality (z-score approaching zero) constitutes evidence of effective equity intervention.
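The Gi* computation described above can be sketched as follows. This is a minimal NumPy implementation of the standard Gi* z-score formula on a toy one-dimensional grid; the weight matrix, cell values, and neighbourhood rule are illustrative, not the project's actual spatial configuration.

```python
import numpy as np

def getis_ord_gi_star(values, weights):
    """Getis-Ord Gi* z-scores (simplified sketch).

    values  : (n,) participation counts per spatial cell
    weights : (n, n) spatial weights; row i includes cell i itself
              (a self-weight), as Gi* requires.
    Returns an (n,) array of z-scores: large positive values mark
    hot spots, large negative values mark cold spots.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    x_bar = x.mean()
    s = np.sqrt((x ** 2).mean() - x_bar ** 2)
    wx = w @ x                      # sum_j w_ij * x_j
    w_sum = w.sum(axis=1)           # sum_j w_ij
    w_sq = (w ** 2).sum(axis=1)     # sum_j w_ij^2
    num = wx - x_bar * w_sum
    den = s * np.sqrt((n * w_sq - w_sum ** 2) / (n - 1))
    return num / den

# Toy 1-D grid: cells within distance 1 (plus self) get weight 1.
coords = np.arange(6)
w = (np.abs(coords[:, None] - coords[None, :]) <= 1).astype(float)
z = getis_ord_gi_star(np.array([1, 1, 9, 9, 1, 1]), w)
```

On this toy surface the two high-count cells in the middle receive positive z-scores (a hot spot) while the low-count edges receive negative ones; an equity intervention would show up as the cold-spot z-scores moving toward zero between runs.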
Participation Overlap: Jaccard Index
The Jaccard similarity index measures the overlap between two sets of participating geographic areas across time periods or strategy conditions.
For a given set of tasks or spatial cells:
- Jaccard = 1.0 — identical participation footprint
- Jaccard = 0.0 — no overlap (completely disjoint participation sets)
This metric is used to assess how much participant diversity changes between control and treatment conditions. A low Jaccard index between baseline and gamified conditions indicates that the incentive mechanism has attracted a meaningfully different participant population.
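The set-overlap computation is straightforward; the sketch below uses hypothetical cell identifiers to show the baseline-versus-gamified comparison described above.

```python
def jaccard(cells_a, cells_b):
    """Jaccard similarity between two sets of participating cells."""
    a, b = set(cells_a), set(cells_b)
    if not a and not b:
        return 1.0  # two empty footprints are trivially identical
    return len(a & b) / len(a | b)

# Hypothetical spatial-cell IDs for two conditions.
baseline = {"c1", "c2", "c3", "c4"}
gamified = {"c3", "c4", "c5", "c6"}
overlap = jaccard(baseline, gamified)  # 2 shared / 6 total = 1/3
```

A value of 1/3 here would indicate that the gamified condition reached a substantially different set of cells than the baseline.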
Centroid Displacement Analysis
Participation centroid displacement measures the geographic shift of the mean participation center between two conditions or time windows.
The centroid of a set of participation events is computed as the mean latitude/longitude weighted by event count. Displacement is then measured as the geodesic distance between centroids.
This metric complements Getis-Ord Gi* by providing a single scalar summary of directional spatial change — useful when comparing multiple strategy configurations.
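A minimal sketch of the centroid displacement computation, using the haversine great-circle distance as a stand-in for a full geodesic solver (adequate for the scalar summary described above; the event tuples are illustrative):

```python
import math

def weighted_centroid(events):
    """events: iterable of (lat, lon, count). Returns the count-weighted
    mean lat/lon. A plain arithmetic mean is fine for small extents;
    continent-scale data would need a spherical mean instead."""
    total = sum(c for _, _, c in events)
    lat = sum(la * c for la, _, c in events) / total
    lon = sum(lo * c for _, lo, c in events) / total
    return lat, lon

def haversine_km(p, q):
    """Great-circle distance in km (haversine, spherical Earth)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

# Hypothetical (lat, lon, event_count) tuples for two conditions.
before = [(45.80, 15.90, 10), (45.82, 16.00, 5)]
after = [(45.80, 15.90, 4), (45.90, 16.10, 12)]
displacement = haversine_km(weighted_centroid(before),
                            weighted_centroid(after))
```

The resulting scalar (in km) can be tabulated per strategy configuration to rank how strongly each one shifts the participation center.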
System Stress Testing
GAME’s deterministic properties are validated through protocol-based load and concurrency tests.
Load Testing Protocol
- Tool: Locust (Python-based distributed load testing)
- Scenarios: concurrent score submissions, simultaneous wallet updates, parallel strategy evaluations
- Metrics: response time (p50, p95, p99), error rate, throughput (requests/second)
- Threshold: the system must sustain 500 concurrent users with p95 response time under 400 ms and zero scoring anomalies
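A minimal locustfile matching this protocol might look as follows. The endpoint paths, payload fields, and task weights are hypothetical placeholders, not the platform's actual API:

```python
from locust import HttpUser, task, between

class PlayerUser(HttpUser):
    """Simulated player issuing score submissions and wallet reads.
    All routes and payloads below are illustrative assumptions."""
    wait_time = between(0.5, 2.0)

    @task(3)
    def submit_score(self):
        self.client.post("/api/scores",
                         json={"player": "p1", "points": 10})

    @task(1)
    def read_wallet(self):
        self.client.get("/api/wallet/p1")
```

Run against a staging host with, for example, `locust -f locustfile.py --users 500 --spawn-rate 50`, then read p50/p95/p99 latencies, error rate, and throughput from Locust's statistics output.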
Concurrency Correctness Testing
Beyond performance, concurrency correctness is validated by:
- injecting N simultaneous score events for the same player
- verifying that the final wallet balance matches the sum of all valid events exactly
- confirming no duplicate credits or phantom deductions
Any discrepancy constitutes a scoring anomaly and fails the determinism test.
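The correctness check above can be sketched against an in-memory wallet standing in for the real store. This is a simplified harness, not the project's test suite; the lock models whatever atomicity the production store provides.

```python
import threading

def run_concurrency_check(n_events=100, points_per_event=5):
    """Inject n_events simultaneous credits for one player and
    verify the final balance matches the sum of all events."""
    balance = 0
    lock = threading.Lock()

    def credit():
        nonlocal balance
        with lock:  # removing this lock is how the anomaly would appear
            balance += points_per_event

    threads = [threading.Thread(target=credit) for _ in range(n_events)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    expected = n_events * points_per_event
    return balance, expected

balance, expected = run_concurrency_check()
anomaly = balance != expected  # any mismatch fails the determinism test
```

The same pattern extends to duplicate-credit and phantom-deduction checks: replay the injected events, compute the expected balance offline, and compare.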
Reproducibility Standards
All evaluations follow a fixed reproducibility protocol:
- Seed control — any random sampling uses fixed seeds documented in the methods section
- Frozen datasets — evaluation datasets are versioned and archived before analysis
- Strategy snapshots — strategy configurations are serialized and archived alongside results
- Log-based re-computation — results must be re-derivable from the raw event log alone
This protocol ensures that third parties can reproduce published results independently.
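The log-based re-computation requirement can be illustrated with a small replay sketch. The event schema and digest scheme here are assumptions for illustration; the point is that identical logs must always produce identical results.

```python
import hashlib
import json

def replay(event_log):
    """Recompute wallet balances from the raw event log alone."""
    balances = {}
    for ev in event_log:
        if ev["type"] == "score":
            balances[ev["player"]] = (
                balances.get(ev["player"], 0) + ev["points"])
    return balances

def result_digest(balances):
    """Stable digest of the result: canonical JSON, then SHA-256, so
    two independent re-computations can be compared byte-for-byte."""
    canonical = json.dumps(balances, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

# Hypothetical frozen event log.
log = [
    {"type": "score", "player": "a", "points": 10},
    {"type": "score", "player": "b", "points": 7},
    {"type": "score", "player": "a", "points": 3},
]
balances = replay(log)
```

Publishing the digest alongside results lets a third party verify their re-computation without exchanging the full output.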
Related Publications
- Borge et al. (2025). Stress-Testing Citizen Science Platforms Under High Concurrent Load. SpliTech 2025.
- Vergara-Borge (2025). Gamifying Engagement in Spatial Crowdsourcing: A Deterministic Approach. Systems.