⚡ DriftPA

OpenEnv 0.2.1 · Track 3.2 Personalized Tasks · Patronus AI Partner Track

DriftPA trains LLM agents to act as personal executive assistants in a world that changes mid-task. The agent must resolve cascading real-life conflicts (calendar clashes, urgent emails, dinner bookings, ride scheduling) while four failure modes fire without warning.

Untrained mean reward: −9.55
Optimal episode reward: +22.0
Training gap: 31 pts
GRPO rollouts (H100): 24,000

Four Novel Mechanics

Schema Drift: API field names change mid-episode (for example, party_size becomes guests). The agent must call list_tools() to discover the new schema or be penalised.
Time Pressure: tasks expire if they are not resolved within N steps. The boss's email, for instance, expires at step 4, and missing it triggers a cascade.
Irreversible Actions: reply_message, book_restaurant and book_ride cannot be undone, so a wrong commit causes cascading failures.
Policy Drift: after the drift, the cancellation window tightens from 2 hr to 4 hr. A late cancellation counts as a policy violation.
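A minimal sketch of how an agent-side wrapper might react to schema drift. It assumes, hypothetically, that a fresh list_tools() call yields the list of field names a tool currently accepts, and that the wrapper keeps its own table of known renames such as party_size → guests. Neither that response format nor the alias table comes from DriftPA itself.

```python
# Hypothetical alias table kept by the agent: old field name -> candidate new names.
KNOWN_RENAMES: dict[str, list[str]] = {"party_size": ["guests"]}


def adapt_payload(payload: dict, accepted_fields: list[str]) -> dict:
    """Rename payload keys that the current schema no longer accepts.

    accepted_fields is assumed to come from a fresh list_tools() call.
    Keys with no known replacement are left unchanged so the environment
    can reject them explicitly.
    """
    accepted = set(accepted_fields)
    adapted = {}
    for key, value in payload.items():
        if key in accepted:
            adapted[key] = value
            continue
        # Use the first known rename that the current schema accepts, if any.
        replacement = next(
            (new for new in KNOWN_RENAMES.get(key, []) if new in accepted), None
        )
        adapted[replacement or key] = value
    return adapted


booking = {"restaurant": "Example Bistro", "party_size": 4}
print(adapt_payload(booking, ["restaurant", "party_size"]))  # before drift: unchanged
print(adapt_payload(booking, ["restaurant", "guests"]))      # after drift: party_size -> guests
```

The alias table only helps with renames the agent already anticipates; for anything else, re-reading the schema from list_tools() remains the only safe option.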
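The time-pressure rule can be modelled as a per-task expiry step. The following is a hypothetical bookkeeping sketch, not DriftPA's implementation. Only the boss-email deadline at step 4 comes from the description above; the class names are illustrative, and the sketch assumes a task still counts as on time when it is resolved at its deadline step.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    expires_at: int  # last step at which the task can still be resolved (assumption)
    resolved: bool = False


@dataclass
class DeadlineTracker:
    tasks: list[Task] = field(default_factory=list)

    def resolve(self, name: str, step: int) -> bool:
        """Mark a task resolved; return False if its deadline has already passed."""
        for task in self.tasks:
            if task.name == name:
                if step > task.expires_at:
                    return False
                task.resolved = True
                return True
        raise KeyError(name)

    def expired(self, step: int) -> list[str]:
        """Names of unresolved tasks whose deadline has passed at this step."""
        return [t.name for t in self.tasks if not t.resolved and step > t.expires_at]


tracker = DeadlineTracker([Task("boss_email", expires_at=4)])
print(tracker.expired(4))  # []: still answerable at step 4
print(tracker.expired(5))  # ['boss_email']: missed, so the cascade begins
```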
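Because reply_message, book_restaurant and book_ride cannot be undone, an agent-side guard could refuse to send them until the caller has checked schema, deadlines and policy. This is a hypothetical sketch: the tool names and the action body shape come from this page, while the `confirmed` flag, `guard_action` and `UnsafeCommitError` are illustrative assumptions.

```python
IRREVERSIBLE_TOOLS = frozenset({"reply_message", "book_restaurant", "book_ride"})


class UnsafeCommitError(RuntimeError):
    """Raised when an irreversible action is attempted without confirmation."""


def guard_action(tool_name: str, payload: dict, confirmed: bool) -> dict:
    """Build a /step action body, blocking unconfirmed irreversible calls.

    Reversible tools such as list_tools pass straight through. Irreversible
    ones require the caller to have verified the current state first,
    signalled here by the hypothetical `confirmed` flag.
    """
    if tool_name in IRREVERSIBLE_TOOLS and not confirmed:
        raise UnsafeCommitError(f"{tool_name} cannot be undone; confirm first")
    return {"action": {"tool_name": tool_name, "payload": payload}}


print(guard_action("list_tools", {}, confirmed=False))
```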
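The cancellation rule reduces to comparing the notice given against the currently active minimum. This sketch assumes, without confirmation from the source, that the "window" is the minimum time between cancelling and the reservation, that it is 2 hours before the drift and 4 hours after, and that cancelling with exactly the minimum notice is allowed.

```python
from datetime import datetime, timedelta

# Minimum notice required to cancel, before and after policy drift.
WINDOW_BEFORE_DRIFT = timedelta(hours=2)
WINDOW_AFTER_DRIFT = timedelta(hours=4)


def is_late_cancellation(now: datetime, reservation: datetime, drifted: bool) -> bool:
    """True if cancelling at `now` violates the active cancellation window."""
    window = WINDOW_AFTER_DRIFT if drifted else WINDOW_BEFORE_DRIFT
    return reservation - now < window


dinner = datetime(2025, 1, 10, 19, 0)
now = datetime(2025, 1, 10, 16, 0)  # 3 hours before dinner
print(is_late_cancellation(now, dinner, drifted=False))  # False: 3 h of notice >= 2 h
print(is_late_cancellation(now, dinner, drifted=True))   # True: 3 h of notice < 4 h
```

The same 3-hour cancellation flips from allowed to violating once the policy drifts, which is exactly the trap the mechanic sets.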

API Endpoints

GET /health: liveness check
POST /reset: start an episode; example body {"seed": 0}
POST /step: take an action; example body {"action": {"tool_name": "list_tools", "payload": {}}}
GET /state: episode metadata

Quick Start

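BASE_URL=http://localhost:8000   # assumption: set this to the server's actual address
curl "$BASE_URL/health"
curl -X POST "$BASE_URL/reset" -H "Content-Type: application/json" -d '{"seed": 0}'
curl -X POST "$BASE_URL/step" -H "Content-Type: application/json" -d '{"action": {"tool_name": "list_tools", "payload": {}}}'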
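The same calls can be made from Python using only the standard library. This sketch assumes a hypothetical BASE_URL (the server's address is not given here) and that every endpoint returns JSON. The request bodies mirror the curl examples exactly, and /state is included for completeness.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # hypothetical; replace with the server's address


def build_request(path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Create a GET request (no body) or a JSON POST request for an endpoint."""
    url = BASE_URL.rstrip("/") + path
    if body is None:
        return urllib.request.Request(url, method="GET")
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url, data=data, method="POST", headers={"Content-Type": "application/json"}
    )


def call(path: str, body: Optional[dict] = None) -> dict:
    """Send the request and decode the JSON response (assumed to be JSON)."""
    with urllib.request.urlopen(build_request(path, body), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_quick_start() -> None:
    """Replay the quick-start sequence, then read the episode metadata."""
    print(call("/health"))
    print(call("/reset", {"seed": 0}))
    print(call("/step", {"action": {"tool_name": "list_tools", "payload": {}}}))
    print(call("/state"))
```

Call run_quick_start() once the server is reachable at BASE_URL. Keeping request construction separate from sending makes the bodies easy to inspect without a live server.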

Links