OpenEnv 0.2.1 · Track 3.2 Personalized Tasks · Patronus AI Partner Track
Trains LLM agents to act as personal executive assistants in a world that changes mid-task. The agent manages cascading real-life conflicts — calendar clashes, urgent emails, dinner bookings, ride scheduling — while four failure modes fire without warning.
party_size → guests). Agent must call list_tools() to discover new schema or get penalised.
reply_message, book_restaurant, book_ride cannot be undone. Wrong commits create cascade failures.
/health: liveness check
/reset: start episode {"seed": 0}
/step: take action {"action": {"tool_name": "list_tools", "payload": {}}}
/state: episode metadata
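# BASE_URL is a placeholder for wherever the environment server is running; the address is not given here.
curl "$BASE_URL/health"
curl -X POST "$BASE_URL/reset" -H "Content-Type: application/json" -d '{"seed": 0}'
curl -X POST "$BASE_URL/step" -H "Content-Type: application/json" -d '{"action": {"tool_name": "list_tools", "payload": {}}}'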
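The same calls can be made from Python. The following is a minimal sketch using only the standard library, not the project's own client. The request bodies mirror the curl examples above, and the irreversible tool names come from the description. The helper names (`reset_body`, `step_body`, `is_irreversible`, `post_json`) and the `base_url` argument are hypothetical, and the sketch assumes the server replies with JSON, which the text does not specify.

```python
import json
import urllib.request
from typing import Any, Optional

# Tools the description lists as impossible to undo once called.
IRREVERSIBLE_TOOLS = {"reply_message", "book_restaurant", "book_ride"}


def reset_body(seed: int = 0) -> dict:
    """Body for POST /reset, matching the {"seed": 0} example."""
    return {"seed": seed}


def step_body(tool_name: str, payload: Optional[dict] = None) -> dict:
    """Body for POST /step, matching the {"action": {...}} example."""
    return {"action": {"tool_name": tool_name, "payload": payload or {}}}


def is_irreversible(tool_name: str) -> bool:
    """True for tools that cannot be undone, so a caller can double-check first."""
    return tool_name in IRREVERSIBLE_TOOLS


def post_json(base_url: str, path: str, body: dict) -> Any:
    """POST a JSON body to base_url + path and decode the reply.

    Assumption: the reply is JSON; its exact shape is not documented here.
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a running server, an episode would start with `post_json(base_url, "/reset", reset_body(0))`, followed by `post_json(base_url, "/step", step_body("list_tools"))` to read the current tool schema. Checking `is_irreversible(tool_name)` before each step gives the caller a place to re-run `list_tools` ahead of any commit that cannot be rolled back.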