Sample 140

Evaluation Instructions

Task Overview: AI models read clinical trial protocol documents and were asked to extract evidence supporting specific modifications to the eligibility criteria. You are evaluating whether the AI extracted the RIGHT evidence, meaning evidence that justifies the final (post-modification) eligibility criteria.

Workflow: AI reads protocol → AI extracts evidence → Ground truth is final modified criteria → You evaluate if AI's evidence actually supports those final criteria

Important: Some outputs may be low quality and should be scored accordingly.

Your task: Compare the model-generated prediction (right panel) against the ground truth criteria (left panel).

Clean view: Shows formatted, readable text for easy comparison
Diff view: Highlights differences between ground truth and prediction
Raw view: Shows original unformatted text as it appears in the dataset
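
How the Diff view computes its highlights is not specified here. As a rough illustration only, the sketch below uses Python's standard difflib module to line-diff the exclusion criteria from this sample. The helper names normalize and diff_criteria are hypothetical, and stripping trailing periods and case before comparing is an assumption made so that formatting-only differences do not show up as changes.

```python
import difflib

# Exclusion criteria copied from this sample's ground truth and prediction.
ground_truth = [
    "Inability to complete forms in Swedish",
    "Neuromuscular disease as the cause of diaphragmatic paralysis",
    "Other significant causes of dyspnea",
    "Impaired physical capacity from other causes than diaphragmatic paralysis.",
]
prediction = [
    "Inability to complete forms in Swedish.",
    "Neuromuscular disease as the cause of diaphragmatic paralysis.",
    "Other significant causes of dyspnea and impaired physical capacity "
    "than diaphragmatic paralysis.",
    "Use of long-term oxygen therapy and/or mechanical ventilation.",
]


def normalize(item: str) -> str:
    """Hypothetical helper: drop trailing periods and case (an assumption)."""
    return item.strip().rstrip(".").lower()


def diff_criteria(truth, pred):
    """Hypothetical helper: unified-diff lines between two criteria lists."""
    return list(
        difflib.unified_diff(
            [normalize(t) for t in truth],
            [normalize(p) for p in pred],
            fromfile="ground_truth",
            tofile="prediction",
            lineterm="",
        )
    )


diff_lines = diff_criteria(ground_truth, prediction)
for line in diff_lines:
    print(line)
```

In the printed diff, lines starting with "-" appear only in the ground truth and lines starting with "+" appear only in the prediction. A line-level diff like this flags the merged third criterion as a change even though its content overlaps with the ground truth, so it is an aid to the content comparison, not a substitute for it.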

Evaluation scale (0-4):

0 = No match: Prediction is unrelated or contradicts the target criteria
1 = Poor match: Minimal overlap with ground truth, major errors or omissions
2 = Partial match: Some key elements captured, but notable gaps or inaccuracies
3 = Good match: Most elements correct, minor differences or formatting issues
4 = Excellent match: Essentially identical content, fully correct capture

Consider both content accuracy and completeness. Some predictions may be technically "correct" but incomplete or out of order.

Ground Truth

INCLUSION CRITERIA

  1. Patients with symptomatic unilateral diaphragmatic paralysis persisting for ≥1 year
  2. Patients are scheduled for RATS at thoracic surgery clinics in five Swedish university hospitals (Gothenburg, Linköping, Lund, Umeå, and Uppsala).

EXCLUSION CRITERIA

  1. Inability to complete forms in Swedish
  2. Neuromuscular disease as the cause of diaphragmatic paralysis
  3. Other significant causes of dyspnea
  4. Impaired physical capacity from other causes than diaphragmatic paralysis.

Prediction

INCLUSION CRITERIA

  1. Patients with symptomatic unilateral diaphragmatic paralysis persisting for ≥1 year, scheduled for robot-assisted thoracic surgery (RATS) at thoracic surgery clinics in five Swedish university hospitals (Gothenburg, Linköping, Lund, Umeå, and Uppsala).

EXCLUSION CRITERIA

  1. Inability to complete forms in Swedish.
  2. Neuromuscular disease as the cause of diaphragmatic paralysis.
  3. Other significant causes of dyspnea and impaired physical capacity than diaphragmatic paralysis.
  4. Use of long-term oxygen therapy and/or mechanical ventilation.