Pipeline Overview

IDAP processes somatic mutation data through four independent evidence modules, merges the results, and generates a ranked report.

Architecture

Input: MAF file + Cancer type
         │
         ├──→ [1] OncoKB Annotation
         │         Curated variant-drug associations
         │
         ├──→ [2] PubMed Literature Mining
         │         Gene-drug co-mention counts
         │
         ├──→ [3] TxGNN Knowledge Graph
         │         Graph-based drug prioritization
         │
         └──→ [4] ClinicalTrials.gov
                   Trial metadata for candidate drugs
         │
         ▼
    [5] Evidence Merging & Scoring
         Percentile-normalized combined score
         │
         ▼
    [6] Report Generation
         Excel + PDF patient-level reports

Module Execution Order

1. OncoKB Annotation -- Annotates variants using the OncoKB MAF Annotator to obtain clinical evidence levels and drug associations.
2. PubMed Literature Mining -- Queries PubMed for abstracts matching each altered gene and the cancer type, then performs dictionary-based drug name matching against a curated ChEMBL-derived anticancer drug list.
3. TxGNN Knowledge Graph -- Maps altered genes onto a TxGNN-derived biomedical knowledge graph to identify drug candidates through disease-drug indication edges, drug-target relationships, and FDA-approved repurposing opportunities.
4. ClinicalTrials.gov -- Queries the ClinicalTrials.gov v2 REST API for each candidate drug in the specified cancer context.
5. Evidence Merging -- All module outputs are merged on normalized drug names. The combined score is computed using within-sample percentile normalization (see Scoring).
6. Report Generation -- Produces an Excel workbook with per-module sheets and a merged drug ranking, plus a PDF summary with visualizations.
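
The dictionary-based matching in the PubMed step can be illustrated with a minimal sketch. This is not IDAP's implementation: the `DRUG_SYNONYMS` table, the function names, and the choice to count each abstract at most once per drug are all assumptions made for illustration. The real pipeline uses a curated ChEMBL-derived list.

```python
import re
from collections import Counter

# Hypothetical dictionary: canonical drug name -> accepted synonyms.
# Illustrative entries only, not the curated ChEMBL-derived list.
DRUG_SYNONYMS = {
    "imatinib": ["imatinib", "gleevec", "sti571"],
    "vemurafenib": ["vemurafenib", "zelboraf", "plx4032"],
    "erlotinib": ["erlotinib", "tarceva"],
}


def build_pattern(synonyms):
    """Compile one case-insensitive, word-bounded regex over all synonyms."""
    terms = {s.lower() for names in synonyms.values() for s in names}
    # Longest terms first so a longer name wins over a shorter prefix.
    ordered = sorted(terms, key=len, reverse=True)
    body = "|".join(re.escape(t) for t in ordered)
    return re.compile(r"\b(?:" + body + r")\b", re.IGNORECASE)


def count_drug_mentions(abstracts, synonyms=DRUG_SYNONYMS):
    """Count, per canonical drug, how many abstracts mention it at least once."""
    lookup = {s.lower(): canonical
              for canonical, names in synonyms.items()
              for s in names}
    pattern = build_pattern(synonyms)
    counts = Counter()
    for text in abstracts:
        # A set ensures each abstract contributes at most 1 per drug.
        hits = {lookup[m.group(0).lower()] for m in pattern.finditer(text)}
        counts.update(hits)
    return counts


abstracts = [
    "BRAF V600E melanoma responded to vemurafenib (Zelboraf).",
    "Resistance to PLX4032 emerged after several months.",
    "Imatinib remains first-line therapy in this setting.",
]
print(count_drug_mentions(abstracts))
```

Word boundaries keep a synonym from matching inside a longer token, and mapping every synonym back to a canonical name means brand names and research codes contribute to the same count.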

Data Flow

Each module produces a TSV file that feeds into the final merge:

| Module | Key Output Columns | Merge Key |
|---|---|---|
| OncoKB | `drug`, `oncokb_level`, `oncokb_score` | Normalized drug name |
| PubMed | `drug`, `mention_count` | Normalized drug name |
| TxGNN | `drug`, `txgnn_score`, `category` | Normalized drug name |
| ClinicalTrials | `drug`, `n_clinical_trials`, `top_phase` | Normalized drug name |
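
To make the merge step concrete, here is a minimal sketch of an outer join on a normalized drug name, followed by one common definition of a within-sample percentile rank. The normalization rule, the function names, and the sample rows are assumptions for illustration. The scoring formula IDAP actually uses is the one described in the Scoring section, and it may differ from this sketch.

```python
import re


def normalize_drug_name(name):
    """Lowercase, trim, and collapse whitespace/hyphen runs (assumed rule)."""
    return re.sub(r"[\s\-]+", " ", name.strip().lower())


def merge_on_drug(module_rows):
    """Outer-join rows from several modules on the normalized drug name.

    module_rows maps a module label to a list of dict rows, each of which
    carries a 'drug' key plus that module's own columns.
    """
    merged = {}
    for rows in module_rows.values():
        for row in rows:
            key = normalize_drug_name(row["drug"])
            record = merged.setdefault(key, {"drug": key})
            record.update({k: v for k, v in row.items() if k != "drug"})
    return merged


def percentile_ranks(values):
    """Map each value to the fraction of values <= it (ties share a rank)."""
    n = len(values)
    return [sum(other <= v for other in values) / n for v in values]


# Illustrative rows only, not real pipeline output.
rows = {
    "oncokb": [{"drug": "Vemurafenib", "oncokb_level": "LEVEL_1"}],
    "pubmed": [
        {"drug": " vemurafenib ", "mention_count": 12},
        {"drug": "Trametinib", "mention_count": 3},
    ],
}
merged = merge_on_drug(rows)
for record in merged.values():
    print(record)
```

A drug reported by only one module still appears in the merged result, with the other modules' columns absent. Ranking values within a single sample puts columns with different units, such as mention counts and graph scores, on a common 0 to 1 scale before they are combined.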