Scoring Method

Combined Score

IDAP ranks candidate drugs using a scale-aware combined score based on within-sample percentile normalization and fixed evidence bonuses.

Problem

The three evidence sources operate on very different numeric scales:

txgnn_score: 2 to 146 (heuristic graph score)
mention_count: 1 to 67 (PubMed abstract co-mentions)
oncokb_score: -5 to 10 (curated evidence level mapping)

A naive weighted sum would be dominated by whichever source has the widest range.
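To make the scale problem concrete, the sketch below takes the upper end of each range quoted above and computes each source's share of a raw sum. The equal weighting and the variable names are assumptions chosen for illustration; they are not part of IDAP.

```python
# Upper ends of the raw score ranges listed above.
max_raw = {"txgnn_score": 146, "mention_count": 67, "oncokb_score": 10}

# Equal weights are an illustrative assumption, not IDAP's actual weights.
total = sum(max_raw.values())
share = {name: value / total for name, value in max_raw.items()}

for name, frac in share.items():
    print(f"{name}: {frac:.1%} of the raw sum")
```

Under these assumptions the graph score alone accounts for roughly two thirds of the sum, while the curated score contributes less than 5%.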

Solution: Percentile Normalization

Each component score is first converted to a within-sample percentile rank, producing values in [0, 1]:

tx_pct  = percentile_rank(txgnn_score)
pm_pct  = percentile_rank(mention_count)
ok_pct  = percentile_rank(oncokb_score)
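The text does not say how `percentile_rank` handles ties or which endpoint convention it uses, so the following is only a minimal sketch under one common convention: the average rank of each value divided by the number of candidates (the same convention as a pandas percentage rank with average tie handling). The implementation details are assumptions, not IDAP's actual code.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(values):
    """Return the within-sample percentile rank of each value.

    Assumed convention: average rank (1-based) divided by sample size,
    so tied values share the same percentile and the maximum maps to 1.0.
    """
    n = len(values)
    if n == 0:
        return []
    ordered = sorted(values)
    ranks = []
    for v in values:
        below = bisect_left(ordered, v)        # count of values strictly less than v
        at_or_below = bisect_right(ordered, v)  # count of values less than or equal to v
        # Tied values occupy ranks below+1 .. at_or_below; take their mean.
        average_rank = (below + 1 + at_or_below) / 2
        ranks.append(average_rank / n)
    return ranks


# Hypothetical raw scores for three candidate drugs.
tx_pct = percentile_rank([2, 20, 146])
pm_pct = percentile_rank([1, 1, 67])
```

Because only the ordering of values matters, a `txgnn_score` of 146 and a `mention_count` of 67 both map to 1.0 when they are the largest value in their sample.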

Score Formula

The final combined score is:

combined_score = 0.50 * tx_pct
               + 0.40 * pm_pct
               + 0.10 * ok_pct
               + 0.20 * max(0, support_count - 1)
               + 0.30 * I(oncokb_score > 0)
               + 0.05 * clinical_flag

Where:

Term	Description
`tx_pct`	Within-sample percentile of TxGNN graph score
`pm_pct`	Within-sample percentile of PubMed mention count
`ok_pct`	Within-sample percentile of OncoKB score
`support_count`	Number of evidence layers (graph, literature, curated) with a positive signal, from 0 to 3
`I(oncokb_score > 0)`	Binary indicator for any curated OncoKB support
`clinical_flag`	Binary indicator for matched ClinicalTrials.gov metadata
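The formula can be written as a small function. This is a sketch that transcribes the terms above directly; the function name, argument order, and example inputs are hypothetical, and the percentiles and `support_count` are assumed to have been computed beforehand.

```python
def combined_score(tx_pct, pm_pct, ok_pct, support_count, oncokb_score, clinical_flag):
    """Combine percentile-normalized evidence with the fixed bonuses."""
    percentile_part = 0.50 * tx_pct + 0.40 * pm_pct + 0.10 * ok_pct
    # Convergence bonus: 0.20 per evidence layer beyond the first.
    convergence_bonus = 0.20 * max(0, support_count - 1)
    # Curated bonus: indicator I(oncokb_score > 0).
    curated_bonus = 0.30 if oncokb_score > 0 else 0.0
    # Trial bonus: clinical_flag is expected to be 0 or 1.
    trial_bonus = 0.05 * clinical_flag
    return percentile_part + convergence_bonus + curated_bonus + trial_bonus


# Hypothetical candidate: top graph score, median literature support,
# Level 1 OncoKB evidence, all three layers positive, and a matched trial.
score = combined_score(1.0, 0.5, 1.0, support_count=3, oncokb_score=10.0, clinical_flag=1)
print(f"{score:.2f}")  # 1.55
```

Since the percentile weights sum to 1.0 and the bonuses add at most 0.40 + 0.30 + 0.05, the score as written lies between 0 and 1.75.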

Weight Rationale

Weight	Component	Rationale
0.50	Graph (TxGNN)	Largest weight because graph captures structured biological relationships and repurposing opportunities
0.40	Literature (PubMed)	Second largest; literature provides complementary evidence not yet in curated databases
0.10	Curated (OncoKB)	Smallest percentile weight because curated evidence is sparse; its importance is captured by the fixed 0.30 bonus
0.20	Convergence bonus	Rewards drugs supported by 2+ evidence sources: 0.20 for each source beyond the first, up to 0.40 when all three agree
0.30	Curated bonus	Flat boost for any drug with a positive OncoKB score; resistance levels (R1, R2) map to negative scores and do not receive it
0.05	Trial bonus	Modest bonus for drugs with registered clinical trials

Sensitivity Analysis

A sensitivity analysis across seven alternative weight configurations showed that the default weights achieve the best balance of multi-source convergence and trial linkage among top-ranked candidates. See the manuscript Supplementary Materials for details.

Warning

The combined score is a heuristic ranking and was not trained on clinical outcomes. It is designed to support evidence triage, not to provide clinical recommendations.

OncoKB Evidence Level Mapping

OncoKB evidence levels are converted to numeric scores:

Level	Score	Meaning
1	10.0	FDA-recognized biomarker
2	8.0	Standard care biomarker
3A	6.0	Compelling clinical evidence
3B	4.0	Standard care or investigational in another tumor type
4	2.0	Compelling biological evidence
R1	-5.0	Standard care resistance biomarker
R2	-3.0	Investigational resistance biomarker
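The mapping above is a plain lookup table. The sketch below encodes it as a dictionary; the function name and the choice to return `None` for an unlisted level are assumptions, since the text does not say how missing or unrecognized levels are handled.

```python
# Numeric scores for OncoKB evidence levels, copied from the table above.
ONCOKB_LEVEL_SCORES = {
    "1": 10.0,
    "2": 8.0,
    "3A": 6.0,
    "3B": 4.0,
    "4": 2.0,
    "R1": -5.0,
    "R2": -3.0,
}


def oncokb_level_to_score(level):
    """Return the numeric score for a level, or None if it is not listed (assumed behavior)."""
    key = str(level).strip().upper()
    return ONCOKB_LEVEL_SCORES.get(key)


# Only sensitizing levels produce a positive score, so only they
# satisfy the I(oncokb_score > 0) indicator in the combined score.
for level in ("1", "3b", "R1"):
    score = oncokb_level_to_score(level)
    print(level, score, score is not None and score > 0)
```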

TxGNN Heuristic Score

The TxGNN module assigns scores based on drug categories:

Category	Base Score	Description
Repurposing Priority	20.0	FDA-approved for other diseases + targets mutated gene
Investigational Targeting	10.0	Targets mutated gene but not FDA-approved
Current Indication	5.0 -- 8.0	Already indicated for this cancer type
Disease-Related	2.0	Connected to cancer in the knowledge graph

Additional bonuses are given for drugs targeting multiple mutated genes.