
API Reference

Main Pipeline

run_pipeline()

run_pipeline(
    maf_path,           # Path to input MAF file
    cancer_type,        # Cancer type abbreviation (e.g., "NSCLC")
    oncokb_token,       # OncoKB API token
    annotator_path,     # Path to MafAnnotator.py
    pubmed_token,       # PubMed API key
    data_folder,        # TxGNN data folder path
    txgnn_root,         # TxGNN package root path
    output_dir,         # Output directory
    patient_id=None     # Optional patient identifier
)

Executes the full IDAP pipeline: OncoKB annotation, PubMed mining, TxGNN graph-based prioritization, ClinicalTrials.gov query, evidence merging, and report generation.
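Because run_pipeline() chains several slow network-bound stages, a missing file may only surface after OncoKB and PubMed have already run. The sketch below is a pre-flight check you could run first. `check_pipeline_inputs` is a hypothetical helper and not part of the IDAP API. It only checks that the documented path arguments exist and that the OncoKB token is non-empty. It does not check `pubmed_token`, since the PubMed module treats that key as optional.

```python
from pathlib import Path


def check_pipeline_inputs(maf_path, annotator_path, data_folder, txgnn_root,
                          oncokb_token, output_dir):
    """Hypothetical pre-flight check to run before run_pipeline().

    Returns a list of human-readable problems; an empty list means the
    inputs look usable. This helper is not part of the IDAP API.
    """
    problems = []
    # Inputs that must already exist as files
    for label, path in [("maf_path", maf_path), ("annotator_path", annotator_path)]:
        if not Path(path).is_file():
            problems.append(f"{label}: file not found: {path}")
    # Inputs that must already exist as directories
    for label, path in [("data_folder", data_folder), ("txgnn_root", txgnn_root)]:
        if not Path(path).is_dir():
            problems.append(f"{label}: directory not found: {path}")
    # OncoKB annotation cannot run without a token
    if not oncokb_token:
        problems.append("oncokb_token: empty")
    # Create the output directory up front so later stages can write into it
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    return problems
```

Only call run_pipeline() when the returned list is empty. Otherwise, print the problems and stop.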


Module Functions

OncoKB Module

from oncokb_module import run_oncokb_and_extract

oncokb_df = run_oncokb_and_extract(
    maf_path,           # Path to MAF file
    cancer_type,        # OncoKB-formatted cancer name
    output_tsv_path,    # Output TSV path
    annotator_path,     # Path to MafAnnotator.py
    oncokb_api_token    # OncoKB API token
)

Returns: DataFrame with variant-level annotations including evidence levels and drug associations.
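A common next step is to keep only variants with strong therapeutic evidence. The sketch below assumes that oncokb_df keeps the `HIGHEST_LEVEL` column written by OncoKB's MafAnnotator, with values such as `LEVEL_1` to `LEVEL_4`. The exact columns returned by run_oncokb_and_extract() are not documented here, so you may need to rename `level_col`. The helper `actionable_variants` is hypothetical.

```python
import pandas as pd

# OncoKB therapeutic sensitivity levels, strongest first.
# Assumption: oncokb_df keeps MafAnnotator's HIGHEST_LEVEL column.
LEVEL_ORDER = ["LEVEL_1", "LEVEL_2", "LEVEL_3A", "LEVEL_3B", "LEVEL_4"]


def actionable_variants(oncokb_df, weakest_level="LEVEL_2", level_col="HIGHEST_LEVEL"):
    """Keep rows at least as strong as weakest_level, sorted strongest first."""
    rank = {lvl: i for i, lvl in enumerate(LEVEL_ORDER)}
    # Missing or unlisted levels (e.g. resistance levels) map to NaN and are dropped
    ranks = oncokb_df[level_col].map(rank)
    keep = ranks <= rank[weakest_level]
    return (oncokb_df[keep]
            .assign(_rank=ranks[keep])
            .sort_values("_rank", kind="stable")
            .drop(columns="_rank"))
```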


PubMed Module

from pubmed_module import run_pubmed

pubmed_df = run_pubmed(
    maf_path,                   # Path to MAF file
    cancer_type,                # Cancer type for query
    output_path,                # Output TSV path
    drug_list_path=None,        # Path to drug dictionary (default: data/chembl_anticancer_drugs.txt)
    max_workers=4,              # Number of parallel threads
    pubmed_token=None           # PubMed API key
)

Returns: DataFrame with columns [variant, drug, mention_count].
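The PubMed output has one row per variant and drug pair. To see which drugs the literature cites most often overall, you can aggregate over variants. The sketch below uses only the documented columns `[variant, drug, mention_count]`. The helper name `rank_drugs_by_mentions` is hypothetical.

```python
import pandas as pd


def rank_drugs_by_mentions(pubmed_df, top_n=10):
    """Sum mention_count across variants and list the most-cited drugs first."""
    totals = (pubmed_df.groupby("drug", as_index=False)
              .agg(total_mentions=("mention_count", "sum"),
                   num_variants=("variant", "nunique"))
              # Break ties alphabetically so the order is reproducible
              .sort_values(["total_mentions", "drug"], ascending=[False, True]))
    return totals.head(top_n).reset_index(drop=True)
```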


TxGNN Module

from txgnn_module import run_txgnn

txgnn_df = run_txgnn(
    maf_path,                   # Path to MAF file
    cancer_type,                # Cancer type, given as a disease name
    output_path,                # Output TSV path
    data_folder="./data",       # TxGNN data folder
    txgnn_root="./TxGNN",       # TxGNN package root
    top_k=50,                   # Maximum candidates to return
    mode="repurposing"          # "repurposing" or "all"
)

Returns: DataFrame with columns [drug, txgnn_score, category, repurposing, current_use, mutation_target, fda_approved, connected_genes, num_genes].
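To shortlist TxGNN candidates, you can filter on the documented `txgnn_score` and `fda_approved` columns. The sketch below assumes that `fda_approved` holds booleans. The `.eq(True)` test treats any other value, such as a `"Yes"` string, as not approved, so adjust it if the column uses a different encoding. The `min_score` threshold and the helper name `shortlist_txgnn` are illustrative, not part of the module.

```python
import pandas as pd


def shortlist_txgnn(txgnn_df, min_score=0.5, approved_only=True):
    """Keep candidates with txgnn_score >= min_score, highest score first."""
    df = txgnn_df[txgnn_df["txgnn_score"] >= min_score]
    if approved_only:
        # Assumption: fda_approved is boolean; non-True values are excluded
        df = df[df["fda_approved"].eq(True)]
    return df.sort_values("txgnn_score", ascending=False).reset_index(drop=True)
```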


ClinicalTrials Module

from clinicaltrials_module import run_clinical_trials, build_trial_summary_table

clinical_df = run_clinical_trials(
    drug_list,          # List of drug names to query
    cancer_type,        # Cancer type for query
    output_path         # Output TSV path
)

summary_df = build_trial_summary_table(clinical_df)

Returns: clinical_df with raw trial records; summary_df with per-drug trial summaries.
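run_clinical_trials() expects a plain list of drug names. One reasonable way to build that list is to take the union of the `drug` columns from the PubMed and TxGNN outputs. The sketch below deduplicates names case-insensitively, so that "Osimertinib" and "osimertinib" produce only one query. The helper `collect_drug_list` is hypothetical, and it skips any DataFrame without a `drug` column. The OncoKB output is not documented to have one.

```python
import pandas as pd


def collect_drug_list(*dfs, column="drug"):
    """Hypothetical helper: union of drug names across module outputs.

    Deduplicates case-insensitively, keeps the first spelling seen, and
    preserves input order so the query list is reproducible.
    """
    seen = {}
    for df in dfs:
        if df is None or column not in df.columns:
            continue  # skip outputs that have no drug column
        for name in df[column].dropna().astype(str):
            key = name.strip().lower()
            if key and key not in seen:
                seen[key] = name.strip()
    return list(seen.values())
```

For example, `drug_list = collect_drug_list(pubmed_df, txgnn_df)` produces a list you can pass directly as the `drug_list` argument.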


Merging Function

from main_pipeline import merge_all_results

merged_df = merge_all_results(
    oncokb_df,              # OncoKB output DataFrame
    pubmed_df,              # PubMed output DataFrame
    txgnn_df,               # TxGNN output DataFrame
    clinical_summary_df     # Clinical trial summary DataFrame
)

Returns: Merged DataFrame sorted by combined_score (descending), with percentile-normalized scores and evidence metadata.
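This reference does not give the exact formula for combined_score. The sketch below shows one plausible form of percentile normalization. Each source score is converted to its percentile rank within its column with pandas `rank(pct=True)`, a drug missing from a source gets 0 for that source, and combined_score is the weighted mean of the percentiles. The equal default weights, the `_pct` column suffix, and the helper name `combine_scores` are all assumptions, so merge_all_results() may differ.

```python
import pandas as pd


def combine_scores(merged_df, score_cols, weights=None):
    """Percentile-normalize each score column and add a weighted combined_score.

    Hypothetical: IDAP's actual columns and weights may differ. A drug
    missing from a source (NaN) gets percentile 0 for that source.
    """
    weights = weights or {col: 1.0 for col in score_cols}
    out = merged_df.copy()
    total = sum(weights[c] for c in score_cols)
    combined = pd.Series(0.0, index=out.index)
    for col in score_cols:
        # Percentile rank among non-missing values; ties share the average rank
        pct = out[col].rank(pct=True, method="average").fillna(0.0)
        out[f"{col}_pct"] = pct
        combined += weights[col] * pct
    out["combined_score"] = combined / total
    return out.sort_values("combined_score", ascending=False).reset_index(drop=True)
```

Percentile ranks put raw scores on very different scales onto a common 0 to 1 range. For example, TxGNN probabilities can be compared directly with PubMed mention counts, and a single extreme count cannot dominate the combined score.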