Tandem repeat · STR · VNTR · expansion

Why tandem repeats keep slipping out of variant calling

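A tandem repeat is more than a count of how many times a motif repeats. The same `CAG` becomes an entirely different allele depending on whether it sits in a coding exon, an intron, an enhancer, a splice-proximal region, or a mobile element. This page varies motif, copy number, context, and sequencing technology together to show which repeat method each question calls for.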

1-6 bp: STR / microsatellite motif

4.86M: TRExplorer v1.0 loci

52%: polymorphic loci in HiFi resources

73: curated disease-associated TR loci

Repeat Builder

Changing the motif and genomic context changes how the allele is interpreted
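The three builder inputs (motif, copy number, context) can be sketched as a small record type. This is a minimal illustration, not the page's own implementation: the class name `RepeatAllele`, its fields, and the free-text context labels are hypothetical, and calling any motif longer than 6 bp a VNTR is an assumed convention (the page itself only states the 1-6 bp range for STRs).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepeatAllele:
    """Hypothetical record for one repeat allele: motif, copy number, context."""
    motif: str          # repeat unit, e.g. "CAG"
    copy_number: int    # number of tandem copies of the motif
    context: str        # free-text genomic context label, e.g. "coding exon"

    def __post_init__(self):
        if not self.motif or self.copy_number < 0:
            raise ValueError("motif must be non-empty and copy_number >= 0")

    @property
    def sequence(self) -> str:
        # Pure (uninterrupted) repeat tract built from the motif.
        return self.motif * self.copy_number

    @property
    def length_bp(self) -> int:
        return len(self.motif) * self.copy_number

    @property
    def repeat_class(self) -> str:
        # 1-6 bp motifs are STRs; labeling longer motifs VNTR is an assumption.
        return "STR" if len(self.motif) <= 6 else "VNTR"


# Same motif and copy number, different context: identical sequence,
# but the two records are not the same allele for interpretation purposes.
exonic = RepeatAllele("CAG", 40, "coding exon")
intronic = RepeatAllele("CAG", 40, "intron")
print(exonic.sequence == intronic.sequence, exonic == intronic)
```

Keeping context as a field of the record, rather than as an annotation looked up later, makes the page's point explicit: two alleles with the same sequence still compare as different objects.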

Motif

Copy number

Context

Allele View

repeat count, interruption, short-read visibility, long-read resolution
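The four quantities in this view can be computed from an allele sequence with a few lines of code. The sketch below is a simplification under stated assumptions: the function name `allele_view` is hypothetical, the tract is read unit by unit from its first base (so indels that shift the frame are not handled), and the defaults of a 150 bp read with 10 bp of flank on each side are illustrative values, not thresholds taken from any caller.

```python
def allele_view(sequence, motif, read_length=150, min_flank=10):
    """Summarize a repeat tract read in motif-sized units from its first base.

    Trailing bases that do not fill a whole unit are ignored.
    """
    k = len(motif)
    if k == 0:
        raise ValueError("motif must not be empty")
    units = [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, k)]

    # Units that differ from the motif are reported as (unit index, unit sequence).
    interruptions = [(i, u) for i, u in enumerate(units) if u != motif]

    # Longest run of consecutive units that match the motif exactly.
    longest = current = 0
    for u in units:
        current = current + 1 if u == motif else 0
        longest = max(longest, current)

    return {
        "repeat_count": len(units) - len(interruptions),
        "interruptions": interruptions,
        "longest_pure_run": longest,
        "length_bp": len(sequence),
        # A single read can only size the allele if it covers the whole tract
        # plus some unique flank on both sides.
        "spanned_by_one_read": len(sequence) + 2 * min_flank <= read_length,
    }


interrupted = allele_view("CAG" * 10 + "CAA" + "CAG" * 5, "CAG")
expanded = allele_view("CAG" * 60, "CAG")
print(interrupted["longest_pure_run"], expanded["spanned_by_one_read"])
```

Here the interrupted allele has 15 matching units but a longest pure run of only 10, while the 180 bp expanded allele no longer fits inside one 150 bp read. That second case is where short-read visibility ends and long-read resolution begins.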

Method Stack

Callers and validation layers required for each question

Genomic Context Map

Repeat positions seen through the TRExplorer coordinate layer

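The most practical reading: most TRs lie in intronic or intergenic sequence, but that does not mean "no function". Intronic repeats can be linked to transcription, R-loops, splicing, methylation, and the vulnerability of long neural genes. For promoter, enhancer, UTR, and splice-proximal repeats, the copy number itself can amount to a regulatory dosage or an isoform change. The context map should therefore be read as an interpretation-priority table, not a distribution table.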

Mechanism Board

How repeats arise, why they expand, and why they differ from cell to cell

Interpretation Layers

Six questions you cannot skip when reading a repeat allele in a paper

Assay Trade-offs

Short reads give breadth; long reads give allele sequence

Short-read genotyping

HipSTR, GangSTR, and lobSTR genotype many known STR loci at cohort scale. Their strengths are breadth and sample size. Their weaknesses are alleles longer than the read length, stutter, motif composition, and flanking complexity. Association results should be read alongside the catalog and QC.
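To see why stutter and read support matter for a short-read genotype, consider a deliberately naive caller that takes the repeat count observed in each spanning read and keeps the best-supported alleles. This is a toy illustration only: the function `naive_genotype` and its `min_fraction` cutoff are hypothetical, and a fixed support threshold is a crude stand-in for the stutter modeling that real genotypers perform. It does not reproduce the method of HipSTR, GangSTR, or lobSTR.

```python
from collections import Counter
from typing import List, Optional, Tuple


def naive_genotype(read_counts: List[int],
                   min_fraction: float = 0.2) -> Optional[Tuple[int, int]]:
    """Call a diploid genotype from per-read repeat counts (spanning reads only).

    Alleles supported by fewer than `min_fraction` of reads are treated as
    stutter or noise and dropped. Returns None when nothing passes the cutoff.
    """
    if not read_counts:
        return None
    total = len(read_counts)
    support = Counter(read_counts)

    # Keep (support, allele) pairs above the cutoff, best supported first.
    kept = sorted(
        ((n, allele) for allele, n in support.items() if n / total >= min_fraction),
        reverse=True,
    )
    if not kept:
        return None
    if len(kept) == 1:
        allele = kept[0][1]
        return (allele, allele)  # homozygous call
    a, b = kept[0][1], kept[1][1]
    return (min(a, b), max(a, b))


# 12 and 15 copies are well supported; 11 and 14 look like one-unit stutter.
het_reads = [12] * 9 + [11] * 2 + [15] * 8 + [14] * 1
hom_reads = [10] * 10 + [9] * 1
print(naive_genotype(het_reads), naive_genotype(hom_reads))
```

The sketch only ever sees alleles that a read can span, so an allele longer than the read length contributes no counts at all and silently disappears from the call. That is the "read length" weakness named above.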

Expansion discovery

ExpansionHunter Denovo and STRling look for expansion signals or outliers that exceed the read length. They are strong for cohort burden analysis and candidate discovery, but size thresholds, batch, ancestry controls, and exact boundaries limit what can be claimed. Locus-level allele sequences need additional validation.
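The outlier idea can be illustrated with a generic robust z-score over a per-sample signal at one locus. This is a sketch under assumptions, not the statistic used by ExpansionHunter Denovo or STRling: the function name, the input dictionary (for example, a depth-normalized count of in-repeat reads per sample), and the 3.5 threshold are all hypothetical choices for illustration.

```python
from statistics import median
from typing import Dict, List


def flag_expansion_outliers(signal: Dict[str, float],
                            threshold: float = 3.5) -> List[str]:
    """Flag samples whose signal is unusually high relative to the cohort.

    Uses a one-sided robust z-score based on the median and the median
    absolute deviation (MAD). Returns an empty list when the MAD is zero,
    because the score cannot be scaled in that case.
    """
    values = list(signal.values())
    if not values:
        return []
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []
    flagged = []
    for sample, value in signal.items():
        # 0.6745 rescales the MAD so the score is comparable to a normal z-score.
        robust_z = 0.6745 * (value - med) / mad
        if robust_z > threshold:  # one-sided: only expansions are of interest
            flagged.append(sample)
    return flagged


# Hypothetical normalized in-repeat read counts for six samples at one locus.
cohort = {"s1": 2.0, "s2": 3.0, "s3": 2.5, "s4": 3.5, "s5": 2.2, "s6": 40.0}
print(flag_expansion_outliers(cohort))
```

The flagged list depends entirely on who is in the cohort: a different batch or ancestry composition shifts the median and MAD. A flagged sample also says nothing about the exact allele size or boundaries, which is why the locus-level sequence still needs validation.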

Long-read resolution

TRGT, TRGT-denovo, vclust, and pangenome/graph methods read motif mixtures, interruptions, haplotypes, and variation clusters directly. They are essential for disease-locus interpretation and complex VNTRs, but cost, coverage, catalog subsets, and representation mismatch become new constraints.
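Once a long read gives the full allele sequence, a motif mixture can be written as a run-length encoding over the expected motifs. The sketch below uses a simple greedy left-to-right scan; it is an illustration of the representation, not the segmentation algorithm of TRGT or any other tool. The function names `decompose` and `format_runs` are hypothetical, and bases that match no motif are emitted as single-base runs.

```python
from typing import List, Tuple


def decompose(sequence: str, motifs: List[str]) -> List[Tuple[str, int]]:
    """Greedy run-length encoding of a repeat tract over known motifs.

    At each position the longest listed motif that matches is consumed;
    if none matches, the single base at that position is consumed instead.
    """
    ordered = sorted((m for m in motifs if m), key=len, reverse=True)
    runs: List[Tuple[str, int]] = []
    i = 0
    while i < len(sequence):
        unit = next((m for m in ordered if sequence.startswith(m, i)), sequence[i])
        if runs and runs[-1][0] == unit:
            runs[-1] = (unit, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((unit, 1))              # start a new run
        i += len(unit)
    return runs


def format_runs(runs: List[Tuple[str, int]]) -> str:
    return "".join(f"({motif}){count}" for motif, count in runs)


allele = "CAG" * 3 + "CAA" + "CAG" * 2
runs = decompose(allele, ["CAG", "CAA"])
print(format_runs(runs))  # (CAG)3(CAA)1(CAG)2
```

Two tools that decompose the same sequence with different motif lists or different tie-breaking will report different strings for one allele. That is a small-scale version of the representation mismatch mentioned above.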

Literature map

Reference literature on tandem repeat biology and algorithms