Species
Human (hg38) & Mouse (mm10) ATAC-seq IDR peaks
Ortholog Mapping
HALtools + HALPER lifts mouse peaks to human coordinates
Peak Classes
Shared, human-specific, and mouse-specific open-chromatin regions
Functional Analysis
HOMER motifs, enhancer/promoter annotation, and rGREAT GO enrichment
Full Pipeline Flow
Decompress IDR Peak Files
Decompresses the gzipped idr.conservative_peak.narrowPeak.gz
files for both species so downstream tools can read them directly.
.narrowPeak format is a 10-column BED extension used
by and HOMER.# Decompress mouse peaks
cd "${BASE}/MouseAtac/AdrenalGland/peak/idr_reproducibility"
zcat idr.conservative_peak.narrowPeak.gz > idr.conservative_peak.narrowPeak
# Decompress human peaks
cd "${BASE}/HumanAtac/peak/idr_reproducibility"
zcat idr.conservative_peak.narrowPeak.gz > idr.conservative_peak.narrowPeak
| Type | Path pattern | Description |
|---|---|---|
| Input | …/idr_reproducibility/idr.conservative_peak.narrowPeak.gz | Gzipped IDR peaks (mouse & human) |
| Output | …/idr_reproducibility/idr.conservative_peak.narrowPeak | Plain-text narrowPeak (10-col BED) |
Cross-Species Peak Mapping with HALPER
Uses halLiftover and
HALPER (HAL Peak Ortholog Regions) to lift mouse
ATAC-seq peaks from mm10 into hg38 coordinates using a
whole-genome multiple-alignment (the 10-way .hal file).
- The
-s Mouse -t Humanflags specify the source and target species in the alignment tree. - HALPER applies filters to retain only peaks with good orthologous coverage (the "conservative" output).
- A final awk join rebuilds mouse-specific peaks
back in original mm10 coordinates using the HAL name field
(
chr:start-end:summit) as the join key.
bash "$SCRIPT_DIR/integrate_halper.sh" \
--base "$SCRIPT_DIR" \
--hal-file "$HAL_FILE" \
--halper-map "$HALPER_MAP"
export PATH="${HAL_BIN}:${PATH}"
export PYTHONPATH="${HALPER_PP}:${PYTHONPATH:-}"
mkdir -p "${HALPER_OUT}"
bash "${HALPER_MAP_SH}" \
-b "${MOUSE_CONSERVATIVE_NARROWPEAK}" \ # input mouse peaks
-o "${HALPER_OUT}/" \ # output directory
-s Mouse \ # source species
-t Human \ # target species
-c "${HAL_FILE}" # whole-genome alignment
$PATH
and halLiftover-postprocessing must be on $PYTHONPATH.
These are exported directly by the script; the conda environment is activated
beforehand by complete_analysis_pipeline.sh.| Type | File | Description |
|---|---|---|
| Input | idr.conservative_peak.narrowPeak | Mouse peaks (mm10) |
| Input | 10plusway-master.hal | Multi-species Cactus alignment |
| Output | idr.conservative_peak.MouseToHuman.HALPER.narrowPeak.gz | Mouse peaks lifted to hg38 |
Peak Classification with BEDTools
Performs three BEDTools intersect operations to classify peaks into three mutually exclusive categories, all in human (hg38) coordinates:
Mouse-lifted peaks that overlap human IDR peaks → conserved accessible chromatin
Human IDR peaks that do not overlap lifted mouse peaks
Lifted mouse peaks that do not overlap human IDR peaks (non-reciprocal)
# Sort both files by chromosome then start position
sort -k1,1 -k2,2n mouse_to_human_conservative.narrowPeak \
> mouse_to_human_conservative.sorted.narrowPeak
sort -k1,1 -k2,2n "${HUMAN_CONSERVATIVE_NARROWPEAK}" \
> human_conservative.sorted.narrowPeak
# Shared: mouse peaks that overlap human peaks
bedtools intersect \
-a mouse_to_human_conservative.sorted.narrowPeak \
-b human_conservative.sorted.narrowPeak \
-u > shared_peaks_conservative.narrowPeak
# Mouse-specific: lifted mouse peaks NOT in human
bedtools intersect \
-a mouse_to_human_conservative.sorted.narrowPeak \
-b human_conservative.sorted.narrowPeak \
-v > mouse_specific_peaks_conservative.narrowPeak
# Human-specific: human peaks NOT in lifted mouse
bedtools intersect \
-a human_conservative.sorted.narrowPeak \
-b mouse_to_human_conservative.sorted.narrowPeak \
-v > human_specific_peaks_conservative.narrowPeak
-u flag reports each query feature once if it overlaps any subject feature
(regardless of how many times). The -v flag is the complement —
query features with no overlap.| Type | File | Description |
|---|---|---|
| Output | shared_peaks_conservative.narrowPeak | Shared peaks (human coords, hg38) |
| Output | human_specific_peaks_conservative.narrowPeak | Human-specific peaks (hg38) |
| Output | mouse_specific_peaks_conservative.narrowPeak | Mouse-specific peaks (human coords, hg38) |
| Output | mouse_specific_peaks_conservative.mouse_coords.narrowPeak | Mouse-specific peaks in mm10 (awk join) |
HOMER Motif Finding & Peak Annotation
For each of the three peak sets (shared, human_specific, mouse_specific):
- Convert narrowPeak → HOMER-compatible 6-column BED (centering each peak at its summit).
- Run findMotifsGenome.pl — de-novo and known-motif enrichment, genome-masked, 200 bp windows.
- Run annotatePeaks.pl — annotate each peak with nearest gene, TSS distance, and motif instance hits from the nonRedundant motif set.
- Genome:
hg38for human/shared peaks,mm10for mouse-specific peaks (auto-detected from filename).
awk 'BEGIN{OFS="\t"} {
chr=$1; start=$2; end=$3
name=$4; score=$5; strand=$6; summit=$10
if (summit != -1 && summit != ".") {
center = start + summit
print chr, center-1, center, name, score, \
(strand=="." ? "+" : strand)
} else {
print chr, start, end, name, score, \
(strand=="." ? "+" : strand)
}
}' "$NARROWPEAK_FILE" > "$BED_FILE"
# [1/2] Find enriched motifs
findMotifsGenome.pl \
"$BED_FILE" "$GENOME" "$MOTIF_DIR" \
-size 200 -mask
# [2/2] Annotate peaks with motif instances
annotatePeaks.pl \
"$BED_FILE" "$GENOME" \
-m "${MOTIF_DIR}/nonRedundant.motifs" \
-mbed "${SAMPLE_DIR}/motif_instances.bed" \
-size 200 \
> "$ANNO_FILE"
homer_results/
shared_peaks/
peaks_homer.bed
motifs/
nonRedundant.motifs
homerResults.html
annotated_peaks.txt
motif_instances.bed
human_specific/ (same layout)
mouse_specific/ (same layout)
#SBATCH --cpus-per-task=4 --mem=8000M --time=8:00:00).
Ensure HOMER is in $PATH by providing <bin_path> as the
second argument.Enhancer & Promoter Extraction
Parses each annotated_peaks.txt file with awk
to extract a minimal table containing only peaks annotated as
enhancers or promoters:
- Enhancer: intergenic or intronic regions >2 kb from any TSS, or anything already labeled "enhancer" by HOMER.
- Promoter: any region containing "promoter" in the HOMER annotation.
- Motif hit columns (Distance From Peak) are preserved for downstream analysis.
- After filtering, calls
plot_promoters_enhancers.Rto generate bar charts andmotif_table.Rto summarize the top motifs.
awk -F'\t' 'BEGIN {OFS="\t"}
NR==1 {
# Detect column positions dynamically from header
for (i=2; i<=NF; i++) {
if ($i=="Annotation") annot = i
if ($i=="Distance to TSS") dist = i
if ($i ~ /Distance From Peak/) motif_cols[n_motifs++] = i
}
# Build output header: PeakID + Annotation + motif cols
header = "PeakID\tAnnotation"
for (m=0; m<n_motifs; m++) header = header "\t" $motif_cols[m]
print header; next
}
{
new_annot = ""
# Enhancer: intergenic/intronic AND > 2000 bp from TSS, or labeled enhancer
if (((tolower($annot) ~ /intergenic/ || tolower($annot) ~ /intron/)
&& $dist >= 2000) || (tolower($annot) ~ /enhancer/))
new_annot = "enhancer"
# Promoter: labeled "promoter"
else if (tolower($annot) ~ /promoter/)
new_annot = "promoter"
# Only print classified peaks
if (new_annot != "") { ... print ... }
}'
| Type | File | Description |
|---|---|---|
| Input | homer_results/*/annotated_peaks.txt | Full HOMER annotation (one per peak set) |
| Output | filtered_annotations/shared_peaks.txt | Enhancer/promoter-filtered rows for the shared peak set |
| Output | filtered_annotations/human_specific.txt | Filtered annotations for the human-specific peak set |
| Output | filtered_annotations/mouse_specific.txt | Filtered annotations for the mouse-specific peak set |
| Plot | filtered_annotations/*.png | Enhancer vs promoter bar charts |
| Table | filtered_annotations/*_motif_table.tsv | Top-N motifs per sample with annotations |
Motif Analysis by Biological Condition
Reads the filtered_annotations/ tables generated from HOMER and produces
motif presence lists per biological assignment condition.
A motif is included if it has ≥ 1 non-empty hit in a given peak set
and annotation category:
- Human enhancers = shared + human-specific
- Mouse enhancers = shared + mouse-specific
- Shared enhancers = shared only
- Human-specific enhancers
- Mouse-specific enhancers
- Shared promoters = shared only
- Human promoters = shared + human-specific
- Mouse promoters = shared + mouse-specific
python enhancer_promoter_analysis.py \
--input-dir ./filtered_annotations \
--output-dir ./motif_annotation_split
motif_annotation_split/assignment_conditions/
human_enhancers_motifs.txt mouse_enhancers_motifs.txt human_promoters_motifs.txt mouse_promoters_motifs.txt shared_promoters_across_species_motifs.txt shared_enhancers_across_species_motifs.txt human_specific_enhancers_motifs.txt mouse_specific_enhancers_motifs.txt
GO Enrichment with rGREAT
Runs rGREAT (Genomic Regions Enrichment of Annotations Tool)
on the four peak sets resolved by run_GO_pipeline.sh to find enriched Gene Ontology terms:
- Reads 3-column BED (chr, start, end) from each narrowPeak file; non-standard chromosomes are filtered out.
- Runs GO enrichment for BP, MF, and CC ontologies.
- Generates
.tsvresult tables and publication-style dot-plots (top 20 terms, fold enrichment vs −log₁₀ p-adj).
| Peak set | Genome |
|---|---|
| mouse_specific | mm10 |
| human_specific | hg38 |
| conserved (mouse in human coords) | hg38 |
| conserved (human in mouse coords) | mm10 |
bash run_GO_pipeline.sh \
./filtered_annotations \ # directory scanned for the 4 expected peak files
./rGREAT_results # output directory (default: rGREAT_results)
run_GO_pipeline.sh currently locates input files by filename pattern, including
*mouse_specific_peaks_conservative*.narrowPeak,
*human_specific_peaks_conservative*.narrowPeak,
*mouse_to_human_conservative*.narrowPeak, and
*human_conservative*.narrowPeak.# Read narrowPeak as GRanges (3-col BED, standard chroms only)
gr <- read_bed3(file)
# GO:BP enrichment
job_bp <- great(gr = gr, tss_source = genome, gene_sets = "GO:BP")
tbl_bp <- getEnrichmentTable(job_bp)
# Also run GO:MF and GO:CC
job_mf <- great(gr = gr, tss_source = genome, gene_sets = "GO:MF")
job_cc <- great(gr = gr, tss_source = genome, gene_sets = "GO:CC")
# Publication dot-plot (top 20 terms)
p <- ggplot(plot_tbl, aes(x = fold_enrichment, y = description,
size = observed_region_hits, fill = log10_p_adjust)) +
geom_point(alpha = 0.9, shape = 21) +
scale_fill_distiller(palette = "YlOrRd", direction = 1) +
theme_classic(base_size = 14)
rGREAT_results/:
mouse_specific_GO_BP.tsv(and MF, CC variants)human_specific_GO_BP.tsvconserved_*_GO_BP.tsvALL_significant_terms_BP.tsv— merged table of all significant terms (p_adj < 0.05)*_GO_BP_dotplot.png— publication-ready figures
Quickstart — Run the Full Pipeline
complete_analysis_pipeline.sh orchestrates
mapping, HOMER annotation, promoter/enhancer splitting, motif summarization, and GO analysis end-to-end. Submit it to SLURM with the two required arguments.cd RUN_FULL_PIPELINE
sbatch complete_analysis_pipeline.sh \
/path/to/10plusway-master.hal \ # ARG 1: .hal alignment file
/path/to/halper_map_peak_orthologs.sh # ARG 2: HALPER mapping script
- Run from the
RUN_FULL_PIPELINE/directory, because the script uses${SLURM_SUBMIT_DIR}as its base path. - Place HOMER executables under
RUN_FULL_PIPELINE/bin/; the pipeline now picks upBIN_PATH="$SCRIPT_DIR/bin"automatically. - Keep the repository helper scripts in the same directory as
complete_analysis_pipeline.sh. - Ensure the HAL alignment file and
halper_map_peak_orthologs.shpath are absolute or resolvable from the submit location.
integrate_halper.shrun_full_annotatePeaks_findMotifs.shenhancer_promoter_analysis.pyrun_GO_pipeline.sh
| Parameter | Value | Note |
|---|---|---|
| CPUs | 4 | Matches --cpus-per-task=4 in the script header |
| Memory | 8 GB | Matches --mem=8000M |
| Wall time | 18 h | Matches --time=18:00:00; HALPER and HOMER remain the slowest stages |
| Account | bio230007p | PSC Bridges-2 allocation |
$WORKDIR/
├── bin/
│ ├── annotatePeaks.pl
│ └── findMotifsGenome.pl
├── narrowPeak_for_homer/
│ ├── shared_peaks.narrowPeak
│ ├── human_specific.narrowPeak
│ └── mouse_specific.narrowPeak
├── homer_results/
│ ├── shared_peaks/
│ │ ├── annotated_peaks.txt
│ │ ├── motif_instances.bed
│ │ └── motifs/
│ │ ├── nonRedundant.motifs
│ │ └── homerResults.html
│ ├── human_specific/ (same layout)
│ └── mouse_specific/ (same layout)
├── filtered_annotations/
│ ├── shared_peaks.txt
│ ├── human_specific.txt
│ ├── mouse_specific.txt
│ ├── *.png ← enhancer/promoter bar charts
│ └── *_motif_table.tsv ← top motifs per sample
├── motif_annotation_split/
│ └── assignment_conditions/
│ ├── human_enhancers_motifs.txt
│ ├── mouse_enhancers_motifs.txt
│ └── …
└── rGREAT_results/
├── *_GO_BP.tsv
├── *_GO_MF.tsv
├── *_GO_CC.tsv
├── ALL_significant_terms_BP.tsv
└── *_dotplot.png
Dependencies & Requirements
- HALtools — whole-genome alignment tools (conda env)
- HALPER — halLiftover-postprocessing suite
- BEDTools ≥ 2.29 (module load bedtools)
- HOMER —
annotatePeaks.pl,findMotifsGenome.pl, placed underRUN_FULL_PIPELINE/bin/ - Python ≥ 3.8 (stdlib only)
- R ≥ 4.1 with rGREAT, ggplot2, tidyverse
- rGREAT — Bioconductor GO enrichment
- GenomicRanges / IRanges
- TxDb.Mmusculus.UCSC.mm10.knownGene
- ggplot2, dplyr, stringr
- RColorBrewer, tidyverse