qc_report

qc_report(fileNames, sampleNames, outDir, colors=['#BB4430', '#FFBC0A', '#053C5E', '#A9E5BB', '#610345', '#2D1E2F', '#559CAD', '#5E747F', '#F343F4'], cores=None)
fileNames

list of names of indexed BAM files, or a single file name as a string

sampleNames

list of sample names used to label output plots, or a single sample name as a string; valid names contain only the characters [a-zA-Z0-9_].

outDir

directory in which to write the QC summary report

cores

number of cores over which to parallelize; default is all available

colors

color list in hex for overlay plots; default is: ["#BB4430", "#FFBC0A", "#053C5E", "#A9E5BB", "#610345", "#2D1E2F", "#559CAD", "#5E747F", "#F343F4"]
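
The naming rule for sampleNames can be checked before calling qc_report. The following is a minimal sketch and not part of dimelo: the helper names is_valid_sample_name and invalid_sample_names are hypothetical, and the code assumes that the rule means every character of a name must come from [a-zA-Z0-9_].

```python
import re

# Hypothetical helpers (not part of dimelo). Assumption: a valid name is a
# non-empty string made up only of the characters [a-zA-Z0-9_].
_VALID_SAMPLE_NAME = re.compile(r"[a-zA-Z0-9_]+")


def is_valid_sample_name(name):
    """Return True if every character of name is in [a-zA-Z0-9_]."""
    return isinstance(name, str) and _VALID_SAMPLE_NAME.fullmatch(name) is not None


def invalid_sample_names(sample_names):
    """Accept a single name or a list of names and return the invalid ones."""
    if isinstance(sample_names, str):
        sample_names = [sample_names]
    return [name for name in sample_names if not is_valid_sample_name(name)]
```

Calling invalid_sample_names on the list that will be passed as sampleNames gives the names that would need renaming; an empty list means all names satisfy the assumed rule.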

Example

For a single sample:

>>> dm.qc_report("dimelo/test/data/mod_mappings_subset.bam", "test", "dimelo/dimelo_test")

For multiple sample files:

>>> dm.qc_report(["dimelo/test/data/mod_mappings_subset.bam", "dimelo/test/data/winnowmap_guppy_merge_subset.bam"], ["test1", "test2"], "dimelo/dimelo_test")

Return

  • PDF of QC Summary Report which includes:
    • read length histogram

    • mapping quality histogram

    • average alignment quality per read histogram (if basecaller provided information)

    • average basecall quality per read histogram (if basecaller provided information)

    • summary table describing spread of data

    • number of reads, number of basepairs

A SQLite database is also created in the specified output directory. It can be loaded into a pandas DataFrame with:

>>> import sqlite3
>>> import pandas as pd
>>> fileName = "dimelo/test/data/mod_mappings_subset.bam"
>>> sampleName = "test"
>>> outDir = "dimelo/dimelo_test"
>>> all_reads = pd.read_sql("SELECT * from reads_" + sampleName, sqlite3.connect(outDir + "/" + fileName.split("/")[-1].replace(".bam", "") + ".db"))

After QC, each database contains the following table, with the columns listed below:

reads_sampleName
  • name

  • chr

  • start

  • end

  • length

  • strand

  • mapq

  • ave_baseq

  • ave_alignq
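
Because the output is a SQLite file, the reads table can also be summarized directly in SQL without pandas. The sketch below uses only the standard library. It is illustrative only: summarize_reads is a hypothetical helper, the in-memory database stands in for a real QC database, its two rows are made-up placeholder values, and the column types are assumptions. Only the table name pattern and the column names come from the listing above.

```python
import re
import sqlite3


def summarize_reads(conn, sample_name):
    """Count reads, sum read lengths, and average mapq for one sample table."""
    # The table name is interpolated into the SQL string, so restrict it to
    # the characters documented as valid for sample names.
    if not re.fullmatch(r"[a-zA-Z0-9_]+", sample_name):
        raise ValueError("invalid sample name: %r" % (sample_name,))
    table = "reads_" + sample_name
    query = 'SELECT COUNT(*), SUM(length), AVG(mapq) FROM "%s"' % table
    n_reads, n_basepairs, mean_mapq = conn.execute(query).fetchone()
    return {"n_reads": n_reads, "n_basepairs": n_basepairs, "mean_mapq": mean_mapq}


# Toy stand-in for a QC database. Column types are assumptions, and the
# rows are placeholder values, not real data.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE reads_test (name TEXT, chr TEXT, start INTEGER, "end" INTEGER, '
    "length INTEGER, strand TEXT, mapq INTEGER, ave_baseq REAL, ave_alignq REAL)"
)
conn.executemany(
    "INSERT INTO reads_test VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("read_a", "chr1", 100, 1100, 1000, "+", 60, 20.0, 30.0),
        ("read_b", "chr2", 500, 2500, 2000, "-", 40, 18.0, 28.0),
    ],
)

summary = summarize_reads(conn, "test")
print(summary)
```

For a real run, the connection would instead be opened on the .db file in outDir, as in the pandas example above.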

Example Plots: QC Report

dimelo-qc-report - command-line interface

Generate DiMeLo qc report

dimelo-qc-report [-h] -f FILENAMES [FILENAMES ...] -s SAMPLENAMES [SAMPLENAMES ...] -o OUTDIR
                 [--colors COLORS [COLORS ...]] [-p CORES]

dimelo-qc-report optional arguments

  • -h, --help - show this help message and exit

  • -p CORES, --cores CORES - number of cores over which to parallelize (default: None)

dimelo-qc-report required arguments

  • -f FILENAMES, --fileNames FILENAMES - bam file name(s) (default: None)

  • -s SAMPLENAMES, --sampleNames SAMPLENAMES - sample name(s) for output labelling (default: None)

  • -o OUTDIR, --outDir OUTDIR - directory to output QC summary report (default: None)

dimelo-qc-report plotting options

  • --colors COLORS - color list in hex (e.g. "#BB4430") for overlay plots (default: ['#BB4430', '#FFBC0A', '#053C5E', '#A9E5BB', '#610345', '#2D1E2F', '#559CAD', '#5E747F', '#F343F4'])
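
The command line documented above can also be assembled from Python, for example to launch the report from a pipeline script. This is a minimal sketch: build_qc_report_command is a hypothetical helper, it uses only the flags listed above, and it assumes one sample name per BAM file (as in the function examples) and that the dimelo-qc-report executable is on PATH.

```python
import shlex


def build_qc_report_command(file_names, sample_names, out_dir, colors=None, cores=None):
    """Build the argument list for dimelo-qc-report from the documented flags."""
    # Accept a single string or a list, mirroring the Python function.
    if isinstance(file_names, str):
        file_names = [file_names]
    if isinstance(sample_names, str):
        sample_names = [sample_names]
    # Assumption: each BAM file is paired with exactly one sample name.
    if len(file_names) != len(sample_names):
        raise ValueError("expected one sample name per BAM file")

    cmd = ["dimelo-qc-report", "-f", *file_names, "-s", *sample_names, "-o", out_dir]
    if colors:
        cmd += ["--colors", *colors]
    if cores is not None:
        cmd += ["-p", str(cores)]
    return cmd


command = build_qc_report_command(
    ["dimelo/test/data/mod_mappings_subset.bam",
     "dimelo/test/data/winnowmap_guppy_merge_subset.bam"],
    ["test1", "test2"],
    "dimelo/dimelo_test",
    cores=4,
)
# shlex.join quotes arguments so the printed line can be pasted into a shell.
print(shlex.join(command))
```

Passing the list to subprocess.run(command, check=True) would execute it. Keeping the arguments as a list avoids shell quoting problems with the "#" in hex color values.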