qc_report
- qc_report(fileNames, sampleNames, outDir, colors=['#BB4430', '#FFBC0A', '#053C5E', '#A9E5BB', '#610345', '#2D1E2F', '#559CAD', '#5E747F', '#F343F4'], cores=None)
- fileNames
list of names of indexed bam files; or single file name as string
- sampleNames
list of names of samples for output plot name labelling; or single sample name as string; valid names contain [a-zA-Z0-9_].
- outDir
directory to output QC summary report
- cores
number of cores over which to parallelize; default is all available
- colors
color list in hex for overlay plots; default is: ["#BB4430", "#FFBC0A", "#053C5E", "#A9E5BB", "#610345", "#2D1E2F", "#559CAD", "#5E747F", "#F343F4"]
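The sampleNames description restricts valid names to the characters [a-zA-Z0-9_]. A minimal sketch of checking names before calling qc_report is shown here; `check_sample_names` is a hypothetical helper written for illustration and is not part of dimelo.

```python
import re

# Character class taken from the sampleNames description: [a-zA-Z0-9_]
_VALID_SAMPLE_NAME = re.compile(r"[a-zA-Z0-9_]+")


def check_sample_names(sample_names):
    """Hypothetical helper (not part of dimelo): return the names that
    contain characters outside [a-zA-Z0-9_] or that are empty."""
    # qc_report accepts a single name as a string, so handle that case too
    if isinstance(sample_names, str):
        sample_names = [sample_names]
    return [name for name in sample_names if not _VALID_SAMPLE_NAME.fullmatch(name)]


invalid = check_sample_names(["test1", "test-2", "my sample"])
print(invalid)  # ['test-2', 'my sample']
```

Replacing the offending characters with underscores is one simple way to make such names acceptable.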
Example
For single sample:
>>> dm.qc_report("dimelo/test/data/mod_mappings_subset.bam", "test", "dimelo/dimelo_test")
For multiple sample files:
>>> dm.qc_report(["dimelo/test/data/mod_mappings_subset.bam", "dimelo/test/data/winnowmap_guppy_merge_subset.bam"], ["test1", "test2"], "dimelo/dimelo_test")
Return
- PDF of QC Summary Report which includes:
read length histogram
mapping quality histogram
average alignment quality per read histogram (if basecaller provided information)
average basecall quality per read histogram (if basecaller provided information)
summary table describing spread of data
number of reads, number of basepairs
Also returns a SQLite database in the specified output directory. The database can be converted into a pandas dataframe with:
>>> import sqlite3
>>> import pandas as pd
>>> fileName = "dimelo/test/data/mod_mappings_subset.bam"
>>> sampleName = "test"
>>> outDir = "dimelo/dimelo_test"
>>> all_reads = pd.read_sql("SELECT * from reads_" + sampleName, sqlite3.connect(outDir + "/" + fileName.split("/")[-1].replace(".bam", "") + ".db"))
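The database path in the snippet above is built from the output directory and the bam file's base name. As a rough sketch, the same expression can be wrapped in a small function; `qc_db_path` is a hypothetical name used only for illustration, and the naming rule is inferred from that snippet rather than from the dimelo source.

```python
def qc_db_path(file_name, out_dir):
    """Hypothetical helper: mirror the path expression used in the
    read_sql snippet (outDir + "/" + bam base name + ".db")."""
    # Keep only the file name, then drop the ".bam" extension
    base = file_name.split("/")[-1].replace(".bam", "")
    return out_dir + "/" + base + ".db"


path = qc_db_path("dimelo/test/data/mod_mappings_subset.bam", "dimelo/dimelo_test")
print(path)  # dimelo/dimelo_test/mod_mappings_subset.db
```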
After QC, each database contains this table with columns listed below:
- reads_sampleName
name
chr
start
end
length
strand
mapq
ave_baseq
ave_alignq
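To illustrate how the reads table can be queried, the sketch below builds an in-memory SQLite table with the columns listed above and counts reads and basepairs. The rows are made up, the column types are left unspecified because they are not documented here, and taking the sum of `length` as the basepair count is an assumption.

```python
import sqlite3

# Column names as listed for the reads_sampleName table
columns = ["name", "chr", "start", "end", "length",
           "strand", "mapq", "ave_baseq", "ave_alignq"]
sample_name = "test"
table = "reads_" + sample_name

conn = sqlite3.connect(":memory:")
# Quote column names because "end" is an SQL keyword
col_sql = ", ".join('"{}"'.format(c) for c in columns)
conn.execute("CREATE TABLE {} ({})".format(table, col_sql))

# Made-up rows for illustration only
rows = [
    ("read1", "chr1", 100, 600, 500, "+", 60, 20.5, 18.0),
    ("read2", "chr1", 900, 1900, 1000, "-", 30, 22.0, 19.5),
]
conn.executemany(
    "INSERT INTO {} VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)".format(table), rows
)

# Number of reads and total read length (assumed here to be the basepair count)
n_reads, n_bp = conn.execute(
    'SELECT COUNT(*), SUM("length") FROM {}'.format(table)
).fetchone()
print(n_reads, n_bp)  # 2 1500
conn.close()
```

Against a real QC database, the same SELECT can be run on a connection opened to the .db file in the output directory.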
Example Plots QC Report
dimelo-qc-report - CLI interface
Generate DiMeLo qc report
dimelo-qc-report [-h] -f FILENAMES [FILENAMES ...] -s SAMPLENAMES [SAMPLENAMES ...] -o OUTDIR
[--colors COLORS [COLORS ...]] [-p CORES]
dimelo-qc-report optional arguments
dimelo-qc-report required arguments
-f FILENAMES, --fileNames FILENAMES
- bam file name(s) (default: None)
-s SAMPLENAMES, --sampleNames SAMPLENAMES
- sample name(s) for output labelling (default: None)
-o OUTDIR, --outDir OUTDIR
- directory to output QC summary report (default: None)
dimelo-qc-report plotting options
--colors COLORS
- color list in hex (e.g. "#BB4430") for overlay plots (default: ['#BB4430', '#FFBC0A', '#053C5E', '#A9E5BB', '#610345', '#2D1E2F', '#559CAD', '#5E747F', '#F343F4'])
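As a sketch of how the command line might be assembled from Python, the helper below builds an argument list using only the flags shown in the usage line (-f, -s, -o, --colors, -p). `build_qc_report_command` is a hypothetical name for illustration, and the command is only printed here, not executed.

```python
import shlex


def build_qc_report_command(file_names, sample_names, out_dir, colors=None, cores=None):
    """Hypothetical helper: assemble a dimelo-qc-report argument list
    from the flags documented in the usage line."""
    cmd = ["dimelo-qc-report", "-f", *file_names, "-s", *sample_names, "-o", out_dir]
    if colors:
        cmd += ["--colors", *colors]
    if cores is not None:
        cmd += ["-p", str(cores)]
    return cmd


cmd = build_qc_report_command(
    ["dimelo/test/data/mod_mappings_subset.bam",
     "dimelo/test/data/winnowmap_guppy_merge_subset.bam"],
    ["test1", "test2"],
    "dimelo/dimelo_test",
    cores=4,
)
# shlex.join quotes any argument that needs it for a POSIX shell
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would avoid shell quoting issues altogether, which matters for hex colors because an unquoted `#` starts a comment in most shells.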