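Cite this article: Yan Jin (闫瑾), Pan Qi (潘琦), Ren Hong (任红). Comparison of pre-processing and variant filtering methods in the analysis of whole exome sequencing data [J]. Journal of Chongqing Medical University (重庆医科大学学报), 2013, (12): 1397-1404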
Comparison of pre-processing and variant filtering methods in the analysis of whole exome sequencing data
Keywords: whole exome sequencing; pre-processing; variant filtering
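Authors and affiliation:
Yan Jin (闫瑾), Pan Qi (潘琦), Ren Hong (任红). Department of Infectious Diseases, The Second Affiliated Hospital of Chongqing Medical University; Institute for Viral Hepatitis, Chongqing Medical University, Chongqing 400010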
Abstract:
      Objective: To compare the effects of different pre-processing methods and variant filtering methods on variant calling in whole exome sequencing data analysis. Methods: Using whole exome sequencing data from two test samples, we compared three factors: the pre-processing method (FASTX-Toolkit, Trimmomatic, or no pre-processing), the strategy for keeping or discarding reads left unpaired after trimming (single-end reads, SE), and the variant filtering method ('Hard' filtering or variant quality score recalibration, VQSR). Their effects on variant calling were assessed by depth of coverage (DP), number of variants called, transition/transversion ratio, and genotype (non-reference) concordance. Results: Reads pre-processed with Trimmomatic had a DP close to that of the raw, unprocessed reads and markedly higher than that of reads pre-processed with FASTX-Toolkit. At DP ≥ 10× and genotype quality score (GQ) ≥ 20, more single nucleotide variants (SNVs) were called after Trimmomatic pre-processing than after FASTX-Toolkit pre-processing, and the number was close to that obtained without pre-processing. When SE reads were included, the number of SNVs called rose by more in the FASTX-Toolkit group (28%) than in the Trimmomatic group (5%). With this small sample size, 'Hard' filtering removed fewer SNVs than VQSR in every experimental group. Conclusions: Trimmomatic trims and filters raw reads more gently, whereas FASTX-Toolkit may over-filter them. Retaining SE reads benefits downstream variant calling. 'Hard' filtering was more tolerant (less stringent) than VQSR.
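As an illustration of two of the measures named in the abstract, the following minimal Python sketch applies the DP ≥ 10× and GQ ≥ 20 thresholds to a handful of genotype calls and then computes a transition/transversion ratio over the calls that remain. The record layout, the function names, and the example calls are hypothetical and chosen only for illustration; this is not the authors' pipeline, and it does not reproduce 'Hard' filtering or VQSR.

```python
# Illustrative sketch only: the record layout and all names are assumptions,
# not the pipeline used in the paper.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def passes_thresholds(dp, gq, min_dp=10, min_gq=20):
    """Keep a genotype call only if DP >= min_dp and GQ >= min_gq."""
    return dp >= min_dp and gq >= min_gq


def is_transition(ref, alt):
    """A transition is purine<->purine or pyrimidine<->pyrimidine."""
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES


def ti_tv_ratio(snvs):
    """Transition/transversion ratio over (ref, alt) pairs.

    Returns None when there are no transversions, so the ratio is undefined.
    """
    ti = sum(1 for ref, alt in snvs if is_transition(ref, alt))
    tv = len(snvs) - ti
    return ti / tv if tv else None


# Hypothetical calls as (ref, alt, DP, GQ); values are made up.
calls = [
    ("A", "G", 35, 99),  # transition, passes both thresholds
    ("C", "T", 12, 40),  # transition, passes both thresholds
    ("A", "C", 8, 60),   # transversion, fails DP >= 10
    ("G", "T", 20, 15),  # transversion, fails GQ >= 20
    ("T", "G", 50, 99),  # transversion, passes both thresholds
]

kept = [(ref, alt) for ref, alt, dp, gq in calls if passes_thresholds(dp, gq)]
ratio = ti_tv_ratio(kept)
print(len(kept), ratio)
```

In practice the DP and GQ values would be read from the per-sample fields of a VCF file rather than from hand-written tuples.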