1. Installing find_circ
# find_circ runs on a 64-bit system with Python 2.7 and requires the Python modules numpy and pysam. It relies on bowtie2 and samtools to map reads to the genome.
wget https://github.com/marvin-jens/find_circ/archive/v1.2.tar.gz
tar -xzvf v1.2.tar.gz
2. Downloading the reference genome
# Use fetch_ucsc.py to download the latest UCSC release of the reference genome
fetch_ucsc.py hg19/hg38/mm9/mm10 ref/kg/ens/fa out
3. Building the reference genome index with bowtie2
bowtie2-build hg38.fa hg38
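4. Aligning RNA-Seq reads to the genome (paired-end mode)
### bowtie2 parameters ###
-p: number of threads; --very-sensitive: the most sensitive (and slowest) end-to-end preset, bowtie2 still searches for multiple alignments and reports only the best one; --score-min=C,-15,0: sets the minimum alignment score function, here a constant threshold of -15; --mm: loads the index with memory-mapped I/O so that several bowtie2 processes on one machine can share it.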
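To make the `--score-min=C,-15,0` setting concrete, here is a minimal Python sketch of how a bowtie2-style function specification (`C` constant, `L` linear, `S` square root, `G` natural log) turns into a score threshold. The function name `min_score` is hypothetical and this is not bowtie2's own code; it only follows the form f(x) = a + b * g(x) described in the bowtie2 manual, where x is the read length.

```python
import math


def min_score(spec, read_length):
    """Evaluate a function spec such as 'C,-15,0' for a given read length.

    Hypothetical helper for illustration; mirrors f(x) = a + b * g(x).
    """
    kind, a, b = spec.split(",")
    a, b = float(a), float(b)
    if kind == "C":   # constant: the threshold ignores read length
        return a
    if kind == "L":   # linear in read length
        return a + b * read_length
    if kind == "S":   # square root of read length
        return a + b * math.sqrt(read_length)
    if kind == "G":   # natural log of read length
        return a + b * math.log(read_length)
    raise ValueError("unknown function type: %r" % kind)


# With C,-15,0 the threshold is -15 for any read length.
threshold_short = min_score("C,-15,0", 50)
threshold_long = min_score("C,-15,0", 150)
```

With a constant function the same cutoff applies to reads of every length, so an alignment scoring below -15 is treated as invalid and the read stays unmapped.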
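### samtools view parameters ###
-h: include the header in the output; -b: output BAM; -u: output uncompressed BAM; -S: input is SAM (only needed by older samtools versions, newer ones detect the input format automatically and ignore it)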
5. Extracting unmapped reads and generating anchors
samtools view -hf 4 output.bam | samtools view -Sb - > unmapped.bam
python unmapped2anchors.py unmapped.bam | gzip > anchors.qfa.gz
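The `-f 4` option in the samtools command above keeps only reads whose SAM FLAG has bit 0x4 (read unmapped) set. A small Python sketch of that bit test follows; the function name `passes_f` is hypothetical and the FLAG values are illustrative, not taken from a real run.

```python
UNMAPPED = 0x4  # SAM FLAG bit: the read itself is unmapped


def passes_f(flag, required_bits):
    """Mimic `samtools view -f`: keep a read only if all required bits are set."""
    return (flag & required_bits) == required_bits


# Illustrative FLAG values:
# 77 = 1 + 4 + 8 + 64  (paired, read unmapped, mate unmapped, first in pair)
# 99 = 1 + 2 + 32 + 64 (paired, properly aligned, mate reverse, first in pair)
kept = [flag for flag in (0, 4, 77, 99) if passes_f(flag, UNMAPPED)]
```

Only the reads that failed to align contiguously pass this filter, which is why they are the input for anchor extraction.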
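6. Finding candidate circRNAs from the anchor alignments
### Filter the results by the following rules
1. Keep junctions tagged with the keyword CIRCULAR
2. Remove circRNAs on the mitochondrial chromosome
3. Keep circRNAs supported by at least 2 unique junction reads
4. Remove circRNAs with an ambiguous breakpoint
5. Remove circRNAs longer than 100 kb; the 100 kb refers to the genomic span, computed simply as the circRNA's end minus its start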
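The five rules above can be sketched as a single Python filter over the 18-column output. This is a standalone illustration, not one of find_circ's own scripts, and it rests on several assumptions: the file is tab-delimited, the mitochondrial chromosome is named `chrM`, an unambiguous breakpoint is marked by the keyword `UNAMBIGUOUS_BP` in the category column, and "unique junction reads" is read as column 5 (`n_reads`). The names `keep_candidate`, `filter_candidates`, and `demo_row` are hypothetical, and `demo_row` fabricates rows purely for demonstration.

```python
def keep_candidate(line, min_reads=2, max_span=100000, mito_chrom="chrM"):
    """Return True if one output line passes all five filtering rules."""
    if line.startswith("#") or not line.strip():
        return False  # skip header/comment lines and blank lines
    cols = line.rstrip("\n").split("\t")
    chrom, start, end = cols[0], int(cols[1]), int(cols[2])
    n_reads = int(cols[4])  # column 5; switch to cols[6] (n_uniq) if preferred
    category = cols[17]     # column 18: keyword list
    return (
        "CIRCULAR" in category             # rule 1: circular junctions only
        and chrom != mito_chrom            # rule 2: drop mitochondrial hits
        and n_reads >= min_reads           # rule 3: enough supporting reads
        and "UNAMBIGUOUS_BP" in category   # rule 4: unambiguous breakpoint
        and end - start <= max_span        # rule 5: genomic span <= 100 kb
    )


def filter_candidates(lines, **kwargs):
    """Apply keep_candidate to an iterable of lines."""
    return [line for line in lines if keep_candidate(line, **kwargs)]


def demo_row(chrom, start, end, n_reads, category):
    """Build a made-up 18-column row; only the filtered fields are meaningful."""
    cols = [chrom, str(start), str(end), "demo", str(n_reads), "+"]
    cols += ["0"] * 11 + [category]
    return "\t".join(cols)


rows = [
    demo_row("chr1", 1000, 5000, 3, "CIRCULAR,UNAMBIGUOUS_BP"),    # passes
    demo_row("chrM", 1000, 5000, 3, "CIRCULAR,UNAMBIGUOUS_BP"),    # rule 2
    demo_row("chr2", 1000, 5000, 1, "CIRCULAR,UNAMBIGUOUS_BP"),    # rule 3
    demo_row("chr3", 1000, 500000, 5, "CIRCULAR,UNAMBIGUOUS_BP"),  # rule 5
]
candidates = filter_candidates(rows)
```

Substring checks on the category column mirror a grep-style keyword filter; adjust the keyword and chromosome name if the annotation in use differs.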
7. Analyzing multiple samples
# With multiple samples, run find_circ.py on each one separately, then merge the individual results:
./merge_bed.py sample1.bed sample2.bed [...] > combined.bed
8. Output file format
# The first six columns follow the standard BED format; the remaining 12 columns carry information about the junction
column | name | description |
---|---|---|
1 | chrom | chromosome/contig name |
2 | start | left splice site (zero-based) |
3 | end | right splice site (zero-based). Always: end > start; which side is 5' or 3' depends on strand |
4 | name | (provisional) running number/name assigned to junction |
5 | n_reads | number of reads supporting the junction (BED 'score') |
6 | strand | genomic strand (+ or -) |
7 | n_uniq | number of distinct read sequences supporting the junction |
8 | uniq_bridges | number of reads with both anchors aligning uniquely |
9 | best_qual_left | alignment score margin of the best anchor alignment supporting the left splice junction (max = 2 * anchor_length) |
10 | best_qual_right | same for the right splice site |
11 | tissues | comma-separated, alphabetically sorted list of tissues/samples with this junction |
12 | tiss_counts | comma-separated list of corresponding read-counts |
13 | edits | number of mismatches in the anchor extension process |
14 | anchor_overlap | number of nucleotides the breakpoint resides within one anchor |
15 | breakpoints | number of alternative ways to break the read with flanking GT/AG |
16 | signal | flanking dinucleotide splice signal (normally GT/AG) |
17 | strandmatch | 'MATCH', 'MISMATCH' or 'NA' for non-stranded analysis |
18 | category | list of keywords describing the junction. Useful for quick grep filtering |
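
For downstream work it can help to address these columns by name instead of by position. The following Python sketch maps one output line onto a named tuple using the column names from the table above. It assumes tab-delimited input and converts only the plainly count-like columns to integers; the names `Junction` and `parse_junction` are hypothetical and the example row is made up.

```python
from collections import namedtuple

FIELDS = [
    "chrom", "start", "end", "name", "n_reads", "strand",
    "n_uniq", "uniq_bridges", "best_qual_left", "best_qual_right",
    "tissues", "tiss_counts", "edits", "anchor_overlap",
    "breakpoints", "signal", "strandmatch", "category",
]
# Columns treated as integers in this sketch (an assumption).
INT_FIELDS = {
    "start", "end", "n_reads", "n_uniq", "uniq_bridges",
    "edits", "anchor_overlap", "breakpoints",
}

Junction = namedtuple("Junction", FIELDS)


def parse_junction(line):
    """Parse one tab-delimited 18-column line into a Junction."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(
            "expected %d columns, got %d" % (len(FIELDS), len(values))
        )
    converted = [
        int(value) if field in INT_FIELDS else value
        for field, value in zip(FIELDS, values)
    ]
    return Junction(*converted)


# Made-up row for illustration only.
example = "\t".join([
    "chr1", "1000", "5000", "demo_circ_000001", "3", "+",
    "2", "2", "40", "40", "sampleA", "3", "0", "1",
    "1", "GTAG", "NA", "CIRCULAR,UNAMBIGUOUS_BP",
])
junction = parse_junction(example)
span = junction.end - junction.start  # genomic span used by the 100 kb filter
```

Keeping `tissues`, `tiss_counts`, and `category` as raw strings avoids guessing at their inner structure; split them on commas only where needed.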