For all projects, you may use your own Unix-based system; where applicable, make sure you are running the software versions specified in the assignments. Alternatively, you may use the VMBox virtual machine environment provided with the course materials. Instructions for downloading and using the environment are available on the course web site.
For the following questions, refer to the class workflow and use the data in the Online materials (‘gencommand_proj1_data.tar.gz’). Assume you have sequenced and assembled the genome of Malus domestica (apple) and annotated its genes. You then collected samples and ran RNA-seq experiments to determine which genes are expressed in the various tissues. This information is stored, respectively, in the files “apple.genome”, “apple.genes”, and “apple.condition{A,B,C}”.
NOTE: The apple genome and gene annotations for this project were extracted from the Rosaceae Genome Database (RGD). The data have since been modified and hence may not directly reflect the information in the original RGD records.
# Number of sequences (chromosomes) in the genome: count the FASTA header lines.
grep -c "^>" apple.genome
## 3
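As a minimal sketch of why counting ">" lines gives the number of sequences, the toy FASTA file below has three records. Its names and sequences are made up for illustration and are not the course data. Anchoring the pattern with `^` restricts the match to header lines.

```shell
# Toy FASTA file with made-up names and sequences (not the course data).
tmpdir=$(mktemp -d)
printf '>chr1\nACGTAC\nGGCATT\n>chr2\nTTGACA\n>chr3\nCCATGG\n' > "$tmpdir/toy.genome"

# Every FASTA record begins with one header line starting with ">",
# so counting those lines counts the sequences.
n_seqs=$(grep -c '^>' "$tmpdir/toy.genome")
echo "$n_seqs"

# Strip the leading ">" to list the sequence names themselves.
seq_names=$(grep '^>' "$tmpdir/toy.genome" | sed 's/^>//')
echo "$seq_names"

rm -rf "$tmpdir"
```

On the toy file this reports three sequences, named chr1, chr2 and chr3.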
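# Number of distinct genes (column 1 of apple.genes).
cut -f1 apple.genes | sort -u | wc -l
## 5453
# Number of distinct transcript variants (column 2).
cut -f2 apple.genes | sort -u | wc -l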
## 5456
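# Genes with a single transcript variant (sort first: uniq only merges adjacent duplicates).
cut -f1 apple.genes | sort | uniq -c | grep " 1 " | wc -l
## 5450
# Genes with two or more transcript variants.
cut -f1 apple.genes | sort | uniq -c | grep -v " 1 " | wc -l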
## 3
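The single-variant and multi-variant counts can also be obtained with `uniq -u` and `uniq -d`, which avoids matching on the count column. The sketch below uses a toy annotation file with an assumed column layout (gene, transcript, chromosome, strand, tab-separated), inferred from the commands above. All identifiers are made up.

```shell
# Toy annotation, tab-separated: gene, transcript, chromosome, strand.
# The layout is an assumption and the identifiers are made up.
tmpdir=$(mktemp -d)
printf 'g1\tt1\tchr1\t+\ng1\tt2\tchr1\t+\ng2\tt3\tchr1\t-\ng3\tt4\tchr2\t+\ng10\tt5\tchr2\t-\ng10\tt6\tchr2\t-\ng10\tt7\tchr2\t-\n' > "$tmpdir/toy.genes"

# uniq only compares adjacent lines, so the gene column is sorted first.
# uniq -u keeps lines that occur exactly once (single-variant genes).
single=$(cut -f1 "$tmpdir/toy.genes" | sort | uniq -u | wc -l)

# uniq -d keeps one copy of each repeated line (genes with 2+ variants).
multi=$(cut -f1 "$tmpdir/toy.genes" | sort | uniq -d | wc -l)

# Together the two groups should account for every distinct gene.
total=$(cut -f1 "$tmpdir/toy.genes" | sort -u | wc -l)
echo "single=$single multi=$multi total=$total"

rm -rf "$tmpdir"
```

In the toy file, g2 and g3 have one variant each while g1 and g10 have several, so both groups contain two genes. The same consistency check holds for the results above: 5450 + 3 = 5453 distinct genes.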
# Genes on the + strand (column 4), counting each gene once.
cut -f1,4 apple.genes | sort -u | grep "+" | wc -l
## 2662
# Genes on the - strand.
cut -f1,4 apple.genes | sort -u | grep "-" | wc -l
## 2791
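Matching "+" or "-" anywhere on the line works only when the gene identifiers themselves contain neither character. The two strand counts above sum to 5453, the number of distinct genes, which suggests that is the case here. The sketch below uses a hypothetical file with hyphenated IDs and an assumed column layout to show how the match can be restricted to the strand field.

```shell
# Toy annotation with hyphenated, made-up gene IDs (tab-separated:
# gene, transcript, chromosome, strand; the layout is an assumption).
tmpdir=$(mktemp -d)
printf 'MD-01\tt1\tchr1\t+\nMD-01\tt2\tchr1\t+\nMD-02\tt3\tchr1\t-\nMD-03\tt4\tchr2\t+\n' > "$tmpdir/toy.genes"

# One line per distinct (gene, strand) pair.
pairs=$(cut -f1,4 "$tmpdir/toy.genes" | sort -u)

# Matching "-" anywhere on the line also hits the hyphen inside the IDs.
naive_minus=$(printf '%s\n' "$pairs" | grep -c -e '-')

# Keep only the strand field, then match it exactly:
# -F treats the pattern as a fixed string, -x requires a whole-line match.
plus=$(printf '%s\n' "$pairs" | cut -f2 | grep -c -x -F -e '+')
minus=$(printf '%s\n' "$pairs" | cut -f2 | grep -c -x -F -e '-')
echo "naive_minus=$naive_minus plus=$plus minus=$minus"

rm -rf "$tmpdir"
```

On this toy file the whole-line match counts all three genes as "-" strand, while the field-restricted match gives two genes on "+" and one on "-".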
# Genes per chromosome: deduplicate (gene, chromosome) pairs, then tally the chromosome column.
cut -f1,3 apple.genes | sort -u | cut -f2 | sort | uniq -c
## 1624 chr1
## 2058 chr2
## 1771 chr3
# Transcripts per chromosome: same tally, keyed on the transcript column.
cut -f2,3 apple.genes | sort -u | cut -f2 | sort | uniq -c
## 1625 chr1
## 2059 chr2
## 1772 chr3
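The two tallies above differ by one per chromosome because a gene with several transcript variants is counted once in the first and once per variant in the second. A minimal sketch on toy data shows the effect. The column layout is an assumption and the identifiers are made up.

```shell
# Toy annotation, tab-separated: gene, transcript, chromosome, strand.
# The layout is an assumption and the identifiers are made up.
tmpdir=$(mktemp -d)
printf 'g1\tt1\tchr1\t+\ng1\tt2\tchr1\t+\ng2\tt3\tchr1\t-\ng3\tt4\tchr2\t+\n' > "$tmpdir/toy.genes"

# Genes per chromosome: unique (gene, chromosome) pairs, then count by chromosome.
# The final awk prints "chromosome count" without uniq's padding.
genes_per_chr=$(cut -f1,3 "$tmpdir/toy.genes" | sort -u | cut -f2 | sort | uniq -c | awk '{print $2, $1}')

# Transcripts per chromosome: the same pipeline keyed on column 2.
tx_per_chr=$(cut -f2,3 "$tmpdir/toy.genes" | sort -u | cut -f2 | sort | uniq -c | awk '{print $2, $1}')

echo "genes:"
echo "$genes_per_chr"
echo "transcripts:"
echo "$tx_per_chr"

rm -rf "$tmpdir"
```

Here chr1 carries two genes but three transcripts, because g1 has two variants, while chr2 has one of each.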
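# Distinct genes expressed in each condition; comm requires sorted input.
cut -f1 apple.conditionA | sort -u > sortA
cut -f1 apple.conditionB | sort -u > sortB
# Genes expressed in both A and B.
comm -1 -2 sortA sortB | wc -l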
## 2410
# Genes expressed in A but not in B.
comm -2 -3 sortA sortB | wc -l
## 1205
# Genes expressed in B but not in A.
comm -1 -3 sortA sortB | wc -l
## 1243
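To illustrate how the `comm` flags select each set, the sketch below runs the same three commands on two small made-up gene lists. `comm` prints three columns (lines only in the first file, lines only in the second, lines in both), and `-1`, `-2`, `-3` suppress the corresponding column.

```shell
# Two toy gene lists with made-up IDs; sort -u also removes the repeated g1.
tmpdir=$(mktemp -d)
printf 'g3\ng1\ng5\ng2\ng1\n' | sort -u > "$tmpdir/sortA"   # g1 g2 g3 g5
printf 'g4\ng2\ng3\n' | sort -u > "$tmpdir/sortB"           # g2 g3 g4

# Suppress columns 1 and 2: keep only the lines common to both files.
both=$(comm -1 -2 "$tmpdir/sortA" "$tmpdir/sortB" | wc -l)
# Suppress columns 2 and 3: keep only the lines unique to the first file.
only_a=$(comm -2 -3 "$tmpdir/sortA" "$tmpdir/sortB" | wc -l)
# Suppress columns 1 and 3: keep only the lines unique to the second file.
only_b=$(comm -1 -3 "$tmpdir/sortA" "$tmpdir/sortB" | wc -l)

# Sanity check: shared plus A-only genes should equal the size of A.
total_a=$(wc -l < "$tmpdir/sortA")
echo "both=$both only_a=$only_a only_b=$only_b total_a=$total_a"

rm -rf "$tmpdir"
```

The same check applies to the results above: if the counts are right, `wc -l` should report 2410 + 1205 = 3615 lines for sortA and 2410 + 1243 = 3653 lines for sortB.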
# Genes expressed in all three conditions: intersect the A/B overlap with C.
cut -f1 apple.conditionC | sort -u > sortC
comm -1 -2 sortA sortB > AB_common
comm -1 -2 AB_common sortC | wc -l
## 1608
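As a cross-check on the chained `comm` approach, the three-way intersection can also be computed by counting occurrences. Each list holds every gene at most once, so a gene expressed in all three conditions appears exactly three times in the concatenated lists. The sketch below compares both methods on made-up lists.

```shell
# Three toy gene lists, already sorted and duplicate-free (made-up IDs).
tmpdir=$(mktemp -d)
printf 'g1\ng2\ng3\ng5\n' > "$tmpdir/sortA"
printf 'g2\ng3\ng4\n' > "$tmpdir/sortB"
printf 'g3\ng5\n' > "$tmpdir/sortC"

# Method 1: chained comm, intersecting A with B and the result with C.
comm -1 -2 "$tmpdir/sortA" "$tmpdir/sortB" > "$tmpdir/AB_common"
chained=$(comm -1 -2 "$tmpdir/AB_common" "$tmpdir/sortC" | wc -l)

# Method 2: concatenate, count occurrences, keep genes seen in all 3 lists.
counted=$(cat "$tmpdir/sortA" "$tmpdir/sortB" "$tmpdir/sortC" | sort | uniq -c | awk '$1 == 3' | wc -l)

echo "chained=$chained counted=$counted"

rm -rf "$tmpdir"
```

Both methods find a single gene (g3) shared by all three toy lists. The counting variant relies on each input list being duplicate-free, which `sort -u` guarantees for sortA, sortB and sortC.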