5  Sequence Logo

5.1 Introduction

Sequence logos are a graphical representation of sequence conservation in a multiple sequence alignment. They visualize the relative frequencies of nucleotides or amino acids at each position in the alignment. Each position is drawn as a stack of letters: within a stack, the height of each letter is proportional to the frequency of that residue at that position, and when the logo is drawn in bits the total height of the stack reflects how conserved the position is. In R, the ggseqlogo package provides functions to create sequence logos using the ggplot2 framework. To install the package, run install.packages("ggseqlogo"). Once installed, load it with library(ggseqlogo).

library(ggplot2)
library(ggseqlogo)
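
To make the link between residue frequency and letter height concrete, the following base R sketch computes per-position frequencies and the standard Shannon information content for a few toy DNA sequences. The sequences and variable names are hypothetical, and the calculation omits any small-sample correction, so the values may differ slightly from what ggseqlogo draws.

```r
# Toy aligned DNA sequences (hypothetical data, for illustration only)
seqs <- c("ACGT", "ACGA", "ACTT", "AGGT")
alphabet <- c("A", "C", "G", "T")

# One row per sequence, one column per alignment position
chars <- do.call(rbind, strsplit(seqs, ""))

# Relative frequency of each letter at each position
# (rows = letters, columns = positions)
freq <- apply(chars, 2, function(col) {
  as.numeric(table(factor(col, levels = alphabet))) / length(col)
})
rownames(freq) <- alphabet

# Shannon entropy per position, treating 0 * log2(0) as 0
entropy <- apply(freq, 2, function(p) -sum(ifelse(p > 0, p * log2(p), 0)))

# Information content in bits: maximum possible entropy minus observed entropy
info <- log2(length(alphabet)) - entropy

# Letter heights in a bits logo: frequency scaled by the position's information
heights <- sweep(freq, 2, info, `*`)

print(round(freq, 2))
print(round(info, 2))
```

A fully conserved position (the first column here) reaches the maximum of 2 bits for DNA, while more variable positions produce shorter stacks. A probability logo would instead use the freq matrix directly, so every stack has height 1.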

5.3 Using an alignment file

We can also read a multiple sequence alignment from a file and render a logo from it. The read.alignment function from the seqinr package reads the alignment file, which here is in Clustal format. The region of interest (alignment columns 443 to 448 in this example) is then extracted from every sequence with the substr function. Because read.alignment returns the sequences in lowercase by default, toupper is used to convert them to uppercase. Finally, the ggseqlogo function creates the logo from the extracted region.

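library(seqinr)
# Read the Clustal-format alignment
prot_align <- read.alignment("DEAD2.aln", format = "clustal")
# Extract alignment columns 443 to 448 from every sequence, in uppercase
align_motif <- substr(toupper(prot_align$seq), 443, 448)
# Draw the logo with letter heights shown as probabilities
ggseqlogo(align_motif, seq_type = "aa", method = "prob")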
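
The extraction step can be tried without an alignment file. The sketch below uses a small hypothetical character vector in place of prot_align$seq to show that substr works on every sequence at once, and adds a simple check that all extracted fragments have the same width before they are passed to a logo function.

```r
# Hypothetical aligned sequences standing in for prot_align$seq
# (lowercase with gap characters, as an alignment reader might return them)
aligned <- c("mk-teadgst", "mkrteadgst", "mk-sehdgsa")

# substr() is vectorised: it extracts columns 5 to 8 from every sequence
motif <- substr(toupper(aligned), 5, 8)
print(motif)

# A logo needs fragments of equal length, so verify this before plotting
stopifnot(length(unique(nchar(motif))) == 1)
```

If the requested columns run past the end of a sequence, substr silently returns a shorter string, which is why the width check is worth keeping.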

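The logo can be rendered with either bits (the default) or prob as the method: bits scales each position by its information content, whereas prob shows the residue probabilities directly. Like any other ggplot, the logo can be saved using the ggsave function.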

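library(patchwork)
# Same motif drawn in bits (the default method) and as probabilities
p1 <- ggseqlogo(align_motif, seq_type = "aa")
p2 <- ggseqlogo(align_motif, seq_type = "aa", method = "prob")
# Place the logos side by side with a shared legend at the bottom
p1 + p2 + plot_layout(guides = "collect") & theme(legend.position = "bottom")

# Uncomment to save the combined figure
# ggsave("seqlogo_2.png", width = 6, height = 3, units = "in", dpi = 300)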
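
When called without a plot argument, ggsave writes the last plot that was displayed. As a minimal sketch, assuming ggplot2 and ggseqlogo are installed, the plot can instead be stored in a variable and passed explicitly. The toy sequences and the output file name are hypothetical.

```r
library(ggplot2)
library(ggseqlogo)

# Hypothetical motif sequences, for illustration only
toy_motif <- c("DEAD", "DEAH", "DEAD", "DECD")

# Build the logo and keep it in a variable
p <- ggseqlogo(toy_motif, seq_type = "aa", method = "bits")

# Pass the plot explicitly rather than relying on the last plot displayed
out_file <- file.path(tempdir(), "toy_logo.png")
ggsave(out_file, plot = p, width = 6, height = 3, units = "in", dpi = 300)
```

Passing the plot explicitly makes the script independent of what was last printed, which is helpful when several figures are built in one session.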