INTRODUCTORY BIOLOGY

(BS 130) CASE STUDY 2                                                      Name_______________

 

Introduction

This case study is designed to give students some experience with protein databases and the analysis of such data. It is based upon and largely excerpted from an article by Bilardello and Valdes (1998). The goal of modern classification schemes is to show the possible evolutionary histories of organisms. Characteristics used for classification have typically included morphology. These classifications sometimes change when both extant and extinct forms are evaluated. More drastic changes in classification are now being proposed with consideration of molecular traits such as sequences of bases in DNA and RNA and sequences of amino acids in proteins. The degree of difference between sequences reflects the relative evolutionary distance between species. These differences can be compared phenetically (that is, by overall similarity) and used to construct phylogenetic diagrams. The purpose of this case study is to demonstrate a technique for constructing phylogenies using molecular traits.

 

The Problem

The molecular data that can be used to develop phylogenies are available on the internet. For example, the amino acid sequences of specific proteins are available at http://www.expasy.org/. This is the web site for the ExPASy (Expert Protein Analysis System) Proteomics Server of the Swiss Institute of Bioinformatics.

 

A comparison of the amino acid sequences of hemoglobin a (the alpha chain, coded HBA) for five different organisms is presented in the following tables:

 

Table 1. Codes for five species being analyzed.

 

Sequence 1: sp    Dog       HBA_CANFA_Hemoglobin
Sequence 2: sp    Donkey    HBA_EQUAS_Hemoglobin
Sequence 3: sp    Horse     HBA_HORSE_Hemoglobin
Sequence 4: sp    Human     HBA_HUMAN_Hemoglobin
Sequence 5: sp    Mouse     HBA_MOUSE_Hemoglobin

Table 2. Percentage similarities for 141 aa of five species being analyzed.

Sequences (1:2) Aligned. Score: 80.1418

Sequences (1:3) Aligned. Score: 81.5603

Sequences (1:4) Aligned. Score: 83.6879

Sequences (1:5) Aligned. Score: 81.5603

Sequences (2:2) Aligned. Score: 100

Sequences (2:3) Aligned. Score: 97.8723

Sequences (2:4) Aligned. Score: 85.8156

Sequences (2:5) Aligned. Score: 82.2695

Sequences (3:2) Aligned. Score: 97.8723

Sequences (3:3) Aligned. Score: 100

Sequences (3:4) Aligned. Score: 87.9433

Sequences (3:5) Aligned. Score: 84.3972

Sequences (4:2) Aligned. Score: 85.8156

Sequences (4:3) Aligned. Score: 87.9433

Sequences (4:4) Aligned. Score: 100

Sequences (4:5) Aligned. Score: 85.8156

Sequences (5:2) Aligned. Score: 82.2695

Sequences (5:3) Aligned. Score: 84.3972

Sequences (5:4) Aligned. Score: 85.8156

Sequences (5:5) Aligned. Score: 100
 
 
Table 3. CLUSTAL W multiple sequence alignment.
 

sp|P01959|HBA_EQUAS_Hemoglobin      VLSAADKTNVKAAWSKVGGNAGEFGAEALERMFLGFPTTKTYFPHFDLSH

sp|P01958|HBA_HORSE_Hemoglobin      VLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHFDLSH

sp|P69905|HBA_HUMAN_Hemoglobin      VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH

sp|P01942|HBA_MOUSE_Hemoglobin      VLSGEDKSNIKAAWGKIGGHGAEYGAEALERMFASFPTTKTYFPHFDVSH

sp|P60529|HBA_CANFA_Hemoglobin      VLSPADKTNIKSTWDKIGGHAGDYGGEALDRTFQSFPTTKTYFPHFDLSP

                                    ***  **:*:*::*.*:*.:..::*.***:* * .************:*

 

sp|P01959|HBA_EQUAS_Hemoglobin      GSAQVKAHGKKVGDALTLAVGHLDDLPGALSNLSDLHAHKLRVDPVNFKL

sp|P01958|HBA_HORSE_Hemoglobin      GSAQVKAHGKKVGDALTLAVGHLDDLPGALSNLSDLHAHKLRVDPVNFKL

sp|P69905|HBA_HUMAN_Hemoglobin      GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKL

sp|P01942|HBA_MOUSE_Hemoglobin      GSAQVKGHGKKVADALASAAGHLDDLPGALSALSDLHAHKLRVDPVNFKL

sp|P60529|HBA_CANFA_Hemoglobin      GSAQVKAHGKKVADALTTAVAHLDDLPGALSALSDLHAYKLRVDPVNFKL

                                    ******.*****.***: *..*:**:*.*** ******:***********

 

sp|P01959|HBA_EQUAS_Hemoglobin      LSHCLLSTLAVHLPNDFTPAVHASLDKFLSTVSTVLTSKYR

sp|P01958|HBA_HORSE_Hemoglobin      LSHCLLSTLAVHLPNDFTPAVHASLDKFLSSVSTVLTSKYR

sp|P69905|HBA_HUMAN_Hemoglobin      LSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR

sp|P01942|HBA_MOUSE_Hemoglobin      LSHCLLVTLASHHPADFTPAVHASLDKFLASVSTVLTSKYR

sp|P60529|HBA_CANFA_Hemoglobin      LSHCLLVTLACHHPTEFTPAVHASLDKFFAAVSTVLTSKYR

                                    ****** *** * * :************:::**********
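The scores in Table 2 can be checked by hand from the alignment in Table 3. The short Python sketch below is one way to do this. It assumes that the score is simply the percentage of aligned positions with identical amino acids, and that the alignment has no gaps (true for these five sequences). The function name percent_identity is made up for this illustration and is not part of CLUSTALW.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions where both sequences have the same residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100 * matches / len(seq_a)


# Donkey (HBA_EQUAS) and horse (HBA_HORSE) rows copied from Table 3,
# one string per alignment block.
DONKEY = (
    "VLSAADKTNVKAAWSKVGGNAGEFGAEALERMFLGFPTTKTYFPHFDLSH"
    "GSAQVKAHGKKVGDALTLAVGHLDDLPGALSNLSDLHAHKLRVDPVNFKL"
    "LSHCLLSTLAVHLPNDFTPAVHASLDKFLSTVSTVLTSKYR"
)
HORSE = (
    "VLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHFDLSH"
    "GSAQVKAHGKKVGDALTLAVGHLDDLPGALSNLSDLHAHKLRVDPVNFKL"
    "LSHCLLSTLAVHLPNDFTPAVHASLDKFLSSVSTVLTSKYR"
)

score = percent_identity(DONKEY, HORSE)
print(f"{score:.4f}")  # 138 of 141 positions match, giving 97.8723 as in Table 2 (2:3)
```

The same function can be applied to any other pair of rows in Table 3 to compare against the corresponding line of Table 2.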

 

 

To obtain tables such as these, open the ExPASy web site and type the name of the protein you wish to compare in the box at the top of the page following "Search Swiss-Prot/TrEMBL for ___________". In our case this will be hemoglobin a. Then select Go. The list of sequences produced is long and may seem confusing. For our species, go down to HBA_HUMAN and click on the entry. This brings up the sequence entry. Scroll down to the sequence information and click on FASTA format. Copy the entire sequence in FASTA format, then go to CLUSTALW (http://clustalw.genome.jp/; use another browser for this) and paste the sequence into the window provided. Repeat this for the other four species: HBA_CANFA, HBA_EQUAS, HBA_HORSE, and HBA_MOUSE. Then run the multiple alignment by clicking the execute multiple alignment button at the bottom of the page. A CLUSTALW results form will be presented.
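A FASTA record is a header line beginning with ">" followed by the sequence, which is usually wrapped over several lines. If you want to confirm that you copied an entire sequence, a small Python sketch such as the following can join the wrapped lines and count the residues. The function name parse_fasta is hypothetical, and the header shown simply reuses the label from Table 3; the header you copy from Swiss-Prot may look different.

```python
def parse_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its full sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line  # join wrapped sequence lines
    return records


# Human sequence as shown in Table 3, wrapped as it might appear when pasted.
EXAMPLE = """\
>sp|P69905|HBA_HUMAN_Hemoglobin
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH
GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKL
LSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
"""

records = parse_fasta(EXAMPLE)
for header, sequence in records.items():
    print(header, len(sequence))
```

For the example above the reported length should be 141, the number of amino acids given in the caption of Table 2.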

 

It is useful to develop a similarity matrix for the information presented in Table 3 (Bilardello and Valdes 1998).

 

 

Table 4. Similarity matrix based upon amino acid differences for hemoglobin a.

 

        Hu      Ho      E       M       D
Hu      100     88      86      86      84
Ho              100     98      84      82
E                       100     82      80
M                               100     82
D                                       100

Hu = human, Ho = horse, E = donkey (Equus asinus), M = mouse, D = dog.
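The values in Table 4 agree with the pairwise scores in Table 2 rounded to whole numbers. The Python sketch below shows one way to build such a matrix from the CLUSTALW score lines. It is an illustration only: the species abbreviations (Hu, Ho, E, M, D) are matched to sequence numbers using Table 1, and the function name similarity_matrix is made up for this example.

```python
import re

# The ten distinct pairwise score lines from Table 2.
SCORE_LINES = """\
Sequences (1:2) Aligned. Score: 80.1418
Sequences (1:3) Aligned. Score: 81.5603
Sequences (1:4) Aligned. Score: 83.6879
Sequences (1:5) Aligned. Score: 81.5603
Sequences (2:3) Aligned. Score: 97.8723
Sequences (2:4) Aligned. Score: 85.8156
Sequences (2:5) Aligned. Score: 82.2695
Sequences (3:4) Aligned. Score: 87.9433
Sequences (3:5) Aligned. Score: 84.3972
Sequences (4:5) Aligned. Score: 85.8156
"""

# Sequence numbers from Table 1 mapped to the abbreviations used in Table 4.
NAMES = {1: "D", 2: "E", 3: "Ho", 4: "Hu", 5: "M"}

PATTERN = re.compile(r"Sequences \((\d+):(\d+)\) Aligned\. Score: ([\d.]+)")


def similarity_matrix(text):
    """Build a symmetric matrix of rounded percent similarities."""
    matrix = {}
    for i, j, score in PATTERN.findall(text):
        a, b = NAMES[int(i)], NAMES[int(j)]
        matrix[(a, b)] = matrix[(b, a)] = round(float(score))
    for name in NAMES.values():
        matrix[(name, name)] = 100  # every sequence is identical to itself
    return matrix


matrix = similarity_matrix(SCORE_LINES)
order = ["Hu", "Ho", "E", "M", "D"]
print("    " + "".join(f"{name:>5}" for name in order))
for row in order:
    print(f"{row:<4}" + "".join(f"{matrix[(row, col)]:>5}" for col in order))
```

This prints the full square matrix; Table 4 shows only the upper half because the lower half repeats the same values.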

 

 

You can develop a tree (see Figure 1) using this similarity matrix or by going to the bottom of the CLUSTALW results page, choosing dendrogram from the select tree menu, and clicking execute. The resulting branching tree is based upon the similarities in the amino acid sequences.
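To see how a tree can be developed from a similarity matrix, the Python sketch below repeatedly joins the two groups with the highest average similarity (an average-linkage, UPGMA-style approach). This is only one common clustering method, chosen here for illustration. It is not necessarily the method used by CLUSTALW or by Bilardello and Valdes (1998), and the function names are hypothetical.

```python
from itertools import combinations

# Rounded percent similarities from Table 4 (upper triangle only).
SIMILARITY = {
    ("Hu", "Ho"): 88, ("Hu", "E"): 86, ("Hu", "M"): 86, ("Hu", "D"): 84,
    ("Ho", "E"): 98, ("Ho", "M"): 84, ("Ho", "D"): 82,
    ("E", "M"): 82, ("E", "D"): 80,
    ("M", "D"): 82,
}


def leaf_similarity(a, b):
    """Look up the similarity of two species in either order."""
    return SIMILARITY.get((a, b), SIMILARITY.get((b, a)))


def leaves(cluster):
    """Flatten a nested tuple of species labels into a list of labels."""
    if isinstance(cluster, str):
        return [cluster]
    return [leaf for part in cluster for leaf in leaves(part)]


def average_similarity(c1, c2):
    """Mean similarity over all species pairs drawn from the two groups."""
    pairs = [(a, b) for a in leaves(c1) for b in leaves(c2)]
    return sum(leaf_similarity(a, b) for a, b in pairs) / len(pairs)


def build_tree(labels):
    """Join the most similar groups until a single nested tuple remains."""
    clusters = list(labels)
    joins = []
    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2),
                     key=lambda pair: average_similarity(*pair))
        joins.append((c1, c2, average_similarity(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [(c1, c2)]
    return clusters[0], joins


tree, joins = build_tree(["Hu", "Ho", "E", "M", "D"])
for c1, c2, value in joins:
    print(f"join {c1} with {c2} at average similarity {value:.1f}")
print(tree)
```

Each printed line is one branching point of the tree, listed from the most similar pair to the least similar group. You can sketch a dendrogram by hand from these joins and compare it with the one CLUSTALW produces.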

 

 

Project

 

1. In class, you will use Swiss-Prot to find the amino acid sequences of the protein hemoglobin a for the five species listed above (i.e., horse, human, donkey, mouse, and dog).

 

2. A similarity matrix for hemoglobin a for these five species will be developed.

 

3. Then in class, you will develop an evolutionary phylogeny from the data.

 

4. For your project, select five different species from the list provided (species list) and develop a similarity matrix and dendrogram for these species.

 

5. Compare your results to the dendrograms presented in your textbook (figure 31.7) and summarize your findings in a report using Word. Include in your report tables similar to those above (Tables 1-4) and Figure 1 (a dendrogram developed by the computer program). Evaluate the dendrogram produced by the computer. Is it correct or not? Use your text (e.g., figure 23-9) to justify your answer.

 

6. You must work with a partner for this project, and each of you must independently fill out and hand in a Peer Evaluation Form.

 

7. Grading:
Table 1 = 3 points
Table 2 = 3 points
Table 3 = 3 points
Table 4 = 3 points
Figure 1 = 3 points
Discussion = 3 points
Peer evaluation forms (2) = 7 points

 

The project is due November 22 at 11:00 AM for section 1 and 2:00 PM for section 2. A penalty of 25% is deducted for each day late, including the first period.

 

References
Bilardello, N. and Valdes, L. 1998. Constructing phylogenies. The American Biology Teacher 60 (5): 369-373.