Proteases, also known as proteolytic enzymes or proteinases, are a group of enzymes that hydrolyze (break down) proteins into small peptides and amino acids. Proteolytic enzymes are important for various therapeutic purposes, in areas such as oncology, inflammatory conditions, blood rheology control and immune regulation. Undigested proteins, cellular debris and blood toxins can also be digested by proteases. The current classification of enzymes recognizes six broad groups of proteases: serine proteases, threonine proteases, cysteine proteases, aspartate proteases, glutamic proteases and metalloproteases. Alternatively, on the basis of the isoelectric points (pI) of the catalytic proteins, they can be classified into acid, alkaline and neutral proteases. Acid and neutral proteases are involved in type I hypersensitivity by activating the complement system and kinins (Mitchell et al., 2007), whereas the function of the alkaline or basic proteases of Bacillus anthracis is still unknown.
Bacillus anthracis, the etiologic agent of anthrax, is a Gram-positive, endospore-forming, rod-shaped bacterium 1 - 1.2 μm in width. According to a previous study (Russell et al., 2007), infection by B. anthracis can occur in three forms: cutaneous (skin), gastrointestinal (digestive) and inhalation (lungs). Cutaneous anthrax is rarely fatal if treated, gastrointestinal anthrax shows 25 - 60% mortality, and inhalation anthrax is more deadly than the other forms. B. anthracis possesses acid, neutral and alkaline proteases. Some strains, such as B. anthracis str. CDC 684, contain both acid and alkaline protease genes in their whole genome. The main fatal complication caused by this bacterium is hemorrhagic meningitis, but its pathogenesis is still unknown. Mukherjee et al. (2011) showed that B. anthracis increases the permeability of human brain microvasculature endothelial cells (HBMECs), which constitute the blood-brain barrier (BBB), by secreting the metalloprotease InhA, which acts on the monolayer integrity of HBMECs. According to Chertow (2011) and Tonry et al. (2012), the immune inhibitor A metalloprotease (InhA) of B. anthracis helps in adhesion to and invasion of human brain endothelial cells by modifying cell surface properties through direct proteolysis of adhesin protein. A B. anthracis protease cleaves the anthrax lethal factor protein so that it can be internalised by host cell endocytosis (John, 2010). B. anthracis has also been used as an effective vector for the production of recombinant proteins after deletion of six proteases (Pomerantsev et al., 2011). These experimental findings suggest that the proteases of B. anthracis carry an important active site responsible for their lethal effect. Analysis of the protein sequences of B. anthracis proteases may therefore reveal their functions and evolutionary relatedness.
Previous in silico studies have been reported on different enzymes, such as tannase of bacterial and fungal origin (Banerjee et al., 2012), alkaline proteases from different species of Aspergillus (Morya et al., 2012), xylanase from Thermomyces lanuginosus (Shrivastava et al., 2007) and pectate lyase from different sources (Dubey et al., 2010), but an in silico study of B. anthracis protease protein sequences has not yet been reported.
The present work was designed to understand the nature of the different types of B. anthracis proteases through an in silico comparative study. Analysis and characterization of 13 acid proteases, 9 neutral proteases and one alkaline protease were performed. The protein sequences were subjected to physicochemical parameter analysis, superfamily search, multiple sequence alignment for homology search, phylogenetic tree construction, identification of common and conserved motifs, and protein-protein interaction analysis. Physicochemical parameter analysis of the individual protein sequences will help to understand the physicochemical conditions under which each protein remains stable, and may also indicate the optimum culture conditions of the respective organisms. Superfamily and phylogenetic tree analysis will help to classify the proteins and establish their evolutionary relatedness, while protein-protein interaction analysis will help to detect the enzymes responsible for pathogenesis. Finally, consensus sequences from the multiple sequence alignment and the conserved motifs will help to design specific primers for each different species.
A total of 84 acid, 283 neutral and 31 alkaline protease sequences of B. anthracis origin were downloaded from the NCBI database (http://www.ncbi.nlm.nih.gov/). Among them, 13 acid, 9 neutral and one alkaline protease sequences were selected for in silico analysis.
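The selection criterion is described later only as the sequences being unique. As a minimal sketch of one way such a filter could work, assuming that "unique" means "not identical at the residue level" (an assumption, not the authors' stated procedure), the following Python code parses FASTA text and keeps the first record for each distinct sequence. The record identifiers and sequences in `EXAMPLE` are hypothetical toy data, not NCBI entries.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def unique_by_sequence(records):
    """Keep the first record for every distinct sequence (case-insensitive)."""
    seen = set()
    kept = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            kept.append((header, seq))
    return kept


# Hypothetical toy input: seq_b duplicates seq_a and is dropped.
EXAMPLE = """>seq_a
MKVLAG
HE
>seq_b
MKVLAGHE
>seq_c
MRILGGHD
"""

if __name__ == "__main__":
    for header, seq in unique_by_sequence(parse_fasta(EXAMPLE)):
        print(header, len(seq))
```

A stricter or looser criterion (for example, a percent-identity threshold) would need a pairwise comparison step instead of exact matching.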
Physicochemical data were generated with the ProtParam software on the ExPASy server (the proteomic server of the Swiss Institute of Bioinformatics). Sequences in FASTA format were used for the subsequent analyses.
The Superfamily tool on ExPASy server was used for protein family search.
The program ClustalW2 (Larkin et al., 2007) was used for multiple sequence alignment, and the alignment was visualized with the CLC-Bio sequence viewer.
Phylip-3.69 (Tuimala, 1989) was used to construct the phylogram by the Neighbor-joining (NJ) method with 100 bootstrap replicates. The tree was edited with Dendroscope (Huson et al., 2007).
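To illustrate what bootstrapping an alignment involves, the sketch below resamples alignment columns with replacement and computes a simple p-distance matrix for each replicate; such matrices are the usual input to a neighbor-joining step. This is an illustrative Python sketch and not the Phylip implementation: the p-distance is a stand-in for whatever protein distance model was actually used, and the function names and the toy alignment are hypothetical.

```python
import random


def p_distance(a, b):
    """Fraction of differing residues over positions where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """Pairwise p-distances for a list of equal-length aligned sequences."""
    n = len(aligned)
    return [[p_distance(aligned[i], aligned[j]) for j in range(n)] for i in range(n)]


def bootstrap_alignment(aligned, rng):
    """One bootstrap replicate: sample alignment columns with replacement."""
    length = len(aligned[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in cols) for seq in aligned]


def bootstrap_matrices(aligned, replicates=100, seed=0):
    """Distance matrices for the requested number of bootstrap replicates."""
    rng = random.Random(seed)
    return [distance_matrix(bootstrap_alignment(aligned, rng)) for _ in range(replicates)]


if __name__ == "__main__":
    toy = ["MKVLA", "MKVLG", "MRILG"]  # hypothetical gap-free alignment
    print(distance_matrix(toy))
    print(len(bootstrap_matrices(toy)))
```

Each replicate matrix would then be passed to a tree-building step, and the support for a cluster is the fraction of replicate trees in which that cluster appears.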
The selected protein sequences were analysed for protein-protein interactions using the STRING database in order to infer their probable functions.
Acid and neutral proteases were separately submitted to Pfam to find conserved domains. The separated domains were submitted to Block Maker for conserved block identification, and the resulting blocks were used for motif finding with the MEME Suite. The conserved motifs of the alkaline protease were deduced from the multiple sequence alignment.
Among all the B. anthracis protease sequences deposited in the NCBI database, 13 acid, 9 neutral and one alkaline protease sequences were found to be unique, i.e. they differ from one another at the sequence level and in amino acid composition. The accession numbers of the protease protein sequences, together with the source organism and the type of protease, are listed in Table 1.
The physicochemical features of the protease sequences are presented in Table 2. The number of amino acids in the acid proteases ranged from 345 to 742, with variable molecular weight. The pI value varied from 4.94 to 6.29, except for sequences 5 and 11 (Acc. No. YP_018763.1 and AFH83399.1), which had pI values of 8.65 and 7.24, respectively. These two sequences were membrane-bound proteases, with aliphatic index values of 84.35 and 77.07, respectively. Sequence 3 (Acc. No. YP_030468.1), on the other hand, had the highest aliphatic index value among the acid proteases, 96.36.
For the neutral protease group, the values varied over different ranges in the different analyses (Table 2). The germination protease (sequence 2) showed values similar to those of the germination protease of the acid protease group (sequence 3), with a pI value of 4.94. Some serine proteases with various pI values were also found in this group. Sequence 7 (Acc. No. EJT19501.1), a membrane-bound protease, showed the highest pI value (9.33) and the highest aliphatic index value (104.64).
When the complete sequences of the acid, neutral and alkaline proteases were submitted to the Superfamily tool on the ExPASy server, different superfamilies and families were revealed (Table 3). Among the acid proteases, 10 sequences belonged to the Metalloproteases ("zincins"), catalytic domain superfamily and the Thermolysin-like family. Sequence 3 (Acc. No. YP_030468.1) belonged to the HybD-like superfamily and the Germination protease family. Sequences 5 and 11 (Acc. No. YP_018763.1 and AFH83399.1) belonged to the Cysteine proteinases superfamily and the Transglutaminase core family; short segments of these sequences also showed similarity to thermostable phytase (Table 3). In the neutral proteases, by contrast, the most variable domains were observed in the superfamily and family analysis (Table 3). Among them, sequences 7 and 9 were related to the ten acid proteases, sequence 2 showed similarity to the germination protease sequence 3 (acid protease), and sequence 8 showed similarity to alkaline protease 1. Thus, sequence-level dissimilarity was observed among the 9 neutral proteases, and this was also reflected in the multiple sequence alignment.
The multiple sequence alignments of the 13 acid proteases, the 9 neutral proteases and the one alkaline protease reflected the superfamily results for each group. Fig. 1A and B show the consensus regions of the acid proteases. The presence of consensus regions throughout the whole alignment indicated a high level of sequence similarity among them. Three 100% conserved positions (368, 489 and 498) were found in the aligned region and are marked with pink bars. Blue bars mark changes specific to sequence 3 (germination protease). Red bars mark nearly conserved regions, whereas green bars mark the changes. In most cases, the changes were found in the two membrane-bound protease sequences 5 and 11 (Accession No. YP_018763.1 and AFH83399.1) and in the germination protease sequence 3 (Accession No. YP_030468.1).
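The notion of a 100% conserved position can be made concrete with a short sketch: a column of the alignment counts as fully conserved when every sequence carries the same residue there and none has a gap. The Python function below is an illustration under that definition; the function name and the toy alignment are hypothetical and are not taken from Fig. 1.

```python
def conserved_columns(aligned):
    """Return 1-based alignment positions where all sequences share one residue.

    Columns containing a gap character ('-') are never reported as conserved.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    positions = []
    for i in range(length):
        column = {seq[i] for seq in aligned}
        if len(column) == 1 and "-" not in column:
            positions.append(i + 1)
    return positions


if __name__ == "__main__":
    toy = ["MK-HE", "MR-HE", "MKAHD"]  # hypothetical aligned sequences
    print(conserved_columns(toy))
```

Positions are reported in alignment coordinates, which is also how positions such as 368, 489 and 498 are expressed in the text; mapping them back to residue numbers in an individual sequence requires skipping that sequence's gaps.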
Only a few consensus regions, with low levels of sequence similarity, were found in the multiple sequence alignment of the neutral protease group and the one alkaline protease. Red bars indicate the regions of similarity. As the alkaline protease sequence (Accession No. YP_002814818.1) showed the highest similarity to neutral protease sequence 8 (AFH85759.1) in the preceding analysis, these two sequences are presented in green (Fig. 1C and D). Ten short conserved motifs of the alkaline protease, identified by comparison with neutral protease 8, are also shown in violet boxes.
Phylogenetic tree construction for the 13 acid proteases, the 9 neutral proteases and the one alkaline protease gave an interesting result. Ten acid proteases clustered together at the top of the tree (Fig. 2). The two membrane-bound protease sequences 5 and 11 (YP_018763.1 and AFH83399.1) were found together at the bottom of the tree. Germination protease sequence 3 grouped with the other germination protease, sequence 2 of the neutral protease group. The alkaline serine protease grouped with the neutral serine protease sequence 8.
Protein-protein interactions are at the core of interactome studies, which also represent the secretome of an organism. In this study, the interactions of the acid, neutral and alkaline proteases of B. anthracis were examined. Across all categories, only 9 acid proteases (Accession No. YP_002869281.1, YP_002816538.1, YP_031148.1, EJY90308.1, EJY93035.1, EJY94285.1, AFH86425.1, AFH83987.1 and ZP_05183912.1) showed interaction with immune inhibitor A metalloproteases (inhA or others). YP_002869281.1 (NprB), EJY94285.1 (extracellular protease) and ZP_05183912.1 (Npr599) specifically showed interaction with inhA1, and AFH83987.1 interacted with inhA2. For the two germination proteases, interactions were found with proteins affecting the sporulation and germination of B. anthracis, such as putative stage II sporulation protein P, spore cortex-lytic enzyme prepeptide, germination protein YpeB, stage IV sporulation protein A and small acid-soluble spore protein/B.
A total of 6 motifs were found in the acid and neutral proteases (Table 4). Motifs A2 and A3 showed similarity to peptidase M4 according to the BLAST and Pfam results. According to Pfam and Gene3D, motifs B1, B2 and B3 all have peptidase activity. The function of B3 deduced by BLAST was endopeptidase spore protease Gpr. Ten short conserved motifs of the alkaline protease were identified from the multiple sequence alignment (Fig. 1C and D).
The present study indicates that the acid metalloproteases of B. anthracis have a definite role in its pathogenesis. Their extracellular nature and protein-protein interaction pattern point to the involvement of some acid proteases during anthrax infection.
The physicochemical nature of a protein can be readily calculated in silico from its amino acid sequence. The solubility of a protein can be estimated from the grand average of hydropathicity (GRAVY): a positive GRAVY indicates hydrophobicity and a negative GRAVY indicates hydrophilicity (Kyte, 1982). The thermostability of a protein, on the other hand, increases with its aliphatic index value.
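Both quantities are simple functions of amino acid composition, which the following Python sketch makes explicit. GRAVY is the mean Kyte-Doolittle hydropathy value over all residues, and the aliphatic index is computed from the mole percentages of Ala, Val, Ile and Leu with the commonly used weights 2.9 and 3.9. The code is an illustration rather than the ProtParam implementation, and the example peptide is hypothetical.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy value per residue."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    seq = seq.upper()
    n = len(seq)

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    peptide = "MKVLAGHEDS"  # hypothetical sequence, for illustration only
    print(round(gravy(peptide), 3), round(aliphatic_index(peptide), 2))
```

A sequence containing non-standard residue codes would raise a `KeyError` in `gravy`; real protein records may need such characters filtered or handled first.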
According to the GRAVY results (Table 2), all the metalloproteases of the acid group were found to prefer an extracellular medium, and they were moderately thermostable. By the same parameters, sequences 5 and 11 were highly thermostable, membrane-bound proteases of B. anthracis. The presence of a thermostable phytase (3-phytase) domain in sequences 5 and 11 in the superfamily and family analysis also supported their thermostability (Table 3). The germination protease (sequence 3) showed the highest aliphatic index value among the acid proteases (96.36), indicating a high level of thermal stability. The pI value of 4.94 of the two germination proteases indicated that the germination of B. anthracis occurs in an acidic medium, and their comparatively high hydrophobicity (GRAVY -0.235) argued against an extracellular existence (Table 2). In the neutral protease group, sequence 7 was found to prefer an intracellular medium and had the highest aliphatic index (104.64), indicating the highest thermal stability. Sequences 3 and 6 were hydrophilic, but all the others were hydrophobic or preferred the intracellular space. The in vivo half-life of a protein can be estimated from its instability index (Guruprasad et al., 1990). According to the literature, proteins with an instability index above 40 have a half-life of less than 5 h, and proteins with an instability index below 40 have a half-life of more than 16 h (Rogers et al., 1986). On this basis, the studied proteases have a half-life of greater than 16 h.
The phylogenetic tree (Fig. 2) clearly reflected the superfamily results (Table 3). Although all the proteins differed in sequence, 10 acid proteases showed a close evolutionary relationship. They all showed the same domains in the superfamily search, indicating similar function. Functional similarity was also found between the two germination proteases and between the two membrane-bound proteases. The studied neutral proteases were found together in the tree. Among them, neutral serine protease 8 showed the highest similarity to alkaline serine protease 1 (Fig. 2), indicating functional similarity. These results show that similar sequences carried the same functional domains and, on that basis, showed evolutionary relationships.
According to Baillie (2001) and Russell et al. (2007), the anthrax pathogen initiates its germination, as well as infection, by interacting with host macrophages. The resulting vegetative cells spread in the blood and other tissues as the causative agent of meningitis and ultimately cause death. The literature shows that, among the secreted metalloproteases of B. anthracis, immune inhibitor A1 (InhA1) is the single pathogenic member; during infection it helps to cleave mammalian cell matter, in addition to modulating the proteins secreted by B. anthracis (Pflughoeft et al., 2014), and it increases the permeability of the blood-brain barrier, resulting in cerebral hemorrhages (Dhritiman et al., 2011). A previous investigation by Pflughoeft (2010) suggested that the protease cascade regulates the response of the organism to altered environments, such as a changing signal or the presence of different types of tissue. Mukherjee et al. (2011) also showed that NprB and Npr599 are extracellular enzymes that interact with inhA1 and are able to degrade the plasma and matrix proteins of the host. The secretome analysis through protein-protein interaction showed that most acid proteases (except 2, 3, 5, 6 and 11) interact with immune inhibitor A (inhA or others), as shown in Table 5. In accordance with the above literature, it can therefore be concluded that the proteases with accession numbers YP_002869281.1, YP_002816538.1, YP_031148.1, EJY90308.1, EJY93035.1, EJY94285.1, AFH86425.1, AFH83987.1, ZP_05184332.1 and ZP_05183912.1 (Table 1) are extracellular in nature, and they were also found to be extracellular proteins in this study through physicochemical parameter analysis. Among them, the zinc metalloproteases (Table 3) YP_002869281.1 (NprB), EJY94285.1 (extracellular protease) and ZP_05183912.1 (Npr599) are part of the regulatory cascade of B. anthracis, which helps the cell to react to a changed external environment and supports its stabilization under altered conditions; they may also have a direct relation to the permeability of endothelial cells and the degradation of plasma or matrix proteins at the time of infection (Mukherjee et al., 2011). This evidence links the physicochemical parameters with the PPI outputs, allowing extracellular and intracellular acid proteases to be identified. As the neutral proteases are diverse in sequence, they also showed different interactions (Table 5). Further PPI investigation of the neutral and alkaline proteases is needed for a better understanding of their biological functions.
As 100% sequence similarity was found among the 31 retrieved alkaline proteases, it can be concluded that the alkaline proteases of B. anthracis are more conserved than the others. A total of 10 aligned regions were found for alkaline protease 1 in the multiple sequence alignment with neutral protease sequence 8 (Fig. 1C and D). The identified short segments could be used as primers or probes to identify B. anthracis alkaline protease. In addition, specific regions in the multiple sequence alignment of the 13 acid proteases (Fig. 1A and B) could be used for further investigation. The pink bars in the acid protease group mark the highly conserved regions, which could be used as target sites for the inactivation of those proteases. The acid and neutral protease-specific motifs are presented in Table 4; peptidase activity was found for A2, A3 and B1. Similarity was found between B3 and the B. cereus endopeptidase spore protease Gpr, which could be related to the germination of B. anthracis spores. The identified motifs could be used to prepare acid protease-specific primers and probes, or to inactivate the protease responsible for spore germination. The metabolic network could be investigated in detail in further studies.
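When a conserved peptide segment is considered as a primer target, one practical question is how degenerate the corresponding primer would be, since each amino acid can be encoded by one to six codons in the standard genetic code. The Python sketch below computes the degeneracy of a peptide (the product of the codon counts of its residues) and scans a motif for the least degenerate window of a given length. The motif used in the example is hypothetical and is not one of the motifs in Table 4; codon usage bias and primer melting temperature are not considered.

```python
# Number of codons per amino acid in the standard genetic code.
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2,
    "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4,
    "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def degeneracy(peptide):
    """Number of distinct DNA sequences that can encode the peptide."""
    total = 1
    for aa in peptide.upper():
        total *= CODONS_PER_AA[aa]
    return total


def least_degenerate_window(motif, size):
    """Return (start, window, degeneracy) for the least degenerate window.

    `start` is a 0-based index into `motif`; ties keep the leftmost window.
    """
    if size > len(motif):
        raise ValueError("window is longer than the motif")
    best = None
    for start in range(len(motif) - size + 1):
        window = motif[start:start + size]
        value = degeneracy(window)
        if best is None or value < best[2]:
            best = (start, window, value)
    return best


if __name__ == "__main__":
    motif = "LSRMWKDE"  # hypothetical conserved segment
    print(least_degenerate_window(motif, 3))
```

Windows rich in Met and Trp are favoured because each has a single codon, whereas Leu, Ser and Arg (six codons each) inflate degeneracy quickly.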
The in silico characterization of B. anthracis proteases revealed sequence similarity based on pH range. The multiple sequence alignment and motif-finding results can be used to design degenerate primers or probes for specific sequences, so that the putative genes can be cloned by PCR amplification for further analysis. Conserved amino acid positions could be used as target sites to deactivate enzyme function. Among all the groups, only acid proteases were found to interact with InhA, which indicates that the metalloproteases of the acid protease group have the capability to promote pathogenesis during B. anthracis infection. Deactivation of the conserved amino acid positions of the germination protease could stop the sporulation and germination of B. anthracis cells. A detailed interaction study of the neutral and alkaline proteases could also help to build the interaction network for a better understanding of anthrax disease. Further study on structure prediction and on protein-protein or protein-ligand interactions of B. anthracis proteases could reveal new drugs to inactivate the disease-causing proteins.