3DBIONOTES-WS can be accessed in two ways: by submitting a structure file or by querying the application with a PDB code, EMDB id or UniProt accession. Each way has its own web form, and both forms are accessible from the web server main menu (Figure 1). The 'SUBMIT' option (Figure 1A) links to the structure model submission form, and the 'QUERY' option (Figure 1B) is used to query the application using an EMDB, PDB or UniProt entry.
EMDB, PDB OR UNIPROT ENTRY
To query 3DBIONOTES-WS using a database ID, type the PDB code, EMDB id or UniProt accession in the text field (Figure 2A) and then click on the 'Submit' button.
MULTIPLE UNIPROT ACCESSIONS
Information for multiple UniProt accessions can be requested in order to explore protein-protein interaction networks. To that end, first select the organism of interest using the select menu (Figure 3A); then type the UniProt accessions in the text panel (Figure 3B). If the checkbox (Figure 3C) is checked, the application will also return the information for the interaction partners of the submitted proteins for which structural information is available.
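The three identifier types accepted by the query form can usually be told apart by their shape. The sketch below is only an illustration based on the public identifier conventions of each database (four-character PDB codes starting with a digit, EMDB ids written as `EMD-` plus digits, and the UniProt accession pattern); how 3DBIONOTES-WS itself validates the text field is not described here, and the function name `classify_identifier` is hypothetical.

```python
import re

# Patterns follow the public identifier conventions of each database.
# They are assumptions for illustration, not the server's own validation.
PDB_RE = re.compile(r"^[0-9][A-Za-z0-9]{3}$")
EMDB_RE = re.compile(r"^EMD-[0-9]{4,5}$", re.IGNORECASE)
UNIPROT_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def classify_identifier(text: str) -> str:
    """Return 'EMDB', 'PDB', 'UniProt' or 'unknown' for a typed identifier."""
    value = text.strip()
    if EMDB_RE.match(value):
        return "EMDB"
    if PDB_RE.match(value):
        return "PDB"
    if UNIPROT_RE.match(value.upper()):
        return "UniProt"
    return "unknown"


if __name__ == "__main__":
    for example in ("1IYJ", "EMD-2984", "P01116", "not an id"):
        print(example, "->", classify_identifier(example))
```

A check of this kind can be useful before pasting a long list of accessions into the multiple-accession form, since a mistyped entry is easier to spot locally than in the returned results.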
UPLOADING YOUR STRUCTURE
A structural model file can be uploaded through the web form opened by clicking on the 'SUBMIT' option, using the file input field to upload the coordinates (Figure 4A). The accepted file formats are
format. When the file is submitted, the system performs a BLAST search in order to identify the possible UniProt accessions of the submitted proteins. This step can take a few minutes, since UniProt - TrEMBL contains more than 40,000,000 protein chains. The 'JOB TITLE' field (Figure 4B) is an optional text field that can be used to identify the macromolecular structure or project; when no data is provided, the name of the structure file becomes the title of the project.
When the structure file is submitted, the web server performs a BLAST search against the SwissProt and TrEMBL databases to find the potential proteins corresponding to the different chains contained in the structure. When this process is finished, a new form (Figure 5) is returned to the user with the best hits, so that the proper protein can be selected for each chain. This new form contains a table for each chain, and the user is asked to choose a protein (table row) in each table. The tables contain the following fields: (A) check button for the protein selection, (B) gene symbol, (C) protein name, (D) organism name, (F) UniProt accession, (G) sequence identity, (H) BLAST e-value, (I) sequence alignment start position and (L) sequence alignment end position. There is no need to select a protein for every chain; however, chains that were not identified by the user will not be annotated. Once the desired proteins have been selected for the different chains, click the 'Submit' button; this information is then sent back to the server and a sequence alignment between the chains and the UniProt sequences is calculated.
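The hit tables described above can be thought of as lists of records, one per candidate protein, grouped by chain. The following sketch shows one plausible way to pre-select a default row for each chain (lowest e-value, ties broken by highest identity). The `BlastHit` type, the function name and the example values are all hypothetical; the web form leaves the final choice to the user and does not necessarily rank hits this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlastHit:
    """One table row; field names are illustrative, not the server's."""
    chain: str
    accession: str
    identity: float  # sequence identity, in percent
    evalue: float    # BLAST e-value
    start: int       # alignment start position
    end: int         # alignment end position


def best_hit_per_chain(hits):
    """Return {chain: hit}, preferring low e-value, then high identity."""
    best = {}
    for hit in hits:
        current = best.get(hit.chain)
        if current is None or (hit.evalue, -hit.identity) < (
            current.evalue,
            -current.identity,
        ):
            best[hit.chain] = hit
    return best


if __name__ == "__main__":
    # Made-up hits for a two-chain structure.
    hits = [
        BlastHit("A", "P01116", 100.0, 1e-120, 1, 169),
        BlastHit("A", "P01112", 85.0, 1e-100, 1, 166),
        BlastHit("B", "P04049", 99.0, 1e-60, 55, 131),
    ]
    for chain, hit in sorted(best_hit_per_chain(hits).items()):
        print(chain, hit.accession, hit.identity, hit.evalue)
```

Chains without any entry in the returned dictionary correspond to chains left unselected in the form, which are not annotated.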
GRAPHICAL USER INTERFACE
The graphical interface comprises 3 different panels (Figure 6). The 3D panel (Figure 6A) is based on
protein viewer and it displays the macromolecular structure submitted by the user. The viewer's controls are explained in the
section. The annotation panel (Figure 6B) is built on the
plug-in and shows the biomedical and biochemical data collected from the different sources of information. Finally, the sequence panel (Figure 6C) represents the alignment between the selected UniProt sequence and the active chain of the structural model.
All these panels are interconnected and selecting an annotation or a sequence region will highlight the involved amino acids in the other panels (Figure 7). See each of the components for a more detailed explanation.
The annotation panel comprises a set of different lanes where the biochemical and biomedical annotations are represented using different shapes and colors (Figure 8). Currently the web server collects information from multiple sources: post-translational modifications (
), genomic variations and diseases (
), short linear motifs (
), disordered regions (
) and domain families (
An annotation can be selected by clicking on it; the annotation panel will then display a description of the selected annotation and a link to the original source or publication (Figure 9).
Also, when an annotation is selected, the application will display the region in the sequence panel and highlight the amino acids in the 3D viewer (Figure 10).
The annotation panel provides different options: download the displayed information, reset the annotation panel and zoom to the selected region (Figure 11A, left to right).
The variant viewer lane contains the genomic variants associated with the protein (Figure 12). The variants can be filtered depending on their consequences (disease, non-disease, predicted deleterious, ...) using the legend menu (Figure 12B). This viewer provides two different options: filter by disease and reset the variant panel (Figure 12A, left to right).
Clicking the disease filter (Figure 13A) will display a list of diseases associated with the variants (Figure 14B); clicking on one of the disease names will make the variant panel display only those variants that are associated with the disease.
The sequence panel shows the alignments between the chains in the structural model and their corresponding UniProt accessions that were selected in the identification step (Figure 15).
This frame allows the user to select a segment of the sequence in order to highlight its corresponding residues in the structural viewer and the corresponding region in the annotation panel (Figure 16).
The 3D viewer displays the macromolecular structural model submitted to, or queried from, the server (Figure 17). The menu on the top of the panel (Figure 17A) can be used to select the subunit (protein) of the structure for which annotations are visualized. It also allows the user to change the representation of a selected annotation and to navigate through the different models if the structure contains more than one.
The tool bar in the 3D viewer comprises two components (Figure 18). The select menu on the top is used to navigate through the different subunits or proteins contained in the structural model. Changing this component will also change the annotation and sequence panels to show the information related to the selected protein. From left to right, the icons on the bottom of the tool bar are: the clockwise arrow, to reset all the panels and clear the selections; the light icon, to change the representation of the selected residues; the eye icon, to display/hide the heteroatoms; the arrow icons, to navigate through the different models; the scope icon, to mark and label a selected annotation (see next); the menu icon, to display the labeled annotations (see next); and finally, the camera icon, to save an image of the 3D viewer.
The scope icon (Figure 19) is used to label a selected annotation on the 3D view of the structure. The label will show a short description of the annotation and its position, and its color will be the same as in the annotation panel. The labels are not deleted when a new selection is made or when any of the panels is reset. The user can select other subunits to annotate the 3D structure and all selected labels will be kept. Labels are removed, hidden or displayed using the menu icon (see next).
The menu icon (Figure 20) is used to list all the currently selected labels in the 3D viewer. When it is clicked, a list of the current labels is displayed. Each label in the list is preceded by two checkboxes and a close icon. The first checkbox is used to display/hide the text of the label, the second checkbox to display/hide the label representation on the structure, and the close icon to completely remove the label.
The gene panel (Figure 21) displays the information collected from the ENSEMBL database. The panel header (Figure 21A) contains the title, assembly and coordinates of the ENSEMBL gene associated with the selected protein. The select menu (Figure 21B) is used to select among the different ENSEMBL transcripts associated with the gene. The checkboxes (Figure 21C) can be used to hide or display the different panel tracks (Figure 21D). From top to bottom, the tracks display: (POS STRAND) positive sequence of the DNA gene; (NEG STRAND) negative sequence of the DNA gene; (PROTEIN SEQ) sequence of the protein; (KRAS-202) exons of the currently selected transcript, in this example KRAS-202; (CODING REGION) transcript regions that are translated; (UNIPROT ALIGN) regions that can be aligned with the associated UniProt sequence.
Protein-protein interaction networks are displayed using a graph-based representation (Figure 22). In this graph, nodes represent proteins and edges represent interactions among them. The color code indicates the source of the structural information: blue if it is compiled from experimental data, yellow if it comes from homology models, or gray when structural data is not available. In addition, yellow edges are drawn as a continuous or discontinuous line depending on whether the structure was predicted from sequence homology modeling or from domain-domain interaction templates.
The “FEATURES” menu located on the top right side of the GUI (Figure 23A) can be used to map annotations on the network.
The "FILTER *" menu (Figure 24A) can be used to hide and display specific annotations.
For example, in Figure 25 only annotations associated with “Colorectal cancer” are displayed.
The analysis panel displays those diseases for which the co-occurrence of their associated variants with a specific biochemical feature is statistically significant according to Fisher’s exact test. Statistical significance is defined in terms of the Fisher’s exact test p-value and the Benjamini–Hochberg procedure, with a false discovery rate of 1% to control the number of false positives (Type I errors). This information is displayed in two different tracks (Figure 26): the first track displays the biochemical feature regions (Figure 26A), and the second track displays the list of diseases with the results of Fisher’s exact test (Figure 26B) and the genomic variants (Figure 26C), colored red if they are within the selected biochemical feature region ('Nucleotide Binding Site', in this case) or grey if they are outside.
Variants can be filtered by disease so that only variants for the selected diseases are visible. For example, in Figure 27 only variants associated with 'Colorectal cancer' are visible.
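The statistical procedure described above can be sketched in a few lines. The example below is a minimal illustration, not the server's implementation: it assumes a 2x2 table per disease (variants of the disease inside/outside the feature region versus all other variants inside/outside), a one-sided test for enrichment, and the standard Benjamini–Hochberg step-up rule at a false discovery rate of 1%. Whether the server uses a one-sided or two-sided test is not stated in this document, and the function names are hypothetical.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    a: disease variants inside the feature region
    b: disease variants outside the feature region
    c: other variants inside the feature region
    d: other variants outside the feature region
    Returns P(X >= a) under the hypergeometric null (assumed one-sided).
    """
    row1 = a + b
    col1 = a + c
    total = a + b + c + d
    upper = min(row1, col1)
    numerator = sum(
        comb(row1, x) * comb(total - row1, col1 - x)
        for x in range(a, upper + 1)
    )
    return numerator / comb(total, col1)


def benjamini_hochberg(pvalues, fdr=0.01):
    """Return a list of booleans: True where the p-value passes BH at `fdr`."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if pvalues[index] <= rank * fdr / m:
            cutoff_rank = rank  # largest rank that satisfies the BH bound
    significant = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[index] = True
    return significant


if __name__ == "__main__":
    # Made-up counts for three diseases, for illustration only.
    tables = {"disease 1": (9, 1, 10, 80), "disease 2": (2, 8, 17, 73),
              "disease 3": (1, 3, 18, 78)}
    names = list(tables)
    pvals = [fisher_exact_greater(*tables[name]) for name in names]
    for name, p, keep in zip(names, pvals, benjamini_hochberg(pvals)):
        print(name, p, "significant" if keep else "not significant")
```

Only the diseases flagged as significant after the correction would be listed in the second track of the panel.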
UPLOADING ANNOTATION FILES
A custom annotation file can be attached when requesting information from 3DBIONOTES. Annotation files must be submitted in JSON format (Figure 28). All the annotations must be contained in an array. Each annotation (Figure 28A) encodes its information as a hash. The “track_name” key indicates the name of the lane that will appear in the annotation viewer. The “visualization_type” key selects the type of data: “variants” or “continuous”; when “visualization_type” is not present, the data are displayed as segments. The “chain” key indicates that the annotations are associated with that PDB chain, whereas the “acc” key (Figure 28B) indicates that the annotations are associated with proteins with that accession. Finally, “data” contains an array describing the positions in the sequence, value, amino acid variation and wild type, type (sub-track name in the annotation viewer) and a description. All analysis tools and viewers are available for the attached annotations.
The annotation panel provides different tools to add custom annotations or to transfer them from similar proteins. These tools are available in the annotation panel 'MENU' (Figure 29).
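As an illustration of the layout described above, the sketch below builds a small annotation array in Python and writes it out as JSON. The top-level keys (“track_name”, “visualization_type”, “chain”, “acc”, “data”) come from the description; the key names used inside “data” (`begin`, `end`, `position`, `original`, `variation`, `type`, `description`) are hypothetical placeholders, because the exact names are only shown in Figure 28. The rule that each annotation carries either “chain” or “acc” is also an assumption, and the function name is hypothetical.

```python
import json

ALLOWED_TYPES = {"variants", "continuous"}  # absent key means segments


def check_annotation_file(annotations):
    """Return a list of problems found in the top-level structure."""
    if not isinstance(annotations, list):
        return ["top-level value must be an array"]
    problems = []
    for i, item in enumerate(annotations):
        if not isinstance(item, dict):
            problems.append(f"item {i}: must be a hash")
            continue
        if "track_name" not in item:
            problems.append(f"item {i}: missing 'track_name'")
        vis_type = item.get("visualization_type")
        if vis_type is not None and vis_type not in ALLOWED_TYPES:
            problems.append(f"item {i}: unknown visualization_type {vis_type!r}")
        # Assumption: every annotation targets a chain or an accession.
        if "chain" not in item and "acc" not in item:
            problems.append(f"item {i}: needs 'chain' or 'acc'")
        if not isinstance(item.get("data"), list):
            problems.append(f"item {i}: 'data' must be an array")
    return problems


# Illustrative content only; inner key names are hypothetical.
example = [
    {
        "track_name": "My regions",
        "chain": "A",
        "data": [
            {"begin": 10, "end": 25, "type": "region",
             "description": "Illustrative segment"},
        ],
    },
    {
        "track_name": "My variants",
        "visualization_type": "variants",
        "acc": "P01116",
        "data": [
            {"position": 12, "original": "G", "variation": "D",
             "type": "custom", "description": "Illustrative variant"},
        ],
    },
]

if __name__ == "__main__":
    print(check_annotation_file(example))
    print(json.dumps(example, indent=2))
```

Checking the structure before uploading does not guarantee that the server accepts the file, but it catches the most common mistakes, such as a missing track name or a misspelled visualization type.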
UPLOADING CUSTOM ANNOTATIONS FROM THE PROTEIN ANNOTATION VIEWER
These annotations will not be included in the genomic variant analysis and viewers.
Your annotations can be added in two different ways: by uploading a file (Figure 30A) or manually (Figure 30B). Uploaded files must be in JSON format (see UPLOADING ANNOTATION FILES for an example and a description of the format). Adding an annotation manually only requires filling in the form displayed in Figure 30B. The 'COLOR' field describes the annotation color using its name (e.g. red, orange, pink) or hexadecimal number (e.g. #FF0000, #C345D8). The 'INDEX' selector is used to select the desired coordinates: if the 'INDEX' value is 'SEQUENCE', the 'BEGIN' and 'END' values refer to sequence positions, whereas if the 'INDEX' value is 'STRUCTURE', they refer to the residue ids of the PDB structure.
When new annotations are added, new tracks containing them will be displayed, and the new annotations will behave like the native ones (Figure 31).
Annotations can be transferred from other proteins that share at least 80% sequence similarity with the target protein. Annotations are transferred from similar proteins through the annotation menu (Figure 20). When clicking on 'TRANSFER ANNOTATIONS FROM SIMILAR PROTEINS', the application will display a table containing a list of proteins that share at least 80% sequence similarity (Figure 32). Each row of the table displays: (A) gene symbol, (B) protein name, (C) organism name, (D) UniProt accession, (E) number of UniProt annotations and (F) sequence identity.
When a protein is selected, the application will update the annotation and sequence panels (Figure 33). The sequence panel will now contain an alignment between 3 sequences (Figure 33A): the first sequence is the original sequence (target protein), the second sequence is the structure chain and the third sequence is the selected protein to transfer annotations from. The annotation panel will display the annotations for the selected protein, and now, when an annotation is selected, its tooltip will display a 'TRANSFER' button on the right side of its title. Clicking this button will transfer the annotation to the target protein. Also, the annotation panel will include a selector (Figure 33C) to navigate through the different imported proteins.
Annotations can be transferred from multiple proteins. The selector on the top of the annotation menu (Figure 34A) shows all imported proteins and the target protein. To display the target protein, where the selected annotations have been transferred to, select the original protein in the selector, which is the first item (Figure 35A).
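The 80% threshold can be illustrated with a small calculation on pairwise alignments. This is a sketch under stated assumptions: identity is computed here as identical aligned residues divided by the alignment length (other definitions exist), gaps are written as `-`, and the function names and example sequences are hypothetical. The document does not specify how the server computes the value shown in column (F).

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0.0
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def similar_proteins(target_alignments, threshold=80.0):
    """Keep candidates whose identity to the target reaches the threshold.

    `target_alignments` maps a candidate name to a pair
    (aligned target sequence, aligned candidate sequence).
    """
    kept = {}
    for name, (target_row, candidate_row) in target_alignments.items():
        identity = percent_identity(target_row, candidate_row)
        if identity >= threshold:
            kept[name] = identity
    return kept


if __name__ == "__main__":
    # Short made-up alignments, for illustration only.
    alignments = {
        "candidate 1": ("MTEYKLVVVG", "MTEYKLVVLA"),
        "candidate 2": ("MTEYKLVVVG", "MSDYRLIVVG"),
    }
    print(similar_proteins(alignments))
```

With this definition, a candidate that differs from the target at two of ten aligned positions sits exactly on the threshold and is kept.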
The transferred annotations are displayed in a track named 'Imported annotations' and grouped by their type (Figure 35B). When a transferred annotation is selected a warning message displaying the protein source of the annotation will appear in its tooltip (Figure 35C).