SURFACE-Bind

P04843

Author

Hamed Khakzad

Published

August 10, 2024

General information

Code
import requests

def fetch_uniprot_data(uniprot_id):
    """Download one UniProtKB entry in JSON format from the UniProt REST API."""
    url = f"https://rest.uniprot.org/uniprotkb/{uniprot_id}.json"
    response = requests.get(url, timeout=30)  # TLS verification stays on; give up after 30 s
    response.raise_for_status()  # raise an error for bad status codes
    return response.json()

def display_uniprot_data(data):
    """Print accession, recommended name, organism and function of a UniProtKB entry."""
    primary_accession = data.get('primaryAccession', 'N/A')
    protein_name = (data.get('proteinDescription', {})
                    .get('recommendedName', {})
                    .get('fullName', {})
                    .get('value', 'N/A'))
    organism = data.get('organism', {}).get('scientificName', 'N/A')

    # Use the first comment of type FUNCTION, if the entry has one
    function_comment = next((comment for comment in data.get('comments', [])
                             if comment.get('commentType') == "FUNCTION"), None)
    texts = function_comment.get('texts', []) if function_comment else []
    function = texts[0]['value'] if texts else 'N/A'

    # Printing the data
    print(f"UniProt ID: {primary_accession}")
    print(f"Protein Name: {protein_name}")
    print(f"Organism: {organism}")
    print(f"Function: {function}")

# Replace this with the UniProt ID you want to fetch
uniprot_id = "P04843"
data = fetch_uniprot_data(uniprot_id)
display_uniprot_data(data)
UniProt ID: P04843
Protein Name: Dolichyl-diphosphooligosaccharide--protein glycosyltransferase subunit 1
Organism: Homo sapiens
Function: Subunit of the oligosaccharyl transferase (OST) complex that catalyzes the initial transfer of a defined glycan (Glc(3)Man(9)GlcNAc(2) in eukaryotes) from the lipid carrier dolichol-pyrophosphate to an asparagine residue within an Asn-X-Ser/Thr consensus motif in nascent polypeptide chains, the first step in protein N-glycosylation (PubMed:31831667). N-glycosylation occurs cotranslationally and the complex associates with the Sec61 complex at the channel-forming translocon complex that mediates protein translocation across the endoplasmic reticulum (ER). All subunits are required for a maximal enzyme activity (By similarity)
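The nested `.get()` calls in `display_uniprot_data` fall back to `'N/A'` whenever a field is absent. The sketch below packages the same idea into a function that returns a dictionary, so it can be checked offline against a small hand-written entry instead of a live API call. The function name `summarize_entry` and the toy entry are hypothetical; the field names follow the UniProtKB JSON layout used above, with the gene symbol assumed to sit under a `genes` list.

```python
from typing import Any


def summarize_entry(entry: dict[str, Any]) -> dict[str, str]:
    """Pull a few fields out of a UniProtKB-style JSON entry, defaulting to 'N/A'."""
    desc = entry.get("proteinDescription", {})
    name = desc.get("recommendedName", {}).get("fullName", {}).get("value", "N/A")

    # Assumed layout: gene symbols listed under "genes"; take the first one if present.
    genes = entry.get("genes") or [{}]
    gene = genes[0].get("geneName", {}).get("value", "N/A")

    # Keep the first FUNCTION comment that actually carries a text block.
    function = "N/A"
    for comment in entry.get("comments", []):
        if comment.get("commentType") == "FUNCTION" and comment.get("texts"):
            function = comment["texts"][0].get("value", "N/A")
            break

    return {
        "accession": entry.get("primaryAccession", "N/A"),
        "name": name,
        "gene": gene,
        "organism": entry.get("organism", {}).get("scientificName", "N/A"),
        "function": function,
    }


# Hypothetical, heavily trimmed entry (not a real API response).
toy_entry = {
    "primaryAccession": "P04843",
    "organism": {"scientificName": "Homo sapiens"},
    "comments": [{"commentType": "FUNCTION",
                  "texts": [{"value": "Subunit of the OST complex."}]}],
}
summary = summarize_entry(toy_entry)
print(summary["accession"], summary["gene"])  # P04843 N/A
```

Missing fields such as the protein name and gene symbol in the toy entry come back as `'N/A'` rather than raising `KeyError`.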


AlphaFold model

Surface representation - binding sites

The surface point cloud is computed only for regions with pLDDT > 0.6; each atom is sampled by 10 points on average.
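How the point cloud is built is not spelled out on this page. As a rough illustration only, the sketch below assumes the model is an AlphaFold PDB file, in which pLDDT (0 to 100) is written to the B-factor column, and reads the 0.6 cutoff as 60 on that scale. It keeps atoms above the cutoff and scatters a fixed 10 random points on a sphere around each one. The helper names, the 1.8 Å radius and the fixed count (the text says 10 on average) are all assumptions, and points buried inside neighbouring atoms are not removed, so this is not the SURFACE-Bind procedure.

```python
import numpy as np


def confident_atom_coords(pdb_lines, threshold=0.6):
    """Return an (N, 3) array of ATOM coordinates whose pLDDT / 100 exceeds threshold."""
    coords = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        # Fixed-width PDB columns (1-based): x, y, z in 31-54, B-factor in 61-66.
        x, y, z = float(line[30:38]), float(line[38:46]), float(line[46:54])
        plddt = float(line[60:66]) / 100.0  # assumption: pLDDT (0-100) stored as B-factor
        if plddt > threshold:
            coords.append((x, y, z))
    return np.asarray(coords, dtype=float).reshape(-1, 3)


def sample_points_around_atoms(centers, points_per_atom=10, radius=1.8, seed=0):
    """Draw points_per_atom uniform random points on a sphere of `radius` around each atom."""
    rng = np.random.default_rng(seed)
    # Normalised Gaussian vectors are uniformly distributed on the unit sphere.
    directions = rng.normal(size=(len(centers), points_per_atom, 3))
    directions /= np.linalg.norm(directions, axis=-1, keepdims=True)
    return (centers[:, None, :] + radius * directions).reshape(-1, 3)


def toy_atom_line(serial, x, y, z, bfactor):
    """Build a hypothetical PDB ATOM record (CA of an alanine) with the given B-factor."""
    return (f"ATOM  {serial:5d}  CA  ALA A{serial:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{bfactor:6.2f}")


lines = [toy_atom_line(1, 1.0, 2.0, 3.0, 92.5), toy_atom_line(2, 4.0, 5.0, 6.0, 41.0)]
atoms = confident_atom_coords(lines)        # only the first atom passes the 0.6 cutoff
cloud = sample_points_around_atoms(atoms)   # 1 atom x 10 points
print(atoms.shape, cloud.shape)             # (1, 3) (10, 3)
```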

To highlight the predicted binding interfaces, color the structure with the “Uncertainty/Disorder” theme:

  • Open the “Controls Panel”.

  • Under “Components”, click “…” on the right.

  • Choose “Set Coloring”, then “Atom Property”, then “Uncertainty/Disorder”.

All detected seeds aligned

Seed scores per site

Code
import re
import pandas as pd
import os
import plotly.express as px

ID = "P04843"
data_list = []

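# Field extractors for the .score files: \b stops "score:" from matching the
# tail of "desc_dist_score:", and -? also accepts negative values
name_pattern = re.compile(r'name: (\S+)')
score_pattern = re.compile(r'\bscore: (-?\d+\.\d+)')
desc_dist_score_pattern = re.compile(r'\bdesc_dist_score: (-?\d+\.\d+)')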

directory = f"/Users/hamedkhakzad/Research_EPFL/1_postdoc_project/Surfaceome_web_app/www/Surfaceome_top100_per_site/{ID}_A"

for filename in os.listdir(directory):
    if filename.startswith("output_sorted_") and filename.endswith(".score"):
        filepath = os.path.join(directory, filename)
        with open(filepath, 'r') as file:
            for line in file:
                name_match = name_pattern.search(line)
                score_match = score_pattern.search(line)
                desc_dist_score_match = desc_dist_score_pattern.search(line)
                
                if name_match and score_match and desc_dist_score_match:
                    name = name_match.group(1)
                    score = float(score_match.group(1))
                    desc_dist_score = float(desc_dist_score_match.group(1))
                    
                    simple_filename = filename.replace("output_sorted_", "").replace(".score", "")
                    data_list.append({
                        'name': name[:-1],
                        'score': score,
                        'desc_dist_score': desc_dist_score,
                        'file': simple_filename
                    })

data = pd.DataFrame(data_list)

fig = px.scatter(
    data,
    x='score',
    y='desc_dist_score',
    color='file',
    title='Score vs Desc Dist Score',
    labels={'score': 'Score', 'desc_dist_score': 'Desc Dist Score'},
    hover_data={'name': True}
)

fig.update_layout(
    legend_title_text='File',
    legend=dict(
        yanchor="top",
        y=0.99,
        xanchor="left",
        x=1.05
    )
)

fig.show()
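The regular expressions above look for the labels `score:` and `desc_dist_score:` in each line. Because `score:` is also the tail of `desc_dist_score:`, an unanchored pattern can pick up the wrong number whenever `desc_dist_score` comes first in a line. A word boundary (`\b`) avoids that, since the `_` in front of `score` counts as a word character. The line used below is made up for illustration; the real layout of the `.score` files is not shown on this page.

```python
import re

# Unanchored pattern versus patterns anchored with a word boundary.
naive_score_pattern = re.compile(r'score: (\d+\.\d+)')
score_pattern = re.compile(r'\bscore: (-?\d+\.\d+)')
desc_dist_score_pattern = re.compile(r'\bdesc_dist_score: (-?\d+\.\d+)')

# Hypothetical line in which desc_dist_score happens to come first.
line = "desc_dist_score: 0.42 score: 1.75 name: seed_001,"

print(naive_score_pattern.search(line).group(1))      # 0.42 (wrong field)
print(score_pattern.search(line).group(1))            # 1.75
print(desc_dist_score_pattern.search(line).group(1))  # 0.42
```

The optional `-?` is there only in case scores can be negative; with the original `\d+` such lines would silently be skipped.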

Binding site metrics

Code
import pandas as pd

df_total = pd.read_csv('/Users/hamedkhakzad/Research_EPFL/1_postdoc_project/Surfaceome_web_app/www/database/df_flattened.csv')
# Keep the binding sites of this entry; .copy() lets us add a column without a SettingWithCopyWarning
df_plot = df_total[df_total['acc_flat'] == ID].copy()
# Total seeds per binding site = alpha seeds + beta seeds
df_plot['Total seeds'] = df_plot[['seedss_a', 'seedss_b']].sum(axis=1)
df_plot.loc[:, ["acc_flat", "main_classs", "sub_classs", "seedss_a", "seedss_b", "areass", "bsss", "hpss"]]
      acc_flat  main_classs  sub_classs    seedss_a  seedss_b       areass  bsss      hpss
4538  P04843    Enzymes      Transferases         0        40  7590.313774   400   38.3000
4539  P04843    Enzymes      Transferases         1         1   761.720639   361   -2.0999
4540  P04843    Enzymes      Transferases        26       106  1673.592341   320  -15.2000
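The `Total seeds` column is simply the sum of alpha and beta seeds for each site. To make that concrete without the CSV file, the sketch retypes the three rows printed above into a small DataFrame, recomputes the total and orders the sites by it, with `bsss` serving as the center-residue label as in the plots below. The variable names are illustrative only.

```python
import pandas as pd

# The three binding sites of P04843 printed above, typed in by hand.
sites = pd.DataFrame({
    "bsss": [400, 361, 320],
    "seedss_a": [0, 1, 26],
    "seedss_b": [40, 1, 106],
    "areass": [7590.313774, 761.720639, 1673.592341],
    "hpss": [38.3000, -2.0999, -15.2000],
})

# Total seeds per site, then rank sites from most to fewest seeds.
sites["Total seeds"] = sites[["seedss_a", "seedss_b"]].sum(axis=1)
ranked = sites.sort_values("Total seeds", ascending=False)
print(ranked[["bsss", "Total seeds", "areass"]].to_string(index=False))
```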
Code
import matplotlib.pyplot as plt

# Per-site metrics to plot, with the panel title used for each
features = ['seedss_a', 'seedss_b', 'areass', 'hpss']
titles = ['Alpha seeds', 'Beta seeds', 'Area', 'Hydrophobicity']

# Many sites: 2 x 2 grid so each panel is wide enough; few sites: one row of 4 panels
if len(df_plot) > 8:
    num_rows, num_cols = 2, 2
else:
    num_rows, num_cols = 1, 4

fig, axes = plt.subplots(nrows=num_rows, ncols=num_cols, figsize=(9, num_rows * 5))
axes = axes.flatten()
positions = range(1, len(df_plot) + 1)  # one bar per binding site

for ax, feature, title in zip(axes, features, titles):
    # bar() cycles through the color list when there are more sites than colors
    ax.bar(positions, df_plot[feature], color=['blue', 'orange', 'green', 'red', 'purple', 'brown'])
    ax.set_title(title, fontsize=13)
    ax.set_xticks(positions)
    ax.set_xticklabels(df_plot['bsss'], rotation=90)  # label each bar by its center residue
    ax.set_xlabel("Center residues", fontsize=13)
    ax.set_ylabel(title, fontsize=13)

# Remove any axes left without a feature
for ax in axes[len(features):]:
    fig.delaxes(ax)

plt.tight_layout()
plt.show()

Binding site sequence composition

Code
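import warnings
from ast import literal_eval
from collections import Counter

import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

# Three-letter to one-letter codes for the 20 standard amino acids
amino_acid_map = {
    'ALA': 'A', 'ARG': 'R', 'ASN': 'N', 'ASP': 'D', 'CYS': 'C',
    'GLN': 'Q', 'GLU': 'E', 'GLY': 'G', 'HIS': 'H', 'ILE': 'I',
    'LEU': 'L', 'LYS': 'K', 'MET': 'M', 'PHE': 'F', 'PRO': 'P',
    'SER': 'S', 'THR': 'T', 'TRP': 'W', 'TYR': 'Y', 'VAL': 'V'
}

# Silence a matplotlib warning about identical axis limits
warnings.filterwarnings("ignore", message="Attempting to set identical low and high xlims")

def convert_to_single_letter(aa_list):
    """Convert a list (or its string representation) of three-letter residue names to one-letter codes."""
    if isinstance(aa_list, str):
        aa_list = literal_eval(aa_list)  # CSV cells store the list as a string
    # Non-standard residues are shown as 'X' instead of raising a KeyError
    return [amino_acid_map.get(aa, 'X') for aa in aa_list]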

def create_sequence_visualizations(df, max_letters_per_row=20):
    for idx, row in df.iterrows():
        bsss = row['bsss']
        AAss = row['AAss']
        single_letter_sequence = convert_to_single_letter(AAss)
        
        freq_counter = Counter(single_letter_sequence)
        total_aa = len(single_letter_sequence)
        frequencies = {aa: freq / total_aa for aa, freq in freq_counter.items()}
        
        cmap = plt.get_cmap('viridis')
        norm = plt.Normalize(0, max(frequencies.values()) if frequencies else 1)
        
        n_rows = (len(single_letter_sequence) + max_letters_per_row - 1) // max_letters_per_row
        fig = plt.figure(figsize=(max_letters_per_row * 0.6, n_rows * 1.2 + 0.5))
        
        gs = GridSpec(n_rows + 1, 1, height_ratios=[1] * n_rows + [0.1], hspace=0.3)
        
        for row_idx in range(n_rows):
            start_idx = row_idx * max_letters_per_row
            end_idx = min((row_idx + 1) * max_letters_per_row, len(single_letter_sequence))
            ax = fig.add_subplot(gs[row_idx, 0])
            ax.set_xlim(0, max_letters_per_row)
            ax.set_ylim(0, 1)
            ax.axis('off')
            
            for i, aa in enumerate(single_letter_sequence[start_idx:end_idx]):
                freq = frequencies[aa]
                color = cmap(norm(freq))
                ax.text(i + 0.5, 0.5, aa, ha='center', va='center', fontsize=24, color=color, fontweight='bold')
        
        cbar_ax = fig.add_subplot(gs[-1, 0])
        sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)
        sm.set_array([])
        cbar = plt.colorbar(sm, cax=cbar_ax, orientation='horizontal')
        cbar.set_label('Frequency', fontsize=12)
        cbar.ax.tick_params(labelsize=12)
        
        plt.suptitle(f"Center residue {bsss}", fontsize=14)
        plt.subplots_adjust(left=0.1, right=0.9, top=0.9, bottom=0.1)
        plt.show()
            
create_sequence_visualizations(df_plot)
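The color of each letter in these plots is that residue's relative frequency within the site: its count divided by the length of the site's residue list. A stripped-down, plot-free version of that calculation is sketched below; `residue_frequencies` is a hypothetical helper, and mapping non-standard residue names to `X` is an added assumption.

```python
from collections import Counter

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def residue_frequencies(three_letter_residues):
    """Map residue names to one-letter codes and return {code: count / total}."""
    one_letter = [THREE_TO_ONE.get(res.upper(), "X") for res in three_letter_residues]
    counts = Counter(one_letter)
    total = len(one_letter)
    # most_common() orders the result from most to least frequent residue.
    return {aa: n / total for aa, n in counts.most_common()}


print(residue_frequencies(["LEU", "LEU", "ALA", "SER"]))  # {'L': 0.5, 'A': 0.25, 'S': 0.25}
```

An empty residue list simply yields an empty dictionary, because there is nothing to divide.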

Download

To download all seeds and score files for this entry, click here.
