P13688

Author

Hamed Khakzad

Published

August 10, 2024

General information

Code

import requests
import urllib3
urllib3.disable_warnings()

def fetch_uniprot_data(uniprot_id):
    url = f"https://rest.uniprot.org/uniprotkb/{uniprot_id}.json"
    response = requests.get(url, verify=False, timeout=30)  # SSL verification disabled; give up after 30 s
    response.raise_for_status()  # Raise an error for bad status codes
    return response.json()

def display_uniprot_data(data):
    primary_accession = data.get('primaryAccession', 'N/A')
    protein_name = data.get('proteinDescription', {}).get('recommendedName', {}).get('fullName', {}).get('value', 'N/A')
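    # UniProt's JSON lists genes under the 'genes' key; guard against entries without a gene name
    genes = data.get('genes') or [{}]
    gene_name = genes[0].get('geneName', {}).get('value', 'N/A')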
    organism = data.get('organism', {}).get('scientificName', 'N/A')
    
    function_comment = next((comment for comment in data.get('comments', []) if comment['commentType'] == "FUNCTION"), None)
    function = function_comment['texts'][0]['value'] if function_comment else 'N/A'

    # Printing the data
    print(f"UniProt ID: {primary_accession}")
    print(f"Protein Name: {protein_name}")
    print(f"Organism: {organism}")
    print(f"Function: {function}")

# Replace this with the UniProt ID you want to fetch
uniprot_id = "P13688"
data = fetch_uniprot_data(uniprot_id)
display_uniprot_data(data)

UniProt ID: P13688
Protein Name: Carcinoembryonic antigen-related cell adhesion molecule 1
Organism: Homo sapiens
Function: Cell adhesion protein that mediates homophilic cell adhesion in a calcium-independent manner (By similarity). Plays a role as coinhibitory receptor in immune response, insulin action and functions also as an activator during angiogenesis (PubMed:18424730, PubMed:23696226, PubMed:25363763). Its coinhibitory receptor function is phosphorylation- and PTPN6 -dependent, which in turn, suppress signal transduction of associated receptors by dephosphorylation of their downstream effectors. Plays a role in immune response, of T cells, natural killer (NK) and neutrophils (PubMed:18424730, PubMed:23696226). Upon TCR/CD3 complex stimulation, inhibits TCR-mediated cytotoxicity by blocking granule exocytosis by mediating homophilic binding to adjacent cells, allowing interaction with and phosphorylation by LCK and interaction with the TCR/CD3 complex which recruits PTPN6 resulting in dephosphorylation of CD247 and ZAP70 (PubMed:18424730). Also inhibits T cell proliferation and cytokine production through inhibition of JNK cascade and plays a crucial role in regulating autoimmunity and anti-tumor immunity by inhibiting T cell through its interaction with HAVCR2 (PubMed:25363763). Upon natural killer (NK) cells activation, inhibit KLRK1-mediated cytolysis of CEACAM1-bearing tumor cells by trans-homophilic interactions with CEACAM1 on the target cell and lead to cis-interaction between CEACAM1 and KLRK1, allowing PTPN6 recruitment and then VAV1 dephosphorylation (PubMed:23696226). Upon neutrophils activation negatively regulates IL1B production by recruiting PTPN6 to a SYK-TLR4-CEACAM1 complex, that dephosphorylates SYK, reducing the production of reactive oxygen species (ROS) and lysosome disruption, which in turn, reduces the activity of the inflammasome. Down-regulates neutrophil production by acting as a coinhibitory receptor for CSF3R by down-regulating the CSF3R-STAT3 pathway through recruitment of PTPN6 that dephosphorylates CSF3R (By similarity). 
Also regulates insulin action by promoting INS clearance and regulating lipogenesis in liver through regulating insulin signaling (By similarity). Upon INS stimulation, undergoes phosphorylation by INSR leading to INS clearance by increasing receptor-mediated insulin endocytosis. This inernalization promotes interaction with FASN leading to receptor-mediated insulin degradation and to reduction of FASN activity leading to negative regulation of fatty acid synthesis. INSR-mediated phosphorylation also provokes a down-regulation of cell proliferation through SHC1 interaction resulting in decrease coupling of SHC1 to the MAPK3/ERK1-MAPK1/ERK2 and phosphatidylinositol 3-kinase pathways (By similarity). Functions as activator in angiogenesis by promoting blood vessel remodeling through endothelial cell differentiation and migration and in arteriogenesis by increasing the number of collateral arteries and collateral vessel calibers after ischemia. Also regulates vascular permeability through the VEGFR2 signaling pathway resulting in control of nitric oxide production (By similarity). Down-regulates cell growth in response to EGF through its interaction with SHC1 that mediates interaction with EGFR resulting in decrease coupling of SHC1 to the MAPK3/ERK1-MAPK1/ERK2 pathway (By similarity). Negatively regulates platelet aggregation by decreasing platelet adhesion on type I collagen through the GPVI-FcRgamma complex (By similarity). Inhibits cell migration and cell scattering through interaction with FLNA; interferes with the interaction of FLNA with RALA (PubMed:16291724). Mediates bile acid transport activity in a phosphorylation dependent manner (By similarity). Negatively regulates osteoclastogenesis (By similarity)
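
The nested `.get()` lookups in `display_uniprot_data` can be tried offline, without calling the UniProt API. The sketch below applies the same extraction to a small hand-written dictionary. Both `extract_summary` and `sample_entry` are hypothetical names introduced for illustration, and `sample_entry` only mimics the shape of the fields read above; it is not a real UniProt response.

```python
def extract_summary(entry):
    """Collect the fields printed above from a UniProt-style entry dictionary."""
    protein_name = (
        entry.get('proteinDescription', {})
        .get('recommendedName', {})
        .get('fullName', {})
        .get('value', 'N/A')
    )
    # First comment of type FUNCTION, or None if the entry has none
    function_comment = next(
        (c for c in entry.get('comments', []) if c.get('commentType') == 'FUNCTION'),
        None,
    )
    texts = function_comment.get('texts', []) if function_comment else []
    return {
        'accession': entry.get('primaryAccession', 'N/A'),
        'protein_name': protein_name,
        'organism': entry.get('organism', {}).get('scientificName', 'N/A'),
        'function': texts[0].get('value', 'N/A') if texts else 'N/A',
    }


# Hypothetical, heavily trimmed entry (assumption: only the keys read above matter)
sample_entry = {
    'primaryAccession': 'P13688',
    'organism': {'scientificName': 'Homo sapiens'},
    'comments': [
        {'commentType': 'FUNCTION', 'texts': [{'value': 'Example function text'}]},
    ],
}

summary = extract_summary(sample_entry)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Because `sample_entry` has no `proteinDescription` key, the protein name falls back to `'N/A'`, which is the behavior the chained defaults are meant to provide.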

More information:

AlphaFold model

Surface representation - binding sites

The point cloud is computed for residues with pLDDT > 0.6. Each atom is sampled by 10 points on average.

To see the predicted binding interfaces, switch the color theme to “uncertainty”:

Go to the “Controls Panel”.
Under “Components”, click “…” on the right.
Choose “Set Coloring”, then “Atom Property”, then “Uncertainty/Disorder”.

All detected seeds aligned

Seed scores per sites

Code

import re
import pandas as pd
import os
import plotly.express as px

ID = "P13688"
data_list = []

name_pattern = re.compile(r'name: (\S+)')
score_pattern = re.compile(r'score: (\d+\.\d+)')
desc_dist_score_pattern = re.compile(r'desc_dist_score: (\d+\.\d+)')

directory = f"/Users/hamedkhakzad/Research_EPFL/1_postdoc_project/Surfaceome_web_app/www/Surfaceome_top100_per_site/{ID}_A"

for filename in os.listdir(directory):
    if filename.startswith("output_sorted_") and filename.endswith(".score"):
        filepath = os.path.join(directory, filename)
        with open(filepath, 'r') as file:
            for line in file:
                name_match = name_pattern.search(line)
                score_match = score_pattern.search(line)
                desc_dist_score_match = desc_dist_score_pattern.search(line)
                
                if name_match and score_match and desc_dist_score_match:
                    name = name_match.group(1)
                    score = float(score_match.group(1))
                    desc_dist_score = float(desc_dist_score_match.group(1))
                    
                    simple_filename = filename.replace("output_sorted_", "").replace(".score", "")
                    data_list.append({
                        'name': name[:-1],
                        'score': score,
                        'desc_dist_score': desc_dist_score,
                        'file': simple_filename
                    })

data = pd.DataFrame(data_list)

fig = px.scatter(
    data,
    x='score',
    y='desc_dist_score',
    color='file',
    title='Score vs Desc Dist Score',
    labels={'score': 'Score', 'desc_dist_score': 'Desc Dist Score'},
    hover_data={'name': True}
)

fig.update_layout(
    legend_title_text='File',
    legend=dict(
        yanchor="top",
        y=0.99,
        xanchor="left",
        x=1.05
    )
)

fig.show()
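
The three regular expressions above can be checked on a single line before running them over a whole directory. The sketch below uses the same patterns; the helper `parse_score_line` and the content of `sample_line` are assumptions made for illustration, since the exact layout of the `.score` files is not shown here. It also assumes that `score:` appears before `desc_dist_score:` on each line. Otherwise `score_pattern` would pick up the value that follows `desc_dist_score:`, because that field name also ends in `score:`.

```python
import re

name_pattern = re.compile(r'name: (\S+)')
score_pattern = re.compile(r'score: (\d+\.\d+)')
desc_dist_score_pattern = re.compile(r'desc_dist_score: (\d+\.\d+)')


def parse_score_line(line):
    """Return the parsed fields of one line, or None if any field is missing."""
    name_match = name_pattern.search(line)
    score_match = score_pattern.search(line)
    desc_match = desc_dist_score_pattern.search(line)
    if not (name_match and score_match and desc_match):
        return None
    return {
        # \S+ also captures the comma after the name, so drop the last character
        'name': name_match.group(1)[:-1],
        'score': float(score_match.group(1)),
        'desc_dist_score': float(desc_match.group(1)),
    }


# Hypothetical line layout (assumption, not taken from a real .score file)
sample_line = "name: 1ABC_A_helix, score: 12.3456, desc_dist_score: 1.2345"
parsed = parse_score_line(sample_line)
print(parsed)
```

Note that `\d+\.\d+` matches only unsigned decimals, so a line with a negative or integer score would be skipped by this check.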

Binding site metrics

Code

import pandas as pd
import plotly.express as px

df_total = pd.read_csv('/Users/hamedkhakzad/Research_EPFL/1_postdoc_project/Surfaceome_web_app/www/database/df_flattened.csv')
# Copy the filtered rows so that adding a column does not write into a view of df_total
df_plot = df_total[df_total['acc_flat'] == ID].copy()
df_plot['Total seeds'] = df_plot[['seedss_a', 'seedss_b']].sum(axis=1)
df_plot.loc[:, ["acc_flat", "main_classs", "sub_classs", "seedss_a", "seedss_b", "areass", "bsss", "hpss"]]

	acc_flat	main_classs	sub_classs	seedss_a	seedss_b	areass	bsss	hpss
758	P13688	Miscellaneous	StructuralAndAdhesion	619	1261	1371.096142	317	2.79999
759	P13688	Miscellaneous	StructuralAndAdhesion	1533	4588	3036.526354	43	0.29999
760	P13688	Miscellaneous	StructuralAndAdhesion	552	964	787.318413	64	-0.20000
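
The `Total seeds` column computed above is not part of the displayed table. As a small worked example, the sketch below rebuilds the three rows by hand (values copied from the table, with `sites` as a hypothetical variable name) and applies the same row-wise sum of `seedss_a` and `seedss_b`.

```python
import pandas as pd

# Seed counts per binding site, copied from the table above
sites = pd.DataFrame({
    'bsss': [317, 43, 64],
    'seedss_a': [619, 1533, 552],
    'seedss_b': [1261, 4588, 964],
})

# Row-wise sum of the two seed columns, as done for df_plot
sites['Total seeds'] = sites[['seedss_a', 'seedss_b']].sum(axis=1)
print(sites)
```

The site centered on residue 43 has the most seeds in total (1533 + 4588 = 6121), followed by residue 317 (1880) and residue 64 (1516).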

Code

import math
import matplotlib.pyplot as plt

features = ['seedss_a', 'seedss_b', 'areass', 'hpss']
titles = ['Alpha seeds', 'Beta seeds', 'Area', 'Hydrophobicity']
num_features = len(features)

if len(df_plot) > 8:
    num_rows = 2
    num_cols = 2
else:
    num_rows = 1
    num_cols = 4

fig, axes = plt.subplots(nrows=num_rows, ncols=num_cols, figsize=(9, num_rows * 5))

axes = axes.flatten()
positions = range(1, len(df_plot) + 1)

for i, feature in enumerate(features):
    title = titles[i]
    axes[i].bar(positions, df_plot[feature], color=['blue', 'orange', 'green', 'red', 'purple', 'brown'])
    axes[i].set_title(title, fontsize=13)
    axes[i].set_xticks(positions)
    axes[i].set_xticklabels(df_plot['bsss'], rotation=90)
    axes[i].set_xlabel("Center residues", fontsize=13)
    axes[i].set_ylabel(title, fontsize=13)

for j in range(len(features), len(axes)):
    fig.delaxes(axes[j])

plt.tight_layout()
plt.show()

Binding site sequence composition

Code

amino_acid_map = {
    'ALA': 'A', 'ARG': 'R', 'ASN': 'N', 'ASP': 'D', 'CYS': 'C',
    'GLN': 'Q', 'GLU': 'E', 'GLY': 'G', 'HIS': 'H', 'ILE': 'I',
    'LEU': 'L', 'LYS': 'K', 'MET': 'M', 'PHE': 'F', 'PRO': 'P',
    'SER': 'S', 'THR': 'T', 'TRP': 'W', 'TYR': 'Y', 'VAL': 'V'
}

from collections import Counter
from ast import literal_eval
from matplotlib.gridspec import GridSpec
import warnings
warnings.filterwarnings("ignore", message="Attempting to set identical low and high xlims")

def convert_to_single_letter(aa_list):
    if isinstance(aa_list, str):
        aa_list = literal_eval(aa_list)
    return [amino_acid_map[aa] for aa in aa_list]

def create_sequence_visualizations(df, max_letters_per_row=20):
    for idx, row in df.iterrows():
        bsss = row['bsss']
        AAss = row['AAss']
        single_letter_sequence = convert_to_single_letter(AAss)
        
        freq_counter = Counter(single_letter_sequence)
        total_aa = len(single_letter_sequence)
        frequencies = {aa: freq / total_aa for aa, freq in freq_counter.items()}
        
        cmap = plt.get_cmap('viridis')
        norm = plt.Normalize(0, max(frequencies.values()) if frequencies else 1)
        
        n_rows = (len(single_letter_sequence) + max_letters_per_row - 1) // max_letters_per_row
        fig = plt.figure(figsize=(max_letters_per_row * 0.6, n_rows * 1.2 + 0.5))
        
        gs = GridSpec(n_rows + 1, 1, height_ratios=[1] * n_rows + [0.1], hspace=0.3)
        
        for row_idx in range(n_rows):
            start_idx = row_idx * max_letters_per_row
            end_idx = min((row_idx + 1) * max_letters_per_row, len(single_letter_sequence))
            ax = fig.add_subplot(gs[row_idx, 0])
            ax.set_xlim(0, max_letters_per_row)
            ax.set_ylim(0, 1)
            ax.axis('off')
            
            for i, aa in enumerate(single_letter_sequence[start_idx:end_idx]):
                freq = frequencies[aa]
                color = cmap(norm(freq))
                ax.text(i + 0.5, 0.5, aa, ha='center', va='center', fontsize=24, color=color, fontweight='bold')
        
        cbar_ax = fig.add_subplot(gs[-1, 0])
        sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)
        sm.set_array([])
        cbar = plt.colorbar(sm, cax=cbar_ax, orientation='horizontal')
        cbar.set_label('Frequency', fontsize=12)
        cbar.ax.tick_params(labelsize=12)
        
        plt.suptitle(f"Center residue {bsss}", fontsize=14)
        plt.subplots_adjust(left=0.1, right=0.9, top=0.9, bottom=0.1)
        plt.show()
            
create_sequence_visualizations(df_plot)
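
The letter colors in these plots come from per-site residue frequencies. The sketch below reproduces only that calculation, without any plotting, on a made-up input: `raw_residues` is a hypothetical example of an `AAss` value stored as a string, and `three_to_one` is a subset of `amino_acid_map`.

```python
from ast import literal_eval
from collections import Counter

# Subset of amino_acid_map, enough for this example
three_to_one = {'ALA': 'A', 'GLY': 'G', 'SER': 'S'}

# Hypothetical AAss value as it might be read from the CSV (a stringified list)
raw_residues = "['ALA', 'GLY', 'ALA', 'SER']"

# Parse the string into a list and convert to one-letter codes
residues = [three_to_one[aa] for aa in literal_eval(raw_residues)]

# Fraction of the site made up by each residue type
counts = Counter(residues)
frequencies = {aa: n / len(residues) for aa, n in counts.items()}

# Number of text rows needed, using ceiling division as in the plotting code
max_letters_per_row = 20
n_rows = (len(residues) + max_letters_per_row - 1) // max_letters_per_row

print(residues)     # ['A', 'G', 'A', 'S']
print(frequencies)  # {'A': 0.5, 'G': 0.25, 'S': 0.25}
print(n_rows)       # 1
```

The color scale is normalized to the most frequent residue, so in this example every "A" would be drawn with the color at the top of the colormap.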

Download

To download all seed and score files for this entry, Click Here!