About this site
Release 2020.04.02 of the AMED Cardiotoxicity Database contains the following updates to the data and prediction models. Detailed descriptions of each update and function will be made available soon.
Database updates
1. Inclusion of activity data for other cardiotoxicity-related ion channels (Nav1.5, Kv1.5, and Cav1.2) in addition to hERG.
2. Updates following the public databases:
• ChEMBL (→ v24)
• NCGC (→ v2.1)
3. Additional assay results measured in the AMED "Development of a Drug Discovery Informatics System" project, both by RIKEN and through outsourcing to Eurofins (hERG) and Icagen (Nav1.5, Kv1.5, and Cav1.2).
Prediction model and function updates
1. Updated hERG classification model
• New model reflecting the database update
• Assessment of the applicability domain based on molecular similarity to the training data
• Output values are now scaled as probabilities from 0 to 1
• SVM implementation changed from SVMlight to scikit-learn (sklearn.svm.SVC)
2. New prediction models
• hERG regression model (implemented with scikit-learn, sklearn.svm.SVR)
• Nav1.5 classification/regression models (implemented with tensorflow.keras), pre-trained on data for other sodium channels
• Kv1.5 classification model (implemented with tensorflow.keras), pre-trained on data for other potassium channels
• Cav1.2 classification model (implemented with tensorflow.keras), pre-trained on data for other calcium channels
3. Downloads of search/prediction results
• Search and prediction results can be downloaded by clicking the "download" button on the results page.
• Because of the computational cost, SDF input for prediction is limited to 100 compounds per run.
• Direct download of the whole database has been removed because of the large data size and an ongoing collaboration with a software company. Once the license and usage conditions are settled, we intend to prepare an application form for data download.
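The site states only that the applicability domain is assessed "based on molecular similarity to the training data". One common way to do this, sketched below as an assumption rather than the site's actual method, is to take a query compound's highest Tanimoto similarity to any training compound and flag it as out of domain when that value falls below a cutoff. Fingerprints are represented here as sets of "on" bit indices, and the 0.3 cutoff is hypothetical.

```python
# Minimal sketch of a similarity-based applicability-domain (AD) check.
# Assumptions (not from the site): fingerprints are sets of "on" bit indices,
# similarity is Tanimoto, and the 0.3 in-domain cutoff is hypothetical.


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two bit sets."""
    if not a and not b:
        return 0.0  # avoid 0/0 for two empty fingerprints
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def nearest_similarity(query: set, training: list) -> float:
    """Highest similarity between the query and any training fingerprint."""
    return max((tanimoto(query, fp) for fp in training), default=0.0)


def in_domain(query: set, training: list, threshold: float = 0.3) -> bool:
    """True if the nearest training compound is at least `threshold` similar."""
    return nearest_similarity(query, training) >= threshold


training_fps = [{1, 2, 3, 4}, {10, 11, 12}]
print(nearest_similarity({1, 2, 3, 5}, training_fps))  # 0.6
print(in_domain({20, 21}, training_fps))  # False
```

A nearest-neighbour rule like this is easy to explain alongside a prediction: a low maximum similarity simply means no training compound resembles the query, so the model's output for it deserves less trust.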
The AMED Cardiotoxicity Database is a database of small molecules that bind to various ion channels and may pose a cardiotoxic risk.
The AMED Cardiotoxicity Database compiles cardiotoxicity-related information from publicly available databases and integrates it in a standardized format. As an initial target, bioactivities for the hERG potassium channel were collected from ChEMBL, the NIH Chemical Genomics Center, and hERGCentral, because inhibition of the hERG potassium channel is closely associated with QT interval prolongation, and assessing this risk early could greatly help avoid delays in the development of therapeutic compounds or the withdrawal of marketed drugs.
1. ChEMBL
ChEMBL is a bioactivity database maintained by the European Bioinformatics Institute and is widely used in cheminformatics research as a de facto standard. Using the hERG target ID (CHEMBL240), 2,153 hERG-related bioassays registered in ChEMBL version 22 were identified, and 10,976 activity entries for these assays were extracted. To ensure data validity, entries with low confidence scores, undesirable data validity comments, or flags as potential duplicates were excluded. hERG-related assays that did not measure inhibitory activity were removed manually by checking the assay descriptions.
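The automatic part of the ChEMBL curation can be sketched as a filter over activity records. The field names below follow ChEMBL's schema (`confidence_score` on assays, `data_validity_comment` and `potential_duplicate` on activities), but the minimum confidence score of 8 and the choice to accept only empty or "Manually validated" comments are assumptions made for illustration. The manual removal of non-inhibition assays is not modelled.

```python
# Hypothetical sketch of the ChEMBL quality filters described above.
# Field names mirror ChEMBL's activity/assay columns; the cutoff and the set
# of accepted validity comments are assumptions, not the database's rules.

HERG_TARGET = "CHEMBL240"
MIN_CONFIDENCE = 8  # assumed cutoff on ChEMBL's 0-9 assay confidence score
ACCEPTED_COMMENTS = {None, "Manually validated"}  # assumed acceptable comments


def keep_activity(record: dict) -> bool:
    """Return True if an activity entry passes the automatic filters."""
    if record.get("target_chembl_id") != HERG_TARGET:
        return False
    if record.get("confidence_score", 0) < MIN_CONFIDENCE:
        return False  # low-confidence target assignment
    if record.get("data_validity_comment") not in ACCEPTED_COMMENTS:
        return False  # e.g. "Outside typical range"
    if record.get("potential_duplicate"):
        return False  # flagged as a likely duplicate of another entry
    return True


activities = [
    {"target_chembl_id": "CHEMBL240", "confidence_score": 9,
     "data_validity_comment": None, "potential_duplicate": False},
    {"target_chembl_id": "CHEMBL240", "confidence_score": 9,
     "data_validity_comment": "Outside typical range", "potential_duplicate": False},
    {"target_chembl_id": "CHEMBL240", "confidence_score": 4,
     "data_validity_comment": None, "potential_duplicate": False},
    {"target_chembl_id": "CHEMBL240", "confidence_score": 9,
     "data_validity_comment": None, "potential_duplicate": True},
]
kept = [a for a in activities if keep_activity(a)]
print(len(kept))  # 1
```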
2. NIH Chemical Genomics Center (NCGC)
Data from NCGC's quantitative high-throughput screening (qHTS) of in vitro hERG channel blockade were obtained from PubChem BioAssay (AID 588834). The hERG data for about 2,688 compounds in the LOPAC1280 library (Sigma) were measured with the FluxOR™ thallium flux assay. The dataset contains EC50 values for both hERG inhibitors and activators, along with some undefined data, because the EC50 values were calculated by automated sigmoidal fitting of the Hill equation to the dose-response curves, without distinguishing positive from negative Hill coefficients (inhibitors vs. activators) or taking fitting quality into account (inconclusive entries). Some results in this dataset are also included in ChEMBL. However, ChEMBL omits the outcome comments that specify whether an EC50 value refers to an inhibitor, an activator, or an inconclusive result, so the corresponding entries were excluded from the ChEMBL dataset. In the NCGC dataset, hERG inhibitors were defined as entries whose outcome comment specified "inhibitor" and that showed sufficient inhibitory activity (EC50 < 10 µM in this case). Compounds with EC50 above 10 µM, compounds specified as activators, and inconclusive entries were defined as negatives.
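The NCGC labelling rule above combines the outcome comment with the EC50 cutoff. A minimal sketch follows; the exact outcome strings ("inhibitor", "activator", "inconclusive") and the record layout are assumptions, and an EC50 of exactly 10 µM, which the text does not assign to either class, is treated as negative here.

```python
from typing import Optional

EC50_CUTOFF_UM = 10.0  # inhibitor cutoff from the text (EC50 < 10 µM)


def label_ncgc(outcome: str, ec50_um: Optional[float]) -> str:
    """Label one NCGC entry as 'inhibitor' or 'negative'.

    `outcome` is the outcome comment, assumed to be 'inhibitor', 'activator'
    or 'inconclusive'; `ec50_um` is the EC50 in µM (None if not fitted).
    """
    if outcome == "inhibitor" and ec50_um is not None and ec50_um < EC50_CUTOFF_UM:
        return "inhibitor"
    # Weak inhibitors (EC50 >= 10 µM; exactly 10 µM is an assumed negative),
    # activators and inconclusive entries are all negatives.
    return "negative"


for outcome, ec50 in [("inhibitor", 2.5), ("inhibitor", 25.0),
                      ("activator", 1.0), ("inconclusive", None)]:
    print(outcome, ec50, "->", label_ncgc(outcome, ec50))
```

Checking the outcome comment first matters because, as noted above, a small EC50 alone cannot distinguish a potent inhibitor from a potent activator.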
3. hERGCentral
hERGCentral is a database containing hERG activity information for more than 300,000 compounds. Because the hERGCentral website (www.hergcentral.org) is currently unavailable, the percent inhibition values of 318,496 compounds at 10 µM, measured with IonWorks Quattro (MDC, Sunnyvale, CA) in population patch clamp (PPC) mode, were retrieved from the supporting information of the paper by Fang et al. (2015) on the statistical analysis of the hERGCentral dataset.
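hERGCentral provides a single percent-inhibition value at 10 µM rather than an IC50, while the database classifies compounds by IC50 ≤ 10 µM. Under the Hill equation with a positive Hill coefficient, inhibition at 10 µM is at least 50% exactly when IC50 ≤ 10 µM, so a 50% cutoff is one natural way to bridge the two. The sketch below illustrates that reasoning; the 50% cutoff is an assumption, since the text does not state the threshold used.

```python
def inhibition_fraction(conc_um: float, ic50_um: float, hill: float = 1.0) -> float:
    """Fraction of hERG current inhibited at `conc_um`, per the Hill equation.

    Assumes a simple monotonic inhibitor with a positive Hill coefficient.
    """
    return 1.0 / (1.0 + (ic50_um / conc_um) ** hill)


def is_inhibitor(percent_inhibition_at_10um: float, cutoff: float = 50.0) -> bool:
    """Hypothetical rule: >= 50 % inhibition at 10 µM is read as IC50 <= 10 µM."""
    return percent_inhibition_at_10um >= cutoff


# At 10 µM, a compound with IC50 = 10 µM gives exactly 50 % inhibition;
# a more potent one gives more and a weaker one less, whatever the Hill slope.
print(inhibition_fraction(10.0, 10.0))  # 0.5
print(inhibition_fraction(10.0, 5.0) > 0.5)  # True
print(inhibition_fraction(10.0, 20.0, hill=2.0) < 0.5)  # True
print(is_inhibitor(63.0), is_inhibitor(12.5))  # True False
```

Real single-point measurements are noisy, so compounds near the cutoff can fall on either side of it in practice.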
The number of compounds
The AMED Cardiotoxicity Database consists of 9,259 hERG inhibitors (IC50 ≤ 10 µM) and 279,718 inactive compounds (IC50 > 10 µM). An assessment of structural diversity using Murcko frameworks showed that the database contains more than twice as many scaffolds for hERG inhibitors as any of the existing databases, and that its frameworks cover 18.0% of the chemical space occupied by all compounds in ChEMBL (438,551 frameworks).
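| Database | Class | Number of compounds | Number of Murcko frameworks |
|---|---|---|---|
| ChEMBL | Inhibitors | 4,793 | 2,474 |
| ChEMBL | Inactives | 5,275 | 3,012 |
| ChEMBL | All | 10,068 | 4,954 |
| NCGC | Inhibitors | 232 | 173 |
| NCGC | Inactives | 1,234 | 504 |
| NCGC | All | 1,466 | 639 |
| hERGCentral | Inhibitors | 4,321 | 2,708 |
| hERGCentral | Inactives | 274,536 | 73,419 |
| hERGCentral | All | 278,857 | 74,687 |
| AMED Cardiotoxicity Database | Inhibitors | 9,259 | 5,203 |
| AMED Cardiotoxicity Database | Inactives | 279,718 | 75,868 |
| AMED Cardiotoxicity Database | All | 288,977 | 79,014 |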
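The framework counts in the table are numbers of distinct Murcko scaffolds, and the 18.0% coverage quoted above is the ratio of the database's 79,014 frameworks to the 438,551 frameworks in ChEMBL. The sketch below shows both calculations, assuming each compound's framework has already been computed as a canonical SMILES string (for example with RDKit's MurckoScaffold module); the toy scaffolds are illustrative only.

```python
def count_frameworks(scaffolds: list) -> int:
    """Number of distinct frameworks among canonical scaffold SMILES."""
    return len(set(scaffolds))


def coverage_percent(n_frameworks: int, n_reference: int) -> float:
    """Share of a reference framework set covered, in percent."""
    return 100.0 * n_frameworks / n_reference


# Toy input: benzene appears twice but counts as one framework.
toy = ["c1ccccc1", "c1ccc2ccccc2c1", "c1ccccc1", "C1CCNCC1"]
print(count_frameworks(toy))  # 3
print(round(coverage_percent(79_014, 438_551), 1))  # 18.0
```

Because frameworks are deduplicated within each source and again after merging, the "All" row for the integrated database is smaller than the sum of the per-source framework counts.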
Sato T, Yuki H, Ogura K, Honma T. Construction of an integrated database for hERG blocking small molecules. PLOS ONE. 2018;13(7):e0199348. https://doi.org/10.1371/journal.pone.0199348
Last Update: 2020-04-02