Publication filter
42 results found
Filter: Yijie DING
Yijie Ding; Hongmei Zhou; Quan Zou; Lei Yuan
Methods, 2023 219 - SCIE

Abstract: Adverse drug reactions include side effects, allergic reactions, and secondary infections. Severe adverse reactions can cause cancer, deformity, or mutation. Monitoring drug side effects is an important support for the post-marketing safety supervision of drugs and an important basis for revising drug instructions. Its purpose is to detect and control drug safety risks in a timely manner. Traditional methods are time-consuming. To accelerate the discovery of side effects, we propose a machine-learning-based method, called correntropy-loss-based matrix factorization with neural tangent kernel (CLMF-NTK), to predict drug side effects. Our method and other computational methods are tested on three benchmark datasets, and the results show that our method achieves the best predictive performance.
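
The abstract above names a correntropy loss but does not give its form. As a minimal sketch, assuming the common Gaussian-kernel correntropy-induced loss (the function names and the `sigma` default are illustrative, not taken from the paper), the following compares it with the squared loss to show why it is less sensitive to outlying residuals:

```python
import math


def squared_loss(residual: float) -> float:
    """Ordinary squared error; grows without bound as the residual grows."""
    return residual ** 2


def correntropy_loss(residual: float, sigma: float = 1.0) -> float:
    """Correntropy-induced loss with a Gaussian kernel (assumed form).

    The value is 0 for a zero residual and never exceeds 1, so a single
    large outlier contributes a bounded amount to the total loss.
    """
    return 1.0 - math.exp(-(residual ** 2) / (2.0 * sigma ** 2))
```

For a residual of 10 with `sigma = 1`, the squared loss is 100 while the correntropy-induced loss stays below 1, which is the robustness property such losses are usually chosen for.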

Weizhong Lu; Nan Zhou; Yijie Ding; Hongjie Wu; Yu Zhang; Qiming Fu
BioMed Research International, 2022 2022 (Special Issue) - SCIE

Abstract: DNA contains the genetic information for the synthesis of proteins and RNA, and it is an indispensable substance in living organisms. DNA-binding proteins are enzymes that can bind to DNA to produce complex proteins, and they play an important role in the functions of a variety of biological molecules. With the continuous development of deep learning, applying deep learning to DNA-binding protein prediction helps improve the speed and accuracy of DNA-binding protein recognition. In this study, the features and structures of proteins were used to obtain their representations through graph convolutional networks. A protein prediction model based on a graph convolutional network and a contact map was proposed. Evaluated with various metrics on the benchmark datasets PDB14189 and PDB2272, the method showed some advantages.
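
The abstract above does not describe the network architecture in detail. As a rough sketch of the general idea only, the code below builds a binary contact map from residue coordinates and applies one graph convolution with symmetric normalization. The 8 Å distance threshold, the function names, and the single-layer form are assumptions for illustration, not the authors' model:

```python
import numpy as np


def contact_map(coords: np.ndarray, threshold: float = 8.0) -> np.ndarray:
    """Binary contact map: 1 where two distinct residues are closer than threshold."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = (dist < threshold).astype(float)
    np.fill_diagonal(adj, 0.0)  # self-contacts are added later as self-loops
    return adj


def gcn_layer(adj: np.ndarray, features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weights, 0.0)
```

Each residue is a graph node, the contact map is the adjacency matrix, and per-residue features (for example, sequence-derived descriptors) are mixed with those of spatially neighboring residues.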

Yijie Ding; Jijun Tang; Fei Guo
Neurocomputing, 2021 461 - EI SCIE

Abstract: Diseases are usually caused by the body's own defective proteins or by the functional structure of viral proteins. Effective drugs can bind well to these proteins and remove their original functions to achieve a therapeutic effect. Biochemical determination of drug-target interactions (DTIs) is expensive and time-consuming. Therefore, computational-based methods have been proposed to predict new DTIs. To solve the problem of fusing multiple sources of information, we propose a multi-view graph regularized link propagation model (MvGRLP) to predict new DTIs. Multi-view learning can use the complementary and correlated information between different views (features). Compared with existing models, our method achieves comparable or the best results on four benchmark datasets.

Wang, Hao; Tang, Jijun; Ding, Yijie; Guo, Fei
Briefings in Bioinformatics, 2021 22 (5) - SCIE

Abstract: Accurately identifying associations between non-coding RNAs (ncRNAs) and diseases could be of great help in human biomedical research and disease treatment. However, traditional techniques apply only to one type of ncRNA or to a specific disease, and the experimental methods are time-consuming and expensive. More computational tools have been proposed to detect new associations based on known ncRNA and disease information. Because ncRNAs (circRNAs, miRNAs and lncRNAs) are closely related to the progression of various human diseases, it is critical to develop effective computational predictors for ncRNA–disease association prediction. In this paper, we propose a new computational method, three-matrix factorization with hypergraph regularization terms (HGRTMF) based on central kernel alignment (CKA), for identifying general ncRNA–disease associations. In the process of constructing the similarity matrix, various types of similarity matrices are applicable to circRNAs, miRNAs and lncRNAs. Our method achieves excellent performance on five datasets involving three types of ncRNAs. In the test, we obtain best area under the curve scores of 0.9832, 0.9775, 0.9023, 0.8809 and 0.9185 via 5-fold cross-validation and 0.9832, 0.9836, 0.9198, 0.9459 and 0.9275 via leave-one-out cross-validation on the five datasets. Furthermore, our method (CKA-HGRTMF) is also able to accurately discover new associations between ncRNAs and diseases. Availability: Code and data are available at https://github.com/hzwh6910/ncRNA2Disease.git. Contact: fguo@tju.edu.cn
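
The abstract above relies on kernel alignment to combine similarity matrices but does not spell out the computation. A minimal sketch of the standard centered kernel alignment score between two kernel (similarity) matrices follows. The function names are illustrative, and how the paper turns such scores into combination weights is not stated in the abstract:

```python
import numpy as np


def center_kernel(k: np.ndarray) -> np.ndarray:
    """Center a kernel matrix in feature space: H K H with H = I - (1/n) 11^T."""
    n = k.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    return h @ k @ h


def centered_alignment(k1: np.ndarray, k2: np.ndarray) -> float:
    """Frobenius inner product of the centered kernels, normalized by their norms.

    Assumes neither centered kernel is all zeros (e.g. a constant kernel),
    since that would make the denominator zero.
    """
    c1, c2 = center_kernel(k1), center_kernel(k2)
    return float(np.sum(c1 * c2) / (np.linalg.norm(c1) * np.linalg.norm(c2)))
```

A score of 1 means the two kernels agree up to scaling after centering. Comparing each candidate similarity matrix against an "ideal" kernel built from known labels or associations is one common way to rank or weight the candidates.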

Yang, Chao; Ding, Yijie; Meng, Qiaozhen; Tang, Jijun; Guo, Fei
Neural computing & applications (Print), 2021 33 (17) - EI SCIE

Abstract: RNA-binding proteins play an important role in biological processes. However, traditional experimental techniques for identifying RNA-binding residues are time-consuming and expensive, so developing an effective computational approach can provide a strategy to solve this issue. In recent years, most computational approaches have been built on protein sequence information, and protein structure has not been considered. In this paper, we propose a novel computational model for RNA-binding residue prediction that uses both protein sequence and structure information. Our hybrid features are encoded by local sequence and structure feature extraction models. Our predictor is built with the Granular Multiple Kernel Support Vector Machine with Repetitive Under-sampling (GMKSVM-RU). To evaluate our method, we use fivefold cross-validation on RBP129, where it achieves better experimental performance, with an MCC of 0.3367 and an accuracy of 88.84%. To further evaluate our model, an independent data set (RBP60) is employed, on which our method achieves an MCC of 0.3921 and an accuracy of 87.52%. These results demonstrate that integrating sequence and structure information improves the prediction of RNA-binding residues.
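
The abstract above mentions repetitive under-sampling without describing the procedure. The sketch below shows plain repeated random under-sampling, where each round keeps every minority-class sample and draws an equally sized random subset of the majority class. This is an assumed, simplified reading; the granular scheme in GMKSVM-RU may differ, and the function name and defaults are illustrative:

```python
import random


def repetitive_undersample(labels, rounds=5, seed=0):
    """Return `rounds` balanced lists of sample indices.

    Every list contains all minority-class indices plus a fresh random
    draw of the same number of majority-class indices.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y != 1]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    subsets = []
    for _ in range(rounds):
        sampled = rng.sample(majority, len(minority))
        subsets.append(sorted(minority + sampled))
    return subsets
```

One classifier can then be trained per balanced subset and the predictions combined, so that binding residues (typically the minority class) are not swamped by non-binding ones.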

Lu, Weizhong; Cao, Yan; Wu, Hongjie; Ding, Yijie; Song, Zhengwei; Zhang, Yu
BMC Bioinformatics, 2021 22 (Supplement 3) - EI SCIE

Abstract: Background: RNA secondary structure prediction is an important research topic in bioinformatics. Predicting RNA secondary structure with pseudoknots has been proved to be an NP-hard problem. When predicting RNA secondary structure, traditional machine learning methods cannot effectively apply protein sequence information with different sequence lengths to the prediction process because of the constraints of the model itself. In addition, there is a large difference between the number of paired bases and the number of unpaired bases in RNA sequences, so the imbalance between positive and negative samples can easily trap the model in a local optimum. To solve these problems, this paper proposes a variable-length dynamic bidirectional Gated Recurrent Unit (VLDB GRU) model. The model can accept sequences of different lengths through the introduction of a flag vector. The model can also make full use of the base information before and after the predicted base and can avoid losing part of the information due to truncation. Introducing a weight vector that dynamically adjusts the loss function of each base on the RNA training set solves the sample imbalance problem. Results: The proposed algorithm is compared with existing algorithms on five representative subsets of the RNA STRAND data set. The experimental results show that the accuracy and Matthews correlation coefficient of the method are improved by 4.7% and 11.4%, respectively. Conclusions: The flag vector allows the model to effectively use the information before and after the protein sequence, and the weight vector solves the sample imbalance problem. Compared with other algorithms, the VLDB GRU algorithm proposed in this paper has the best detection results.
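
The roles of the flag vector and weight vector described above can be illustrated with a masked, class-weighted binary cross-entropy. This is a sketch under assumptions: the flag is treated as a 0/1 padding mask, the weights are static per class rather than dynamically adjusted, and the function and parameter names are hypothetical rather than taken from the paper:

```python
import math


def masked_weighted_bce(probs, targets, flags, pos_weight, neg_weight=1.0):
    """Mean binary cross-entropy over real (non-padded) positions.

    probs   : predicted probability that each base is paired
    targets : 1 for paired, 0 for unpaired
    flags   : 1 for a real base, 0 for padding added to equalize lengths
    """
    total, count = 0.0, 0
    for p, t, f in zip(probs, targets, flags):
        if f == 0:  # padded position: contributes nothing to the loss
            continue
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        if t == 1:
            total += -pos_weight * math.log(p)
        else:
            total += -neg_weight * math.log(1.0 - p)
        count += 1
    return total / count if count else 0.0
```

Padding lets sequences of different lengths share one batch, while the mask keeps padded positions out of the loss. Setting `pos_weight` above 1 raises the penalty for misclassifying the rarer class.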

Xiaobin Liu; Xiran Zhang; Xiaoyi Guo; Yijie Ding; Weiwei Shan; Liang Wang
Computational and Mathematical Methods in Medicine, 2021 2021 (Special Issue) - EI SCIE

Abstract: In end-stage renal disease (ESRD), vascular calcification risk factors are essential for the survival of hemodialysis patients. To effectively assess the level of vascular calcification, machine learning algorithms can be used to predict the vascular calcification risk in ESRD patients. Because the amount of collected data is unbalanced across risk levels, the classification task is affected. Therefore, an effective fuzzy support vector machine based on self-representation (FSVM-SR) is proposed in this work to predict vascular calcification risk. Our method is also compared with other conventional machine learning methods, and the results show that it performs better on the vascular calcification risk classification task.

Qian, Yuqing; Jiang, Limin; Ding, Yijie; Tang, Jijun; Guo, Fei
BMC Bioinformatics, 2021 22 (03) - EI SCIE

Abstract: Background: DNA-binding proteins (DBPs) play a pivotal role in biological systems. A growing number of researchers are studying their mechanisms and detection methods. Detecting DBPs with traditional experimental methods is time-consuming and resource-intensive. In recent years, machine learning methods have been used to detect DBPs. However, it is difficult to adequately describe protein information when predicting DNA-binding proteins. In this study, we extract six features from protein sequences and use multiple kernel learning based on centered kernel alignment to integrate these features. The integrated feature is fed into a support vector machine to build a predictive model and detect new DBPs. Results: In our work, the PDB1075 and PDB186 data sets are employed to test our method. Our model obtains better accuracy than other existing methods on PDB1075 (84.19%) and PDB186 (83.7%). Conclusion: Multiple kernel learning can fuse the complementary information between different features. Compared with existing methods, our method achieves comparable or the best results on benchmark data sets.

Yubo Wang; Yijie Ding; Jijun Tang
IEEE ACM Transactions on Computational Biology and Bioinformatics, 2021 18 (1) - EI SCIE

Abstract: Improving the accuracy of protein crystallization prediction is very important for protein crystallization projects, because crystallization is a critical step in determining protein structure by X-ray crystallography. At present, many machine learning methods are used to predict protein crystallization. Here, we use a novel feature combination to construct an SVM model for protein crystallization prediction, called CrystalM. In this work, we extract six features to represent protein sequences, namely Average Block-Position specific scoring matrix (AVBlock-PSSM), Average Block-Secondary Structure (AVBlock-SS), Global Encoding (GE), Pseudo-Position specific scoring matrix (PsePSSM), Protscale, and Discrete Wavelet Transform-Position specific scoring matrix (DWT-PSSM). Moreover, we employ two training datasets (TRAIN3587 and TRAIN1500) and their corresponding independent test datasets (TEST3585 and TEST500) to evaluate CrystalM by feeding multi-view features into a Support Vector Machine (SVM) classifier. The two training datasets are used for five-fold cross validation, and the two test datasets are used separately for the corresponding independent tests. Finally, we compare the performance of CrystalM with that of other existing methods. For TRAIN3587 and TEST3585, CrystalM achieves the best Accuracy (ACC), the best Specificity (SP), and the same Matthews correlation coefficient (MCC) as the previously best-performing methods in five-fold cross validation. In particular, ACC, SP, and MCC surpass those of the existing methods in the independent test, which demonstrates the effectiveness of CrystalM. Meanwhile, ACC, SP, and MCC are higher than those of existing methods in five-fold cross validation on TRAIN1500. Although its performance in the independent test on TEST500 is not the best, CrystalM still shows a certain predictive ability for protein crystallization.
In addition, we find that choosing only the first four features can improve prediction performance for TRAIN1500 and TEST500, both in independent tests and in five-fold cross validation. This indicates that the latter two features cannot effectively represent the proteins in TRAIN1500 and TEST500. CrystalM is a sequence-based protein crystallization prediction method. Its good performance on the datasets demonstrates its effectiveness, and its better performance on the large datasets further demonstrates its stability and superiority.
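
The three metrics compared above (ACC, SP, and MCC) can all be derived from a binary confusion matrix. A small sketch using the standard definitions follows; the function name and the convention of returning 0 when a denominator is zero are choices made here for illustration:

```python
import math


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, specificity, and Matthews correlation coefficient."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SP": sp, "MCC": mcc}
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside ACC on imbalanced datasets.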

Hongyu Zhang; Limin Jiang; Jijun Tang; Yijie Ding
Frontiers in Cell and Developmental Biology, 2021 9 - SCIE

Abstract: In recent years, cancer has become a severe threat to human health. Accurately identifying cancer subtypes would be of great significance for research on anti-cancer drugs, the development of personalized treatments, and, ultimately, conquering cancer. In this paper, we obtain three feature representation datasets (gene expression profile, isoform expression, and DNA methylation data) on lung cancer and renal cancer from the Broad GDAC, which collects standardized data extracted from The Cancer Genome Atlas (TCGA). Since the feature dimension is too large, Principal Component Analysis (PCA) is used to reduce the feature vectors, thus eliminating redundant features and speeding up the classification model. Using multiple kernel learning (MKL), we calculate the kernel fusion weights with kernel target alignment (KTA), fast kernel learning (FKL), the Hilbert-Schmidt Independence Criterion (HSIC), and the mean. Finally, we feed the combined kernel function into a support vector machine (SVM) and obtain excellent results. In the classification of renal cell carcinoma subtypes, the maximum accuracy reaches 0.978 using MKL (with HSIC-calculated weights), while in the classification of lung cancer subtypes, the accuracy reaches 0.990 using MKL (with FKL-calculated weights).
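
The pipeline described above (dimension reduction per data type, one kernel per data type, weighted fusion) can be sketched as follows. Only the simplest weighting from the abstract, the mean, is shown as the default. The RBF kernel, its `gamma`, and all function names are assumptions made for illustration, and the KTA, FKL, and HSIC weightings are not implemented here:

```python
import numpy as np


def pca_reduce(x: np.ndarray, n_components: int) -> np.ndarray:
    """Project centered data onto its top principal components via SVD."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def rbf_kernel(x: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """Gaussian (RBF) kernel matrix between all pairs of rows of x."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def fuse_kernels(kernels, weights=None) -> np.ndarray:
    """Weighted sum of kernel matrices; equal weights (the mean) by default."""
    if weights is None:
        weights = np.full(len(kernels), 1.0 / len(kernels))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    return sum(w * k for w, k in zip(weights, kernels))
```

The fused matrix can then be passed to an SVM that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.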