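- Second, the descriptors. The choice of descriptors is also crucial: good descriptors describe the property of interest accurately and concisely, without leading to overfitting or underfitting.
- Third, the fitting function. A more complex fitting function is not necessarily better. A suitable one describes the property of interest accurately and, ideally, is transferable and extensible.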

| Database | Description | URL / source |
| --- | --- | --- |
| Materials Project | Computed properties of known and hypothetical materials | https://materialsproject.org |
| Protein Data Bank (PDB) | 3D structures of proteins, nucleic acids, and complex assemblies | http://www.wwpdb.org |
| Citrination | Computed and experimental properties of materials | https://citrination.com |
| Polymer Genome | An informatics platform for polymer property prediction and design | https://www.polymergenome.org |
| PoLyInfo | Various data required for polymeric material design | https://polymer.nims.go.jp |
| NanoMine | An open-source data resource for members of the nanocomposites community | materialsmine |
| Polymer Property Predictor and Database | Flory–Huggins χ parameters and glass transition temperatures for various polymers | https://pppdb.uchicago.edu |
| Physical Properties of Polymers | Various physical properties and characterization techniques of polymers | by J. Mark, K. Ngai, W. Graessley, L. Mandelkern, E. Samulski, J. Koenig and G. Wignall |
| ACD/Labs NMR Databases | Polymer NMR spectra | ACD/Labs website (products/dbs/nmr_db) |
| Polymer Science Learning Center Spectral Database | Polymer IR and NMR spectra | https://pslc.uwsp.edu |
| NIST Synthetic Polymer MALDI Recipes Database | Matrix-assisted laser desorption ionization (MALDI) mass spectrometry on a wide variety of synthetic polymers | https://maldi.nist.gov |
| CROW Polymer Properties Database | A multitude of polymer properties | http://polymerdatabase.com |
| MATWEB Material Property Data | Material properties of thermoplastic and thermoset polymers | http://www.matweb.com |
| Material Properties Database | Engineering material properties that emphasize ease of comparison | https://www.makeitfrom.com |

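
| No. | Database | Link title | Description |
| --- | --- | --- | --- |
| 1 | Polymer Genome | Polymer Genome: Predict | A database of computed or experimental properties of polymer materials, with corresponding machine learning models for rapid prediction. |
| 2 | PoLyInfo | Polymer Database (PoLyInfo) - DICE :: National Institute for Materials Science | PoLyInfo provides information drawn from the academic literature on about 100 properties, chemical structures, and synthesis methods of polymer materials. |
| 3 | Polymer Property Predictor and Database | Polymer Property Predictor and Database | Flory-Huggins χ parameters and glass transition temperatures of polymer materials for structural and multifunctional applications. |
| 4 | Material Properties Database | MakeItFrom.com: Material Properties Database | Mechanical, thermal, and electrical properties of polymer materials. |
| 5 | CROW Polymer Properties Database | iPage | A polymer science database covering the structure, properties, and applications of polymer materials. |
| 6 | PI1M | GitHub - RUIMINMA1996/PI1M: A benchmark dataset for polymer informatics. | One million polymers for polymer informatics. |
| 7 | UniProt | UniProt | UniProt provides a comprehensive, high-quality, and freely accessible resource of protein sequence and functional information. |
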
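The repeat unit of nylon-6 can be viewed as a chain of one -NH- block, five -CH2- blocks, and one -CO- block. The building blocks that make up the repeat unit are called the polymer fingerprint.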
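
The nylon-6 example can be turned into a small numerical sketch. The code below is only an illustration under stated assumptions: the vocabulary `BLOCK_VOCABULARY`, the helper `block_fingerprint`, and the list encoding of the repeat unit are hypothetical names chosen here, and the fingerprint is reduced to plain block counts.

```python
from collections import Counter

# Hypothetical, deliberately tiny vocabulary of building blocks.
# A practical fingerprint would use a much larger, curated set.
BLOCK_VOCABULARY = ("CH2", "NH", "CO", "O", "C6H4")


def block_fingerprint(repeat_unit):
    """Return a count vector over BLOCK_VOCABULARY for one repeat unit.

    `repeat_unit` is a list of block labels in chain order.
    """
    counts = Counter(repeat_unit)
    unknown = set(counts) - set(BLOCK_VOCABULARY)
    if unknown:
        raise ValueError(f"blocks outside the vocabulary: {sorted(unknown)}")
    # Counter returns 0 for blocks that do not occur in the repeat unit.
    return [counts[block] for block in BLOCK_VOCABULARY]


# Nylon-6 repeat unit: one -NH-, five -CH2-, one -CO-.
nylon6 = ["NH"] + ["CH2"] * 5 + ["CO"]
# Polyethylene repeat unit, for comparison: two -CH2-.
polyethylene = ["CH2", "CH2"]

nylon6_fp = block_fingerprint(nylon6)              # [5, 1, 1, 0, 0]
polyethylene_fp = block_fingerprint(polyethylene)  # [2, 0, 0, 0, 0]
```

Because every polymer is mapped onto the same fixed-length vector, the result can be passed directly to a regression model as a feature row.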
2D and 3D descriptors can be generated with professional descriptor software such as Dragon, or with open-source toolkits such as Mordred and the tools in RDKit.
The most widely used 2D structural descriptor is the SMILES representation. SMILES stands for Simplified Molecular Input Line Entry System.
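
To make the idea of a SMILES-based descriptor concrete, here is a minimal sketch that derives a few count descriptors directly from a SMILES string. It is not a SMILES parser: it assumes only organic-subset atoms written outside brackets, and it assumes the common convention of marking the two ends of a polymer repeat unit with `*`. The function name `smiles_count_descriptors` and the descriptor keys are hypothetical. For real work a cheminformatics toolkit such as RDKit should do the parsing.

```python
import re
from collections import Counter

# Organic-subset atoms only; two-letter symbols must be tried first.
_ATOM_PATTERN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def smiles_count_descriptors(smiles):
    """Compute simple count descriptors from a restricted SMILES string."""
    if "[" in smiles:
        # Bracket atoms (charges, isotopes, metals) need a real parser.
        raise ValueError("bracket atoms are outside the scope of this sketch")
    atoms = _ATOM_PATTERN.findall(smiles)
    # Aromatic atoms are written in lower case; fold them into the element.
    element_counts = Counter(atom.capitalize() for atom in atoms)
    return {
        "heavy_atoms": len(atoms),
        "aromatic_atoms": sum(atom.islower() for atom in atoms),
        "double_bonds": smiles.count("="),
        "branches": smiles.count("("),
        "elements": dict(element_counts),
    }


# Nylon-6 repeat unit, with '*' marking the connection points (assumed convention).
nylon6_descriptors = smiles_count_descriptors("*NCCCCCC(=O)*")
# heavy_atoms = 8, elements = {"N": 1, "C": 6, "O": 1}, double_bonds = 1
```

The six carbons reported for nylon-6 are the five -CH2- carbons plus the carbonyl carbon, which matches the building-block view of the repeat unit.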
Descriptors based on the 3D structure can be obtained with descriptor-generation software such as Mordred and Dragon.
2.1 Evaluation of descriptors
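Filter methods compute the correlation between each feature and the target on the original dataset, then select the key features by applying a threshold that eliminates weakly correlated ones. [88] Their main characteristics are that they are independent of any specific discriminative model and that they ignore potential correlations among the features themselves. [89, 90] Wrapper methods treat feature selection as a search for an optimum and try to pick the best feature combination out of all possible combinations. [91] The difference between filter and wrapper methods is whether the modeling algorithm is involved in the feature selection process.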
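
A filter method of this kind can be sketched in a few lines. The example below assumes the Pearson correlation coefficient as the relevance measure and a threshold of 0.5. Both are illustrative choices, and the descriptor names and numbers are made-up toy data, not results from any dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def filter_select(features, target, threshold=0.5):
    """Keep features whose |correlation| with the target reaches the threshold.

    No model is trained, and each feature is scored on its own.
    """
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected = [name for name, score in scores.items() if score >= threshold]
    return selected, scores


# Toy data with hypothetical descriptor names.
features = {
    "chain_length": [1, 2, 3, 4, 5, 6],
    "ring_count": [1, 0, 1, 0, 1, 0],
    "noise": [3, 1, 2, 2, 1, 3],
}
# Target built as 3 * chain_length + 2 * ring_count.
target = [5, 6, 11, 12, 17, 18]

selected, scores = filter_select(features, target)  # selected == ["chain_length"]
```

Here `ring_count` is discarded even though the target depends on it, because its individual correlation with the target is weak. This is the practical consequence of scoring features one at a time without a model.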

| No. | Method | Type | Description |
| --- | --- | --- | --- |
| 1 | CFS | Filter | CFS estimates the performance of a subset of features rather than a single feature. It introduces a forward search strategy to select strongly correlated non-redundant features. |
| 2 | mRMR | Filter | mRMR uses incremental search to select features, which can maximize the correlation between features and categories as well as minimize the redundancy between features. |
| 3 | Markov blanket | Filter | Markov blankets can perform feature redundancy analysis. In the feature space, the detailed information of the target variable can be obtained from its Markov blanket, and the non-Markov blanket can be regarded as redundant features of the target variable to reduce the feature dimension. |
| 4 | Genetic algorithm | Wrapper | Genetic algorithm uses an evolution-based method to determine the optimal set. After the algorithm runs for a certain number of generations, the optimal member of the group is the selected feature. |
| 5 | Backward elimination | Wrapper | All independent variables are selected into the model and then the partial F test is performed on each independent variable. The smallest F value is recorded as FL and compared with the pre-specified significance level F0. If FL < F0, the variable is eliminated, and refit the regression model with the remaining variables. |
| 6 | Forward selection | Wrapper | Forward selection method is a method of independent variable selection of a regression model. Its characteristic is to introduce the candidate independent variables into the regression equation one by one to test the significance of the regression coefficient, and to decide whether to introduce the independent variable into the model. |

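
Forward selection, the last wrapper method listed, can be sketched as a greedy loop around a regression model. The version below is a simplification under stated assumptions: the wrapped model is ordinary least squares, and the significance test is replaced by a plain stopping rule on the drop in the training residual sum of squares (the hypothetical parameter `min_gain`). All function names and the toy data are illustrative.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def residual_sum_of_squares(X, y, columns):
    """Fit y ~ intercept + chosen columns by least squares; return training RSS."""
    design = [[1.0] + [row[j] for j in columns] for row in X]
    p = len(columns) + 1
    xtx = [[sum(r[i] * r[k] for r in design) for k in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(design, y)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(design, y))


def forward_selection(X, y, min_gain=1e-6):
    """Greedily add the feature that lowers the RSS most; stop on small gains."""
    remaining = list(range(len(X[0])))
    selected = []
    best_rss = residual_sum_of_squares(X, y, [])  # intercept-only model
    while remaining:
        rss, j = min((residual_sum_of_squares(X, y, selected + [j]), j)
                     for j in remaining)
        if best_rss - rss < min_gain:
            break  # no candidate improves the model enough
        selected.append(j)
        remaining.remove(j)
        best_rss = rss
    return selected


# Toy data: columns 0 and 1 generate the target, column 2 is unrelated.
X_demo = [[1, 1, 3], [2, 0, 1], [3, 1, 2], [4, 0, 2], [5, 1, 1], [6, 0, 3]]
y_demo = [3 * row[0] + 2 * row[1] for row in X_demo]

chosen = forward_selection(X_demo, y_demo)  # [0, 1]
```

Column 1 correlates only weakly with the target on its own, yet it is picked in the second round because the model is refit for every candidate subset. Swapping the stopping rule for a partial F test, or the training RSS for a cross-validated score, would bring the sketch closer to the procedure described in the table.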