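    National Tsing Hua University Institutional Repository > College of Life Science > Institute of Molecular Medicine > Theses and Dissertations > Predicting the Intestinal Absorption of Drugs by Machine Learning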


    Please use this identifier to cite or link to this item: http://nthur.lib.nthu.edu.tw/dspace/handle/987654321/86870


    Title: Predicting the Intestinal Absorption of Drugs by Machine Learning (以機械學習方式預測藥物之小腸吸收度)
    Authors: 李育任
    Lee, Yu Ren
    Description: GH02101080600
    Master's thesis
    Institute of Molecular Medicine
    Date: 2015
    Keywords: Machine learning; Human intestinal absorption; Feature selection; Support vector machine; 10-fold cross validation; Structure optimization
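    Abstract: (translated from the Chinese)
    We first obtained 180 drug molecules with different intestinal absorption rates from the NCBI website. Because molecular structures must be optimized before any molecular calculation, we optimized the drug molecules with Gaussian09 at the National Center for High-performance Computing (NCHC), using DFT (density functional theory) with the B3LYP functional and the 6-31 basis set. Because the MOL files produced by Gaussian09 store the structures as coordinates, the files were then converted into true 3D structures with Discovery Studio 3.1. PaDEL-Descriptor, software for computing molecular features and properties developed at the National University of Singapore (NUS), can compute 1875 2D and 3D molecular descriptors.
    WEKA is machine learning software developed at the University of Waikato, New Zealand, used for machine learning, data mining, and feature selection. Feature selection relied mainly on the CfsSubsetEval evaluator combined with Particle Swarm Optimization (PSO), an evolutionary algorithm, and five other auxiliary algorithms.
    The Support Vector Machine (SVM) is the classification tool used in this study. It has three main parameters: C (cost), gamma (γ), and ε. Because a poor choice of parameters leads to unsatisfactory classification results or overfitting, the Pearson correlation coefficient was used as the basis for parameter selection.
    Classification and selection were carried out alternately until the selected features could not be narrowed any further. The purpose of feature selection is to improve the classification results, and the features chosen in the last few stages are the most representative. Alternating classification and selection yielded, in sequence, sets of molecular features increasingly correlated with absorption, containing 12015, 625, 280, 177, 98, 50, and 37 features.
    Finally, using the non-validated statistic R^2 and the 10-fold statistic Q^2, the set with number of selected features (NFS) = 98 was found to have the best predictive ability. Comparing the predictions with the original intestinal absorption values gave linear correlation coefficients R^2 = 0.887 and 0.5431. These 98 descriptors are therefore the selected molecular features with the greatest predictive power. An additional 13 drug molecules were then predicted with the resulting model, giving q^2 = 0.729 and a correlation coefficient R^2 = 0.7536.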
    Abstract
    The study begins with 180 drug compounds whose human intestinal absorption (HIA) values were collected from the literature. The 3D structures of these 180 compounds were obtained from NCBI. Because compounds must be optimized before any chemical calculation, we optimized them with Gaussian09 at the NCHC (National Center for High-performance Computing), using DFT with the B3LYP functional and the 6-31G basis set. Discovery Studio was then used to convert the resulting coordinates into real 3D structures. PaDEL (Pharmaceutical Data Exploration Laboratory) Descriptor, developed at the National University of Singapore (NUS), is a software package that provides 1875 molecular descriptors, from 2D to 3D. We used PaDEL to calculate the 2D and 3D descriptors.
    WEKA is a data mining and machine learning software package developed at the University of Waikato, New Zealand. It offers both classification and attribute selection functions, and feature selection in this work is based mainly on the latter. Selection was carried out primarily with the best-first evaluator combined with the PSO (Particle Swarm Optimization) and EA (evolutionary algorithm) methods. Five further algorithms were used to recover HIA-related features that might otherwise have been lost.
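The Chinese abstract names WEKA's CfsSubsetEval as the evaluator. As a rough illustration of the idea behind correlation-based feature selection, the sketch below scores a subset of k descriptors with the commonly cited CFS merit, k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean descriptor-to-target correlation and r_ff the mean descriptor-to-descriptor correlation. Several things here are assumptions and not taken from the thesis: the greedy forward search stands in for the PSO and EA searches, plain Pearson correlation is used throughout, the descriptor names `d1` to `d3` and all values are invented toy data, and WEKA's own implementation may differ in detail.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation information
    return sxy / math.sqrt(sxx * syy)

def cfs_merit(features, target, subset):
    """CFS merit of a subset: k * r_cf / sqrt(k + k * (k - 1) * r_ff)."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    if k == 1:
        return r_cf
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def greedy_forward_selection(features, target):
    """Keep adding the descriptor that raises the merit most; stop when none does.

    Assumption: this simple search replaces the PSO/EA searches of the thesis.
    """
    selected, best = [], 0.0
    remaining = sorted(features)
    while remaining:
        score, name = max(
            (cfs_merit(features, target, selected + [f]), f) for f in remaining
        )
        if score <= best:
            break
        selected.append(name)
        remaining.remove(name)
        best = score
    return selected, best

# Invented toy data: absorption values and three hypothetical descriptors.
target = [10.0, 30.0, 50.0, 70.0, 90.0]
features = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],  # strongly related to the target
    "d2": [1.0, 2.0, 3.0, 4.0, 6.0],  # nearly redundant with d1
    "d3": [5.0, 1.0, 4.0, 2.0, 3.0],  # weakly related
}
selected, merit = greedy_forward_selection(features, target)
print(selected, round(merit, 3))
```

The merit rewards descriptors that correlate with the target and penalizes descriptors that correlate with each other, which is why a near-duplicate descriptor adds little once its twin is already in the subset.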
    SVM is used to classify the compounds and to validate the results, which depends on choosing proper parameters. Three parameters are important to classification: cost (C), gamma (γ), and epsilon (ε). A poor choice of parameters leads to incorrect classification or overfitting. In this research we tried many combinations of parameters and chose the best set as judged by the Pearson correlation coefficient and cross-validation statistics.
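The parameter search described above can be pictured as a grid over (C, γ, ε) in which each candidate is scored by the Pearson correlation between observed and predicted absorption. The sketch below shows only that selection mechanism. It does not train an SVM: `toy_predict` is a hypothetical stand-in that returns invented predictions, and the grid values and observed values are also invented. In practice the stand-in would be replaced by cross-validated predictions from the actual SVM.

```python
import itertools
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def parameter_grid(costs, gammas, epsilons):
    """Every combination of the three SVM parameters named in the abstract."""
    return [
        {"C": c, "gamma": g, "epsilon": e}
        for c, g, e in itertools.product(costs, gammas, epsilons)
    ]

def select_parameters(candidates, observed, predict):
    """Return (r, params) for the candidate whose predictions correlate best."""
    scored = [(pearson(observed, predict(p)), p) for p in candidates]
    return max(scored, key=lambda item: item[0])

# Invented observed absorption values (percent).
observed = [95.0, 80.0, 60.0, 35.0, 10.0]

def toy_predict(params):
    """Hypothetical stand-in for 'train an SVM with params and predict'.

    The error grows as the parameters move away from an arbitrary sweet spot
    (C = 10, gamma = 0.01, epsilon = 0); this is purely for illustration.
    """
    miss = (
        abs(math.log10(params["C"]) - 1)
        + abs(math.log10(params["gamma"]) + 2)
        + params["epsilon"]
    )
    wobble = [1.0, -1.0, 1.0, -1.0, 1.0]
    return [y + 20.0 * miss * w for y, w in zip(observed, wobble)]

candidates = parameter_grid([1, 10, 100], [0.001, 0.01, 0.1], [0.0, 0.1])
best_r, best_params = select_parameters(candidates, observed, toy_predict)
print(best_params, round(best_r, 4))
```

Scoring on held-out predictions instead of training-set predictions is what guards against the overfitting mentioned in the abstract, since a parameter set that merely memorizes the training data will not correlate well on unseen compounds.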
    Classification and feature selection are applied alternately until the set of selected features can no longer be narrowed. With each round of feature selection, the classification improves. As features are selected step by step, the number of selected features (NFS) takes the values 1875, 12015, 625, 280, 177, 98, 50, and 37.
    Based on the non-validated statistics and the 10-fold statistics, NFS = 98 is the most predictive feature set, with a correlation coefficient R^2 of 0.88. That is, these 98 molecular descriptors are highly correlated with intestinal absorption. Finally, the model was tested by external validation on 13 drug compounds with different HIA values, giving q^2 = 0.729 and R^2 = 0.7536.
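To make the two kinds of statistic concrete, the sketch below computes a non-validated R^2 (the model is fitted and evaluated on the same data) and a 10-fold cross-validated Q^2 (each compound is predicted by a model that never saw it), using the common definition 1 − PRESS / SS_tot. The thesis does not spell out its exact formulas, so this definition is an assumption. A one-variable least-squares line stands in for the SVM, and the data are invented.

```python
def r_squared(observed, predicted):
    """1 - (residual sum of squares) / (total sum of squares)."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def fit_line(xs, ys):
    """Ordinary least-squares line; an assumed stand-in for the SVM model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def non_validated_r2(xs, ys):
    """Fit on all samples and score on the same samples."""
    slope, intercept = fit_line(xs, ys)
    return r_squared(ys, [slope * x + intercept for x in xs])

def cross_validated_q2(xs, ys, folds=10):
    """Predict every sample from a model trained on the other folds."""
    n = len(xs)
    predicted = [0.0] * n
    for k in range(folds):
        # Fold k holds out every sample whose index is congruent to k.
        train = [i for i in range(n) if i % folds != k]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in range(k, n, folds):
            predicted[i] = slope * xs[i] + intercept
    return r_squared(ys, predicted)

# Invented data: a linear trend with a small alternating disturbance.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]

r2 = non_validated_r2(xs, ys)
q2 = cross_validated_q2(xs, ys)
print(round(r2, 4), round(q2, 4))
```

Because the cross-validated predictions come from models that did not see the held-out samples, Q^2 is normally lower than the non-validated R^2, and a large gap between the two is a sign of overfitting.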
    URI: http://nthur.lib.nthu.edu.tw/dspace/handle/987654321/86870
    Source: http://thesis.nthu.edu.tw/cgi-bin/gs/hugsweb.cgi?o=dnthucdr&i=sGH02101080600.id
    Appears in Collections: [Institute of Molecular Medicine] Theses and Dissertations


