Chia Nan University of Pharmacy & Science Institutional Repository: Item 310902800/24597
RC Version 7.0 © Powered By DSPACE, MIT. Enhanced by NTU Library IR team.
    Please use this identifier to cite or link to this item: https://ir.cnu.edu.tw/handle/310902800/24597


    Title: 運用三種資料探勘方法預測原發性肝癌患者存活情形之比較
    Predicting Primary Liver Cancer Survivability: A Comparison of Three Data Mining Methods
    Authors: 胡嘉仁
    Contributors: Chia Nan University of Pharmacy & Science: Graduate Institute of Medical Information Management (嘉南藥理科技大學:醫療資訊管理研究所)
    陳俞成
    Keywords: 原發性肝癌
    存活預測
    類神經網路
    邏輯斯迴歸
    決策樹分析
    Primary Liver Cancer
    Survival Prediction
    Artificial Neural Network
    Logistic Regression
    Decision Tree Analysis
    Date: 2011
    Issue Date: 2011-10-26 15:41:51 (UTC+8)
    Abstract: 根據世界衛生組織(World Health Organization)於2010年資料統計顯示,肝癌為全球男性死因中的第二位,而在我國癌症死因中肝癌排名第二名,僅次於肺癌;男性的肝癌死亡率也明顯地高於女性,肝癌對於國人的健康產生極大的威脅性,儘管肝癌之死亡率逐年降低,但發生率仍高居男性癌症第一位。當臨床症狀出現後而被診斷出肝癌時,都已失去對肝癌治療的最佳時機。基於肝癌不易察覺、高發生率及高死亡率的特性,如何透過有效的工具,來預測癌前病變特性成為癌症中最容易早期發現與治癒的疾病,以增進肝癌的早期發現與早期治療成效。
    以往在疾病的預後預測在統計上常運用統計學上邏輯斯迴歸(Logistic Regression)、Kaplan-Meier Method及Cox Proportional Hazards Models 等方法來進行,在醫療領域中希望能藉由人工智慧的資料探勘技術,調整執行的參數就更能夠找出可能的組合,輔助鑑別診斷以提高診斷的正確率,甚至更進一步改善治療提高存活率,此類的研究近年來持續地被討論。
    本研究採用資料探勘技術,以美國SEER(the Surveillance, Epidemiology, and End Results)1973至2007年癌症登記資料庫(CIPUD,Cancer Incidence Public-Use Database)中原發性肝癌共65,088筆個案資料記錄及72個變項,經過資料清理後留下與預測原發性肝癌五年存活相關的14個變項,與診斷年份為1988至2002年的資料共2,066筆進行資料分析,以不同的存活小於五年樣本個數子集組合與不同診斷年代子集,運用類神經網路、決策樹以及邏輯斯迴歸三種演算法來比較內部驗證與外部驗證的預測存活準確度、敏感度、特異度以及ROC曲線下面積。
    研究結果顯示,以不同存活情形比例個案數組合方式進行內部驗證,發現預測準確度整體平均以決策樹分析表現較好,預測準確度平均達83.59%,標準差為0.09,優於類神經網路以及邏輯斯迴歸,顯示決策樹分析在內部學習效能及穩定度上比另外兩種模式好。在外推能力上,整體平均而言,邏輯斯迴歸的準確度差異及ROC曲線下面積差異皆優於類神經網路及決策樹分析,分別為0.0446及0.13,顯示邏輯斯迴歸對於外來新的資料歧異度上容忍性較高,但其準確度差異及AUC差異之標準差較高,呈現不穩定的情形。
    According to World Health Organization statistics published in 2010, liver cancer was the second leading cause of cancer death among men worldwide. In Taiwan, liver cancer is also the second leading cause of cancer death, after lung cancer, and liver cancer mortality is significantly higher in men than in women. Liver cancer is therefore a serious threat to public health: although its mortality rate has declined year by year, its incidence is still the highest of all cancers in men. By the time clinical symptoms appear and liver cancer is diagnosed, the best window for treatment has usually passed. Because liver cancer is hard to detect and has both a high incidence and a high mortality, finding effective tools that support its early detection and early treatment is an important research topic.
    In the past, disease prognosis was usually predicted with statistical methods such as logistic regression, the Kaplan-Meier method, and Cox proportional hazards models. In medicine, artificial-intelligence data mining techniques, whose parameters can be tuned to uncover more candidate combinations of predictors, are expected to support differential diagnosis, raise diagnostic accuracy, and ultimately improve treatment and survival. Such studies have been widely discussed in recent years.
    In this study, data mining techniques were applied to primary liver cancer cases from the U.S. SEER (Surveillance, Epidemiology, and End Results) Cancer Incidence Public-Use Database (CIPUD) for 1973-2007, which contains 65,088 cases with 72 variables. After data cleaning, 14 variables related to five-year survival of primary liver cancer were retained, together with 2,066 cases diagnosed between 1988 and 2002. Using subsets with different numbers of cases surviving less than five years, and subsets from different diagnosis periods, three algorithms (artificial neural network, decision tree, and logistic regression) were compared on prediction accuracy, sensitivity, specificity, and the area under the ROC curve (AUC) in both internal and external validation.
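    The four measures named above can be computed from a model's predicted labels and scores. The following is a minimal sketch, not the thesis's own code. It makes three assumptions: label 1 codes "survived five years or more", the function names are hypothetical, and AUC is computed with the rank-based (Mann-Whitney) formulation, which equals the area under the empirical ROC curve.

```python
"""Sketch of accuracy, sensitivity, specificity, and ROC AUC for a binary
five-year-survival classifier. Assumption: label 1 = survived >= 5 years."""
from __future__ import annotations

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy_sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, float, float]:
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    return accuracy, sensitivity, specificity


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive case scores above a random negative
    case (ties count 1/2); equal to the area under the empirical ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:  # O(len(pos) * len(neg)); fine for a sketch
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, hypothetical labels and scores (not data from the study).
    y_true = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.6]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off
    print(accuracy_sensitivity_specificity(y_true, y_pred))
    print(round(roc_auc(y_true, scores), 3))
```

    Accuracy, sensitivity, and specificity depend on the chosen cut-off, but AUC uses only the ranking of the scores. This is one reason AUC is usually reported alongside them.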
    The results show that in internal validation, across subsets with different proportions of survival cases, decision tree analysis achieved the highest average prediction accuracy (83.59%, with a standard deviation of 0.09), ahead of the neural network and logistic regression. Decision tree analysis therefore showed better internal learning performance and stability than the other two models. In external validation, logistic regression had on average the smallest accuracy difference and the smallest AUC difference, 0.0446 and 0.13 respectively, outperforming the neural network and decision tree analysis. This suggests that logistic regression tolerates heterogeneity in new, external data better. However, logistic regression was also unstable, as reflected in the higher standard deviations of its accuracy and AUC differences.
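    The external-validation comparison above rests on the gap between a model's internal and external performance, summarized by its mean and standard deviation across subsets. The abstract does not state the exact definition. The following minimal sketch therefore assumes absolute internal-minus-external differences and the sample standard deviation, and the function names are hypothetical. On this reading, a smaller mean gap indicates better extrapolation and a larger standard deviation indicates less stable behavior.

```python
"""Sketch: summarize internal-vs-external validation gaps for one model.
Assumptions: gap = |internal metric - external metric| per subset;
spread = sample standard deviation across subsets."""
from __future__ import annotations

from statistics import mean, stdev
from typing import Sequence


def validation_gaps(internal: Sequence[float], external: Sequence[float]) -> list[float]:
    """Absolute difference between internal and external scores, per subset."""
    if len(internal) != len(external):
        raise ValueError("internal and external results must be paired by subset")
    return [abs(i - e) for i, e in zip(internal, external)]


def summarize_gaps(gaps: Sequence[float]) -> tuple[float, float]:
    """Mean gap (lower = better extrapolation) and its sample SD (lower = more stable)."""
    if len(gaps) < 2:
        raise ValueError("need at least two subsets to estimate a standard deviation")
    return mean(gaps), stdev(gaps)


if __name__ == "__main__":
    # Hypothetical accuracies for three subsets (illustrative only, not study results).
    internal_acc = [0.85, 0.80, 0.82]
    external_acc = [0.80, 0.78, 0.70]
    gaps = validation_gaps(internal_acc, external_acc)
    avg, sd = summarize_gaps(gaps)
    print(f"mean gap = {avg:.4f}, SD = {sd:.4f}")
```

    The same two functions apply unchanged to AUC values. Running them once per algorithm gives the kind of side-by-side comparison the abstract reports.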
    Relation: Available on campus after one year; not available off campus. Academic year 99 (ROC calendar, i.e., 2010-2011); 96 pages.
    Appears in Collections:[Dept. of Hospital and Health (including master's program)] Dissertations and Theses



    All items in CNU IR are protected by copyright, with all rights reserved.

