Predicting Primary Liver Cancer Survivability: A Comparison of Three Data Mining Methods


Please use this permanent URL to cite or link to this item: https://ir.cnu.edu.tw/handle/310902800/24597

Title:	Predicting Primary Liver Cancer Survivability: A Comparison of Three Data Mining Methods
Author:	胡嘉仁
Contributors:	Chia Nan University of Pharmacy & Science, Graduate Institute of Medical Information Management; 陳俞成
Keywords:	Primary Liver Cancer; Survival Prediction; Artificial Neural Network; Logistic Regression; Decision Tree Analysis
Date:	2011
Uploaded:	2011-10-26 15:41:51 (UTC+8)
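Abstract:	According to World Health Organization statistics from 2010, liver cancer is the second leading cause of death among men worldwide. In Taiwan, liver cancer ranks second among causes of cancer death, behind only lung cancer, and liver cancer mortality is markedly higher in men than in women. Liver cancer poses a major threat to public health: although its mortality rate has declined year by year, its incidence remains the highest of any cancer among men. By the time clinical symptoms appear and liver cancer is diagnosed, the best window for treatment has already passed. Because liver cancer is hard to detect and has high incidence and mortality, an important question is how effective tools can be used to predict precancerous characteristics, so that liver cancer becomes one of the cancers most readily detected early and cured, improving early detection and the outcomes of early treatment. Disease prognosis has traditionally been predicted with statistical methods such as logistic regression, the Kaplan-Meier method, and Cox proportional hazards models. In medicine, the hope is that artificial-intelligence data mining techniques, with tuned parameters, can better uncover possible combinations, support differential diagnosis to raise diagnostic accuracy, and even improve treatment and survival; such studies have been discussed continually in recent years. This study applies data mining to the primary liver cancer records in the United States SEER (Surveillance, Epidemiology, and End Results) cancer registry database for 1973 to 2007 (CIPUD, Cancer Incidence Public-Use Database), comprising 65,088 case records and 72 variables. After data cleaning, 14 variables relevant to predicting five-year survival of primary liver cancer were retained, together with 2,066 records diagnosed between 1988 and 2002, for analysis. Using subsets with different numbers of cases surviving less than five years and subsets from different diagnosis periods, three algorithms (artificial neural network, decision tree, and logistic regression) were compared on prediction accuracy, sensitivity, specificity, and area under the ROC curve in internal and external validation. In internal validation across datasets with different proportions of survival outcomes, decision tree analysis had the best overall average prediction accuracy, 83.59% with a standard deviation of 0.09, outperforming the neural network and logistic regression; this indicates that decision tree analysis was better than the other two models in internal learning performance and stability. In terms of generalization, on overall average the accuracy difference and the difference in area under the ROC curve for logistic regression, 0.0446 and 0.13 respectively, were better than those of the neural network and decision tree analysis, indicating that logistic regression tolerates the variability of new external data better; however, the standard deviations of its accuracy difference and AUC difference were higher, indicating instability.

According to World Health Organization statistics in 2001, liver cancer was the second leading cause of cancer death among men worldwide. In Taiwan, liver cancer was also the second leading cause of cancer death, behind lung cancer. Liver cancer mortality was significantly higher in men than in women. Liver cancer remains a major threat: although its annual mortality rate is declining, its incidence is still the highest of any cancer in men. By the time clinical symptoms lead to a diagnosis of liver cancer, the best time for treatment has been lost. Because liver cancer is difficult to diagnose and has high incidence and high mortality, how to use effective tools to diagnose it early enough to cure it is an important topic. In the past, logistic regression, the Kaplan-Meier method, and Cox proportional hazards models were often used to predict disease prognosis.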
In medicine, artificial intelligence data mining techniques with adjusted parameters have been used to improve diagnostic accuracy, treatment, and even survival rates, and such studies have continued to be discussed in recent years. In this study, the samples were patients diagnosed with primary liver cancer in the US cancer registry database (SEER) during 1973-2007. There were 65,088 cases with 72 variables. After data cleaning, 2,206 cases diagnosed during 1988-2002 and 14 variables remained. The prediction models were evaluated on accuracy, sensitivity, specificity, and the area under the ROC curve, and the three data mining methods were compared in internal and external validation. In internal validation across several datasets with different proportions of survival cases, decision tree analysis had the highest average prediction accuracy, 83.59% with a standard deviation of 0.09, better than the neural network and logistic regression. This showed that decision tree analysis was more accurate in internal learning than the other two models. In external validation, logistic regression had on average the smallest differences in accuracy and in area under the ROC curve (AUC), 0.446 and 0.13 respectively, and was therefore more reliable than the neural network and decision tree analysis. It is likely that logistic regression can describe new data more precisely. However, logistic regression was unstable, with higher standard deviations in its accuracy and AUC differences.
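The four evaluation measures named in the abstract, and the internal-versus-external differences it reports, can be illustrated with a minimal Python sketch. Everything below is an assumption for illustration, not the thesis's procedure: the labels and scores are made-up toy values, class 1 is arbitrarily treated as the positive class, the 0.5 threshold is hypothetical, and the "difference" is taken as the absolute gap between the internal and external value of a metric.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs in which the positive
    case gets the higher score; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Compute accuracy, sensitivity, specificity, and AUC for one dataset."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "auc": auc(y_true, scores),
    }


# Toy data only (hypothetical, not taken from SEER or from the thesis).
internal = evaluate([1, 1, 1, 0, 0, 0, 0, 1], [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7])
external = evaluate([1, 0, 1, 0], [0.6, 0.7, 0.3, 0.2])

# Assumed definition: absolute internal-versus-external gap per metric.
gaps = {name: abs(internal[name] - external[name]) for name in internal}
print(internal)
print(external)
print(gaps)
```

Under this assumed definition, a smaller gap means a model's performance on new data stays closer to its performance on the data it was built from, which is the sense in which the abstract describes logistic regression as generalizing better.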
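Relation:	Open on campus after one year; never open off campus; academic year: 99; 96 pages
Appears in collections:	[Department (Graduate Institute) of Healthcare Administration] Theses and Dissertations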

