Abstract: | Science and industry constantly pose challenging problems to the field of statistics. In the early days, these problems came mostly from agricultural and industrial experiments and were relatively small in scope. With the advent of computers and the information age, statistical problems have exploded in both size and complexity. The statistician's job is to extract important patterns and trends from large amounts of data and to understand “what the data says.” We call this process “learning from data.” Learning problems can be roughly categorized as either supervised or unsupervised. In supervised learning, the goal is to predict the value of a dependent variable from a number of independent variables; in unsupervised learning, there is no dependent variable, and the goal is to describe the associations and patterns among the collected variables. During the Industrial Revolution, factory mass production had a major impact on the economy. With the arrival of the information age, the orientation has shifted from technology-driven to market- or consumer-driven. 
In recent years, customer relationship management (CRM) and one-to-one marketing have become popular topics. A sound CRM model and mechanism helps improve customer loyalty, lowers management and marketing costs, and encourages customers to buy more products, thereby increasing revenue. Achieving these goals starts with understanding the customers, which in turn starts with collecting data and organizing and managing it efficiently. Data come from many sources, for example hypermarket transaction records, credit card purchase records, application forms, and telephone records. Organizing and managing the collected data efficiently relies on the data warehouse. Data warehousing is a new topic in database technology: thanks to advances in information technology, computers help us operate on, compute with, and store large amounts of data efficiently. Analyzing and understanding the data, and turning it into useful knowledge or information, relies on data mining. Data mining is the process and technology of finding useful information in a huge data warehouse; its main technical tools were developed in machine learning, artificial intelligence, and statistics. In general, data mining includes the following functions: classification, estimation, prediction, affinity grouping or market basket analysis, clustering, and description. We will study the role of statistics in data mining. |