Many different methods can be used to investigate the data we collect, including data visualization, predictive modeling, market basket analysis, time series analysis, and text analysis. Data mining is both a process and a collection of techniques, and it is one step in “Knowledge Discovery in Databases”.
The Knowledge Discovery in Databases (KDD) process is commonly defined with the stages:
- Data Mining
Data mining, still something of a buzzword, is the analysis step in the “Knowledge Discovery in Databases” (KDD) process. It is the computational process of discovering patterns in large data sets, using methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.
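As a small illustration of what "discovering patterns" can mean in practice, the sketch below counts which pairs of items appear together in a set of transactions, which is the basic idea behind market basket analysis. The function name `frequent_pairs`, the `min_support` threshold, and the toy shopping data are all assumptions made up for this example; real data mining tools use more scalable algorithms than this brute-force pair count.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs whose support meets min_support.

    Support is the fraction of transactions that contain both items.
    """
    n = len(transactions)
    counts = Counter()
    for basket in transactions:
        # De-duplicate and sort so each pair is counted once per basket
        # and always appears in the same (alphabetical) order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical toy data: each inner list is one customer's basket.
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]

pairs = frequent_pairs(transactions, min_support=0.5)
for pair, support in sorted(pairs.items()):
    print(pair, round(support, 2))
```

Here the "understandable structure" is simply a dictionary mapping each frequent pair to its support, which could then feed a report or a recommendation rule.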
Data mining deals with heterogeneous data, sometimes with a complex internal structure, such as multimedia that combines images, video, and text. Because most data collected in healthcare during routine patient treatment comes from heterogeneous populations, data mining techniques are well suited to healthcare datasets.
Data mining and statistics have generally developed in different domains. Statisticians are primarily interested in inference; data miners, in exploratory data analysis. Nevertheless, there are some instances where data mining and statistics have blended. Many statisticians remain dubious about the data mining process. Others are concerned by the lack of a theoretical framework similar to the one for inferential statistics, especially since data mining tends to be algorithm-based.
Statistics and data mining differ in the use of machine learning methods, the volume of data, and the role of computational complexity.
Most of the focus to date has been on the importance of optimizing data management and analytics. There has been very little analysis of the challenges and opportunities in capturing Big Data. For that matter, there is very little discussion of capturing, aggregating, managing, mediating, or brokering that data.
We see an emerging opportunity for Data as a Service (DaaS), which is not limited to Big Data (e.g. unstructured data) but also includes structured data from public, private, and hybrid sources. Some of the data will be acquired over Application Programming Interfaces (APIs), which may be telephony, enterprise, or a “mash-up” hybrid of both.
As a result of an evolving DaaS ecosystem, many new applications will be developed within enterprises for their own internal needs, for their customers (CRM), and as completely new product offerings.