Chapter 1 Exercises
1. What is data mining? In your answer, address the following:
Data mining refers to the process or method that extracts or "mines" interesting knowledge or patterns from large amounts of data.
(a) Is it another hype?
Data mining is not another hype. Instead, the need for data mining has arisen due to the wide availability of huge amounts of data and the imminent need for turning such data into useful information and knowledge. Thus, data mining can be viewed as the result of the natural evolution of information technology.
(b) Is it a simple transformation or application of technology developed from databases, statistics, machine learning, and pattern recognition?
No. Data mining is more than a simple transformation
The resulting description could be a general comparative profile of the students, such as: 75% of the students with high GPAs are fourth-year computing science students, while 65% of the students with low GPAs are not.
* Association is the discovery of association rules showing attribute-value conditions that occur frequently together in a given set of data. For example, a data mining system may find association rules like major(X, “computing science”) → owns(X, “personal computer”) [support = 12%, confidence = 98%], where X is a variable representing a student. The rule indicates that, of the students under study, 12% (support) major in computing science and own a personal computer. There is a 98% probability (confidence, or certainty) that a student in this group owns a personal computer.
* Classification differs from prediction in that the former constructs a set of models (or functions) that describe and distinguish data classes or concepts, whereas the latter builds a model to predict some missing or unavailable, and often numerical, data values. Their similarity is that they are both tools for prediction: classification is used for predicting the class label of data objects, and prediction is typically used for predicting missing numerical data values.
* Clustering analyzes data objects without consulting a known class label. The objects are clustered or grouped based on the principle of
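As a rough illustration of the support and confidence figures in the association rule above, here is a minimal Python sketch. The function name, the record layout, and the five toy student records are all hypothetical and chosen only for illustration; they are not taken from the exercise and do not reproduce its 12% and 98% values.

```python
def support_and_confidence(records, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    records: list of dicts, one per student (hypothetical toy data).
    antecedent, consequent: (attribute, value) pairs.
    """
    a_attr, a_val = antecedent
    c_attr, c_val = consequent

    n_total = len(records)
    # Records that satisfy the left-hand side of the rule.
    n_antecedent = sum(1 for r in records if r.get(a_attr) == a_val)
    # Records that satisfy both sides of the rule.
    n_both = sum(
        1 for r in records
        if r.get(a_attr) == a_val and r.get(c_attr) == c_val
    )

    # Support: fraction of all records that satisfy both sides.
    support = n_both / n_total if n_total else 0.0
    # Confidence: fraction of left-hand-side records that also satisfy the right-hand side.
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical toy data, not the data set from the exercise.
students = [
    {"major": "computing science", "owns": "personal computer"},
    {"major": "computing science", "owns": "personal computer"},
    {"major": "computing science", "owns": "none"},
    {"major": "biology", "owns": "personal computer"},
    {"major": "biology", "owns": "none"},
]

support, confidence = support_and_confidence(
    students,
    ("major", "computing science"),
    ("owns", "personal computer"),
)
print(f"support = {support:.0%}, confidence = {confidence:.0%}")
```

In this toy set, two of the five students both major in computing science and own a personal computer, and two of the three computing science majors own one, so the two measures come out differently even though they count the same matching records.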
To begin with, Dell Software, an information technology enterprise, describes data mining as “an analytic process designed to explore data in search of consistent patterns and/or systematic relationships between variables, and then to validate the findings by applying the
known as data classification. Data classification; how the map is divided according to data in
Data mining is essentially the ability to discover new information by exploring various databases of existing information. According to Laura and Jack Cook, data mining "facilitates the discovery of previously unknown relationships among the data. …These operations present results that users already intuitively knew existed in the database."[2] As an example, let us take a school system consisting of three databases: one that stores the student profiles, consisting of name and identification number; another that stores student grades by identification number; and a last one that stores all the transactions made at the bookstore with the student identification card. This is a simple example, but it should illustrate our point. Alone, the separate databases might not tell us much. With data mining techniques, the process might be able to tell us that in a particular school year, students of a certain ethnic background obtained above a 3.0 GPA, or that the bookstore sold mostly engineering books to students last year, or even that students who obtained above a 3.0 GPA were the ones who bought engineering books. More specifically, the technology might be smart enough to determine that John Doe from Ireland had a 3.32 GPA in his engineering classes, even though he did not buy any engineering books from the bookstore. This type of technology is a very powerful source of
Data mining is the process of discovering interesting knowledge and significant structures in large amounts of data gathered and stored in data warehouses or other information repositories.
Data mining is another concept closely associated with large databases such as clinical data repositories and data warehouses. However, data mining, like several other IT concepts, means different things to different people. Health care application vendors may use the term data mining when referring to the user interface of the data warehouse or data repository. They may, for example, refer to the ability to drill down into data as data mining. Used more precisely, however, data mining refers to a sophisticated analysis tool that automatically discovers patterns among data in a data store. Data mining is an advanced form of decision support. Unlike passive query tools, the data mining analysis tool does not require the user to pose individual, specific questions to the database. Instead, this tool is programmed to look for and extract patterns, trends, and rules. True data mining is currently used in the business community for marketing and predictive analysis (Stair & Reynolds, 2012). This analytical data mining is, however, not currently widespread in the health care community.
Data mining is an analytical process that primarily involves searching through vast amounts of data to spot useful, but initially undiscovered, patterns. The data mining process typically involves three major steps: exploration, model building and validation, and finally deployment.
A series of operations on data by a computer in order to retrieve or transform or classify
Data mining uses computer-based technology to evaluate data in a database and identify trends. Effective data mining helps researchers predict economic trends and pinpoint sales prospects. The data being mined is typically stored in data warehouses, which are sophisticated customer databases that allow managers to combine data from several different organizational functions.
What is data mining? Data mining is the derivation of new information from massive amounts of data in databases (Sauter, 2014, p. 148). Chowdhurry argues that data mining is part of KDD. KDD, or knowledge discovery in databases, is a process that includes data mining. In addition to data mining, KDD includes data preparation, modeling, and evaluation. KDD is at the heart of this research field. The field is multidisciplinary and includes data visualization, machine learning, database technology, expert systems, and statistics. Overall, the use of case-based reasoning and data mining tools within an information system would create a CBR system that solves new problems with adapted solutions and could be used in many industries, such as education and healthcare (Chowdhurry,
Data mining is a class of database applications that looks for hidden patterns in a group of data that can be
18. _____ is the process of extracting hidden predictive information from large databases.
With the increased and widespread use of technology, interest in data mining has grown rapidly. Companies now use data mining techniques to examine their databases for trends, relationships, and outcomes, both to enhance their overall operations and to discover new patterns that may allow them to better serve their customers. Data mining provides numerous benefits to businesses, government, and society, as well as to individuals. However, like many technologies, data mining also has negative consequences, such as the invasion of privacy rights. This paper explores the advantages as well as the disadvantages of data mining. In addition, the ethical and global issues regarding the use of data mining
Data mining is really just the next step in the process of analyzing data. Instead of running queries on standard or user-specified relationships, data mining goes a step further by finding meaningful relationships in data. These may be relationships that were thought not to exist, or ones that give a more insightful view of the