Combining Structured and Unstructured Data
Research Proposal - Plan A
(11/11/2016)
Option: Information System
Proposed Completion Date: 11/11/2016
This proposal is submitted to the Computer and Information Science faculty in partial fulfillment of the requirements for the degree of
Master of Science in Computer and Information Science
TABLE OF CONTENTS
1. Introduction
1.1 Background Research
1.2 Problem Area
2. Research Approach
2.1 Hypothesis
2.2 Analysis Approach
3. Expected Research Accomplishment
3.1 Evaluation Plan of Research Approach
3.2 Significance of Study
4. Schedule
References
1. Introduction
At every moment of our lives, we handle a tremendous amount of data. This data becomes a store of value only if it can be turned into searchable information through a series of analysis steps. The big challenge is that roughly 90% of this data is unstructured, and unstructured data is growing faster than structured data. Unstructured data arrives without any predefined structure and does not fit any relational database schema.
We use the expression "a huge database" when data accumulates and grows rapidly. "Huge" is a relative term, but in general a huge database is a data set that exceeds the size or capacity of conventional database tools to fetch, store, manage, and analyze. This means the challenge is manifold when we try to analyze and extract meaningful value from a huge unstructured database.
McKinsey researchers defined big data as "datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze," and acknowledged that "this definition is intentionally subjective and incorporates a moving definition of how big a dataset needs to be in order to be considered big data."
Big data is the most popular theme in today's technology. This research surveys techniques and technologies for extracting, storing, distributing, analyzing, and managing data at high velocity from structured and unstructured sources, and for handling data at extreme scale. Big data has the capacity to improve predictions, save money, and enhance decision making in fields such as traffic control, weather forecasting, disaster prevention, fraud control, business transactions, education, health, and national security.
Big data is any data too large for traditional databases to handle, typically ranging from terabytes to petabytes. Big data is generated from various sources, such as social media networks, oil wells, mobile phone conversations, and weather sensors.
Relational databases play a major role in making many apps and programs work. They provide an easy way to store large amounts of data in a consistent, non-duplicating, and maintainable way to be used by developers for analytical or software use ("Advantages of a relational database", n.d.). However, more and more applications and companies with a tremendous amount of data, such as search engines, social networks, and e-commerce sites, have been requiring a level of speed and scalability that relational databases cannot provide ("Why NoSQL?", n.d.). NoSQL is a name given to a quickly growing type of database known as non-relational databases, which are being used to store and manage huge amounts of structured, semi-structured, and non-structured data known as "Big Data" ("Why NoSQL?", n.d.). With the advent of social networks and apps with millions of users, the rate of growth of non-structured and semi-structured data is exponential, and the value in being able to quickly traverse it, analyze it, and use it for development is also growing quickly (McGuire, Manyika, & Chui, 2012).
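The contrast between a fixed relational schema and a schemaless document store can be illustrated with a minimal sketch using Python's built-in sqlite3 and json modules. The table name, fields, and records below are hypothetical, invented only for illustration:

```python
import json
import sqlite3

# Relational storage: a fixed schema must be declared up front,
# and every row must conform to it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Alice', 'alice@example.com')")

# Document (NoSQL-style) storage: each record is a self-describing JSON
# document, so records may carry different fields without schema changes.
documents = [
    json.dumps({"id": 1, "name": "Alice", "email": "alice@example.com"}),
    json.dumps({"id": 2, "name": "Bob", "followers": 1200, "tags": ["sports"]}),
]

row = conn.execute("SELECT name FROM users WHERE id = 1").fetchone()
print(row[0])                            # Alice
print(json.loads(documents[1])["tags"])  # ['sports']
```

The second document carries fields the first one lacks, which a relational table could only accommodate by altering the schema; this flexibility is the property that makes document stores attractive for semi-structured data.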
Raw data used in generating this report came mainly from two sources: the United States Department of Agriculture and the Federal Reserve Bank of St. Louis Economic Data. All data manipulated are national time-series data with monthly frequency. All data were de-trended; the base year is 2007.
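One common way to de-trend a monthly series is to fit a linear trend by ordinary least squares and keep the residuals. The proposal does not specify its exact de-trending method, so the sketch below is only one plausible approach, and the numbers are invented for illustration rather than taken from the USDA or FRED data:

```python
# Hypothetical monthly series; values are illustrative only.
series = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0]

n = len(series)
t = list(range(n))

# Ordinary least-squares fit of a linear trend: y = a + b*t
t_mean = sum(t) / n
y_mean = sum(series) / n
b = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, series)) / \
    sum((ti - t_mean) ** 2 for ti in t)
a = y_mean - b * t_mean

# De-trended series: residuals around the fitted trend line.
detrended = [yi - (a + b * ti) for ti, yi in zip(t, series)]
print([round(x, 2) for x in detrended])
```

By construction, the residuals of a least-squares linear fit sum to zero, so any remaining variation in `detrended` reflects movement around the trend rather than the trend itself.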
Unstructured data analytics is an essential part of any Big Data offering. Making sense of unstructured data is time consuming and complicated, yet the insights generated from these data are valuable and meaningful if proper techniques are utilized.
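As a minimal sketch of one such technique, free text can be given a first layer of structure by tokenizing it and counting term frequencies with the standard library. The two-sentence corpus below is invented for illustration:

```python
import re
from collections import Counter

# A toy corpus standing in for unstructured text (emails, documents, posts).
corpus = [
    "Big data systems must store and analyze unstructured data.",
    "Unstructured data grows faster than structured data.",
]

# Tokenize, lowercase, and count terms: a first step toward deriving
# structure (a term-frequency table) from free text.
tokens = [w for doc in corpus for w in re.findall(r"[a-z]+", doc.lower())]
freq = Counter(tokens)
print(freq.most_common(2))  # [('data', 4), ('unstructured', 2)]
```

A term-frequency table like this is itself structured data, so it can be stored and queried with conventional tools; richer techniques (entity extraction, topic modeling) follow the same pattern of mapping free text into structured representations.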
For many years relational databases held most of the data, but the introduction of NoSQL databases changed this: most unstructured data is now sent to NoSQL databases. Relational database systems, which performed well before the internet and cloud computing era, are now unable to keep pace with new technologies. To stabilize this situation, new design requirements were set. Meeting these challenges calls for a highly scalable, unstructured data model with high performance, which is why NoSQL databases were chosen (Muhammad Mughees, 2013).
The rise of new technologies, businesses, communications, and devices has produced data at large scale. About 90% of the data in today's world was created in the last two years alone. The information contained in that data posed a big risk to many organizations, because the technology of the time managed data with a traditional approach consisting of a user, a centralized system, and a relational database. This style had various drawbacks, including two key problems: limited storage capacity and slow data processing.
Understanding what unstructured data is comes first, before trying to handle it. In simple terms, unstructured data is data that cannot be stored in the form of rows and columns. It can be anything, including email files, text documents, presentations, and image and video files.
Data has always been analyzed within companies and used to benefit the future of businesses. However, how data is stored, combined, analyzed, and used to predict consumer patterns and tendencies has evolved alongside the numerous technological advancements of the past century. Databases began on computer hard disks, and in 1965, after many other discoveries including voice recognition, "the US Government plans the world's first data center to store 742 million tax returns and 175 million sets of fingerprints on magnetic tape." The evolution of data into large databases continued in 1991, when the internet began to spread and "digital storage became more cost effective than paper." With the constant increase in digitally supplied data, Hadoop was created in 2005, and from that point forward "14.7 Exabytes of new information are produced this year," a figure that keeps rising rapidly with the many mobile devices people in our society have today (Marr). The evolution of the internet, and then the expansion in the number of mobile devices society has access to, has led companies to need large central database management systems in order to run efficient and successful businesses.