PROJECT PROPOSAL
Moving from Structured to non-Structured Data
PROJECT ADVISOR
PROJECT TEAM:
ABSTRACT
The promise of data-driven decision-making is now broadly recognized, and there is growing enthusiasm for the notion of “Big Data.” While the promise of Big Data is real (for example, it is estimated that Google alone contributed 54 billion dollars to the US economy in 2009), there is currently a wide gap between its potential and its realization.
Foster Provost and Tom Fawcett. “Data Science and its Relationship to Big Data and Data-Driven Decision Making,” Big Data, 2013. http://online.liebertpub.com/doi/full/10.1089/big.2013.1508
Big Data is a broad term for data sets so large or complex that they are very difficult to process using traditional data processing applications. Challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and information privacy. In common usage, the term big data has largely come to refer simply to the use of predictive analytics. Big data also denotes a set of techniques and technologies that require new forms of integration to uncover large hidden value in datasets that are diverse, complex, and of a massive scale. When big data is effectively and efficiently captured, processed, and analyzed, companies
The emergence of big data has given organizations new avenues for using data to improve different aspects of their operations. Be it customer service, research and development, or market position, Big Data has the potential to be a significant driving force in all these areas. However, there is still a significant gap between the ability of Big Data to produce insightful analytical information from real-time data and the ability of organizations to capture and use this readily available tool. This is due in part to the fact that the systems and processes needed to make full use of Big Data are currently lacking in most organizations. This lack of a conducive habitat for Big Data is further magnified in new organizations without any knowledge of Big Data. For organizations that have little to no knowledge of Big Data, there must be a thorough assessment of the benefits of big data and how they could improve the organization's overall place in the market. Steps also need to be taken towards designing frameworks that will enable the organization to better capture and use Big Data.
Heterogeneity, scale, timeliness, complexity, and privacy problems with Big Data impede progress at all phases of the pipeline that can create value from data. The problems start right away during data acquisition, when the data tsunami requires us to make decisions, currently in an ad hoc manner, about what data to keep and what to discard, and how to store what we keep reliably with the right metadata. Much data today is not natively in structured format. For example, tweets and blogs are weakly structured pieces of text, while images and video are structured for storage and display but not for semantic content and search. Transforming such content into a structured format for later analysis is a major challenge. We will investigate how structured big data can be transformed into unstructured data to improve performance. Storage price trends show that storing large unstructured data sets is now affordable. As far as performance is concerned, managing big data in unstructured form is more efficient. So as far as revenue is concerned this research will provide
Businesses thrive when they have the most accurate, up-to-date, and relevant information at their disposal. This information can be used for a plethora of pertinent markers in small and large businesses, relating to accounting, investments, consumer activity, and much more. Big data is a term used to describe the extremely large amounts of data that flood a business every day. For decades, big data has been a growing field, facing controversy on many levels, but of late it has been a major innovator in the challenge of making businesses more sustainable. Big data is often criticized for over-generalization and for sometimes failing to produce meaningful results. When applied correctly, data analysis can bring earth-altering information to the table.
The term “Big Data” has been around for quite some time and has been catching everyone’s attention with remarkable speed. A plethora of questions come to mind. What is big data? Is it something absolutely new? How can it be leveraged to create value for an organization? For many years, companies have used various transactional records stored in relational databases to make competitive business decisions. But how long can we depend on these traditional methods of analysis to reach conclusions? There is an ocean of non-traditional, less structured data such as weblogs, social media, email, sensors, and photographs that can be mined not only for useful information but also to make strategic decisions.
Over the past several years, the term “Big Data” has been used more frequently to help
Big Data is all around us, from creating our anniversary videos on Facebook and generating Netflix recommendations to using cellphones as weather stations to improve predictions (source 4) and combining that data with machine learning for interesting insights (source 5). All this data needs to be processed, creating a multitude of jobs in the data science field. Companies whose products rely heavily on the internet are investing heavily in Big Data, companies such as Uber, Spotify, Google and IBM to name a
Improvements brought by big data include creating more transparency among stakeholders by making big data more accessible (Manyika, 2011). Having access to data sets and being able to manipulate them enables a different way of decision making that brings more science into management. Companies can segment and analyze data in near real time. The analysis improves decision making, minimizes risks, and uncovers more insights to enable innovation.
Volume is often regarded as the primary attribute of big data. With that in mind, many people define big data in terabytes, sometimes petabytes, but big data can also be quantified by counting records, transactions, tables, or files (Russom, 2011). Volume refers to the mass quantities of data that organizations are trying to harness to improve decision-making across the enterprise (Schroeck et al., 2012). The volumes of data have continued to increase at an unprecedented rate over the last couple of years. The sheer volume of data that is stored or available for storage today is exploding: it is expected that by the year 2020, 40 zettabytes (ZB) of data will be stored (Zikopoulos et al., 2012) which
Structured data is data organized into rows and columns. The purpose of this organization is to enable machines that apply limited logic to understand and process the information. The best-known example of storing structured data and applying operations to it is SQL. Unstructured data is the complete opposite: it is everything that isn’t in a structured format. It was never meant to be understood by machines, as this type of information has been created specifically to be understood by a human mind. Examples include emails, books, letters, social media posts, images, and audio and video files. The most common and expensive operation on structured data is cascading and a defined schema of tables that can’t allow
Data analytics has drastically changed how businesses operate on a day-to-day basis. It can make or break a business. It involves “exploring huge volumes of data to provide greater insight and intelligence, and doing so quickly” (Turban, Volonino, & Wood, 2013). According to the Better Business Outcomes white paper published by IBM, about five years ago IBM observed that the planet was becoming more instrumented, interconnected, and intelligent. Twenty thousand engagements later, though the paper doesn’t say how many years later, IBM has gained critical knowledge of how big data analytics can improve conditions for organizations in nearly every industry. (Better business outcomes with IBM Big Data &
In 2013 the overall volume of data created and copied in the world was 4.4 ZB. This volume is doubling in size every two years, and by 2020 the digital universe (the data we create and copy annually) will reach 44 ZB, or 44 trillion gigabytes [1]. With this massive increase in global digital data, the term Big Data is mainly used to describe large-scale datasets. Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making [2]. The volume of Big Data represents the magnitude of the data, while variety refers to its heterogeneity. Computational advances create a chance to use various types of structured, semi-structured, and
If the Information Age began in the 1990s with the rise of digital technology, then we’ve now officially entered the Age of Big Data, wherein companies like Google, Facebook, IBM, Teradata, Oracle, and SAS have the capacity to gather a lifetime’s worth of data about customers and their behavior.
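The growth figures quoted earlier (4.4 ZB in 2013, doubling every two years, 44 ZB or 44 trillion gigabytes by 2020) can be checked with a little arithmetic. The sketch below is only a consistency check on those quoted numbers, assuming simple exponential growth and decimal (SI) units; the function names are illustrative and not taken from any cited source.

```python
import math

# Figures as quoted in the text; they are not independently verified here.
START_YEAR, START_ZB = 2013, 4.4
END_YEAR, END_ZB = 2020, 44.0
DOUBLING_YEARS = 2.0  # "doubling in size every two years"


def projected_volume(start_zb, years, doubling_years):
    """Volume after `years` of exponential growth with the given doubling time."""
    return start_zb * 2 ** (years / doubling_years)


def implied_doubling_time(start_zb, end_zb, years):
    """Doubling time (in years) that takes start_zb to end_zb over `years`."""
    return years / math.log2(end_zb / start_zb)


def zb_to_gb(zb):
    """Convert zettabytes to gigabytes, assuming 1 ZB = 10**21 B and 1 GB = 10**9 B."""
    return zb * 10**21 / 10**9


years = END_YEAR - START_YEAR
print(round(projected_volume(START_ZB, years, DOUBLING_YEARS), 1))   # ZB in 2020 at a 2-year doubling
print(round(implied_doubling_time(START_ZB, END_ZB, years), 2))      # years per doubling implied by 4.4 -> 44
print(zb_to_gb(END_ZB))                                              # gigabytes in 44 ZB
```

Under these assumptions a strict two-year doubling gives roughly 50 ZB by 2020, and the tenfold growth from 4.4 ZB to 44 ZB implies a doubling time of a little over two years, so the quoted figures are broadly consistent with one another.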
Nowadays, terabytes to petabytes of data are stored and transmitted by numerous sources, and organizations have realized that this data contains tangible value that has the potential to change the fortunes of a business. Top firms use the valuable insights gained from this data to assist their decision-making process. These huge chunks of data consist of structured, semi-structured and unstructured data. Organizations have shifted their focus towards exploring semi-structured and unstructured data generated through social media activity, personal media information and geolocation data.
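To make the three categories concrete, the following minimal sketch shows one record in structured form (a SQL table row), semi-structured form (JSON), and unstructured form (free text). The table, field names, and sample values are hypothetical, and the conversion functions are an illustration only, not the transformation method this proposal sets out to develop.

```python
import json
import sqlite3

# Hypothetical sample data; the schema and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, city TEXT, amount REAL)"
)
conn.execute("INSERT INTO orders VALUES (1, 'Alice', 'Boston', 42.5)")


def fetch_structured(connection):
    """Structured: fixed columns in a table, read back with SQL."""
    cur = connection.execute("SELECT id, customer, city, amount FROM orders")
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


def to_semi_structured(row):
    """Semi-structured: self-describing JSON, with no fixed schema enforced."""
    return json.dumps(row, sort_keys=True)


def to_unstructured(row):
    """Unstructured: free text written for a human reader."""
    return (
        f"{row['customer']} from {row['city']} placed order "
        f"{row['id']} for ${row['amount']:.2f}."
    )


rows = fetch_structured(conn)
for row in rows:
    print(to_semi_structured(row))
    print(to_unstructured(row))
```

The structured form can be filtered and aggregated directly with SQL, whereas recovering the same fields from the free-text form would require parsing or text mining.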