Big Data
Introduction
Information is one of the most important resources available to a company; it informs the decisions that determine what the company will do over the next day, the next month, and the next year. The core component of this resource is data. With a little data, a company has a little information with which to plan future operations. The same company with large amounts of data, or big data as it is known, can find trends much more accurately, become more efficient, increase productivity, and in turn be more profitable. What separates data from big data? What defining characteristics does big data have? How can such a massive resource be fully utilized? And why should businesses, especially smaller businesses, even bother with such an undertaking?
To understand what big data is, one must first look at what came before the big data revolution that some large companies are only now on the cusp of. Before the advent of big data, gathering data was fairly cost prohibitive for two reasons. First, storing large amounts of data was expensive. Second, computers did not have the processing power that most businesses work with today, so what companies were trying to accomplish could take far longer or not be possible at all with the equipment and techniques in use. As storage has become less burdensome, it has become easier to collect and keep larger amounts of data, which has allowed some companies to use old data for things outside its original intended purpose. A business normally collects data toward a specific goal or to gain a specific understanding; once that meaning had been extracted, not much else was done with the data, and it was typically thrown away. With storage no longer as cost prohibitive, companies like Google were able to reuse old data for other purposes and glean insight beyond what the initial analysis had revealed. This is the idea behind big data: what companies hope to gain is information beyond what is explicit in very large sets of data.
Key information
How is data any different from big data? At what point does the size of this raw information change how it is labeled? The question is actually misleading, because big data is identified not just by its size but by three defining characteristics. According to the web site Gartner.com (Laney, 2001), the main challenges of data management relate to volume, variety, and velocity. Volume is the actual size of the data being stored. Because data storage has become more efficient over time, the threshold at which big data starts has shifted as technology has improved.
Even with all of the advances in storage architecture and data compression, the amount of data available continues to grow faster than individual companies can manage on their own. Cisco has estimated that by the end of 2015 there will be 4.8 zettabytes of internet traffic throughout the world ("Cisco global cloud," n.d.), and by the end of 2020 this number will be nearly 50 zettabytes. As a result, the units many people are used to, such as gigabytes, terabytes, and for some even petabytes, are becoming insufficient to describe the enormous amount of potential information that will be available. Big data, however, can still be made up of small pieces.
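To put these units in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration: it assumes decimal (SI) units, takes the 4.8 and 50 zettabyte figures quoted above at face value, and uses function and variable names made up for this example.

```python
# Decimal (SI) storage units, expressed in bytes.
UNITS = {
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
    "zettabyte": 10**21,
}

def convert(amount, from_unit, to_unit):
    """Convert an amount of data between two of the units above."""
    return amount * UNITS[from_unit] / UNITS[to_unit]

def annual_growth_rate(start, end, years):
    """Compound annual growth rate implied by going from start to end."""
    return (end / start) ** (1 / years) - 1

# Figures quoted in the text (assumed, not independently verified).
traffic_2015_zb = 4.8
traffic_2020_zb = 50

terabytes_2015 = convert(traffic_2015_zb, "zettabyte", "terabyte")
growth = annual_growth_rate(traffic_2015_zb, traffic_2020_zb, 5)

print(f"{traffic_2015_zb} ZB is {terabytes_2015:.1e} terabytes")
print(f"Implied growth: about {growth:.0%} per year")
```

Under these assumptions, the quoted figures imply traffic growing by roughly 60 percent each year, and a single zettabyte is a billion terabytes.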
Large amounts of small pieces of information can also be considered big data if there are enough of them to add up. Twitter is a prime example. People who use the site can send out a tweet, which is essentially a text message limited to one hundred and forty characters that is broadcast to all of their followers, or to the general public if that is the intention. A single tweet is tiny, but tweets still fall under big data because of the sheer number of people sending them all day, every day. Twitter also illustrates the second characteristic of big data, variety. Variety refers to the formats that data can come in. Business data is normally highly structured: contained in charts, listed numerically, or organized by some other rigid means. With the rise of social media and the very unstructured format of its content, big data systems have to process and make sense of what people are tweeting or what their Facebook status might be, and incorporate that into predictable patterns of behavior. The third characteristic of big data is velocity, which covers not only the speed at which data arrives but also how frequently it is updated. The trend is toward gathering data much faster and also analyzing and acting on the resulting information quickly.
To summarize, the three characteristics are volume, the size of the information, which is moving from terabytes to zettabytes; variety, how the content is structured, which is moving toward unstructured data; and velocity, how fast information comes in and goes out, which is moving from batch processing to streaming. As data evolves in these three ways, the methods used to handle it must evolve as well. Keeping all of the data in a relational database alone can no longer keep up with the characteristics that define big data, so a new approach to handling this information is needed.
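The difference between batch and streaming can be illustrated with a small Python sketch. The readings are invented sample values and the function names are hypothetical; the point is only that a batch job waits for all of the data before answering, while a streaming job keeps an up-to-date answer as each value arrives.

```python
def batch_average(readings):
    """Batch style: collect every reading first, then compute once."""
    return sum(readings) / len(readings)

def streaming_average(reading_stream):
    """Streaming style: update the average as each reading arrives."""
    count = 0
    mean = 0.0
    for reading in reading_stream:
        count += 1
        mean += (reading - mean) / count  # incremental mean update
        yield mean

# Invented sample values standing in for incoming measurements.
readings = [12.0, 15.0, 9.0, 18.0, 11.0]

batch_result = batch_average(readings)
running = list(streaming_average(iter(readings)))

print("Batch answer after all data:", batch_result)
print("Streaming answers along the way:", running)
```

Both approaches end at the same final value, but the streaming version never needs to hold the whole data set at once, which matters when data arrives faster than it can be stored.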
With that better understanding of what makes up big data, how do companies use this vast and ever-growing resource to improve business? Depending on its size and scope, a company can manage its own big data using various programs available in the marketplace. Currently one of the most widely used is Hadoop, an open source project of the Apache Software Foundation. Hadoop is used by many large companies such as Amazon.com, Google, and Facebook.
But what about smaller companies, for which maintaining large inventories of equipment to run this software is just not feasible? Several hosted services are available from providers such as Amazon.com and IBM, along with plenty of online marketplaces. Through them, a smaller company that lacks the capital to invest in the equipment and the specialists needed to run and maintain such systems can still take advantage of some of the benefits of big data.
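Hadoop is best known for a programming style called MapReduce, in which a job is split into a map step that runs on each piece of data independently and a reduce step that combines the results. The sketch below imitates that style in plain Python on a single machine. It does not use Hadoop or any of its APIs, and the function names and sample records are invented for illustration.

```python
from collections import defaultdict

def map_phase(record):
    """Map: turn one input record into (key, value) pairs."""
    for word in record.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

# Invented sample records; a real job would read these from many machines.
records = ["store sales up", "online sales up", "store traffic down"]

mapped = [pair for record in records for pair in map_phase(record)]
grouped = shuffle(mapped)
word_totals = dict(reduce_phase(key, values) for key, values in grouped.items())

print(word_totals)
```

Because each record is mapped independently, the map step can in principle be spread across many computers, which is what makes this style a fit for very large data sets.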
So why should a company take the step toward big data? It makes obvious sense for large establishments that already have large amounts of capital invested in computers, but less obviously for organizations like hospitals, which are more focused on taking care of patients. Yet making use of big data is important in many different fields, not just medicine. In agriculture it can be used to make more accurate predictions about things like the health of a crop and patterns in the weather. Weather models already use many big data techniques to develop the most comprehensive forecast available for the consumer. One aspect of big data is that experts in given fields will no longer be needed as much, because the same way of thinking can eventually be applied by a computer ("The big data," 2013).
Even with all of this hype and the evident indication that this is a huge movement in computing that is just taking off, is there any reason why a company should not bother with it for the time being? The answer is of course yes. Depending on the size of the organization, there may simply not be the resources available to dedicate to big data, even if the only money being spent is on the services of one of the hosted providers. It is a decision that must not be made lightly, and the company should make sure it is aware of all the options and of just how beneficial a big data asset could be for different kinds of business.
Big data is also being processed and analyzed outside of business, within the scientific community. Certain scientific projects generate very large amounts of data very quickly, and that data needs to be recorded and then studied to find every piece of relevant information the research can yield. Few projects have had as much attention focused on them as the Large Hadron Collider (LHC), located at CERN near Geneva, Switzerland. The projects under way there have drawn attention from scientists and non-scientists alike, and whether or not people were aware of it, big data is very much involved. An interesting aspect of the information gathered at CERN is how many sensors are available and how much data could be gathered if the scientists wanted all of it. The LHC has one hundred and fifty million sensors that deliver data forty million times per second, which could amount to almost 500 exabytes of data per day (LHC Brochure, 2013), a much larger amount than the world currently produces on a daily basis.
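The scale of these figures is easier to appreciate with a quick back-of-the-envelope calculation. The Python sketch below simply takes the numbers quoted above at face value and assumes decimal (SI) exabytes; the variable names are made up for this example, and the result is an illustration of the arithmetic rather than a statement about how the LHC actually records data.

```python
SENSORS = 150_000_000                           # sensors, as quoted in the text
DELIVERIES_PER_SENSOR_PER_SECOND = 40_000_000   # as quoted in the text
EXABYTE = 10**18                                # bytes, decimal (SI) definition
SECONDS_PER_DAY = 24 * 60 * 60

# Total number of sensor readings produced every second.
total_readings_per_second = SENSORS * DELIVERIES_PER_SENSOR_PER_SECOND

# The quoted daily total, spread evenly over one day.
quoted_bytes_per_day = 500 * EXABYTE
bytes_per_second = quoted_bytes_per_day / SECONDS_PER_DAY

# How many bytes each individual reading would have to be.
bytes_per_reading = bytes_per_second / total_readings_per_second

print(f"Readings per second: {total_readings_per_second:.1e}")
print(f"Bytes per second:    {bytes_per_second:.2e}")
print(f"Bytes per reading:   {bytes_per_reading:.2f}")
```

Under these assumptions, the quoted daily total works out to roughly one byte per sensor reading, which shows that the enormous volume comes from the number and frequency of readings rather than from the size of any one of them.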
Another scientific project that takes a very large set of data and breaks it down for the advancement of research is Folding@Home. Started at Stanford University, Folding@Home studies how proteins fold by simulating the process through intensive number crunching. It achieves this by having volunteers donate the processing power of their home computers. Each machine works through a segment of the problem, sends the completed segment back when finished, and then automatically downloads a new one and starts the cycle again. This is an example of a university taking a very nontraditional approach to a very large set of data and making a game of it among computer enthusiasts, who contribute toward research that might not otherwise have been possible.
Another machine that used big data is the IBM computer Watson. Computers and robots are normally created to accomplish very specialized tasks; sometimes they need an operator, and other times they can be automated to function in a limited capacity on their own. Watson was created to accomplish a very specific task, winning Jeopardy, much as an earlier IBM machine, Deep Blue, had been designed for the sole function of playing chess. A key difference between Jeopardy and chess is that in chess the data is very structured: every square is labeled, every piece is identified, and each piece has its own value. Jeopardy, on the other hand, is highly unstructured. The clue is given as an answer, and the correct question must be supplied to earn any points, and because the clues use natural language, a clue can have more than one meaning depending on how it is phrased. In the end, Watson was built with some of the technology that makes processing big data possible, and it played an amazing game of Jeopardy and won for IBM.
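The cycle described above, in which a large job is cut into segments, handed out, and reassembled, can be sketched in Python. This is a toy model, not the Folding@Home software: the "number crunching" is a simple sum of squares, the volunteers are simulated in one process, and every function name here is hypothetical.

```python
def split_into_work_units(start, stop, unit_size):
    """Break one large range of inputs into small, independent work units."""
    return [(lo, min(lo + unit_size, stop)) for lo in range(start, stop, unit_size)]

def volunteer_compute(work_unit):
    """Stand-in for the number crunching done on a volunteer's computer."""
    lo, hi = work_unit
    return sum(n * n for n in range(lo, hi))

def run_project(start, stop, unit_size, volunteer_count):
    """Hand out work units round-robin and combine the returned results."""
    units = split_into_work_units(start, stop, unit_size)
    results_by_volunteer = {v: [] for v in range(volunteer_count)}
    for index, unit in enumerate(units):
        volunteer = index % volunteer_count  # next volunteer in rotation
        results_by_volunteer[volunteer].append(volunteer_compute(unit))
    total = sum(sum(results) for results in results_by_volunteer.values())
    return total, len(units)

total, unit_count = run_project(0, 1000, 64, volunteer_count=4)

print(f"{unit_count} work units combined into a total of {total}")
```

The combined total matches what a single machine would compute on its own, which is the property that lets the work be shared among volunteers in the first place.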
Personal observations
“Volume is Big Data's greatest challenge and as well as its greatest opportunity.” (Barnatt, 2012)
I believe this is a very powerful statement that sums up one of the biggest hurdles still to be overcome in this field. What I gather from it is that the sheer volume of big data we have now and will have in the future is something we will have to work together on in order to house all of the data we need. As discussed in the key information, the amount of data we have now is not the concern; the issue comes when we have nearly 5 zettabytes of data in just a few short years and then jump to almost ten times that number five years later. This is a huge hurdle, but the companies that can collect these very large amounts of information and use them competitively will have a huge advantage. My main reason for believing this is that I see data as a resource that many consumers now spend in place of money. That suits most companies, because it allows them to collect this very precious data, and a company that gets the whole process going very early might gain such a competitive advantage that we may not see close competitors for a while.
One pitfall I think may be an issue is getting too wrapped up in the data and missing things that are occurring within the organization. I have seen this from time to time at my own place of employment. You spend much of your time looking at the numbers, running what-if analyses on the current metrics against newly proposed metrics meant to increase productivity. When allowed to crunch those numbers, you can arrive at what appears to be a perfect solution, one that seems to have minimal impact on the frontline agents who are graded on those metrics and that appears to advance some goal. But when a change like this goes through, what happens is a breakdown at the individual level: people now have to devote more time to the metrics that have changed, even those who had done fine in the past. Maybe this is because we are not able to look at all of the data available, and it is possible our techniques for rolling out mass changes to frontline employees will get better. Still, I think a serious oversight at most companies today is that change is based on averages and not considered at the individual level. That is where I hope big data can make a large impact on the work environment, because I do not believe any company likes high attrition.
The last observation I have is the dominance Hadoop appears to have within the big data market. Not only are the big companies using Hadoop, but the companies offering their own cloud-based services are using Hadoop as the platform those services run on. From Amazon's cloud service to IBM and others, all appear to be built on the same platform. On one hand I like this, because the software is open source, and I truly admire organizations that can make something like this and turn it over for free. On the other hand the idea is a bit disconcerting, because it does not create an arena for any kind of competition. I enjoy the competition that a rival company brings; to me it leads to innovation, which means more cool stuff and then more innovation again.
Summary
The world of big data is very large and often more than a little confusing, but I believe many applications will be enhanced by its adoption and use on a much larger scale. As discussed, these improvements will not be limited to business; they extend to the scientific community and to other groups that will need to examine the ever-increasing amount of data that will be available in the very near future. Some of the numbers mentioned were not numbers I was aware we would be working with anytime soon.
For additional information on big data, I would suggest further reading on Hadoop. The framework it uses to process information is not only very interesting but also offers insight into a field that may need many more talented individuals to fill out its ranks.
References
Laney, D. (2001, February 6). 3D data management: Controlling data volume, velocity, and variety [Web log message]. Retrieved from http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf
Cisco global cloud index: Forecast and methodology, 2011–2016. (n.d.). Retrieved from http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns1175/Cloud_Index_White_Paper.pdf
The big data revolution. (2013). [Web video]. Retrieved from http://www.youtube.com/watch?v=5ZyQ04zzyoE
Barnatt, C. (2012, October 9). Big data. Retrieved from http://www.explainingcomputers.com/big_data.html
LHC Brochure, English version (CERN-Brochure-2010-006-Eng). CERN. Retrieved 20 January 2013.