Big Data Changes the Process
What makes this fascinating is that the information existed before; it simply wasn't accessible in a comprehensive way. The available data was the product of many siloed data-gathering efforts, each used for a specific purpose over the years. The role big data analytics played was to bring that disparate information together. Campaign workers could draw on sources such as volunteer-management programs, campaign finance and budgeting tools, voter-file interfaces, and especially social media to build a fuller profile of each voter and guide the campaign's success. As a result, canvassers weren't dispatched to knock on the doors of people who already supported Obama, and donors who had given the maximum contribution received an email asking them to volunteer instead of another request for money.
The way they did it was a lesson in the creative use of technology. Instead of hiring a consulting company, the Obama campaign pulled together a team of technologists with a range of skills and experience. Using various programming languages and APIs, they built a set of services that acted as an interface to a single shared data store for their applications. This made it possible to develop new applications quickly and to integrate existing ones into the system. The result was a dashboard that provided a set of tools spanning all of their data sets and drove each step of the campaign process. The application included an analytics program called Dreamcatcher, developed to microtarget voters based on sentiment in social media text. With this tool they could gauge which way the vote was going and where to focus their resources. For more information on the technology, see Built to Win: Deep Inside Obama's Campaign Tech.
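The pattern described above can be sketched in a few lines: one shared store that merges formerly siloed sources, with a small service on top that decides the next outreach step from the merged profile. This is a minimal illustration only; all names (VoterRecord, SharedDataStore, next_outreach) are hypothetical and not the campaign's actual code.

```python
# Hypothetical sketch: a thin service layer over one shared data store
# that merges formerly siloed campaign data sources.
from dataclasses import dataclass


@dataclass
class VoterRecord:
    voter_id: str
    supporter: bool = False    # from the voter file
    donated_max: bool = False  # from campaign finance tools
    volunteer: bool = False    # from volunteer-management programs


class SharedDataStore:
    """Single store combining data that once lived in separate silos."""

    def __init__(self):
        self._records = {}

    def upsert(self, record: VoterRecord):
        self._records[record.voter_id] = record

    def get(self, voter_id: str) -> VoterRecord:
        return self._records[voter_id]


def next_outreach(store: SharedDataStore, voter_id: str) -> str:
    """Pick the next contact based on the merged voter profile."""
    r = store.get(voter_id)
    if r.donated_max:
        return "ask-to-volunteer"  # maxed-out donors aren't asked for money
    if r.supporter:
        return "skip-canvass"      # don't knock on a supporter's door
    return "canvass"


store = SharedDataStore()
store.upsert(VoterRecord("v1", supporter=True))
store.upsert(VoterRecord("v2", donated_max=True))
print(next_outreach(store, "v2"))  # -> ask-to-volunteer
```

The point of the pattern is that the decision logic sees one unified record per voter, so every application benefits from data any other application collected.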
How Big Data is Different
While we have been hearing a lot about big data, you may be wondering what it is and how it is changing IT infrastructure. Big data refers to the collection and subsequent analysis of any significantly large body of data that may contain hidden insights or intelligence (user data, sensor data, machine data). When analyzed properly, big data can deliver new business insights, open new markets, and create competitive advantages. Big data is different in that it extends beyond structured data to include data of all varieties: text, audio, video, click streams, and log files. What has changed is the sheer volume. Organizations are amassing terabytes and even petabytes of data. Much of it is created in real time and is often streamed to the processing infrastructure so it can be analyzed and acted upon as soon as possible.
Big data is changing the face of business intelligence and, as a result, the way decisions are made. The ability to pull from many data sources and data types, including real-time data from sources such as social media, can shape marketing strategy on the fly. Numerous technological innovations are driving the dramatic increase in data and data gathering, which is why big data has become an area of strategic investment for IT organizations. The rise of mobile device users and the ability to gather user statistics, including location data, can provide powerful business intelligence. Because much of this data is gathered in real time, it presents a unique opportunity if it can be analyzed and acted upon quickly.
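Acting on streamed data quickly usually means computing over a moving window rather than a stored table. The following is a minimal sketch (not any specific product) of that idea: count matching events, such as brand mentions from a social feed, within a sliding one-minute window so a spike can be flagged as it happens.

```python
# Minimal sliding-window counter for streamed events, a common
# building block for real-time analytics. Illustrative only.
from collections import deque


class SlidingWindowCounter:
    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.events = deque()  # timestamps of matching events, oldest first

    def add(self, timestamp: float):
        """Record one event and drop anything that has aged out."""
        self.events.append(timestamp)
        self._evict(timestamp)

    def count(self, now: float) -> int:
        """Events seen within the last `window` seconds."""
        self._evict(now)
        return len(self.events)

    def _evict(self, now: float):
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()


counter = SlidingWindowCounter(window_seconds=60)
for t in (0, 10, 20, 65):       # simulated mention timestamps (seconds)
    counter.add(t)
print(counter.count(now=70))    # only mentions after t=10 remain -> 2
```

A production pipeline would layer the same windowed logic over a message stream rather than an in-memory loop, but the decision, react while the data is still fresh, is identical.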
Driven by a combination of technology innovations, maturing open source software, commodity hardware, ubiquitous social networking, and pervasive mobile devices, the rise of big data has created an inflection point making real-time data collection and analysis mission critical for businesses today. However, given that the data and its structures are fundamentally different, it is increasingly evident that the infrastructure, tools, and architectures to support real-time analysis and insight from this data also must be different.
What Big Data Means for IT Infrastructure
To benefit from this new technology, the way IT infrastructure is connected and distributed needs a fresh and critical look. Over the last 20 years, data center infrastructure has been designed to closely align data, applications, and end users to provide secure, high-performance access. These silos have become commonplace, and network administrators can safely assume that the biggest consumer of an application and its corresponding data is an intelligent endpoint that can provide dedicated resources for compilation, execution, and display. This infrastructure is often referred to as a three-tier application architecture. The computing, storage, and networking that support this tiered architecture are largely optimized to deliver data and corresponding network traffic up and down the integrated stack, from an end user back to the database or storage (often referred to as north-south traffic).
During the last few years, this predominant traffic pattern has changed dramatically, and big data is just the latest application environment driving the architectural shift. As data becomes more horizontally scaled and distributed across network nodes, traffic between server and storage nodes has become significantly greater than traffic between servers and end users. This machine-to-machine network traffic and data sharing is often referred to as east-west traffic. Building a data center that provides high-speed connections optimized for east-west traffic is critical to developing scalable, high-performance big data implementations. Another distinguishing characteristic of big data is that it is often made up of incremental data elements. It does not fit well in traditional online transaction processing (OLTP) data stores or with traditional SQL analysis tools. Big data requires a flat, horizontally scalable database, often with specialized query tools that work in real time.
To meet these needs, new forms of supporting databases are emerging: some utilize traditional SQL for queries (often called NewSQL), and some have largely abandoned SQL in favor of new query libraries (often called NoSQL). Businesses looking to leverage their large SQL infrastructures have used sharding to break up existing databases into more scalable environments that can take advantage of big data tools. This has added complexity and created some distinct decision points for IT organizations planning their big data implementations.
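Sharding, as mentioned above, routes each row to one of several databases by hashing a shard key, so every shard holds a horizontal slice of the data. The sketch below shows the core routing idea under simplified assumptions: plain dictionaries stand in for the shard databases, and real deployments add replication, rebalancing, and a routing tier.

```python
# Minimal sketch of hash-based sharding. The dicts below are
# stand-ins for separate database instances; names are illustrative.
import hashlib

NUM_SHARDS = 4
shards = {i: {} for i in range(NUM_SHARDS)}


def shard_for(key: str) -> int:
    """Stable hash so the same key always routes to the same shard."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def put(key: str, value):
    shards[shard_for(key)][key] = value


def get(key: str):
    return shards[shard_for(key)].get(key)


put("user:1001", {"name": "Ada"})
put("user:1002", {"name": "Lin"})
print(get("user:1001"))  # routed back to the same shard it was written to
```

The complexity the text warns about comes from everything around this core: cross-shard queries and joins, resharding when capacity grows, and keeping shard replicas consistent.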
Changes are Needed in the Network
The move to high-performance, real-time data analytics has a profound impact on many aspects of data center technologies and architectures. The biggest evolution in the networking industry in the last few years is the introduction of point-to-point switching fabric solutions. Understanding the benefits of a data center fabric is critical to creating the highest-performance, lowest-latency connections between big data nodes. Because fabric architectures create point-to-point connections, they significantly reduce inter-node latency. As IT organizations begin to test and evolve big data solutions, it is critical for network administrators to consider the impact of these technologies on their server, storage, networking, and operations infrastructure.
To learn more about big data and how a Juniper network can help, see Introduction to Big Data: Infrastructure and Networking Considerations.