Must Know Big Data Terms for Champions

This glossary of key terms and definitions will give Champions the knowledge they need to be confident in their use of big data and the variety of online advertising practices that data enables.

Gloo Specific Terms

Big data is commonly referenced in terms of scientific inquiry or marketing and sales. While these are familiar applications for data, they can conjure images of big brother, mad scientists, or black-hatted marketers focused on consumerism at any price.

At Gloo we believe data has the power to do so much more than recommend our next Netflix binge. In the right hands—with the right controls in place—we believe data can be used to change people’s lives for the better.

Before we dig into our complete list of terms, it’s important for us to level set around some terms specific to Gloo.


Champions

Simply put, Champions are organizations and individuals that dedicate themselves to the mission of improving the lives of others.

Champion Organizations/Individual Champions:

  • Churches: pastors and small group leaders
  • Recovery centers: therapists and CEOs
  • Dentist offices: dentists and hygienists
  • Schools: teachers, counselors, and principals
  • Gyms: personal trainers
  • Therapy offices: therapists
  • Financial offices: financial advisors


Growees

A Growee is any person whom a Champion or Champion organization serves. This includes people who have at-risk marriages, substance use disorders, financial difficulties, depression, fitness goals, or another area of life in which they desire to grow.


Co-serving

Co-serving is the idea that Growees will have a number of different Champions. For example, a Growee may take part in a small group at church, work with a therapist, and interact weekly with a personal trainer. Each of these Champions plays an important role in the growth and development of each Growee.


General Big Data Terms

Now that you know these Gloo-specific terms, it will be easier to visualize how big data supports Champion activities like attracting, engaging, and retaining Growees and deepening relationships with them.

It’s not an exhaustive list, but it’s our pick of common terms to help you better understand the process and communicate it to your team, or others within your organization.

Data Management Platform (DMP)

A DMP is a technology tool for the collection, storage, management, and analysis of big data. DMPs are used primarily to gain insights about market segments. These insights are used to build segmented audiences for advertising or to inform other engagement and operational activities.

Big Data

Google defines big data as, “extremely large data sets that can be analyzed computationally to reveal patterns, trends, and associations, especially related to human behavior and interactions.”

It’s a broad term with no set definition about how large a data set must be to qualify as “big data,” but it’s generally data sets so large, disparate, or complex that they cannot be analyzed by a human alone or with basic software like Excel.


Data Governance

Because data can reveal sensitive information or even the identity of an individual, it’s important to have rules in place to ensure value, usability, integrity, and security, while protecting against risks, unethical use, and other pitfalls that can cause real harm.

These rules, laws, norms, and principles answer questions such as:

  • What uses for data will the platform support?
  • How will insights be delivered without compromising privacy?
  • What data sources are considered ethical?
  • How will platform users be allowed to speak or communicate?

When the right rules are in place, data is used ethically, privately, and with respect for the rights of those who own that data. When technology is designed and architected according to the right rules, data can be applied in a way that maximizes the benefit for all involved.

Data Sources

It’s important to know where data is sourced. For example, most healthcare data is limited to treatment episodes and basic 1st party demographics. Retailers often focus on 3rd party data—the information they’ve acquired from surveys or purchase histories.

The ability to add large consumer data sets to existing data is incredibly valuable because even the best algorithm in the world won’t work without a sufficient amount of diverse data to help you know your people in a holistic way.

1st Party Data

This is data that you or your organization owns. Think of it as any information that someone has volunteered to you by completing a form, taking a quiz or assessment, or making a purchase from you.

This could include information from:

  • Assessments
  • Intake Forms
  • A CRM 
  • A POS

2nd Party Data

This is essentially other people's 1st party data. 2nd party data is important because another organization has taken the time to ensure the data's accuracy.

3rd Party Data

Large chunks of shared or purchased anonymized data that can be used to enrich 1st party data with a previously unavailable number of data points that can be analyzed to provide deeper insight than 1st party data alone. 

This includes:

  • Consumer data
  • Surveys
  • Purchase data
  • Internet data
  • Internet of things
  • Geo-fencing


Algorithms

To make sense of large data sets, programmers create algorithms. Algorithms tell a computer how to identify patterns in data to find the few items out of millions that meet specific criteria.

Example: Imagine you have a list that includes the make, model, weight, and length for every vehicle ever manufactured. From this list you want to know which car has the largest weight-to-length ratio (weight of the car divided by the length). You could go through car-by-car and do the math yourself, but this would take a long time. Instead, it would make sense to write an algorithm. In your algorithm you might tell the computer to:

1. Do the division for the first car on the list.
2. Assume the first car has the largest weight-to-length ratio.
3. Divide for the second car and compare to the first.
4. If the second has a larger ratio, assume it is the biggest.
5. Repeat until finished.   
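
The steps above can be sketched in a few lines of Python. This is only a minimal illustration: the `Vehicle` record, its field names and units, and the three sample vehicles are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    # Hypothetical record layout; the field names and units are assumptions.
    make: str
    model: str
    weight_kg: float
    length_m: float


def heaviest_per_length(vehicles):
    """Return the vehicle with the largest weight-to-length ratio."""
    # Steps 1-2: do the division for the first car and assume it is the largest.
    best = vehicles[0]
    best_ratio = best.weight_kg / best.length_m
    # Steps 3-5: divide for each remaining car and keep whichever ratio is larger.
    for vehicle in vehicles[1:]:
        ratio = vehicle.weight_kg / vehicle.length_m
        if ratio > best_ratio:
            best, best_ratio = vehicle, ratio
    return best


# Made-up sample data, not real vehicle specifications.
vehicles = [
    Vehicle("Acme", "Hatchback", 1200.0, 4.0),  # ratio 300
    Vehicle("Acme", "Pickup", 2500.0, 5.0),     # ratio 500
    Vehicle("Acme", "Van", 2200.0, 5.5),        # ratio 400
]
winner = heaviest_per_length(vehicles)
print(winner.model)  # prints "Pickup"
```

The computer does the same division and comparison a person would, just far faster and without getting tired partway through the list.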

Algorithms can be static, insofar as the rules remain the same, or dynamic where machine learning algorithms improve based on previous results.

Building on the previous example, a dynamic algorithm could learn from the cars it has already analyzed: after just two cars, it can begin to rule out others based on the range of weights and lengths a car must fall within to even be considered. This reduces the number of cars that need to be analyzed and cuts down on the computing power and time required.





Analytics

Powered by algorithms, analytics process, analyze, interpret, detect, and report on meaningful patterns in data.

Often visualized in a report or dashboard, analytics is what brings meaning to data and puts useful and actionable information into the hands of Champions.

Application Programming Interface (API)

An API could be compared to an electric socket. As long as you follow the rules (watts, amps, volts, and even the shape of the plug) you can plug any number of devices into the power grid.

When it comes to technology, an API contains the definitions for format, programming languages, communication, and other standards that allow separate applications to interface and share information.

Example: There are a lot of weather applications in the App Store. Some are designed for hikers, some contain the snow report, and others might have information about smog warnings for people with asthma. On the surface these applications might look very different, and the analytics surface only relevant information to that application. However, you might be surprised to learn these applications utilize and contribute to the same central database of information.

Without APIs every application would need to exist on its own and rely solely on the information it can collect. An API allows developers to take advantage of tools, data and other shared utilities so they can build customized applications without developing each individual component.


Assessments

An example of 1st party data, assessments are used to benchmark and gather information about the knowledge, abilities, skills, beliefs, desires, or other data points about a Growee or group of Growees. Results are used to inform and customize strategies, programs, and tactics.

Behavioral Analytics

The process of analyzing online and offline behaviors like purchases, browsing history, social media interactions, car purchases, gym memberships, and all other available behavioral history to predict future actions.

Dashboards & Reports

Once data is analyzed it needs to be displayed in a useful way. This is usually done through reporting and dashboards that provide quick insights about patterns, metrics, or other indicators that may be useful to monitor on a continuous basis.

The best dashboards and reports reveal insights that increase a Champion’s ability to know the people they serve and tailor their approach.

Growth Outcome Metrics

  • Growth journey outcomes
  • Most effective content

Audience Insights

  • Relational information
  • Attitudes
  • Motivations
  • Geo-locations
  • Risk indicators
  • New markets

Data Exchange

This occurs when data from one source is converted into a new format so it can be displayed in another instance. This is important because it enables people who have similar data sets, that may not be organized in the same schema, to share information. This is an important feature of any DMP because it expands the available insights beyond a Champion’s core data set.
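
As a rough illustration of what converting data into a new format can look like, the Python sketch below renames the fields of one record so it fits a shared schema. The field names and the mapping are hypothetical, and a real exchange would also have to handle data types, units, and privacy rules.

```python
# Hypothetical mapping from one organization's field names to a shared schema.
SOURCE_TO_SHARED = {
    "first": "first_name",
    "last": "last_name",
    "zip": "postal_code",
}


def convert_record(record, field_map):
    """Rename a record's fields; fields without a mapping are left out."""
    return {new: record[old] for old, new in field_map.items() if old in record}


# Made-up source record; "notes" has no mapping, so it is not shared.
source = {"first": "Ana", "last": "Lopez", "zip": "80301", "notes": "internal"}
shared = convert_record(source, SOURCE_TO_SHARED)
print(shared)
# prints {'first_name': 'Ana', 'last_name': 'Lopez', 'postal_code': '80301'}
```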

Data Exhaust

Data exhaust is the data trail that is generated by any activity that takes place over the internet or is eventually stored online. Examples include: purchases, online assessments, offline assessments, browsing information, and even location information that is shared from a phone.  

Data Hygiene

The processes by which data is “cleaned.” Clean data lacks duplicate records, multiple records for the same person (John, Johnny, Jack), stale or outdated data, or other errors that may arise when large sets of data are generated or sourced across disparate systems with different protocols.
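
One small piece of data hygiene, collapsing multiple records for the same person, might look like the Python sketch below. The nickname lookup and the sample records are invented for illustration; production tools rely on much larger lookups and fuzzier matching.

```python
# Tiny, hypothetical nickname lookup; real tools use far larger tables.
NICKNAMES = {"johnny": "john", "jack": "john"}


def normalize(record):
    """Build a comparison key that ignores case, stray spaces, and nicknames."""
    first = record["first_name"].strip().lower()
    first = NICKNAMES.get(first, first)
    last = record["last_name"].strip().lower()
    email = record["email"].strip().lower()
    return (first, last, email)


def deduplicate(records):
    """Keep the first record seen for each normalized key."""
    seen = set()
    clean = []
    for record in records:
        key = normalize(record)
        if key not in seen:
            seen.add(key)
            clean.append(record)
    return clean


# Made-up records: the first three describe the same person.
records = [
    {"first_name": "John", "last_name": "Smith", "email": "jsmith@example.com"},
    {"first_name": "Johnny", "last_name": "Smith", "email": "JSmith@example.com "},
    {"first_name": "Jack", "last_name": "Smith", "email": "jsmith@example.com"},
    {"first_name": "Maria", "last_name": "Diaz", "email": "mdiaz@example.com"},
]
clean = deduplicate(records)
print(len(clean))  # prints 2
```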

Data Lake

In a data lake, 1st, 2nd, and 3rd party data is stored in its raw, often unstructured, format until it's needed for reporting or other applications.

Data Model

When analytics are combined with behaviors and attitudes, models can be developed that illustrate a variety of growth and well-being related predictors. These models are often driven by machine learning that improves over time. That means the more a model is used and updated, the smarter and more helpful it becomes.

Models help Champions discover the unique needs and propensities in their market so they can reach out to people who are most likely to benefit from their services.


Geo-fencing

The creation of a geographic boundary using digital tools. Examples include Wi-Fi, RFID, GPS, and cell phone data. Data is collected in an anonymized way from the people who enter these boundaries and can be used for any number of reasons, including:

  • Attendance tracking
  • Audience analytics
  • Retargeting
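
For GPS data, one simple way to picture a geographic boundary is a circle around a center point. The Python sketch below checks whether a coordinate falls inside such a circle using the haversine great-circle distance. The coordinates and the 200-meter radius are made up, and real geo-fencing products may use other shapes and signals.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def inside_fence(lat, lon, center_lat, center_lon, radius_m):
    """Return True when the point is within radius_m of the fence center."""
    return distance_m(lat, lon, center_lat, center_lon) <= radius_m


# Hypothetical fence: center latitude, center longitude, radius in meters.
FENCE = (40.0150, -105.2705, 200.0)

near = inside_fence(40.0155, -105.2705, *FENCE)  # roughly 56 m north of center
far = inside_fence(40.0500, -105.2705, *FENCE)   # roughly 3.9 km north of center
print(near, far)  # prints True False
```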

Internet of Things

The internet of things, frequently abbreviated as IoT, is the collection of internet-enabled devices that share data. Examples include:

  • Fitness trackers
  • Phones
  • Printers
  • Cell towers
  • Automobiles



Interoperability

Enabled by APIs and governance, interoperability is the ability for computer systems to share information and functionality. Interoperability enables a number of key functions for Champions, including:

  • Collecting data from a variety of sources
  • Co-serving Growees across ecosystems
  • Maximizing existing investments

Personally Identifiable Information (PII)

This is data that contains information that can be used to identify an individual. This includes phone numbers, emails, names, birthdays, and addresses. The more pieces of information available the easier it is to identify an individual. The ethical use of data requires information to be de-identified, encoded, or hashed, and securely stored.
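
To make the idea of hashing concrete, here is a minimal Python sketch that replaces an email address with a keyed hash (HMAC-SHA-256) from the standard library. The secret key shown is a placeholder and the function name is made up. Hashing alone is not a complete de-identification strategy, so treat this as one illustrative step rather than a recipe.

```python
import hashlib
import hmac

# Placeholder only: a real key would be generated randomly and stored securely.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymize(value, key=SECRET_KEY):
    """Replace a piece of PII with a keyed hash so the raw value is not stored."""
    # Normalize first so "Pat@Example.com" and " pat@example.com " match.
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


token_a = pseudonymize("Pat@Example.com")
token_b = pseudonymize(" pat@example.com ")
print(token_a == token_b)  # prints True
print(len(token_a))        # prints 64
```

Because the same input always produces the same token, records can still be matched across data sets without the email address itself being stored alongside them.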


Propensity

A statistical analysis of the likelihood or tendency for someone or something to behave in a particular way. It’s important to understand that this does not mean a person will behave in a predicted way, just that they have a certain statistical likelihood of doing so based on their own past behavior or on similar behaviors and profiles from other people.
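
A very naive way to estimate such a likelihood is to look at how often people with the same profile took an action in the past. The Python sketch below does exactly that with made-up data. Real propensity models use richer statistical or machine learning methods, so this only illustrates the idea of a likelihood rather than a certainty.

```python
def propensity(history, profile):
    """Share of past people with the same profile who took the action."""
    outcomes = [took_action for past_profile, took_action in history
                if past_profile == profile]
    if not outcomes:
        return None  # no comparable history, so no estimate
    return sum(outcomes) / len(outcomes)


# Made-up history: (profile, whether the person took the action).
history = [
    (("18-30", "gym member"), True),
    (("18-30", "gym member"), True),
    (("18-30", "gym member"), False),
    (("18-30", "gym member"), True),
    (("31-45", "gym member"), False),
]
score = propensity(history, ("18-30", "gym member"))
print(score)  # prints 0.75
```

A score of 0.75 means three out of four similar people took the action in the past, not that this particular person will.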




General Big Data Marketing Vocabulary

Now that you know the general vocabulary of Big Data, we’ll dive into some of the key marketing and engagement vocabulary that’s made possible thanks to big data and data management platforms.

Audience Targeting

The process of selecting groups or segments of people, based on key data points, to receive direct advertisements. When Champions use targeting, they can focus on only the individuals for whom their message is best suited. This greatly enhances a Champion’s ability to put the right message in front of the right person at the right time and has a profound impact on ROI and efficacy.

Interest Based Audience

Interest Based Audiences are part of the core Facebook Audience Builder. As the name implies, Interest Based Audiences are created by selecting any of the named interests on Facebook.

This is a useful way to build audiences because you can use this information to customize your advertisement to the audience. For instance, an addiction recovery center that primarily serves young men who participate in high-intensity activities and share a strong cultural connection might build an audience that looks like this:

  • Male
  • Age 18–30
  • Extreme sports
  • Electronic music

Even these basic pieces of information can provide a lot of direction for marketers as they attempt to reach new people on Facebook and other digital advertising platforms.

Custom Audience

A Custom Audience takes things a step further. Facebook and many ad platforms reserve the term “Custom Audience” for targeting people you’re already connected to. This might include current customers, site visitors, and others. These audiences are a great way to get in front of current contacts and deepen your relationship with them.

Lookalike Audience

A Lookalike Audience is exactly what the name implies. These are audiences that literally look like other segments. That means they share previously unrecognized data points.

For example, a marketer might create a video to display only to people in an Interest Based Audience. Then, as people engage with the video, Lookalike Audiences can be built. The marketer might instruct Facebook to create an audience based on people who have:

  • Viewed the video
  • Clicked the link in the ad
  • Purchased the advertised item  

The creation of Lookalike Audiences is a marketing best practice because they are created based on your best, or most engaged customers and targets. This means you are creating audiences based on behaviors rather than just best guesses.

Demand Side Platform

A demand side platform (DSP) allows people who purchase advertising media inventory to manage a number of accounts or campaigns in one place. Advertisers are able to optimize their ads based on key performance indicators that are surfaced and reported in the DSP.

Programmatic Media

At the most basic level, Programmatic Media or Marketing is the process of displaying an advertisement across a variety of online properties to a predefined audience. The cool part about programmatic media is that your ads will only appear in front of the exact people you want. This way you only pay to advertise to people who are most suited to see your advertisement. That means the ad is written in a way that resonates with them, based on their preferences for communication and place in their growth journey.

Recommendation Engine

By analyzing other users, past behaviors, and more, recommendation engines predict how a user would score or rate a suggested piece of content, a match to another user, or a product. Famously, Amazon’s recommendation engine is reportedly responsible for more than $50B in revenue annually, more than ⅓ of Amazon’s total revenue.


Retargeting

This is the process of using cookies to identify site visitors and display advertisements to those visitors across the web. This helps bring visitors back to your site and reminds them about your offerings.


Knowledge is power

More than 2.5 quintillion bytes of data are generated and collected daily. That’s more than 8 million MacBook Pros’ worth of data, every single day. Data will continue to play an ever-increasing role in our lives, and the organizations and individuals who are able to harness the power of data will experience the most success. Download our comprehensive guide to buying big data software to empower your data strategy.