Oceans’ Data Part 2: Building the Greco Player Tracker with MapR

Blog Post created by onelson on Jul 21, 2016

In Part 1, we estimated the data necessary to build a Greco Player Tracker (GPT), the AI designed to catch cheaters at the Bank Hotel Casino in the 2007 film Ocean's Thirteen. Now we'll take a closer look at actually building the GPT using the MapR Converged Data Platform.


Storing the Data

With 5,800 players in our casino, biometric data from wristbands and statistics from games generate a few hundred gigabytes of data daily. Our security cameras generate an additional seven petabytes daily. Combined with metadata and other incidentals, we're looking at eight petabytes a day. We'll need to keep that data for 30 days for audit purposes, bringing our storage needs to 240 petabytes if we do not compress the video data. The MapR Platform replicates data three times by default, which raises the total to 720 petabytes, and it requires additional space for the operating system and ecosystem components. Therefore, we'll need a one-exabyte cluster to run the GPT comfortably.
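The sizing above is simple multiplication, and it can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function and constant names are made up for illustration, and decimal units (1 EB = 1,000 PB) are assumed.

```python
# Back-of-the-envelope storage sizing, using the figures from the text.
# Assumption: decimal units throughout (1 EB = 1,000 PB).

DAILY_INGEST_PB = 8      # video + biometrics + game statistics + metadata, per day
RETENTION_DAYS = 30      # audit retention window
REPLICATION_FACTOR = 3   # default replication, as stated in the text
CLUSTER_SIZE_PB = 1000   # the proposed one-exabyte cluster


def raw_storage_pb(daily_pb: float, days: int) -> float:
    """Uncompressed data held at any one time, in petabytes."""
    return daily_pb * days


def replicated_storage_pb(raw_pb: float, replicas: int) -> float:
    """Disk space consumed once every block is stored `replicas` times."""
    return raw_pb * replicas


raw = raw_storage_pb(DAILY_INGEST_PB, RETENTION_DAYS)
replicated = replicated_storage_pb(raw, REPLICATION_FACTOR)
headroom = CLUSTER_SIZE_PB - replicated

print(f"raw: {raw} PB")                # raw: 240 PB
print(f"replicated: {replicated} PB")  # replicated: 720 PB
print(f"headroom: {headroom} PB")      # headroom: 280 PB
```

The roughly 280 PB of headroom is what is left over for the operating system, ecosystem components, and growth.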


(Note that a single exabyte is a lot more efficient than the “field of exabytes” estimated by the movie.)


What are we listening for?

We don't want to process all the data all the time – that's a waste of our resources. If statistical outliers are being dealt on our blackjack tables, but they favor the house, we don’t care (since we’re making money). Likewise, we do not want to waste resources on petty crime. Many casinos let small-time cheaters get away with it. Who cares if someone cheats and wins $20, when millions are flowing through the casino every day?


Our GPT program will look something like this:


  if (prize > 1000 && winner != house) {
    if (isCheater(measure.biometrics(winner))) {
      alertSecurity(winner);  // hypothetical call: notify security personnel
    }
  }

In other words: if the prize value is greater than $1,000 and the winner is not the house, then we’ll start our application. We’ll measure biometrics on the winner, including their heart rate, temperature, and pupil dilation. We’ll pass this information into another function, called isCheater. If the results determine the winner is a cheater, then we’ll issue an alert to our security personnel.
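As a rough illustration, the same logic could be sketched in runnable Python along these lines. Everything here is an assumption made for the example: the names `check_win`, `Biometrics`, `fake_measure`, and `fake_is_cheater`, the sample readings, and especially the heart-rate threshold, which is an arbitrary placeholder and not a real detection rule.

```python
from dataclasses import dataclass
from typing import Callable

PRIZE_THRESHOLD = 1000   # dollars, from the text
HOUSE = "house"          # placeholder identifier for the casino itself


@dataclass
class Biometrics:
    heart_rate_bpm: float
    temperature_c: float
    pupil_dilation_mm: float


def check_win(
    prize: float,
    winner: str,
    measure: Callable[[str], Biometrics],
    is_cheater: Callable[[Biometrics], bool],
) -> bool:
    """Return True when security should be alerted about this win."""
    # Cheap filter first: skip small prizes and anything the house wins.
    if prize <= PRIZE_THRESHOLD or winner == HOUSE:
        return False
    # Only now pay for the expensive steps: measurement and classification.
    return is_cheater(measure(winner))


# Toy stand-ins so the sketch runs; the readings and the rule are made up.
def fake_measure(winner: str) -> Biometrics:
    return Biometrics(heart_rate_bpm=130.0, temperature_c=37.4, pupil_dilation_mm=6.0)


def fake_is_cheater(reading: Biometrics) -> bool:
    return reading.heart_rate_bpm > 120.0  # arbitrary placeholder threshold


print(check_win(5000, "player_42", fake_measure, fake_is_cheater))  # True
print(check_win(20, "player_42", fake_measure, fake_is_cheater))    # False
print(check_win(5000, HOUSE, fake_measure, fake_is_cheater))        # False
```

Passing the measurement and classification steps in as functions keeps the cheap prize filter separate from the expensive work, which is the point of not processing all the data all the time.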


How would you write such an application? Take DEV 360 – Apache Spark Essentials to learn to build powerful analytics applications and share your ideas in MapR Academy forums!


Building the Cluster

We want to distribute our processing and storage across multiple nodes and multiple data centers. Since the MapR Platform is designed to distribute processing, we do not need to worry too much about the CPU of individual nodes. A cluster of many 2 or 3 GHz multi-core processors will be able to stream 340GBPS of video data and occasionally perform facial recognition on it.


According to the MapR End User License Agreement, the maximum-sized node in a cluster can contain up to 4 CPUs totalling 32 processing cores, 24 hard drives containing a total of 50 terabytes, and 196 GB of RAM. At 50 terabytes per node, it takes 20,000 such nodes to store an exabyte of data. We also want to mirror our cluster to at least two off-site locations, to protect our data. Therefore, we want to build 60,000 nodes, each with 50 TB for storage and 196 GB of RAM for processing.
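The node count follows from dividing the target capacity by the per-node storage. A quick sanity check, assuming decimal units (1 EB = 1,000,000 TB) and that each of the two mirror sites holds a full copy of the cluster; the names below are illustrative only:

```python
import math

TB_PER_EB = 1_000_000     # assumption: decimal units
NODE_STORAGE_TB = 50      # maximum storage per node, from the text
CLUSTER_CAPACITY_EB = 1   # target cluster size, from the text
SITES = 3                 # primary site plus two off-site mirrors


def nodes_needed(capacity_tb: float, per_node_tb: float) -> int:
    """Smallest whole number of nodes whose combined storage covers capacity_tb."""
    return math.ceil(capacity_tb / per_node_tb)


per_site = nodes_needed(CLUSTER_CAPACITY_EB * TB_PER_EB, NODE_STORAGE_TB)
total = per_site * SITES

print(per_site)  # 20000
print(total)     # 60000
```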


You can learn more about building a cluster and protecting your data with mirrors in ADM 200 – Cluster Administration.



Now we have a good idea of how the Bank Hotel Casino can build the Greco Player Tracker with MapR. In the next blog, we’ll see what it takes for Danny and Rusty to break the system.