Securely Applying the Proper Data to Training a Deep Neural Net
Lucd’s Unified Data Space ensures integrity of an AI Training Pipeline
by: Mark Stadtmueller, VP Product Strategy
Google’s (now Apple’s) John Giannandrea is right: “The real safety question, if you want to call it that, is that if we give these systems biased data, they will be biased.” Link.
At Lucd, our AI Platform addresses the challenge of bias with an integral Unified Data Space that serves as the data feed for an AI training pipeline and enforces role-based access control down to cell-level granularity.
The Challenge of AI Training Data
One of the biggest challenges businesses face in leveraging AI is data. AI starts with data: a system powered by AI needs to be trained on lots and lots of data before it can be put to productive business use. “Generating value from AI is more complex than simply making or buying AI for a business process. Training AI algorithms involves a variety of skills, including understanding how to build algorithms, how to collect and integrate the relevant data for training purposes, and how to supervise the training of the algorithm.” Link.
But leveraging that large data set to train a deep neural network can create unforeseen bias if the training set itself is biased. Google does a good job of explaining in this YouTube video how interaction, latent, and selection bias can be introduced through data.
However, as explained in the introduction, some data can be leveraged by a business to create a system with bias that is positive, and that same data can be leveraged to create a system with bias that is extremely negative. For instance, a restaurant chain might want to optimize menu selections, background music, and offers targeted specifically to senior citizens, so age and demographic information would be very important to the AI systems that provide those better services. That same restaurant chain, however, would not want to use the same age and demographic data in an AI-powered hiring system. Yet creating a separate data store for each and every potential AI-powered application would be cost prohibitive and counterproductive, recreating the same kind of data-constrained systems that existed before AI.
The Lucd Unified Data Space provides the secure Role-Based Access Control and cell-level security needed to Unleash AI in business.
The Lucd Unified Data Space (UDS) balances efficiency, flexibility and readability when storing data objects. Data access is efficient because a single access is all that is required to retrieve any object. Decoding object values and metadata information is simple and straightforward. The data space is flexible since there are no restrictions on what data can be stored. And it is readable because data is stored in text form wherever possible. Only truly binary data is stored in binary form. This allows developers, analysts and administrators to browse and even insert data into the data space using existing command-line tools.
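To make the “single access” and “text wherever possible” properties concrete, here is a toy in-memory sketch. It assumes cells are kept sorted by a hypothetical (row, attribute) key, so all cells of one object are contiguous and one range scan returns the whole object; it leaves out visibility and the other key elements. The names `encode_value`, `object_to_cells`, and `SortedCellStore` are illustrative only and are not Lucd APIs.

```python
from __future__ import annotations

import bisect
import json
from typing import Union

Value = Union[str, int, float, bool, None, bytes]
Stored = Union[str, bytes]
Key = tuple[str, str]  # hypothetical key: (row id, attribute name)


def encode_value(value: Value) -> Stored:
    """Keep raw bytes binary; render everything else as readable text."""
    if isinstance(value, (str, bytes)):
        return value
    return json.dumps(value)  # 12 -> "12", True -> "true", None -> "null"


def object_to_cells(row_id: str, obj: dict[str, Value]) -> dict[Key, Stored]:
    """Decompose one data object into one cell per attribute."""
    return {(row_id, attr): encode_value(v) for attr, v in obj.items()}


class SortedCellStore:
    """Toy store that keeps cells sorted by (row, attribute) key,
    so all cells of one object sit next to each other."""

    def __init__(self) -> None:
        self._keys: list[Key] = []
        self._values: dict[Key, Stored] = {}

    def put(self, key: Key, value: Stored) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get_object(self, row_id: str) -> dict[str, Stored]:
        """Fetch a whole object with one contiguous range scan over its row."""
        i = bisect.bisect_left(self._keys, (row_id, ""))
        result: dict[str, Stored] = {}
        while i < len(self._keys) and self._keys[i][0] == row_id:
            result[self._keys[i][1]] = self._values[self._keys[i]]
            i += 1
        return result


store = SortedCellStore()
for row_id, obj in [
    ("customer-0042", {"name": "Ada", "visits": 12, "photo": b"\x89PNG"}),
    ("customer-0043", {"name": "Grace", "visits": 3}),
]:
    for key, value in object_to_cells(row_id, obj).items():
        store.put(key, value)

print(store.get_object("customer-0042"))  # text values plus one binary value
```

Because non-binary values come back as plain strings such as "12", they remain readable with ordinary command-line tools, while the image bytes stay binary.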
In the Unified Data Space, one row represents one data object. An object may represent a data file, a database row, or any other entity composed of attributes. The attribute is the basic unit of storage in UDS. Rather than storing an entire row as a single element, individual attributes are stored as cells. Cells are stored as key-value pairs, where the key and value are composed of the following elements:
The Visibility mark contains the rules governing which users have access to each attribute. Boolean operators are used to specify combinations of access controls. So, in our restaurant example, the marketing team would be allowed to see the date of birth cell but the hiring team would not.
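The source does not show the UDS visibility syntax. As a minimal sketch, assuming labels combined with `&` (and), `|` (or), and parentheses, in the style of Apache Accumulo’s column visibility expressions, the function below decides whether a user’s set of authorizations satisfies a cell’s visibility mark. The function `is_visible` and the labels `marketing`, `hiring`, and `pii` are hypothetical. Accumulo rejects mixing `&` and `|` without parentheses; this sketch simply gives `&` higher precedence.

```python
from __future__ import annotations

import re

# A token is a label (letters, digits, underscore) or one of & | ( )
TOKEN_RE = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def tokenize(expr: str) -> list[str]:
    tokens, pos, expr = [], 0, expr.strip()
    while pos < len(expr):
        m = TOKEN_RE.match(expr, pos)
        if not m:
            raise ValueError(f"unexpected character at {pos} in {expr!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def is_visible(expr: str, auths: set[str]) -> bool:
    """Return True if a user holding `auths` may read a cell marked `expr`.
    An empty mark is treated as visible to everyone."""
    tokens = tokenize(expr)
    if not tokens:
        return True
    pos = 0

    def parse_or() -> bool:  # or_expr := and_expr ('|' and_expr)*
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()  # always parse, so every token is consumed
            result = result or rhs
        return result

    def parse_and() -> bool:  # and_expr := atom ('&' atom)*
        nonlocal pos
        result = parse_atom()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_atom()
            result = result and rhs
        return result

    def parse_atom() -> bool:  # atom := label | '(' or_expr ')'
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in auths  # a label grants access if the user holds it

    value = parse_or()
    if pos != len(tokens):
        raise ValueError(f"trailing tokens in {expr!r}")
    return value


DOB_VISIBILITY = "marketing"  # hypothetical mark on the date-of-birth cell
print(is_visible(DOB_VISIBILITY, {"marketing"}))  # True
print(is_visible(DOB_VISIBILITY, {"hiring"}))     # False
print(is_visible("(marketing|hiring)&pii", {"hiring", "pii"}))  # True
```

In the restaurant example, the marketing team’s authorizations satisfy the date-of-birth mark and the hiring team’s do not, so the same stored cell serves one use and is invisible to the other.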
Putting data into TensorFlow from the Unified Data Space
When source data is pulled from the Lucd Unified Data Space into an AI framework like TensorFlow, cell-level visibility is enforced, so the TensorFlow model can be trained only on the data the requester is authorized to see. The following figure illustrates such a workflow.
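A minimal sketch of that workflow, under simplifying assumptions: each hypothetical `Cell` carries a set of labels, any one of which grants read access (a plain OR rather than full boolean expressions), and `scan` drops unreadable cells before they are pivoted into training columns. `Cell`, `scan`, and `to_columns` are illustrative names, not Lucd or TensorFlow APIs.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class Cell(NamedTuple):
    """One attribute of one data object (hypothetical layout)."""
    row: str                    # object identifier
    attribute: str              # attribute name, e.g. "date_of_birth"
    visibility: frozenset[str]  # labels, any one of which grants read access
    value: str                  # value stored as text


def scan(cells: list[Cell], auths: set[str]) -> list[Cell]:
    """Return only the cells the caller's authorizations allow it to read.
    An empty label set is treated as public."""
    return [c for c in cells if not c.visibility or c.visibility & auths]


def to_columns(cells: list[Cell], features: list[str]) -> dict[str, list[str]]:
    """Pivot visible cells into per-feature columns, ordered by row id.
    A feature is kept only if every object has a visible value for it."""
    rows: dict[str, dict[str, str]] = defaultdict(dict)
    for c in cells:
        rows[c.row][c.attribute] = c.value
    ordered = sorted(rows)
    return {
        f: [rows[r][f] for r in ordered]
        for f in features
        if all(f in rows[r] for r in ordered)
    }


MARKETING_OR_HIRING = frozenset({"marketing", "hiring"})
MARKETING_ONLY = frozenset({"marketing"})

cells = [
    Cell("person-1", "availability", MARKETING_OR_HIRING, "evenings"),
    Cell("person-1", "date_of_birth", MARKETING_ONLY, "1950-03-14"),
    Cell("person-2", "availability", MARKETING_OR_HIRING, "weekends"),
    Cell("person-2", "date_of_birth", MARKETING_ONLY, "1988-07-01"),
]
FEATURES = ["availability", "date_of_birth"]

marketing_cols = to_columns(scan(cells, {"marketing"}), FEATURES)
hiring_cols = to_columns(scan(cells, {"hiring"}), FEATURES)

print(sorted(marketing_cols))  # ['availability', 'date_of_birth']
print(sorted(hiring_cols))     # ['availability']
```

The hiring columns simply have no `date_of_birth` feature, so a model trained on them cannot learn from it. Columns in this shape could then be handed to TensorFlow, for example through `tf.data.Dataset.from_tensor_slices`, which accepts a dictionary of equal-length lists.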
We all know intuitively that businesses could provide us better products and services if they leveraged data better to understand our wants, needs, and requirements. AI offers the promise that businesses can provide these better products and services by learning from data. But the data businesses learn from can introduce bias. The Lucd AI Platform turns data into positive AI outcomes for business, stimulating better decisions and innovation that lead to better products and services. Leveraging the Lucd Unified Data Space prevents misuse of data and keeps unwanted bias from creeping into the AI models used to provide those products and services.