Data Generation
We have invented, and filed a patent application on, a method for using a Bayesian Network (BN) to generate supervised training data for an Artificial Neural Network (ANN). The data consist of a number of training sets. Each training set pairs a vector of instantiation values for a specific subset of BN variables, which correspond to the input nodes of the ANN, with a vector of calculated posterior probability distributions for another subset of BN variables, which correspond to the output nodes of the ANN. Since a BN may be capable of producing more training sets than a training process can handle, the method selects only a representative subset.
For example, in a BN representing a portion of human disease developed by the inventors, the disease Anthrax affected the values of 202 possible symptoms and test results, yielding approximately 10^74 possible training sets. Interestingly, it is estimated that the number of atoms in the “knowable universe” is about 10^80.
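The shape of a single training set, as described above, can be illustrated with a toy network. The following is only a minimal sketch and not the patented method: the network, its probabilities, and the names `PRIOR`, `CPT`, `posterior`, and `training_sets` are all hypothetical, inference is done by brute-force enumeration, and plain random sampling stands in for the selection of a representative subset.

```python
import itertools
import random

# Hypothetical toy network: one target (Disease) with two observation children.
# All probabilities are invented for illustration.
PRIOR = {"present": 0.1, "absent": 0.9}      # P(Disease)
CPT = {                                      # P(observation = "yes" | Disease)
    "Fever": {"present": 0.8, "absent": 0.1},
    "Cough": {"present": 0.6, "absent": 0.2},
}

def posterior(evidence):
    """Return P(Disease | evidence) by direct enumeration (Bayes' rule)."""
    joint = {}
    for disease_state, p in PRIOR.items():
        for obs, value in evidence.items():
            p_yes = CPT[obs][disease_state]
            p *= p_yes if value == "yes" else 1.0 - p_yes
        joint[disease_state] = p
    total = sum(joint.values())
    return {state: p / total for state, p in joint.items()}

def training_sets(n, seed=0):
    """Pick n evidence vectors at random and pair each with its posterior."""
    rng = random.Random(seed)
    all_evidence = [dict(zip(CPT, values))
                    for values in itertools.product(["yes", "no"], repeat=len(CPT))]
    chosen = rng.sample(all_evidence, n)
    return [(evidence, posterior(evidence)) for evidence in chosen]

# Each pair is (ANN inputs, ANN outputs).
for inputs, outputs in training_sets(3):
    print(inputs, outputs)
```

With only two binary observations there are four possible evidence vectors, so enumeration is trivial here. The point of selecting a subset only becomes apparent at the scale of the Anthrax example.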
After a period of beta testing, we plan to offer the generation of training data as a service. Potential customers will have processes that can be modeled as Bayesian networks and for which rapid classification is important, for example detecting failure events in real time that require a rapid response. Other customers, who have constructed Bayesian networks that are too large to evaluate using commercially available software, will be able to have their networks used to produce training data for an ANN. Note that we are not currently offering our Bayesian inference engine for separate use.
Supported file formats
xdsl (BayesFusion)
Supported node types
- cpt (Conditional Probability Table)
- deterministic
- noisymax
- noisyadder
Supported node keywords
- id
- diagtype = target or observation
- state
- parents
- probabilities
- resultingstates
dne (Norsys)
Supported network types
- DNET-1
Supported node keywords
- kind = NATURE
- discrete = TRUE
- chance = CHANCE or DETERMIN
- states
- parents
- functable
- probs
- equation (only NoisyMaxTableDist)
Nodes identified as a target correspond to output nodes of a trained ANN, while nodes identified as an observation correspond to input nodes of a trained ANN. For an xdsl file, the keywords target and observation provide this identification. For a dne file, NodeSet specifications must be appended to the file to identify nodes of each type. For example:
NodeSet target {Nodes = (Tuberculosis, Cancer, Bronchitis);};
NodeSet observation {Nodes = (XRay, Dyspnea);};
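Appending these lines by hand is error-prone for large networks, so a small script may help. The sketch below simply formats the two NodeSet lines in the form shown above and appends them to the text of a dne file. The function names `node_set_line` and `append_node_sets` are hypothetical helpers, and the code does not validate that the named nodes exist in the file.

```python
def node_set_line(name, nodes):
    """Format one NodeSet specification in the form shown above."""
    return "NodeSet %s {Nodes = (%s);};" % (name, ", ".join(nodes))

def append_node_sets(dne_text, targets, observations):
    """Return dne_text with target and observation NodeSet lines appended."""
    lines = [
        dne_text.rstrip("\n"),
        node_set_line("target", targets),
        node_set_line("observation", observations),
    ]
    return "\n".join(lines) + "\n"

# Reproduces the two example lines above.
print(node_set_line("target", ["Tuberculosis", "Cancer", "Bronchitis"]))
print(node_set_line("observation", ["XRay", "Dyspnea"]))
```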
A Bayesian Network should be causal, with causality running from targets, directly or indirectly, to observations. In a medical network, for example, causality runs from diseases to symptoms (e.g., in the .dne example above, from Bronchitis to Dyspnea).
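One way to sanity-check this property before submitting a network is to confirm that every observation has at least one target among its direct or indirect parents. The following is a minimal sketch of such a check, not part of our evaluation process. The `parents` dictionary is a hypothetical in-memory representation, and the structure shown only loosely follows the well-known "Asia" example network.

```python
def ancestors(node, parents):
    """Return all direct and indirect parents of node."""
    seen = set()
    stack = list(parents.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen

def unreachable_observations(parents, targets, observations):
    """List observations that no target influences, directly or indirectly."""
    target_set = set(targets)
    return [obs for obs in observations
            if not (ancestors(obs, parents) & target_set)]

# Assumed structure for illustration: each key maps a node to its parents.
parents = {
    "Tuberculosis": ["VisitAsia"],
    "Cancer": ["Smoking"],
    "Bronchitis": ["Smoking"],
    "TbOrCa": ["Tuberculosis", "Cancer"],
    "XRay": ["TbOrCa"],
    "Dyspnea": ["TbOrCa", "Bronchitis"],
}
targets = ["Tuberculosis", "Cancer", "Bronchitis"]
observations = ["XRay", "Dyspnea"]
print(unreachable_observations(parents, targets, observations))  # prints []
```

An empty list means every observation is downstream of some target. A non-empty list names the observations to review.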
Beta Program
The Beta Program will allow a selected set of potential customers (beta testers) to use the system for free for a limited time. The purpose is to aid us in testing the generation process and in making modifications to the process and deliverables based on beta tester feedback.
Beta testers will own any training data generated and delivered to them and may use that data for any purpose without limitation. As part of the generation process, we will train a simple multilayer perceptron (MLP) classifier and validate it on the training data using five-fold cross-validation. A Jupyter notebook demonstrating the training and validation will be provided with the training data. The notebook will have been run using Google Colab but should be appropriate for other environments. The training data will consist of Pandas files containing the inputs and outputs for the neural network.
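For readers unfamiliar with five-fold cross-validation, the sketch below shows only the data-splitting step: the rows are shuffled and divided into five folds, and each fold serves once as the validation set while the other four are used for training. It is an illustration using the standard library, not the code in the delivered notebook. The function name `five_fold_splits` is hypothetical, and training the MLP on each split is omitted.

```python
import random

def five_fold_splits(n_rows, seed=0):
    """Yield (train_indices, validation_indices) for each of five folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into five folds of near-equal size.
    folds = [indices[i::5] for i in range(5)]
    for k in range(5):
        validation = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, validation

# Every row appears in exactly one validation fold.
for train, validation in five_fold_splits(23):
    print(len(train), len(validation))
```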
We will generate enough data to demonstrate that the MLP classifier can be cross-validated successfully with the data. Note that the MLP will not be included with the deliverables but could be constructed by the beta tester using the deliverables, if desired. Our use of it is solely to assure ourselves that the process works. When we feel that it does, the free beta testing interval will end with delivery of the training files and Jupyter notebook. However, more training data, a different neural network structure, or both may well result in better classification. This determination will need to be made by the client and could then be pursued after a commercial contract has been executed.
How to Apply
Send an email to BetaProgram@ArchipelagoSystems.net with a copy of your Bayesian network attached (.xdsl or .dne only). We will evaluate the network for suitability (see section “Supported file formats” above). Priority will be given to large, complex networks as these will provide the best tests of our system.
During this initial evaluation, we will consider your network as your proprietary intellectual property and treat it as such. For example, we will not show it to or share it with anyone outside of our company. If you are not selected for the program or do not choose to continue with a commercial contract, we will delete all copies of it on our system. If you want additional security for your network, please obfuscate it before sending it to us (e.g., change node names to n1,n2,n3,… and state names for all nodes to s1,s2,s3,…).
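The renaming suggested above can be sketched as follows. This is an illustration on a hypothetical in-memory representation (a dictionary mapping each node name to its states and parents), not a tool for rewriting .xdsl or .dne files directly. The function name `obfuscate` is an assumption. State order is preserved so that probability tables would still line up.

```python
def obfuscate(network):
    """Rename nodes to n1, n2, ... and each node's states to s1, s2, ...

    Returns the renamed network and the node-name mapping, which the owner
    keeps privately in order to translate results back.
    """
    node_map = {name: "n%d" % i for i, name in enumerate(network, start=1)}
    renamed = {}
    for name, spec in network.items():
        renamed[node_map[name]] = {
            "states": ["s%d" % i for i in range(1, len(spec["states"]) + 1)],
            "parents": [node_map[p] for p in spec["parents"]],
        }
    return renamed, node_map

# Hypothetical two-node network for illustration.
network = {
    "Smoking": {"states": ["yes", "no"], "parents": []},
    "Cancer": {"states": ["present", "absent"], "parents": ["Smoking"]},
}
renamed, node_map = obfuscate(network)
print(renamed)
print(node_map)
```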
If your network is selected for beta testing, you will be asked to sign a brief agreement stating, in essence, that we will protect your intellectual property and have no rights to it, and that you will provide us with a limitation of liability for any use you may make of the training data we supply to you. If your network is not selected, we will notify you with a brief explanation of why, e.g., “unsupported xdsl function: xxxx”.