In the second part of this series, we explored how the BigML Node-RED bindings work in more detail and introduced the key concepts of input-output matching and node reification, which allow you to create more complex flows. In this third and final part of this introductory series, we review what we know about inputs and outputs in a more systematic way, introduce debugging facilities, and present an advanced type of node that lets you inject WhizzML code directly into your flows.
Details about node inputs and outputs
Each BigML node has a varying number of inputs and outputs, which are embedded in the message payload that Node-RED propagates across nodes. For example, the ensemble node has one input called dataset and one output called ensemble. That means the following two things:
- An ensemble node expects by default to receive a dataset input. This can be provided by any of the upstream nodes through their outputs, which are added to the message payload, or as a property of the ensemble node configuration.
- The ensemble output is sent over to downstream nodes with the ensemble key. This is a consequence of the fact that when a node sends an output value, it is appended to the message payload using that node's output port label as a key. This way, downstream nodes can use that key to access the output value of that node.
You can change the input and output port labels when you need to connect two nodes whose inputs and outputs do not match. Say, for example, that a node has an output port label generically named resource and that you want to use that output value in a downstream node that requires a dataset input. You can easily access the upstream node configuration and change the node settings as shown in the following image.
One thing you should be aware of is that all downstream nodes can see and use any output value generated by an upstream node, unless another node in between writes its own output under the same key, overriding the earlier value. For example, consider the following partial flow, where all inputs and outputs are shown at the same time:
If you inspect the connection between Lookup Dataset and Dataset Split, you will see that both labels have the value dataset. To reiterate the behavior explained above, this makes Lookup Dataset store its output in the message payload under the dataset key. Correspondingly, Dataset Split expects its input under the dataset key, so all will work out just fine.
If you inspect the connection between Dataset Split and Make Model, you will see that Dataset Split produces two outputs, training-dataset and test-dataset, in accordance with its expected behavior: splitting a dataset into two parts, one to train a model and the other to later evaluate it. On the other hand, Make Model expects a dataset input.
Now, if you were to run the flow as it is defined, you would not get any error. The flow would execute through, but it would produce an incorrect result, because Make Model would use the dataset value produced by Lookup Dataset instead of the training-dataset value produced by Dataset Split.
You have two options to fix this issue: either you change Dataset Split's output so it uses a dataset label instead of training-dataset, or you modify the Make Model input so it uses training-dataset instead of dataset. In the former case, the dataset value produced by Lookup Dataset will be overridden by the value with the same name produced by Dataset Split.
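The clash itself can be simulated in a few lines of JavaScript (illustrative only; the resource ids are invented and the merge helper is a stand-in for what the bindings do with the message payload):

```javascript
// Each node merges its outputs into the shared message payload; a later
// write under the same key silently overrides an earlier one.
const merge = (payload, outputs) => ({ ...payload, ...outputs });

let msg = {};
// Lookup Dataset writes under the "dataset" key.
msg = merge(msg, { dataset: "dataset/original" });
// Dataset Split writes under "training-dataset" and "test-dataset".
msg = merge(msg, {
  "training-dataset": "dataset/train",
  "test-dataset": "dataset/test",
});

// Make Model reads the "dataset" key, so without relabeling it gets the
// original, unsplit dataset rather than the training split.
console.log(msg["dataset"]);          // "dataset/original" (wrong resource)
console.log(msg["training-dataset"]); // "dataset/train"    (the intended input)
```

Note that nothing fails loudly here: the lookup succeeds, just against the wrong key, which is exactly why this kind of mismatch produces a silently incorrect result.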
How to debug problems
When you build a flow that causes an error when you run it, a good approach is to force each node to be reified and connected to a debug node, which will let you inspect the output generated by that node and detect any anomalies or unexpected results. This way you can make sure that each node sends out a message whose payload actually contains the information downstream nodes expect to receive.
For example, consider the following flow. An error could occur at any node, but we would not get any useful information until the whole WhizzML code has been generated and sent to the BigML platform for execution.
A rather trivial approach to getting more information would be connecting each node to a debug node. This would show, for each debugged node, the WhizzML code generated at that node. Unfortunately, since this information is available prior to the WhizzML code's execution, we get no information about the actual outputs produced, which are sent along within the message payload.
If you enable the reify option for each node, you are actually forcing the execution of each BigML node, and thus you will also get to know which outputs each node generates by inspecting its message payload. This can be of great help when, for example, a downstream node complains about missing or improperly formatted information, or you simply get the wrong result, e.g., by using a wrong resource.
Additionally, when you reify each node, you will divide the whole WhizzML code that the flow generates into smaller, independent chunks that you will be able to run in the BigML Dashboard, which provides a more user-friendly environment for you to assess why a flow is failing.
To streamline debugging even more, the BigML Node-RED bindings provide two special flags you can specify in the message payload you inject into your flow or inside the flow context. The first one, BIGML_DEBUG_TRACE, will make each node output the WhizzML code it generates on the Node-RED console. So, you do not have to connect each BigML node to a debug node to get that information, although it is perfectly fine if you do.
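For instance, a standard Node-RED function node placed at the start of the flow could set these flags on the message payload. The snippet below is a standalone sketch of that idea (the flag names come from this article; the rest of the payload is illustrative):

```javascript
// Sketch: set the debug flags described above on a message payload,
// the way a Node-RED function node would before the BigML nodes run.
function setDebugFlags(msg) {
  msg.payload = {
    ...(msg.payload || {}),
    BIGML_DEBUG_TRACE: true,  // print generated WhizzML on the console
    BIGML_DEBUG_REIFY: false, // keep reification off for now
  };
  return msg;
}

const out = setDebugFlags({ payload: { inputData: { "petal length": 1.35 }, limit: 2 } });
console.log(out.payload.BIGML_DEBUG_TRACE); // true
```

Inside an actual function node you would only keep the body that mutates `msg.payload` and `return msg;`, since Node-RED supplies `msg` itself.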
WhizzML for evaluation:
(define lookup-dataset-11 (lambda (r) (let (result (head (resource-ids (resources "dataset" (make-map ["name__contains" "limit" "order"] ["iris" 2 "Ascending"])))) ) (merge r (make-map ["dataset"] [result])))))
(define dataset-split-12 (lambda (r) (let (dataset (if (contains? r "dataset") (get r "dataset") "" ) result (create-random-dataset-split dataset 0.75 { "name" "Dataset - Training"} { "name" "Dataset - Test"}) ) (merge r (make-map ["training-dataset" "test-dataset"] result)))))
(define model-13 (lambda (r) (let (training-dataset (if (contains? r "training-dataset") (get r "training-dataset") "" ) result (create-and-wait "model" (make-map [(resource-type training-dataset)] [training-dataset])) ) (merge r (make-map ["model"] [result])))))
(define evaluation-14 (lambda (r) (let (test-dataset (if (contains? r "test-dataset") (get r "test-dataset") "" ) model (if (contains? r "model") (get r "model") "" ) result (create-and-wait "evaluation" (make-map [(resource-type model) "dataset"] [model test-dataset])) ) (merge r (make-map ["evaluation"] [result])))))
(define init {"inputData" {"petal length" 1.35}, "limit" 2, "BIGML_DEBUG_REIFY" false, "BIGML_DEBUG_TRACE" true})
(define lookup-dataset-11-out (lookup-dataset-11 init))
(define dataset-split-12-out (dataset-split-12 lookup-dataset-11-out))
(define model-13-out (model-13 dataset-split-12-out))
(define evaluation-14-out (evaluation-14 model-13-out))

WhizzML for Filter result:
(define lookup-dataset-11 (lambda (r) (let (result (head (resource-ids (resources "dataset" (make-map ["name__contains" "limit" "order"] ["iris" 2 "Ascending"])))) ) (merge r (make-map ["dataset"] [result])))))
(define dataset-split-12 (lambda (r) (let (dataset (if (contains? r "dataset") (get r "dataset") "" ) result (create-random-dataset-split dataset 0.75 { "name" "Dataset - Training"} { "name" "Dataset - Test"}) ) (merge r (make-map ["training-dataset" "test-dataset"] result)))))
(define model-13 (lambda (r) (let (training-dataset (if (contains? r "training-dataset") (get r "training-dataset") "" ) result (create-and-wait "model" (make-map [(resource-type training-dataset)] [training-dataset])) ) (merge r (make-map ["model"] [result])))))
(define evaluation-14 (lambda (r) (let (test-dataset (if (contains? r "test-dataset") (get r "test-dataset") "" ) model (if (contains? r "model") (get r "model") "" ) result (create-and-wait "evaluation" (make-map [(resource-type model) "dataset"] [model test-dataset])) ) (merge r (make-map ["evaluation"] [result])))))
(define filter-result-15 (lambda (r) (let (evaluation (if (contains? r "evaluation") (get r "evaluation") "" ) result (get (fetch evaluation (make-map ["output_keypath"] ["result"])) "result") ) (merge r (make-map ["evaluation"] [result])))))
(define init {"inputData" {"petal length" 1.35}, "limit" 2, "BIGML_DEBUG_REIFY" false, "BIGML_DEBUG_TRACE" true})
(define lookup-dataset-11-out (lookup-dataset-11 init))
(define dataset-split-12-out (dataset-split-12 lookup-dataset-11-out))
(define model-13-out (model-13 dataset-split-12-out))
(define evaluation-14-out (evaluation-14 model-13-out))
(define filter-result-15-out (filter-result-15 evaluation-14-out))
As you can see, each node prints the whole WhizzML program generated for the entire flow.
Similarly, BIGML_DEBUG_REIFY will reify each node without requiring you to manually change its configuration. In this case as well, each node will print on the Node-RED console the WhizzML code it attempted to execute:
WhizzML for evaluation:
(define evaluation-9 (lambda (r) (let (test-dataset (if (contains? r "test-dataset") (get r "test-dataset") "" ) model (if (contains? r "model") (get r "model") "" ) result (create-and-wait "evaluation" (make-map ["dataset" (resource-type model)] [test-dataset model])) ) (merge r (make-map ["evaluation"] [result])))))
(define init {"BIGML_DEBUG_REIFY" true, "BIGML_DEBUG_TRACE" true, "dataset" "dataset/5c3dc6948a318f053900002f", "inputData" {"petal length" 1.35}, "limit" 2, "model" "model/5c489dc33980b5340f007d3a", "test-dataset" "dataset/5c489dbd3514cd374702713c", "training-dataset" "dataset/5c489dbc3514cd3747027139"})
(define evaluation-9-out (evaluation-9 init))

WhizzML for Filter result:
(define filter-result-10 (lambda (r) (let (evaluation (if (contains? r "evaluation") (get r "evaluation") "" ) result (get (fetch evaluation (make-map ["output_keypath"] ["result"])) "result") ) (merge r (make-map ["evaluation"] [result])))))
(define init {"training-dataset" "dataset/5c489dbc3514cd3747027139", "BIGML_DEBUG_TRACE" true, "model" "model/5c489dc33980b5340f007d3a", "dataset" "dataset/5c3dc6948a318f053900002f", "inputData" {"petal length" 1.35}, "limit" 2, "evaluation" "evaluation/5c489dce3514cd37470271b0", "BIGML_DEBUG_REIFY" true, "test-dataset" "dataset/5c489dbd3514cd374702713c"})
(define filter-result-10-out (filter-result-10 init))
In this case, each code snippet is complete with the inputs provided by the previous node, stored in the init global, so you can more easily check its correctness and/or try to execute it in BigML.
Injecting WhizzML Code
As we mentioned, WhizzML, BigML's domain-specific language for defining custom ML workflows, provides the magic behind the BigML Node-RED bindings. This opens up a wealth of possibilities, since you can embed a node inside your Node-RED flows to execute generic WhizzML code. In other words, if our bindings for Node-RED do not already provide a specific kind of node for a given task, you can create one with the right WhizzML code that does what you need.
For example, we could consider the following case:
- We want to predict using an existing ensemble.
- We calculate the prediction using two different methods, then choose the result with the higher confidence.
To carry out this task in Node-RED, we define the following flow.
The portion of the flow delimited by the dashed rectangle is the same prediction workflow we described in part 2 of this series. You can then add a new prediction node, making sure the two prediction nodes use different settings for Operating kind. You can use Confidence for one and Votes for the other.
Another detail to note is renaming the two prediction nodes' output labels so they do not clash. Indeed, if you leave the two nodes with their default output port labels, which read prediction for both of them, the second prediction node will override the first's output. So, just use prediction1 and prediction2 as port labels for the two nodes.
Finally, add a WhizzML node, available through the left-hand node palette, and configure it as shown in the following image.
Since the WhizzML node is going to use the two predictions output by the previous nodes, we should also make that explicit in the WhizzML node's input port label configuration, as shown in the following image:
This is the exact code you should paste into the WhizzML field:

(let (p1 ((fetch prediction1) "prediction")
      p2 ((fetch prediction2) "prediction")
      c1 ((fetch prediction1) "confidence")
      c2 ((fetch prediction2) "confidence"))
  (if (> c1 c2) [p1 c1] [p2 c2]))
As you see, the WhizzML node uses prediction1 and prediction2. Those variables must match the labels you defined for the prediction nodes' output ports and the WhizzML node's input port.
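For readers more at home in JavaScript, here is a rough analogue of that WhizzML snippet, with fetch mocked out so the selection logic can be followed (and run) in isolation; the prediction and confidence values are invented:

```javascript
// Mocked stand-in for BigML's fetch: maps a prediction id to its resource.
// In the real flow these would be actual prediction resources on BigML.
const fetched = {
  prediction1: { prediction: "Iris-setosa", confidence: 0.91 },
  prediction2: { prediction: "Iris-versicolor", confidence: 0.78 },
};
const fetch = (id) => fetched[id];

// Mirrors (if (> c1 c2) [p1 c1] [p2 c2]): return the prediction and
// confidence of whichever input has the higher confidence.
function bestPrediction(id1, id2) {
  const p1 = fetch(id1);
  const p2 = fetch(id2);
  return p1.confidence > p2.confidence
    ? [p1.prediction, p1.confidence]
    : [p2.prediction, p2.confidence];
}

console.log(bestPrediction("prediction1", "prediction2")); // [ 'Iris-setosa', 0.91 ]
```

The WhizzML version does the same thing server-side, which avoids pulling both prediction resources down to Node-RED just to compare two numbers.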
Now, if you inject a new message, with the same format as the one used for the prediction use case introduced earlier, you should get the following output:
Conclusion
We can’t wait to see what developers will be able to create using the BigML Node-RED bindings to make IoT devices that are able to learn from their environment. Let us know how you are using the BigML Node-RED bindings and provide any feedback to support@bigml.com.