Continuing the series on distribution modeling in DOE (Distribution Modeling in Data Orchestration Engine – 1, Distribution Modeling in Data Orchestration Engine – 2, Distribution Modeling in Data Orchestration Engine – 3, and Distribution Modeling in Data Orchestration Engine – 4), this post lists the performance-related aspects of modeling rules.
The following factors can affect rule evaluation:
- The volume of data associated with the data object whose data needs to be distributed.
- The data object node on which the rule is modeled.
- The type of the rule.
- The number of criteria fields used in the rule.
- The type of the criteria fields used in the rule.
- The operators used in rule modeling.
- The number of subscriptions for a device for a given rule.
- The number of subscriptions (across all devices) for a given rule.
- The number of different rule types for a data object.
To summarize, the factors can be divided into the following categories:
- Volume of data: for the data object and its subscriptions.
- Rule model: the type of rule, along with the fields and operators used in it.
- Number of rules.
The reasons these factors affect performance can be deduced from the previous posts in the series. Because rule evaluation involves set operations, the factors that affect set evaluation apply here as well.
Volume of data: The content of the data object node forms one of the input sets that DOE uses for rule evaluation. The more data this set contains, the more evaluation DOE has to perform. In most cases, this is also affected by the node on which the rule is modeled. In the majority of cases, the child nodes of a data object contain more records than its root node, so rules modeled on the root node (other factors being equal) are likely to perform better.
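As a rough illustration of why the node matters, the sketch below counts how many records a rule has to examine when it is modeled on a root node versus a child node. The node contents, field names, and the `evaluate_rule` helper are hypothetical and are not DOE APIs; the only assumption is that evaluation effort grows with the number of records in the node.

```python
# Toy model only: hypothetical nodes and helper, not DOE code.

def evaluate_rule(records, field, value):
    """Return the matching records and the number of records examined."""
    examined = 0
    matches = []
    for record in records:
        examined += 1
        if record[field] == value:
            matches.append(record)
    return matches, examined

# Root node: 100 order headers.
headers = [
    {"order_id": i, "region": "EU" if i % 2 == 0 else "US"}
    for i in range(100)
]

# Child node: 10 items per header, so 1000 records.
items = [
    {"order_id": h["order_id"], "item_no": n, "region": h["region"]}
    for h in headers
    for n in range(10)
]

_, root_cost = evaluate_rule(headers, "region", "EU")
_, child_cost = evaluate_rule(items, "region", "EU")
print(root_cost, child_cost)  # 100 1000
```

With the same criterion, the rule on the child node examines ten times as many records simply because the node holds ten times as many.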
Similarly, the volume of subscriptions for a rule can affect its evaluation time. If the same device has multiple subscriptions for a given rule, the cases where a data object instance needs to be removed from the device require more evaluation than the cases where the device has a single subscription for the rule.
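The removal case can be pictured with a small sketch. It assumes, purely for illustration, that an instance stays on a device as long as at least one of the device's subscriptions for the rule still matches it, so a removal can only be decided after every subscription has been checked. The `must_remove` helper and the data are hypothetical, not DOE internals.

```python
# Toy model only: hypothetical helper and data, not DOE code.

def must_remove(instance, subscribed_values, field):
    """Return (remove?, number of subscriptions checked)."""
    checked = 0
    for wanted in subscribed_values:
        checked += 1
        if instance[field] == wanted:
            # Another subscription still covers the instance.
            return False, checked
    # No subscription matches: the instance can leave the device.
    return True, checked

instance = {"customer_id": 1, "region": "APAC"}

single = ["EU"]                        # one subscription for the rule
multiple = ["EU", "US", "LATAM", "MEA"]  # four subscriptions for the rule

print(must_remove(instance, single, "region"))    # (True, 1)
print(must_remove(instance, multiple, "region"))  # (True, 4)
```

In this model the removal decision costs one check for the single subscription and four checks when the device holds four subscriptions.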
Rule model: A distribution rule can be one of the following types: bulk, rule with criteria fields, and rule with the initial value option.
Between rules with criteria fields and their initial value counterparts, the initial value rules are more performance-intensive, as they require evaluation of all possible permutations.
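To get a feel for how quickly such an evaluation can grow, the sketch below counts value combinations across criteria fields. It assumes that "all possible permutations" can be read as every combination of the candidate values of each criteria field; the field names and values are made up, and how DOE actually enumerates them is not shown here.

```python
# Toy model only: hypothetical fields and values, not DOE code.
from itertools import product

field_values = {
    "region": ["EU", "US", "APAC"],
    "sales_org": ["1000", "2000"],
    "channel": ["10", "20", "30", "40"],
}

# Every combination of one value per criteria field.
combinations = list(product(*field_values.values()))
print(len(combinations))  # 24
```

Three fields with 3, 2, and 4 values already give 3 × 2 × 4 = 24 combinations, and each additional field multiplies the count.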
The bulk rule may appear simple to model, but its output can be larger than that of a normal rule on the same data object volume, so the comparison depends on the rule fields and operators used.
The “equal to” operator performs better than operators that can yield a range of values. Operators that make optimal use of database indices also perform better than those that do not.
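The difference between an index-friendly operator and one that is not can be sketched with a deliberately simplified model: a dictionary stands in for an index that supports exact matches only, so "equal to" is a direct lookup while a range condition has to scan every record. Real database indices behave differently (many of them can serve range conditions as well), and all names below are hypothetical.

```python
# Toy model only: a dict plays the role of an exact-match index.
from collections import defaultdict

records = [{"id": i, "postal_code": 10000 + i} for i in range(1000)]

# Build the exact-match index: postal_code -> records.
index = defaultdict(list)
for record in records:
    index[record["postal_code"]].append(record)

def equal_to(value):
    """Exact match via the index; touches only the matching records."""
    hits = index.get(value, [])
    return hits, len(hits)

def between(low, high):
    """Range match without a usable index; touches every record."""
    touched = 0
    hits = []
    for record in records:
        touched += 1
        if low <= record["postal_code"] <= high:
            hits.append(record)
    return hits, touched

eq_hits, eq_touched = equal_to(10500)
rng_hits, rng_touched = between(10500, 10509)
print(len(eq_hits), eq_touched)    # 1 1
print(len(rng_hits), rng_touched)  # 10 1000
```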
The same applies to the rule fields. In general, numeric fields may be faster to evaluate than character fields: for some operators, a character comparison is slower than evaluating a numeric field with the same operator.
Number of rules: A message, whether sent from the receiver or from the backend systems, is evaluated against all the active rules for the data object. Hence, the more rules are active, the more evaluations take place.
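A minimal sketch of this effect follows. The rule structure, rule names, and the `evaluate_message` helper are hypothetical and only model the idea that each active rule adds one evaluation per message, while inactive rules are skipped.

```python
# Toy model only: hypothetical rules and helper, not DOE code.

rules = [
    {"name": "BY_REGION", "active": True, "field": "region", "value": "EU"},
    {"name": "BY_SALES_ORG", "active": True, "field": "sales_org", "value": "1000"},
    {"name": "BY_CHANNEL", "active": False, "field": "channel", "value": "10"},
    {"name": "BY_PLANT", "active": True, "field": "plant", "value": "P1"},
]

def evaluate_message(message, rules):
    """Evaluate one message against every active rule."""
    evaluations = 0
    matched = []
    for rule in rules:
        if not rule["active"]:
            continue  # inactive rules cost nothing in this model
        evaluations += 1
        if message.get(rule["field"]) == rule["value"]:
            matched.append(rule["name"])
    return matched, evaluations

message = {"region": "EU", "sales_org": "2000", "plant": "P1"}
matched, evaluations = evaluate_message(message, rules)
print(matched, evaluations)  # ['BY_REGION', 'BY_PLANT'] 3
```

The message matches only two rules, but it is still evaluated three times, once for each active rule.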