Automatic Rule Learning over Knowledge Graphs
If a picture is worth a thousand words, a demo is worth a thousand slides, and the online prototype I show you is intended to introduce the notion of Inductive Logic Programming (ILP) over Knowledge Graphs (KGs) and let you play with it. If ILP over KGs sounds too cryptic, call it Rule/Query Learning over Databases. ‘Learning’ means a process that takes examples as input and returns a codified query, in database-oriented terminology, or a rule, in logic programming phraseology. By ‘Databases’ I here mean graph-oriented databases, the ones that emphasize the importance of relations among entities, whatever those entities might be: business processes, catalog products, or generic concepts.
When you open the actual demo page, you may choose either to load one of the preloaded examples or to create a new training data set for a new rule.
I guess relation names such as capitalOf or carManufacturer might be unclear, and this is because I express entities and relationships in a logic style. If you are familiar with Prolog/Datalog you may skip this premise; otherwise I encourage you to read it thoroughly. If we take the sentence ‘Berlin is capital of Germany’, we can say that there is a subject (Berlin), an object (Germany) and a predicate (capital of or, in camel case, capitalOf). In graphical terms, we may express it as an edge labelled capitalOf that goes from Berlin to Germany.
Instead, I use a concise and computable form that takes the predicate as the first element, followed by the subject and then the object. It converts the statement above into capitalOf(Berlin, Germany), just as simple as that. capitalOf takes two arguments, but there are also unary predicates, which take a single argument. This is the case of carManufacturer, where carManufacturer(Toyota) means that Toyota is labelled as a car manufacturer.
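As a minimal sketch of this notation, assuming facts are stored as plain Python tuples (the demo's internal representation is not described here, so the `Fact` alias and the helpers `triple_to_fact` and `to_logic` are hypothetical names):

```python
from typing import Tuple

# A fact is a tuple: the predicate name first, then its arguments.
Fact = Tuple[str, ...]

def triple_to_fact(subject: str, predicate: str, obj: str) -> Fact:
    """Reorder a subject-predicate-object statement into predicate-first form."""
    return (predicate, subject, obj)

def to_logic(fact: Fact) -> str:
    """Render a fact in the predicate(arg1, arg2, ...) notation."""
    predicate, *args = fact
    return f"{predicate}({', '.join(args)})"

# Binary predicate: 'Berlin is capital of Germany'.
binary = triple_to_fact("Berlin", "capitalOf", "Germany")
# Unary predicate: Toyota is labelled as a car manufacturer.
unary = ("carManufacturer", "Toyota")

print(to_logic(binary))  # capitalOf(Berlin, Germany)
print(to_logic(unary))   # carManufacturer(Toyota)
```

Keeping the predicate in the first position means unary and binary facts share one representation and differ only in tuple length.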
After you choose a predefined sample or create a new one, it is time to instruct the system on what you expect to get from the generated rule, and you do that by providing some samples. Unlike in usual machine learning tasks, you are asked to provide not only positive cases but also negative ones. While positive cases are guaranteed to be among the results of the generated rule, the negative cases never will be. To begin with, I suggest you fill in just some of the positive cases, the ones you expect the most, and see what the system finds for you by pushing the ‘Run Training’ button.
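The guarantee on positive and negative cases can be written as a small check. This is only an illustrative sketch: the function name `is_consistent` and the sample city/country pairs are assumptions, not part of the demo.

```python
from typing import Set, Tuple

Example = Tuple[str, ...]

def is_consistent(results: Set[Example],
                  positives: Set[Example],
                  negatives: Set[Example]) -> bool:
    """True if every positive case is returned and no negative case is."""
    return positives <= results and not (negatives & results)

# Hypothetical training data for a capitalOf-like rule.
positives = {("Berlin", "Germany"), ("Rome", "Italy")}
negatives = {("Munich", "Germany")}

# Results of a rule that is too generic: it also returns a negative case.
too_generic = {("Berlin", "Germany"), ("Rome", "Italy"), ("Munich", "Germany")}
# Results of an acceptable rule: extra items are fine unless marked negative.
acceptable = {("Berlin", "Germany"), ("Rome", "Italy"), ("Paris", "France")}

print(is_consistent(too_generic, positives, negatives))  # False
print(is_consistent(acceptable, positives, negatives))   # True
```

Note that a consistent rule may still return items that appear in neither list, which is why the results are worth inspecting by hand.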
The system can generate one or more rules, or none if it cannot find any way to relate the data you provided. You may feel uncomfortable with the syntax of the rule, but it is simpler than you think. The system leverages the 32 relations present in the database in order to generate rules. On the right side of the left arrow are the conditions to be satisfied (see the post on logic programming) to populate the returned variables X and Y (or only X for a single-argument predicate), which are then used to construct the new predicate. Now you should verify what the rule actually returns from the database by clicking on ‘Execute Selected Rule’.
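To make the reading of a rule concrete, here is a minimal sketch of how a rule of the form head <- conditions could be executed against a set of facts. It is a naive evaluator written for illustration, not the demo's engine; the predicates `locatedIn` and `seatOfGovernment`, the sample facts, and the convention that a single uppercase letter is a variable are all assumptions.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Atom = Tuple[str, ...]      # (predicate, term, term, ...)
Binding = Dict[str, str]    # variable name -> constant

def is_var(term: str) -> bool:
    """Assumption: a variable is a single uppercase letter such as X or Y."""
    return len(term) == 1 and term.isupper()

def match(atom: Atom, fact: Atom, binding: Binding) -> Optional[Binding]:
    """Extend binding so that atom equals fact; return None if impossible."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    new = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new

def solve(body: List[Atom], facts: Set[Atom], binding: Binding) -> Iterator[Binding]:
    """Yield every binding that satisfies all conditions (naive join)."""
    if not body:
        yield binding
        return
    for fact in facts:
        extended = match(body[0], fact, binding)
        if extended is not None:
            yield from solve(body[1:], facts, extended)

def execute(head: Atom, body: List[Atom], facts: Set[Atom]) -> Set[Atom]:
    """Build the new predicate from every satisfying binding."""
    return {(head[0],) + tuple(b.get(t, t) for t in head[1:])
            for b in solve(body, facts, {})}

# Hypothetical database content.
facts = {
    ("locatedIn", "Berlin", "Germany"),
    ("locatedIn", "Munich", "Germany"),
    ("locatedIn", "Rome", "Italy"),
    ("seatOfGovernment", "Berlin"),
    ("seatOfGovernment", "Rome"),
}

# Hypothetical rule: capitalOf(X, Y) <- locatedIn(X, Y), seatOfGovernment(X)
head = ("capitalOf", "X", "Y")
body = [("locatedIn", "X", "Y"), ("seatOfGovernment", "X")]

results = execute(head, body, facts)
for fact in sorted(results):
    print(fact)
```

Each condition on the right of the arrow narrows the possible values of X and Y; only the bindings that survive all conditions are used to build the new capitalOf facts, so Munich is dropped here.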
Is the process finished? Usually, after the first iteration the generated rule is too generic, and the results might include some items that are not meant to be there. To exclude them, click on the trash icon of each of those items, so that they are appended to the negative case list.
With the updated training dataset, re-run the training and check how the system refines the rule so that it finds what you really want.
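The train, execute, reject, retrain cycle can be sketched as a toy generate-and-test loop. This is not the demo's learning algorithm: the tiny `CITIES` table, the fixed `CANDIDATES` list ordered from generic to specific, and the `train` and `execute` functions are all hypothetical stand-ins.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical database: unary predicates stored as boolean attributes.
CITIES: Dict[str, Dict[str, bool]] = {
    "Berlin": {"inEurope": True, "isCapital": True},
    "Rome": {"inEurope": True, "isCapital": True},
    "Munich": {"inEurope": True, "isCapital": False},
    "Tokyo": {"inEurope": False, "isCapital": True},
}

# Candidate rules, ordered from most generic to most specific.
CANDIDATES: List[Tuple[str, List[str]]] = [
    ("target(X) <- inEurope(X)", ["inEurope"]),
    ("target(X) <- inEurope(X), isCapital(X)", ["inEurope", "isCapital"]),
]

def execute(conditions: List[str]) -> Set[str]:
    """Return every city that satisfies all conditions of a rule."""
    return {city for city, attrs in CITIES.items()
            if all(attrs[c] for c in conditions)}

def train(positives: Set[str],
          negatives: Set[str]) -> Optional[Tuple[str, Set[str]]]:
    """Return the first candidate covering all positives and no negatives."""
    for text, conditions in CANDIDATES:
        results = execute(conditions)
        if positives <= results and not (negatives & results):
            return text, results
    return None

positives = {"Berlin", "Rome"}
negatives: Set[str] = set()

# First iteration: only positive cases, so the generic rule is accepted.
first_rule, first_results = train(positives, negatives)
print(first_rule, sorted(first_results))
# target(X) <- inEurope(X) ['Berlin', 'Munich', 'Rome']

# The user trashes the unwanted item, which becomes a negative case.
negatives.add("Munich")

# Second iteration: the generic rule is now rejected.
second_rule, second_results = train(positives, negatives)
print(second_rule, sorted(second_results))
# target(X) <- inEurope(X), isCapital(X) ['Berlin', 'Rome']
```

Each rejected item rules out the candidates that return it, so the next training run has to settle on a more specific rule.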
And now? For me, this prototype is an occasion to create innovative algorithms for ILP over KGs; for you, it is an opportunity to include this tool in your technologies for a variety of business problems. Which ones do I have in mind?
eCommerce customers have preferences, and it is vital for merchants to understand them. How might such preferences be expressed? They could be inferred from user choices (positive cases) or missed selections (negative cases). I think it is easy to find correlations with this prototype.
Question Answering and Search
As we have seen in the examples above, sentences and statements can be expressed in logical form, and with suitable training datasets it is not hard to imagine applying this technology to find the right program to answer your requests.