Many organizations today rely on decision-making processes that are driven by their historical data. In the data mining world, the Apriori algorithm is used to mine large amounts of data for frequent itemsets and association rules, which can support quick and well-founded decisions.
Theoretical Implementation Steps
1. The Apriori algorithm first scans all the transactions in the dataset to find the support count of each item, that is, the number of transactions that contain it.
2. Initially, every item is a member of the First Candidate Itemset. The support count of each candidate item is calculated, and items with a support count lower than the minimum required support count are removed as candidates. The remaining items are joined to create the Second Candidate Itemset, whose members each consist of two items.
3. The support count of each two-item itemset is calculated from the database of transactions, and the two-item itemsets with a support count greater than or equal to the minimum support count are used to create the Third Candidate Itemset. This generate-and-count process is repeated for the Fourth, Fifth, and later Candidate Itemsets until the support counts of all the itemsets at a level are lower than the minimum required support count.
4. All the candidate itemsets with a support count greater than or equal to the minimum support count form the set of Frequent Itemsets. These frequent itemsets are then used to generate association rules with a confidence greater than or equal to the Minimum Confidence.
5. Apriori generates all the non-empty proper subsets of each frequent itemset and creates an association rule from each subset whose confidence is greater than or equal to the minimum confidence. The confidence of a rule A → B is the support count of the itemset containing both A and B divided by the support count of A.
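Steps 4 and 5 can be sketched in a few lines of Python. This is only an illustration and not the XI implementation: the function name `generate_rules`, the items `A` and `B`, and their support counts are all made up for the example, and the support counts of the frequent itemsets are assumed to be known already.

```python
from itertools import combinations

def generate_rules(support, min_confidence):
    """Build association rules from frequent itemsets.

    `support` maps each frequent itemset (a frozenset) to its support count.
    It must also contain every subset of each itemset, which holds for
    Apriori output because every subset of a frequent itemset is frequent.
    """
    rules = []
    for itemset, count in support.items():
        if len(itemset) < 2:
            continue  # a rule needs an antecedent and a consequent
        for size in range(1, len(itemset)):
            for subset in combinations(sorted(itemset), size):
                antecedent = frozenset(subset)
                consequent = itemset - antecedent
                confidence = count / support[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, confidence))
    return rules

# Hypothetical support counts, for illustration only.
support = {
    frozenset({"A"}): 4,
    frozenset({"B"}): 3,
    frozenset({"A", "B"}): 3,
}
rules = generate_rules(support, min_confidence=0.8)
for antecedent, consequent, confidence in rules:
    print(sorted(antecedent), "->", sorted(consequent), confidence)
```

With these numbers, the rule B → A has a confidence of 3/3 and is kept, while A → B has a confidence of 3/4 and falls below the assumed minimum confidence of 0.8.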
The algorithm flow
How could XI be used in this scenario?
This is a dataset of 8 transactions selected at random from a large dataset of a mobile shop.
- Mobile Set, Memory card
- Panel, Charger, Memory card, Headset
- Battery, Mobile Set, Memory card
- Mobile Set, Bluetooth device
- Mobile Set, Headset, Battery
- Charger, Bluetooth device
- Mobile Set
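To make the first pass of the algorithm concrete, here is a small Python sketch that counts the support of each single item in the transactions listed above and drops the infrequent ones. The minimum support count of 2 is an assumption chosen for the example; the scenario itself does not fix a value.

```python
from collections import Counter

# The transactions exactly as listed above.
transactions = [
    {"Mobile Set", "Memory card"},
    {"Panel", "Charger", "Memory card", "Headset"},
    {"Battery", "Mobile Set", "Memory card"},
    {"Mobile Set", "Bluetooth device"},
    {"Mobile Set", "Headset", "Battery"},
    {"Charger", "Bluetooth device"},
    {"Mobile Set"},
]

MIN_SUPPORT = 2  # assumed threshold, not given in the scenario

# Support count = number of transactions that contain the item.
item_support = Counter(item for t in transactions for item in t)

# Keep only the items that meet the minimum support count.
frequent_items = {item: n for item, n in item_support.items() if n >= MIN_SUPPORT}

for item, n in sorted(item_support.items()):
    status = "kept" if item in frequent_items else "removed"
    print(f"{item}: {n} ({status})")
```

With this threshold, only Panel (support count 1) is removed; Mobile Set has the highest support count with 5.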
This file is converted to an XML file by a sender file adapter, for which the Message protocol has to be set to File Content Conversion.
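The sketch below only illustrates the shape of such a conversion in Python; it is not the file adapter and does not use any XI configuration. The element names `Transactions`, `Transaction` and `Item` are hypothetical, since the actual names depend on the message type defined for the interface.

```python
import xml.etree.ElementTree as ET

def transactions_to_xml(lines):
    """Turn comma-separated transaction lines into a small XML document.

    Element names are placeholders chosen for this example.
    """
    root = ET.Element("Transactions")
    for line in lines:
        items = [part.strip() for part in line.split(",") if part.strip()]
        if not items:
            continue  # skip empty lines
        transaction = ET.SubElement(root, "Transaction")
        for item in items:
            ET.SubElement(transaction, "Item").text = item
    return ET.tostring(root, encoding="unicode")

xml_text = transactions_to_xml([
    "Mobile Set, Memory card",
    "Charger, Bluetooth device",
])
print(xml_text)
```

Each line of the flat file becomes one `Transaction` element with one `Item` child per product.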
This XML file is given as input to the Candidate Generation process, a BPM process that generates the candidate itemsets.
The output file of the Candidate Generation process is given as input to the Support Calculation process, which counts the support of every candidate itemset.
The Candidate Pruning process then takes the output file of the Support Calculation process as input and prunes the candidate itemsets so that the next level (Second, Third, Fourth…) of candidate itemsets can be generated. Pruning removes every candidate itemset that does not meet the minimum support count.
These three processes run in a loop. After the Candidate Pruning process, its output file is checked for remaining items. If there are no items, the loop terminates and the final result is the output file of the Support Calculation process. Otherwise, the file is given as input to the Candidate Generation process for the next iteration.
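The loop can be sketched in Python with one function per process. This is a plain in-memory illustration under assumed names (`generate_candidates`, `count_support`, `prune`, `run_loop`), with hypothetical items `A`, `B`, `C` and an assumed minimum support count of 2; in XI each step would be a BPM process exchanging files instead.

```python
from itertools import combinations

def generate_candidates(previous, k):
    """Candidate Generation: join frequent (k-1)-itemsets into k-itemsets."""
    candidates = set()
    for a, b in combinations(previous, 2):
        union = a | b
        # Keep the union only if every (k-1)-subset survived the last pruning.
        if len(union) == k and all(
            frozenset(s) in previous for s in combinations(union, k - 1)
        ):
            candidates.add(union)
    return candidates

def count_support(candidates, transactions):
    """Support Calculation: count the transactions containing each candidate."""
    return {c: sum(1 for t in transactions if c <= t) for c in candidates}

def prune(support, min_support):
    """Candidate Pruning: drop candidates below the minimum support count."""
    return {c: n for c, n in support.items() if n >= min_support}

def run_loop(transactions, min_support):
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        kept = prune(count_support(candidates, transactions), min_support)
        if not kept:
            break  # nothing left after pruning, so the loop terminates
        frequent.update(kept)
        k += 1
        candidates = generate_candidates(set(kept), k)
    return frequent

# Hypothetical transactions, for illustration only.
transactions = [frozenset("AB"), frozenset("ABC"), frozenset("AC"), frozenset("B")]
frequent = run_loop(transactions, min_support=2)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

In this sketch the loop also stops when no new candidates can be generated, which is the case here at the third level because the pair B, C was pruned.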
In my next blog in this Data Mining series, we will see the technical implementation of the Candidate Generation process using SAP Exchange Infrastructure.