Market Basket Analysis (Apriori)

Most of you probably know the beer-and-diapers story, which was later found to be a more or less random association of items. The technique behind it is famously known as Market Basket Analysis. It is used to find associations between the various types of items sold in a supermarket. Using it, supermarkets can organize their shelves better and find opportunities to cross-sell and up-sell products.

This technique is mostly used for B2C problems in data science, but I used it in my B2B area of study. I used it to find associations between the various service lines (hundreds in my organization) that clients tend to opt for, so that the sales team can suggest an additional service line when a client is interested in other services. I applied the Market Basket approach to these service lines and came up with different associations.

So how does Apriori work –

The concept behind Apriori –

  • All subsets of a frequent item set are frequent.
  • All supersets of an infrequent item set are infrequent.
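
The two rules above are what let Apriori skip most candidate item sets. Here is a minimal sketch of the level-wise search, not a production implementation. The toy transactions, the 0.4 threshold and the function names support and apriori are all assumptions made for illustration.

```python
from itertools import combinations

# Toy transactions (illustrative data only).
transactions = [
    {"A", "B", "C"},
    {"C", "D"},
    {"A", "B", "D"},
    {"A", "B"},
    {"A", "D"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def apriori(min_support):
    """Return all item sets whose support is at least min_support."""
    items = sorted({item for t in transactions for item in t})
    # Level 1: frequent single items.
    current = [frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support]
    frequent = list(current)
    k = 2
    while current:
        prev = set(current)
        # Join: merge frequent (k-1)-item sets into k-item candidates.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        # Prune: a candidate with any infrequent (k-1)-subset cannot be
        # frequent, so it is dropped without counting its support.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev
                             for s in combinations(c, k - 1))}
        current = [c for c in candidates if support(c) >= min_support]
        frequent.extend(current)
        k += 1
    return frequent


for itemset in apriori(min_support=0.4):
    print(sorted(itemset), support(itemset))
```

With this data, {A,B,D} is pruned without a support count because its subset {B,D} is already infrequent.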

Some basic statistics to run this mining –

Transaction Num    Item set
T1                 A, B, C
T2                 D, C
T3                 A, B, D
T4                 A, B
T5                 A, D

So here are some transactions T1-T5 which are similar to transactions in a superstore with products A-D.

Support

To find out how popular an item set is, we calculate its support: the fraction of all transactions that contain the item set. For item set {A,B}, we see that it occurs in 3 of our 5 sample transactions (T1, T3 and T4), so Support{A,B} = 3/5 = 0.6.

Support{A,B} = Frequency{A,B}/Total Transactions
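
As a quick check, the support formula can be sketched in a few lines of Python. The transactions are T1-T5 from the table, and the helper name support is just an illustrative choice.

```python
# T1-T5 from the sample table.
transactions = [
    {"A", "B", "C"},
    {"D", "C"},
    {"A", "B", "D"},
    {"A", "B"},
    {"A", "D"},
]


def support(itemset):
    """Frequency of itemset divided by the total number of transactions."""
    itemset = set(itemset)
    frequency = sum(1 for t in transactions if itemset <= t)
    return frequency / len(transactions)


print(support({"A", "B"}))  # 0.6
```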

Confidence

This is how likely item B is to be purchased given that item A was purchased. However, confidence can misrepresent an association, because it only accounts for how popular A is, not B. If B is popular on its own, confidence will be high even when the two items are unrelated. To correct for that, we calculate Lift.

Confidence{A->B} = Support{A,B}/Support{A}
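
A small sketch of the confidence formula on transactions T1-T5 follows. It divides raw counts instead of supports. That is equivalent, since the total number of transactions cancels out, and it avoids floating-point rounding. The helper names count and confidence are assumptions for illustration.

```python
# T1-T5 from the sample table.
transactions = [
    {"A", "B", "C"},
    {"D", "C"},
    {"A", "B", "D"},
    {"A", "B"},
    {"A", "D"},
]


def count(itemset):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def confidence(antecedent, consequent):
    """Confidence{antecedent -> consequent}, computed from raw counts."""
    both = set(antecedent) | set(consequent)
    return count(both) / count(antecedent)


print(confidence({"A"}, {"B"}))  # 0.75
print(confidence({"B"}, {"A"}))  # 1.0
```

Note that confidence is not symmetric: every basket with B also contains A, but only three of the four baskets with A contain B.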

Lift

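This tells how likely item B is to be purchased when item A is purchased, while controlling for how popular item B is. A value of 1 means there is no association, a value greater than 1 means B is more likely to be bought when A is bought, and a value less than 1 means B is less likely to be bought when A is bought.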

Lift{A,B} = Lift{B,A} = Support{A,B}/(Support{A}*Support{B})
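
To make the lift formula concrete, here is a hedged sketch on transactions T1-T5. The formula is rearranged to use raw counts, N * count{A,B} / (count{A} * count{B}), which is algebraically the same as the support version. The helper names count and lift are illustrative.

```python
# T1-T5 from the sample table.
transactions = [
    {"A", "B", "C"},
    {"D", "C"},
    {"A", "B", "D"},
    {"A", "B"},
    {"A", "D"},
]


def count(itemset):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def lift(x, y):
    """Support{x,y} / (Support{x} * Support{y}), written with counts."""
    n = len(transactions)
    return n * count(set(x) | set(y)) / (count(x) * count(y))


print(lift({"A"}, {"B"}))  # 1.25, A and B appear together more than chance
print(lift({"A"}, {"D"}))  # below 1 (roughly 0.83)
```

Here A and B have a lift above 1, while A and D fall below 1 even though Confidence{D->A} is 2/3, which illustrates why confidence alone can mislead when an item such as A is very popular.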