
Tuesday 22 October 2013

What is the Apriori Algorithm? Explain the Techniques for Improving the Efficiency of the Apriori Algorithm

Apriori Algorithm
Apriori Property: All nonempty subsets of a frequent itemset must also be frequent.
• If an itemset I does not satisfy the minimum support threshold, then I is not frequent: P(I) < min_sup.
• If an item A is added to I, then the resulting itemset I ∪ A cannot occur more frequently than I.
• Hence I ∪ A is not frequent either: P(I ∪ A) < min_sup. For example, if {beer, diapers} is infrequent, then {beer, diapers, milk} must also be infrequent.
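To make the property concrete, here is a minimal Python sketch of support counting; the toy transactions and the min_sup value are illustrative assumptions:

```python
# A minimal sketch of support counting over a toy transaction list.

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

transactions = [{"beer", "diapers"}, {"beer", "milk"},
                {"diapers", "milk"}, {"beer", "diapers", "milk"}]
min_sup = 0.5  # illustrative threshold

# A superset can never be more frequent than its subset (Apriori property):
print(support({"beer", "diapers"}, transactions))          # 0.5
print(support({"beer", "diapers", "milk"}, transactions))  # 0.25
```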



Antimonotone Property: If a set cannot pass a test, all of its supersets will fail the same test as well.
• Lk-1 is used to find Lk for k >= 2.
• Consists of two steps: Join and Prune.
• Join: Join Lk-1 with itself; denote the resulting candidates as Ck.
• Prune: Ck is a superset of Lk. Remove every candidate that has an infrequent (k-1)-subset, since it cannot satisfy the minimum support criterion (subset testing). A sketch of both steps follows this list.
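Here is a minimal Python sketch of candidate generation, assuming the frequent (k-1)-itemsets are given as a set of frozensets; the simple union-based join below stands in for the textbook's sorted-prefix join:

```python
from itertools import combinations

def apriori_gen(L_prev, k):
    """Join L(k-1) with itself, then prune candidates with an infrequent subset."""
    # Join step: union pairs of (k-1)-itemsets that produce a k-itemset.
    candidates = {a | b for a in L_prev for b in L_prev if len(a | b) == k}
    # Prune step: every (k-1)-subset of a surviving candidate must be in L(k-1).
    return {c for c in candidates
            if all(frozenset(s) in L_prev for s in combinations(c, k - 1))}

L2 = {frozenset({"beer", "diapers"}), frozenset({"beer", "milk"}),
      frozenset({"diapers", "milk"})}
print(apriori_gen(L2, 3))  # {frozenset({'beer', 'diapers', 'milk'})}
```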
Generating Association Rules from Frequent Itemsets
• Find the frequent itemsets and generate strong association rules from them.
• A rule is strong if it satisfies both minimum support and minimum confidence, where confidence(A ⇒ B) = support(A ∪ B) / support(A).
• Association rules are generated as:
1. For each frequent itemset l, generate all nonempty proper subsets of l.
2. For every nonempty proper subset s of l, output the rule s ⇒ (l - s) if support(l) / support(s) >= min_conf. (Every such rule automatically satisfies minimum support, because l itself is frequent.) A sketch follows.
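Below is a minimal self-contained sketch of this procedure; the toy transactions and the min_conf value are illustrative assumptions:

```python
from itertools import combinations

transactions = [{"beer", "diapers"}, {"beer", "milk"},
                {"diapers", "milk"}, {"beer", "diapers", "milk"}]

def support(itemset, transactions):
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)

def gen_rules(l, transactions, min_conf):
    """Yield strong rules s => (l - s) for every nonempty proper subset s of l."""
    l = frozenset(l)
    for size in range(1, len(l)):
        for s in map(frozenset, combinations(l, size)):
            conf = support(l, transactions) / support(s, transactions)
            if conf >= min_conf:
                yield set(s), set(l - s), conf

for s, rest, conf in gen_rules({"beer", "diapers"}, transactions, min_conf=0.6):
    print(s, "=>", rest, f"(confidence {conf:.2f})")
```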

Techniques for Improving the Efficiency of the Apriori Algorithm
• Hash-Based Techniques
• Transaction Reduction
• Partitioning
• Sampling
• Dynamic Item-set Counting

Hash-Based Techniques: hash itemsets into corresponding buckets
• Used to reduce the size of the candidate k-itemsets Ck for k > 1.
• While scanning, each generated itemset is hashed into a bucket and the corresponding bucket count is increased.
• A k-itemset whose bucket count is below the support threshold cannot be frequent and is therefore removed from the candidate set. See the sketch below.
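A minimal sketch of this filter (DHP-style) for candidate 2-itemsets; the bucket count, hash function, and toy data are illustrative assumptions:

```python
from itertools import combinations

transactions = [{"beer", "diapers"}, {"beer", "milk"},
                {"diapers", "milk"}, {"beer", "diapers", "milk"}]
NUM_BUCKETS = 8  # illustrative table size

def bucket(pair):
    return hash(frozenset(pair)) % NUM_BUCKETS

def hash_filter(transactions, min_count):
    counts = [0] * NUM_BUCKETS
    # While scanning (e.g. during the pass that finds L1), hash every
    # 2-itemset that occurs in a transaction and bump its bucket count.
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[bucket(pair)] += 1
    # A 2-itemset whose bucket count is below min_count cannot be frequent,
    # so it is dropped from C2 before the exact counting pass.
    return lambda pair: counts[bucket(pair)] >= min_count

may_be_frequent = hash_filter(transactions, min_count=2)
print(may_be_frequent(("beer", "diapers")))  # True: its bucket count >= 2
```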

Transaction Reduction: reduce the number of transactions scanned in future iterations
• A transaction that does not contain any frequent k-itemset cannot contain any frequent (k+1)-itemset.
• Such a transaction can therefore be removed from further consideration, as in the sketch below.
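A minimal sketch of this idea; Lk stands for the frequent k-itemsets found in the current pass, and the toy data is an illustrative assumption:

```python
transactions = [{"beer", "diapers"}, {"beer", "milk"},
                {"diapers", "milk"}, {"beer", "diapers", "milk"}]

def reduce_transactions(transactions, Lk):
    """Keep only transactions containing at least one frequent k-itemset."""
    return [t for t in transactions if any(itemset <= t for itemset in Lk)]

L2 = {frozenset({"beer", "diapers"})}
# Transactions with no frequent 2-itemset cannot contain a frequent
# 3-itemset, so later scans skip them.
print(reduce_transactions(transactions, L2))  # keeps the 1st and 4th only
```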
Partitioning: partition the data to find the candidate itemsets
• Requires only two database scans to mine the frequent itemsets.
• Phase I: the algorithm subdivides the transactions of D into n nonoverlapping partitions.
• If the minimum support threshold in D is min_sup (expressed as a fraction), then the minimum support count for a partition is min_sup × the number of transactions in that partition.

• Local Frequent Itemsets: itemsets that are frequent within a partition.
• Global Candidate Itemsets: the collection of all local frequent itemsets from all partitions. Any itemset that is frequent in D must be frequent in at least one partition, so this collection is guaranteed to contain all the global frequent itemsets.
• Phase II: a second scan of D is done and the actual support of each candidate is assessed in order to determine the global frequent itemsets.
• Partition size and the number of partitions are set so that each partition fits into main memory. See the sketch below.
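A minimal sketch of the two-phase procedure; the naive find_frequent() helper stands in for a full Apriori pass within each partition, and the toy data is an illustrative assumption:

```python
from itertools import combinations

transactions = [{"beer", "diapers"}, {"beer", "milk"},
                {"diapers", "milk"}, {"beer", "diapers", "milk"}]

def support(itemset, transactions):
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)

def find_frequent(transactions, min_sup):
    """Naive helper: all itemsets meeting min_sup in these transactions."""
    return {itemset for t in transactions
            for size in range(1, len(t) + 1)
            for itemset in map(frozenset, combinations(t, size))
            if support(itemset, transactions) >= min_sup}

def partition_mine(transactions, min_sup, n):
    # Phase I: mine each nonoverlapping in-memory partition; the union of
    # local frequent itemsets forms the global candidate set.
    size = -(-len(transactions) // n)  # ceiling division
    candidates = set()
    for i in range(0, len(transactions), size):
        candidates |= find_frequent(transactions[i:i + size], min_sup)
    # Phase II: one full scan of D keeps only the globally frequent ones.
    return {c for c in candidates if support(c, transactions) >= min_sup}

print(partition_mine(transactions, min_sup=0.5, n=2))
```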

Sampling: mine on a subset of the given data
• Pick a random sample S of the given data D and search for frequent itemsets in S instead of D.
• It is possible to miss some global frequent itemsets, so a lower minimum support threshold is used to find the itemsets that are locally frequent in S, as sketched below.
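A minimal sketch of this technique; the sample fraction, the slack factor that lowers the threshold, and the toy data are illustrative assumptions:

```python
import random
from itertools import combinations

transactions = [{"beer", "diapers"}, {"beer", "milk"},
                {"diapers", "milk"}, {"beer", "diapers", "milk"}]

def support(itemset, transactions):
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)

def find_frequent(transactions, min_sup):
    return {itemset for t in transactions
            for size in range(1, len(t) + 1)
            for itemset in map(frozenset, combinations(t, size))
            if support(itemset, transactions) >= min_sup}

def sample_mine(transactions, min_sup, sample_frac=0.5, slack=0.8):
    S = random.sample(transactions, max(1, int(len(transactions) * sample_frac)))
    # Mine the sample with a lowered threshold to reduce the chance of
    # missing itemsets that are globally frequent in D...
    local = find_frequent(S, min_sup * slack)
    # ...then verify the survivors' actual support with one scan of D.
    return {c for c in local if support(c, transactions) >= min_sup}

print(sample_mine(transactions, min_sup=0.5))
```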

Dynamic Itemset Counting: add candidate itemsets at different points during a scan
• The database is partitioned into blocks marked by start points.
• New candidate itemsets can be added at any start point, unlike in Apriori, which determines new candidates only before each complete database scan.
• A new itemset is added as soon as all of its subsets are estimated to be frequent. A simplified sketch follows.
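The full DIC algorithm keeps per-itemset state and cycles through the file until every itemset has been counted against all transactions. The single-pass sketch below only illustrates the core mechanic of starting new counters at block boundaries; the block size, the "estimated frequent" rule, and the toy data are illustrative assumptions:

```python
from itertools import combinations

transactions = [{"beer", "diapers"}, {"beer", "milk"},
                {"diapers", "milk"}, {"beer", "diapers", "milk"}]

def dic_pass(transactions, min_count, block_size=2):
    counts = {}       # itemset -> running count
    counting = set()  # itemsets currently being counted
    for t in transactions:  # begin by counting all 1-itemsets
        counting.update(frozenset([item]) for item in t)
    for start in range(0, len(transactions), block_size):
        # Count the current block for every itemset under count.
        for t in transactions[start:start + block_size]:
            for itemset in counting:
                if itemset <= t:
                    counts[itemset] = counts.get(itemset, 0) + 1
        # Start point: begin counting any new itemset all of whose
        # (k-1)-subsets are estimated frequent from the counts so far.
        estimated = {i for i in counting if counts.get(i, 0) >= min_count}
        new = set()
        for a in estimated:
            for b in estimated:
                c = a | b
                if len(c) == len(a) + 1 and c not in counting and all(
                        frozenset(s) in estimated
                        for s in combinations(c, len(c) - 1)):
                    new.add(c)
        counting |= new
    # Note: itemsets added late have not seen the earlier blocks; real DIC
    # wraps around the file until every itemset is fully counted.
    return counts

print(dic_pass(transactions, min_count=2))
```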
