Msapriori implementation(Data Mining)

profileparedan
msapriori_implementationdata_mining.docx

Implement: Ms-apriori (excluding rule generation) Consider: multiple minimum supports, support difference constraint, and item constraints Item constraints: Two types Cannot–be-together: sets of items cannot be in the same itemsets (pairwise), e.g., {1, 2, 3} and {6, 7, 9 10} Must-have: every itemset must have, e.g., (1 or 2) program this project by C++, do it in small database has just numbers. Algorithm MSapriori(T, MS, ?) // ? is for support difference constraint M ? sort(I, MS); L ? init-pass(M, T); F1 ? {{i} | i ? L, i.count/n ? MIS(i)}; for (k = 2; Fk-1 ? ?; k++) do if k=2 then Ck ? level2-candidate-gen(L, ?) else Ck ? MScandidate-gen(Fk-1, ?); end; for each transaction t ? T do for each candidate c ? Ck do if c is contained in t then c.count++; if c – {c[1]} is contained in t then c.tailCount++ end end Fk ? {c ? Ck | c.count/n ? MIS(c[1])} end return F ? ?kFk; Function level2-candidate-gen(L, M) 1 C2 m ?; // initialize the set of candidates 2 for each item l in L in the same order do 3 if l.count/n t MIS(l)then 4 for each item h in L that is after l do 5 if h.count/n t MIS(l) and |sup(h) sup(l)| M then 6 C2 m C2 ? {{l, h}}; // insert the candidate {l, h} into C2 Fig. 2.7. The level2-candidate-gen function Function MScandidate-gen(Fk1, M) 1 Ck m ?; // initialize the set of candidates 2 forall f1, f2  Fk // find all pairs of frequent itemsets 3 with f1 = {i1, … , ik2, ik1} // that differ only in the last item 4 and f2 = {i1, … , ik2, i’k1} 5 and ik-1 < i’k1 and |sup(ik-1) sup(i’k1)| M do 6 c m {i1, …, ik1, i’k1}; // join the two itemsets f1 and f2 7 Ck m Ck ? {c}; // insert the candidate itemset c into Ck 8 for each (k1)-subset s of c do 9 if (c[1]  s) or (MIS(c[2]) = MIS(c[1])) then 10 if (s  Fk1)then 11 delete c from Ck; // delete c from the set of candidates 12 endfor 13 endfor 14 returnCk; // return the generated candidates Fig. 2.8. The MScandidate-gen function