Still More Stream Mining - PowerPoint Presentation Notes | EduRev

: Still More Stream Mining - PowerPoint Presentation Notes | EduRev

 Page 1


1
Still More Stream-Mining
Frequent Itemsets
Elephants and Troops
Exponentially Decaying Windows
Page 2


1
Still More Stream-Mining
Frequent Itemsets
Elephants and Troops
Exponentially Decaying Windows
2
Counting Items
Problem: given a stream, which items 
appear more than s times in the 
window?
Possible solution: think of the stream of 
baskets as one binary stream per item.
 1 = item present; 0 = not present.
 Use DGIM to estimate counts of 1’s for all 
items.
Page 3


1
Still More Stream-Mining
Frequent Itemsets
Elephants and Troops
Exponentially Decaying Windows
2
Counting Items
Problem: given a stream, which items 
appear more than s times in the 
window?
Possible solution: think of the stream of 
baskets as one binary stream per item.
 1 = item present; 0 = not present.
 Use DGIM to estimate counts of 1’s for all 
items.
3
Extensions
 In principle, you could count frequent 
pairs or even larger sets the same way.
 One stream per itemset.
 Drawbacks:
1. Only approximate.
2. Number of itemsets is way too big.
Page 4


1
Still More Stream-Mining
Frequent Itemsets
Elephants and Troops
Exponentially Decaying Windows
2
Counting Items
Problem: given a stream, which items 
appear more than s times in the 
window?
Possible solution: think of the stream of 
baskets as one binary stream per item.
 1 = item present; 0 = not present.
 Use DGIM to estimate counts of 1’s for all 
items.
3
Extensions
 In principle, you could count frequent 
pairs or even larger sets the same way.
 One stream per itemset.
 Drawbacks:
1. Only approximate.
2. Number of itemsets is way too big.
4
Approaches
1. “Elephants and troops”: a heuristic 
way to converge on unusually strongly 
connected itemsets.
2. Exponentially decaying windows: a 
heuristic for selecting likely frequent 
itemsets. 
Page 5


1
Still More Stream-Mining
Frequent Itemsets
Elephants and Troops
Exponentially Decaying Windows
2
Counting Items
Problem: given a stream, which items 
appear more than s times in the 
window?
Possible solution: think of the stream of 
baskets as one binary stream per item.
 1 = item present; 0 = not present.
 Use DGIM to estimate counts of 1’s for all 
items.
3
Extensions
 In principle, you could count frequent 
pairs or even larger sets the same way.
 One stream per itemset.
 Drawbacks:
1. Only approximate.
2. Number of itemsets is way too big.
4
Approaches
1. “Elephants and troops”: a heuristic 
way to converge on unusually strongly 
connected itemsets.
2. Exponentially decaying windows: a 
heuristic for selecting likely frequent 
itemsets. 
5
Elephants and Troops
When Sergey Brin wasn’t worrying 
about Google, he tried the following 
experiment.
Goal: find unusually correlated sets of 
words.
 “High Correlation ” = frequency of 
occurrence of set >> product of frequency 
of members.
Read More
Offer running on EduRev: Apply code STAYHOME200 to get INR 200 off on our premium plan EduRev Infinity!