Improving Efficiency of Sequence Mining by Combining First Occurrence Forest (FOF) Strategy and Sibling Principle

Onal K. D., KARAGÖZ P.

4th International Conference on Web Intelligence, Mining and Semantics (WIMS), Thessaloniki, Greece, 2-4 June 2014


Sequential pattern mining is one of the basic problems in data mining and has many applications in web mining. The WAP-Tree (Web Access Pattern Tree) data structure provides a compact representation of single-item sequence databases. WAP-Tree based algorithms have shown notable execution time and memory consumption performance when mining single-item sequence databases. We propose FOF-SP, a new WAP-Tree based algorithm that combines an early pruning strategy from the literature, called the "Sibling Principle", with the FOF (First Occurrence Forest) strategy. Experimental results show that FOF-SP finds patterns faster than the previous WAP-Tree based algorithms PLWAP and FOF. Moreover, FOF-SP mines patterns faster than PrefixSpan and as fast as LAPIN on real sequence databases from web usage mining and bioinformatics.
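To make the "compact representation" claim concrete, the following is a minimal sketch of a WAP-Tree-like structure: a prefix tree with per-node counts, built after infrequent events have been dropped. It is not the FOF-SP algorithm and omits the First Occurrence Forest, the Sibling Principle, and the header links of a full WAP-Tree. The names `Node`, `build_wap_tree`, and `count_nodes`, the absolute `min_support` threshold, and the four-sequence toy database are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """One tree node: an event label, a count, and child nodes by label."""
    label: str = ""          # empty label marks the root (illustrative choice)
    count: int = 0
    children: dict = field(default_factory=dict)


def build_wap_tree(sequences, min_support):
    """Build a prefix tree with counts over the frequent events only.

    Assumption: min_support is an absolute number of sequences, and the
    support of an event is the number of sequences that contain it.
    """
    support = Counter()
    for seq in sequences:
        support.update(set(seq))  # count each event once per sequence
    frequent = {event for event, n in support.items() if n >= min_support}

    root = Node()
    for seq in sequences:
        node = root
        for event in seq:
            if event not in frequent:
                continue  # infrequent events are dropped before insertion
            child = node.children.get(event)
            if child is None:
                child = Node(label=event)
                node.children[event] = child
            child.count += 1  # one more sequence passes through this prefix
            node = child
    return root


def count_nodes(node):
    """Count all nodes in the tree, including the root."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


# Hypothetical toy database of single-item access sequences.
database = [list("abdac"), list("eaebcac"), list("babfaec"), list("afbacfc")]
tree = build_wap_tree(database, min_support=3)
```

In this toy database, three of the four filtered sequences start with the prefix `a, b`, so they share the same two nodes and only the counts grow. That prefix sharing is the source of the compactness the abstract refers to.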