Improving Efficiency of Sequence Mining by Combining First Occurrence Forest (FOF) Strategy and Sibling Principle


Onal K. D., KARAGÖZ P.

4th International Conference on Web Intelligence, Mining and Semantics (WIMS), Thessaloniki, Greece, 2 - 4 June 2014

  • Publication Type: Conference Paper / Full Text
  • DOI Number: 10.1145/2611040.2611061
  • City: Thessaloniki
  • Country: Greece
  • Middle East Technical University Affiliated: Yes

Abstract

Sequential pattern mining is one of the basic problems in data mining, and it has many applications in web mining. The WAP-Tree (Web Access Pattern Tree) data structure provides a compact representation of single-item sequence databases. WAP-Tree based algorithms have shown notable execution time and memory consumption performance when mining single-item sequence databases. We propose FOF-SP, a new WAP-Tree based algorithm that combines an early pruning strategy from the literature, called the "Sibling Principle", with the FOF (First Occurrence Forest) strategy. Experimental results show that FOF-SP finds patterns faster than the previous WAP-Tree based algorithms PLWAP and FOF. Moreover, FOF-SP mines patterns faster than PrefixSpan and as fast as LAPIN on real sequence databases from web usage mining and bioinformatics.
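To illustrate why a tree gives a compact representation of a single-item sequence database, the following is a minimal sketch of a count-annotated prefix tree in the spirit of the WAP-Tree. It is not the FOF-SP algorithm and does not implement the Sibling Principle or the First Occurrence Forest. The names `Node`, `build_prefix_tree`, and `count_nodes`, and the toy sequences, are assumptions made for this illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One tree node: an item label, a count, and children keyed by item."""
    label: str
    count: int = 0
    children: dict = field(default_factory=dict)


def build_prefix_tree(sequences):
    """Insert every sequence into a shared prefix tree.

    Sequences with a common prefix reuse the same nodes; each node's
    count records how many sequences pass through it.
    """
    root = Node(label="")  # hypothetical empty root, holds no item
    for seq in sequences:
        node = root
        for item in seq:
            child = node.children.get(item)
            if child is None:
                child = Node(label=item)
                node.children[item] = child
            child.count += 1
            node = child
    return root


def count_nodes(node):
    """Number of nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(c) for c in node.children.values())


# Toy database (made up for illustration): each string is one sequence
# of single items, e.g. page identifiers in a web session.
sequences = ["abc", "abd", "ab"]
tree = build_prefix_tree(sequences)
```

In this toy database the three sequences contain eight items in total, but because they share the prefix `ab` the tree stores them in fewer nodes, which can be checked with `count_nodes(tree)`.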