A new approach for reactive web usage data processing

Bayir M. A., TOROSLU İ. H., COŞAR A.

22nd International Conference on Data Engineering Workshops, ICDEW 2006, Georgia, United States Of America, 3 - 7 April 2006

  • Publication Type: Conference Paper / Full Text
  • Doi Number: 10.1109/icdew.2006.13
  • City: Georgia
  • Country: United States Of America
  • Middle East Technical University Affiliated: Yes


© 2006 IEEE. Web usage mining applies data mining techniques to discover valuable information from the navigation behavior of World Wide Web (WWW) users. The required information is captured by web servers and stored in web usage logs. The first phase of web usage mining is data processing. In this phase, relevant information is first filtered from the logs; sessions are then reconstructed using heuristics that select and group requests belonging to the same user session. Techniques that process requests after the web server has handled them are called "reactive", whereas "proactive" techniques perform the same (pre)processing while the user is interactively browsing the web site. Reactive session reconstruction uses "time" and "navigation" oriented heuristics. We propose combining these heuristics with "site topology" information to increase the accuracy of the reconstructed sessions. In this work, we have implemented an agent simulator that models the behavior of web users and generates both the user navigation and the log data kept by the web server. In this way, we know the actual user sessions and can accurately evaluate and compare the performance of alternative session reconstruction heuristics (which use only the web server log data). To the best of our knowledge, this paper is the first work to use such an agent simulator and is therefore able to accurately evaluate different session reconstruction heuristics. Using the agent simulator, we attempt to show that our new heuristic discovers more accurate sessions than previous heuristics.
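
To make the idea of combining a time-oriented heuristic with site topology concrete, the following is a minimal illustrative sketch. It is not the algorithm from the paper. It assumes the log has already been filtered down to one user's requests, that the site topology is available as a page-to-linked-pages map, and that a request extends a session only if it arrives within a maximum gap and a hyperlink leads to it from the session's last page. The names `reconstruct_sessions`, `links`, and `max_gap`, and the 600-second threshold, are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Request = Tuple[float, str]  # (timestamp in seconds, requested page)


def reconstruct_sessions(
    requests: List[Request],
    links: Dict[str, Set[str]],
    max_gap: float = 600.0,  # assumed page-stay threshold, not from the paper
) -> List[List[Request]]:
    """Group one user's requests into sessions (illustrative heuristic only)."""
    sessions: List[List[Request]] = []
    for ts, page in sorted(requests):
        placed = False
        # Try the most recently opened session first.
        for session in reversed(sessions):
            last_ts, last_page = session[-1]
            within_time = ts - last_ts <= max_gap          # time-oriented check
            reachable = page in links.get(last_page, set())  # topology check
            if within_time and reachable:
                session.append((ts, page))
                placed = True
                break
        if not placed:
            # No open session can be extended, so start a new one.
            sessions.append([(ts, page)])
    return sessions


# Hypothetical site topology: A links to B and C, B links to D.
links = {"A": {"B", "C"}, "B": {"D"}, "C": set(), "D": set()}
requests = [(0.0, "A"), (30.0, "B"), (60.0, "D"), (2000.0, "A"), (2030.0, "C")]

sessions = reconstruct_sessions(requests, links)
pages = [[page for _, page in s] for s in sessions]
print(pages)
```

In this toy log, the request for A at 2000 seconds opens a second session because the time gap exceeds the threshold. A request for D issued directly after A would also open a new session, since no hyperlink leads from A to D in the assumed topology.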