Improving educational web search for question-like queries through subject classification

Yilmaz T., Ozcan R., ALTINGÖVDE İ. S., ULUSOY Ö.

INFORMATION PROCESSING & MANAGEMENT, vol.56, no.1, pp.228-246, 2019 (Peer-Reviewed Journal)

  • Publication Type: Article
  • Volume: 56 Issue: 1
  • Publication Date: 2019
  • Doi Number: 10.1016/j.ipm.2018.10.013
  • Journal Indexes: Science Citation Index Expanded, Social Sciences Citation Index, Scopus
  • Page Numbers: pp.228-246
  • Keywords: Educational web search, Question classification, Search engine result page ranking, K-12, Turkish


Students use general web search engines as their primary source of research when trying to find answers to school-related questions. Although search engines are highly relevant for the general population, they may return results that fall outside the educational context. Social community question answering websites, another rising trend, are the second choice for students, who use them to get answers from peers online. We attempt to discover possible improvements in educational search by leveraging both of these information sources. For this purpose, we first implement a classifier for educational questions. This classifier is built with an ensemble method that employs several regular learning algorithms and retrieval-based approaches that utilize external resources. We also build a query expander to facilitate classification. We further improve the classification using search engine results and obtain 83.5% accuracy. Although our work is based entirely on the Turkish language, the features could easily be mapped to other languages as well. To find out whether search engine ranking can be improved in the education domain using the classification model, we collect and label a set of query results retrieved from a general web search engine. We propose five ad hoc methods to improve search ranking, based on the idea that the query-document category relation is an indicator of relevance. We evaluate these methods for overall performance, for varying query lengths, and separately for factoid and non-factoid queries. We show that some of the methods significantly improve the rankings in the education domain.
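
The abstract does not spell out the five ranking methods, so the following is only an illustrative sketch of the underlying idea that a match between the query's subject category and a document's subject category can serve as a relevance signal. The names Result and rerank_by_category, the category labels, and the "promote matching documents" rule are all assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    rank: int       # original position from the search engine, 1 = top
    category: str   # subject label predicted for the document (hypothetical)


def rerank_by_category(results, query_category):
    """Place results whose category matches the query's category first.

    Within the matching and non-matching groups, the original search
    engine order is preserved. This is an assumed, simplified rule,
    not one of the paper's five methods.
    """
    return sorted(results, key=lambda r: (r.category != query_category, r.rank))


# Hypothetical result list for a query classified as "math".
results = [
    Result("a.example", 1, "history"),
    Result("b.example", 2, "math"),
    Result("c.example", 3, "history"),
    Result("d.example", 4, "math"),
]
reranked = rerank_by_category(results, "math")
print([r.url for r in reranked])
```

In this sketch the two documents labeled "math" move ahead of the others while keeping their relative order, so the original ranking still acts as a tie-breaker.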