Summarizing Legal Regulatory Documents using Transformers

Klaus S., Van Hecke R., Djafari Naini K., ALTINGÖVDE İ. S., Bernabé-Moreno J., Herrera-Viedma E.

45th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2022, Madrid, Spain, 11 - 15 July 2022, pp.2426-2430

  • Publication Type: Conference Paper / Full Text
  • DOI Number: 10.1145/3477495.3531872
  • City: Madrid
  • Country: Spain
  • Page Numbers: pp.2426-2430
  • Keywords: eur-lex, extractive text summarization, legal ir, transformer


© 2022 ACM. Companies invest a substantial amount of time and resources in ensuring compliance with existing regulations, or in the form of fines when compliance cannot be proven in auditing procedures. The topic is not only relevant but also highly complex, given the frequency of changes and amendments, the complexity of the cases, and the difficulty of the juristic language. This paper aims to apply advanced extractive summarization to democratize the understanding of regulations, so that non-jurists can decide which regulations deserve further follow-up. To achieve that, we first create a corpus named EUR-LexSum containing 4595 curated European regulatory documents and their corresponding summaries. We then fine-tune transformer-based models which, applied to this corpus, yield superior performance (in terms of ROUGE metrics) compared to a traditional extractive summarization baseline. Our experiments reveal that even with limited amounts of data, such transformer-based models are effective in the field of legal document summarization.