Legal Ease: An End-to-End Automated Legal Document Processing System Using LLMs and OCR

Volume: 11 | Issue: 1 | Year 2025 | Subscription
International Journal of Software Computing and Testing
Received Date: 11/25/2024
Acceptance Date: 11/29/2024
Published On: 2025-04-15
First Page: 29
Last Page: 32


By: Bhasha Sinha, Shrishti Saxena, Anuva Vashishtha, Aradhna Aggarwal, and Rajesh Yadav

Abstract

This paper describes an automated legal document processing system built on Large Language Models (LLMs) and Optical Character Recognition (OCR). Designed to streamline legal procedures, the system lets users submit legal documents in PDF format and uses Tesseract-powered OCR to extract and digitize the text while preserving the document’s layout and structure. The digitized text is then processed by an LLM fine-tuned on legal datasets so that key legal nuances are retained. The system offers full-document summarization, clause-specific summarization, and targeted clause extraction to serve a wide range of legal purposes. The user-centric React.js interface provides an interactive, customizable experience: users can select specific sections for summarization, highlight critical passages, and fine-tune results to meet their needs. Results are presented in a structured form for clarity and practical insight. The backend uses FastAPI for request processing and PyTorch for model implementation, which supports robustness and scalability. Tools such as NLTK assist with natural language processing tasks such as text analysis and structuring. For user convenience, the system integrates Google Drive for automatic session management, allowing users to review or edit previously processed documents. By combining Tesseract OCR, PyTorch, and fine-tuned LLMs, the system advances legal document automation. It benefits legal professionals by saving time, increasing efficiency, and preserving the accuracy and context of legal documents, meeting the growing demand for intelligent and efficient document processing in modern legal offices.
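The clause-specific summarization step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`split_clauses`, `first_sentence`, `summarize_clauses`) and the assumption that clauses begin with a line-initial number such as "1." are hypothetical, the OCR stage is assumed to have already produced plain text, and a trivial first-sentence function stands in for the fine-tuned LLM.

```python
import re
from typing import Callable, Dict, Iterable

# Assumption: each clause starts on its own line with "<number>. ".
CLAUSE_START = re.compile(r"^(\d+)\.\s+", re.MULTILINE)


def split_clauses(text: str) -> Dict[str, str]:
    """Split digitized text into clauses keyed by clause number."""
    matches = list(CLAUSE_START.finditer(text))
    clauses: Dict[str, str] = {}
    for i, match in enumerate(matches):
        # A clause runs until the next clause marker or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end]
        # Collapse line breaks left over from OCR into single spaces.
        clauses[match.group(1)] = " ".join(body.split())
    return clauses


def first_sentence(text: str) -> str:
    """Placeholder summarizer standing in for the fine-tuned LLM."""
    return re.split(r"(?<=[.!?])\s+", text.strip(), maxsplit=1)[0]


def summarize_clauses(
    text: str,
    selected: Iterable[str],
    summarize: Callable[[str], str] = first_sentence,
) -> Dict[str, str]:
    """Summarize only the clauses the user selected; unknown numbers are skipped."""
    clauses = split_clauses(text)
    return {n: summarize(clauses[n]) for n in selected if n in clauses}
```

In a deployed system the `summarize` argument would be replaced by a call to the model, and the clause pattern would need to handle real numbering schemes (for example "2.1" or "Article 3"), which this sketch does not attempt.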

Keywords: Artificial Intelligence (AI) in Law, Document Summarization, Large Language Models (LLMs), Legal Document Automation, Legal Technology, Natural Language Processing (NLP), Optical Character Recognition (OCR)

Citation:

How to cite this article: Bhasha Sinha, Shrishti Saxena, Anuva Vashishtha, Aradhna Aggarwal, and Rajesh Yadav, Legal Ease: An End-to-End Automated Legal Document Processing System Using LLMs and OCR. International Journal of Software Computing and Testing. 2025; 11(1): 29-32p.

How to cite this URL: Bhasha Sinha, Shrishti Saxena, Anuva Vashishtha, Aradhna Aggarwal, and Rajesh Yadav, Legal Ease: An End-to-End Automated Legal Document Processing System Using LLMs and OCR. International Journal of Software Computing and Testing. 2025; 11(1): 29-32p. Available from: https://journalspub.com/publication/ijsct/article=18193
