Project Name

Shabdakalpa: A Historical Dictionary of the Bengali Language

Department

School of Cultural Texts and Records

Project Lead from the Department

Prof. Sukanta Chaudhuri

Faculty

Faculty of Interdisciplinary Studies, Law & Management

Project Lead from GJUAF

Dr. Subhadeep Das

Overview

SHABDAKALPA is a projected historical dictionary of the Bengali language. A historical dictionary traces the full history of every word in a language. It involves three chief tasks: compiling a comprehensive corpus or database of texts from all periods and on all subjects; developing software to parse and sort the words; and analysing the output to extract the history of each word, fully documented by examples from the corpus.

The software to create Shabdakalpa is ready. Much progress has been made with the corpus/database, though much remains to be done. When that is complete, work can start on analysing the material to create the final dictionary entries. Shabdakalpa will be the first comprehensive historical dictionary in a non-Western language, and the first born-digital one in any language.

Request for funding

Please see below under financial status

Budget

Please see below under financial status

Quote from vendors

Not Applicable

Funds Transferred

USD 12,600 = INR 10,30,365 (USD 6,600 on November 23, 2023 + USD 6,000 on March 13, 2024)

Procurement status

Not Applicable

Completed or expected completion date

December 31, 2028

Remaining budget for this project

Please see below under financial status

Benefits to students and JU

This is a large project with benefits for the community and the academic world at large. It will offer a complete guide to the history and application of the Bengali language for all purposes: writing books and preparing knowledge material in all fields, reading and analysing works in all fields, analysing Bengali literature, developing paribhasha (technical terminology), and providing an encyclopedic work of reference for all aspects of Bengali history, thought, learning and culture. Jadavpur will benefit particularly as the source and centre of this major contribution to knowledge.

Project Status

Work Planned for the project

  • Compiling a corpus/database of Bengali texts of all periods and categories
  • Developing software to parse and organize the corpus
  • Analysing database entries to trace the full history of each word
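
As a rough illustration of the second and third tasks above, the sketch below splits dated texts into Bengali word forms and lists each word's attestations in chronological order. It is not the project's software: the function name `build_index`, the `(year, title, body)` layout and the sample texts are all hypothetical, and the word pattern simply matches runs of characters from the Bengali Unicode block.

```python
import re
from collections import defaultdict

# Runs of characters from the Bengali Unicode block (U+0980 to U+09FF).
# Assumption: this simple pattern is enough for illustration; real texts
# also need handling of joiners, hyphens and OCR noise.
BENGALI_WORD = re.compile(r"[\u0980-\u09FF]+")


def build_index(texts):
    """Map each word form to its attestations, ordered by year.

    `texts` is an iterable of (year, title, body) tuples (hypothetical layout).
    """
    index = defaultdict(list)
    for year, title, body in texts:
        for match in BENGALI_WORD.finditer(body):
            index[match.group()].append((year, title))
    for attestations in index.values():
        attestations.sort()  # earliest attestation first
    return dict(index)


# Invented sample data, deliberately given out of chronological order.
sample_texts = [
    (1865, "Text B", "নদী বন"),
    (1801, "Text A", "নদী জল"),
]
index = build_index(sample_texts)
first_year, first_title = index["নদী"][0]
```

Here `index["নদী"]` holds both attestations with the 1801 text first, which is the kind of ordering a historical entry needs. A real pipeline would also have to group inflected forms under a headword, which this sketch does not attempt.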

Work performed during this period

  • Converting PDF files to .TXT using an OCR application
  • Checking the text of converted files
  • Uploading checked files to database
  • Developing new text conversion software

Percentage of work completed

  • Compiling corpus: approx. 50%
  • Developing software: 100%
  • Analysing database and creating entries: This can only start when the corpus is complete.

Financial Status

  • Budget for the project (as per estimate given to GJUAF): INR 54,00,000
  • Total Donation received: INR 10,30,365
  • Expensed to-date: INR 8,71,629 (INR 7,01,629 expensed + INR 1,70,000 estimated expense for June 2024)
  • Percentage of the donation expensed: 84.6%
  • Remaining donation to-date: INR 1,58,736 (estimated as on June 30, 2024)
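
The figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the amounts stated in this list; the variable names are illustrative.

```python
# Figures in INR, taken from the Financial Status list above.
donation_received = 1_030_365
expensed_actual = 701_629
expensed_estimated_june_2024 = 170_000

# Total expensed to date, including the June 2024 estimate.
expensed_to_date = expensed_actual + expensed_estimated_june_2024

# Remaining donation and the share of the donation already expensed.
remaining = donation_received - expensed_to_date
percent_expensed = round(100 * expensed_to_date / donation_received, 1)

print(expensed_to_date, remaining, percent_expensed)
```

The results agree with the reported values: INR 8,71,629 expensed, INR 1,58,736 remaining, and 84.6% of the donation expensed.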

Future Work

Work to be accomplished during the next 6 months (till December 31, 2024)

  • Substantial progress with OCR conversion, checking and uploading of texts (with a target for completion by March 31, 2025)
  • Enhancing data for bibliographic control
  • Locating and downloading more files, guided by the enhanced bibliographic data, and processing them through OCR conversion, checking and uploading
  • Starting the next phase: downloading and organizing material from the corpus and exporting it to spreadsheets, word by word, as the basis of dictionary entries.
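
The word-by-word export in the last item could look something like the sketch below, which writes one spreadsheet row per attestation in CSV form. The column names, the function `export_word_rows` and the sample index are hypothetical and are not taken from the project's software. Words are sorted by Unicode code point here, which only approximates Bengali dictionary order.

```python
import csv
import io


def export_word_rows(index, out):
    """Write one row per attestation: word, year, source title."""
    writer = csv.writer(out)
    writer.writerow(["word", "year", "source"])
    for word in sorted(index):  # code-point order, not true collation
        for year, title in sorted(index[word]):
            writer.writerow([word, year, title])


# Hypothetical index: word form -> list of (year, source title).
sample_index = {
    "নদী": [(1865, "Text B"), (1801, "Text A")],
    "জল": [(1801, "Text A")],
}

# An in-memory buffer stands in for a file opened with newline="".
buffer = io.StringIO()
export_word_rows(sample_index, buffer)
rows = buffer.getvalue().splitlines()
```

Each word's rows come out in chronological order, so an editor opening the file in a spreadsheet sees the earliest attestation first.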

Project Close out – How do we measure the success of this project

At this point, by

  1. size and variety of corpus;
  2. accuracy of the parsing and aggregating software: this is now effectively 100%.

At the conclusion of the project, by

  1. the extent of use, as measured by the number of hits;
  2. coverage, as measured by random checking against famous as well as obscure texts and dictionaries;
  3. accuracy, as estimated by comparison with other dictionaries;
  4. use as a model for other languages – the ultimate test.