Project Name

Shabdakalpa: A Historical Dictionary of the Bengali Language

Department

School of Cultural Texts and Records

Project Lead from the Department

Prof Sukanta Chaudhuri

Project Lead from GJUAF

Dr Subhadeep Das

Overview

SHABDAKALPA is a projected historical dictionary of the Bengali language. A historical dictionary traces the full history of every word in a language. It involves three chief tasks: compiling a comprehensive corpus or database of texts of all periods and on all subjects; developing software to parse and sort the words; and analysing the output to extract the history of each word, fully documented by examples from the corpus.

The software to create Shabdakalpa is ready. Much progress has been made with the corpus/database, though much remains to be done. The work has recently been sped up by certain technical improvements. Once the corpus is complete, work can start on analysing the material to create the final dictionary entries.

Shabdakalpa will be the first comprehensive historical dictionary in a non-Western language, and the first born-digital one in any language. 

Shabdakalpa was presented at an international forum for the first time by the Coordinator, Prof. Sukanta Chaudhuri, at a conference at the Forum Editorik, University of Goettingen, Germany, 13-15 June 2024.

 

Budget

July 1, 2024 to December 31, 2024
INR 7,00,000
(See below for extended budget till December 31, 2025)

Quotations from vendors: Not Applicable 

Funds Transferred to date
Total: INR 15,40,794 as follows:
Remitted: $12,600 = INR 10,27,847 ($6,600 on Nov. 23, 2023 + $6,000 on March 13, 2024)
Under process: INR 5,12,947 reallocated from Digital Divide programme.
 

Completed or expected completion date
December 31, 2025 for completing all preparatory work (downloading of texts + creating corpus/database + creating spreadsheets of individual words from the database). Actual preparation of dictionary entries can start thereafter, tentatively from January 01, 2026.
We are targeting 21 February 2028 as a very tentative completion date for the entire project, given adequate funding. 2028 will be the centenary year of the completion of the pioneering Oxford English Dictionary. 21 February, of course, is International Mother Language Day (Bhasha Dibas), with special significance for Bengali.
 
Remaining budget for this project (only preparatory phase till December 31, 2025 as indicated above)
INR 25,00,000 (INR 4,94,468, or say INR 5,00,000 balance of funds received + INR 20,00,000 to be requested later for the period January-December 2025)
Total budget requested is INR 54,00,000, of which the balance will be requested during the final phase, preparation of dictionary entries, in 2026-27.

Request for funding

The funds currently available will suffice till December 2024. We would request the next instalment in January 2025.

Budget

INR 7,00,000

Quote from vendors

Not Applicable

Acknowledgement from Finance Office

Already furnished for first two instalments. Will be sent by the Finance Office for the third instalment under process.

Procurement status

Not Applicable

Completed or expected completion date

December 31, 2025

Remaining budget for this project

INR 25,00,000

Benefits to students and JU

This is a major project with benefits for the community and the academic world at large. It will offer a complete guide to the history and application of the Bengali language for all purposes: writing books and preparing knowledge material in all fields, reading and analysing works in all fields, analysing Bengali literature, developing paribhasha, and providing an encyclopedic work of reference for all aspects of Bengali history, thought, learning and culture. Jadavpur will benefit particularly as the source and centre of this major contribution to knowledge.

Project Status

OTHER RELATED INFORMATION (for Multiyear projects)

PROJECT STATUS
Work Planned for the project

  • Compiling a corpus of all periods and categories of Bengali texts
  • Developing software to parse and organize the same
  • Creating a parsed database with source and date of all occurrences of all words
  • Analysing database entries to trace the full history of each word and create the dictionary entries

Work performed till date

  • Fully developing the complex software for parsing and aggregating all inflexional variants of verbs, nouns and pronouns
  • Fully developing software for compiling a database of all examples of all words in the corpus by source and date
  • Converting PDF files to .TXT by OCR application
  • Checking text of converted files
  • Uploading checked files to database
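
As an illustration only, the idea of a database recording every occurrence of every word by source and date can be sketched in a few lines of Python. This is not the project's software: the names `Occurrence`, `build_index` and `earliest`, the sample texts and their years are all hypothetical, and the sketch makes no attempt to aggregate inflexional variants, which is the hard part of the real parser. It simply treats any run of characters from the Bengali Unicode block as one word and splits sentences at the danda.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    source: str   # label of the text the word was found in (hypothetical)
    year: int     # date assigned to that text (hypothetical)
    context: str  # the sentence containing the word


# Assumption: a "word" is any run of characters in the Bengali block U+0980-U+09FF.
WORD_RE = re.compile(r"[\u0980-\u09FF]+")


def build_index(texts):
    """Map each word form to its occurrences, ordered by year."""
    index = defaultdict(list)
    for source, year, body in texts:
        for sentence in body.split("।"):  # the danda ends a sentence
            for word in WORD_RE.findall(sentence):
                index[word].append(Occurrence(source, year, sentence.strip()))
    for occurrences in index.values():
        occurrences.sort(key=lambda o: o.year)
    return index


def earliest(index, word):
    """Return the earliest recorded occurrence of a word, or None."""
    hits = index.get(word)
    return hits[0] if hits else None


# Made-up sample data, for illustration only.
sample = [
    ("sample-A", 1850, "জল পড়ে। পাতা নড়ে।"),
    ("sample-B", 1800, "জল দাও।"),
]
index = build_index(sample)
first = earliest(index, "জল")
print(first.source, first.year)
```

Sorting each word's occurrences by year is what lets an editor read off a first attestation and then follow the later examples in order, which is the raw material a historical dictionary entry is built from.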

Percentage of work completed

  • Compiling corpus: approx. 70%
  • Developing software for parsing, aggregating and sorting: 100%. We have recently upgraded the OCR software and introduced Patrachitra, an entirely new tool for PDF conversion and OCR, which should materially speed up the work.
  • Analysing database and creating entries: This can only start when the corpus is complete.

 

Financial Status

  • Budget for the project (as per estimate given to GJUAF): INR 54,00,000
  • Total Donation received: INR 15,40,794 (as per breakup above)
  • Expenses up to September 30, 2024: INR 10,46,326
    • Percentage of the donation received till 30.09.2024 expensed: 67.9%
  • Remaining donation to-date: INR 4,94,468. All or most of this will be expensed by December 31, 2024.
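
The figures above can be cross-checked with a short calculation. The following is a minimal sketch using only the amounts stated in this report; the helper `indian_format`, which groups digits in the lakh style used here, is a hypothetical name introduced for illustration.

```python
def indian_format(n):
    """Format a non-negative integer with Indian digit grouping (e.g. 15,40,794)."""
    s = str(n)
    if len(s) <= 3:
        return s
    head, tail = s[:-3], s[-3:]
    groups = []
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.insert(0, head)
    return ",".join(groups + [tail])


# Amounts in INR, as stated in the report.
remitted = 1_027_847       # $12,600 remitted in two instalments
under_process = 512_947    # reallocated from the Digital Divide programme
spent = 1_046_326          # expenses up to 30.09.2024

received = remitted + under_process
remaining = received - spent
percent_spent = round(100 * spent / received, 1)

print("Received: INR", indian_format(received))
print("Remaining: INR", indian_format(remaining))
print("Expensed:", percent_spent, "%")
```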

Project Close out – How do we measure the success of this project

  1. At this point, by
    • size and variety of corpus.
    • accuracy of parsing and aggregating software: now effectively 100%.
  2. At conclusion of project, by
    • extent of use, as measured by number of hits.
    • coverage, as measured by random checking against famous as well as obscure texts and dictionaries.
    • accuracy, as estimated by comparison with other dictionaries.
    • use as a model for other languages – the ultimate test.