
Free Download & Tutorial - How to Automatically Add Diacritics to 94,000 Different Sanskrit Words in 1000s of Files in One Fell Swoop! (Version 9)

By Pratyatosa Dasa (ACBSP), September 3, 2012

Unicode file name feature added: December 26, 2014

(Pratyatosa.com)

On this web page:

A. Introduction
B. Assumptions
C. Get the Free Download
D. Main Features
E. Demo #1: Add Diacritics
F. Demo #2: Remove Diacritics
G. The Tutorial

A. Introduction

The Final Order on Sanskrit Diacritics:

Srila Prabhupada wrote several letters on the subject of Sanskrit diacritics. He insisted that all of his publications include the diacritics even if it meant individually photocopying and physically cutting / pasting each and every Sanskrit word, or even pencilling them in! Now that we have computers, what excuse do we have?

The following is Srila Prabhupada’s final, written, signed order on Sanskrit diacritics. As usual, it’s crystal clear and doesn’t leave any room for speculation and/or interpretation by the “intellectual class” (rascals).

In reply to Jayadvaita’s questions, henceforward the policy for using diacritic markings is that I want them used everywhere, on large books, small books and also BTG. If there is any difficulty with the pronunciation, then after the correct diacritic spelling, in brackets the words “pronounced as …” may be written. So even on covers the diacritic markings should be used. We should not have to reduce our standard on account of the ignorant masses. Diacritic spelling is accepted internationally, and no learned person will even care to read our books unless this system is maintained. (Letter to Jadurani, 31 December, 1971)

The free download contains the following 4 files:

1. _AddSanskritDiacritics.bat

2. _AddSanskritDiacritics.dat

3. _Krsna.htm

4. _RemoveSanskritDiacritics.bat

1. _AddSanskritDiacritics.bat contains a Perl script which automatically adds Sanskrit diacritics to any number of files. The files can be in ANSI, Unicode, or UTF-8 format and can be a mixture of all 3. The files can be either regular text files or HTML files or a mixture of both. The program includes switches which by default are set to exclude diacritics from HTML titles, meta tags and scripts. The diacritics which are added are coded using the Unicode standard. For example, Krsna would be changed to Kṛṣṇa. Any file which is not encoded using the UTF-8 format will be automatically converted to UTF-8 if any Sanskrit diacritics are added.

2. _AddSanskritDiacritics.dat is a data file used by _AddSanskritDiacritics.bat. It contains a list of 94,000+ Sanskrit words that contain diacritics.

3. _Krsna.htm is an extremely short, simple HTML file used to test the 2 .bat files.

4. _RemoveSanskritDiacritics.bat contains a Perl script which removes Sanskrit diacritics from any number of files. The files can be in either Unicode or UTF-8 format. It will only remove diacritics which are coded using the Unicode standard. For example, Kṛṣṇa would be changed to Krsna, but the HTML ANSI version, in which each diacritic character is written as a numeric character reference (for example, &#7771; for ṛ), would not be changed.
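
Note: The general idea behind the 2 utilities (look up each diacriticless word in a list and swap in the spelling with diacritics, or strip the diacritics back off) can be sketched in a few lines. The following is only an illustrative sketch written in Python, not the Perl code contained in the .bat files. The 3-word list is a hypothetical stand-in for _AddSanskritDiacritics.dat, and the names strip_diacritics, add_diacritics, LOOKUP and PLAIN_WORDS were made up for this example. Also, unlike _RemoveSanskritDiacritics.bat, which handles only the Sanskrit diacritic characters, strip_diacritics below removes any combining mark.

```python
import re
import unicodedata

# Hypothetical 3-word stand-in for the 94,000+ entries of the real word list.
WORDS_WITH_DIACRITICS = ["Kṛṣṇa", "Prabhupāda", "Bhagavad-gītā"]


def strip_diacritics(text):
    """Decompose each character (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Plain spelling -> spelling with diacritics, built from the list itself.
LOOKUP = {strip_diacritics(word): word for word in WORDS_WITH_DIACRITICS}

# One alternation of all plain spellings, longest first, matched as whole words.
PLAIN_WORDS = re.compile(
    r"\b(?:"
    + "|".join(re.escape(w) for w in sorted(LOOKUP, key=len, reverse=True))
    + r")\b"
)


def add_diacritics(text):
    """Replace every listed plain spelling with its spelling with diacritics."""
    return PLAIN_WORDS.sub(lambda match: LOOKUP[match.group(0)], text)
```

Calling add_diacritics("Krsna spoke the Bhagavad-gita.") replaces both listed words, and passing the result to strip_diacritics gives back the original sentence. A word such as Krsnaloka is left alone because only whole words which are in the list are matched.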

B. Assumptions

This tutorial assumes the following:

1. That extensions for known file types are not hidden: a) Click Start. b) Click Computer. c) Click Organize. d) Click Folder and search options…. e) Click the View tab. f) Make sure that Hide extensions for known file types is unchecked. g) Click OK. h) Close the Computer window.

2. That you have installed Notepad2. (Use the installer version rather than the portable version and be sure to invoke the installer option to have Notepad2 replace Notepad as the default text editor.)

3. That you have installed Perl: a) Go to http://www.activestate.com/activeperl/downloads. b) Download the free version of ActivePerl for Windows. c) Install it, making sure that Add Perl to the PATH environment variable is checked (the default).
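
Note: One way to confirm assumption 3 before running the .bat files is to check whether a program named perl can be found on the PATH. The following is a small sketch in Python (which is not part of the download); the function name find_on_path is made up for this example.

```python
import shutil


def find_on_path(program):
    """Return the full path of `program` if it is on the PATH, else None."""
    return shutil.which(program)


if __name__ == "__main__":
    location = find_on_path("perl")
    if location is None:
        print("perl was not found on the PATH; check the installer option.")
    else:
        print("perl found at", location)
```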

C. Get the Free Download

1. Download _AddSanskritDiacritics.zip (389 KB)

2. Unzip _AddSanskritDiacritics.zip to a folder named _AddSanskritDiacritics.

D. Main Features

1. Multiple text files can have diacritics added to their diacriticless Sanskrit words in one fell swoop.

2. These text files may have file names which contain Unicode characters.

3. ANSI, Unicode, and UTF-8 format files can be mixed together.

4. A log file is created which contains detailed statistics on all of the changes.

5. Files to which diacritics are added are automatically converted to UTF-8 files.

6. Changed and unchanged files may be easily separated via sorting by Date modified.
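
Note: Features 3, 5 and 6 fit together: each file is read in whatever format it happens to be in, and it is rewritten (as UTF-8) only if something in it actually changed, so that the Date modified of unchanged files stays as it was. The following Python sketch shows one way this could be done; it is not the Perl code from the .bat file. It assumes that "ANSI" means the Windows-1252 code page, that "Unicode" means UTF-16 with a byte order mark, and that the UTF-8 output is written without a byte order mark. The function names decode_bytes and rewrite_if_changed are made up for this example.

```python
import codecs
from pathlib import Path


def decode_bytes(raw):
    """Return (text, encoding) for UTF-8, UTF-16 ("Unicode") or ANSI bytes."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8"), "utf-8"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the byte order mark and strips it.
        return raw.decode("utf-16"), "utf-16"
    try:
        # UTF-8 without a byte order mark (plain ASCII also lands here).
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumed ANSI code page; undefined bytes become U+FFFD.
        return raw.decode("cp1252", errors="replace"), "cp1252"


def rewrite_if_changed(path, transform):
    """Apply `transform` to the file's text; rewrite as UTF-8 only on change."""
    text, _encoding = decode_bytes(path.read_bytes())
    new_text = transform(text)
    if new_text == text:
        return False  # file untouched: original encoding and date are kept
    path.write_bytes(new_text.encode("utf-8"))
    return True
```

Reading and writing raw bytes, rather than opening the files in text mode, keeps the line endings of each file exactly as they were.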

E. Demo #1: Add Diacritics

1. Here is what the contents of the _Krsna.htm file look like:

2. Double click the _AddSanskritDiacritics.bat file’s icon. If Perl was installed correctly, here’s what you will see:

Screen capture of adding diacritics

The corresponding log file:

_AddSanskritDiacritics.log

3. Now the contents of the _Krsna.htm file have changed to the following:

Note: Although Ñ-ñ-Ā-ā-Ī-ī-Ś-ś-Ū-ū-Ḍ-ḍ-Ḥ-ḥ-Ḷ-ḷ-Ṁ-ṁ-Ṅ-ṅ-Ṇ-ṇ-Ṛ-ṛ-Ṝ-ṝ-Ṣ-ṣ-Ṭ-ṭ is obviously not a Sanskrit word, it was added to the list (_AddSanskritDiacritics.dat) for purposes of testing / reference. It includes all 30 of the Sanskrit diacritic characters which are handled by the 2 utility programs (_AddSanskritDiacritics.bat and _RemoveSanskritDiacritics.bat).

Note: Also, no diacritics were added to the title, the meta tags or the scripts. This is because the associated switches within _AddSanskritDiacritics.bat were set as follows:

F. Demo #2: Remove Diacritics

1. Assuming that you’ve already completed Demo #1 above, double click the _RemoveSanskritDiacritics.bat file’s icon. Here is what you should see:

Screen capture of removing diacritics

The log file:

_RemoveSanskritDiacritics.log

2. Now the contents of the _Krsna.htm file are back to the way they were originally:

G. The Tutorial

Now you are ready to add Sanskrit diacritics to any number of files in one fell swoop:

1. Be sure to keep backup copies of all of the files until you are sure that the changes were made correctly.

2. Copy the files to a temporary folder such as C:\Temp2.

3. Copy the _AddSanskritDiacritics.bat file and the _AddSanskritDiacritics.dat file to that same folder.

4. Double click the _AddSanskritDiacritics.bat file's icon, and if all goes according to plan, all of the files will now have Sanskrit diacritics added automatically!
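
Note: In step 4, only those files in the folder whose names match a file name pattern are processed. The following Python sketch shows how this kind of selection could work. It is not the Perl code from the .bat files: FILE_PATTERN is a hypothetical pattern written for this example, and the function name select_files is also made up.

```python
import re
from pathlib import Path

# Hypothetical pattern: file names ending in .htm, .html, .php, .php3, etc.
FILE_PATTERN = re.compile(r"\.(?:html?|php\d?)$", re.IGNORECASE)


def select_files(folder, pattern=FILE_PATTERN):
    """Return the files in `folder` whose names match `pattern`, sorted."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and pattern.search(path.name)
    )
```

To process only .txt files with this sketch, a different pattern can be passed in, for example select_files(r"C:\Temp2", re.compile(r"\.txt$", re.IGNORECASE)).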

Note: Near the beginning of the _AddSanskritDiacritics.bat and the _RemoveSanskritDiacritics.bat files are the following:

The above will match .htm, .html, .php, .php3, etc. files. It may be changed to suit your needs. For example, if all that you want to modify are .txt files, then it could be changed as follows:

Note: Another way to add Sanskrit diacritics, albeit one file at a time, is to do it online using: http://pratyatosa.com/?P=71.


THIS WEB PAGE URL: http://pratyatosa.com/?P=3o