
Free Download & Tutorial: How to Convert 1000s of Files Containing VedaBase Sanskrit Diacritics to Unicode in One Fell Swoop! (Version 9)

By Pratyatosa Dasa (ACBSP), September 1, 2012

Unicode file name feature added: December 25, 2014

(Pratyatosa.com)

On this web page:

A. Introduction
B. Assumptions
C. Get the Free Download
D. Main Features
E. The Demo
F. The Tutorial

A. Introduction

The purpose of the program included in the free download associated with this web page is to convert the 31 Sanskrit diacritics characters used in the Bhaktivedanta VedaBase™ from non-standard ANSI to standard Unicode. Here they are:

Ñ ñ Ā ā Ī ī Ś ś Ū ū Ḍ ḍ Ḥ ḥ Ḷ ḷ Ṁ ṁ Ṅ ṅ Ṇ ṇ Ṛ ṛ Ṝ ṝ Ṣ ṣ Ṭ ṭ ḹ

When copy/pasted from the VedaBase into a text file, these 31 characters look, instead, like the following:

Ï ï Ä ä É é Ç ç Ü ü Ò ò Ù ù ß ÿ À à Ì ì Ë ë Å å È è Ñ ñ Ö ö û

For example, “Kṛṣṇa” looks like “Kåñëa” and “Śrila Prabhupāda” looks like “Çrila Prabhupäda”. Such text therefore needs to be converted to its Unicode equivalents.
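As a rough illustration of the substitution described above, here is a minimal Python sketch. It is not the Perl program from the download. The pairs are simply read off the two character lists on this page, and the names VEDABASE, UNICODE, TABLE and to_unicode are made up for the example.

```python
# Hypothetical sketch: the pairs below are read off the two lists on this
# page, not taken from the downloadable Perl script.
VEDABASE = "Ï ï Ä ä É é Ç ç Ü ü Ò ò Ù ù ß ÿ À à Ì ì Ë ë Å å È è Ñ ñ Ö ö û".split()
UNICODE = "Ñ ñ Ā ā Ī ī Ś ś Ū ū Ḍ ḍ Ḥ ḥ Ḷ ḷ Ṁ ṁ Ṅ ṅ Ṇ ṇ Ṛ ṛ Ṝ ṝ Ṣ ṣ Ṭ ṭ ḹ".split()

# str.translate makes a single pass over the text, so "Ï" becomes "Ñ"
# without that new "Ñ" then being turned into "Ṣ" by the later pair.
TABLE = {ord(old): new for old, new in zip(VEDABASE, UNICODE)}


def to_unicode(text):
    """Return text with each VedaBase character replaced by its Unicode equivalent."""
    return text.translate(TABLE)


print(to_unicode("Kåñëa"))             # Kṛṣṇa
print(to_unicode("Çrila Prabhupäda"))  # Śrila Prabhupāda
```

Note that Ñ and ñ appear in both lists, so a chain of one-at-a-time replacements would need care about ordering. For the same reason, running this sketch twice on the same text would turn an already correct Ñ into Ṣ.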

B. Assumptions

This tutorial assumes the following:

1. That extensions for known file types are not hidden: a) Click Start. b) Click Computer. c) Click Organize. d) Click Folder and search options…. e) Click the View tab. f) Make sure that Hide extensions for known file types is unchecked. g) Click OK. h) Close the Computer window.

2. That you have installed Notepad2. (Use the installer version rather than the portable version and be sure to invoke the installer option to have Notepad2 replace Notepad as the default text editor.)

3. That you have installed Perl: a) Go to http://www.activestate.com/activeperl/downloads. b) Download the free version of ActivePerl for Windows. c) Install it, making sure that Add Perl to the PATH environment variable is checked (the default).

C. Get the Free Download

1. Download _VedaBaseToUnicode.zip (2.09 KB)

2. Unzip _VedaBaseToUnicode.zip to a folder named _VedaBaseToUnicode.

D. Main Features

1. All files exported from the VedaBase should be in the same folder as the _VedaBaseToUnicode.bat file and they should all have .vb extensions. An alternative would be for them to have .txt extensions, but then they will be overwritten by the converted versions of the files.

2. The file names of the .vb/.txt files may contain Unicode characters.

3. The files may be a mixture of ANSI and UTF-8 with signature (that is, with a byte order mark). Files encoded as Unicode (UTF-16) or as UTF-8 without signature will not work.

4. The converted files are always encoded as UTF-8 with signature.

5. Common VedaBase characters such as “, ”, ’ and — are automatically converted to their Unicode equivalents when copy/pasted to a UTF-8 text file. In the case of an ANSI text file, they are automatically converted to Unicode at the same time that the Sanskrit diacritics are converted.

6. A log file is created which lists the number of letters which were changed in each file.

7. Changed and unchanged files may be easily separated via sorting by Date modified.
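Features 3 to 5 can be pictured with a short Python sketch (again, not the Perl program from the download). It assumes that “ANSI” means Windows code page 1252 and that “signature” means the three-byte UTF-8 byte order mark. The function names read_text and write_text are invented for the illustration.

```python
import codecs


def read_text(raw):
    """Decode bytes as UTF-8 when they start with the signature, else as ANSI."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    # Assumption: ANSI here is Windows-1252, where bytes 0x93, 0x94, 0x92 and
    # 0x97 are the curly quotes, the apostrophe and the long dash.
    return raw.decode("cp1252")


def write_text(text):
    """Encode text as UTF-8 with signature, the format of the converted files."""
    return codecs.BOM_UTF8 + text.encode("utf-8")


ansi_bytes = b"\x93Hare K\xe5\xf1\xeba\x94 \x97 \xc7rila Prabhup\xe4da"
decoded = read_text(ansi_bytes)
print(decoded)                                    # “Hare Kåñëa” — Çrila Prabhupäda
print(write_text(decoded)[:3])                    # b'\xef\xbb\xbf'
print(read_text(write_text(decoded)) == decoded)  # True
```

In this sketch the quotes and the dash become Unicode as a side effect of decoding the ANSI bytes, while the Sanskrit letters still need the separate letter-for-letter substitution. A UTF-8 file without signature would be misread as ANSI here, which is one plausible reason for the restriction in feature 3.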

E. The Demo

1. In reference to the _VedaBaseToUnicode folder created above, here is a listing of the contents of _VedaBaseTextForTesting.vb:

“Hare Kåñëa” — Çrila Prabhupäda

Ï ï Ä ä É é Ç ç Ü ü Ò ò Ù ù ß ÿ À à Ì ì Ë ë Å å È è Ñ ñ Ö ö û

Çrila Prabhupäda’s “Bhagavad-gétä As It Is”

2. Edit the _VedaBaseTextForTesting.vb file using Notepad2 and click File/Encoding. Then you will see that it is an ANSI file:

Screen capture of Notepad2 showing ANSI file

3. Double click the _VedaBaseToUnicode.bat file’s icon. If Perl was installed correctly, here’s what you will see:

Screen capture of conversion to Unicode characters

The .vb file was copied to a .txt file and then the VedaBase Sanskrit diacritics characters within the .txt file were changed to their Unicode equivalents.

A log file containing the following was also created:

CHANGE(S) MADE TO FILE: "_VedaBaseTextForTesting.txt": 40 

CHANGE(S) MADE TO 1 "\.txt$" FILE(S): 40

_VedaBaseToUnicode.log

4. Edit the newly created _VedaBaseTextForTesting.txt file using Notepad2 and again click File/Encoding. You will see that the encoding has been changed to UTF-8 with signature:

Screen capture of Notepad2 showing UTF-8 file

Here is what the newly created Unicode version of the _VedaBaseTextForTesting.txt file looks like:

“Hare Kṛṣṇa” — Śrila Prabhupāda

Ñ ñ Ā ā Ī ī Ś ś Ū ū Ḍ ḍ Ḥ ḥ Ḷ ḷ Ṁ ṁ Ṅ ṅ Ṇ ṇ Ṛ ṛ Ṝ ṝ Ṣ ṣ Ṭ ṭ ḹ

Śrila Prabhupāda’s “Bhagavad-gītā As It Is”
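The count of 40 in the log can be checked by hand with a small Python sketch. It assumes that only the 31 Sanskrit letters are counted and that the quotes and the dash are not; this is an inference from the number itself, not something read from the Perl script. The names count_changes and log_lines are hypothetical, and the log layout is only imitated from the listing above.

```python
import re

# The 31 VedaBase letters; curly quotes and dashes are deliberately left out.
VEDABASE_LETTERS = re.compile("[ÏïÄäÉéÇçÜüÒòÙùßÿÀàÌìËëÅåÈèÑñÖöû]")


def count_changes(text):
    """Count the characters that the conversion would replace."""
    return len(VEDABASE_LETTERS.findall(text))


def log_lines(counts, mask=r"\.txt$"):
    """Build log lines in a layout imitating the demo's log file."""
    lines = ['CHANGE(S) MADE TO FILE: "%s": %d' % (name, n)
             for name, n in counts.items()]
    lines.append('CHANGE(S) MADE TO %d "%s" FILE(S): %d'
                 % (len(counts), mask, sum(counts.values())))
    return lines


demo = "\n".join([
    "“Hare Kåñëa” — Çrila Prabhupäda",
    "Ï ï Ä ä É é Ç ç Ü ü Ò ò Ù ù ß ÿ À à Ì ì Ë ë Å å È è Ñ ñ Ö ö û",
    "Çrila Prabhupäda’s “Bhagavad-gétä As It Is”",
])
counts = {"_VedaBaseTextForTesting.txt": count_changes(demo)}
for line in log_lines(counts):
    print(line)
```

The three lines of the test file contribute 5, 31 and 4 letters respectively, which adds up to the 40 reported in the log.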

F. The Tutorial

1. Copy the _VedaBaseToUnicode.bat file to a temporary folder such as C:\Temp2.

2. Copy the ANSI and/or UTF-8 (with signature) files containing the VedaBase style Sanskrit diacritics to that same folder.

3. These files should all have either .vb or .txt file name extensions. If they have .vb extensions, then they will be first copied to files having the same names but with .txt extensions, and only the .txt versions will be converted. If they have .txt extensions, then they will be converted directly, and the VedaBase diacritics versions will be overwritten, so it’s best to keep backup copies just in case.

4. If these files have file name extensions other than .vb or .txt, then the following 2 lines near the beginning of the .bat file must be adjusted accordingly:

if exist *.vb copy *.vb *.txt

$fileNameMask='\.txt$';

5. If some or all of the input files are in subfolders (subdirectories), then line 3 of the following must be modified:

##### "$doTheSubdirs = 1" means "Do the subdirectories."
##### "$doTheSubdirs = 0" means "Do not do the subdirectories."
$doTheSubdirs = 0;

…but please note that the files in the subfolders must have file name extensions that correspond to those defined in the $fileNameMask=… statement, and that the VedaBase versions of those files will be overwritten.

6. Double click the .bat file’s icon. If all goes according to plan, all of the files containing VedaBase Sanskrit diacritics will now be converted to their Unicode equivalents in one fell swoop!
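For readers who want to see how steps 3 to 5 fit together, here is a minimal Python sketch of the file selection only. It is not the Perl code inside the .bat file. The helper names copy_vb_to_txt and find_files are made up, and do_the_subdirs merely mirrors the $doTheSubdirs setting shown above.

```python
import os
import re
import shutil
import tempfile


def copy_vb_to_txt(folder):
    """Imitate `copy *.vb *.txt`: give every .vb file a .txt twin in the folder."""
    for name in os.listdir(folder):
        base, ext = os.path.splitext(name)
        if ext.lower() == ".vb":
            shutil.copyfile(os.path.join(folder, name),
                            os.path.join(folder, base + ".txt"))


def find_files(folder, mask=r"\.txt$", do_the_subdirs=False):
    """List files whose names match the mask, optionally including subfolders."""
    pattern = re.compile(mask)
    found = []
    for dirpath, dirnames, filenames in os.walk(folder):
        if not do_the_subdirs:
            dirnames.clear()  # an emptied list stops os.walk from descending
        found.extend(os.path.join(dirpath, n)
                     for n in filenames if pattern.search(n))
    return sorted(found)


# Try it out in a throwaway folder with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    for rel in ("chapter1.vb", "notes.doc", os.path.join("sub", "chapter2.txt")):
        with open(os.path.join(tmp, rel), "w", encoding="utf-8") as f:
            f.write("test")
    copy_vb_to_txt(tmp)
    top_only = [os.path.basename(p) for p in find_files(tmp)]
    with_subdirs = [os.path.basename(p)
                    for p in find_files(tmp, do_the_subdirs=True)]

print(top_only)      # ['chapter1.txt']
print(with_subdirs)  # ['chapter1.txt', 'chapter2.txt']
```

As in the tutorial, the .vb copy happens only in the top folder, so files in subfolders are picked up only if their names already match the mask.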

Note: Another way to convert files containing VedaBase diacritics to Unicode, albeit one file at a time, is to do it online using: http://pratyatosa.com/?P=41.

THIS WEB PAGE URL: http://pratyatosa.com/?P=4r