Welcome to PyCDSL’s documentation!
PyCDSL is a python interface to Cologne Digital Sanskrit Lexicon (CDSL).
Free software: GNU General Public License v3
Documentation: https://pycdsl.readthedocs.io.
Features
CDSL Corpus Management (Download, Update, Access)
Unified Programmable Interface to access all dictionaries available at CDSL
Command Line Interfaces for quick and easy search
Console Command:
cdsl
REPL Interface (powered by
cmd2
)
Extensive support for transliteration using
indic-transliteration
moduleSearch by key, value or both
PyCDSL
PyCDSL is a python interface to Cologne Digital Sanskrit Lexicon (CDSL).
Free software: GNU General Public License v3
Documentation: https://pycdsl.readthedocs.io.
Features
CDSL Corpus Management (Download, Update, Access)
Unified Programmable Interface to access all dictionaries available at CDSL
Command Line Interfaces for quick and easy search
Console Command:
cdsl
REPL Interface (powered by
cmd2
)
Extensive support for transliteration using
indic-transliteration
moduleSearch by key, value or both
Install
To install PyCDSL, run this command in your terminal:
$ pip install PyCDSL
Usage
PyCDSL can be used in a python project, as a console command and as an interactive REPL interface.
Using PyCDSL in a Project
Import PyCDSL in a project:
import pycdsl
Create a CDSLCorpus Instance:
# Default installation at ~/cdsl_data
CDSL = pycdsl.CDSLCorpus()
# Custom installation path can be specified with argument `data_dir`
# e.g. CDSL = pycdsl.CDSLCorpus(data_dir="custom-installation-path")
# Custom transliteration schemes for input and output can be specified
# with arguments `input_scheme` and `output_scheme`.
# Values should be valid names of the schemes from `indic-transliteration`
# If unspecified, `DEFAULT_SCHEME` (`devanagari`) would be used.
# e.g. CDSL = pycdsl.CDSLCorpus(input_scheme="hk", output_scheme="iast")
# Search mode can be specified to search values by key or value or both.
# Valid options for `search_mode` are "key", "value", "both".
# These are also stored in convenience variables, and it is recommended
# to use these instead of string literals.
# The variables are, SEARCH_MODE_KEY, SEARCH_MODE_VALUE, SEARCH_MODE_BOTH.
# The variable SEARCH_MODES will always hold the list of all valid modes.
# The variable DEFAULT_SEARCH_MODE will alway point to the default mode.
# e.g. CDSL = pycdsl.CDSLCorpus(search_mode=pycdsl.SEARCH_MODE_VALUE)
Setup default dictionaries (["MW", "MWE", "AP90", "AE"]
):
# Note: Any additional dictionaries that are installed will also be loaded.
CDSL.setup()
# For loading specific dictionaries only,
# a list of dictionary IDs can be passed to the setup function
# e.g. CDSL.setup(["VCP"])
# If `update` flag is True, update check is performed for every dictionary
# in `dict_ids` and if available, the updated version is installed
# e.g. CDSL.setup(["MW"], update=True)
Search in a dictionary:
# Any loaded dictionary is accessible using `[]` operator and dictionary ID
# e.g. CDSL["MW"]
results = CDSL["MW"].search("राम")
# Alternatively, they are also accessible like an attribute
# e.g. CDSL.MW, CDSL.MWE etc.
results = CDSL.MW.search("राम")
# Note: Attribute access and Item access both use the `dicts` property
# under the hood to access the dictionaries.
# >>> CDSL.MW is CDSL.dicts["MW"]
# True
# >>> CDSL["MW"] is CDSL.dicts["MW"]
# True
# `input_scheme` and `output_scheme` can be specified to the search function.
CDSL.MW.search("kṛṣṇa", input_scheme="iast", output_scheme="itrans")[0]
# <MWEntry: 55142: kRRiShNa = 1. kRRiShNa/ mf(A/)n. black, dark, dark-blue (opposed to shveta/, shukla/, ro/hita, and aruNa/), RV.; AV. &c.>
# Search using wildcard (i.e. `*`)
# e.g. To search all etnries starting with kRRi (i.e. कृ)
CDSL.MW.search("kRRi*", input_scheme="itrans")
# Limit and/or Offset the number of search results, e.g.
# Show the first 10 results
CDSL.MW.search("kṛ*", input_scheme="iast", limit=10)
# Show the next 10 results
CDSL.MW.search("kṛ*", input_scheme="iast", limit=10, offset=10)
# Search using a different search mode
CDSL.MW.search("हृषीकेश", mode=pycdsl.SEARCH_MODE_VALUE)
Access an entry by ID:
# Access entry by `entry_id` using `[]` operator
entry = CDSL.MW["263938"]
# Alternatively, use `CDSLDict.entry` function
entry = CDSL.MW.entry("263938")
# Note: Access using `[]` operator calls the `CDSLDict.entry` function.
# The difference is that, in case an `entry_id` is absent,
# `[]` based access will raise a `KeyError`
# `CDSLDict.entry` will return None and log a `logging.ERROR` level message
# >>> entry
# <MWEntry: 263938: हृषीकेश = lord of the senses (said of Manas), BhP.>
# Output transliteration scheme can also be provided
CDSL.MW.entry("263938", output_scheme="iast")
# <MWEntry: 263938: hṛṣīkeśa = lord of the senses (said of Manas), BhP.>
Entry
class also supports transliteration after creation.
Thus, any entry fetched either through search()
function or through entry()
function can be transliterated.
Transliterate a single entry:
CDSL.MW.entry("263938").transliterate("slp1")
# <MWEntry: 263938: hfzIkeSa = lord of the senses (said of Manas), BhP.>
Change transliteration scheme for a dictionary:
CDSL.MW.set_scheme(input_scheme="itrans")
CDSL.MW.search("rAma")
Change search mode for a dictionary:
CDSL.MW.set_search_mode(mode="value")
CDSL.MW.search("hRRiShIkesha")
Classes CDSLCorpus
and CDSLDict
are iterable.
Iterating over
CDSLCorpus
yields loaded dictionary instances.Iterating over
CDSLDict
yields entries in that dictionary.
# Iteration over a `CDSLCorpus` instance
for cdsl_dict in CDSL:
print(type(cdsl_dict))
print(cdsl_dict)
break
# <class 'pycdsl.lexicon.CDSLDict'>
# CDSLDict(id='MW', date='1899', name='Monier-Williams Sanskrit-English Dictionary')
# Iteration over a `CDSLDict` isntance
for entry in CDSL.MW:
print(type(entry))
print(entry)
break
# <class 'pycdsl.models.MWEntry'>
# <MWEntry: 1: अ = 1. अ the first letter of the alphabet>
Note: Please check the documentation of modules in the PyCDSL Package for more detailed information on available classes and functions.
Using Console Interface of PyCDSL
Help to the Console Interface:
usage: cdsl [-h] [-i] [-s SEARCH] [-p PATH] [-d DICTS [DICTS ...]]
[-sm SEARCH_MODE] [-is INPUT_SCHEME] [-os OUTPUT_SCHEME]
[-hf HISTORY_FILE] [-sc STARTUP_SCRIPT]
[-u] [-dbg] [-v]
Access dictionaries from Cologne Digital Sanskrit Lexicon (CDSL)
optional arguments:
-h, --help show this help message and exit
-i, --interactive start in an interactive REPL mode
-s SEARCH, --search SEARCH
search pattern (ignored if `--interactive` mode is set)
-p PATH, --path PATH path to installation
-d DICTS [DICTS ...], --dicts DICTS [DICTS ...]
dictionary id(s)
-sm SEARCH_MODE, --search-mode SEARCH_MODE
search mode
-is INPUT_SCHEME, --input-scheme INPUT_SCHEME
input transliteration scheme
-os OUTPUT_SCHEME, --output-scheme OUTPUT_SCHEME
output transliteration scheme
-hf HISTORY_FILE, --history-file HISTORY_FILE
path to the history file
-sc STARTUP_SCRIPT, --startup-script STARTUP_SCRIPT
path to the startup script
-u, --update update specified dictionaries
-dbg, --debug turn debug mode on
-v, --version show version and exit
Common Usage:
$ cdsl -d MW AP90 -s हृषीकेश
Note: Arguments for specifying installation path, dictionary IDs, input and output transliteration schemes are valid for both interactive REPL shell and non-interactive console command.
Using REPL Interface of PyCDSL
REPL Interface is powered by cmd2
, and thus supports persistent history,
start-up script, and several other rich features.
To use REPL Interface to Cologne Digital Sanskrit Lexicon (CDSL):
$ cdsl -i
cmd2 Inherited REPL Features
Persistent History across sessions is maintained at
~/.cdsl_history
.If Start-up Script is present (
~/.cdslrc
), the commands (one per line) are run at the start-up.Customized shortcuts for several useful commands, such as
!
forshell
,/
forsearch
and$
forshow
.Aliases can be created on runtime.
Output Redirection works like the standard console, e.g.
command args > output.txt
will write the output ofcommand
tooutput.txt
. Similarly,>>
can be used to append the output.Clipboard Integration is supported through
Pyperclip
. If the output file name is omitted, the output is copied to the clipboard, e.g.,command args >
. The output can even be appended to clipboard bycommand args >>
.
References
pyperclip
: https://pypi.org/project/pyperclip/
Note: The locations of history file and start-up script can be customized through CLI options.
REPL Session Example
Cologne Sanskrit Digital Lexicon (CDSL)
---------------------------------------
Install or load dictionaries by typing `use [DICT_IDS..]` e.g. `use MW`.
Type any keyword to search in the selected dictionaries. (help or ? for list of options)
Loaded 4 dictionaries.
(CDSL::None) help -v
Documented commands (use 'help -v' for verbose/'help <topic>' for details):
Core
======================================================================================================
available Display a list of dictionaries available in CDSL
dicts Display a list of dictionaries available locally
info Display information about active dictionaries
search Search in the active dictionaries
show Show a specific entry by ID
stats Display statistics about active dictionaries
update Update loaded dictionaries
use Load the specified dictionaries from CDSL.
If not available locally, they will be installed first.
Utility
======================================================================================================
alias Manage aliases
help List available commands or provide detailed help for a specific command
history View, run, edit, save, or clear previously entered commands
macro Manage macros
quit Exit this application
run_script Run commands in script file that is encoded as either ASCII or UTF-8 text
set Set a settable parameter or show current settings of parameters
shell Execute a command as if at the OS prompt
shortcuts List available shortcuts
version Show the current version of PyCDSL
(CDSL::None) help available
Display a list of dictionaries available in CDSL
(CDSL::None) help search
Usage: search [-h] [--limit LIMIT] [--offset OFFSET] pattern
Search in the active dictionaries
Note
----
* Searching in the active dictionaries is also the default action.
* In general, we do not need to use this command explicitly unless we
want to search the command keywords, such as, `available` `search`,
`version`, `help` etc. in the active dictionaries.
positional arguments:
pattern search pattern
optional arguments:
-h, --help show this help message and exit
--limit LIMIT limit results
--offset OFFSET skip results
(CDSL::None) help dicts
Display a list of dictionaries available locally
(CDSL::None) dicts
CDSLDict(id='AP90', date='1890', name='Apte Practical Sanskrit-English Dictionary')
CDSLDict(id='MW', date='1899', name='Monier-Williams Sanskrit-English Dictionary')
CDSLDict(id='MWE', date='1851', name='Monier-Williams English-Sanskrit Dictionary')
CDSLDict(id='AE', date='1920', name="Apte Student's English-Sanskrit Dictionary")
(CDSL::None) update
Data for dictionary 'AP90' is up-to-date.
Data for dictionary 'MW' is up-to-date.
Data for dictionary 'MWE' is up-to-date.
Data for dictionary 'AE' is up-to-date.
(CDSL::None) use MW
Using 1 dictionaries: ['MW']
(CDSL::MW) हृषीकेश
Found 6 results in MW.
<MWEntry: 263922: हृषीकेश = हृषी-केश a See below under हृषीक.>
<MWEntry: 263934: हृषीकेश = हृषीकेश b m. (perhaps = हृषी-केश cf. हृषी-वत् above) id. (-त्व n.), MBh.; Hariv. &c.>
<MWEntry: 263935: हृषीकेश = N. of the tenth month, VarBṛS.>
<MWEntry: 263936: हृषीकेश = of a Tīrtha, Cat.>
<MWEntry: 263937: हृषीकेश = of a poet, ib.>
<MWEntry: 263938: हृषीकेश = lord of the senses (said of Manas), BhP.>
(CDSL::MW) show 263938
<MWEntry: 263938: हृषीकेश = lord of the senses (said of Manas), BhP.>
(CDSL::MW) show 263938 --show-data
<MWEntry: 263938: हृषीकेश = lord of the senses (said of Manas), BhP.>
Data:
<H3A><h><key1>hfzIkeSa<\/key1><key2>hfzIkeSa<\/key2><\/h>
<body> lord of the senses (said of <s1 slp1="manas">Manas<\/s1>), <ls>BhP.<\/ls><info lex="inh"\/><\/body>
<tail><L>263938<\/L><pc>1303,2<\/pc><\/tail><\/H3A>
(CDSL::MW) $263938
<MWEntry: 263938: हृषीकेश = lord of the senses (said of Manas), BhP.>
(CDSL::MW) $263938 > output.txt
(CDSL::MW) !cat output.txt
<MWEntry: 263938: हृषीकेश = lord of the senses (said of Manas), BhP.>
(CDSL::MW) set input_scheme itrans
input_scheme - was: 'devanagari'
now: 'itrans'
(CDSL::MW) hRRiSIkesha
Found 6 results in MW.
<MWEntry: 263922: हृषीकेश = हृषी-केश a See below under हृषीक.>
<MWEntry: 263934: हृषीकेश = हृषीकेश b m. (perhaps = हृषी-केश cf. हृषी-वत् above) id. (-त्व n.), MBh.; Hariv. &c.>
<MWEntry: 263935: हृषीकेश = N. of the tenth month, VarBṛS.>
<MWEntry: 263936: हृषीकेश = of a Tīrtha, Cat.>
<MWEntry: 263937: हृषीकेश = of a poet, ib.>
<MWEntry: 263938: हृषीकेश = lord of the senses (said of Manas), BhP.>
(CDSL::MW) set output_scheme iast
output_scheme - was: 'devanagari'
now: 'iast'
(CDSL::MW) hRRiSIkesha
Found 6 results in MW.
<MWEntry: 263922: hṛṣīkeśa = hṛṣī-keśa a See below under hṛṣīka.>
<MWEntry: 263934: hṛṣīkeśa = hṛṣīkeśa b m. (perhaps = hṛṣī-keśa cf. hṛṣī-vat above) id. (-tva n.), MBh.; Hariv. &c.>
<MWEntry: 263935: hṛṣīkeśa = N. of the tenth month, VarBṛS.>
<MWEntry: 263936: hṛṣīkeśa = of a Tīrtha, Cat.>
<MWEntry: 263937: hṛṣīkeśa = of a poet, ib.>
<MWEntry: 263938: hṛṣīkeśa = lord of the senses (said of Manas), BhP.>
(CDSL::MW) set limit 2
limit - was: 50
now: 2
(CDSL::MW) hRRiSIkesha
Found 2 results in MW.
<MWEntry: 263922: hṛṣīkeśa = hṛṣī-keśa a See below under hṛṣīka.>
<MWEntry: 263934: hṛṣīkeśa = hṛṣīkeśa b m. (perhaps = hṛṣī-keśa cf. hṛṣī-vat above) id. (-tva n.), MBh.; Hariv. &c.>
(CDSL::MW) set limit -1
limit - was: 2
now: None
(CDSL::MW) set search_mode value
search_mode - was: 'key'
now: 'value'
(CDSL::MW) hRRiSIkesha
Found 1 results in MW.
<MWEntry: 263938.1: hṛṣīkeśatva = hṛṣīkeśa—tva n.>
(CDSL::MW) set search_mode both
search_mode - was: 'value'
now: 'both'
(CDSL::MW) hRRiSIkesha
Found 7 results in MW.
<MWEntry: 263922: hṛṣīkeśa = hṛṣī-keśa a See below under hṛṣīka.>
<MWEntry: 263934: hṛṣīkeśa = hṛṣīkeśa b m. (perhaps = hṛṣī-keśa cf. hṛṣī-vat above) id. (-tva n.), MBh.; Hariv. &c.>
<MWEntry: 263935: hṛṣīkeśa = N. of the tenth month, VarBṛS.>
<MWEntry: 263936: hṛṣīkeśa = of a Tīrtha, Cat.>
<MWEntry: 263937: hṛṣīkeśa = of a poet, ib.>
<MWEntry: 263938: hṛṣīkeśa = lord of the senses (said of Manas), BhP.>
<MWEntry: 263938.1: hṛṣīkeśatva = hṛṣīkeśa—tva n.>
(CDSL::MW) info
Total 1 dictionaries are active.
CDSLDict(id='MW', date='1899', name='Monier-Williams Sanskrit-English Dictionary')
(CDSL::MW) stats
Total 1 dictionaries are active.
---
CDSLDict(id='MW', date='1899', name='Monier-Williams Sanskrit-English Dictionary')
{'total': 287627, 'distinct': 194044, 'top': [('कृष्ण', 50), ('शिव', 46), ('विजय', 46), ('पुष्कर', 45), ('काल', 39), ('सिद्ध', 39), ('योग', 39), ('चित्र', 38), ('शुचि', 36), ('वसु', 36)]}
(CDSL::MW) use WIL
Downloading 'WIL.web.zip' ... (8394727 bytes)
100%|██████████████████████████████████████████████████████████████████████████████████████| 8.39M/8.39M [00:21<00:00, 386kB/s]
Successfully downloaded 'WIL.web.zip' from 'https://www.sanskrit-lexicon.uni-koeln.de/scans/WILScan/2020/downloads/wilweb1.zip'.
Using 1 dictionaries: ['WIL']
(CDSL::WIL)
(CDSL::WIL) use WIL MW
Using 2 dictionaries: ['WIL', 'MW']
(CDSL::WIL,MW) hRRiSIkesha
Found 1 results in WIL.
<WILEntry: 44411: hṛṣīkeśa = hṛṣīkeśa m. (-śaḥ) KṚṢṆA or VIṢṆU. E. hṛṣīka an organ of sense, and īśa lord.>
Found 6 results in MW.
<MWEntry: 263922: hṛṣīkeśa = hṛṣī-keśa a See below under hṛṣīka.>
<MWEntry: 263934: hṛṣīkeśa = hṛṣīkeśa b m. (perhaps = hṛṣī-keśa cf. hṛṣī-vat above) id. (-tva n.), MBh.; Hariv. &c.>
<MWEntry: 263935: hṛṣīkeśa = N. of the tenth month, VarBṛS.>
<MWEntry: 263936: hṛṣīkeśa = of a Tīrtha, Cat.>
<MWEntry: 263937: hṛṣīkeśa = of a poet, ib.>
<MWEntry: 263938: hṛṣīkeśa = lord of the senses (said of Manas), BhP.>
(CDSL::WIL,MW) use MW AP90 MWE AE
Using 4 dictionaries: ['MW', 'AP90', 'MWE', 'AE']
(CDSL::MW+3) use --all
Using 5 dictionaries: ['AP90', 'MW', 'MWE', 'AE', 'WIL']
(CDSL::AP90+3) use --none
Using 0 dictionaries: []
(CDSL::None) quit
Credits
This application uses data from Cologne Digital Sanskrit Dictionaries, Cologne University.
Installation
Stable release
To install PyCDSL, run this command in your terminal:
$ pip install PyCDSL
This is the preferred method to install PyCDSL, as it will always install the most recent stable release.
If you don’t have pip installed, this Python installation guide can guide you through the process.
From sources
The sources for PyCDSL can be downloaded from the Github repo.
You can either clone the public repository:
$ git clone git://github.com/hrishikeshrt/PyCDSL
Or download the tarball:
$ curl -OJL https://github.com/hrishikeshrt/PyCDSL/tarball/master
Once you have a copy of the source, you can install it with:
$ python setup.py install
Usage
PyCDSL can be used in a python project, as a console command and as an interactive REPL interface.
Using PyCDSL in a Project
Import PyCDSL in a project:
import pycdsl
Create a CDSLCorpus Instance:
# Default installation at ~/cdsl_data
CDSL = pycdsl.CDSLCorpus()
# Custom installation path can be specified with argument `data_dir`
# e.g. CDSL = pycdsl.CDSLCorpus(data_dir="custom-installation-path")
# Custom transliteration schemes for input and output can be specified
# with arguments `input_scheme` and `output_scheme`.
# Values should be valid names of the schemes from `indic-transliteration`
# If unspecified, `DEFAULT_SCHEME` (`devanagari`) would be used.
# e.g. CDSL = pycdsl.CDSLCorpus(input_scheme="hk", output_scheme="iast")
# Search mode can be specified to search values by key or value or both.
# Valid options for `search_mode` are "key", "value", "both".
# These are also stored in convenience variables, and it is recommended
# to use these instead of string literals.
# The variables are, SEARCH_MODE_KEY, SEARCH_MODE_VALUE, SEARCH_MODE_BOTH.
# The variable SEARCH_MODES will always hold the list of all valid modes.
# The variable DEFAULT_SEARCH_MODE will alway point to the default mode.
# e.g. CDSL = pycdsl.CDSLCorpus(search_mode=pycdsl.SEARCH_MODE_VALUE)
Setup default dictionaries (["MW", "MWE", "AP90", "AE"]
):
# Note: Any additional dictionaries that are installed will also be loaded.
CDSL.setup()
# For loading specific dictionaries only,
# a list of dictionary IDs can be passed to the setup function
# e.g. CDSL.setup(["VCP"])
# If `update` flag is True, update check is performed for every dictionary
# in `dict_ids` and if available, the updated version is installed
# e.g. CDSL.setup(["MW"], update=True)
Search in a dictionary:
# Any loaded dictionary is accessible using `[]` operator and dictionary ID
# e.g. CDSL["MW"]
results = CDSL["MW"].search("राम")
# Alternatively, they are also accessible like an attribute
# e.g. CDSL.MW, CDSL.MWE etc.
results = CDSL.MW.search("राम")
# Note: Attribute access and Item access both use the `dicts` property
# under the hood to access the dictionaries.
# >>> CDSL.MW is CDSL.dicts["MW"]
# True
# >>> CDSL["MW"] is CDSL.dicts["MW"]
# True
# `input_scheme` and `output_scheme` can be specified to the search function.
CDSL.MW.search("kṛṣṇa", input_scheme="iast", output_scheme="itrans")[0]
# <MWEntry: 55142: kRRiShNa = 1. kRRiShNa/ mf(A/)n. black, dark, dark-blue (opposed to shveta/, shukla/, ro/hita, and aruNa/), RV.; AV. &c.>
# Search using wildcard (i.e. `*`)
# e.g. To search all etnries starting with kRRi (i.e. कृ)
CDSL.MW.search("kRRi*", input_scheme="itrans")
# Limit and/or Offset the number of search results, e.g.
# Show the first 10 results
CDSL.MW.search("kṛ*", input_scheme="iast", limit=10)
# Show the next 10 results
CDSL.MW.search("kṛ*", input_scheme="iast", limit=10, offset=10)
# Search using a different search mode
CDSL.MW.search("हृषीकेश", mode=pycdsl.SEARCH_MODE_VALUE)
Access an entry by ID:
# Access entry by `entry_id` using `[]` operator
entry = CDSL.MW["263938"]
# Alternatively, use `CDSLDict.entry` function
entry = CDSL.MW.entry("263938")
# Note: Access using `[]` operator calls the `CDSLDict.entry` function.
# The difference is that, in case an `entry_id` is absent,
# `[]` based access will raise a `KeyError`
# `CDSLDict.entry` will return None and log a `logging.ERROR` level message
# >>> entry
# <MWEntry: 263938: हृषीकेश = lord of the senses (said of Manas), BhP.>
# Output transliteration scheme can also be provided
CDSL.MW.entry("263938", output_scheme="iast")
# <MWEntry: 263938: hṛṣīkeśa = lord of the senses (said of Manas), BhP.>
Entry
class also supports transliteration after creation.
Thus, any entry fetched either through search()
function or through entry()
function can be transliterated.
Transliterate a single entry:
CDSL.MW.entry("263938").transliterate("slp1")
# <MWEntry: 263938: hfzIkeSa = lord of the senses (said of Manas), BhP.>
Change transliteration scheme for a dictionary:
CDSL.MW.set_scheme(input_scheme="itrans")
CDSL.MW.search("rAma")
Change search mode for a dictionary:
CDSL.MW.set_search_mode(mode="value")
CDSL.MW.search("hRRiShIkesha")
Classes CDSLCorpus
and CDSLDict
are iterable.
Iterating over
CDSLCorpus
yields loaded dictionary instances.Iterating over
CDSLDict
yields entries in that dictionary.
# Iteration over a `CDSLCorpus` instance
for cdsl_dict in CDSL:
print(type(cdsl_dict))
print(cdsl_dict)
break
# <class 'pycdsl.lexicon.CDSLDict'>
# CDSLDict(id='MW', date='1899', name='Monier-Williams Sanskrit-English Dictionary')
# Iteration over a `CDSLDict` isntance
for entry in CDSL.MW:
print(type(entry))
print(entry)
break
# <class 'pycdsl.models.MWEntry'>
# <MWEntry: 1: अ = 1. अ the first letter of the alphabet>
Note: Please check the documentation of modules in the PyCDSL Package for more detailed information on available classes and functions.
Using Console Interface of PyCDSL
Help to the Console Interface:
usage: cdsl [-h] [-i] [-s SEARCH] [-p PATH] [-d DICTS [DICTS ...]]
[-sm SEARCH_MODE] [-is INPUT_SCHEME] [-os OUTPUT_SCHEME]
[-hf HISTORY_FILE] [-sc STARTUP_SCRIPT]
[-u] [-dbg] [-v]
Access dictionaries from Cologne Digital Sanskrit Lexicon (CDSL)
optional arguments:
-h, --help show this help message and exit
-i, --interactive start in an interactive REPL mode
-s SEARCH, --search SEARCH
search pattern (ignored if `--interactive` mode is set)
-p PATH, --path PATH path to installation
-d DICTS [DICTS ...], --dicts DICTS [DICTS ...]
dictionary id(s)
-sm SEARCH_MODE, --search-mode SEARCH_MODE
search mode
-is INPUT_SCHEME, --input-scheme INPUT_SCHEME
input transliteration scheme
-os OUTPUT_SCHEME, --output-scheme OUTPUT_SCHEME
output transliteration scheme
-hf HISTORY_FILE, --history-file HISTORY_FILE
path to the history file
-sc STARTUP_SCRIPT, --startup-script STARTUP_SCRIPT
path to the startup script
-u, --update update specified dictionaries
-dbg, --debug turn debug mode on
-v, --version show version and exit
Common Usage:
$ cdsl -d MW AP90 -s हृषीकेश
Note: Arguments for specifying installation path, dictionary IDs, input and output transliteration schemes are valid for both interactive REPL shell and non-interactive console command.
Using REPL Interface of PyCDSL
REPL Interface is powered by cmd2
, and thus supports persistent history,
start-up script, and several other rich features.
To use REPL Interface to Cologne Digital Sanskrit Lexicon (CDSL):
$ cdsl -i
cmd2 Inherited REPL Features
Persistent History across sessions is maintained at
~/.cdsl_history
.If Start-up Script is present (
~/.cdslrc
), the commands (one per line) are run at the start-up.Customized shortcuts for several useful commands, such as
!
forshell
,/
forsearch
and$
forshow
.Aliases can be created on runtime.
Output Redirection works like the standard console, e.g.
command args > output.txt
will write the output ofcommand
tooutput.txt
. Similarly,>>
can be used to append the output.Clipboard Integration is supported through
Pyperclip
. If the output file name is omitted, the output is copied to the clipboard, e.g.,command args >
. The output can even be appended to clipboard bycommand args >>
.
References
pyperclip
: https://pypi.org/project/pyperclip/
Note: The locations of history file and start-up script can be customized through CLI options.
REPL Session Example
Cologne Sanskrit Digital Lexicon (CDSL)
---------------------------------------
Install or load dictionaries by typing `use [DICT_IDS..]` e.g. `use MW`.
Type any keyword to search in the selected dictionaries. (help or ? for list of options)
Loaded 4 dictionaries.
(CDSL::None) help -v
Documented commands (use 'help -v' for verbose/'help <topic>' for details):
Core
======================================================================================================
available Display a list of dictionaries available in CDSL
dicts Display a list of dictionaries available locally
info Display information about active dictionaries
search Search in the active dictionaries
show Show a specific entry by ID
stats Display statistics about active dictionaries
update Update loaded dictionaries
use Load the specified dictionaries from CDSL.
If not available locally, they will be installed first.
Utility
======================================================================================================
alias Manage aliases
help List available commands or provide detailed help for a specific command
history View, run, edit, save, or clear previously entered commands
macro Manage macros
quit Exit this application
run_script Run commands in script file that is encoded as either ASCII or UTF-8 text
set Set a settable parameter or show current settings of parameters
shell Execute a command as if at the OS prompt
shortcuts List available shortcuts
version Show the current version of PyCDSL
(CDSL::None) help available
Display a list of dictionaries available in CDSL
(CDSL::None) help search
Usage: search [-h] [--limit LIMIT] [--offset OFFSET] pattern
Search in the active dictionaries
Note
----
* Searching in the active dictionaries is also the default action.
* In general, we do not need to use this command explicitly unless we
want to search the command keywords, such as, `available` `search`,
`version`, `help` etc. in the active dictionaries.
positional arguments:
pattern search pattern
optional arguments:
-h, --help show this help message and exit
--limit LIMIT limit results
--offset OFFSET skip results
(CDSL::None) help dicts
Display a list of dictionaries available locally
(CDSL::None) dicts
CDSLDict(id='AP90', date='1890', name='Apte Practical Sanskrit-English Dictionary')
CDSLDict(id='MW', date='1899', name='Monier-Williams Sanskrit-English Dictionary')
CDSLDict(id='MWE', date='1851', name='Monier-Williams English-Sanskrit Dictionary')
CDSLDict(id='AE', date='1920', name="Apte Student's English-Sanskrit Dictionary")
(CDSL::None) update
Data for dictionary 'AP90' is up-to-date.
Data for dictionary 'MW' is up-to-date.
Data for dictionary 'MWE' is up-to-date.
Data for dictionary 'AE' is up-to-date.
(CDSL::None) use MW
Using 1 dictionaries: ['MW']
(CDSL::MW) हृषीकेश
Found 6 results in MW.
<MWEntry: 263922: हृषीकेश = हृषी-केश a See below under हृषीक.>
<MWEntry: 263934: हृषीकेश = हृषीकेश b m. (perhaps = हृषी-केश cf. हृषी-वत् above) id. (-त्व n.), MBh.; Hariv. &c.>
<MWEntry: 263935: हृषीकेश = N. of the tenth month, VarBṛS.>
<MWEntry: 263936: हृषीकेश = of a Tīrtha, Cat.>
<MWEntry: 263937: हृषीकेश = of a poet, ib.>
<MWEntry: 263938: हृषीकेश = lord of the senses (said of Manas), BhP.>
(CDSL::MW) show 263938
<MWEntry: 263938: हृषीकेश = lord of the senses (said of Manas), BhP.>
(CDSL::MW) show 263938 --show-data
<MWEntry: 263938: हृषीकेश = lord of the senses (said of Manas), BhP.>
Data:
<H3A><h><key1>hfzIkeSa<\/key1><key2>hfzIkeSa<\/key2><\/h>
<body> lord of the senses (said of <s1 slp1="manas">Manas<\/s1>), <ls>BhP.<\/ls><info lex="inh"\/><\/body>
<tail><L>263938<\/L><pc>1303,2<\/pc><\/tail><\/H3A>
(CDSL::MW) $263938
<MWEntry: 263938: हृषीकेश = lord of the senses (said of Manas), BhP.>
(CDSL::MW) $263938 > output.txt
(CDSL::MW) !cat output.txt
<MWEntry: 263938: हृषीकेश = lord of the senses (said of Manas), BhP.>
(CDSL::MW) set input_scheme itrans
input_scheme - was: 'devanagari'
now: 'itrans'
(CDSL::MW) hRRiSIkesha
Found 6 results in MW.
<MWEntry: 263922: हृषीकेश = हृषी-केश a See below under हृषीक.>
<MWEntry: 263934: हृषीकेश = हृषीकेश b m. (perhaps = हृषी-केश cf. हृषी-वत् above) id. (-त्व n.), MBh.; Hariv. &c.>
<MWEntry: 263935: हृषीकेश = N. of the tenth month, VarBṛS.>
<MWEntry: 263936: हृषीकेश = of a Tīrtha, Cat.>
<MWEntry: 263937: हृषीकेश = of a poet, ib.>
<MWEntry: 263938: हृषीकेश = lord of the senses (said of Manas), BhP.>
(CDSL::MW) set output_scheme iast
output_scheme - was: 'devanagari'
now: 'iast'
(CDSL::MW) hRRiSIkesha
Found 6 results in MW.
<MWEntry: 263922: hṛṣīkeśa = hṛṣī-keśa a See below under hṛṣīka.>
<MWEntry: 263934: hṛṣīkeśa = hṛṣīkeśa b m. (perhaps = hṛṣī-keśa cf. hṛṣī-vat above) id. (-tva n.), MBh.; Hariv. &c.>
<MWEntry: 263935: hṛṣīkeśa = N. of the tenth month, VarBṛS.>
<MWEntry: 263936: hṛṣīkeśa = of a Tīrtha, Cat.>
<MWEntry: 263937: hṛṣīkeśa = of a poet, ib.>
<MWEntry: 263938: hṛṣīkeśa = lord of the senses (said of Manas), BhP.>
(CDSL::MW) set limit 2
limit - was: 50
now: 2
(CDSL::MW) hRRiSIkesha
Found 2 results in MW.
<MWEntry: 263922: hṛṣīkeśa = hṛṣī-keśa a See below under hṛṣīka.>
<MWEntry: 263934: hṛṣīkeśa = hṛṣīkeśa b m. (perhaps = hṛṣī-keśa cf. hṛṣī-vat above) id. (-tva n.), MBh.; Hariv. &c.>
(CDSL::MW) set limit -1
limit - was: 2
now: None
(CDSL::MW) set search_mode value
search_mode - was: 'key'
now: 'value'
(CDSL::MW) hRRiSIkesha
Found 1 results in MW.
<MWEntry: 263938.1: hṛṣīkeśatva = hṛṣīkeśa—tva n.>
(CDSL::MW) set search_mode both
search_mode - was: 'value'
now: 'both'
(CDSL::MW) hRRiSIkesha
Found 7 results in MW.
<MWEntry: 263922: hṛṣīkeśa = hṛṣī-keśa a See below under hṛṣīka.>
<MWEntry: 263934: hṛṣīkeśa = hṛṣīkeśa b m. (perhaps = hṛṣī-keśa cf. hṛṣī-vat above) id. (-tva n.), MBh.; Hariv. &c.>
<MWEntry: 263935: hṛṣīkeśa = N. of the tenth month, VarBṛS.>
<MWEntry: 263936: hṛṣīkeśa = of a Tīrtha, Cat.>
<MWEntry: 263937: hṛṣīkeśa = of a poet, ib.>
<MWEntry: 263938: hṛṣīkeśa = lord of the senses (said of Manas), BhP.>
<MWEntry: 263938.1: hṛṣīkeśatva = hṛṣīkeśa—tva n.>
(CDSL::MW) info
Total 1 dictionaries are active.
CDSLDict(id='MW', date='1899', name='Monier-Williams Sanskrit-English Dictionary')
(CDSL::MW) stats
Total 1 dictionaries are active.
---
CDSLDict(id='MW', date='1899', name='Monier-Williams Sanskrit-English Dictionary')
{'total': 287627, 'distinct': 194044, 'top': [('कृष्ण', 50), ('शिव', 46), ('विजय', 46), ('पुष्कर', 45), ('काल', 39), ('सिद्ध', 39), ('योग', 39), ('चित्र', 38), ('शुचि', 36), ('वसु', 36)]}
(CDSL::MW) use WIL
Downloading 'WIL.web.zip' ... (8394727 bytes)
100%|██████████████████████████████████████████████████████████████████████████████████████| 8.39M/8.39M [00:21<00:00, 386kB/s]
Successfully downloaded 'WIL.web.zip' from 'https://www.sanskrit-lexicon.uni-koeln.de/scans/WILScan/2020/downloads/wilweb1.zip'.
Using 1 dictionaries: ['WIL']
(CDSL::WIL)
(CDSL::WIL) use WIL MW
Using 2 dictionaries: ['WIL', 'MW']
(CDSL::WIL,MW) hRRiSIkesha
Found 1 results in WIL.
<WILEntry: 44411: hṛṣīkeśa = hṛṣīkeśa m. (-śaḥ) KṚṢṆA or VIṢṆU. E. hṛṣīka an organ of sense, and īśa lord.>
Found 6 results in MW.
<MWEntry: 263922: hṛṣīkeśa = hṛṣī-keśa a See below under hṛṣīka.>
<MWEntry: 263934: hṛṣīkeśa = hṛṣīkeśa b m. (perhaps = hṛṣī-keśa cf. hṛṣī-vat above) id. (-tva n.), MBh.; Hariv. &c.>
<MWEntry: 263935: hṛṣīkeśa = N. of the tenth month, VarBṛS.>
<MWEntry: 263936: hṛṣīkeśa = of a Tīrtha, Cat.>
<MWEntry: 263937: hṛṣīkeśa = of a poet, ib.>
<MWEntry: 263938: hṛṣīkeśa = lord of the senses (said of Manas), BhP.>
(CDSL::WIL,MW) use MW AP90 MWE AE
Using 4 dictionaries: ['MW', 'AP90', 'MWE', 'AE']
(CDSL::MW+3) use --all
Using 5 dictionaries: ['AP90', 'MW', 'MWE', 'AE', 'WIL']
(CDSL::AP90+3) use --none
Using 0 dictionaries: []
(CDSL::None) quit
pycdsl
pycdsl package
Submodules
pycdsl.cli module
Console script for PyCDSL
pycdsl.constants module
PyCDSL Constants
pycdsl.corpus module
CDSL Corpus Management
- class pycdsl.corpus.CDSLCorpus(data_dir: Optional[str] = None, search_mode: str = 'key', input_scheme: str = 'devanagari', output_scheme: str = 'devanagari', transliterate_keys: bool = True)[source]
Bases:
object
CDSL Corpus Class
Refers to a CDSL installation instance at the location data_dir.
- data_dir: str = None
- search_mode: str = 'key'
- input_scheme: str = 'devanagari'
- output_scheme: str = 'devanagari'
- transliterate_keys: bool = True
- setup(dict_ids: Optional[list] = None, update: bool = False, model_map: Optional[Dict[str, Tuple[pycdsl.models.Lexicon, pycdsl.models.Entry]]] = None) bool [source]
Setup CDSL dictionaries in bulk
Calls CDSLDict.setup() on every CDSLDict, and if successful, also calls CDSLDict.connect() to establish a connection to the database
- Parameters
dict_ids (list or None, optional) – List of dictionary IDs to setup. If None, the dictionaries from DEFAULT_DICTIONARIES as well as locally installed dictionaries will be setup. The default is None.
update (bool, optional) – If True, and update check is performed for every dictionary in dict_ids, and if available, the updated version is installed The default is False.
lexicon_model (object, optional) – Lexicon model argument passed to CDSLDict.connect() The default is None.
entry_model (object, optional) – Entry model argument passed to CDSLDict.connect() The default is None.
model_map (dict, optional) – Map of dictionary ID to a tuple of lexicon model and entry model. The argument is used to specify lexicon_model and entry_model arguments passed to CDSLDict.connect(). If None, the default map DEFAULT_MODEL_MAP will be used. The default is None.
- Returns
True, if the setup of all the dictionaries from dict_ids is successful. i.e. If every CDSLDict.setup() call returns True.
- Return type
bool
- Raises
ValueError – If dict_ids is not a list or None.
- search(pattern: str, dict_ids: Optional[List[str]] = None, mode: Optional[str] = None, input_scheme: Optional[str] = None, output_scheme: Optional[str] = None, ignore_case: bool = False, limit: Optional[int] = None, offset: Optional[int] = None, omit_empty: bool = True) Dict[str, List[pycdsl.models.Entry]] [source]
Search in multiple dictionaries from the corpus
- Parameters
pattern (str) – Search pattern, may contain wildcards (*).
dict_ids (list or None) – List of dictionary IDs to search in. Only the dict_ids that exist in self.dicts will be used. If None, all the dictionaries that have been setup, i.e., the dictionaries from self.dicts will be used. The default is None.
mode (str or None, optional) – Search mode to query by key, value or both. The default is None.
input_scheme (str or None, optional) – Input transliteration scheme If None, self.input_scheme will be used. The default is None.
output_scheme (str or None, optional) – Output transliteration scheme If None, self.output_scheme will be used. The default is None.
ignore_case (bool, optional) – Ignore case while performing lookup. The default is False.
limit (int or None, optional) – Limit the number of search results to limit. The default is None.
offset (int or None, optional) – Offset the search results by offset. The default is None
omit_empty (bool, optional) – If True, only the non-empty search results will be included. The default is False.
- Returns
Dictionary of (dict_id, list of matching entries)
- Return type
dict
- get_available_dicts() Dict[str, pycdsl.lexicon.CDSLDict] [source]
Fetch a list of dictionaries available for download from CDSL
Homepage of CDSL Project (SERVER_URL) is fetched and parsed to obtain this list.
- get_installed_dicts() Dict[str, pycdsl.lexicon.CDSLDict] [source]
Fetch a list of dictionaries installed locally
pycdsl.lexicon module
CDSL Lexicon Management
- class pycdsl.lexicon.CDSLDict(id: str, date: str, name: str, url: str, db: Optional[str] = None, search_mode: Optional[str] = None, input_scheme: Optional[str] = None, output_scheme: Optional[str] = None, transliterate_keys: Optional[bool] = None)[source]
Bases:
object
Dictionary from CDSL
- id: str
- date: str
- name: str
- url: str
- db: str = None
- search_mode: str = None
- input_scheme: str = None
- output_scheme: str = None
- transliterate_keys: bool = None
- download(download_dir: str) bool [source]
Download and extract dictionary data
- Parameters
download_dir (str or Path) – Full path of directory where the dictionary data should be downloaded and extracted
- Returns
True if successfully downloaded or already up-to-date
- Return type
bool
- setup(data_dir: str, symlink_dir: Optional[str] = None, update: bool = False) bool [source]
Setup the dictionary database path
- Parameters
data_dir (str or Path) – Full path of directory where the dictionary data is stored
symlink_dir (str or Path, optional) – Full path of the directory where the symbolink links to the SQLite database of dictionary will be created If None, symbolic links aren’t created. The default is None.
update (bool, optional) – If True, an attempt to update dictionary data will be made. The default is False.
- Returns
True if the setup was successful
- Return type
bool
- set_scheme(input_scheme: Optional[str] = None, output_scheme: Optional[str] = None, transliterate_keys: Optional[bool] = None)[source]
Set transliteration scheme for the dictionary instance
- Parameters
input_scheme (str, optional) – Input transliteration scheme. If None, INTERNAL_SCHEME is used. The default is None.
output_scheme (str, optional) – Output transliteration scheme. If None, INTERNAL_SCHEME is used. The default is None.
transliterate_keys (bool, optional) – Determines whether the keys in lexicon should be transliterated to scheme or not. If None, the value will be inferred based on dictionary type. The default is None.
- set_search_mode(mode: str)[source]
Set search mode
- Parameters
mode (str) – Valid values are ‘key’, ‘value’, ‘both’ Recommended to use the convenience variables SEARCH_MODE_KEY, SEARCH_MODE_VALUE or SEARCH_MODE_BOTH.
- connect(lexicon_model: Optional[pycdsl.models.Lexicon] = None, entry_model: Optional[pycdsl.models.Entry] = None)[source]
Connect to the SQLite database
If both lexicon_model and entry_model are specified, they are used as the ORM layer, and take preference over model_map.
If any of lexicon_model or entry_model is None, then the models are resolved in the following way.
First, if the current dictionary ID is present in model_map the models specified by the model_map are used. Otherwise, models.lexicon_constructor and models.entry_constructor functions are used, which subclass the models.Lexicon and models.Entry models.
- Parameters
lexicon_model (object, optional) – Lexicon model. The default is None.
entry_model (object, optional) – Entry model. The default is None.
- stats(top: int = 10, output_scheme: str = None) Dict [source]
Display statistics about the lexicon
- Parameters
top (int, optional) – Display top top entries having most different meanings. The default is 10.
output_scheme (str, optional) – Output transliteration scheme If None, self.output_scheme will be used. The default is None.
- Returns
Statistics about the dictionary
- Return type
dict
- search(pattern: str, mode: str = None, input_scheme: str = None, output_scheme: str = None, ignore_case: str = False, limit: int = None, offset: int = None) List[pycdsl.models.Entry] [source]
Search in the dictionary
- Parameters
pattern (str) – Search pattern, may contain wildcards (*).
mode (str or None, optional) – Search mode to query by key, value or both. If None, self.search_mode will be used. The default is None.
input_scheme (str or None, optional) – Input transliteration scheme If None, self.input_scheme will be used. The default is None.
output_scheme (str or None, optional) – Output transliteration scheme If None, self.output_scheme will be used. The default is None.
ignore_case (bool, optional) – Ignore case while performing lookup. The default is False.
limit (int or None, optional) – Limit the number of search results to limit. The default is None.
offset (int or None, optional) – Offset the search results by offset. The default is None
- Returns
List of matching entries
- Return type
list
- entry(entry_id: str, output_scheme: Optional[str] = None) pycdsl.models.Entry [source]
Get an entry by ID
- Parameters
entry_id (str) – Entry ID to lookup
output_scheme (str or None, optional) – Output transliteration scheme If None, self.output_scheme will be used. The default is None.
- Returns
If the entry_id is valid, Entry with the matching ID otherwise, None.
- Return type
object
- dump(output_path: Optional[str] = None, output_scheme: Optional[str] = None) List[Dict[str, str]] [source]
Dump data as JSON
- Parameters
output_path (str or Path, optional) – Path to the output JSON file. If None, the data isn’t written to the disk, only returned. The default is None.
output_scheme (str or None, optional) – Output transliteration scheme If None, self.output_scheme will be used. The default is None
- Returns
List of all the entries in the dictionary. Every entry is a dict. If output_path is provided, the same list is written as JSON.
- Return type
list
pycdsl.models module
Models for Lexicon Access
- class pycdsl.models.Lexicon(*args, **kwargs)[source]
Bases:
peewee.Model
Lexicon Model
- id = <DecimalField: Lexicon.id>
- key = <CharField: Lexicon.key>
- data = <TextField: Lexicon.data>
- DoesNotExist
alias of
pycdsl.models.LexiconDoesNotExist
- class pycdsl.models.Entry(lexicon_entry: pycdsl.models.Lexicon, lexicon_id: Optional[str] = None, scheme: Optional[str] = None, transliterate_keys: bool = True)[source]
Bases:
object
Lexicon Entry
Wraps instances of Lexicon model which respresent query results
Lexicon Entry
- Parameters
lexicon_entry (Lexicon) – Instance of Lexicon model
lexicon_id (str, optional) – ID of the Lexicon to which the entry belongs
scheme (str, optional) – Output transliteration scheme. If valid, parts of the data in lexicon which are enclosed in <s> tags will be transliterated to scheme. If invalid or None, no transliteration will take place. The default is None.
transliterate_keys (bool, optional) – If True, the keys in lexicon will be transliterated to scheme. The default is True.
- __init__(lexicon_entry: pycdsl.models.Lexicon, lexicon_id: Optional[str] = None, scheme: Optional[str] = None, transliterate_keys: bool = True)[source]
Lexicon Entry
- Parameters
lexicon_entry (Lexicon) – Instance of Lexicon model
lexicon_id (str, optional) – ID of the Lexicon to which the entry belongs
scheme (str, optional) – Output transliteration scheme. If valid, parts of the data in lexicon which are enclosed in <s> tags will be transliterated to scheme. If invalid or None, no transliteration will take place. The default is None.
transliterate_keys (bool, optional) – If True, the keys in lexicon will be transliterated to scheme. The default is True.
- transliterate(scheme: str = 'devanagari', transliterate_keys: bool = True)[source]
Transliterate Data
Part of the data in lexicon that is enclosed in <s> tags will be transliterated to scheme.
- Parameters
scheme (str, optional) – Output transliteration scheme. If invalid or None, no transliteration will take place. The default is DEFAULT_SCHEME.
transliterate_keys (bool, optional) – If True, the keys in lexicon will be transliterated to scheme. The default is True.
- Returns
Returns a new transliterated instance
- Return type
object
- pycdsl.models.lexicon_constructor(dict_id: str, table_name: Optional[str] = None) pycdsl.models.Lexicon [source]
Construct a Lexicon Model
- Parameters
dict_id (str) – Dictionary ID
table_name (str, optional) – Name of the table in SQLite database. If None, it will be inferred as dict_id.lower() The default is None.
- Returns
Constructed class (a subclass of Lexicon) for a dictionary
- Return type
object
- pycdsl.models.entry_constructor(dict_id: str) pycdsl.models.Entry [source]
Construct an Entry Model
- Parameters
dict_id (str) – Dictionary ID
- Returns
Constructed class (a subclass of Entry) for a dictionary entry
- Return type
object
- class pycdsl.models.AP90Lexicon(*args, **kwargs)[source]
Bases:
pycdsl.models.Lexicon
- DoesNotExist
alias of
pycdsl.models.AP90LexiconDoesNotExist
- data = <TextField: AP90Lexicon.data>
- id = <DecimalField: AP90Lexicon.id>
- key = <CharField: AP90Lexicon.key>
- class pycdsl.models.AP90Entry(lexicon_entry: pycdsl.models.Lexicon, lexicon_id: Optional[str] = None, scheme: Optional[str] = None, transliterate_keys: bool = True)[source]
Bases:
pycdsl.models.Entry
Lexicon Entry
- Parameters
lexicon_entry (Lexicon) – Instance of Lexicon model
lexicon_id (str, optional) – ID of the Lexicon to which the entry belongs
scheme (str, optional) – Output transliteration scheme. If valid, parts of the data in lexicon which are enclosed in <s> tags will be transliterated to scheme. If invalid or None, no transliteration will take place. The default is None.
transliterate_keys (bool, optional) – If True, the keys in lexicon will be transliterated to scheme. The default is True.
- class pycdsl.models.MWLexicon(*args, **kwargs)[source]
Bases:
pycdsl.models.Lexicon
- DoesNotExist
alias of
pycdsl.models.MWLexiconDoesNotExist
- data = <TextField: MWLexicon.data>
- id = <DecimalField: MWLexicon.id>
- key = <CharField: MWLexicon.key>
- class pycdsl.models.MWEntry(lexicon_entry: pycdsl.models.Lexicon, lexicon_id: Optional[str] = None, scheme: Optional[str] = None, transliterate_keys: bool = True)[source]
Bases:
pycdsl.models.Entry
Lexicon Entry
- Parameters
lexicon_entry (Lexicon) – Instance of Lexicon model
lexicon_id (str, optional) – ID of the Lexicon to which the entry belongs
scheme (str, optional) – Output transliteration scheme. If valid, parts of the data in lexicon which are enclosed in <s> tags will be transliterated to scheme. If invalid or None, no transliteration will take place. The default is None.
transliterate_keys (bool, optional) – If True, the keys in lexicon will be transliterated to scheme. The default is True.
pycdsl.shell module
REPL Shell for PyCDSL
- class pycdsl.shell.BasicShell(completekey: str = 'tab', stdin: Optional[TextIO] = None, stdout: Optional[TextIO] = None, *, persistent_history_file: str = '', persistent_history_length: int = 1000, startup_script: str = '', silence_startup_script: bool = False, include_py: bool = False, include_ipy: bool = False, allow_cli_args: bool = True, transcript_files: Optional[List[str]] = None, allow_redirection: bool = True, multiline_commands: Optional[List[str]] = None, terminators: Optional[List[str]] = None, shortcuts: Optional[Dict[str, str]] = None, command_sets: Optional[Iterable[cmd2.command_definition.CommandSet]] = None, auto_load_commands: bool = True)[source]
Bases:
cmd2.cmd2.Cmd
An easy but powerful framework for writing line-oriented command interpreters. Extends Python’s cmd package.
- Parameters
completekey – readline name of a completion key, default to Tab
stdin – alternate input file object, if not specified, sys.stdin is used
stdout – alternate output file object, if not specified, sys.stdout is used
persistent_history_file – file path to load a persistent cmd2 command history from
persistent_history_length – max number of history items to write to the persistent history file
startup_script – file path to a script to execute at startup
silence_startup_script – if
True
, then the startup script’s output will be suppressed. Anything written to stderr will still display.include_py – should the “py” command be included for an embedded Python shell
include_ipy – should the “ipy” command be included for an embedded IPython shell
allow_cli_args – if
True
, thencmd2.Cmd.__init__()
will process command line arguments as either commands to be run or, if-t
or--test
are given, transcript files to run. This should be set toFalse
if your application parses its own command line arguments.transcript_files – pass a list of transcript files to be run on initialization. This allows running transcript tests when
allow_cli_args
isFalse
. Ifallow_cli_args
isTrue
this parameter is ignored.allow_redirection – If
False
, prevent output redirection and piping to shell commands. This parameter prevents redirection and piping, but does not alter parsing behavior. A user can still type redirection and piping tokens, and they will be parsed as such but they won’t do anything.multiline_commands – list of commands allowed to accept multi-line input
terminators – list of characters that terminate a command. These are mainly intended for terminating multiline commands, but will also terminate single-line commands. If not supplied, the default is a semicolon. If your app only contains single-line commands and you want terminators to be treated as literals by the parser, then set this to an empty list.
shortcuts – dictionary containing shortcuts for commands. If not supplied, then defaults to constants.DEFAULT_SHORTCUTS. If you do not want any shortcuts, pass an empty dictionary.
command_sets – Provide CommandSet instances to load during cmd2 initialization. This allows CommandSets with custom constructor parameters to be loaded. This also allows the a set of CommandSets to be provided when auto_load_commands is set to False
auto_load_commands – If True, cmd2 will check for all subclasses of CommandSet that are currently loaded by Python and automatically instantiate and register all commands. If False, CommandSets must be manually installed with register_command_set.
- class pycdsl.shell.CDSLShell(data_dir: Optional[str] = None, dict_ids: Optional[List[str]] = None, search_mode: Optional[str] = None, input_scheme: Optional[str] = None, output_scheme: Optional[str] = None, history_file: Optional[str] = None, startup_script: Optional[str] = None)[source]
Bases:
pycdsl.shell.BasicShell
REPL Interface to CDSL
REPL Interface to CDSL
Create an instance of CDSLCorpus as per the providd parameters. CDSLCorpus.setup() is called after the command-loop starts.
- Parameters
data_dir (str or None, optional) – Load a CDSL installation instance at the location data_dir. Passed to CDSLCorpus instance as a keyword argument data_dir.
dict_ids (list or None, optional) – List of dictionary IDs to setup. Passed to a CDSLCorpus.setup() as a keyword argument dict_ids.
search_mode (str or None, optional) – Search mode to query by key, value or both. The default is None.
input_scheme (str or None, optional) – Transliteration scheme for input. If None, DEFAULT_SCHEME is used. The default is None.
output_scheme (str or None, optional) – Transliteration scheme for output. If None, DEFAULT_SCHEME is used. The default is None.
history_file (str or None, optional) – Path to the history file to keep a persistant history. If None, the history does not persist across sessions. The default is None.
startup_script (str or None, optional) – Path to the startup script with a list of startup commands to be executed after initialization. If None, no startup commands are run. The default is None.
- intro = 'Cologne Sanskrit Digital Lexicon (CDSL)\n---------------------------------------'
- desc = 'Install or load dictionaries by typing `use [DICT_IDS..]` e.g. `use MW`.\nType any keyword to search in the selected dictionaries. (help or ? for list of options)'
- prompt = '(CDSL::None) '
- schemes = ['devanagari', 'iast', 'itrans', 'velthuis', 'hk', 'slp1', 'wx']
- search_modes = ['key', 'value', 'both']
- __init__(data_dir: Optional[str] = None, dict_ids: Optional[List[str]] = None, search_mode: Optional[str] = None, input_scheme: Optional[str] = None, output_scheme: Optional[str] = None, history_file: Optional[str] = None, startup_script: Optional[str] = None)[source]
REPL Interface to CDSL
Create an instance of CDSLCorpus as per the providd parameters. CDSLCorpus.setup() is called after the command-loop starts.
- Parameters
data_dir (str or None, optional) – Load a CDSL installation instance at the location data_dir. Passed to CDSLCorpus instance as a keyword argument data_dir.
dict_ids (list or None, optional) – List of dictionary IDs to setup. Passed to a CDSLCorpus.setup() as a keyword argument dict_ids.
search_mode (str or None, optional) – Search mode to query by key, value or both. The default is None.
input_scheme (str or None, optional) – Transliteration scheme for input. If None, DEFAULT_SCHEME is used. The default is None.
output_scheme (str or None, optional) – Transliteration scheme for output. If None, DEFAULT_SCHEME is used. The default is None.
history_file (str or None, optional) – Path to the history file to keep a persistant history. If None, the history does not persist across sessions. The default is None.
startup_script (str or None, optional) – Path to the startup script with a list of startup commands to be executed after initialization. If None, no startup commands are run. The default is None.
- use_parser = Cmd2ArgumentParser(prog='use', usage=None, description='\n Load the specified dictionaries from CDSL.\n If not available locally, they will be installed first.\n ', formatter_class=<class 'cmd2.argparse_custom.Cmd2HelpFormatter'>, conflict_handler='error', add_help=True)
- do_use(namespace: argparse.Namespace)[source]
Load the specified dictionaries from CDSL. If not available locally, they will be installed first.
- show_parser = Cmd2ArgumentParser(prog='show', usage=None, description='Show a specific entry by ID', formatter_class=<class 'cmd2.argparse_custom.Cmd2HelpFormatter'>, conflict_handler='error', add_help=True)
- search_parser = Cmd2ArgumentParser(prog='search', usage=None, description='\n Search in the active dictionaries\n\n Note\n ----\n * Searching in the active dictionaries is also the default action.\n * In general, we do not need to use this command explicitly unless we\n want to search the command keywords, such as, `available` `search`,\n `version`, `help` etc. in the active dictionaries.\n ', formatter_class=<class 'cmd2.argparse_custom.Cmd2HelpFormatter'>, conflict_handler='error', add_help=True)
- do_search(namespace: argparse.Namespace)[source]
Search in the active dictionaries
Note
Searching in the active dictionaries is also the default action.
In general, we do not need to use this command explicitly unless we want to search the command keywords, such as, available search, version, help etc. in the active dictionaries.
- default(statement: cmd2.parsing.Statement)[source]
Executed when the command given isn’t a recognized command implemented by a do_* method.
- Parameters
statement – Statement object with parsed input
- cmdloop(intro: Optional[cmd2.parsing.Statement] = None)[source]
This is an outer wrapper around _cmdloop() which deals with extra features provided by cmd2.
_cmdloop() provides the main loop equivalent to cmd.cmdloop(). This is a wrapper around that which deals with the following extra features provided by cmd2: - transcript testing - intro banner - exit code
- Parameters
intro – if provided this overrides self.intro and serves as the intro banner printed once at start
pycdsl.utils module
Utility Functions
- pycdsl.utils.validate_search_mode(mode: str) str [source]
Validate the search mode
- Parameters
mode (str) – Search mode
- Returns
If mode is valid, mode.lower() otherwise, None.
- Return type
str or None
- pycdsl.utils.validate_scheme(scheme: str) str [source]
Validate the name of transliteration scheme
- Parameters
scheme (str) – Name of the transltieration scheme
- Returns
If scheme is valid, scheme.lower() otherwise, None.
- Return type
str or None
- pycdsl.utils.transliterate_between(text: str, from_scheme: str, to_scheme: str, start_pattern: str, end_pattern: str) str [source]
Transliterate the text appearing between two patterns
Only the text appearing between patterns start_pattern and end_pattern it transliterated. start_pattern and end_pattern can appear multiple times in the full text, and for every occurrence, the text between them is transliterated.
from_scheme and to_scheme should be compatible with scheme names from indic-transliteration
- Parameters
text (str) – Full text
from_scheme (str) – Input transliteration scheme
to_scheme (str) – Output transliteration scheme
start_pattern (regexp) – Pattern describing the start tag
end_pattern (regexp) – Pattern describing the end tag
Module contents
PyCDSL
Python Interface to Cologne Digital Sanskrit Lexicon (CDSL).
Contributing
Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.
You can contribute in many ways:
Types of Contributions
Report Bugs
Report bugs at https://github.com/hrishikeshrt/PyCDSL/issues.
If you are reporting a bug, please include:
Your operating system name and version.
Any details about your local setup that might be helpful in troubleshooting.
Detailed steps to reproduce the bug.
Fix Bugs
Look through the GitHub issues for bugs. Anything tagged with “bug” and “help wanted” is open to whoever wants to implement it.
Implement Features
Look through the GitHub issues for features. Anything tagged with “enhancement” and “help wanted” is open to whoever wants to implement it.
Write Documentation
Python Interface to Cologne Digital Sanskrit Lexicon (CDSL) could always use more documentation, whether as part of the official Python Interface to Cologne Digital Sanskrit Lexicon (CDSL) docs, in docstrings, or even on the web in blog posts, articles, and such.
Submit Feedback
The best way to send feedback is to file an issue at https://github.com/hrishikeshrt/PyCDSL/issues.
If you are proposing a feature:
Explain in detail how it would work.
Keep the scope as narrow as possible, to make it easier to implement.
Remember that this is a volunteer-driven project, and that contributions are welcome :)
Get Started!
Ready to contribute? Here’s how to set up pycdsl for local development.
Fork the pycdsl repo on GitHub.
Clone your fork locally:
$ git clone git@github.com:your_name_here/pycdsl.git
Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:
$ mkvirtualenv pycdsl $ cd pycdsl/ $ python setup.py develop
Create a branch for local development:
$ git checkout -b name-of-your-bugfix-or-feature
Now you can make your changes locally.
When you’re done making changes, check that your changes pass flake8 and the tests, including testing other Python versions with tox:
$ flake8 pycdsl tests $ python setup.py test or pytest $ tox
To get flake8 and tox, just pip install them into your virtualenv.
Commit your changes and push your branch to GitHub:
$ git add . $ git commit -m "Your detailed description of your changes." $ git push origin name-of-your-bugfix-or-feature
Submit a pull request through the GitHub website.
Pull Request Guidelines
Before you submit a pull request, check that it meets these guidelines:
The pull request should include tests.
If the pull request adds functionality, the docs should be updated. Put your new functionality into a function with a docstring, and add the feature to the list in README.rst.
The pull request should work for Python 3.6, 3.7, and 3.8 and for PyPy. Check https://travis-ci.com/hrishikeshrt/PyCDSL/pull_requests and make sure that the tests pass for all supported Python versions.
Tips
To run a subset of tests:
$ pytest tests.test_pycdsl
Deploying
A reminder for the maintainers on how to deploy. Make sure all your changes are committed (including an entry in HISTORY.rst). Then run:
$ bump2version patch # possible: major / minor / patch
$ git push
$ git push --tags
Travis will then deploy to PyPI if tests pass.
Credits
Development Lead
Hrishikesh Terdalkar <hrishikeshrt@linuxmail.org>
Contributors
None yet. Why not be the first?
History
0.9.0 (2022-04-27)
0.8.0 (2022-04-17)
Add search mode support to all interfaces (Issue #24)
Uniformly follow lower case convention for CLI help mesages
Update documentation
Fix bugs
0.7.0 (2022-03-17)
Add the explicit REPL command
search
Add a REPL command
stats
Interpret arguments
all
andnone
to the REPL commanduse
Add
lexicon_id
toEntry
classAdd a placeholder for post-init hook in
Entry
. If implemented, this will be run after__init__()
ofEntry
Remove
model_map
fromCDSLDict
and add toCDSLCorpus
Add tests for lexicon initalization, download, setup, transliteration, iteration, getitem, stats, entry, dump
Add credits to CDSL website
Update documentation
Fix bugs
0.6.0 (2022-02-14)
Add
__getitem__
method toCDSLCorpus
to access loaded dictionaries using [] operator withdict_id
Add
__getitem__
method toCDSLDict
to access dictionary entries using [] operator withentry_id
Add unit tests and integration tests for
pycdsl.utils
Add unit tests and integration tests for
pycdsl.corpus
Update documentation
Fix bugs
0.5.0 (2022-02-13)
Add
model_map
argument toCDSLDict.connect
for better customizationMake
CDSLCorpus
iterable (iterate over loaded dictionaries)Make
CDSLDict
iterable (iterate over dictionary entries)Update documentation
0.4.0 (2022-02-12)
Add ability to limit and offset the number of search results
Add
.to_dict()
method toEntry
classAdd multi-dictionary
.search()
fromCDSLCorpus
Add support for multiple active dictionaries in REPL
Improve code structure (more modular)
Improve documentation formatting
Update documentation
Fix bugs
0.3.0 (2022-02-07)
Functional CLI (console command) for dictionary search
Integration of existing REPL into the CLI. (
--interactive
)Extend transliteration support on Corpus, Dictionary, Search and Entry level
Make the package Python 3.6 compatibile
0.2.0 (2022-02-05)
Improve dictionary setup
Add a function to dump data
Add logging support
Add transliteration support using
indic-transliteration
0.1.0 (2022-01-28)
First release on PyPI.