pycdsl package

Submodules

pycdsl.cli module

Console script for PyCDSL

pycdsl.cli.main()[source]

Command Line Interface for PyCDSL

pycdsl.constants module

PyCDSL Constants

pycdsl.corpus module

CDSL Corpus Management

class pycdsl.corpus.CDSLCorpus(data_dir: Optional[str] = None, search_mode: str = 'key', input_scheme: str = 'devanagari', output_scheme: str = 'devanagari', transliterate_keys: bool = True)[source]

Bases: object

CDSL Corpus Class

Refers to a CDSL installation instance at the location data_dir.

data_dir: str = None
search_mode: str = 'key'
input_scheme: str = 'devanagari'
output_scheme: str = 'devanagari'
transliterate_keys: bool = True
setup(dict_ids: Optional[list] = None, update: bool = False, model_map: Optional[Dict[str, Tuple[pycdsl.models.Lexicon, pycdsl.models.Entry]]] = None) bool[source]

Setup CDSL dictionaries in bulk

Calls CDSLDict.setup() on every CDSLDict and, if successful, also calls CDSLDict.connect() to establish a connection to the database.

Parameters
  • dict_ids (list or None, optional) – List of dictionary IDs to set up. If None, the dictionaries from DEFAULT_DICTIONARIES as well as locally installed dictionaries will be set up. The default is None.

  • update (bool, optional) – If True, an update check is performed for every dictionary in dict_ids and, if an update is available, the updated version is installed. The default is False.

  • lexicon_model (object, optional) – Lexicon model argument passed to CDSLDict.connect() The default is None.

  • entry_model (object, optional) – Entry model argument passed to CDSLDict.connect() The default is None.

  • model_map (dict, optional) – Map of dictionary ID to a tuple of lexicon model and entry model. The argument is used to specify lexicon_model and entry_model arguments passed to CDSLDict.connect(). If None, the default map DEFAULT_MODEL_MAP will be used. The default is None.

Returns

True if the setup of all the dictionaries from dict_ids is successful, i.e., if every CDSLDict.setup() call returns True.

Return type

bool

Raises

ValueError – If dict_ids is not a list or None.
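A minimal usage sketch of the bulk setup described above. The helper name setup_or_fail is hypothetical and not part of PyCDSL; it relies only on the documented setup() signature and its boolean return value, so it accepts any object that exposes that method. The example() function assumes pycdsl is installed and the CDSL server is reachable, and is therefore not called automatically.

```python
def setup_or_fail(corpus, dict_ids=None, update=False):
    """Run corpus.setup() and raise if any dictionary failed to set up.

    `corpus` is expected to behave like pycdsl.corpus.CDSLCorpus.
    This helper is illustrative and not part of PyCDSL.
    """
    # Mirror the documented contract: dict_ids must be a list or None.
    if dict_ids is not None and not isinstance(dict_ids, list):
        raise ValueError("dict_ids must be a list or None")
    # setup() is documented to return True only if every dictionary succeeded.
    if not corpus.setup(dict_ids=dict_ids, update=update):
        raise RuntimeError(f"setup failed for dict_ids={dict_ids!r}")
    return corpus


def example():
    # Not executed on import: needs pycdsl and, on first use, network access.
    from pycdsl.corpus import CDSLCorpus

    return setup_or_fail(CDSLCorpus(), dict_ids=["MW", "AP90"])
```

Turning a False return value into an exception is a design choice of this sketch; callers that prefer to continue with a partially set up corpus can check the boolean directly.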

search(pattern: str, dict_ids: Optional[List[str]] = None, mode: Optional[str] = None, input_scheme: Optional[str] = None, output_scheme: Optional[str] = None, ignore_case: bool = False, limit: Optional[int] = None, offset: Optional[int] = None, omit_empty: bool = True) Dict[str, List[pycdsl.models.Entry]][source]

Search in multiple dictionaries from the corpus

Parameters
  • pattern (str) – Search pattern, may contain wildcards (*).

  • dict_ids (list or None) – List of dictionary IDs to search in. Only the dict_ids that exist in self.dicts will be used. If None, all the dictionaries that have been set up, i.e., the dictionaries from self.dicts, will be used. The default is None.

  • mode (str or None, optional) – Search mode to query by key, value or both. The default is None.

  • input_scheme (str or None, optional) – Input transliteration scheme. If None, self.input_scheme will be used. The default is None.

  • output_scheme (str or None, optional) – Output transliteration scheme. If None, self.output_scheme will be used. The default is None.

  • ignore_case (bool, optional) – Ignore case while performing lookup. The default is False.

  • limit (int or None, optional) – Limit the number of search results to limit. The default is None.

  • offset (int or None, optional) – Offset the search results by offset. The default is None.

  • omit_empty (bool, optional) – If True, only the non-empty search results will be included. The default is True.

Returns

Dictionary of (dict_id, list of matching entries)

Return type

dict
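The documented return value maps each dictionary ID to a list of Entry objects. The sketch below flattens such a mapping into printable rows using Entry.meaning(). The helper summarize_results and its max_chars parameter are hypothetical. The example() function assumes pycdsl is installed with the MW dictionary available, and is not called automatically.

```python
def summarize_results(results, max_chars=60):
    """Flatten {dict_id: [entry, ...]} into (dict_id, snippet) rows.

    Each entry only needs a meaning() method, as documented for
    pycdsl.models.Entry. This helper is illustrative, not part of PyCDSL.
    """
    rows = []
    for dict_id, entries in results.items():
        for entry in entries:
            # Truncate long meanings so that each row stays on one line.
            rows.append((dict_id, entry.meaning()[:max_chars]))
    return rows


def example():
    # Not executed on import: needs pycdsl and an installed dictionary.
    from pycdsl.corpus import CDSLCorpus

    corpus = CDSLCorpus()
    corpus.setup(dict_ids=["MW"])
    # The default input scheme is 'devanagari'; '*' is the documented wildcard.
    results = corpus.search("राम*", limit=5)
    return summarize_results(results)
```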

get_available_dicts() Dict[str, pycdsl.lexicon.CDSLDict][source]

Fetch a list of dictionaries available for download from CDSL

Homepage of CDSL Project (SERVER_URL) is fetched and parsed to obtain this list.

get_installed_dicts() Dict[str, pycdsl.lexicon.CDSLDict][source]

Fetch a list of dictionaries installed locally

pycdsl.lexicon module

CDSL Lexicon Management

class pycdsl.lexicon.CDSLDict(id: str, date: str, name: str, url: str, db: Optional[str] = None, search_mode: Optional[str] = None, input_scheme: Optional[str] = None, output_scheme: Optional[str] = None, transliterate_keys: Optional[bool] = None)[source]

Bases: object

Dictionary from CDSL

id: str
date: str
name: str
url: str
db: str = None
search_mode: str = None
input_scheme: str = None
output_scheme: str = None
transliterate_keys: bool = None
download(download_dir: str) bool[source]

Download and extract dictionary data

Parameters

download_dir (str or Path) – Full path of directory where the dictionary data should be downloaded and extracted

Returns

True if successfully downloaded or already up-to-date

Return type

bool

setup(data_dir: str, symlink_dir: Optional[str] = None, update: bool = False) bool[source]

Setup the dictionary database path

Parameters
  • data_dir (str or Path) – Full path of directory where the dictionary data is stored

  • symlink_dir (str or Path, optional) – Full path of the directory where the symbolic links to the SQLite database of the dictionary will be created. If None, symbolic links aren’t created. The default is None.

  • update (bool, optional) – If True, an attempt to update dictionary data will be made. The default is False.

Returns

True if the setup was successful

Return type

bool

set_scheme(input_scheme: Optional[str] = None, output_scheme: Optional[str] = None, transliterate_keys: Optional[bool] = None)[source]

Set transliteration scheme for the dictionary instance

Parameters
  • input_scheme (str, optional) – Input transliteration scheme. If None, INTERNAL_SCHEME is used. The default is None.

  • output_scheme (str, optional) – Output transliteration scheme. If None, INTERNAL_SCHEME is used. The default is None.

  • transliterate_keys (bool, optional) – Determines whether the keys in lexicon should be transliterated to scheme or not. If None, the value will be inferred based on dictionary type. The default is None.

set_search_mode(mode: str)[source]

Set search mode

Parameters

mode (str) – Valid values are ‘key’, ‘value’, ‘both’. It is recommended to use the convenience variables SEARCH_MODE_KEY, SEARCH_MODE_VALUE or SEARCH_MODE_BOTH.

connect(lexicon_model: Optional[pycdsl.models.Lexicon] = None, entry_model: Optional[pycdsl.models.Entry] = None)[source]

Connect to the SQLite database

If both lexicon_model and entry_model are specified, they are used as the ORM layer and take precedence over model_map.

If either lexicon_model or entry_model is None, the models are resolved as follows.

First, if the current dictionary ID is present in model_map, the models specified by model_map are used. Otherwise, the models.lexicon_constructor and models.entry_constructor functions are used, which subclass the models.Lexicon and models.Entry models.

Parameters
  • lexicon_model (object, optional) – Lexicon model. The default is None.

  • entry_model (object, optional) – Entry model. The default is None.
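The resolution order described for connect() can be sketched as a small pure function. This is an illustration of the documented rules, not the library's code: the name resolve_models is hypothetical, and model_map and the two constructor callables are passed in explicitly because this section does not say where connect() obtains them.

```python
def resolve_models(dict_id, lexicon_model, entry_model, model_map,
                   lexicon_constructor, entry_constructor):
    """Return (lexicon_model, entry_model) following the documented order."""
    # 1. Explicit models are used only when BOTH are given.
    if lexicon_model is not None and entry_model is not None:
        return lexicon_model, entry_model
    # 2. Otherwise, fall back to the per-dictionary entry in model_map.
    if model_map and dict_id in model_map:
        return tuple(model_map[dict_id])
    # 3. Otherwise, build the models with the constructor functions.
    return lexicon_constructor(dict_id), entry_constructor(dict_id)
```

Note that, under these rules, passing only one of the two models has the same effect as passing neither.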

stats(top: int = 10, output_scheme: str = None) Dict[source]

Display statistics about the lexicon

Parameters
  • top (int, optional) – Number of entries with the most distinct meanings to display. The default is 10.

  • output_scheme (str, optional) – Output transliteration scheme. If None, self.output_scheme will be used. The default is None.

Returns

Statistics about the dictionary

Return type

dict

search(pattern: str, mode: str = None, input_scheme: str = None, output_scheme: str = None, ignore_case: str = False, limit: int = None, offset: int = None) List[pycdsl.models.Entry][source]

Search in the dictionary

Parameters
  • pattern (str) – Search pattern, may contain wildcards (*).

  • mode (str or None, optional) – Search mode to query by key, value or both. If None, self.search_mode will be used. The default is None.

  • input_scheme (str or None, optional) – Input transliteration scheme. If None, self.input_scheme will be used. The default is None.

  • output_scheme (str or None, optional) – Output transliteration scheme. If None, self.output_scheme will be used. The default is None.

  • ignore_case (bool, optional) – Ignore case while performing lookup. The default is False.

  • limit (int or None, optional) – Limit the number of search results to limit. The default is None.

  • offset (int or None, optional) – Offset the search results by offset. The default is None.

Returns

List of matching entries

Return type

list
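The limit and offset parameters make it possible to page through large result sets. The generator below is a sketch under two assumptions that this section does not confirm: that repeated searches return results in a stable order, and that a page shorter than the requested size marks the end of the results. The name iter_search and the page_size parameter are hypothetical.

```python
def iter_search(lexicon, pattern, page_size=100, **search_kwargs):
    """Yield search results page by page using limit and offset.

    `lexicon` is any object with a search() method shaped like
    CDSLDict.search(). This helper is illustrative, not part of PyCDSL.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = lexicon.search(
            pattern, limit=page_size, offset=offset, **search_kwargs
        )
        if not page:
            break
        yield from page
        # A short page means there is nothing left to fetch.
        if len(page) < page_size:
            break
        offset += page_size
```

Additional keyword arguments such as mode or output_scheme are forwarded unchanged to search().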

entry(entry_id: str, output_scheme: Optional[str] = None) pycdsl.models.Entry[source]

Get an entry by ID

Parameters
  • entry_id (str) – Entry ID to lookup

  • output_scheme (str or None, optional) – Output transliteration scheme. If None, self.output_scheme will be used. The default is None.

Returns

Entry with the matching ID if entry_id is valid; otherwise, None.

Return type

object

dump(output_path: Optional[str] = None, output_scheme: Optional[str] = None) List[Dict[str, str]][source]

Dump data as JSON

Parameters
  • output_path (str or Path, optional) – Path to the output JSON file. If None, the data isn’t written to the disk, only returned. The default is None.

  • output_scheme (str or None, optional) – Output transliteration scheme. If None, self.output_scheme will be used. The default is None.

Returns

List of all the entries in the dictionary. Every entry is a dict. If output_path is provided, the same list is written as JSON.

Return type

list
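Since dump() is documented to write the same list that it returns, a dumped file can be read back with the standard json module. The sketch below assumes the file is UTF-8 encoded, which this section does not state. The helper names load_dump and dump_and_reload are hypothetical, and no particular keys are assumed inside the entry dicts.

```python
import json
from pathlib import Path


def load_dump(path):
    """Read a JSON file that holds a list of entry dicts."""
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of entries")
    return data


def dump_and_reload(lexicon, path):
    """Dump a dictionary to `path`, then read the file back.

    `lexicon` is any object with a dump() method shaped like
    CDSLDict.dump(). Returns (returned_entries, reloaded_entries).
    """
    entries = lexicon.dump(output_path=str(path))
    return entries, load_dump(path)
```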

pycdsl.models module

Models for Lexicon Access

class pycdsl.models.Lexicon(*args, **kwargs)[source]

Bases: peewee.Model

Lexicon Model

id = <DecimalField: Lexicon.id>
key = <CharField: Lexicon.key>
data = <TextField: Lexicon.data>
DoesNotExist

alias of pycdsl.models.LexiconDoesNotExist

class pycdsl.models.Entry(lexicon_entry: pycdsl.models.Lexicon, lexicon_id: Optional[str] = None, scheme: Optional[str] = None, transliterate_keys: bool = True)[source]

Bases: object

Lexicon Entry

Wraps instances of the Lexicon model which represent query results.

Parameters
  • lexicon_entry (Lexicon) – Instance of Lexicon model

  • lexicon_id (str, optional) – ID of the Lexicon to which the entry belongs

  • scheme (str, optional) – Output transliteration scheme. If valid, parts of the data in lexicon which are enclosed in <s> tags will be transliterated to scheme. If invalid or None, no transliteration will take place. The default is None.

  • transliterate_keys (bool, optional) – If True, the keys in lexicon will be transliterated to scheme. The default is True.

__init__(lexicon_entry: pycdsl.models.Lexicon, lexicon_id: Optional[str] = None, scheme: Optional[str] = None, transliterate_keys: bool = True)[source]

Lexicon Entry

Parameters
  • lexicon_entry (Lexicon) – Instance of Lexicon model

  • lexicon_id (str, optional) – ID of the Lexicon to which the entry belongs

  • scheme (str, optional) – Output transliteration scheme. If valid, parts of the data in lexicon which are enclosed in <s> tags will be transliterated to scheme. If invalid or None, no transliteration will take place. The default is None.

  • transliterate_keys (bool, optional) – If True, the keys in lexicon will be transliterated to scheme. The default is True.

__post_init__()[source]

Placeholder to implement a custom post-init hook

transliterate(scheme: str = 'devanagari', transliterate_keys: bool = True)[source]

Transliterate Data

Part of the data in lexicon that is enclosed in <s> tags will be transliterated to scheme.

Parameters
  • scheme (str, optional) – Output transliteration scheme. If invalid or None, no transliteration will take place. The default is DEFAULT_SCHEME.

  • transliterate_keys (bool, optional) – If True, the keys in lexicon will be transliterated to scheme. The default is True.

Returns

Returns a new transliterated instance

Return type

object

meaning() str[source]

Extract meaning of the entry

to_dict() Dict[str, str][source]

Get a python dict representation of the entry

parse()[source]
pycdsl.models.lexicon_constructor(dict_id: str, table_name: Optional[str] = None) pycdsl.models.Lexicon[source]

Construct a Lexicon Model

Parameters
  • dict_id (str) – Dictionary ID

  • table_name (str, optional) – Name of the table in the SQLite database. If None, it will be inferred as dict_id.lower(). The default is None.

Returns

Constructed class (a subclass of Lexicon) for a dictionary

Return type

object
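The idea of constructing a per-dictionary subclass can be sketched with plain Python classes. This is not the library's implementation: LexiconBase is a stand-in for pycdsl.models.Lexicon (a peewee.Model in the real package), the table_name attribute is a simplification of how a table name would be bound to a model, and the "<dict_id>Lexicon" class name is an assumption modelled on the documented AP90Lexicon and MWLexicon classes.

```python
class LexiconBase:
    """Stand-in for pycdsl.models.Lexicon, used for illustration only."""

    table_name = None


def lexicon_constructor_sketch(dict_id, table_name=None):
    """Build a LexiconBase subclass for the given dictionary ID."""
    # Documented default: the table name is inferred as dict_id.lower().
    if table_name is None:
        table_name = dict_id.lower()
    # type(name, bases, namespace) creates the subclass at runtime.
    return type(f"{dict_id}Lexicon", (LexiconBase,), {"table_name": table_name})
```

A call such as lexicon_constructor_sketch("MW") yields a class named MWLexicon whose table_name is "mw".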

pycdsl.models.entry_constructor(dict_id: str) pycdsl.models.Entry[source]

Construct an Entry Model

Parameters

dict_id (str) – Dictionary ID

Returns

Constructed class (a subclass of Entry) for a dictionary entry

Return type

object

class pycdsl.models.AP90Lexicon(*args, **kwargs)[source]

Bases: pycdsl.models.Lexicon

DoesNotExist

alias of pycdsl.models.AP90LexiconDoesNotExist

data = <TextField: AP90Lexicon.data>
id = <DecimalField: AP90Lexicon.id>
key = <CharField: AP90Lexicon.key>
class pycdsl.models.AP90Entry(lexicon_entry: pycdsl.models.Lexicon, lexicon_id: Optional[str] = None, scheme: Optional[str] = None, transliterate_keys: bool = True)[source]

Bases: pycdsl.models.Entry

Lexicon Entry

Parameters
  • lexicon_entry (Lexicon) – Instance of Lexicon model

  • lexicon_id (str, optional) – ID of the Lexicon to which the entry belongs

  • scheme (str, optional) – Output transliteration scheme. If valid, parts of the data in lexicon which are enclosed in <s> tags will be transliterated to scheme. If invalid or None, no transliteration will take place. The default is None.

  • transliterate_keys (bool, optional) – If True, the keys in lexicon will be transliterated to scheme. The default is True.

class pycdsl.models.MWLexicon(*args, **kwargs)[source]

Bases: pycdsl.models.Lexicon

DoesNotExist

alias of pycdsl.models.MWLexiconDoesNotExist

data = <TextField: MWLexicon.data>
id = <DecimalField: MWLexicon.id>
key = <CharField: MWLexicon.key>
class pycdsl.models.MWEntry(lexicon_entry: pycdsl.models.Lexicon, lexicon_id: Optional[str] = None, scheme: Optional[str] = None, transliterate_keys: bool = True)[source]

Bases: pycdsl.models.Entry

Lexicon Entry

Parameters
  • lexicon_entry (Lexicon) – Instance of Lexicon model

  • lexicon_id (str, optional) – ID of the Lexicon to which the entry belongs

  • scheme (str, optional) – Output transliteration scheme. If valid, parts of the data in lexicon which are enclosed in <s> tags will be transliterated to scheme. If invalid or None, no transliteration will take place. The default is None.

  • transliterate_keys (bool, optional) – If True, the keys in lexicon will be transliterated to scheme. The default is True.

pycdsl.shell module

REPL Shell for PyCDSL

class pycdsl.shell.BasicShell(completekey: str = 'tab', stdin: Optional[TextIO] = None, stdout: Optional[TextIO] = None, *, persistent_history_file: str = '', persistent_history_length: int = 1000, startup_script: str = '', silence_startup_script: bool = False, include_py: bool = False, include_ipy: bool = False, allow_cli_args: bool = True, transcript_files: Optional[List[str]] = None, allow_redirection: bool = True, multiline_commands: Optional[List[str]] = None, terminators: Optional[List[str]] = None, shortcuts: Optional[Dict[str, str]] = None, command_sets: Optional[Iterable[cmd2.command_definition.CommandSet]] = None, auto_load_commands: bool = True)[source]

Bases: cmd2.cmd2.Cmd

An easy but powerful framework for writing line-oriented command interpreters. Extends Python’s cmd package.

Parameters
  • completekey – readline name of a completion key, default to Tab

  • stdin – alternate input file object, if not specified, sys.stdin is used

  • stdout – alternate output file object, if not specified, sys.stdout is used

  • persistent_history_file – file path to load a persistent cmd2 command history from

  • persistent_history_length – max number of history items to write to the persistent history file

  • startup_script – file path to a script to execute at startup

  • silence_startup_script – if True, then the startup script’s output will be suppressed. Anything written to stderr will still display.

  • include_py – should the “py” command be included for an embedded Python shell

  • include_ipy – should the “ipy” command be included for an embedded IPython shell

  • allow_cli_args – if True, then cmd2.Cmd.__init__() will process command line arguments as either commands to be run or, if -t or --test are given, transcript files to run. This should be set to False if your application parses its own command line arguments.

  • transcript_files – pass a list of transcript files to be run on initialization. This allows running transcript tests when allow_cli_args is False. If allow_cli_args is True this parameter is ignored.

  • allow_redirection – If False, prevent output redirection and piping to shell commands. This parameter prevents redirection and piping, but does not alter parsing behavior. A user can still type redirection and piping tokens, and they will be parsed as such but they won’t do anything.

  • multiline_commands – list of commands allowed to accept multi-line input

  • terminators – list of characters that terminate a command. These are mainly intended for terminating multiline commands, but will also terminate single-line commands. If not supplied, the default is a semicolon. If your app only contains single-line commands and you want terminators to be treated as literals by the parser, then set this to an empty list.

  • shortcuts – dictionary containing shortcuts for commands. If not supplied, then defaults to constants.DEFAULT_SHORTCUTS. If you do not want any shortcuts, pass an empty dictionary.

  • command_sets – Provide CommandSet instances to load during cmd2 initialization. This allows CommandSets with custom constructor parameters to be loaded. This also allows a set of CommandSets to be provided when auto_load_commands is set to False.

  • auto_load_commands – If True, cmd2 will check for all subclasses of CommandSet that are currently loaded by Python and automatically instantiate and register all commands. If False, CommandSets must be manually installed with register_command_set.

class pycdsl.shell.CDSLShell(data_dir: Optional[str] = None, dict_ids: Optional[List[str]] = None, search_mode: Optional[str] = None, input_scheme: Optional[str] = None, output_scheme: Optional[str] = None, history_file: Optional[str] = None, startup_script: Optional[str] = None)[source]

Bases: pycdsl.shell.BasicShell

REPL Interface to CDSL

Create an instance of CDSLCorpus as per the provided parameters. CDSLCorpus.setup() is called after the command-loop starts.

Parameters
  • data_dir (str or None, optional) – Load a CDSL installation instance at the location data_dir. Passed to CDSLCorpus instance as a keyword argument data_dir.

  • dict_ids (list or None, optional) – List of dictionary IDs to set up. Passed to CDSLCorpus.setup() as the keyword argument dict_ids.

  • search_mode (str or None, optional) – Search mode to query by key, value or both. The default is None.

  • input_scheme (str or None, optional) – Transliteration scheme for input. If None, DEFAULT_SCHEME is used. The default is None.

  • output_scheme (str or None, optional) – Transliteration scheme for output. If None, DEFAULT_SCHEME is used. The default is None.

  • history_file (str or None, optional) – Path to the history file to keep a persistent history. If None, the history does not persist across sessions. The default is None.

  • startup_script (str or None, optional) – Path to the startup script with a list of startup commands to be executed after initialization. If None, no startup commands are run. The default is None.

intro = 'Cologne Sanskrit Digital Lexicon (CDSL)\n---------------------------------------'
desc = 'Install or load dictionaries by typing `use [DICT_IDS..]` e.g. `use MW`.\nType any keyword to search in the selected dictionaries. (help or ? for list of options)'
prompt = '(CDSL::None) '
schemes = ['devanagari', 'iast', 'itrans', 'velthuis', 'hk', 'slp1', 'wx']
search_modes = ['key', 'value', 'both']
__init__(data_dir: Optional[str] = None, dict_ids: Optional[List[str]] = None, search_mode: Optional[str] = None, input_scheme: Optional[str] = None, output_scheme: Optional[str] = None, history_file: Optional[str] = None, startup_script: Optional[str] = None)[source]

REPL Interface to CDSL

Create an instance of CDSLCorpus as per the provided parameters. CDSLCorpus.setup() is called after the command-loop starts.

Parameters
  • data_dir (str or None, optional) – Load a CDSL installation instance at the location data_dir. Passed to CDSLCorpus instance as a keyword argument data_dir.

  • dict_ids (list or None, optional) – List of dictionary IDs to set up. Passed to CDSLCorpus.setup() as the keyword argument dict_ids.

  • search_mode (str or None, optional) – Search mode to query by key, value or both. The default is None.

  • input_scheme (str or None, optional) – Transliteration scheme for input. If None, DEFAULT_SCHEME is used. The default is None.

  • output_scheme (str or None, optional) – Transliteration scheme for output. If None, DEFAULT_SCHEME is used. The default is None.

  • history_file (str or None, optional) – Path to the history file to keep a persistent history. If None, the history does not persist across sessions. The default is None.

  • startup_script (str or None, optional) – Path to the startup script with a list of startup commands to be executed after initialization. If None, no startup commands are run. The default is None.

do_info(_: cmd2.parsing.Statement)[source]

Display information about active dictionaries

do_stats(_: cmd2.parsing.Statement)[source]

Display statistics about active dictionaries

do_dicts(_: cmd2.parsing.Statement)[source]

Display a list of dictionaries available locally

do_available(_: cmd2.parsing.Statement)[source]

Display a list of dictionaries available in CDSL

do_update(_: cmd2.parsing.Statement)[source]

Update loaded dictionaries

use_parser = Cmd2ArgumentParser(prog='use', usage=None, description='\n        Load the specified dictionaries from CDSL.\n        If not available locally, they will be installed first.\n        ', formatter_class=<class 'cmd2.argparse_custom.Cmd2HelpFormatter'>, conflict_handler='error', add_help=True)
do_use(namespace: argparse.Namespace)[source]

Load the specified dictionaries from CDSL. If not available locally, they will be installed first.

show_parser = Cmd2ArgumentParser(prog='show', usage=None, description='Show a specific entry by ID', formatter_class=<class 'cmd2.argparse_custom.Cmd2HelpFormatter'>, conflict_handler='error', add_help=True)
do_show(namespace: argparse.Namespace)[source]

Show a specific entry by ID

search_parser = Cmd2ArgumentParser(prog='search', usage=None, description='\n        Search in the active dictionaries\n\n        Note\n        ----\n        * Searching in the active dictionaries is also the default action.\n        * In general, we do not need to use this command explicitly unless we\n          want to search the command keywords, such as, `available` `search`,\n          `version`, `help` etc. in the active dictionaries.\n        ', formatter_class=<class 'cmd2.argparse_custom.Cmd2HelpFormatter'>, conflict_handler='error', add_help=True)

Search in the active dictionaries

Note

  • Searching in the active dictionaries is also the default action.

  • In general, we do not need to use this command explicitly unless we want to search for command keywords, such as available, search, version or help, in the active dictionaries.

default(statement: cmd2.parsing.Statement)[source]

Executed when the command given isn’t a recognized command implemented by a do_* method.

Parameters

statement – Statement object with parsed input

cmdloop(intro: Optional[cmd2.parsing.Statement] = None)[source]

This is an outer wrapper around _cmdloop() which deals with extra features provided by cmd2.

_cmdloop() provides the main loop equivalent to cmd.cmdloop(). This is a wrapper around that which deals with the following extra features provided by cmd2: transcript testing, intro banner, and exit code.

Parameters

intro – if provided this overrides self.intro and serves as the intro banner printed once at start

do_version(_: cmd2.parsing.Statement)[source]

Show the current version of PyCDSL

pycdsl.utils module

Utility Functions

pycdsl.utils.validate_search_mode(mode: str) str[source]

Validate the search mode

Parameters

mode (str) – Search mode

Returns

mode.lower() if mode is valid; otherwise, None.

Return type

str or None

pycdsl.utils.validate_scheme(scheme: str) str[source]

Validate the name of transliteration scheme

Parameters

scheme (str) – Name of the transliteration scheme

Returns

scheme.lower() if scheme is valid; otherwise, None.

Return type

str or None
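Both validators share the same documented contract: return the lowercased value if it is valid, otherwise None. The sketch below captures that contract in one hypothetical helper, validate_choice. The allowed values are taken from the search_modes and schemes attributes listed for CDSLShell; whether the real validators check against exactly these lists, and how they treat non-string input, is an assumption.

```python
# Values as listed for CDSLShell.search_modes and CDSLShell.schemes.
SEARCH_MODES = ("key", "value", "both")
SCHEMES = ("devanagari", "iast", "itrans", "velthuis", "hk", "slp1", "wx")


def validate_choice(value, allowed):
    """Return value.lower() if it is one of `allowed`, otherwise None."""
    # Assumption of this sketch: non-string input is treated as invalid.
    if not isinstance(value, str):
        return None
    lowered = value.lower()
    return lowered if lowered in allowed else None
```

Returning None instead of raising lets a caller fall back to a default, for example validate_choice(mode, SEARCH_MODES) or "key".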

pycdsl.utils.transliterate_between(text: str, from_scheme: str, to_scheme: str, start_pattern: str, end_pattern: str) str[source]

Transliterate the text appearing between two patterns

Only the text appearing between the patterns start_pattern and end_pattern is transliterated. start_pattern and end_pattern can appear multiple times in the full text, and for every occurrence, the text between them is transliterated.

from_scheme and to_scheme should be compatible with scheme names from indic-transliteration

Parameters
  • text (str) – Full text

  • from_scheme (str) – Input transliteration scheme

  • to_scheme (str) – Output transliteration scheme

  • start_pattern (regexp) – Pattern describing the start tag

  • end_pattern (regexp) – Pattern describing the end tag
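The behaviour described above can be sketched with the standard re module. This is not the library's implementation: the name transliterate_between_sketch is hypothetical, the conversion step is a pluggable convert callable instead of a from_scheme/to_scheme pair, and keeping the delimiters unchanged in the output is an assumption.

```python
import re


def transliterate_between_sketch(text, convert, start_pattern, end_pattern):
    """Apply `convert` to every span between start_pattern and end_pattern."""
    # Named groups keep this working if the patterns contain their own groups.
    # The non-greedy body stops at the first end_pattern after each start.
    pattern = re.compile(
        f"(?P<start>{start_pattern})(?P<body>.*?)(?P<end>{end_pattern})",
        flags=re.DOTALL,
    )

    def replace(match):
        # Delimiters are kept as-is; only the enclosed text is converted.
        return (
            match.group("start")
            + convert(match.group("body"))
            + match.group("end")
        )

    return pattern.sub(replace, text)
```

With the indic-transliteration package installed, convert could wrap sanscript.transliterate, for example to go from SLP1 to Devanagari; any callable from str to str works, which keeps the sketch testable without that dependency.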

Module contents

PyCDSL

Python Interface to Cologne Digital Sanskrit Lexicon (CDSL).