pycdsl package
Submodules
pycdsl.cli module
Console script for PyCDSL
pycdsl.constants module
PyCDSL Constants
pycdsl.corpus module
CDSL Corpus Management
- class pycdsl.corpus.CDSLCorpus(data_dir: Optional[str] = None, search_mode: str = 'key', input_scheme: str = 'devanagari', output_scheme: str = 'devanagari', transliterate_keys: bool = True)[source]
Bases:
object
CDSL Corpus Class
Refers to a CDSL installation instance at the location data_dir.
- data_dir: str = None
- search_mode: str = 'key'
- input_scheme: str = 'devanagari'
- output_scheme: str = 'devanagari'
- transliterate_keys: bool = True
- setup(dict_ids: Optional[list] = None, update: bool = False, model_map: Optional[Dict[str, Tuple[pycdsl.models.Lexicon, pycdsl.models.Entry]]] = None) bool [source]
Setup CDSL dictionaries in bulk
Calls CDSLDict.setup() on every CDSLDict, and if successful, also calls CDSLDict.connect() to establish a connection to the database
- Parameters
dict_ids (list or None, optional) – List of dictionary IDs to setup. If None, the dictionaries from DEFAULT_DICTIONARIES as well as locally installed dictionaries will be setup. The default is None.
update (bool, optional) – If True, an update check is performed for every dictionary in dict_ids, and if available, the updated version is installed. The default is False.
lexicon_model (object, optional) – Lexicon model argument passed to CDSLDict.connect() The default is None.
entry_model (object, optional) – Entry model argument passed to CDSLDict.connect() The default is None.
model_map (dict, optional) – Map of dictionary ID to a tuple of lexicon model and entry model. The argument is used to specify lexicon_model and entry_model arguments passed to CDSLDict.connect(). If None, the default map DEFAULT_MODEL_MAP will be used. The default is None.
- Returns
True if the setup of all the dictionaries from dict_ids is successful, i.e., if every CDSLDict.setup() call returns True.
- Return type
bool
- Raises
ValueError – If dict_ids is not a list or None.
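The return and error semantics described above can be sketched without a CDSL installation. This is a minimal illustration, not the library's implementation: FakeDict and setup_all are hypothetical names, and treating dict_ids=None as "every dictionary in the mapping" is a simplifying assumption.

```python
class FakeDict:
    """Hypothetical stand-in for CDSLDict; not part of pycdsl."""

    def __init__(self, dict_id, ok=True):
        self.id = dict_id
        self.ok = ok
        self.connected = False

    def setup(self, update=False):
        # Pretend to set up the dictionary and report success or failure.
        return self.ok

    def connect(self):
        self.connected = True


def setup_all(dicts, dict_ids=None, update=False):
    """Set up several dictionaries; return True only if every setup succeeds."""
    if dict_ids is not None and not isinstance(dict_ids, list):
        raise ValueError("dict_ids must be a list or None")
    if dict_ids is None:
        # Assumption: None selects every dictionary in the mapping.
        dict_ids = list(dicts)
    statuses = []
    for dict_id in dict_ids:
        dictionary = dicts[dict_id]
        ok = dictionary.setup(update=update)
        if ok:
            # Connect only after a successful setup, as the description states.
            dictionary.connect()
        statuses.append(ok)
    return all(statuses)
```

A single failing dictionary makes the overall result False, while the dictionaries that did succeed remain connected.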
- search(pattern: str, dict_ids: Optional[List[str]] = None, mode: Optional[str] = None, input_scheme: Optional[str] = None, output_scheme: Optional[str] = None, ignore_case: bool = False, limit: Optional[int] = None, offset: Optional[int] = None, omit_empty: bool = True) Dict[str, List[pycdsl.models.Entry]] [source]
Search in multiple dictionaries from the corpus
- Parameters
pattern (str) – Search pattern, may contain wildcards (*).
dict_ids (list or None) – List of dictionary IDs to search in. Only the dict_ids that exist in self.dicts will be used. If None, all the dictionaries that have been setup, i.e., the dictionaries from self.dicts will be used. The default is None.
mode (str or None, optional) – Search mode to query by key, value or both. The default is None.
input_scheme (str or None, optional) – Input transliteration scheme. If None, self.input_scheme will be used. The default is None.
output_scheme (str or None, optional) – Output transliteration scheme. If None, self.output_scheme will be used. The default is None.
ignore_case (bool, optional) – Ignore case while performing lookup. The default is False.
limit (int or None, optional) – Limit the number of search results to limit. The default is None.
offset (int or None, optional) – Offset the search results by offset. The default is None.
omit_empty (bool, optional) – If True, only the non-empty search results will be included. The default is True.
- Returns
Dictionary of (dict_id, list of matching entries)
- Return type
dict
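The shape of the returned dictionary, and the effect of dict_ids and omit_empty, can be illustrated with a small sketch. StubDict and search_corpus are hypothetical names introduced here; the stub does exact-match lookups only and stands in for a real CDSLDict.

```python
class StubDict:
    """Hypothetical stand-in for CDSLDict holding an in-memory key list."""

    def __init__(self, keys):
        self.keys = keys

    def search(self, pattern, **kwargs):
        # Exact-match lookup only; wildcards and schemes are ignored here.
        return [key for key in self.keys if key == pattern]


def search_corpus(dicts, pattern, dict_ids=None, omit_empty=True, **kwargs):
    """Search several dictionaries and collect results keyed by dictionary ID."""
    if dict_ids is None:
        dict_ids = list(dicts)
    results = {}
    for dict_id in dict_ids:
        if dict_id not in dicts:
            # IDs that have not been set up are silently skipped.
            continue
        matches = dicts[dict_id].search(pattern, **kwargs)
        if matches or not omit_empty:
            results[dict_id] = matches
    return results
```

With omit_empty=True, a dictionary without matches is absent from the result rather than mapped to an empty list.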
- get_available_dicts() Dict[str, pycdsl.lexicon.CDSLDict] [source]
Fetch a list of dictionaries available for download from CDSL
Homepage of CDSL Project (SERVER_URL) is fetched and parsed to obtain this list.
- get_installed_dicts() Dict[str, pycdsl.lexicon.CDSLDict] [source]
Fetch a list of dictionaries installed locally
pycdsl.lexicon module
CDSL Lexicon Management
- class pycdsl.lexicon.CDSLDict(id: str, date: str, name: str, url: str, db: Optional[str] = None, search_mode: Optional[str] = None, input_scheme: Optional[str] = None, output_scheme: Optional[str] = None, transliterate_keys: Optional[bool] = None)[source]
Bases:
object
Dictionary from CDSL
- id: str
- date: str
- name: str
- url: str
- db: str = None
- search_mode: str = None
- input_scheme: str = None
- output_scheme: str = None
- transliterate_keys: bool = None
- download(download_dir: str) bool [source]
Download and extract dictionary data
- Parameters
download_dir (str or Path) – Full path of directory where the dictionary data should be downloaded and extracted
- Returns
True if successfully downloaded or already up-to-date
- Return type
bool
- setup(data_dir: str, symlink_dir: Optional[str] = None, update: bool = False) bool [source]
Setup the dictionary database path
- Parameters
data_dir (str or Path) – Full path of directory where the dictionary data is stored
symlink_dir (str or Path, optional) – Full path of the directory where the symbolic links to the SQLite database of the dictionary will be created. If None, symbolic links aren’t created. The default is None.
update (bool, optional) – If True, an attempt to update dictionary data will be made. The default is False.
- Returns
True if the setup was successful
- Return type
bool
- set_scheme(input_scheme: Optional[str] = None, output_scheme: Optional[str] = None, transliterate_keys: Optional[bool] = None)[source]
Set transliteration scheme for the dictionary instance
- Parameters
input_scheme (str, optional) – Input transliteration scheme. If None, INTERNAL_SCHEME is used. The default is None.
output_scheme (str, optional) – Output transliteration scheme. If None, INTERNAL_SCHEME is used. The default is None.
transliterate_keys (bool, optional) – Determines whether the keys in lexicon should be transliterated to scheme or not. If None, the value will be inferred based on dictionary type. The default is None.
- set_search_mode(mode: str)[source]
Set search mode
- Parameters
mode (str) – Valid values are ‘key’, ‘value’, ‘both’. It is recommended to use the convenience variables SEARCH_MODE_KEY, SEARCH_MODE_VALUE or SEARCH_MODE_BOTH.
- connect(lexicon_model: Optional[pycdsl.models.Lexicon] = None, entry_model: Optional[pycdsl.models.Entry] = None)[source]
Connect to the SQLite database
If both lexicon_model and entry_model are specified, they are used as the ORM layer, and take preference over model_map.
If any of lexicon_model or entry_model is None, then the models are resolved in the following way.
First, if the current dictionary ID is present in model_map the models specified by the model_map are used. Otherwise, models.lexicon_constructor and models.entry_constructor functions are used, which subclass the models.Lexicon and models.Entry models.
- Parameters
lexicon_model (object, optional) – Lexicon model. The default is None.
entry_model (object, optional) – Entry model. The default is None.
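The resolution order described above can be sketched as follows. This is an illustration only: Lexicon and Entry are plain placeholder classes rather than the peewee-backed models, the constructors are simplified stand-ins, and model_map is passed explicitly to the hypothetical resolve_models function for clarity.

```python
class Lexicon:
    """Placeholder for pycdsl.models.Lexicon (no database behind it)."""


class Entry:
    """Placeholder for pycdsl.models.Entry."""


def lexicon_constructor(dict_id):
    # Simplified stand-in: build a Lexicon subclass named after the dictionary.
    return type(f"{dict_id}Lexicon", (Lexicon,), {})


def entry_constructor(dict_id):
    # Simplified stand-in: build an Entry subclass named after the dictionary.
    return type(f"{dict_id}Entry", (Entry,), {})


def resolve_models(dict_id, lexicon_model=None, entry_model=None, model_map=None):
    """Pick the (lexicon, entry) model pair in the documented order."""
    # 1. Explicit models win, but only when both are given.
    if lexicon_model is not None and entry_model is not None:
        return lexicon_model, entry_model
    # 2. Otherwise use the pair registered for this dictionary ID.
    if model_map is not None and dict_id in model_map:
        return model_map[dict_id]
    # 3. Otherwise construct fresh subclasses for this dictionary.
    return lexicon_constructor(dict_id), entry_constructor(dict_id)
```

Note that supplying only one of the two models has no effect in this sketch: the pair then comes from model_map or from the constructors.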
- stats(top: int = 10, output_scheme: str = None) Dict [source]
Display statistics about the lexicon
- Parameters
top (int, optional) – Number of entries with the most distinct meanings to display. The default is 10.
output_scheme (str, optional) – Output transliteration scheme. If None, self.output_scheme will be used. The default is None.
- Returns
Statistics about the dictionary
- Return type
dict
- search(pattern: str, mode: str = None, input_scheme: str = None, output_scheme: str = None, ignore_case: str = False, limit: int = None, offset: int = None) List[pycdsl.models.Entry] [source]
Search in the dictionary
- Parameters
pattern (str) – Search pattern, may contain wildcards (*).
mode (str or None, optional) – Search mode to query by key, value or both. If None, self.search_mode will be used. The default is None.
input_scheme (str or None, optional) – Input transliteration scheme. If None, self.input_scheme will be used. The default is None.
output_scheme (str or None, optional) – Output transliteration scheme. If None, self.output_scheme will be used. The default is None.
ignore_case (bool, optional) – Ignore case while performing lookup. The default is False.
limit (int or None, optional) – Limit the number of search results to limit. The default is None.
offset (int or None, optional) – Offset the search results by offset. The default is None.
- Returns
List of matching entries
- Return type
list
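How pattern, ignore_case, limit and offset interact can be illustrated on a plain list of keys. This sketch assumes shell-style wildcard matching via the standard fnmatch module, which may differ from the matching the library performs in SQLite; search_keys is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def search_keys(keys, pattern, ignore_case=False, limit=None, offset=None):
    """Return the keys matching a wildcard pattern, then apply offset and limit."""
    if ignore_case:
        pattern = pattern.lower()
    matches = [
        key
        for key in keys
        # fnmatchcase also treats '?' and '[...]' specially, unlike a bare '*'.
        if fnmatchcase(key.lower() if ignore_case else key, pattern)
    ]
    start = offset or 0
    stop = None if limit is None else start + limit
    return matches[start:stop]
```

The offset is applied before the limit, so limit=1 with offset=1 returns the second match only.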
- entry(entry_id: str, output_scheme: Optional[str] = None) pycdsl.models.Entry [source]
Get an entry by ID
- Parameters
entry_id (str) – Entry ID to lookup
output_scheme (str or None, optional) – Output transliteration scheme. If None, self.output_scheme will be used. The default is None.
- Returns
Entry with the matching ID if entry_id is valid; otherwise, None.
- Return type
object
- dump(output_path: Optional[str] = None, output_scheme: Optional[str] = None) List[Dict[str, str]] [source]
Dump data as JSON
- Parameters
output_path (str or Path, optional) – Path to the output JSON file. If None, the data isn’t written to the disk, only returned. The default is None.
output_scheme (str or None, optional) – Output transliteration scheme. If None, self.output_scheme will be used. The default is None.
- Returns
List of all the entries in the dictionary. Every entry is a dict. If output_path is provided, the same list is written as JSON.
- Return type
list
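The "return always, write only when a path is given" behaviour can be sketched with the standard json module. The helper name dump_entries is hypothetical, and the record fields id, key and data are an assumption borrowed from the Lexicon model fields; the real dump format may differ.

```python
import json
from pathlib import Path


def dump_entries(entries, output_path=None):
    """Return entries as a list of dicts, writing them as JSON if a path is given."""
    records = [dict(entry) for entry in entries]
    if output_path is not None:
        # ensure_ascii=False keeps Devanagari text readable in the output file.
        text = json.dumps(records, ensure_ascii=False, indent=2)
        Path(output_path).write_text(text, encoding="utf-8")
    return records
```

The same list is returned in both cases, so callers can post-process the records without reading the file back.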
pycdsl.models module
Models for Lexicon Access
- class pycdsl.models.Lexicon(*args, **kwargs)[source]
Bases:
peewee.Model
Lexicon Model
- id = <DecimalField: Lexicon.id>
- key = <CharField: Lexicon.key>
- data = <TextField: Lexicon.data>
- DoesNotExist
alias of
pycdsl.models.LexiconDoesNotExist
- class pycdsl.models.Entry(lexicon_entry: pycdsl.models.Lexicon, lexicon_id: Optional[str] = None, scheme: Optional[str] = None, transliterate_keys: bool = True)[source]
Bases:
object
Lexicon Entry
Wraps instances of Lexicon model which represent query results
- Parameters
lexicon_entry (Lexicon) – Instance of Lexicon model
lexicon_id (str, optional) – ID of the Lexicon to which the entry belongs
scheme (str, optional) – Output transliteration scheme. If valid, parts of the data in lexicon which are enclosed in <s> tags will be transliterated to scheme. If invalid or None, no transliteration will take place. The default is None.
transliterate_keys (bool, optional) – If True, the keys in lexicon will be transliterated to scheme. The default is True.
- __init__(lexicon_entry: pycdsl.models.Lexicon, lexicon_id: Optional[str] = None, scheme: Optional[str] = None, transliterate_keys: bool = True)[source]
Lexicon Entry
- Parameters
lexicon_entry (Lexicon) – Instance of Lexicon model
lexicon_id (str, optional) – ID of the Lexicon to which the entry belongs
scheme (str, optional) – Output transliteration scheme. If valid, parts of the data in lexicon which are enclosed in <s> tags will be transliterated to scheme. If invalid or None, no transliteration will take place. The default is None.
transliterate_keys (bool, optional) – If True, the keys in lexicon will be transliterated to scheme. The default is True.
- transliterate(scheme: str = 'devanagari', transliterate_keys: bool = True)[source]
Transliterate Data
Part of the data in lexicon that is enclosed in <s> tags will be transliterated to scheme.
- Parameters
scheme (str, optional) – Output transliteration scheme. If invalid or None, no transliteration will take place. The default is DEFAULT_SCHEME.
transliterate_keys (bool, optional) – If True, the keys in lexicon will be transliterated to scheme. The default is True.
- Returns
Returns a new transliterated instance
- Return type
object
- pycdsl.models.lexicon_constructor(dict_id: str, table_name: Optional[str] = None) pycdsl.models.Lexicon [source]
Construct a Lexicon Model
- Parameters
dict_id (str) – Dictionary ID
table_name (str, optional) – Name of the table in SQLite database. If None, it will be inferred as dict_id.lower(). The default is None.
- Returns
Constructed class (a subclass of Lexicon) for a dictionary
- Return type
object
- pycdsl.models.entry_constructor(dict_id: str) pycdsl.models.Entry [source]
Construct an Entry Model
- Parameters
dict_id (str) – Dictionary ID
- Returns
Constructed class (a subclass of Entry) for a dictionary entry
- Return type
object
- class pycdsl.models.AP90Lexicon(*args, **kwargs)[source]
Bases:
pycdsl.models.Lexicon
- DoesNotExist
alias of
pycdsl.models.AP90LexiconDoesNotExist
- data = <TextField: AP90Lexicon.data>
- id = <DecimalField: AP90Lexicon.id>
- key = <CharField: AP90Lexicon.key>
- class pycdsl.models.AP90Entry(lexicon_entry: pycdsl.models.Lexicon, lexicon_id: Optional[str] = None, scheme: Optional[str] = None, transliterate_keys: bool = True)[source]
Bases:
pycdsl.models.Entry
Lexicon Entry
- Parameters
lexicon_entry (Lexicon) – Instance of Lexicon model
lexicon_id (str, optional) – ID of the Lexicon to which the entry belongs
scheme (str, optional) – Output transliteration scheme. If valid, parts of the data in lexicon which are enclosed in <s> tags will be transliterated to scheme. If invalid or None, no transliteration will take place. The default is None.
transliterate_keys (bool, optional) – If True, the keys in lexicon will be transliterated to scheme. The default is True.
- class pycdsl.models.MWLexicon(*args, **kwargs)[source]
Bases:
pycdsl.models.Lexicon
- DoesNotExist
alias of
pycdsl.models.MWLexiconDoesNotExist
- data = <TextField: MWLexicon.data>
- id = <DecimalField: MWLexicon.id>
- key = <CharField: MWLexicon.key>
- class pycdsl.models.MWEntry(lexicon_entry: pycdsl.models.Lexicon, lexicon_id: Optional[str] = None, scheme: Optional[str] = None, transliterate_keys: bool = True)[source]
Bases:
pycdsl.models.Entry
Lexicon Entry
- Parameters
lexicon_entry (Lexicon) – Instance of Lexicon model
lexicon_id (str, optional) – ID of the Lexicon to which the entry belongs
scheme (str, optional) – Output transliteration scheme. If valid, parts of the data in lexicon which are enclosed in <s> tags will be transliterated to scheme. If invalid or None, no transliteration will take place. The default is None.
transliterate_keys (bool, optional) – If True, the keys in lexicon will be transliterated to scheme. The default is True.
pycdsl.shell module
REPL Shell for PyCDSL
- class pycdsl.shell.BasicShell(completekey: str = 'tab', stdin: Optional[TextIO] = None, stdout: Optional[TextIO] = None, *, persistent_history_file: str = '', persistent_history_length: int = 1000, startup_script: str = '', silence_startup_script: bool = False, include_py: bool = False, include_ipy: bool = False, allow_cli_args: bool = True, transcript_files: Optional[List[str]] = None, allow_redirection: bool = True, multiline_commands: Optional[List[str]] = None, terminators: Optional[List[str]] = None, shortcuts: Optional[Dict[str, str]] = None, command_sets: Optional[Iterable[cmd2.command_definition.CommandSet]] = None, auto_load_commands: bool = True)[source]
Bases:
cmd2.cmd2.Cmd
An easy but powerful framework for writing line-oriented command interpreters. Extends Python’s cmd package.
- Parameters
completekey – readline name of a completion key, default to Tab
stdin – alternate input file object, if not specified, sys.stdin is used
stdout – alternate output file object, if not specified, sys.stdout is used
persistent_history_file – file path to load a persistent cmd2 command history from
persistent_history_length – max number of history items to write to the persistent history file
startup_script – file path to a script to execute at startup
silence_startup_script – if True, then the startup script’s output will be suppressed. Anything written to stderr will still display.
include_py – should the “py” command be included for an embedded Python shell
include_ipy – should the “ipy” command be included for an embedded IPython shell
allow_cli_args – if True, then cmd2.Cmd.__init__() will process command line arguments as either commands to be run or, if -t or --test are given, transcript files to run. This should be set to False if your application parses its own command line arguments.
transcript_files – pass a list of transcript files to be run on initialization. This allows running transcript tests when allow_cli_args is False. If allow_cli_args is True this parameter is ignored.
allow_redirection – If False, prevent output redirection and piping to shell commands. This parameter prevents redirection and piping, but does not alter parsing behavior. A user can still type redirection and piping tokens, and they will be parsed as such but they won’t do anything.
multiline_commands – list of commands allowed to accept multi-line input
terminators – list of characters that terminate a command. These are mainly intended for terminating multiline commands, but will also terminate single-line commands. If not supplied, the default is a semicolon. If your app only contains single-line commands and you want terminators to be treated as literals by the parser, then set this to an empty list.
shortcuts – dictionary containing shortcuts for commands. If not supplied, then defaults to constants.DEFAULT_SHORTCUTS. If you do not want any shortcuts, pass an empty dictionary.
command_sets – Provide CommandSet instances to load during cmd2 initialization. This allows CommandSets with custom constructor parameters to be loaded. This also allows a set of CommandSets to be provided when auto_load_commands is set to False.
auto_load_commands – If True, cmd2 will check for all subclasses of CommandSet that are currently loaded by Python and automatically instantiate and register all commands. If False, CommandSets must be manually installed with register_command_set.
- class pycdsl.shell.CDSLShell(data_dir: Optional[str] = None, dict_ids: Optional[List[str]] = None, search_mode: Optional[str] = None, input_scheme: Optional[str] = None, output_scheme: Optional[str] = None, history_file: Optional[str] = None, startup_script: Optional[str] = None)[source]
Bases:
pycdsl.shell.BasicShell
REPL Interface to CDSL
Create an instance of CDSLCorpus as per the provided parameters. CDSLCorpus.setup() is called after the command-loop starts.
- Parameters
data_dir (str or None, optional) – Load a CDSL installation instance at the location data_dir. Passed to CDSLCorpus instance as a keyword argument data_dir.
dict_ids (list or None, optional) – List of dictionary IDs to setup. Passed to CDSLCorpus.setup() as a keyword argument dict_ids.
search_mode (str or None, optional) – Search mode to query by key, value or both. The default is None.
input_scheme (str or None, optional) – Transliteration scheme for input. If None, DEFAULT_SCHEME is used. The default is None.
output_scheme (str or None, optional) – Transliteration scheme for output. If None, DEFAULT_SCHEME is used. The default is None.
history_file (str or None, optional) – Path to the history file to keep a persistent history. If None, the history does not persist across sessions. The default is None.
startup_script (str or None, optional) – Path to the startup script with a list of startup commands to be executed after initialization. If None, no startup commands are run. The default is None.
- intro = 'Cologne Sanskrit Digital Lexicon (CDSL)\n---------------------------------------'
- desc = 'Install or load dictionaries by typing `use [DICT_IDS..]` e.g. `use MW`.\nType any keyword to search in the selected dictionaries. (help or ? for list of options)'
- prompt = '(CDSL::None) '
- schemes = ['devanagari', 'iast', 'itrans', 'velthuis', 'hk', 'slp1', 'wx']
- search_modes = ['key', 'value', 'both']
- __init__(data_dir: Optional[str] = None, dict_ids: Optional[List[str]] = None, search_mode: Optional[str] = None, input_scheme: Optional[str] = None, output_scheme: Optional[str] = None, history_file: Optional[str] = None, startup_script: Optional[str] = None)[source]
REPL Interface to CDSL
Create an instance of CDSLCorpus as per the provided parameters. CDSLCorpus.setup() is called after the command-loop starts.
- Parameters
data_dir (str or None, optional) – Load a CDSL installation instance at the location data_dir. Passed to CDSLCorpus instance as a keyword argument data_dir.
dict_ids (list or None, optional) – List of dictionary IDs to setup. Passed to CDSLCorpus.setup() as a keyword argument dict_ids.
search_mode (str or None, optional) – Search mode to query by key, value or both. The default is None.
input_scheme (str or None, optional) – Transliteration scheme for input. If None, DEFAULT_SCHEME is used. The default is None.
output_scheme (str or None, optional) – Transliteration scheme for output. If None, DEFAULT_SCHEME is used. The default is None.
history_file (str or None, optional) – Path to the history file to keep a persistent history. If None, the history does not persist across sessions. The default is None.
startup_script (str or None, optional) – Path to the startup script with a list of startup commands to be executed after initialization. If None, no startup commands are run. The default is None.
- use_parser = Cmd2ArgumentParser(prog='use', usage=None, description='\n Load the specified dictionaries from CDSL.\n If not available locally, they will be installed first.\n ', formatter_class=<class 'cmd2.argparse_custom.Cmd2HelpFormatter'>, conflict_handler='error', add_help=True)
- do_use(namespace: argparse.Namespace)[source]
Load the specified dictionaries from CDSL. If not available locally, they will be installed first.
- show_parser = Cmd2ArgumentParser(prog='show', usage=None, description='Show a specific entry by ID', formatter_class=<class 'cmd2.argparse_custom.Cmd2HelpFormatter'>, conflict_handler='error', add_help=True)
- search_parser = Cmd2ArgumentParser(prog='search', usage=None, description='\n Search in the active dictionaries\n\n Note\n ----\n * Searching in the active dictionaries is also the default action.\n * In general, we do not need to use this command explicitly unless we\n want to search the command keywords, such as, `available` `search`,\n `version`, `help` etc. in the active dictionaries.\n ', formatter_class=<class 'cmd2.argparse_custom.Cmd2HelpFormatter'>, conflict_handler='error', add_help=True)
- do_search(namespace: argparse.Namespace)[source]
Search in the active dictionaries
Note
Searching in the active dictionaries is also the default action.
In general, we do not need to use this command explicitly unless we want to search the command keywords, such as available, search, version, help, etc., in the active dictionaries.
- default(statement: cmd2.parsing.Statement)[source]
Executed when the command given isn’t a recognized command implemented by a do_* method.
- Parameters
statement – Statement object with parsed input
- cmdloop(intro: Optional[cmd2.parsing.Statement] = None)[source]
This is an outer wrapper around _cmdloop() which deals with extra features provided by cmd2.
_cmdloop() provides the main loop equivalent to cmd.cmdloop(). This is a wrapper around that which deals with the following extra features provided by cmd2: transcript testing, intro banner, and exit code.
- Parameters
intro – if provided this overrides self.intro and serves as the intro banner printed once at start
pycdsl.utils module
Utility Functions
- pycdsl.utils.validate_search_mode(mode: str) str [source]
Validate the search mode
- Parameters
mode (str) – Search mode
- Returns
mode.lower() if mode is valid; otherwise, None.
- Return type
str or None
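The validate-and-normalise contract described above can be sketched in a few lines. The helper check_search_mode is hypothetical and is not the library's function; the set of valid modes is taken from the values listed for set_search_mode.

```python
VALID_SEARCH_MODES = ("key", "value", "both")


def check_search_mode(mode):
    """Return the lower-cased mode if it is valid; otherwise return None."""
    # Non-string input is treated as invalid (an assumption of this sketch).
    if isinstance(mode, str) and mode.lower() in VALID_SEARCH_MODES:
        return mode.lower()
    return None
```

Returning None instead of raising lets a caller fall back to a default, for example `check_search_mode(user_input) or "key"`.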
- pycdsl.utils.validate_scheme(scheme: str) str [source]
Validate the name of transliteration scheme
- Parameters
scheme (str) – Name of the transliteration scheme
- Returns
scheme.lower() if scheme is valid; otherwise, None.
- Return type
str or None
- pycdsl.utils.transliterate_between(text: str, from_scheme: str, to_scheme: str, start_pattern: str, end_pattern: str) str [source]
Transliterate the text appearing between two patterns
Only the text appearing between patterns start_pattern and end_pattern is transliterated. start_pattern and end_pattern can appear multiple times in the full text, and for every occurrence, the text between them is transliterated.
from_scheme and to_scheme should be compatible with scheme names from indic-transliteration.
- Parameters
text (str) – Full text
from_scheme (str) – Input transliteration scheme
to_scheme (str) – Output transliteration scheme
start_pattern (regexp) – Pattern describing the start tag
end_pattern (regexp) – Pattern describing the end tag
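The "transform only what lies between two patterns" idea can be sketched with the standard re module. To stay free of external dependencies, this sketch takes an arbitrary transform callable in place of a real transliteration call; transform_between is a hypothetical name and the approach is an assumption, not the library's code.

```python
import re


def transform_between(text, transform, start_pattern, end_pattern):
    """Apply transform to every span enclosed by start_pattern and end_pattern."""
    # Non-greedy body so that each start/end pair is handled separately.
    pattern = re.compile(
        f"(?P<start>{start_pattern})(?P<body>.*?)(?P<end>{end_pattern})",
        flags=re.DOTALL,
    )

    def replace(match):
        # Keep the delimiters untouched and transform only the enclosed text.
        return match.group("start") + transform(match.group("body")) + match.group("end")

    return pattern.sub(replace, text)
```

For instance, `transform_between("a <s>rama</s> b", str.upper, "<s>", "</s>")` leaves the text outside the tags unchanged and upper-cases only "rama".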
Module contents
PyCDSL
Python Interface to Cologne Digital Sanskrit Lexicon (CDSL).