Common Python library code shared by the other RDA Python utility programs.
rda-python-common is the foundation that every other rda-python-* repo
builds on. To consume it from a new or existing repo, follow these steps.
For local development, clone this repo alongside your project and install it in editable mode so that changes are picked up without re-installing:
git clone https://github.com/NCAR/rda-python-common.git
cd rda-python-common
pip install -e .

For a regular (non-editable) install from a checkout:

pip install /path/to/rda-python-common

For a production install on a system that uses the published distribution:

pip install rda_python_common

The package brings in its own transitive dependencies (psycopg2-binary,
rda-python-globus, unidecode, hvac).
Add rda_python_common to the dependencies list of your project's
pyproject.toml so that downstream installs pull it in automatically:
[project]
name = "rda_python_yourtool"
version = "0.1.0"
dependencies = [
    "rda_python_common",
    # ... other deps
]

This is the same pattern used by rda-python-dsarch, rda-python-dsupdt,
rda-python-dsrqst, rda-python-dscheck, rda-python-metrics, and
rda-python-miscs.
Two import styles are supported (see Usage examples below):
# Preferred for new code -- import the class from the lower-case module
from rda_python_common.pg_log import PgLOG
from rda_python_common.pg_dbi import PgDBI
# Legacy module-style imports remain supported for back-compatibility
from rda_python_common import PgLOG, PgDBI
PgLOG.pglog("hello", PgLOG.LOGWRN)

To verify the install, run:

python -c "import rda_python_common; print(rda_python_common.__version__)"

You should see the installed version (currently 2.1.11). If the import
fails, double-check that the active Python environment is the one where you
ran pip install.
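
The same check can be scripted when the environment is being validated from Python rather than from the shell. The sketch below is an illustration only: it reads the packaging metadata through the standard library's importlib.metadata instead of the `__version__` attribute, and the helper name `installed_version` is hypothetical, not part of rda_python_common.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the active environment.
        return None


if __name__ == "__main__":
    # "rda_python_common" is the distribution name used in the pip command above.
    found = installed_version("rda_python_common")
    if found is None:
        print("rda_python_common is not installed in this environment")
    else:
        print(f"rda_python_common {found}")
```

A None result usually points to the same cause as a failed import: the script is running in a different environment from the one where pip install was run.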
All shared functionality lives under src/rda_python_common/ and is organised as
a class hierarchy that is single-inheritance except at two join points. Each
module defines exactly one class; later classes extend earlier ones, so an
application that instantiates a most-derived class (typically PgOPT or PgCMD)
gets the helpers of all of that class's ancestors through one object.
Inheritance tree (top-down; multi-inheritance shown as two arrows converging on the same child):
              PgLOG
           ┌────┴────┐
           ▼         ▼
        PgUtil     PgDBI
         │  │      │ │ │
         │  └──┐ ┌─┘ │ └──► PgPassword
         │     ▼ ▼   │
         │   PgSplit │       (PgSplit multi-inherits PgUtil + PgDBI)
         │           ▼
         │         PgSIG
         │           │
         │  ┌────────┘
         ▼  ▼
        PgFile               (PgFile multi-inherits PgUtil + PgSIG)
         ├─► PgOPT
         └─► PgLock
              └─► PgCMD
The tree is single inheritance everywhere except at two join points:
- PgFile(PgUtil, PgSIG): combines date/record utilities (PgUtil → PgLOG) with daemon/signal/DB control (PgSIG → PgDBI → PgLOG), so its descendants PgOPT, PgLock, and PgCMD inherit logging, DB, util, signal, and file facilities through one MRO.
- PgSplit(PgUtil, PgDBI): combines record-manipulation helpers (PgUtil) with the pgadd/pgget/pgmget/pgupdt/pgdel DB operations (PgDBI) it needs to keep the shared wfile table and the per-dataset wfile_<dsid> partitions in sync.

Each module and the class it defines:

- pg_log.py: PgLOG. Root of the hierarchy. Provides the central logging facility (bit-mask logact flags such as MSGLOG, WARNLG, ERRLOG, EXITLG), e-mail dispatch, system-command execution, process metadata lookup, and the global PGLOG settings dictionary used by every other module.
- pg_util.py: PgUtil(PgLOG). Miscellaneous date/time, dataset-ID, and column-oriented record-manipulation helpers. Holds the DATEFMTS regex table, MONTHS/MNS/WDAYS/WDS lookup lists, and the MDAYS days-per-month array used for date arithmetic, formatting, parsing, and record sort/search/classification across all RDA tools.
- pg_file.py: PgFile(PgUtil, PgSIG). Unified file-operation layer spanning local file systems, remote hosts (rsync/ssh/scp), AWS S3 / object store, and Globus endpoints. Used by rdacp, dsarch, dsupdt, and related tools whenever data is moved, listed, or stat-ed.
- pg_lock.py: PgLock(PgFile). RDADB record-locking primitives for the dscheck, dsrqst, dlupdt, dcupdt, ptrqst, and dataset tables. Acquires, refreshes, and releases per-record locks so that long-running batch jobs coordinate cleanly.
- pg_dbi.py: PgDBI(PgLOG). PostgreSQL database interface built on psycopg2. Wraps connection management, batch INSERT/SELECT/UPDATE/DELETE, transaction control, and credential lookup from .pgpass or OpenBao. All RDA tools talk to the rdadb database through this class.
- pg_sig.py: PgSIG(PgDBI). Daemon process control, POSIX signal handling, child/background-process management, and PBS/Torque batch-job status queries. Provides the PGSIG runtime dictionary plus VUSERS, CPIDS, CBIDS, and SDUMP tables that drive RDA daemon programs.
- pg_cmd.py: PgCMD(PgLock). Manages dscheck batch and delayed-mode command tracking. Records, updates, and reaps the per-command rows that let RDA utilities resume or be monitored across PBS batch jobs.
- pg_split.py: PgSplit(PgUtil, PgDBI). Synchronises wfile records between the shared wfile table and the per-dataset wfile_<dsid> partition tables. Provides compare/add/update/delete helpers used when archiving or reconciling dataset file inventories.
- pg_opt.py: PgOPT(PgFile). Command-line option parsing and application configuration framework for RDA tools (dsarch, dsupdt, dsrqst, ...). Holds the master OPTS definition table, parsed params, command-line vs. input-file option tracking (CMDOPTS/INOPTS), output formatting, dataset/help/media/storage/backup type maps, and the global PGOPT settings.
- pgpassword.py: PgPassword(PgDBI). Standalone CLI entry point (pgpassword) that resolves a PostgreSQL login password from OpenBao (get_baopassword()) or ~/.pgpass (get_pgpassword()) given database/schema/host/port/user selectors via -d, -c, -h, -p, -u, -l, -k. Prints the resolved password to stdout so shell scripts can capture it.
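
The ~/.pgpass fallback mentioned for pgpassword.py uses PostgreSQL's password-file format, where each line reads hostname:port:database:username:password, `*` matches any value in the first four fields, and a backslash escapes a literal `:` or `\`. The sketch below shows how such a lookup can work in general. It is not the implementation of get_pgpassword(); the function names and the sample entries are hypothetical.

```python
def split_pgpass_line(line):
    """Split one .pgpass line into fields, honouring backslash escapes."""
    fields, current, chars = [], [], iter(line)
    for ch in chars:
        if ch == "\\":
            # A backslash escapes the next character (":" or "\").
            current.append(next(chars, ""))
        elif ch == ":":
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


def lookup_pgpass(text, host, port, dbname, user):
    """Return the password of the first matching entry, or None."""
    wanted = (host, str(port), dbname, user)
    for raw in text.splitlines():
        if not raw or raw.startswith("#"):
            continue  # skip blank lines and comments
        fields = split_pgpass_line(raw)
        if len(fields) != 5:
            continue  # ignore malformed entries
        # "*" in the file matches any requested value for that field.
        if all(pat == "*" or pat == val for pat, val in zip(fields[:4], wanted)):
            return fields[4]
    return None


# Hypothetical file content, for illustration only.
SAMPLE = """\
# hostname:port:database:username:password
db.example.org:5432:rdadb:reader:s3cret
*:*:*:backup:fallback\\:pw
"""

if __name__ == "__main__":
    print(lookup_pgpass(SAMPLE, "db.example.org", 5432, "rdadb", "reader"))
```

The first matching line wins, so more specific entries belong above wildcard entries in the file.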
Each class lives in its own submodule. Import the class you need, then either instantiate it directly or subclass it to add application-specific state and methods.
# Logging only
from rda_python_common.pg_log import PgLOG
log = PgLOG()
log.pglog("dsarch started", log.LOGWRN)
# Database access (PgDBI inherits PgLOG, so you get logging too)
from rda_python_common.pg_dbi import PgDBI
db = PgDBI()
rec = db.pgget('dataset', 'dsid, title', "dsid = 'd633000'")
print(rec)

# A small utility that needs date/record helpers plus logging.
from rda_python_common.pg_util import PgUtil

class DateReport(PgUtil):
    def __init__(self):
        super().__init__()           # initialise PgUtil (and PgLOG)
        self.today = self.curtime()  # method inherited from PgUtil

    def run(self):
        self.pglog(f"report date: {self.today}", self.LOGWRN)

DateReport().run()

# A worker that needs file I/O (PgFile) and dscheck command tracking (PgCMD).
# PgCMD already extends PgFile via PgLock, so a single base is enough.
from rda_python_common.pg_cmd import PgCMD
class Worker(PgCMD):
    def __init__(self):
        super().__init__()
        self.jobs = []

    def archive_one(self, src, dst):
        # PgFile method, available through the inheritance chain
        self.local_copy_local(src, dst)
        # PgDBI method, available through PgCMD -> PgLock -> PgFile -> PgSIG -> PgDBI
        self.pgupdt('wfile', {'status': 'A'}, f"wfile = '{dst}'")

Worker().archive_one('/in/file', '/out/file')

The next example mirrors how RDA tools such as dsarch are structured. The leaf
class multi-inherits several common classes so that a single object exposes
options, command tracking, and wfile splitting.
# Excerpt of the pattern used by rda_python_dsarch/dsarch.py
from rda_python_common.pg_opt import PgOPT
from rda_python_common.pg_cmd import PgCMD
from rda_python_common.pg_split import PgSplit
class PgArch(PgOPT, PgCMD, PgSplit):
    """Shared state + helpers for a CLI archiving tool."""
    def __init__(self):
        super().__init__()
        self.RTPATH = {}   # runtime path cache
        self.OPTS = {}     # option table (populated by subclass)

class DsArch(PgArch):
    def __init__(self):
        super().__init__()
        self.ALLCNT = self.ADDCNT = self.MODCNT = 0

    def main(self):
        self.read_parameters()   # from PgOPT
        self.start_actions()     # dispatch

if __name__ == "__main__":
    DsArch().main()

# Look up a database password with PgPassword
from rda_python_common.pgpassword import PgPassword
pw = PgPassword()
pw.default_scinfo('rdadb', 'dssdb', 'rda-pgdb', 'gdexweb', None, 5432)
password = pw.get_baopassword() or pw.get_pgpassword()

In every case super().__init__() cooperates correctly across the
multi-inheriting joins (PgFile and PgSplit), so subclasses only need
to call it once.
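
The cooperative behaviour can be checked without the real package. The sketch below uses empty stand-in classes that copy only the class names and base lists around the PgFile join; the bodies, the `INIT_CALLS` list, and the `mro_names` helper are hypothetical. It assumes every initialiser calls super().__init__() exactly once, as the text above states for the real classes.

```python
INIT_CALLS = []  # names are appended as each stand-in initialiser finishes


class PgLOG:
    def __init__(self):
        super().__init__()
        INIT_CALLS.append("PgLOG")


class PgUtil(PgLOG):
    def __init__(self):
        super().__init__()
        INIT_CALLS.append("PgUtil")


class PgDBI(PgLOG):
    def __init__(self):
        super().__init__()
        INIT_CALLS.append("PgDBI")


class PgSIG(PgDBI):
    def __init__(self):
        super().__init__()
        INIT_CALLS.append("PgSIG")


class PgFile(PgUtil, PgSIG):  # the join point
    def __init__(self):
        super().__init__()
        INIT_CALLS.append("PgFile")


class Worker(PgFile):  # an application subclass
    def __init__(self):
        super().__init__()  # one call walks the whole MRO
        INIT_CALLS.append("Worker")


def mro_names(cls):
    """Return the class names in method-resolution order."""
    return [c.__name__ for c in cls.__mro__]


if __name__ == "__main__":
    Worker()
    print(mro_names(Worker))
    print(INIT_CALLS)
```

Because Python linearises the bases (C3 ordering), PgLOG appears once in the MRO even though both PgUtil and PgSIG lead to it, so its initialiser runs once per object.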