
Upgrade python-scraperlib to 3.x, including CLI support for descripti… #191

Open
clencyc wants to merge 2 commits into openzim:main from clencyc:upgrade-scraperlib-3x
clencyc commented Mar 29, 2026

Upgrades zimscraperlib from 1.x to 3.x and adds support for the --long-description CLI flag, as required by the new metadata API.
Changes
requirements.txt

Bumped zimscraperlib>=1.3.6,<1.4 to >=3.4.0,<4.0

entrypoint.py

Added --long-description CLI flag (max 4000 chars)
Updated --description help text to mention the 80-char limit

scraper.py

Added long_description parameter to Openedx2Zim.__init__()
Imported MAXIMUM_DESCRIPTION_METADATA_LENGTH from zimscraperlib.zim.metadata
Updated get_zim_info() to truncate description to 80 chars using the constant

Updated get_zim_info() to include long_description in the returned dict
Renamed favicon= → illustration= in make_zim_file() call (3.x API change)
Added long_description= to make_zim_file() call
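A minimal sketch of the two entrypoint.py flags described above, assuming the scraper parses its CLI with the standard-library argparse. The program name `openedx2zim`, the `build_parser()` helper and the constant names are hypothetical and not taken from the PR's diff; the 80- and 4000-char limits are the ones quoted in this PR.

```python
import argparse

# Limits quoted in this PR; defined locally here for illustration only.
MAX_DESCRIPTION_LENGTH = 80
MAX_LONG_DESCRIPTION_LENGTH = 4000


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing the short and long description flags."""
    parser = argparse.ArgumentParser(prog="openedx2zim")
    parser.add_argument(
        "--description",
        help=f"Short description for ZIM metadata (max {MAX_DESCRIPTION_LENGTH} chars)",
    )
    parser.add_argument(
        "--long-description",
        dest="long_description",
        help=f"Long description for ZIM metadata (max {MAX_LONG_DESCRIPTION_LENGTH} chars)",
    )
    return parser
```

argparse would already map `--long-description` to `args.long_description` on its own; the explicit `dest` only makes the attribute name that gets passed on to `Openedx2Zim` obvious.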

benoit74 self-requested a review March 30, 2026 05:17
benoit74 self-assigned this Mar 30, 2026
Collaborator

benoit74 left a comment


  • please upgrade directly to the latest 5.x (5.3 ATM)
  • please add an entry to the CHANGELOG
  • please link this PR to the issue it will fix so that it will get automatically closed
  • I don't get where you've truncated the description, and we should never truncate the description but invite users to pass an adequate one
  • please get inspiration from how things are done in other active scrapers (youtube, gutenberg, mindtouch, freecodecamp, ...); there is no "perfect" scraper ATM, but at least some good "vibes" to get inspiration from

AFAIK, the scraper is a bit broken ATM; how did you test your changes? I've always considered #175 the most urgent issue to tackle, but I'd be glad if you find a better approach. I will not merge something we have not tested.
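A minimal sketch of the "never truncate, invite users to pass an adequate description" approach requested in the review, assuming the 80/4000-char limits quoted in this PR. `check_descriptions()` and the constant names are hypothetical helpers, not zimscraperlib API; the idea is simply to fail early with an actionable message instead of silently cutting the text.

```python
from typing import Optional

# Limits quoted in this PR (80 / 4000 chars); defined locally for illustration.
MAX_DESCRIPTION_LENGTH = 80
MAX_LONG_DESCRIPTION_LENGTH = 4000


def check_descriptions(description: str, long_description: Optional[str] = None) -> None:
    """Raise a helpful error for over-long metadata instead of truncating it."""
    if len(description) > MAX_DESCRIPTION_LENGTH:
        raise ValueError(
            f"--description is {len(description)} chars long but must not exceed "
            f"{MAX_DESCRIPTION_LENGTH}; please pass a shorter summary and put the "
            "full text in --long-description."
        )
    if long_description is not None and len(long_description) > MAX_LONG_DESCRIPTION_LENGTH:
        raise ValueError(
            f"--long-description is {len(long_description)} chars long but must not "
            f"exceed {MAX_LONG_DESCRIPTION_LENGTH}; please shorten it."
        )
```

In an entrypoint, one option would be to call this right after parsing the arguments and report failures with `parser.error(str(exc))`, so the user sees the problem before any scraping starts.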

- Upgraded zimscraperlib from 3.x to 5.2.0
- Updated Jinja2 to 3.x for MarkupSafe compatibility
- Updated lxml to 5.x for Python 3.13 support
- Added long_description parameter support (up to 4000 chars)
- Removed description truncation, added validation warnings instead
- Updated imports for zimscraperlib 5.2.0 API changes
- Defined local metadata constants

Fixes openzim#175
Author

clencyc commented Apr 20, 2026

@benoit74 Thanks for the review! I've made the following changes:

✅ Upgraded zimscraperlib from 3.x to 5.2.0
✅ Updated all dependencies for Python 3.13 compatibility
✅ Added --long-description CLI parameter support (up to 4000 chars)
✅ Removed description truncation, added validation warnings instead
✅ Updated imports for zimscraperlib 5.2.0 API changes

The scraper now initializes correctly, and the CLI help shows both the --description and --long-description options. Ready for testing with an actual course! I'll consider looking at #175.

clencyc requested a review from benoit74 April 20, 2026 08:55
Collaborator

I would be really surprised if this works as expected, given all the breaking changes in zimscraperlib 4 and 5. Waiting for your input after you've looked into #175.

