feat: YouTube/yt-dlp parallel streaming downloads #26
siq0o wants to merge 2 commits into masterking32:python_testing
|
May not be fully related but why don't we always use 4 MiB chunks when we encounter googlevideo.com? It's incredibly unlikely for a file coming from there to be less than that. Or maybe I have no idea what I'm talking about. |
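For illustration, fixed 4 MiB chunking over a known total size might look like the sketch below. The names `CHUNK_SIZE`, `plan_ranges`, and `range_header` are hypothetical and are not taken from the PR; the sketch only shows that a file smaller than one chunk simply yields a single shorter range, so a fixed chunk size needs no special case for small files.

```python
CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB, the size suggested in the comment above


def plan_ranges(total_size, chunk_size=CHUNK_SIZE):
    """Split [0, total_size) into inclusive (start, end) byte ranges.

    Hypothetical helper: the last range is shortened so it never
    runs past the end of the file.
    """
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size must be > 0")
    return [
        (start, min(start + chunk_size, total_size) - 1)
        for start in range(0, total_size, chunk_size)
    ]


def range_header(start, end):
    """Format one planned range as an HTTP Range header value."""
    return f"bytes={start}-{end}"
```

Each tuple could then be fetched by a separate worker with `Range: bytes=start-end`; how the PR actually schedules its requests is not shown here.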
|
I'm not saying this is a bad idea, but the code quality is not really good; I can clearly see some code duplication. Maybe double-check it? |
|
Sorry for the low quality code 😬 feel free to take over, since this is all still vibe coded. |
|
You should update the README for the new options in the config |
|
I have tried this PR and upstream. Neither allows me to browse YouTube or even download via web-based YT tools, or with yt-dlp directly:
$ python3 yt-dlp https://www.youtube.com/watch?v=<id> --proxy socks5://127.0.0.1:1080 --no-check-certificates --downloader native --http-chunk-size 0 --no-continue --socket-timeout 60
[youtube] Extracting URL: https://www.youtube.com/watch?v=<id>
[youtube] <id>: Downloading webpage
WARNING: [youtube] No supported JavaScript runtime could be found. Only deno is enabled by default; to use another runtime add --js-runtimes RUNTIME[:PATH] to your command/config. YouTube extraction without a JS runtime has been deprecated, and some formats may be missing. See https://github.com/yt-dlp/yt-dlp/wiki/EJS for details on installing one
[youtube] <id>: Downloading android vr player API JSON
[info] <id>: Downloading 1 format(s): 137+251
ERROR: unable to download video data: HTTP Error 403: Forbidden
$ python3 yt-dlp https://www.youtube.com/watch?v=<id> --proxy socks5://127.0.0.1:1080 --no-check-certificates --downloader native --http-chunk-size 0 --no-continue --socket-timeout 60
[youtube] Extracting URL: https://www.youtube.com/watch?v=<id>
[youtube] <id>: Downloading webpage
WARNING: [youtube] HTTP Error 502: Bad Gateway. Retrying (1/3)...
[youtube] <id>: Downloading webpage
WARNING: [youtube] HTTP Error 502: Error. Retrying (2/3)...
[youtube] <id>: Downloading webpage
WARNING: [youtube] Unable to download webpage: HTTP Error 429: Too Many Requests (caused by <HTTPError 429: Too Many Requests>)
[youtube] <id>: Downloading initial data API JSON
WARNING: [youtube] No supported JavaScript runtime could be found. Only deno is enabled by default; to use another runtime add --js-runtimes RUNTIME[:PATH] to your command/config. YouTube extraction without a JS runtime has been deprecated, and some formats may be missing. See https://github.com/yt-dlp/yt-dlp/wiki/EJS for details on installing one
[youtube] <id>: Downloading android vr player API JSON
WARNING: [youtube] No title found in player responses; falling back to title from initial data. Other metadata may also be missing
ERROR: [youtube] <id>: Sign in to confirm you’re not a bot. Use --cookies-from-browser or --cookies for the authentication. See https://github.com/yt-dlp/yt-dlp/wiki/FAQ#how-do-i-pass-cookies-to-yt-dlp for how to manually pass cookies. Also see https://github.com/yt-dlp/yt-dlp/wiki/Extractors#exporting-youtube-cookies for tips on effectively exporting YouTube cookies
And a script seems to have a daily bandwidth limit of 100 MB, so I don't know how realistic this is to begin with. |
I have limited networking/Python knowledge so feel free to close this and implement your own version — but it's been working well in testing.
What this does:
Adds a YouTube fast path inside stream_parallel_download that activates when a googlevideo.com URL contains the clen= query parameter (total file size, always present in yt-dlp's direct video URLs). This allows:
… clen)
Required yt-dlp flags:
--downloader native --http-chunk-size 0 --no-continue --socket-timeout 60
--no-continue is needed because resume support conflicts with our buffered streaming model: yt-dlp's resume validation expects specific Content-Length semantics we can't cleanly satisfy.
What it doesn't affect:
The is_yt guard ("googlevideo.com" in url + clen= in query string) means browser video playback and all other download types go through the original unmodified code path. Browser range requests use bounded bytes=X-Y headers and don't carry clen, so they never trigger this path.
Developed with Claude Sonnet as a collaborator.
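The guard described above could be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name `youtube_clen` is hypothetical, and it checks the parsed hostname rather than the plain substring test (`"googlevideo.com" in url`) that the description mentions, which is a slightly stricter variant.

```python
from urllib.parse import parse_qs, urlsplit


def youtube_clen(url):
    """Return the total size from clen= for a googlevideo.com URL.

    Returns None when the URL should stay on the original code path
    (wrong host, or clen missing or not a positive integer).
    Hypothetical helper, assumed for illustration only.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host != "googlevideo.com" and not host.endswith(".googlevideo.com"):
        return None
    values = parse_qs(parts.query).get("clen")
    if not values:
        return None
    try:
        size = int(values[0])
    except ValueError:
        return None
    return size if size > 0 else None
```

A caller would take the fast path only when the result is not None, and fall through to the unmodified path otherwise.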