mirror of https://github.com/HugeFrog24/jailbirdz-dl.git
synced 2026-03-02 09:04:33 +00:00

Commit: Cookie validation logic
.env.example (13 lines changed)
@@ -1,7 +1,12 @@
-# Copy your wordpress_logged_in_... cookie from browser DevTools → Storage → Cookies.
-# Paste the full name=value pair below.
-# wordpress_sec_... is the wp-admin cookie — irrelevant for read-only viewers.
-# __cf_bm is a Cloudflare bot-management cookie — also not needed.
+# Login credentials for grab_cookie.py (recommended)
+# These are used to obtain a fresh WP_LOGIN_COOKIE via the WooCommerce AJAX endpoint.
+WP_USERNAME=your-email-or-username
+WP_PASSWORD=your-password
+
+# Alternatively, set WP_LOGIN_COOKIE manually (expires in ~2 weeks).
+# Get it from browser DevTools → Storage → Cookies while on jailbirdz.com.
+# Copy the full name=value of the wordpress_logged_in_* cookie.
+# wordpress_sec_... is the wp-admin cookie — not needed.
 WP_LOGIN_COOKIE=wordpress_logged_in_<hash>=<value>
 
 # PeerTube upload target
.github/workflows/nightly-index.yml (new file, 47 lines; vendored)
@@ -0,0 +1,47 @@
+name: Nightly Index
+
+on:
+  schedule:
+    - cron: '0 3 * * *' # 03:00 UTC daily
+  workflow_dispatch: # manual trigger via GitHub UI
+
+permissions:
+  contents: write # needed to push video_map.json back
+
+concurrency:
+  group: nightly-index
+  cancel-in-progress: false # let an in-progress scrape finish; queue the next run
+
+jobs:
+  index:
+    runs-on: ubuntu-latest
+    timeout-minutes: 300 # 5 h ceiling; scraper resumes where it left off on next run
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: actions/setup-python@v5
+        with:
+          python-version: '3.12'
+          cache: pip
+
+      - name: Install dependencies
+        run: pip install -r requirements.txt
+
+      - name: Install Playwright Firefox
+        run: playwright install firefox --with-deps
+
+      - name: Run scraper
+        run: python main.py
+        env:
+          WP_USERNAME: ${{ secrets.WP_USERNAME }}
+          WP_PASSWORD: ${{ secrets.WP_PASSWORD }}
+
+      - name: Commit updated video_map.json
+        if: always() # save progress even if main.py crashed or timed out
+        run: |
+          git config user.name "github-actions[bot]"
+          git config user.email "github-actions[bot]@users.noreply.github.com"
+          git add video_map.json
+          git diff --staged --quiet || git commit -m "chore: nightly index update [skip ci]"
+          git push
.gitignore (9 lines changed; vendored)
@@ -1,5 +1,14 @@
+# Temporary cache
 __pycache__/
+.ruff_cache/
+
+# Local IDE config
+.vscode
+
+# Project output & artifacts
 downloads/
 *.mp4
 *.mp4.part
+
+# Secrets & sensitive info
 .env
.vscode/settings.json (deleted; 4 lines, vendored)
@@ -1,4 +0,0 @@
-{
-    "snyk.advanced.organization": "512ef4a1-6034-4537-a391-9692d282122a",
-    "snyk.advanced.autoSelectOrganization": true
-}
README.md (32 lines changed)
@@ -21,21 +21,14 @@ cp .env.example .env
 
 ### WP_LOGIN_COOKIE
 
-You need to be logged into jailbirdz.com in a browser. Then either:
-
-**Option A — auto (recommended):** let `grab_cookie.py` read it from your browser and write it to `.env` automatically:
-
-```bash
-python grab_cookie.py            # tries Firefox, Chrome, Edge, Brave in order
-python grab_cookie.py -b firefox # or target a specific browser
-```
-
-> **Note:** Chrome and Edge on Windows 130+ require the script to run as Administrator due to App-bound Encryption. Firefox works without elevated privileges.
-
+**Option A — credentials (recommended):** set `WP_USERNAME` and `WP_PASSWORD` in `.env`. `main.py` logs in automatically on startup — no separate step needed.
+
 **Option B — manual:** open `.env` and set `WP_LOGIN_COOKIE` yourself. Get the value from browser DevTools → Storage → Cookies while on jailbirdz.com — copy the full `name=value` of the `wordpress_logged_in_*` cookie.
 
 ### Other `.env` values
 
+- `WP_USERNAME` — jailbirdz.com login (email or username).
+- `WP_PASSWORD` — jailbirdz.com password.
 - `PEERTUBE_URL` — base URL of your PeerTube instance.
 - `PEERTUBE_USER` — PeerTube username.
 - `PEERTUBE_CHANNEL` — channel to upload to.
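The commit is titled "Cookie validation logic" but the validation code itself is not shown in this diff. A minimal shape check for the `name=value` pair described above might look like the sketch below; the function name and the exact rules are illustrative assumptions, not code from this commit.

```python
def looks_like_wp_login_cookie(pair: str) -> bool:
    """Rough shape check for a 'wordpress_logged_in_<hash>=<value>' pair.

    Illustrative only — not the validation shipped in this commit.
    """
    name, sep, value = pair.partition("=")
    return bool(sep) and name.startswith("wordpress_logged_in_") and bool(value)


print(looks_like_wp_login_cookie("wordpress_logged_in_ab12=alice%7C170"))  # True
print(looks_like_wp_login_cookie("wordpress_sec_ab12=alice"))              # False
```

This catches the most common mistake called out in `.env.example`: pasting a `wordpress_sec_...` (wp-admin) cookie instead of the `wordpress_logged_in_*` one.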
@@ -89,6 +82,25 @@ Options:
 
 Uploads in resumable 10 MB chunks. After each batch, waits for transcoding and object storage to complete before uploading the next batch — this prevents disk exhaustion on the PeerTube server. Videos already present on the channel (matched by name) are skipped. Progress is tracked in `.uploaded` inside the input directory.
 
+## CI / Nightly Indexing
+
+`.github/workflows/nightly-index.yml` runs `main.py` at 03:00 UTC daily and commits any new `video_map.json` entries back to the repo.
+
+**One-time setup — add repo secrets:**
+
+```bash
+gh secret set WP_USERNAME
+gh secret set WP_PASSWORD
+```
+
+**Seed CI with your current progress before the first run:**
+
+```bash
+git add video_map.json && git commit -m "chore: seed video_map"
+```
+
+**Trigger manually:** Actions → Nightly Index → Run workflow.
+
 ## Utilities
 
 ### Check for filename clashes
check_clashes.py
@@ -9,38 +9,58 @@ Importable functions:
 fetch_sizes(urls, workers, on_progress) - bulk size lookup
 make_session() - requests.Session with required headers
 load_video_map() - load video_map.json, returns {} on missing/corrupt
+is_valid_url(url) - True if url is a plain http(s) URL with no HTML artefacts
+expects_video(url) - True if url is a members-only video page
 """
 
 from collections import defaultdict
 from concurrent.futures import ThreadPoolExecutor, as_completed
 from pathlib import Path, PurePosixPath
+from typing import Any, Callable, Optional, cast
 from urllib.parse import urlparse, unquote
 import json
 import requests
 from config import BASE_URL
 
-REFERER = f"{BASE_URL}/"
+REFERER: str = f"{BASE_URL}/"
-VIDEO_MAP_FILE = "video_map.json"
+VIDEO_MAP_FILE: str = "video_map.json"
-VIDEO_EXTS = {".mp4", ".mov", ".m4v", ".webm", ".avi"}
+VIDEO_EXTS: set[str] = {".mp4", ".mov", ".m4v", ".webm", ".avi"}
 
 
-def load_video_map():
+def is_valid_url(url: str) -> bool:
+    """True if url is a plain http(s) URL with no HTML artefacts (<, >, href= etc.)."""
+    return (
+        url.startswith("http")
+        and "<" not in url
+        and ">" not in url
+        and " href=" not in url
+    )
+
+
+def expects_video(url: str) -> bool:
+    """True if url is a members-only video page that should contain a video."""
+    return "/pinkcuffs-videos/" in url
+
+
+def load_video_map() -> dict[str, Any]:
     if Path(VIDEO_MAP_FILE).exists():
         try:
             with open(VIDEO_MAP_FILE, encoding="utf-8") as f:
-                return json.load(f)
+                data_any: Any = json.load(f)
+                data = cast(dict[str, Any], data_any)
+                return data
         except (json.JSONDecodeError, OSError):
             return {}
     return {}
 
 
-def make_session():
+def make_session() -> requests.Session:
     s = requests.Session()
     s.headers.update({"Referer": REFERER})
     return s
 
 
-def fmt_size(b):
+def fmt_size(b: float | int) -> str:
     for unit in ("B", "KB", "MB", "GB"):
         if b < 1024:
             return f"{b:.1f} {unit}"
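The two validators added in this hunk can be exercised standalone; the function bodies below are copied from the new version of the file, and the example URLs are made up:

```python
def is_valid_url(url: str) -> bool:
    """True if url is a plain http(s) URL with no HTML artefacts (<, >, href= etc.)."""
    return (
        url.startswith("http")
        and "<" not in url
        and ">" not in url
        and " href=" not in url
    )


def expects_video(url: str) -> bool:
    """True if url is a members-only video page that should contain a video."""
    return "/pinkcuffs-videos/" in url


print(is_valid_url("https://example.com/clip.mp4"))                       # True
print(is_valid_url('<a href="https://example.com/clip.mp4">clip</a>'))    # False
print(expects_video("https://www.jailbirdz.com/pinkcuffs-videos/ep-1/"))  # True
```

`is_valid_url` rejects entries where a scraped attribute or tag fragment leaked into `video_map.json`; `expects_video` is used to flag member pages that should have yielded a video URL but did not.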
@@ -48,30 +68,34 @@ def fmt_size(b):
     return f"{b:.1f} TB"
 
 
-def url_to_filename(url):
+def url_to_filename(url: str) -> str:
     return unquote(PurePosixPath(urlparse(url).path).name)
 
 
-def find_clashes(urls):
+def find_clashes(urls: list[str]) -> dict[str, list[str]]:
     # Case-insensitive grouping so that e.g. "DaisyArrest.mp4" and
     # "daisyarrest.mp4" are treated as a clash. This is required for
     # correctness on case-insensitive filesystems (NTFS, exFAT, macOS HFS+)
     # and harmless on case-sensitive ones (ext4) — the actual filenames on
     # disk keep their original casing; only the clash *detection* is folded.
-    by_lower = defaultdict(list)
+    by_lower: defaultdict[str, list[str]] = defaultdict(list)
     for url in urls:
         by_lower[url_to_filename(url).lower()].append(url)
-    return {url_to_filename(srcs[0]): srcs
-            for srcs in by_lower.values() if len(srcs) > 1}
+    return {
+        url_to_filename(srcs[0]): srcs for srcs in by_lower.values() if len(srcs) > 1
+    }
 
 
-def _clash_subfolder(url):
+def _clash_subfolder(url: str) -> str:
     """Parent path segment used as disambiguator for clashing filenames."""
     parts = urlparse(url).path.rstrip("/").split("/")
     return unquote(parts[-2]) if len(parts) >= 2 else "unknown"
 
 
-def build_download_paths(urls, output_dir):
+def build_download_paths(
+    urls: list[str],
+    output_dir: str | Path,
+) -> dict[str, Path]:
     """Map each URL to a local file path. Flat layout; clashing names get a subfolder."""
     clashes = find_clashes(urls)
     clash_lower = {name.lower() for name in clashes}
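A standalone run of the clash detection above (function bodies copied from the new version; the CDN URLs are invented for the example):

```python
from collections import defaultdict
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def url_to_filename(url: str) -> str:
    return unquote(PurePosixPath(urlparse(url).path).name)


def find_clashes(urls: list[str]) -> dict[str, list[str]]:
    # Case-insensitive grouping, as in the diff above.
    by_lower: defaultdict[str, list[str]] = defaultdict(list)
    for url in urls:
        by_lower[url_to_filename(url).lower()].append(url)
    return {
        url_to_filename(srcs[0]): srcs for srcs in by_lower.values() if len(srcs) > 1
    }


urls = [
    "https://cdn.example.com/a/DaisyArrest.mp4",
    "https://cdn.example.com/b/daisyarrest.mp4",
    "https://cdn.example.com/c/Other.mp4",
]
print(find_clashes(urls))
# {'DaisyArrest.mp4': ['https://cdn.example.com/a/DaisyArrest.mp4',
#                      'https://cdn.example.com/b/daisyarrest.mp4']}
```

The dict key keeps the first URL's original casing, so reports stay readable while detection remains case-folded.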
@@ -86,7 +110,10 @@ def build_download_paths(urls, output_dir):
     return paths
 
 
-def get_remote_size(session, url):
+def get_remote_size(
+    session: requests.Session,
+    url: str,
+) -> Optional[int]:
     try:
         r = session.head(url, allow_redirects=True, timeout=15)
         if r.status_code < 400 and "Content-Length" in r.headers:
@@ -94,8 +121,7 @@ def get_remote_size(session, url):
     except Exception:
         pass
     try:
-        r = session.get(
-            url, headers={"Range": "bytes=0-0"}, stream=True, timeout=15)
+        r = session.get(url, headers={"Range": "bytes=0-0"}, stream=True, timeout=15)
         r.close()
         cr = r.headers.get("Content-Range", "")
         if "/" in cr:
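The fallback branch above issues a one-byte ranged GET (`bytes=0-0`) and reads the total file size out of the `Content-Range` response header. The parsing step itself falls outside the hunk, but it presumably amounts to something like this:

```python
# A ranged GET for "bytes=0-0" typically answers 206 Partial Content with a
# Content-Range header of the form "bytes 0-0/<total>"; the full size of the
# resource sits after the slash. (Sample header value made up for the demo.)
cr = "bytes 0-0/734003200"
size = int(cr.rsplit("/", 1)[1]) if "/" in cr else None
print(size)  # 734003200
```

This is the usual workaround for servers (or CDNs) that refuse `HEAD` requests or omit `Content-Length` on them.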
@@ -105,10 +131,14 @@ def get_remote_size(session, url):
     return None
 
 
-def fetch_sizes(urls, workers=20, on_progress=None):
+def fetch_sizes(
+    urls: list[str],
+    workers: int = 20,
+    on_progress: Optional[Callable[[int, int], None]] = None,
+) -> dict[str, Optional[int]]:
     """Return {url: size_or_None}. on_progress(done, total) called after each URL."""
     session = make_session()
-    sizes = {}
+    sizes: dict[str, Optional[int]] = {}
     total = len(urls)
 
     with ThreadPoolExecutor(max_workers=workers) as pool:
@@ -117,7 +147,7 @@ def fetch_sizes(urls, workers=20, on_progress=None):
         for fut in as_completed(futures):
             sizes[futures[fut]] = fut.result()
             done += 1
-            if on_progress:
+            if on_progress is not None:
                 on_progress(done, total)
 
     return sizes
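The worker-pool pattern in `fetch_sizes` can be sketched without any network access by swapping in a stub sizer (the stub name and URLs are made up for the demo):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def stub_size(url: str) -> int:
    return len(url)  # stand-in for get_remote_size


urls = ["https://a/x.mp4", "https://b/long-name.mp4"]
sizes: dict[str, int] = {}
done, total = 0, len(urls)

with ThreadPoolExecutor(max_workers=2) as pool:
    # Map each future back to its URL so results can land in any order.
    futures = {pool.submit(stub_size, u): u for u in urls}
    for fut in as_completed(futures):
        sizes[futures[fut]] = fut.result()
        done += 1  # this is where fetch_sizes invokes on_progress(done, total)

print(sizes["https://a/x.mp4"], done)  # 15 2
```

The `futures` dict reversal is the key trick: `as_completed` yields futures in completion order, and the dict recovers which URL each one belongs to.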
@@ -125,14 +155,20 @@ def fetch_sizes(urls, workers=20, on_progress=None):
 
 # --------------- CLI ---------------
 
-def main():
+
+def main() -> None:
     vm = load_video_map()
-    urls = [u for entry in vm.values() for u in entry.get("videos", []) if u.startswith("http")]
+    urls = [
+        u
+        for entry in vm.values()
+        for u in entry.get("videos", [])
+        if u.startswith("http")
+    ]
+
     clashes = find_clashes(urls)
 
     print(f"Total URLs: {len(urls)}")
-    by_name = defaultdict(list)
+    by_name: defaultdict[str, list[str]] = defaultdict(list)
     for url in urls:
         by_name[url_to_filename(url)].append(url)
     print(f"Unique filenames: {len(by_name)}")
config.py
@@ -1,2 +1,5 @@
-BASE_URL = "https://www.jailbirdz.com"
-COOKIE_DOMAIN = "jailbirdz.com" # rookiepy domain filter (no www)
+# config.py
+from typing import Final
+
+BASE_URL: Final[str] = "https://www.jailbirdz.com"
+COOKIE_DOMAIN: Final[str] = "jailbirdz.com" # rookiepy domain filter (no www)
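What the switch to `typing.Final` buys: a static checker (mypy, pyright) flags any later reassignment of the constant, while runtime behaviour is unchanged. A quick sketch:

```python
from typing import Final

BASE_URL: Final[str] = "https://www.jailbirdz.com"

# BASE_URL = "https://elsewhere.example"  # a type checker reports this
#                                         # reassignment as an error
print(BASE_URL)  # https://www.jailbirdz.com
```

`Final` is purely advisory at runtime, so no import-time cost is added to `config.py`.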
download.py (167 lines changed)
@@ -17,6 +17,8 @@ import re
 import shutil
 from collections import defaultdict
 from concurrent.futures import ThreadPoolExecutor, as_completed
+from typing import Any, Optional
+import requests
 
 from check_clashes import (
     make_session,
@@ -25,32 +27,35 @@ from check_clashes import (
     find_clashes,
     build_download_paths,
     fetch_sizes,
+    load_video_map,
+    is_valid_url,
+    VIDEO_MAP_FILE,
 )
 
-VIDEO_MAP_FILE = "video_map.json"
 CHUNK_SIZE = 8 * 1024 * 1024
-DEFAULT_OUTPUT = "downloads"
-DEFAULT_WORKERS = 4
-MODE_FILE = ".naming_mode"
-MODE_ORIGINAL = "original"
-MODE_TITLE = "title"
+DEFAULT_OUTPUT: str = "downloads"
+DEFAULT_WORKERS: int = 4
+MODE_FILE: str = ".naming_mode"
+MODE_ORIGINAL: str = "original"
+MODE_TITLE: str = "title"
 
 
 # ── Naming mode persistence ──────────────────────────────────────────
 
-def read_mode(output_dir):
+
+def read_mode(output_dir: str | Path) -> Optional[str]:
     p = Path(output_dir) / MODE_FILE
     if p.exists():
         return p.read_text().strip()
     return None
 
 
-def write_mode(output_dir, mode):
+def write_mode(output_dir: str | Path, mode: str) -> None:
     Path(output_dir).mkdir(parents=True, exist_ok=True)
     (Path(output_dir) / MODE_FILE).write_text(mode)
 
 
-def resolve_mode(args):
+def resolve_mode(args: argparse.Namespace) -> str:
     """Determine naming mode from CLI flags + saved marker. Returns mode string."""
     saved = read_mode(args.output)
@@ -69,13 +74,18 @@ def resolve_mode(args):
 
 # ── Filename helpers ─────────────────────────────────────────────────
 
-def sanitize_filename(title, max_len=180):
-    name = re.sub(r'[<>:"/\\|?*]', '', title)
-    name = re.sub(r'\s+', ' ', name).strip().rstrip('.')
+def sanitize_filename(title: str, max_len: int = 180) -> str:
+    name = re.sub(r'[<>:"/\\|?*]', "", title)
+    name = re.sub(r"\s+", " ", name).strip().rstrip(".")
     return name[:max_len].rstrip() if len(name) > max_len else name
 
 
-def build_title_paths(urls, url_to_title, output_dir):
+def build_title_paths(
+    urls: list[str],
+    url_to_title: dict[str, str],
+    output_dir: str | Path,
+) -> dict[str, Path]:
     name_to_urls = defaultdict(list)
     url_to_base = {}
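`sanitize_filename`, restyled above, behaves like this (the body is copied from the new version; the example title is made up):

```python
import re


def sanitize_filename(title: str, max_len: int = 180) -> str:
    # Strip characters illegal in Windows filenames, collapse runs of
    # whitespace, and drop trailing dots — as in the diff above.
    name = re.sub(r'[<>:"/\\|?*]', "", title)
    name = re.sub(r"\s+", " ", name).strip().rstrip(".")
    return name[:max_len].rstrip() if len(name) > max_len else name


print(sanitize_filename('Arrest: "Daisy" <cut 2>.'))  # Arrest Daisy cut 2
```

Truncation to `max_len` guards against titles that would exceed filesystem name limits (255 bytes on most systems) once an extension and clash suffix are appended.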
@@ -91,14 +101,19 @@ def build_title_paths(urls, url_to_title, output_dir):
         base, ext = url_to_base[url]
         full = base + ext
         if len(name_to_urls[full]) > 1:
-            slug = url_to_filename(url).rsplit('.', 1)[0]
+            slug = url_to_filename(url).rsplit(".", 1)[0]
             paths[url] = Path(output_dir) / f"{base} [{slug}]{ext}"
         else:
             paths[url] = Path(output_dir) / full
     return paths
 
 
-def get_paths_for_mode(mode, urls, video_map, output_dir):
+def get_paths_for_mode(
+    mode: str,
+    urls: list[str],
+    video_map: dict[str, Any],
+    output_dir: str | Path,
+) -> dict[str, Path]:
     if mode == MODE_TITLE:
         url_title = build_url_title_map(video_map)
         return build_title_paths(urls, url_title, output_dir)
@@ -107,7 +122,14 @@ def get_paths_for_mode(mode, urls, video_map, output_dir):
 
 # ── Reorganize ───────────────────────────────────────────────────────
 
-def reorganize(urls, video_map, output_dir, target_mode, dry_run=False):
+
+def reorganize(
+    urls: list[str],
+    video_map: dict[str, Any],
+    output_dir: str | Path,
+    target_mode: str,
+    dry_run: bool = False,
+) -> None:
     """Rename existing files from one naming scheme to another."""
     other_mode = MODE_TITLE if target_mode == MODE_ORIGINAL else MODE_ORIGINAL
     old_paths = get_paths_for_mode(other_mode, urls, video_map, output_dir)
@@ -163,21 +185,27 @@ def reorganize(urls, video_map, output_dir, target_mode, dry_run=False):
 
 # ── Download ─────────────────────────────────────────────────────────
 
-def download_one(session, url, dest, expected_size):
+
+def download_one(
+    session: requests.Session,
+    url: str,
+    dest: str | Path,
+    expected_size: Optional[int],
+) -> tuple[str, int]:
     dest = Path(dest)
     part = dest.parent / (dest.name + ".part")
     dest.parent.mkdir(parents=True, exist_ok=True)
 
     if dest.exists():
         local = dest.stat().st_size
-        if expected_size and local == expected_size:
+        if expected_size is not None and local == expected_size:
             return "ok", 0
-        if expected_size and local != expected_size:
+        if expected_size is not None and local != expected_size:
             dest.unlink()
 
     existing = part.stat().st_size if part.exists() else 0
     headers = {}
-    if existing and expected_size and existing < expected_size:
+    if existing and expected_size is not None and existing < expected_size:
         headers["Range"] = f"bytes={existing}-"
 
     try:
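Several checks in `download_one` move from truthiness tests to explicit `is not None` tests in this hunk. The difference matters when `expected_size` is legitimately `0` — a hypothetical zero-byte file used here purely for illustration:

```python
expected_size = 0  # server reports a zero-byte file
local = 0          # and that is exactly what is on disk

# Truthiness: 0 is falsy, so the "already complete" branch never fires
# and the file would be treated as unverified on every run.
old_style = bool(expected_size and local == expected_size)

# Explicit None check: only a missing size (None) skips verification.
new_style = expected_size is not None and local == expected_size

print(old_style, new_style)  # False True
```

The same reasoning applies to the size-mismatch and Range-resume conditions rewritten above.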
@@ -205,33 +233,21 @@ def download_one(session, url, dest, expected_size):
         return f"error: {e}", written
 
     final_size = existing + written
-    if expected_size and final_size != expected_size:
+    if expected_size is not None and final_size != expected_size:
         return "size_mismatch", written
 
     part.rename(dest)
     return "ok", written
 
 
-# ── Data loading ─────────────────────────────────────────────────────
-
-def load_video_map():
-    with open(VIDEO_MAP_FILE, encoding="utf-8") as f:
-        return json.load(f)
-
-
-def _is_valid_url(url):
-    return url.startswith(
-        "http") and "<" not in url and ">" not in url and " href=" not in url
-
-
-def collect_urls(video_map):
+def collect_urls(video_map: dict[str, Any]) -> list[str]:
     urls, seen, skipped = [], set(), 0
     for entry in video_map.values():
         for video_url in entry.get("videos", []):
             if video_url in seen:
                 continue
             seen.add(video_url)
-            if _is_valid_url(video_url):
+            if is_valid_url(video_url):
                 urls.append(video_url)
             else:
                 skipped += 1
@@ -240,7 +256,7 @@ def collect_urls(video_map):
     return urls
 
 
-def build_url_title_map(video_map):
+def build_url_title_map(video_map: dict[str, Any]) -> dict[str, str]:
     url_title = {}
     for entry in video_map.values():
         title = entry.get("title", "")
@@ -252,24 +268,44 @@ def build_url_title_map(video_map):
 
 # ── Main ─────────────────────────────────────────────────────────────
 
-def main():
-    parser = argparse.ArgumentParser(
-        description="Download videos from video_map.json")
-    parser.add_argument("--output", "-o", default=DEFAULT_OUTPUT,
-                        help=f"Download directory (default: {DEFAULT_OUTPUT})")
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Download videos from video_map.json")
+    parser.add_argument(
+        "--output",
+        "-o",
+        default=DEFAULT_OUTPUT,
+        help=f"Download directory (default: {DEFAULT_OUTPUT})",
+    )
+
     naming = parser.add_mutually_exclusive_group()
-    naming.add_argument("--titles", "-t", action="store_true",
-                        help="Use title-based filenames (saved as default for this directory)")
-    naming.add_argument("--original", action="store_true",
-                        help="Use original CloudFront filenames (saved as default for this directory)")
-    parser.add_argument("--reorganize", action="store_true",
-                        help="Rename existing files to match the current naming mode")
-    parser.add_argument("--dry-run", "-n", action="store_true",
-                        help="Preview without making changes")
-    parser.add_argument("--workers", "-w", type=int, default=DEFAULT_WORKERS,
-                        help=f"Concurrent downloads (default: {DEFAULT_WORKERS})")
+    naming.add_argument(
+        "--titles",
+        "-t",
+        action="store_true",
+        help="Use title-based filenames (saved as default for this directory)",
+    )
+    naming.add_argument(
+        "--original",
+        action="store_true",
+        help="Use original CloudFront filenames (saved as default for this directory)",
+    )
+
+    parser.add_argument(
+        "--reorganize",
+        action="store_true",
+        help="Rename existing files to match the current naming mode",
+    )
+    parser.add_argument(
+        "--dry-run", "-n", action="store_true", help="Preview without making changes"
+    )
+    parser.add_argument(
+        "--workers",
+        "-w",
+        type=int,
+        default=DEFAULT_WORKERS,
+        help=f"Concurrent downloads (default: {DEFAULT_WORKERS})",
+    )
     args = parser.parse_args()
 
     video_map = load_video_map()
@@ -287,7 +323,8 @@ def main():
     if mode_changed and not args.reorganize:
         print(f"\n[!] Mode changed from '{saved}' to '{mode}'.")
         print(
-            " Use --reorganize to rename existing files, or --dry-run to preview.")
+            " Use --reorganize to rename existing files, or --dry-run to preview."
+        )
         print(" Refusing to download until existing files are reorganized.")
         return
     reorganize(urls, video_map, args.output, mode, dry_run=args.dry_run)
@@ -303,7 +340,8 @@ def main():
     clashes = find_clashes(urls)
     if clashes:
         print(
-            f"[+] {len(clashes)} filename clash(es) resolved with subfolders/suffixes")
+            f"[+] {len(clashes)} filename clash(es) resolved with subfolders/suffixes"
+        )
 
     already = [u for u in urls if paths[u].exists()]
     pending = [u for u in urls if not paths[u].exists()]
@@ -316,8 +354,7 @@ def main():
         return
 
     if args.dry_run:
-        print(
-            f"\n[dry-run] Would download {len(pending)} files to {args.output}/")
+        print(f"\n[dry-run] Would download {len(pending)} files to {args.output}/")
         for url in pending[:20]:
             print(f" → {paths[url].name}")
         if len(pending) > 20:
@@ -330,8 +367,7 @@ def main():
 
     sized = {u: s for u, s in remote_sizes.items() if s is not None}
     total_bytes = sum(sized.values())
-    print(
-        f"[+] Download size: {fmt_size(total_bytes)} across {len(pending)} files")
+    print(f"[+] Download size: {fmt_size(total_bytes)} across {len(pending)} files")
 
     if already:
         print(f"[+] Verifying {len(already)} existing files…")
@@ -344,14 +380,15 @@ def main():
             remote = already_sizes.get(url)
             if remote and local != remote:
                 mismatched += 1
-                print(f"[!] Size mismatch: {dest.name} "
-                      f"(local {fmt_size(local)} vs remote {fmt_size(remote)})")
+                print(
+                    f"[!] Size mismatch: {dest.name} "
+                    f"(local {fmt_size(local)} vs remote {fmt_size(remote)})"
+                )
                 pending.append(url)
                 remote_sizes[url] = remote
 
     if mismatched:
-        print(
-            f"[!] {mismatched} file(s) will be re-downloaded due to size mismatch")
+        print(f"[!] {mismatched} file(s) will be re-downloaded due to size mismatch")
 
     print(f"\n[⚡] Downloading with {args.workers} threads…\n")
@@ -361,7 +398,7 @@ def main():
     total = len(pending)
     interrupted = False

-    def do_download(url):
+    def do_download(url: str) -> tuple[str, tuple[str, int]]:
         dest = paths[url]
         expected = remote_sizes.get(url)
         return url, download_one(session, url, dest, expected)
@@ -376,11 +413,9 @@ def main():
             name = paths[url].name

             if status == "ok" and written > 0:
-                print(
-                    f" [{completed}/{total}] ✓ {name} ({fmt_size(written)})")
+                print(f" [{completed}/{total}] ✓ {name} ({fmt_size(written)})")
             elif status == "ok":
-                print(
-                    f" [{completed}/{total}] ✓ {name} (already complete)")
+                print(f" [{completed}/{total}] ✓ {name} (already complete)")
             elif status == "size_mismatch":
                 print(f" [{completed}/{total}] ⚠ {name} (size mismatch)")
                 failed.append(url)
130
grab_cookie.py
@@ -1,53 +1,25 @@
 #!/usr/bin/env python3
 """
-grab_cookie.py — read the WordPress login cookie from an
-installed browser and write it to .env as WP_LOGIN_COOKIE=name=value.
+grab_cookie.py — log in to jailbirdz.com and write the session cookie to .env.
+
+Requires WP_USERNAME and WP_PASSWORD to be set in the environment or .env.

 Usage:
-    python grab_cookie.py                    # tries Firefox, Chrome, Edge, Brave
-    python grab_cookie.py --browser firefox  # explicit browser
+    python grab_cookie.py
 """

-import argparse
+import os
 from pathlib import Path
-from config import COOKIE_DOMAIN
+from typing import Literal
+
+import requests
+
+from config import BASE_URL

 ENV_FILE = Path(".env")
 ENV_KEY = "WP_LOGIN_COOKIE"
 COOKIE_PREFIX = "wordpress_logged_in_"

-BROWSER_NAMES = ["firefox", "chrome", "edge", "brave"]
-
-
-def find_cookie(browser_name):
-    """Return (name, value) for the wordpress_logged_in_* cookie, or (None, None)."""
-    try:
-        import rookiepy
-    except ImportError:
-        raise ImportError("rookiepy not installed — run: pip install rookiepy")
-
-    fn = getattr(rookiepy, browser_name, None)
-    if fn is None:
-        raise ValueError(f"rookiepy does not support '{browser_name}'.")
-
-    try:
-        cookies = fn([COOKIE_DOMAIN])
-    except PermissionError:
-        raise PermissionError(
-            f"Permission denied reading {browser_name} cookies.\n"
-            " Close the browser, or on Windows run as Administrator for Chrome/Edge."
-        )
-    except Exception as e:
-        raise RuntimeError(f"Could not read {browser_name} cookies: {e}")
-
-    for c in cookies:
-        if c.get("name", "").startswith(COOKIE_PREFIX):
-            return c["name"], c["value"]
-
-    return None, None
-
-
-def update_env(name, value):
+
+def update_env(name: str, value: str) -> Literal["updated", "appended", "created"]:
     """Write WP_LOGIN_COOKIE=name=value into .env, replacing any existing line."""
     new_line = f"{ENV_KEY}={name}={value}\n"

@@ -55,7 +27,8 @@ def update_env(name, value):
     text = ENV_FILE.read_text(encoding="utf-8")
     lines = text.splitlines(keepends=True)
     for i, line in enumerate(lines):
-        if line.startswith(f"{ENV_KEY}=") or line.strip() == ENV_KEY:
+        key, sep, _ = line.partition("=")
+        if key.strip() == ENV_KEY and sep:
             lines[i] = new_line
             ENV_FILE.write_text("".join(lines), encoding="utf-8")
             return "updated"
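The hunk above swaps prefix matching for `str.partition`, so a key that merely starts with `WP_LOGIN_COOKIE` no longer gets clobbered. A standalone sketch of that matching rule (`replace_env_line` and the sample lines are hypothetical, not from the repo):

```python
ENV_KEY = "WP_LOGIN_COOKIE"


def replace_env_line(lines: list[str], new_line: str) -> tuple[list[str], str]:
    """Replace the exact ENV_KEY line, mirroring update_env's new matching rule."""
    for i, line in enumerate(lines):
        key, sep, _ = line.partition("=")
        # Only an exact key followed by '=' matches — prefixes like
        # WP_LOGIN_COOKIE_OLD are left untouched.
        if key.strip() == ENV_KEY and sep:
            lines[i] = new_line
            return lines, "updated"
    return lines + [new_line], "appended"


lines = ["WP_LOGIN_COOKIE_OLD=stale\n", "WP_LOGIN_COOKIE=old\n"]
updated, action = replace_env_line(lines, f"{ENV_KEY}=wordpress_logged_in_abc=xyz\n")
print(action)                # prints "updated"
print(updated[0].rstrip())   # prints "WP_LOGIN_COOKIE_OLD=stale"
```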
@@ -69,46 +42,65 @@ def update_env(name, value):
     return "created"


-def main():
-    parser = argparse.ArgumentParser(
-        description=f"Copy the {COOKIE_DOMAIN} login cookie from your browser into .env."
-    )
-    parser.add_argument(
-        "--browser", "-b",
-        choices=BROWSER_NAMES,
-        metavar="BROWSER",
-        help=f"Browser to read from: {', '.join(BROWSER_NAMES)} (default: try all in order)",
-    )
-    args = parser.parse_args()
-
-    order = [args.browser] if args.browser else BROWSER_NAMES
-
-    cookie_name = cookie_value = None
-    for browser in order:
-        print(f"[…] Trying {browser}…")
-        try:
-            cookie_name, cookie_value = find_cookie(browser)
-        except ImportError as e:
-            raise SystemExit(f"[!] {e}")
-        except (ValueError, PermissionError, RuntimeError) as e:
-            print(f"[!] {e}")
-            continue
-
-        if cookie_name:
-            print(f"[+] Found in {browser}: {cookie_name}")
-            break
-        print(f" No {COOKIE_PREFIX}* cookie found in {browser}.")
-
-    if not cookie_name:
-        raise SystemExit(
-            f"\n[!] No {COOKIE_PREFIX}* cookie found in any browser.\n"
-            f" Make sure you are logged into {COOKIE_DOMAIN}, then re-run.\n"
-            " Or set WP_LOGIN_COOKIE manually in .env — see .env.example."
-        )
+def login_and_get_cookie(username: str, password: str) -> tuple[str, str]:
+    """POST to wp-admin/admin-ajax.php (xootix action) and return (cookie_name, cookie_value).
+
+    No browser needed — the xootix login endpoint takes plain form fields and returns
+    the wordpress_logged_in_* cookie directly in the response Set-Cookie headers.
+    """
+    session = requests.Session()
+    r = session.post(
+        f"{BASE_URL}/wp-admin/admin-ajax.php",
+        data={
+            "xoo-el-username": username,
+            "xoo-el-password": password,
+            "xoo-el-rememberme": "forever",
+            "_xoo_el_form": "login",
+            "xoo_el_redirect": "/",
+            "action": "xoo_el_form_action",
+            "display": "popup",
+        },
+        headers={
+            "Referer": f"{BASE_URL}/",
+            "Origin": BASE_URL,
+            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:147.0) Gecko/20100101 Firefox/147.0",
+        },
+        timeout=30,
+    )
+    r.raise_for_status()
+    result = r.json()
+    if result.get("error"):
+        raise RuntimeError(f"Login rejected by server: {result.get('notice', result)}")
+
+    for name, value in session.cookies.items():
+        if name.startswith(COOKIE_PREFIX):
+            return name, value
+
+    raise RuntimeError(
+        "Server accepted login but no wordpress_logged_in_* cookie was set.\n"
+        " Check that WP_USERNAME and WP_PASSWORD are correct."
+    )
+
+
+def _auto_login() -> None:
+    username = os.environ.get("WP_USERNAME", "").strip()
+    password = os.environ.get("WP_PASSWORD", "").strip()
+    if not username or not password:
+        raise SystemExit(
+            "[!] WP_USERNAME and WP_PASSWORD must be set in the environment or .env — see .env.example."
+        )
+    try:
+        cookie_name, cookie_value = login_and_get_cookie(username, password)
+    except RuntimeError as e:
+        raise SystemExit(f"[!] {e}")
+    print(f"[+] Login succeeded: {cookie_name}")
     action = update_env(cookie_name, cookie_value)
     print(f"[✓] {ENV_KEY} {action} in {ENV_FILE}.")


+def main() -> None:
+    _auto_login()
+
+
 if __name__ == "__main__":
     main()
340
main.py
@@ -7,27 +7,36 @@ import asyncio
 import tempfile
 import requests
 from pathlib import Path, PurePosixPath
+from typing import Any, Optional
 from urllib.parse import urlparse
 from dotenv import load_dotenv
-from playwright.async_api import async_playwright
-from check_clashes import VIDEO_EXTS
+from playwright.async_api import async_playwright, BrowserContext
+from check_clashes import VIDEO_EXTS, load_video_map, is_valid_url, VIDEO_MAP_FILE, expects_video
 from config import BASE_URL
+from grab_cookie import login_and_get_cookie, update_env

 load_dotenv()


-def _is_video_url(url):
+def _is_video_url(url: str) -> bool:
     """True if `url` ends with a recognised video extension (case-insensitive, path only)."""
     return PurePosixPath(urlparse(url).path).suffix.lower() in VIDEO_EXTS


 WP_API = f"{BASE_URL}/wp-json/wp/v2"

 SKIP_TYPES = {
-    "attachment", "nav_menu_item", "wp_block", "wp_template",
-    "wp_template_part", "wp_global_styles", "wp_navigation",
-    "wp_font_family", "wp_font_face",
+    "attachment",
+    "nav_menu_item",
+    "wp_block",
+    "wp_template",
+    "wp_template_part",
+    "wp_global_styles",
+    "wp_navigation",
+    "wp_font_family",
+    "wp_font_face",
 }

-VIDEO_MAP_FILE = "video_map.json"
 MAX_WORKERS = 4

 API_HEADERS = {
@@ -37,22 +46,53 @@ API_HEADERS = {
 }


-def _get_login_cookie():
-    raw = os.environ.get("WP_LOGIN_COOKIE", "").strip()  # strip accidental whitespace
-    if not raw:
-        raise RuntimeError(
-            "WP_LOGIN_COOKIE not set. Copy it from your browser into .env — see .env.example.")
-    name, _, value = raw.partition("=")
-    if not value:
-        raise RuntimeError(
-            "WP_LOGIN_COOKIE looks malformed (no '=' found). Expected: name=value")
-    if not name.startswith("wordpress_logged_in_"):
-        raise RuntimeError(
-            "WP_LOGIN_COOKIE doesn't look right — expected a wordpress_logged_in_... cookie.")
-    return name, value
+def _probe_cookie(name: str, value: str) -> bool:
+    """HEAD request to a members-only video page. Returns True if the cookie is still valid."""
+    video_map = load_video_map()
+    probe_url = next((url for url in video_map if expects_video(url)), None)
+    if probe_url is None:
+        return False  # no video URLs yet — can't validate, fall through to re-auth
+    r = requests.head(
+        probe_url,
+        headers={"Cookie": f"{name}={value}", "User-Agent": API_HEADERS["User-Agent"]},
+        allow_redirects=False,
+        timeout=10,
+    )
+    return r.status_code == 200
+
+
+def _get_login_cookie() -> tuple[str, str]:
+    username = os.environ.get("WP_USERNAME", "").strip()
+    password = os.environ.get("WP_PASSWORD", "").strip()
+    has_credentials = bool(username and password)
+
+    raw = os.environ.get("WP_LOGIN_COOKIE", "").strip()
+    if raw:
+        name, _, value = raw.partition("=")
+        if value and name.startswith("wordpress_logged_in_"):
+            if not has_credentials:
+                return name, value  # cookie-only mode — trust it
+            print("[+] Cookie found — validating…")
+            if _probe_cookie(name, value):
+                print("[✓] Cookie still valid — skipping login.")
+                return name, value
+            print("[!] Cookie expired — re-authenticating…")
+
+    if has_credentials:
+        cookie_name, cookie_value = login_and_get_cookie(username, password)
+        action = update_env(cookie_name, cookie_value)
+        print(f"[✓] Logged in: {cookie_name} ({action} in .env)")
+        return cookie_name, cookie_value
+
+    raise RuntimeError(
+        "No credentials or cookie found. Set either:\n"
+        " • WP_USERNAME + WP_PASSWORD (recommended — always gets a fresh cookie)\n"
+        " • WP_LOGIN_COOKIE (fallback — may expire mid-run)\n"
+        "See .env.example."
+    )


-def discover_content_types(session):
+def discover_content_types(session: requests.Session) -> list[tuple[str, str, str]]:
     """Query /wp-json/wp/v2/types and return a list of (name, rest_base, type_slug) for content types worth scraping."""
     r = session.get(f"{WP_API}/types", timeout=30)
     r.raise_for_status()
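The rewritten `_get_login_cookie` above establishes a fallback order: a stored cookie is trusted outright only when there are no credentials to re-auth with; otherwise it is probed first, and a fresh login is the last resort. A network-free sketch of that decision order (`choose_auth_path` is hypothetical, not the repo function):

```python
def choose_auth_path(has_credentials: bool, has_cookie: bool, probe_ok: bool) -> str:
    """Mirror the fallback order of _get_login_cookie (sketch only)."""
    if has_cookie:
        if not has_credentials:
            return "trust-cookie"      # cookie-only mode — nothing to re-auth with
        if probe_ok:
            return "cookie-validated"  # HEAD probe returned HTTP 200
    if has_credentials:
        return "fresh-login"           # POST credentials, rewrite .env
    return "error"                     # neither cookie nor credentials


print(choose_auth_path(True, True, False))   # prints "fresh-login"
print(choose_auth_path(False, True, False))  # prints "trust-cookie"
```

Note that with `allow_redirects=False` in the real probe, an expired cookie shows up as a 302 to the login page rather than a members-only 200, which is why only 200 counts as valid.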
@@ -69,7 +109,12 @@ def discover_content_types(session):
     return targets


-def fetch_all_posts_for_type(session, type_name, rest_base, type_slug):
+def fetch_all_posts_for_type(
+    session: requests.Session,
+    type_name: str,
+    rest_base: str,
+    type_slug: str,
+) -> list[tuple[str, str, str]]:
     """Paginate one content type and return (url, title, description) tuples.
     Uses the `link` field when available; falls back to building from slug."""
     url_prefix = type_slug.replace("_", "-")
@@ -96,11 +141,15 @@ def fetch_all_posts_for_type(session, type_name, rest_base, type_slug):
         else:
             continue
         title_obj = post.get("title", {})
-        title = title_obj.get("rendered", "") if isinstance(
-            title_obj, dict) else str(title_obj)
+        title = (
+            title_obj.get("rendered", "")
+            if isinstance(title_obj, dict)
+            else str(title_obj)
+        )
         content_obj = post.get("content", {})
-        content_html = content_obj.get(
-            "rendered", "") if isinstance(content_obj, dict) else ""
+        content_html = (
+            content_obj.get("rendered", "") if isinstance(content_obj, dict) else ""
+        )
         description = html_to_text(content_html) if content_html else ""
         results.append((link, title, description))
         print(f" {type_name} page {page}: {len(data)} items")
@@ -109,21 +158,25 @@ def fetch_all_posts_for_type(session, type_name, rest_base, type_slug):
     return results


-def fetch_post_urls_from_api(headers):
+def fetch_post_urls_from_api(headers: dict[str, str]) -> list[str]:
     """Auto-discover all content types via the WP REST API and collect every post URL.
     Also builds video_map.json with titles pre-populated."""
-    print("[+] video_map.json empty or missing — discovering content types from REST API…")
+    print(
+        "[+] video_map.json empty or missing — discovering content types from REST API…"
+    )
     session = requests.Session()
     session.headers.update(headers)

     targets = discover_content_types(session)
     print(
-        f"[+] Found {len(targets)} content types: {', '.join(name for name, _, _ in targets)}\n")
+        f"[+] Found {len(targets)} content types: {', '.join(name for name, _, _ in targets)}\n"
+    )

     all_results = []
     for type_name, rest_base, type_slug in targets:
         type_results = fetch_all_posts_for_type(
-            session, type_name, rest_base, type_slug)
+            session, type_name, rest_base, type_slug
+        )
         all_results.extend(type_results)

     seen = set()
@@ -135,8 +188,11 @@ def fetch_post_urls_from_api(headers):
             seen.add(url)
             deduped_urls.append(url)
             if url not in video_map:
-                video_map[url] = {"title": title,
-                                  "description": description, "videos": []}
+                video_map[url] = {
+                    "title": title,
+                    "description": description,
+                    "videos": [],
+                }
             else:
                 if not video_map[url].get("title"):
                     video_map[url]["title"] = title
@@ -145,18 +201,25 @@ def fetch_post_urls_from_api(headers):

     save_video_map(video_map)
     print(
-        f"\n[+] Discovered {len(deduped_urls)} unique URLs → saved to {VIDEO_MAP_FILE}")
-    print(
-        f"[+] Pre-populated {len(video_map)} entries in {VIDEO_MAP_FILE}")
+        f"\n[+] Discovered {len(deduped_urls)} unique URLs → saved to {VIDEO_MAP_FILE}"
+    )
+    print(f"[+] Pre-populated {len(video_map)} entries in {VIDEO_MAP_FILE}")
     return deduped_urls


-def fetch_metadata_from_api(video_map, urls, headers):
+def fetch_metadata_from_api(
+    video_map: dict[str, Any],
+    urls: list[str],
+    headers: dict[str, str],
+) -> None:
     """Populate missing titles and descriptions in video_map from the REST API."""
-    missing = [u for u in urls
-               if u not in video_map
-               or not video_map[u].get("title")
-               or not video_map[u].get("description")]
+    missing = [
+        u
+        for u in urls
+        if u not in video_map
+        or not video_map[u].get("title")
+        or not video_map[u].get("description")
+    ]
     if not missing:
         return

@@ -168,7 +231,8 @@ def fetch_metadata_from_api(video_map, urls, headers):

     for type_name, rest_base, type_slug in targets:
         type_results = fetch_all_posts_for_type(
-            session, type_name, rest_base, type_slug)
+            session, type_name, rest_base, type_slug
+        )
         for url, title, description in type_results:
             if url in video_map:
                 if not video_map[url].get("title"):
@@ -176,18 +240,20 @@ def fetch_metadata_from_api(video_map, urls, headers):
                 if not video_map[url].get("description"):
                     video_map[url]["description"] = description
             else:
-                video_map[url] = {"title": title,
-                                  "description": description, "videos": []}
+                video_map[url] = {
+                    "title": title,
+                    "description": description,
+                    "videos": [],
+                }

     save_video_map(video_map)
     populated_t = sum(1 for u in urls if video_map.get(u, {}).get("title"))
-    populated_d = sum(1 for u in urls if video_map.get(
-        u, {}).get("description"))
+    populated_d = sum(1 for u in urls if video_map.get(u, {}).get("description"))
     print(f"[+] Titles populated: {populated_t}/{len(urls)}")
     print(f"[+] Descriptions populated: {populated_d}/{len(urls)}")


-def load_post_urls(headers):
+def load_post_urls(headers: dict[str, str]) -> list[str]:
     vm = load_video_map()
     if vm:
         print(f"[+] {VIDEO_MAP_FILE} found — loading {len(vm)} post URLs.")
@@ -195,48 +261,40 @@ def load_post_urls(headers):
     return fetch_post_urls_from_api(headers)


-def html_to_text(html_str):
+def html_to_text(html_str: str) -> str:
     """Strip HTML tags, decode entities, and collapse whitespace into clean plain text."""
     import html
-    text = re.sub(r'<br\s*/?>', '\n', html_str)
-    text = text.replace('</p>', '\n\n')
-    text = re.sub(r'<[^>]+>', '', text)
+
+    text = re.sub(r"<br\s*/?>", "\n", html_str)
+    text = text.replace("</p>", "\n\n")
+    text = re.sub(r"<[^>]+>", "", text)
     text = html.unescape(text)
     lines = [line.strip() for line in text.splitlines()]
-    text = '\n'.join(lines)
-    text = re.sub(r'\n{3,}', '\n\n', text)
+    text = "\n".join(lines)
+    text = re.sub(r"\n{3,}", "\n\n", text)
     return text.strip()


-def extract_mp4_from_html(html):
+def extract_mp4_from_html(html: str) -> list[str]:
     candidates = re.findall(r'https?://[^\s"\'<>]+', html)
     return [u for u in candidates if _is_video_url(u)]


-def extract_title_from_html(html):
-    m = re.search(
-        r'<h1[^>]*class="entry-title"[^>]*>(.*?)</h1>', html, re.DOTALL)
+def extract_title_from_html(html: str) -> Optional[str]:
+    m = re.search(r'<h1[^>]*class="entry-title"[^>]*>(.*?)</h1>', html, re.DOTALL)
     if m:
-        title = re.sub(r'<[^>]+>', '', m.group(1)).strip()
+        title = re.sub(r"<[^>]+>", "", m.group(1)).strip()
         return title
-    m = re.search(r'<title>(.*?)(?:\s*[-–|].*)?</title>', html, re.DOTALL)
+    m = re.search(r"<title>(.*?)(?:\s*[-–|].*)?</title>", html, re.DOTALL)
     if m:
         return m.group(1).strip()
     return None


-def load_video_map():
-    if Path(VIDEO_MAP_FILE).exists():
-        try:
-            with open(VIDEO_MAP_FILE, encoding="utf-8") as f:
-                return json.load(f)
-        except (json.JSONDecodeError, OSError):
-            return {}
-    return {}
-
-
-def save_video_map(video_map):
-    fd, tmp_path = tempfile.mkstemp(dir=Path(VIDEO_MAP_FILE).resolve().parent, suffix=".tmp")
+def save_video_map(video_map: dict[str, Any]) -> None:
+    fd, tmp_path = tempfile.mkstemp(
+        dir=Path(VIDEO_MAP_FILE).resolve().parent, suffix=".tmp"
+    )
     try:
         with os.fdopen(fd, "w", encoding="utf-8") as f:
             json.dump(video_map, f, indent=2, ensure_ascii=False)
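The `html_to_text` hunk above only changes quote style, so its behavior is easy to check in isolation. This standalone copy reproduces the function as it appears after the change, to illustrate the tag stripping and whitespace collapsing:

```python
import html
import re


def html_to_text(html_str: str) -> str:
    """Strip HTML tags, decode entities, and collapse whitespace into clean plain text."""
    text = re.sub(r"<br\s*/?>", "\n", html_str)   # <br> variants become newlines
    text = text.replace("</p>", "\n\n")           # paragraph ends become blank lines
    text = re.sub(r"<[^>]+>", "", text)           # drop all remaining tags
    text = html.unescape(text)                    # &amp; → &, etc.
    lines = [line.strip() for line in text.splitlines()]
    text = "\n".join(lines)
    text = re.sub(r"\n{3,}", "\n\n", text)        # cap runs of blank lines at one
    return text.strip()


print(html_to_text("<p>Fish &amp; chips<br/>second line</p>"))
```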
@@ -250,19 +308,30 @@ def save_video_map(video_map):


-def _expects_video(url):
-    return "/pinkcuffs-videos/" in url
-
-
 MAX_RETRIES = 2


-async def worker(worker_id, queue, context, known,
-                 total, retry_counts, video_map, map_lock, shutdown_event):
+async def worker(
+    worker_id: int,
+    queue: asyncio.Queue[tuple[int, str]],
+    context: BrowserContext,
+    known: set[str],
+    total: int,
+    retry_counts: dict[int, int],
+    video_map: dict[str, Any],
+    map_lock: asyncio.Lock,
+    shutdown_event: asyncio.Event,
+    reauth_lock: asyncio.Lock,
+    reauth_done: list[bool],
+    cookie_domain: str,
+) -> None:
     page = await context.new_page()
     video_hits = set()

-    page.on("response", lambda resp: video_hits.add(resp.url) if _is_video_url(resp.url) else None)
+    page.on(
+        "response",
+        lambda resp: video_hits.add(resp.url) if _is_video_url(resp.url) else None,
+    )

     try:
         while not shutdown_event.is_set():
@@ -279,11 +348,11 @@ async def worker(worker_id, queue, context, known,
                 await page.goto(url, wait_until="networkidle", timeout=60000)
             except Exception as e:
                 print(f"[W{worker_id}] Navigation error: {e}")
-                if _expects_video(url) and attempt < MAX_RETRIES:
+                if expects_video(url) and attempt < MAX_RETRIES:
                     retry_counts[idx] = attempt + 1
                     queue.put_nowait((idx, url))
                     print(f"[W{worker_id}] Re-queued for retry.")
-                elif not _expects_video(url):
+                elif not expects_video(url):
                     async with map_lock:
                         entry = video_map.get(url, {})
                         entry["scraped_at"] = int(time.time())
@@ -291,7 +360,48 @@ async def worker(worker_id, queue, context, known,
                         save_video_map(video_map)
                 else:
                     print(
-                        f"[W{worker_id}] Still failing after {MAX_RETRIES} retries — will retry next run.")
+                        f"[W{worker_id}] Still failing after {MAX_RETRIES} retries — will retry next run."
+                    )
                 continue

+            if "NoDirectAccessAllowed" in page.url:
+                recovered = False
+                async with reauth_lock:
+                    if not reauth_done[0]:
+                        username = os.environ.get("WP_USERNAME", "").strip()
+                        password = os.environ.get("WP_PASSWORD", "").strip()
+                        if username and password:
+                            print(f"[W{worker_id}] Cookie expired — re-authenticating…")
+                            try:
+                                new_name, new_value = await asyncio.to_thread(
+                                    login_and_get_cookie, username, password
+                                )
+                                update_env(new_name, new_value)
+                                await context.add_cookies([{
+                                    "name": new_name,
+                                    "value": new_value,
+                                    "domain": cookie_domain,
+                                    "path": "/",
+                                    "httpOnly": True,
+                                    "secure": True,
+                                    "sameSite": "None",
+                                }])
+                                reauth_done[0] = True
+                                recovered = True
+                                print(f"[W{worker_id}] Re-auth succeeded — re-queuing.")
+                            except Exception as e:
+                                print(f"[W{worker_id}] Re-auth failed: {e}")
+                                shutdown_event.set()
+                        else:
+                            print(
+                                f"[W{worker_id}] Cookie expired. "
+                                "Set WP_USERNAME + WP_PASSWORD in .env for auto re-auth."
+                            )
+                            shutdown_event.set()
+                    else:
+                        recovered = True  # another worker already re-authed
+                if recovered:
+                    queue.put_nowait((idx, url))
+                continue
+
             await asyncio.sleep(1.5)
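The re-auth block above uses a lock plus a shared one-element list as a do-once flag, so only the first worker that hits `NoDirectAccessAllowed` performs the login; the rest see the flag already set and simply re-queue their page. A minimal network-free sketch of that single-flight pattern (the stand-in names are hypothetical):

```python
import asyncio


async def ensure_reauth(lock: asyncio.Lock, done: list[bool], logins: list[int]) -> bool:
    # First caller through the lock performs the (stand-in) login; every other
    # caller finds done[0] already flipped and skips straight to re-queuing.
    async with lock:
        if not done[0]:
            logins.append(1)  # stands in for login_and_get_cookie(...)
            done[0] = True
    return done[0]


async def demo() -> int:
    lock, done, logins = asyncio.Lock(), [False], []
    await asyncio.gather(*(ensure_reauth(lock, done, logins) for _ in range(5)))
    return len(logins)


login_count = asyncio.run(demo())
print(login_count)  # prints 1 — five workers, one login
```

The mutable `list[bool]` is just a cheap shared cell that all workers can see and mutate; a small dataclass or `asyncio.Event` would work equally well.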
@@ -301,9 +411,15 @@ async def worker(worker_id, queue, context, known,
             found = set(html_videos) | set(video_hits)
             video_hits.clear()

-            all_videos = [m for m in found if m not in (
-                f"{BASE_URL}/wp-content/plugins/easy-video-player/lib/blank.mp4",
-            )]
+            all_videos = [
+                m
+                for m in found
+                if is_valid_url(m)
+                and m
+                not in (
+                    f"{BASE_URL}/wp-content/plugins/easy-video-player/lib/blank.mp4",
+                )
+            ]

             async with map_lock:
                 new_found = found - known
@@ -312,7 +428,8 @@ async def worker(worker_id, queue, context, known,
                     known.update(new_found)
                 elif all_videos:
                     print(
-                        f"[W{worker_id}] {len(all_videos)} video(s) already known — skipping write.")
+                        f"[W{worker_id}] {len(all_videos)} video(s) already known — skipping write."
+                    )
                 else:
                     print(f"[W{worker_id}] No video found on page.")

@@ -322,7 +439,7 @@ async def worker(worker_id, queue, context, known,
                 existing_videos = set(entry.get("videos", []))
                 existing_videos.update(all_videos)
                 entry["videos"] = sorted(existing_videos)
-                mark_done = bool(all_videos) or not _expects_video(url)
+                mark_done = bool(all_videos) or not expects_video(url)
                 if mark_done:
                     entry["scraped_at"] = int(time.time())
                 video_map[url] = entry
@@ -333,19 +450,21 @@ async def worker(worker_id, queue, context, known,
                 retry_counts[idx] = attempt + 1
                 queue.put_nowait((idx, url))
                 print(
-                    f"[W{worker_id}] Re-queued for retry ({attempt + 1}/{MAX_RETRIES}).")
+                    f"[W{worker_id}] Re-queued for retry ({attempt + 1}/{MAX_RETRIES})."
+                )
             else:
                 print(
-                    f"[W{worker_id}] No video after {MAX_RETRIES} retries — will retry next run.")
+                    f"[W{worker_id}] No video after {MAX_RETRIES} retries — will retry next run."
+                )
     finally:
         await page.close()


-async def run():
+async def run() -> None:
     shutdown_event = asyncio.Event()
     loop = asyncio.get_running_loop()

-    def _handle_shutdown(signum, _frame):
+    def _handle_shutdown(signum: int, _frame: object) -> None:
         print(f"\n[!] Signal {signum} received — finishing active pages then exiting…")
         loop.call_soon_threadsafe(shutdown_event.set)

@@ -362,10 +481,13 @@ async def run():
|
|||||||
urls = load_post_urls(req_headers)
|
urls = load_post_urls(req_headers)
|
||||||
|
|
||||||
video_map = load_video_map()
|
video_map = load_video_map()
|
||||||
if any(u not in video_map
|
if any(
|
||||||
|
u not in video_map
|
||||||
or not video_map[u].get("title")
|
or not video_map[u].get("title")
|
||||||
or not video_map[u].get("description")
|
or not video_map[u].get("description")
|
||||||
for u in urls if _expects_video(u)):
|
for u in urls
|
||||||
|
if expects_video(u)
|
||||||
|
):
|
||||||
fetch_metadata_from_api(video_map, urls, req_headers)
|
fetch_metadata_from_api(video_map, urls, req_headers)
|
||||||
|
|
||||||
known = {u for entry in video_map.values() for u in entry.get("videos", [])}
|
known = {u for entry in video_map.values() for u in entry.get("videos", [])}
|
||||||
@@ -377,7 +499,7 @@ async def run():
|
|||||||
entry = video_map.get(u, {})
|
entry = video_map.get(u, {})
|
||||||
if not entry.get("scraped_at"):
|
if not entry.get("scraped_at"):
|
||||||
pending.append((i, u))
|
pending.append((i, u))
|
||||||
elif _expects_video(u) and not entry.get("videos"):
|
elif expects_video(u) and not entry.get("videos"):
|
||||||
pending.append((i, u))
|
pending.append((i, u))
|
||||||
needs_map += 1
|
needs_map += 1
|
||||||
|
|
||||||
@@ -388,26 +510,31 @@ async def run():
     if done_count:
         remaining_new = len(pending) - needs_map
         print(
-            f"[↻] Resuming: {done_count} done, {remaining_new} new + {needs_map} needing map data.")
+            f"[↻] Resuming: {done_count} done, {remaining_new} new + {needs_map} needing map data."
+        )
     if not pending:
         print("[✓] All URLs already processed and mapped.")
         return

     print(
-        f"[⚡] Running with {min(MAX_WORKERS, len(pending))} concurrent workers.\n")
+        f"[⚡] Running with {min(MAX_WORKERS, len(pending))} concurrent workers.\n"
+    )

-    queue = asyncio.Queue()
+    queue: asyncio.Queue[tuple[int, str]] = asyncio.Queue()
     for item in pending:
         queue.put_nowait(item)

     map_lock = asyncio.Lock()
-    retry_counts = {}
+    reauth_lock = asyncio.Lock()
+    reauth_done: list[bool] = [False]
+    retry_counts: dict[int, int] = {}

     async with async_playwright() as p:
         browser = await p.firefox.launch(headless=True)
         context = await browser.new_context()

-        _cookie_domain = urlparse(BASE_URL).netloc
+        _parsed = urlparse(BASE_URL)
+        _cookie_domain = _parsed.hostname or _parsed.netloc
         site_cookies = [
             {
                 "name": cookie_name,
@@ -416,23 +543,35 @@ async def run():
                 "path": "/",
                 "httpOnly": True,
                 "secure": True,
-                "sameSite": "None"
+                "sameSite": "None",
             },
             {
                 "name": "eav-age-verified",
                 "value": "1",
                 "domain": _cookie_domain,
-                "path": "/"
-            }
+                "path": "/",
+            },
         ]

-        await context.add_cookies(site_cookies)
+        await context.add_cookies(site_cookies)  # type: ignore[arg-type]

         num_workers = min(MAX_WORKERS, len(pending))
         workers = [
             asyncio.create_task(
-                worker(i, queue, context, known,
-                       total, retry_counts, video_map, map_lock, shutdown_event)
+                worker(
+                    i,
+                    queue,
+                    context,
+                    known,
+                    total,
+                    retry_counts,
+                    video_map,
+                    map_lock,
+                    shutdown_event,
+                    reauth_lock,
+                    reauth_done,
+                    _cookie_domain,
+                )
             )
             for i in range(num_workers)
         ]
@@ -442,7 +581,8 @@ async def run():

     mapped = sum(1 for v in video_map.values() if v.get("videos"))
     print(
-        f"\n[+] Video map: {mapped} posts with videos, {len(video_map)} total entries.")
+        f"\n[+] Video map: {mapped} posts with videos, {len(video_map)} total entries."
+    )

     if not shutdown_event.is_set():
         print(f"[✓] Completed. Full map in {VIDEO_MAP_FILE}")
@@ -454,7 +594,7 @@ async def run():
     signal.signal(signal.SIGTERM, signal.SIG_DFL)


-def main():
+def main() -> None:
     try:
         asyncio.run(run())
     except KeyboardInterrupt:
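The `_cookie_domain` change above matters when `BASE_URL` carries an explicit port: `urlparse(...).netloc` keeps the `host:port` pair, which is not a valid cookie domain, while `.hostname` yields the bare host. A minimal sketch of the new behaviour (the URL here is illustrative):

```python
from urllib.parse import urlparse

def cookie_domain(base_url: str) -> str:
    """Prefer the bare hostname; fall back to netloc for unusual URLs."""
    parsed = urlparse(base_url)
    return parsed.hostname or parsed.netloc

# netloc would keep the port and break cookie domain matching:
print(cookie_domain("https://example.com:8443/shop"))            # example.com
print(urlparse("https://example.com:8443/shop").netloc)          # example.com:8443
```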
@@ -1,4 +1,3 @@
 playwright==1.58.0
 python-dotenv==1.2.1
 Requests==2.32.5
-rookiepy==0.5.6
@@ -4,16 +4,35 @@ Importable function:
 summarize_sizes(sizes) - return dict with total, smallest, largest, average, failed
 """

+from typing import Optional, TypedDict
+
 from check_clashes import fmt_size, fetch_sizes, load_video_map, VIDEO_MAP_FILE


-def summarize_sizes(sizes):
+class SizeStats(TypedDict):
+    sized: int
+    total: int
+    total_bytes: int
+    smallest: int
+    largest: int
+    average: int
+    failed: list[str]
+
+
+def summarize_sizes(sizes: dict[str, Optional[int]]) -> SizeStats:
     """Given {url: size_or_None}, return a stats dict."""
     known = {u: s for u, s in sizes.items() if s is not None}
     failed = [u for u, s in sizes.items() if s is None]
     if not known:
-        return {"sized": 0, "total": len(sizes), "total_bytes": 0,
-                "smallest": 0, "largest": 0, "average": 0, "failed": failed}
+        return {
+            "sized": 0,
+            "total": len(sizes),
+            "total_bytes": 0,
+            "smallest": 0,
+            "largest": 0,
+            "average": 0,
+            "failed": failed,
+        }
     total_bytes = sum(known.values())
     return {
         "sized": len(known),
@@ -28,14 +47,20 @@ def summarize_sizes(sizes):

 # --------------- CLI ---------------

-def _progress(done, total):
+
+def _progress(done: int, total: int) -> None:
     if done % 200 == 0 or done == total:
         print(f"  {done}/{total}")


-def main():
+def main() -> None:
     vm = load_video_map()
-    urls = [u for entry in vm.values() for u in entry.get("videos", []) if u.startswith("http")]
+    urls: list[str] = [
+        u
+        for entry in vm.values()
+        for u in entry.get("videos", [])
+        if u.startswith("http")
+    ]

     print(f"[+] {len(urls)} URLs in {VIDEO_MAP_FILE}")
     print("[+] Fetching file sizes (20 threads)…\n")
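The `SizeStats` TypedDict above lets a type checker verify every return path without changing behaviour. A standalone sketch of the same shape (re-implemented here, not imported from the repo; the fields past `sized` in the non-empty branch are cut off in the diff, so their computation below is an assumption consistent with the docstring):

```python
from typing import Optional, TypedDict

class SizeStats(TypedDict):
    sized: int
    total: int
    total_bytes: int
    smallest: int
    largest: int
    average: int
    failed: list[str]

def summarize_sizes(sizes: dict[str, Optional[int]]) -> SizeStats:
    """Given {url: size_or_None}, return aggregate stats; None marks a failed fetch."""
    known = {u: s for u, s in sizes.items() if s is not None}
    failed = [u for u, s in sizes.items() if s is None]
    if not known:
        return {"sized": 0, "total": len(sizes), "total_bytes": 0,
                "smallest": 0, "largest": 0, "average": 0, "failed": failed}
    total_bytes = sum(known.values())
    return {
        "sized": len(known),
        "total": len(sizes),
        "total_bytes": total_bytes,
        "smallest": min(known.values()),   # assumed from the docstring
        "largest": max(known.values()),    # assumed from the docstring
        "average": total_bytes // len(known),  # assumed: integer mean
        "failed": failed,
    }

stats = summarize_sizes({"a.mp4": 100, "b.mp4": 300, "c.mp4": None})
print(stats["total_bytes"], stats["average"], stats["failed"])  # 400 200 ['c.mp4']
```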
289
upload.py
@@ -26,13 +26,13 @@ from pathlib import Path
 import re
 import sys
 import time
+from typing import Any, Callable, cast

 import requests
 from dotenv import load_dotenv

-from check_clashes import fmt_size, url_to_filename, VIDEO_EXTS
+from check_clashes import fmt_size, url_to_filename, VIDEO_EXTS, load_video_map
 from download import (
-    load_video_map,
     collect_urls,
     get_paths_for_mode,
     read_mode,
@@ -52,21 +52,21 @@ PT_NAME_MAX = 120

 # ── Text helpers ─────────────────────────────────────────────────────

-def clean_description(raw):
+
+def clean_description(raw: str) -> str:
     """Strip WordPress shortcodes and HTML from a description."""
     if not raw:
         return ""
-    text = re.sub(r'\[/?[^\]]+\]', '', raw)
-    text = re.sub(r'<[^>]+>', '', text)
+    text = re.sub(r"\[/?[^\]]+\]", "", raw)
+    text = re.sub(r"<[^>]+>", "", text)
     text = html.unescape(text)
-    text = re.sub(r'\n{3,}', '\n\n', text).strip()
+    text = re.sub(r"\n{3,}", "\n\n", text).strip()
     return text[:10000]


-def make_pt_name(title, fallback_filename):
+def make_pt_name(title: str, fallback_filename: str) -> str:
     """Build a PeerTube-safe video name (3-120 chars)."""
-    name = html.unescape(title).strip(
-    ) if title else Path(fallback_filename).stem
+    name = html.unescape(title).strip() if title else Path(fallback_filename).stem
     if len(name) > PT_NAME_MAX:
         name = name[: PT_NAME_MAX - 1].rstrip() + "\u2026"
     while len(name) < 3:
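PeerTube rejects video names outside 3 to 120 characters, so `make_pt_name` clamps both ends. A self-contained sketch of the same rule (the `while len(name) < 3` body is not visible in the diff, so the underscore padding here is an assumption):

```python
import html
from pathlib import Path

PT_NAME_MAX = 120  # PeerTube's upper bound for video names

def make_pt_name(title: str, fallback_filename: str) -> str:
    """Build a PeerTube-safe video name (3-120 chars)."""
    name = html.unescape(title).strip() if title else Path(fallback_filename).stem
    if len(name) > PT_NAME_MAX:
        name = name[: PT_NAME_MAX - 1].rstrip() + "\u2026"  # trim, append ellipsis
    while len(name) < 3:
        name += "_"  # assumed padding up to the 3-char minimum
    return name

print(make_pt_name("", "clip.mp4"))           # clip
print(len(make_pt_name("x" * 200, "f.mp4")))  # 120
```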
@@ -76,7 +76,8 @@ def make_pt_name(title, fallback_filename):

 # ── PeerTube API ─────────────────────────────────────────────────────

-def get_oauth_token(base, username, password):
+
+def get_oauth_token(base: str, username: str, password: str) -> str:
     r = requests.get(f"{base}/api/v1/oauth-clients/local", timeout=15)
     r.raise_for_status()
     client = r.json()
@@ -93,26 +94,36 @@ def get_oauth_token(base, username, password):
         timeout=15,
     )
     r.raise_for_status()
-    return r.json()["access_token"]
+    data_any: Any = r.json()
+    data = cast(dict[str, Any], data_any)
+    token = data.get("access_token")
+    if not isinstance(token, str) or not token:
+        raise RuntimeError("PeerTube token response missing access_token")
+    return token


-def api_headers(token):
+def api_headers(token: str) -> dict[str, str]:
     return {"Authorization": f"Bearer {token}"}


-def get_channel_id(base, token, channel_name):
+def get_channel_id(base: str, token: str, channel_name: str) -> int:
     r = requests.get(
         f"{base}/api/v1/video-channels/{channel_name}",
         headers=api_headers(token),
         timeout=15,
     )
     r.raise_for_status()
-    return r.json()["id"]
+    data_any: Any = r.json()
+    data = cast(dict[str, Any], data_any)
+    cid = data.get("id")
+    if not isinstance(cid, int):
+        raise RuntimeError("PeerTube channel response missing id")
+    return cid


-def get_channel_video_names(base, token, channel_name):
+def get_channel_video_names(base: str, token: str, channel_name: str) -> Counter[str]:
     """Paginate through the channel and return a Counter of video names."""
-    counts = Counter()
+    counts: Counter[str] = Counter()
     start = 0
     while True:
         r = requests.get(
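The `access_token` and channel-`id` changes above replace bare subscripting with checked extraction, so a malformed API response raises a clear `RuntimeError` instead of a `KeyError`. The pattern in isolation (no network; the payloads and helper name are mine, not the repo's):

```python
from typing import Any, cast

def require_str(payload: Any, key: str, what: str) -> str:
    """Extract a required non-empty string field from an untyped JSON payload."""
    data = cast(dict[str, Any], payload)
    value = data.get(key)
    if not isinstance(value, str) or not value:
        raise RuntimeError(f"{what} response missing {key}")
    return value

print(require_str({"access_token": "abc123"}, "access_token", "PeerTube token"))  # abc123
try:
    require_str({}, "access_token", "PeerTube token")
except RuntimeError as e:
    print(e)  # PeerTube token response missing access_token
```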
@@ -135,8 +146,16 @@ CHUNK_SIZE = 10 * 1024 * 1024  # 10 MB
 MAX_RETRIES = 5


-def _init_resumable(base, token, channel_id, filepath, filename, name,
-                    description="", nsfw=False):
+def _init_resumable(
+    base: str,
+    token: str,
+    channel_id: int,
+    filepath: Path,
+    filename: str,
+    name: str,
+    description: str = "",
+    nsfw: bool = False,
+) -> tuple[str, int]:
     """POST to create a resumable upload session. Returns upload URL."""
     file_size = Path(filepath).stat().st_size
     metadata = {
@@ -171,7 +190,7 @@ def _init_resumable(base, token, channel_id, filepath, filename, name,
     return location, file_size


-def _query_offset(upload_url, token, file_size):
+def _query_offset(upload_url: str, token: str, file_size: int) -> int:
     """Ask the server how many bytes it has received so far."""
     r = requests.put(
         upload_url,
@@ -193,8 +212,15 @@ def _query_offset(upload_url, token, file_size)
     return 0


-def upload_video(base, token, channel_id, filepath, name,
-                 description="", nsfw=False):
+def upload_video(
+    base: str,
+    token: str,
+    channel_id: int,
+    filepath: Path,
+    name: str,
+    description: str = "",
+    nsfw: bool = False,
+) -> tuple[bool, str | None]:
     """Resumable chunked upload. Returns (ok, uuid)."""
     filepath = Path(filepath)
     filename = filepath.name
@@ -202,8 +228,14 @@ def upload_video(base, token, channel_id, filepath, name,

     try:
         upload_url, _ = _init_resumable(
-            base, token, channel_id, filepath, filename,
-            name, description, nsfw,
+            base,
+            token,
+            channel_id,
+            filepath,
+            filename,
+            name,
+            description,
+            nsfw,
         )
     except Exception as e:
         print(f"  Init failed: {e}")
@@ -221,8 +253,11 @@ def upload_video(base, token, channel_id, filepath, name,
             chunk = f.read(chunk_len)

             pct = int(100 * (end + 1) / file_size)
-            print(f"  {fmt_size(offset)}/{fmt_size(file_size)} ({pct}%)",
-                  end="\r", flush=True)
+            print(
+                f"  {fmt_size(offset)}/{fmt_size(file_size)} ({pct}%)",
+                end="\r",
+                flush=True,
+            )

             try:
                 r = requests.put(
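The chunked PUT loop above slices the file into `CHUNK_SIZE` pieces addressed by byte offsets, which is easy to check without a server. A sketch of the chunk arithmetic (the `bytes start-end/total` Content-Range shape is an assumption about the resumable scheme; the helper is mine, not from the repo):

```python
CHUNK_SIZE = 10 * 1024 * 1024  # 10 MB, matching the constant above

def chunk_ranges(file_size: int, chunk_size: int = CHUNK_SIZE) -> list[tuple[int, int, str]]:
    """Return (start, end, Content-Range header) triples covering the file."""
    out = []
    offset = 0
    while offset < file_size:
        end = min(offset + chunk_size, file_size) - 1  # inclusive end byte
        out.append((offset, end, f"bytes {offset}-{end}/{file_size}"))
        offset = end + 1
    return out

for _, _, header in chunk_ranges(25, chunk_size=10):
    print(header)
# bytes 0-9/25
# bytes 10-19/25
# bytes 20-24/25
```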
@@ -239,12 +274,13 @@ def upload_video(base, token, channel_id, filepath, name,
             except (requests.ConnectionError, requests.Timeout) as e:
                 retries += 1
                 if retries > MAX_RETRIES:
-                    print(
-                        f"\n  Upload failed after {MAX_RETRIES} retries: {e}")
+                    print(f"\n  Upload failed after {MAX_RETRIES} retries: {e}")
                     return False, None
-                wait = min(2 ** retries, 60)
-                print(f"\n  Connection error, retry {retries}/{MAX_RETRIES} "
-                      f"in {wait}s ...")
+                wait = min(2**retries, 60)
+                print(
+                    f"\n  Connection error, retry {retries}/{MAX_RETRIES} "
+                    f"in {wait}s ..."
+                )
                 time.sleep(wait)
                 try:
                     offset = _query_offset(upload_url, token, file_size)
@@ -261,8 +297,7 @@ def upload_video(base, token, channel_id, filepath, name,
                 retries = 0

             elif r.status_code == 200:
-                print(
-                    f"  {fmt_size(file_size)}/{fmt_size(file_size)} (100%)")
+                print(f"  {fmt_size(file_size)}/{fmt_size(file_size)} (100%)")
                 uuid = r.json().get("video", {}).get("uuid")
                 return True, uuid

@@ -270,11 +305,9 @@ def upload_video(base, token, channel_id, filepath, name,
                 retry_after = int(r.headers.get("Retry-After", 10))
                 retries += 1
                 if retries > MAX_RETRIES:
-                    print(
-                        f"\n  Upload failed: server returned {r.status_code}")
+                    print(f"\n  Upload failed: server returned {r.status_code}")
                     return False, None
-                print(
-                    f"\n  Server {r.status_code}, retry in {retry_after}s ...")
+                print(f"\n  Server {r.status_code}, retry in {retry_after}s ...")
                 time.sleep(retry_after)
                 try:
                     offset = _query_offset(upload_url, token, file_size)
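The retry loop above caps exponential backoff at 60 seconds via `min(2**retries, 60)`; only the spacing changed, but the resulting schedule is worth seeing. A tiny sketch (the helper is mine):

```python
def backoff_schedule(max_retries: int, cap: int = 60) -> list[int]:
    """Waits used after successive connection errors: 2, 4, 8, ... capped at `cap`."""
    return [min(2**retries, cap) for retries in range(1, max_retries + 1)]

print(backoff_schedule(5))  # [2, 4, 8, 16, 32]
print(backoff_schedule(8))  # [2, 4, 8, 16, 32, 60, 60, 60]
```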
@@ -301,7 +334,7 @@ _STATE = {
 }


-def get_video_state(base, token, uuid):
+def get_video_state(base: str, token: str, uuid: str) -> tuple[int, str]:
     r = requests.get(
         f"{base}/api/v1/videos/{uuid}",
         headers=api_headers(token),
@@ -312,7 +345,7 @@ def get_video_state(base, token, uuid):
     return state["id"], state.get("label", "")


-def wait_for_published(base, token, uuid, poll_interval):
+def wait_for_published(base: str, token: str, uuid: str, poll_interval: int) -> int:
     """Block until the video reaches state 1 (Published) or a failure state."""
     started = time.monotonic()
     while True:
@@ -329,8 +362,10 @@ def wait_for_published(base, token, uuid, poll_interval):
         try:
             sid, label = get_video_state(base, token, uuid)
         except requests.exceptions.RequestException as e:
-            print(f"  -> Poll error ({e.__class__.__name__}) "
-                  f"after {elapsed_str}, retrying in {poll_interval}s …")
+            print(
+                f"  -> Poll error ({e.__class__.__name__}) "
+                f"after {elapsed_str}, retrying in {poll_interval}s …"
+            )
             time.sleep(poll_interval)
             continue

@@ -343,13 +378,16 @@ def wait_for_published(base, token, uuid, poll_interval):
             print(f"  -> FAILED: {display}")
             return sid

-        print(f"  -> {display} … {elapsed_str} elapsed (next check in {poll_interval}s)")
+        print(
+            f"  -> {display} … {elapsed_str} elapsed (next check in {poll_interval}s)"
+        )
         time.sleep(poll_interval)


 # ── State tracker ────────────────────────────────────────────────────

-def load_uploaded(input_dir):
+
+def load_uploaded(input_dir: str) -> set[Path]:
     path = Path(input_dir) / UPLOADED_FILE
     if not path.exists():
         return set()
@@ -357,36 +395,58 @@ def load_uploaded(input_dir):
         return {Path(line.strip()) for line in f if line.strip()}


-def mark_uploaded(input_dir, rel_path):
+def mark_uploaded(input_dir: str, rel_path: Path) -> None:
     with open(Path(input_dir) / UPLOADED_FILE, "a") as f:
         f.write(f"{rel_path}\n")


 # ── File / metadata helpers ─────────────────────────────────────────

-def build_path_to_meta(video_map, input_dir):
-    """Map each expected download path (relative) to {title, description}."""
+
+def build_path_to_meta(
+    video_map: dict[str, Any],
+    input_dir: str,
+) -> dict[Path, dict[str, str]]:
+    """Map each expected download path (relative) to {title, description, original_filename}."""
     urls = collect_urls(video_map)
     mode = read_mode(input_dir) or MODE_ORIGINAL
-    paths = get_paths_for_mode(mode, urls, video_map, input_dir)

-    url_meta = {}
-    for entry in video_map.values():
-        t = entry.get("title", "")
-        d = entry.get("description", "")
-        for video_url in entry.get("videos", []):
-            if video_url not in url_meta:
-                url_meta[video_url] = {"title": t, "description": d}
+    get_paths_typed = cast(
+        Callable[[str, list[str], dict[str, Any], str], dict[str, Path]],
+        get_paths_for_mode,
+    )
+    paths = get_paths_typed(mode, urls, video_map, input_dir)

-    result = {}
+    url_meta: dict[str, dict[str, str]] = {}
+    for entry_any in video_map.values():
+        entry = cast(dict[str, Any], entry_any)
+
+        t = entry.get("title")
+        d = entry.get("description")
+        title = t if isinstance(t, str) else ""
+        desc = d if isinstance(d, str) else ""
+
+        videos_any = entry.get("videos", [])
+        if isinstance(videos_any, list):
+            for video_url_any in videos_any:
+                if not isinstance(video_url_any, str):
+                    continue
+                if video_url_any not in url_meta:
+                    url_meta[video_url_any] = {"title": title, "description": desc}
+
+    result: dict[Path, dict[str, str]] = {}
     for url, abs_path in paths.items():
-        rel = Path(abs_path).relative_to(input_dir)
+        rel = abs_path.relative_to(input_dir)
         meta = url_meta.get(url, {"title": "", "description": ""})
-        result[rel] = {**meta, "original_filename": url_to_filename(url)}
+        result[rel] = {
+            "title": meta.get("title", ""),
+            "description": meta.get("description", ""),
+            "original_filename": url_to_filename(url),
+        }
     return result


-def find_videos(input_dir):
+def find_videos(input_dir: str) -> set[Path]:
     """Walk input_dir and return a set of relative paths for all video files."""
     found = set()
     for root, dirs, files in os.walk(input_dir):
@@ -399,7 +459,12 @@ def find_videos(input_dir):

 # ── Channel match helpers ─────────────────────────────────────────────

-def _channel_match(rel, path_meta, existing):
+
+def _channel_match(
+    rel: Path,
+    path_meta: dict[Path, dict[str, str]],
+    existing: set[str],
+) -> tuple[bool, str]:
     """Return (matched, name) for a local file against the channel name set.

     Checks both the title-derived name and the original-filename-derived name
@@ -409,38 +474,62 @@ def _channel_match(rel, path_meta, existing):
     """
     meta = path_meta.get(rel, {})
     name = make_pt_name(meta.get("title", ""), rel.name)

     orig_fn = meta.get("original_filename", "")
-    raw_name = make_pt_name("", orig_fn) if orig_fn else None
-    matched = name in existing or (raw_name and raw_name != name and raw_name in existing)
+    raw_name: str | None = make_pt_name("", orig_fn) if orig_fn else None
+
+    matched = name in existing
+    if not matched and raw_name is not None and raw_name != name:
+        matched = raw_name in existing
+
     return matched, name


 # ── CLI ──────────────────────────────────────────────────────────────

-def main():
+
+def main() -> None:
     ap = argparse.ArgumentParser(
         description="Upload videos to PeerTube with transcoding-aware batching",
     )
-    ap.add_argument("--input", "-i", default=DEFAULT_OUTPUT,
-                    help=f"Directory with downloaded videos (default: {DEFAULT_OUTPUT})")
-    ap.add_argument("--url",
-                    help="PeerTube instance URL (or set PEERTUBE_URL env var)")
-    ap.add_argument("--username", "-U",
-                    help="PeerTube username (or set PEERTUBE_USER env var)")
-    ap.add_argument("--password", "-p",
-                    help="PeerTube password (or set PEERTUBE_PASSWORD env var)")
-    ap.add_argument("--channel", "-C",
-                    help="Channel to upload to (or set PEERTUBE_CHANNEL env var)")
-    ap.add_argument("--batch-size", "-b", type=int, default=DEFAULT_BATCH_SIZE,
-                    help="Videos to upload before waiting for transcoding (default: 1)")
-    ap.add_argument("--poll-interval", type=int, default=DEFAULT_POLL,
-                    help=f"Seconds between state polls (default: {DEFAULT_POLL})")
-    ap.add_argument("--skip-wait", action="store_true",
-                    help="Upload everything without waiting for transcoding")
-    ap.add_argument("--nsfw", action="store_true",
-                    help="Mark videos as NSFW")
-    ap.add_argument("--dry-run", "-n", action="store_true",
-                    help="Preview what would be uploaded")
+    ap.add_argument(
+        "--input",
+        "-i",
+        default=DEFAULT_OUTPUT,
+        help=f"Directory with downloaded videos (default: {DEFAULT_OUTPUT})",
+    )
+    ap.add_argument("--url", help="PeerTube instance URL (or set PEERTUBE_URL env var)")
+    ap.add_argument(
+        "--username", "-U", help="PeerTube username (or set PEERTUBE_USER env var)"
+    )
+    ap.add_argument(
+        "--password", "-p", help="PeerTube password (or set PEERTUBE_PASSWORD env var)"
+    )
+    ap.add_argument(
+        "--channel", "-C", help="Channel to upload to (or set PEERTUBE_CHANNEL env var)"
+    )
+    ap.add_argument(
+        "--batch-size",
+        "-b",
+        type=int,
+        default=DEFAULT_BATCH_SIZE,
+        help="Videos to upload before waiting for transcoding (default: 1)",
+    )
+    ap.add_argument(
+        "--poll-interval",
+        type=int,
+        default=DEFAULT_POLL,
+        help=f"Seconds between state polls (default: {DEFAULT_POLL})",
+    )
+    ap.add_argument(
+        "--skip-wait",
+        action="store_true",
+        help="Upload everything without waiting for transcoding",
+    )
+    ap.add_argument("--nsfw", action="store_true", help="Mark videos as NSFW")
+    ap.add_argument(
+        "--dry-run", "-n", action="store_true", help="Preview what would be uploaded"
+    )
     args = ap.parse_args()

     url = args.url or os.environ.get("PEERTUBE_URL")
@@ -449,12 +538,16 @@ def main():
|
|||||||
password = args.password or os.environ.get("PEERTUBE_PASSWORD")
|
password = args.password or os.environ.get("PEERTUBE_PASSWORD")
|
||||||
|
|
||||||
if not args.dry_run:
|
if not args.dry_run:
|
||||||
missing = [label for label, val in [
|
missing = [
|
||||||
|
label
|
||||||
|
for label, val in [
|
||||||
("--url / PEERTUBE_URL", url),
|
("--url / PEERTUBE_URL", url),
|
||||||
("--username / PEERTUBE_USER", username),
|
("--username / PEERTUBE_USER", username),
|
||||||
("--channel / PEERTUBE_CHANNEL", channel),
|
("--channel / PEERTUBE_CHANNEL", channel),
|
||||||
("--password / PEERTUBE_PASSWORD", password),
|
("--password / PEERTUBE_PASSWORD", password),
|
||||||
] if not val]
|
]
|
||||||
|
if not val
|
||||||
|
]
|
||||||
if missing:
|
if missing:
|
||||||
for label in missing:
|
for label in missing:
|
||||||
print(f"[!] Required: {label}")
|
print(f"[!] Required: {label}")
|
||||||
@@ -468,7 +561,8 @@ def main():
     unmatched = on_disk - set(path_meta.keys())
     if unmatched:
         print(
-            f"[!] {len(unmatched)} file(s) on disk not in video_map (will use filename as title)")
+            f"[!] {len(unmatched)} file(s) on disk not in video_map (will use filename as title)"
+        )
         for rel in unmatched:
             path_meta[rel] = {"title": "", "description": ""}
 
@@ -493,10 +587,14 @@ def main():
             sz = (Path(args.input) / rel).stat().st_size
             total_bytes += sz
             print(f" [{fmt_size(sz):>10}] {name}")
-        print(
-            f"\n Total: {fmt_size(total_bytes)} across {len(pending)} videos")
+        print(f"\n Total: {fmt_size(total_bytes)} across {len(pending)} videos")
         return
 
+    assert url is not None
+    assert username is not None
+    assert channel is not None
+    assert password is not None
+
     # ── authenticate ──
     base = url.rstrip("/")
     if not base.startswith("http"):
@@ -533,7 +631,9 @@ def main():
         if _channel_match(rel, path_meta, existing)[0]:
             pre_matched.append(rel)
     if pre_matched:
-        print(f"\n[+] Pre-sweep: {len(pre_matched)} local file(s) already on channel — marking uploaded")
+        print(
+            f"\n[+] Pre-sweep: {len(pre_matched)} local file(s) already on channel — marking uploaded"
+        )
         for rel in pre_matched:
             mark_uploaded(args.input, rel)
     pending = [rel for rel in pending if rel not in set(pre_matched)]
@@ -548,7 +648,8 @@ def main():
             # ── flush batch if full ──
             if not args.skip_wait and len(batch) >= args.batch_size:
                 print(
-                    f"\n[+] Waiting for {len(batch)} video(s) to finish processing ...")
+                    f"\n[+] Waiting for {len(batch)} video(s) to finish processing ..."
+                )
                 for uuid, bname in batch:
                     print(f"\n [{bname}]")
                     wait_for_published(base, token, uuid, args.poll_interval)
@@ -568,18 +669,17 @@ def main():
            print(f"\n[{total_up + 1}/{len(pending)}] {name}")
            print(f" File: {rel} ({fmt_size(sz)})")
 
-            ok, uuid = upload_video(
-                base, token, channel_id, filepath, name, desc, nsfw)
+            ok, uuid_opt = upload_video(base, token, channel_id, filepath, name, desc, nsfw)
            if not ok:
                continue
 
-            print(f" Uploaded uuid={uuid}")
+            print(f" Uploaded uuid={uuid_opt}")
            mark_uploaded(args.input, rel)
            total_up += 1
            existing.add(name)
 
-            if uuid:
-                batch.append((uuid, name))
+            if uuid_opt is not None:
+                batch.append((uuid_opt, name))
 
        # ── wait for final batch ──
        if batch and not args.skip_wait:
@@ -589,8 +689,7 @@ def main():
                wait_for_published(base, token, uuid, args.poll_interval)
 
    except KeyboardInterrupt:
-        print(
-            f"\n\n[!] Interrupted after {total_up} uploads. Re-run to continue.")
+        print(f"\n\n[!] Interrupted after {total_up} uploads. Re-run to continue.")
        sys.exit(130)
 
    print(f"\n{'=' * 50}")
@@ -2144,7 +2144,7 @@
        ]
    },
    "https://www.jailbirdz.com/pinkcuffs-videos/sunday-august-4th-live-show/": {
-        "title": "Sunday August 4th Live Show!!!!!",
+        "title": "Page Not Found",
        "description": "[vc_row type=”in_container” full_screen_row_position=”middle” column_margin=”default” column_direction=”default” column_direction_tablet=”default” column_direction_phone=”default” scene_position=”center” text_color=”dark” text_align=”left” row_border_radius=”none” row_border_radius_applies=”bg” overflow=”visible” overlay_strength=”0.3″ gradient_direction=”left_to_right” shape_divider_position=”bottom” bg_image_animation=”none”][vc_column column_padding=”no-extra-padding” column_padding_tablet=”inherit” column_padding_phone=”inherit” column_padding_position=”all” column_element_direction_desktop=”default” column_element_spacing=”default” desktop_text_alignment=”default” tablet_text_alignment=”default” phone_text_alignment=”default” background_color_opacity=”1″ background_hover_color_opacity=”1″ column_backdrop_filter=”none” column_shadow=”none” column_border_radius=”none” column_link_target=”_self” column_position=”default” gradient_direction=”left_to_right” overlay_strength=”0.3″ width=”1/1″ tablet_width_inherit=”default” animation_type=”default” bg_image_animation=”none” border_type=”simple” column_border_width=”none” column_border_style=”solid”][vc_column_text]Live show with Mia Brooks and Autumn! BOTH models will be getting cuffed up this show 🙂 Click to Watch Show[/vc_column_text][/vc_column][/vc_row]",
        "videos": []
    },
@@ -2317,7 +2317,7 @@
        ]
    },
    "https://www.jailbirdz.com/pinkcuffs-videos/sunday-show-this-week/": {
-        "title": "Sunday Show This week!",
+        "title": "Page Not Found",
        "description": "[vc_row type=”in_container” full_screen_row_position=”middle” column_margin=”default” column_direction=”default” column_direction_tablet=”default” column_direction_phone=”default” scene_position=”center” text_color=”dark” text_align=”left” row_border_radius=”none” row_border_radius_applies=”bg” overflow=”visible” overlay_strength=”0.3″ gradient_direction=”left_to_right” shape_divider_position=”bottom” bg_image_animation=”none”][vc_column column_padding=”no-extra-padding” column_padding_tablet=”inherit” column_padding_phone=”inherit” column_padding_position=”all” column_element_direction_desktop=”default” column_element_spacing=”default” desktop_text_alignment=”default” tablet_text_alignment=”default” phone_text_alignment=”default” background_color_opacity=”1″ background_hover_color_opacity=”1″ column_backdrop_filter=”none” column_shadow=”none” column_border_radius=”none” column_link_target=”_self” column_position=”default” gradient_direction=”left_to_right” overlay_strength=”0.3″ width=”1/1″ tablet_width_inherit=”default” animation_type=”default” bg_image_animation=”none” border_type=”simple” column_border_width=”none” column_border_style=”solid”][vc_column_text]Come watch Mia brooks Arrest and Cuff up Zoe Jade!\n\nLink To View Show[/vc_column_text][/vc_column][/vc_row]",
        "videos": []
    },
@@ -7754,7 +7754,6 @@
        "description": "Officer Jackie Jupiter is hard at work just doing her job when a couple of mischievous little rookies decide to make her day more interesting. Officer Maggie and Officer Serendipity decided to swing by and see how Officer Jackie was doing on their day off. They go into her office and get into some trouble. Come see what happens in this part 1!",
        "scraped_at": 1771809265,
        "videos": [
-            "http://prisonteens.s3.amazonaws.com/2020/Jackie Jupiter/corrupt cops caught part1.mp4</a><a href=",
            "https://d1nqppmcfahrr7.cloudfront.net/2020/Jackie%20Jupiter/corrupt%20cops%20caught%20part1.mp4"
        ]
    },
@@ -8275,7 +8274,6 @@
        "description": "Officer Lisa pulls over Jade and Nathalie for running a light. The girls are surprised when she comes back after running their information and asks them to get out of the vehicle. Come see what happens in Part 1!",
        "scraped_at": 1771809265,
        "videos": [
-            "http://prisonteens.s3.amazonaws.com/2019/Lisa/OfficerLisaCarArrestPart1.mp4 car ride</a><a href=",
            "https://d1nqppmcfahrr7.cloudfront.net/2019/Lisa/OfficerLisaCarArrestPart1.mp4"
        ]
    },
@@ -11021,5 +11019,13 @@
        "videos": [
            "https://d1nqppmcfahrr7.cloudfront.net/2020/Persephone/serendipity%20and%20persephone%20part1.mp4"
        ]
+    },
+    "https://www.jailbirdz.com/pinkcuffs-videos/kay-is-under-investigation-then-lilly-sneaks-into-the-cop-car-pt4/": {
+        "title": "Kay Is Under Investigation —Then Lilly Sneaks Into the Cop Car pt4",
+        "description": "Lilly and Kay Bunny remain nude and chained to the bench when Officer Luna Lux notices Kay chewing gum. Luna confiscates it and announces that both women will face additional restraints as punishment. Lilly protests that it’s unfair, but Luna only chuckles.\n\nLuna uncuffs and stands Kay Bunny up, walking her over to another room then repeats the same process for Lilly. Afterward, Luna uncuffs them and hands them lime-green jumpsuits. With hands on their heads, both are fitted with waist chains, front cuffs, and shackles.\n\nLuna then walks them to their cell. Both remain restrained overnight, sharing a small cot, trying to get comfortable. The series ends with them awaiting the next day.",
+        "videos": [
+            "https://d1nqppmcfahrr7.cloudfront.net/2026/LunaLux/LillyRedKayBunniept4.mp4"
+        ],
+        "scraped_at": 1772204332
+    }
    }
}