mirror of
https://github.com/HugeFrog24/jailbirdz-dl.git
synced 2026-03-02 01:04:31 +00:00
Garbage commit; added junk code
.env.example (new file, 11 lines)
@@ -0,0 +1,11 @@
# Copy your wordpress_logged_in_... cookie from browser DevTools → Storage → Cookies.
# Paste the full name=value pair below.
# wordpress_sec_... is the wp-admin cookie — irrelevant for read-only viewers.
# __cf_bm is a Cloudflare bot-management cookie — also not needed.
WP_LOGIN_COOKIE=wordpress_logged_in_<hash>=<value>

# PeerTube upload target
PEERTUBE_URL=https://your-peertube-instance.example
PEERTUBE_USER=admin
PEERTUBE_CHANNEL=your-channel-name
PEERTUBE_PASSWORD=your-password
.github/instructions/snyk_rules.instructions.md (new file, vendored, 14 lines)
@@ -0,0 +1,14 @@
---
alwaysApply: true
always_on: true
trigger: always_on
applyTo: "**"
description: Snyk Security At Inception
---

# Project security best practices

- Always run the snyk_code_scan tool for new first-party code generated in a Snyk-supported language.
- If any security issues are found in newly introduced or modified code or dependencies, attempt to fix them using the results context from Snyk.
- Rescan the code after fixing the issues to ensure they were fixed and that no new issues were introduced.
- Repeat this process until no new issues are found.
.gitignore (new file, vendored, 5 lines)
@@ -0,0 +1,5 @@
__pycache__/
downloads/
*.mp4
*.mp4.part
.env
.vscode/settings.json (new file, vendored, 4 lines)
@@ -0,0 +1,4 @@
{
  "snyk.advanced.organization": "512ef4a1-6034-4537-a391-9692d282122a",
  "snyk.advanced.autoSelectOrganization": true
}
README.md (new file, 142 lines)
@@ -0,0 +1,142 @@
# 𝒥𝒶𝒾𝓁𝒷𝒾𝓇𝒹𝓏-𝒹𝓁

Jailbirdz.com is an Arizona-based subscription video site publishing arrest and jail roleplay scenarios featuring women. This tool scrapes the member area, downloads the videos, and re-hosts them on a self-owned PeerTube instance.

> [!NOTE]
> This tool does not bypass authentication, modify the site, or intercept anything it isn't entitled to. A valid, paid membership is required. The scraper authenticates using your own session cookie and accesses only content your account can already view in a browser.
>
> Downloading content for private, personal use is permitted in many jurisdictions under private copy provisions (e.g., § 53 UrhG in Germany). You are responsible for determining whether this applies in yours.

## Requirements

- Python 3.10+
- `pip install -r requirements.txt`
- `playwright install firefox`

## Setup

```bash
cp .env.example .env
```

### WP_LOGIN_COOKIE

You need to be logged into jailbirdz.com in a browser. Then either:

**Option A — auto (recommended):** let `grab_cookie.py` read it from your browser and write it to `.env` automatically:

```bash
python grab_cookie.py             # tries Firefox, Chrome, Edge, Brave in order
python grab_cookie.py -b firefox  # or target a specific browser
```

> **Note:** Chrome and Edge on Windows 130+ require the script to run as Administrator due to App-Bound Encryption. Firefox works without elevated privileges.

**Option B — manual:** open `.env` and set `WP_LOGIN_COOKIE` yourself. Get the value from browser DevTools → Storage → Cookies while on jailbirdz.com — copy the full `name=value` of the `wordpress_logged_in_*` cookie.

### Other `.env` values

- `PEERTUBE_URL` — base URL of your PeerTube instance.
- `PEERTUBE_USER` — PeerTube username.
- `PEERTUBE_CHANNEL` — channel to upload to.
- `PEERTUBE_PASSWORD` — PeerTube password.

## Workflow

### 1. Scrape

Discovers all post URLs via the WordPress REST API, then visits each page with a headless Firefox browser to intercept video network requests (MP4, MOV, WebM, AVI, M4V).

```bash
python main.py
```

Results are written to `video_map.json`. Safe to re-run — already-scraped posts are skipped.
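For orientation, the schema of `video_map.json` looks roughly like the sketch below. The `title` and `videos` fields match what the scripts in this repo read; the post-URL key and the `description` field name are assumptions based on the data-files table, not confirmed output:

```python
import json

# Hypothetical video_map.json entry. "title" and "videos" are the fields
# download.py actually reads; the URL key and "description" are assumed.
video_map = {
    "https://www.jailbirdz.com/example-post/": {
        "title": "Example Post",
        "description": "Post body text",
        "videos": ["https://example.cloudfront.net/ExampleClip.mp4"],
    }
}

# Flatten to a list of video URLs, the way download.py's collect_urls() does.
urls = [u for entry in video_map.values() for u in entry.get("videos", [])]
print(json.dumps(urls, indent=2))
```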

### 2. Download

```bash
python download.py [options]

Options:
  -o, --output DIR    Download directory (default: downloads)
  -t, --titles        Name files by post title
  --original          Name files by original CloudFront filename (default)
  --reorganize        Rename existing files to match current naming mode
  -w, --workers N     Concurrent downloads (default: 4)
  -n, --dry-run       Print what would be downloaded
```

Resumes partial downloads. The chosen naming mode is saved to `.naming_mode` inside the output directory and persists across runs. Filenames that would clash are placed into subfolders.

### 3. Upload

```bash
python upload.py [options]

Options:
  -i, --input DIR         MP4 source directory (default: downloads)
  --url URL               PeerTube instance URL (or set PEERTUBE_URL)
  -U, --username NAME     PeerTube username (or set PEERTUBE_USER)
  -p, --password SECRET   PeerTube password (or set PEERTUBE_PASSWORD)
  -C, --channel NAME      Channel to upload to (or set PEERTUBE_CHANNEL)
  -b, --batch-size N      Videos to upload before waiting for transcoding (default: 1)
  --poll-interval SECS    State poll interval in seconds (default: 30)
  --skip-wait             Upload without waiting for transcoding
  --nsfw                  Mark videos as NSFW
  -n, --dry-run           Print what would be uploaded
```

Uploads in resumable 10 MB chunks. After each batch, waits for transcoding and object storage to complete before uploading the next batch — this prevents disk exhaustion on the PeerTube server. Videos already present on the channel (matched by name) are skipped. Progress is tracked in `.uploaded` inside the input directory.
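The `.uploaded` bookkeeping can be sketched in a few lines. This is an illustration only, assuming the newline-delimited format listed in the data-files table; `upload.py` itself is not shown in this excerpt:

```python
from pathlib import Path

def pending_uploads(input_dir, candidates):
    """Return candidates not yet recorded in input_dir/.uploaded."""
    marker = Path(input_dir) / ".uploaded"
    done = set(marker.read_text(encoding="utf-8").splitlines()) if marker.exists() else set()
    return [c for c in candidates if c not in done]

def mark_uploaded(input_dir, relpath):
    """Append a freshly uploaded file's relative path to the marker."""
    with open(Path(input_dir) / ".uploaded", "a", encoding="utf-8") as f:
        f.write(relpath + "\n")
```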

## Utilities

### Check for filename clashes

```bash
python check_clashes.py
```

Lists filenames that map to more than one source URL, with sizes.

### Estimate total download size

```bash
python total_size.py
```

Fetches `Content-Length` for every video URL in `video_map.json` and prints a size summary. Does not download anything.
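(`total_size.py` itself is not shown in this excerpt.) Conceptually, the summary step reduces the `{url: size_or_None}` mapping returned by `fetch_sizes` in `check_clashes.py` to a total plus a count of unknown sizes, roughly:

```python
def summarize_sizes(sizes):
    """Reduce {url: size_or_None} to (total_known_bytes, unknown_count).
    Illustrative sketch only, not the actual total_size.py code."""
    known = [s for s in sizes.values() if s is not None]
    return sum(known), len(sizes) - len(known)

total, unknown = summarize_sizes({"a.mp4": 100, "b.mp4": None, "c.mp4": 50})
print(total, unknown)  # → 150 1
```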

## Data files

| File             | Location         | Description                                                            |
| ---------------- | ---------------- | ---------------------------------------------------------------------- |
| `video_map.json` | project root     | Scraped post URLs mapped to titles, descriptions, and video URLs       |
| `.naming_mode`   | output directory | Saved filename mode (`original` or `title`)                            |
| `.uploaded`      | input directory  | Newline-delimited list of relative paths already uploaded to PeerTube  |

## FAQ

**Is this necessary?**
Yes, obviously.

**Is this project exactly what it looks like?**
Also yes.

**Why go to all this trouble?**
Middle school girls bullied me so hard I decided if you're going to be the weird kid anyway, you might as well commit to the bit and build highly specific pipelines for highly specific content.
Now it's their turn to get booked.
Checkmate, society.
No apologies.

**Why not just download everything manually?**
Dude.
Bondage fantasy.
Not pain play.
Huge difference.
1,300 clicks = torture.
Know your genres.

---

This is the most normal thing I've scripted this month.
check_clashes.py (new file, 159 lines)
@@ -0,0 +1,159 @@
"""Filename clash detection and shared URL utilities.

Importable functions:
    url_to_filename(url)                    - extract clean filename from a URL
    find_clashes(urls)                      - {filename: [urls]} for filenames with >1 source
    build_download_paths(urls, output_dir)  - {url: local_path} with clash resolution
    fmt_size(bytes)                         - human-readable size string
    get_remote_size(session, url)           - file size via HEAD without downloading
    fetch_sizes(urls, workers, on_progress) - bulk size lookup
    make_session()                          - requests.Session with required headers
    load_video_map()                        - load video_map.json, returns {} on missing/corrupt
"""

from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse, unquote
import json

import requests

from config import BASE_URL

REFERER = f"{BASE_URL}/"
VIDEO_MAP_FILE = "video_map.json"
VIDEO_EXTS = {".mp4", ".mov", ".m4v", ".webm", ".avi"}


def load_video_map():
    if Path(VIDEO_MAP_FILE).exists():
        try:
            with open(VIDEO_MAP_FILE, encoding="utf-8") as f:
                return json.load(f)
        except (json.JSONDecodeError, OSError):
            return {}
    return {}


def make_session():
    s = requests.Session()
    s.headers.update({"Referer": REFERER})
    return s


def fmt_size(b):
    for unit in ("B", "KB", "MB", "GB"):
        if b < 1024:
            return f"{b:.1f} {unit}"
        b /= 1024
    return f"{b:.1f} TB"


def url_to_filename(url):
    return unquote(PurePosixPath(urlparse(url).path).name)


def find_clashes(urls):
    # Case-insensitive grouping so that e.g. "DaisyArrest.mp4" and
    # "daisyarrest.mp4" are treated as a clash. This is required for
    # correctness on case-insensitive filesystems (NTFS, exFAT, macOS HFS+)
    # and harmless on case-sensitive ones (ext4) — the actual filenames on
    # disk keep their original casing; only the clash *detection* is folded.
    by_lower = defaultdict(list)
    for url in urls:
        by_lower[url_to_filename(url).lower()].append(url)
    return {url_to_filename(srcs[0]): srcs
            for srcs in by_lower.values() if len(srcs) > 1}


def _clash_subfolder(url):
    """Parent path segment used as disambiguator for clashing filenames."""
    parts = urlparse(url).path.rstrip("/").split("/")
    return unquote(parts[-2]) if len(parts) >= 2 else "unknown"


def build_download_paths(urls, output_dir):
    """Map each URL to a local file path. Flat layout; clashing names get a subfolder."""
    clashes = find_clashes(urls)
    clash_lower = {name.lower() for name in clashes}

    paths = {}
    for url in urls:
        filename = url_to_filename(url)
        if filename.lower() in clash_lower:
            paths[url] = Path(output_dir) / _clash_subfolder(url) / filename
        else:
            paths[url] = Path(output_dir) / filename
    return paths


def get_remote_size(session, url):
    try:
        r = session.head(url, allow_redirects=True, timeout=15)
        if r.status_code < 400 and "Content-Length" in r.headers:
            return int(r.headers["Content-Length"])
    except Exception:
        pass
    try:
        r = session.get(
            url, headers={"Range": "bytes=0-0"}, stream=True, timeout=15)
        r.close()
        cr = r.headers.get("Content-Range", "")
        if "/" in cr:
            return int(cr.split("/")[-1])
    except Exception:
        pass
    return None


def fetch_sizes(urls, workers=20, on_progress=None):
    """Return {url: size_or_None}. on_progress(done, total) called after each URL."""
    session = make_session()
    sizes = {}
    total = len(urls)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(get_remote_size, session, u): u for u in urls}
        done = 0
        for fut in as_completed(futures):
            sizes[futures[fut]] = fut.result()
            done += 1
            if on_progress:
                on_progress(done, total)

    return sizes


# --------------- CLI ---------------

def main():
    vm = load_video_map()
    urls = [u for entry in vm.values() for u in entry.get("videos", []) if u.startswith("http")]

    clashes = find_clashes(urls)

    print(f"Total URLs: {len(urls)}")
    by_name = defaultdict(list)
    for url in urls:
        by_name[url_to_filename(url)].append(url)
    print(f"Unique filenames: {len(by_name)}")

    if not clashes:
        print("\nNo filename clashes — every filename is unique.")
        return

    clash_urls = [u for srcs in clashes.values() for u in srcs]
    print(f"\n[+] Fetching file sizes for {len(clash_urls)} clashing URLs…")
    sizes = fetch_sizes(clash_urls)

    print(f"\n{len(clashes)} filename clash(es):\n")
    for name, srcs in sorted(clashes.items()):
        print(f"  {name} ({len(srcs)} sources)")
        for s in srcs:
            sz = sizes.get(s)
            tag = fmt_size(sz) if sz is not None else "unknown"
            print(f"    [{tag}] {s}")
        print()


if __name__ == "__main__":
    main()
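To make the case-insensitive clash behavior concrete, here is a condensed, self-contained rerun of the two functions above on made-up URLs:

```python
from collections import defaultdict
from pathlib import PurePosixPath
from urllib.parse import urlparse, unquote

# Condensed from check_clashes.py above; the URLs are invented for illustration.
def url_to_filename(url):
    return unquote(PurePosixPath(urlparse(url).path).name)

def find_clashes(urls):
    by_lower = defaultdict(list)
    for url in urls:
        by_lower[url_to_filename(url).lower()].append(url)
    return {url_to_filename(srcs[0]): srcs
            for srcs in by_lower.values() if len(srcs) > 1}

urls = [
    "https://cdn.example/2023/DaisyArrest.mp4",
    "https://cdn.example/2024/daisyarrest.mp4",
    "https://cdn.example/2024/Unique.mp4",
]
print(sorted(find_clashes(urls)))  # → ['DaisyArrest.mp4']
```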
config.py (new file, 2 lines)
@@ -0,0 +1,2 @@
BASE_URL = "https://www.jailbirdz.com"
COOKIE_DOMAIN = "jailbirdz.com"  # rookiepy domain filter (no www)
download.py (new file, 408 lines)
@@ -0,0 +1,408 @@
"""Download videos from video_map.json with resume, integrity checks, and naming modes.

Usage:
    python download.py                    # downloads with remembered (or default original) naming
    python download.py --output /mnt/nas  # custom directory
    python download.py --titles           # switch to title-based filenames (remembers choice)
    python download.py --original         # switch back to original filenames (remembers choice)
    python download.py --reorganize       # rename existing files to match current mode
    python download.py --dry-run          # preview what would happen
    python download.py --workers 6        # override concurrency (default 4)
"""

import argparse
import json
from pathlib import Path
import re
import shutil
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, as_completed

from check_clashes import (
    make_session,
    fmt_size,
    url_to_filename,
    find_clashes,
    build_download_paths,
    fetch_sizes,
)

VIDEO_MAP_FILE = "video_map.json"
CHUNK_SIZE = 8 * 1024 * 1024
DEFAULT_OUTPUT = "downloads"
DEFAULT_WORKERS = 4
MODE_FILE = ".naming_mode"
MODE_ORIGINAL = "original"
MODE_TITLE = "title"


# ── Naming mode persistence ──────────────────────────────────────────

def read_mode(output_dir):
    p = Path(output_dir) / MODE_FILE
    if p.exists():
        return p.read_text().strip()
    return None


def write_mode(output_dir, mode):
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    (Path(output_dir) / MODE_FILE).write_text(mode)


def resolve_mode(args):
    """Determine naming mode from CLI flags + saved marker. Returns mode string."""
    saved = read_mode(args.output)

    if args.titles and args.original:
        print("[!] Cannot use --titles and --original together.")
        raise SystemExit(1)

    if args.titles:
        return MODE_TITLE
    if args.original:
        return MODE_ORIGINAL
    if saved:
        return saved
    return MODE_ORIGINAL


# ── Filename helpers ─────────────────────────────────────────────────

def sanitize_filename(title, max_len=180):
    name = re.sub(r'[<>:"/\\|?*]', '', title)
    name = re.sub(r'\s+', ' ', name).strip().rstrip('.')
    return name[:max_len].rstrip() if len(name) > max_len else name


def build_title_paths(urls, url_to_title, output_dir):
    name_to_urls = defaultdict(list)
    url_to_base = {}

    for url in urls:
        title = url_to_title.get(url)
        ext = Path(url_to_filename(url)).suffix or ".mp4"
        base = sanitize_filename(title) if title else Path(url_to_filename(url)).stem
        url_to_base[url] = (base, ext)
        name_to_urls[base + ext].append(url)

    paths = {}
    for url in urls:
        base, ext = url_to_base[url]
        full = base + ext
        if len(name_to_urls[full]) > 1:
            slug = url_to_filename(url).rsplit('.', 1)[0]
            paths[url] = Path(output_dir) / f"{base} [{slug}]{ext}"
        else:
            paths[url] = Path(output_dir) / full
    return paths


def get_paths_for_mode(mode, urls, video_map, output_dir):
    if mode == MODE_TITLE:
        url_title = build_url_title_map(video_map)
        return build_title_paths(urls, url_title, output_dir)
    return build_download_paths(urls, output_dir)


# ── Reorganize ───────────────────────────────────────────────────────

def reorganize(urls, video_map, output_dir, target_mode, dry_run=False):
    """Rename existing files from one naming scheme to another."""
    other_mode = MODE_TITLE if target_mode == MODE_ORIGINAL else MODE_ORIGINAL
    old_paths = get_paths_for_mode(other_mode, urls, video_map, output_dir)
    new_paths = get_paths_for_mode(target_mode, urls, video_map, output_dir)

    moves = []
    for url in urls:
        old = old_paths[url]
        new = new_paths[url]
        if old == new:
            continue
        if old.exists() and not new.exists():
            moves.append((old, new))
        # also handle .part files
        old_part = old.parent / (old.name + ".part")
        new_part = new.parent / (new.name + ".part")
        if old_part.exists() and not new_part.exists():
            moves.append((old_part, new_part))

    if not moves:
        print("[✓] Nothing to reorganize — files already match the target mode.")
        return

    print(f"[+] {len(moves)} file(s) to rename ({other_mode} → {target_mode}):\n")

    for old, new in moves:
        old_rel = old.relative_to(output_dir)
        new_rel = new.relative_to(output_dir)
        if dry_run:
            print(f"  [dry-run] {old_rel} → {new_rel}")
        else:
            new.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(old, new)
            print(f"  ✓ {old_rel} → {new_rel}")

    if not dry_run:
        # Clean up empty directories left behind
        output_path = Path(output_dir)
        for old, _ in moves:
            d = old.parent
            while d != output_path:
                try:
                    d.rmdir()
                except OSError:
                    break
                d = d.parent

        write_mode(output_dir, target_mode)
        print(f"\n[✓] Reorganized. Mode saved: {target_mode}")
    else:
        print(f"\n[dry-run] Would rename {len(moves)} files. No changes made.")


# ── Download ─────────────────────────────────────────────────────────

def download_one(session, url, dest, expected_size):
    dest = Path(dest)
    part = dest.parent / (dest.name + ".part")
    dest.parent.mkdir(parents=True, exist_ok=True)

    if dest.exists():
        local = dest.stat().st_size
        if expected_size and local == expected_size:
            return "ok", 0
        if expected_size and local != expected_size:
            dest.unlink()

    existing = part.stat().st_size if part.exists() else 0
    headers = {}
    if existing and expected_size and existing < expected_size:
        headers["Range"] = f"bytes={existing}-"

    try:
        r = session.get(url, headers=headers, stream=True, timeout=60)

        if r.status_code == 416:
            part.rename(dest)
            return "ok", 0

        r.raise_for_status()
    except Exception as e:
        return f"error: {e}", 0

    mode = "ab" if headers.get("Range") else "wb"
    if mode == "wb":
        existing = 0

    written = 0
    try:
        with open(part, mode) as f:
            for chunk in r.iter_content(chunk_size=CHUNK_SIZE):
                f.write(chunk)
                written += len(chunk)
    except Exception as e:
        return f"error: {e}", written

    final_size = existing + written
    if expected_size and final_size != expected_size:
        return "size_mismatch", written

    part.rename(dest)
    return "ok", written


# ── Data loading ─────────────────────────────────────────────────────

def load_video_map():
    with open(VIDEO_MAP_FILE, encoding="utf-8") as f:
        return json.load(f)


def _is_valid_url(url):
    return url.startswith(
        "http") and "<" not in url and ">" not in url and " href=" not in url


def collect_urls(video_map):
    urls, seen, skipped = [], set(), 0
    for entry in video_map.values():
        for video_url in entry.get("videos", []):
            if video_url in seen:
                continue
            seen.add(video_url)
            if _is_valid_url(video_url):
                urls.append(video_url)
            else:
                skipped += 1
    if skipped:
        print(f"[!] Skipped {skipped} malformed URL(s)")
    return urls


def build_url_title_map(video_map):
    url_title = {}
    for entry in video_map.values():
        title = entry.get("title", "")
        for video_url in entry.get("videos", []):
            if video_url not in url_title:
                url_title[video_url] = title
    return url_title


# ── Main ─────────────────────────────────────────────────────────────

def main():
    parser = argparse.ArgumentParser(
        description="Download videos from video_map.json")
    parser.add_argument("--output", "-o", default=DEFAULT_OUTPUT,
                        help=f"Download directory (default: {DEFAULT_OUTPUT})")

    naming = parser.add_mutually_exclusive_group()
    naming.add_argument("--titles", "-t", action="store_true",
                        help="Use title-based filenames (saved as default for this directory)")
    naming.add_argument("--original", action="store_true",
                        help="Use original CloudFront filenames (saved as default for this directory)")

    parser.add_argument("--reorganize", action="store_true",
                        help="Rename existing files to match the current naming mode")
    parser.add_argument("--dry-run", "-n", action="store_true",
                        help="Preview without making changes")
    parser.add_argument("--workers", "-w", type=int, default=DEFAULT_WORKERS,
                        help=f"Concurrent downloads (default: {DEFAULT_WORKERS})")
    args = parser.parse_args()

    video_map = load_video_map()
    urls = collect_urls(video_map)
    mode = resolve_mode(args)

    saved = read_mode(args.output)
    mode_changed = saved is not None and saved != mode

    print(f"[+] {len(urls)} MP4 URLs from {VIDEO_MAP_FILE}")
    print(f"[+] Naming mode: {mode}" + (" (changed!)" if mode_changed else ""))

    # Handle reorganize
    if args.reorganize or mode_changed:
        if mode_changed and not args.reorganize:
            print(f"\n[!] Mode changed from '{saved}' to '{mode}'.")
            print(
                "    Use --reorganize to rename existing files, or --dry-run to preview.")
            print("    Refusing to download until existing files are reorganized.")
            return
        reorganize(urls, video_map, args.output, mode, dry_run=args.dry_run)
        if args.dry_run or args.reorganize:
            return

    # Save mode
    if not args.dry_run:
        write_mode(args.output, mode)

    paths = get_paths_for_mode(mode, urls, video_map, args.output)

    clashes = find_clashes(urls)
    if clashes:
        print(
            f"[+] {len(clashes)} filename clash(es) resolved with subfolders/suffixes")

    already = [u for u in urls if paths[u].exists()]
    pending = [u for u in urls if not paths[u].exists()]

    print(f"[+] Already downloaded: {len(already)}")
    print(f"[+] To download: {len(pending)}")

    if not pending:
        print("\n[✓] Everything is already downloaded.")
        return

    if args.dry_run:
        print(
            f"\n[dry-run] Would download {len(pending)} files to {args.output}/")
        for url in pending[:20]:
            print(f"  → {paths[url].name}")
        if len(pending) > 20:
            print(f"  … and {len(pending) - 20} more")
        return

    print("\n[+] Fetching remote file sizes…")
    session = make_session()
    remote_sizes = fetch_sizes(pending, workers=20)

    sized = {u: s for u, s in remote_sizes.items() if s is not None}
    total_bytes = sum(sized.values())
    print(
        f"[+] Download size: {fmt_size(total_bytes)} across {len(pending)} files")

    if already:
        print(f"[+] Verifying {len(already)} existing files…")
        already_sizes = fetch_sizes(already, workers=20)

        mismatched = 0
        for url in already:
            dest = paths[url]
            local = dest.stat().st_size
            remote = already_sizes.get(url)
            if remote and local != remote:
                mismatched += 1
                print(f"[!] Size mismatch: {dest.name} "
                      f"(local {fmt_size(local)} vs remote {fmt_size(remote)})")
                pending.append(url)
                remote_sizes[url] = remote

        if mismatched:
            print(
                f"[!] {mismatched} file(s) will be re-downloaded due to size mismatch")

    print(f"\n[⚡] Downloading with {args.workers} threads…\n")

    completed = 0
    failed = []
    total_written = 0
    total = len(pending)
    interrupted = False

    def do_download(url):
        dest = paths[url]
        expected = remote_sizes.get(url)
        return url, download_one(session, url, dest, expected)

    try:
        with ThreadPoolExecutor(max_workers=args.workers) as pool:
            futures = {pool.submit(do_download, u): u for u in pending}
            for fut in as_completed(futures):
                url, (status, written) = fut.result()
                total_written += written
                completed += 1
                name = paths[url].name

                if status == "ok" and written > 0:
                    print(
                        f"  [{completed}/{total}] ✓ {name} ({fmt_size(written)})")
                elif status == "ok":
                    print(
                        f"  [{completed}/{total}] ✓ {name} (already complete)")
                elif status == "size_mismatch":
                    print(f"  [{completed}/{total}] ⚠ {name} (size mismatch)")
                    failed.append(url)
                else:
                    print(f"  [{completed}/{total}] ✗ {name} ({status})")
                    failed.append(url)
    except KeyboardInterrupt:
        interrupted = True
        pool.shutdown(wait=False, cancel_futures=True)
        print("\n\n[⏸] Interrupted! Partial downloads saved as .part files.")

    print(f"\n{'=' * 50}")
    print(f"  Downloaded: {fmt_size(total_written)}")
    print(f"  Completed:  {completed}/{total}")
    if failed:
        print(f"  Failed:     {len(failed)} (re-run to retry)")
    if interrupted:
        print("  Paused — re-run to resume.")
    elif not failed:
        print("  All done!")
    print(f"{'=' * 50}")


if __name__ == "__main__":
    main()
grab_cookie.py (new file, 114 lines)
@@ -0,0 +1,114 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
grab_cookie.py — read the WordPress login cookie from an
|
||||
installed browser and write it to .env as WP_LOGIN_COOKIE=name=value.
|
||||
|
||||
Usage:
|
||||
python grab_cookie.py # tries Firefox, Chrome, Edge, Brave
|
||||
python grab_cookie.py --browser firefox # explicit browser
|
||||
"""
|
||||
|
||||
import argparse
|
||||
from pathlib import Path
|
||||
from config import COOKIE_DOMAIN
|
||||
|
||||
ENV_FILE = Path(".env")
|
||||
ENV_KEY = "WP_LOGIN_COOKIE"
|
||||
COOKIE_PREFIX = "wordpress_logged_in_"
|
||||
|
||||
BROWSER_NAMES = ["firefox", "chrome", "edge", "brave"]
|
||||
|
||||
|
||||
def find_cookie(browser_name):
|
||||
"""Return (name, value) for the wordpress_logged_in_* cookie, or (None, None)."""
|
||||
try:
|
||||
import rookiepy
|
||||
except ImportError:
|
||||
raise ImportError("rookiepy not installed — run: pip install rookiepy")
|
||||
|
||||
fn = getattr(rookiepy, browser_name, None)
|
||||
if fn is None:
|
||||
raise ValueError(f"rookiepy does not support '{browser_name}'.")
|
||||
|
||||
try:
|
||||
cookies = fn([COOKIE_DOMAIN])
|
||||
except PermissionError:
|
||||
raise PermissionError(
|
||||
f"Permission denied reading {browser_name} cookies.\n"
|
||||
" Close the browser, or on Windows run as Administrator for Chrome/Edge."
|
||||
)
|
||||
except Exception as e:
|
||||
raise RuntimeError(f"Could not read {browser_name} cookies: {e}")
|
||||
|
||||
for c in cookies:
|
||||
if c.get("name", "").startswith(COOKIE_PREFIX):
|
||||
return c["name"], c["value"]
|
||||
|
||||
return None, None
|
||||
|
||||
|
||||
def update_env(name, value):
|
||||
"""Write WP_LOGIN_COOKIE=name=value into .env, replacing any existing line."""
|
||||
new_line = f"{ENV_KEY}={name}={value}\n"
|
||||
|
||||
if ENV_FILE.exists():
|
||||
text = ENV_FILE.read_text(encoding="utf-8")
|
        lines = text.splitlines(keepends=True)
        for i, line in enumerate(lines):
            if line.startswith(f"{ENV_KEY}=") or line.strip() == ENV_KEY:
                lines[i] = new_line
                ENV_FILE.write_text("".join(lines), encoding="utf-8")
                return "updated"
        # Key not present — append
        if text and not text.endswith("\n"):
            text += "\n"
        ENV_FILE.write_text(text + new_line, encoding="utf-8")
        return "appended"
    else:
        ENV_FILE.write_text(new_line, encoding="utf-8")
        return "created"


def main():
    parser = argparse.ArgumentParser(
        description=f"Copy the {COOKIE_DOMAIN} login cookie from your browser into .env."
    )
    parser.add_argument(
        "--browser", "-b",
        choices=BROWSER_NAMES,
        metavar="BROWSER",
        help=f"Browser to read from: {', '.join(BROWSER_NAMES)} (default: try all in order)",
    )
    args = parser.parse_args()

    order = [args.browser] if args.browser else BROWSER_NAMES

    cookie_name = cookie_value = None
    for browser in order:
        print(f"[…] Trying {browser}…")
        try:
            cookie_name, cookie_value = find_cookie(browser)
        except ImportError as e:
            raise SystemExit(f"[!] {e}")
        except (ValueError, PermissionError, RuntimeError) as e:
            print(f"[!] {e}")
            continue

        if cookie_name:
            print(f"[+] Found in {browser}: {cookie_name}")
            break
        print(f"    No {COOKIE_PREFIX}* cookie found in {browser}.")

    if not cookie_name:
        raise SystemExit(
            f"\n[!] No {COOKIE_PREFIX}* cookie found in any browser.\n"
            f"    Make sure you are logged into {COOKIE_DOMAIN}, then re-run.\n"
            "    Or set WP_LOGIN_COOKIE manually in .env — see .env.example."
        )

    action = update_env(cookie_name, cookie_value)
    print(f"[✓] {ENV_KEY} {action} in {ENV_FILE}.")


if __name__ == "__main__":
    main()
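The update-or-append behaviour of `update_env` above can be exercised in isolation. This is a stand-alone sketch of the same logic (the key name mirrors the script's `ENV_KEY`; the function name and file handling here are simplified for illustration):

```python
from pathlib import Path
import tempfile

ENV_KEY = "WP_LOGIN_COOKIE"  # assumed key name, mirroring the script


def upsert_env_line(env_file: Path, new_line: str) -> str:
    """Replace an existing ENV_KEY line, append if the key is absent,
    or create the file from scratch. Returns what happened."""
    if env_file.exists():
        text = env_file.read_text(encoding="utf-8")
        lines = text.splitlines(keepends=True)
        for i, line in enumerate(lines):
            if line.startswith(f"{ENV_KEY}=") or line.strip() == ENV_KEY:
                lines[i] = new_line
                env_file.write_text("".join(lines), encoding="utf-8")
                return "updated"
        # key not present: append, ensuring a trailing newline first
        if text and not text.endswith("\n"):
            text += "\n"
        env_file.write_text(text + new_line, encoding="utf-8")
        return "appended"
    env_file.write_text(new_line, encoding="utf-8")
    return "created"
```

Run against a temp directory, the three outcomes fall out of the file's prior state: missing file, file without the key, file with the key.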
467  main.py  Normal file
@@ -0,0 +1,467 @@
import re
import json
import os
import time
import signal
import asyncio
import tempfile
import requests
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse
from dotenv import load_dotenv
from playwright.async_api import async_playwright
from check_clashes import VIDEO_EXTS
from config import BASE_URL

load_dotenv()


def _is_video_url(url):
    """True if `url` ends with a recognised video extension (case-insensitive, path only)."""
    return PurePosixPath(urlparse(url).path).suffix.lower() in VIDEO_EXTS


WP_API = f"{BASE_URL}/wp-json/wp/v2"

SKIP_TYPES = {
    "attachment", "nav_menu_item", "wp_block", "wp_template",
    "wp_template_part", "wp_global_styles", "wp_navigation",
    "wp_font_family", "wp_font_face",
}

VIDEO_MAP_FILE = "video_map.json"
MAX_WORKERS = 4

API_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:147.0) Gecko/20100101 Firefox/147.0",
    "Accept": "application/json",
    "Referer": f"{BASE_URL}/",
}


def _get_login_cookie():
    raw = os.environ.get("WP_LOGIN_COOKIE", "").strip()  # strip accidental whitespace
    if not raw:
        raise RuntimeError(
            "WP_LOGIN_COOKIE not set. Copy it from your browser into .env — see .env.example.")
    name, _, value = raw.partition("=")
    if not value:
        raise RuntimeError(
            "WP_LOGIN_COOKIE looks malformed (no '=' found). Expected: name=value")
    if not name.startswith("wordpress_logged_in_"):
        raise RuntimeError(
            "WP_LOGIN_COOKIE doesn't look right — expected a wordpress_logged_in_... cookie.")
    return name, value


def discover_content_types(session):
    """Query /wp-json/wp/v2/types and return a list of (name, rest_base, type_slug) for content types worth scraping."""
    r = session.get(f"{WP_API}/types", timeout=30)
    r.raise_for_status()
    types = r.json()

    targets = []
    for type_slug, info in types.items():
        if type_slug in SKIP_TYPES:
            continue
        rest_base = info.get("rest_base")
        name = info.get("name", type_slug)
        if rest_base:
            targets.append((name, rest_base, type_slug))
    return targets
def fetch_all_posts_for_type(session, type_name, rest_base, type_slug):
    """Paginate one content type and return (url, title, description) tuples.

    Uses the `link` field when available; falls back to building from slug."""
    url_prefix = type_slug.replace("_", "-")
    results = []
    page = 1

    while True:
        r = session.get(
            f"{WP_API}/{rest_base}",
            params={"per_page": 100, "page": page},
            timeout=30,
        )
        if r.status_code == 400 or not r.ok:
            break
        data = r.json()
        if not data:
            break
        for post in data:
            link = post.get("link", "")
            if not link.startswith("http"):
                slug = post.get("slug")
                if slug:
                    link = f"{BASE_URL}/{url_prefix}/{slug}/"
                else:
                    continue
            title_obj = post.get("title", {})
            title = title_obj.get("rendered", "") if isinstance(
                title_obj, dict) else str(title_obj)
            content_obj = post.get("content", {})
            content_html = content_obj.get(
                "rendered", "") if isinstance(content_obj, dict) else ""
            description = html_to_text(content_html) if content_html else ""
            results.append((link, title, description))
        print(f"    {type_name} page {page}: {len(data)} items")
        page += 1

    return results


def fetch_post_urls_from_api(headers):
    """Auto-discover all content types via the WP REST API and collect every post URL.

    Also builds video_map.json with titles pre-populated."""
    print("[+] video_map.json empty or missing — discovering content types from REST API…")
    session = requests.Session()
    session.headers.update(headers)

    targets = discover_content_types(session)
    print(
        f"[+] Found {len(targets)} content types: {', '.join(name for name, _, _ in targets)}\n")

    all_results = []
    for type_name, rest_base, type_slug in targets:
        type_results = fetch_all_posts_for_type(
            session, type_name, rest_base, type_slug)
        all_results.extend(type_results)

    seen = set()
    deduped_urls = []
    video_map = load_video_map()

    for url, title, description in all_results:
        if url not in seen and url.startswith("http"):
            seen.add(url)
            deduped_urls.append(url)
            if url not in video_map:
                video_map[url] = {"title": title,
                                  "description": description, "videos": []}
            else:
                if not video_map[url].get("title"):
                    video_map[url]["title"] = title
                if not video_map[url].get("description"):
                    video_map[url]["description"] = description

    save_video_map(video_map)
    print(
        f"\n[+] Discovered {len(deduped_urls)} unique URLs → saved to {VIDEO_MAP_FILE}")
    print(
        f"[+] Pre-populated {len(video_map)} entries in {VIDEO_MAP_FILE}")
    return deduped_urls
def fetch_metadata_from_api(video_map, urls, headers):
    """Populate missing titles and descriptions in video_map from the REST API."""
    missing = [u for u in urls
               if u not in video_map
               or not video_map[u].get("title")
               or not video_map[u].get("description")]
    if not missing:
        return

    print(f"[+] Fetching metadata from REST API for {len(missing)} posts…")
    session = requests.Session()
    session.headers.update(headers)

    targets = discover_content_types(session)

    for type_name, rest_base, type_slug in targets:
        type_results = fetch_all_posts_for_type(
            session, type_name, rest_base, type_slug)
        for url, title, description in type_results:
            if url in video_map:
                if not video_map[url].get("title"):
                    video_map[url]["title"] = title
                if not video_map[url].get("description"):
                    video_map[url]["description"] = description
            else:
                video_map[url] = {"title": title,
                                  "description": description, "videos": []}

    save_video_map(video_map)
    populated_t = sum(1 for u in urls if video_map.get(u, {}).get("title"))
    populated_d = sum(1 for u in urls if video_map.get(
        u, {}).get("description"))
    print(f"[+] Titles populated: {populated_t}/{len(urls)}")
    print(f"[+] Descriptions populated: {populated_d}/{len(urls)}")


def load_post_urls(headers):
    vm = load_video_map()
    if vm:
        print(f"[+] {VIDEO_MAP_FILE} found — loading {len(vm)} post URLs.")
        return list(vm.keys())
    return fetch_post_urls_from_api(headers)


def html_to_text(html_str):
    """Strip HTML tags, decode entities, and collapse whitespace into clean plain text."""
    import html
    text = re.sub(r'<br\s*/?>', '\n', html_str)
    text = text.replace('</p>', '\n\n')
    text = re.sub(r'<[^>]+>', '', text)
    text = html.unescape(text)
    lines = [line.strip() for line in text.splitlines()]
    text = '\n'.join(lines)
    text = re.sub(r'\n{3,}', '\n\n', text)
    return text.strip()


def extract_mp4_from_html(html):
    candidates = re.findall(r'https?://[^\s"\'<>]+', html)
    return [u for u in candidates if _is_video_url(u)]


def extract_title_from_html(html):
    m = re.search(
        r'<h1[^>]*class="entry-title"[^>]*>(.*?)</h1>', html, re.DOTALL)
    if m:
        title = re.sub(r'<[^>]+>', '', m.group(1)).strip()
        return title
    m = re.search(r'<title>(.*?)(?:\s*[-–|].*)?</title>', html, re.DOTALL)
    if m:
        return m.group(1).strip()
    return None


def load_video_map():
    if Path(VIDEO_MAP_FILE).exists():
        try:
            with open(VIDEO_MAP_FILE, encoding="utf-8") as f:
                return json.load(f)
        except (json.JSONDecodeError, OSError):
            return {}
    return {}


def save_video_map(video_map):
    fd, tmp_path = tempfile.mkstemp(dir=Path(VIDEO_MAP_FILE).resolve().parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(video_map, f, indent=2, ensure_ascii=False)
        Path(tmp_path).replace(VIDEO_MAP_FILE)
    except Exception:
        try:
            Path(tmp_path).unlink()
        except OSError:
            pass
        raise


def _expects_video(url):
    return "/pinkcuffs-videos/" in url
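The tag-stripping pipeline in `html_to_text` above is pure string processing, so its behaviour can be checked in isolation. This stand-alone copy repeats the same regex steps (no project imports) to show the intended normalisation:

```python
import html
import re


def html_to_text(html_str):
    """Strip tags, decode entities, collapse blank runs (mirror of the helper above)."""
    text = re.sub(r'<br\s*/?>', '\n', html_str)   # <br> and <br/> become newlines
    text = text.replace('</p>', '\n\n')           # paragraph ends become blank lines
    text = re.sub(r'<[^>]+>', '', text)           # drop all remaining tags
    text = html.unescape(text)                    # &amp; -> &, etc.
    lines = [line.strip() for line in text.splitlines()]
    text = '\n'.join(lines)
    text = re.sub(r'\n{3,}', '\n\n', text)        # never more than one blank line
    return text.strip()
```

For example, `html_to_text("<p>Hello &amp; welcome</p><p>Bye</p>")` yields `"Hello & welcome\n\nBye"`: one blank line between paragraphs, entities decoded, no stray tags.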
MAX_RETRIES = 2


async def worker(worker_id, queue, context, known,
                 total, retry_counts, video_map, map_lock, shutdown_event):
    page = await context.new_page()
    video_hits = set()

    page.on("response", lambda resp: video_hits.add(resp.url) if _is_video_url(resp.url) else None)

    try:
        while not shutdown_event.is_set():
            try:
                idx, url = queue.get_nowait()
            except asyncio.QueueEmpty:
                break

            attempt = retry_counts.get(idx, 0)
            label = f" (retry {attempt}/{MAX_RETRIES})" if attempt else ""
            print(f"[W{worker_id}] ({idx + 1}/{total}) {url}{label}")

            try:
                await page.goto(url, wait_until="networkidle", timeout=60000)
            except Exception as e:
                print(f"[W{worker_id}] Navigation error: {e}")
                if _expects_video(url) and attempt < MAX_RETRIES:
                    retry_counts[idx] = attempt + 1
                    queue.put_nowait((idx, url))
                    print(f"[W{worker_id}] Re-queued for retry.")
                elif not _expects_video(url):
                    async with map_lock:
                        entry = video_map.get(url, {})
                        entry["scraped_at"] = int(time.time())
                        video_map[url] = entry
                        save_video_map(video_map)
                else:
                    print(
                        f"[W{worker_id}] Still failing after {MAX_RETRIES} retries — will retry next run.")
                continue

            await asyncio.sleep(1.5)
            html = await page.content()
            title = extract_title_from_html(html)
            html_videos = extract_mp4_from_html(html)
            found = set(html_videos) | set(video_hits)
            video_hits.clear()

            all_videos = [m for m in found if m not in (
                f"{BASE_URL}/wp-content/plugins/easy-video-player/lib/blank.mp4",
            )]

            async with map_lock:
                new_found = found - known
                if new_found:
                    print(f"[W{worker_id}] Found {len(new_found)} new video URLs")
                    known.update(new_found)
                elif all_videos:
                    print(
                        f"[W{worker_id}] {len(all_videos)} video(s) already known — skipping write.")
                else:
                    print(f"[W{worker_id}] No video found on page.")

                entry = video_map.get(url, {})
                if title:
                    entry["title"] = title
                existing_videos = set(entry.get("videos", []))
                existing_videos.update(all_videos)
                entry["videos"] = sorted(existing_videos)
                mark_done = bool(all_videos) or not _expects_video(url)
                if mark_done:
                    entry["scraped_at"] = int(time.time())
                video_map[url] = entry
                save_video_map(video_map)

            if not mark_done:
                if attempt < MAX_RETRIES:
                    retry_counts[idx] = attempt + 1
                    queue.put_nowait((idx, url))
                    print(
                        f"[W{worker_id}] Re-queued for retry ({attempt + 1}/{MAX_RETRIES}).")
                else:
                    print(
                        f"[W{worker_id}] No video after {MAX_RETRIES} retries — will retry next run.")
    finally:
        await page.close()
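The worker's requeue-with-retry pattern (a failed item goes back on the queue until its retry budget is spent) can be sketched without Playwright. The flaky `should_fail` predicate below is a stand-in for page navigation, not part of the scraper:

```python
import asyncio

MAX_RETRIES = 2


async def drain(queue, retry_counts, results, should_fail):
    """Pull items off the queue; re-queue failures until MAX_RETRIES is hit."""
    while True:
        try:
            item = queue.get_nowait()
        except asyncio.QueueEmpty:
            return
        attempt = retry_counts.get(item, 0)
        if should_fail(item, attempt):
            if attempt < MAX_RETRIES:
                retry_counts[item] = attempt + 1
                queue.put_nowait(item)   # try again on a later pass
            else:
                results[item] = "gave_up"
        else:
            results[item] = "ok"


async def demo():
    queue = asyncio.Queue()
    for item in ("good", "flaky", "broken"):
        queue.put_nowait(item)
    results, retry_counts = {}, {}

    # "flaky" fails once then succeeds; "broken" never succeeds
    def should_fail(item, attempt):
        return (item == "flaky" and attempt == 0) or item == "broken"

    await drain(queue, retry_counts, results, should_fail)
    return results


results = asyncio.run(demo())
```

With MAX_RETRIES = 2 the flaky item recovers on its first retry and the broken one is abandoned after its budget runs out, which is exactly the "will retry next run" path in the worker.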
async def run():
    shutdown_event = asyncio.Event()
    loop = asyncio.get_running_loop()

    def _handle_shutdown(signum, _frame):
        print(f"\n[!] Signal {signum} received — finishing active pages then exiting…")
        loop.call_soon_threadsafe(shutdown_event.set)

    signal.signal(signal.SIGINT, _handle_shutdown)
    signal.signal(signal.SIGTERM, _handle_shutdown)

    try:
        cookie_name, cookie_value = _get_login_cookie()
        req_headers = {
            **API_HEADERS,
            "Cookie": f"{cookie_name}={cookie_value}; eav-age-verified=1",
        }

        urls = load_post_urls(req_headers)

        video_map = load_video_map()
        if any(u not in video_map
               or not video_map[u].get("title")
               or not video_map[u].get("description")
               for u in urls if _expects_video(u)):
            fetch_metadata_from_api(video_map, urls, req_headers)

        known = {u for entry in video_map.values() for u in entry.get("videos", [])}

        total = len(urls)
        pending = []
        needs_map = 0
        for i, u in enumerate(urls):
            entry = video_map.get(u, {})
            if not entry.get("scraped_at"):
                pending.append((i, u))
            elif _expects_video(u) and not entry.get("videos"):
                pending.append((i, u))
                needs_map += 1

        done_count = sum(1 for v in video_map.values() if v.get("scraped_at"))
        print(f"[+] Loaded {total} post URLs.")
        print(f"[+] Already have {len(known)} video URLs mapped.")
        print(f"[+] Video map: {len(video_map)} entries in {VIDEO_MAP_FILE}")
        if done_count:
            remaining_new = len(pending) - needs_map
            print(
                f"[↻] Resuming: {done_count} done, {remaining_new} new + {needs_map} needing map data.")
        if not pending:
            print("[✓] All URLs already processed and mapped.")
            return

        print(
            f"[⚡] Running with {min(MAX_WORKERS, len(pending))} concurrent workers.\n")

        queue = asyncio.Queue()
        for item in pending:
            queue.put_nowait(item)

        map_lock = asyncio.Lock()
        retry_counts = {}

        async with async_playwright() as p:
            browser = await p.firefox.launch(headless=True)
            context = await browser.new_context()

            _cookie_domain = urlparse(BASE_URL).netloc
            site_cookies = [
                {
                    "name": cookie_name,
                    "value": cookie_value,
                    "domain": _cookie_domain,
                    "path": "/",
                    "httpOnly": True,
                    "secure": True,
                    "sameSite": "None",
                },
                {
                    "name": "eav-age-verified",
                    "value": "1",
                    "domain": _cookie_domain,
                    "path": "/",
                },
            ]

            await context.add_cookies(site_cookies)

            num_workers = min(MAX_WORKERS, len(pending))
            workers = [
                asyncio.create_task(
                    worker(i, queue, context, known,
                           total, retry_counts, video_map, map_lock, shutdown_event)
                )
                for i in range(num_workers)
            ]

            await asyncio.gather(*workers)
            await browser.close()

        mapped = sum(1 for v in video_map.values() if v.get("videos"))
        print(
            f"\n[+] Video map: {mapped} posts with videos, {len(video_map)} total entries.")

        if not shutdown_event.is_set():
            print(f"[✓] Completed. Full map in {VIDEO_MAP_FILE}")
        else:
            done = sum(1 for v in video_map.values() if v.get("scraped_at"))
            print(f"[⏸] Paused — {done}/{total} done. Run again to resume.")
    finally:
        signal.signal(signal.SIGINT, signal.SIG_DFL)
        signal.signal(signal.SIGTERM, signal.SIG_DFL)


def main():
    try:
        asyncio.run(run())
    except KeyboardInterrupt:
        print("\n[!] Interrupted. Run again to resume.")
    except RuntimeError as e:
        raise SystemExit(f"[!] {e}")


if __name__ == "__main__":
    main()
21252  openapi.json  Normal file
File diff suppressed because one or more lines are too long
4  requirements.txt  Normal file
@@ -0,0 +1,4 @@
playwright==1.58.0
python-dotenv==1.2.1
requests==2.32.5
rookiepy==0.5.6
61  total_size.py  Normal file
@@ -0,0 +1,61 @@
"""Calculate total disk space needed to download all videos.

Importable function:
    summarize_sizes(sizes) - return dict with total, smallest, largest, average, failed
"""

from check_clashes import fmt_size, fetch_sizes, load_video_map, VIDEO_MAP_FILE


def summarize_sizes(sizes):
    """Given {url: size_or_None}, return a stats dict."""
    known = {u: s for u, s in sizes.items() if s is not None}
    failed = [u for u, s in sizes.items() if s is None]
    if not known:
        return {"sized": 0, "total": len(sizes), "total_bytes": 0,
                "smallest": 0, "largest": 0, "average": 0, "failed": failed}
    total_bytes = sum(known.values())
    return {
        "sized": len(known),
        "total": len(sizes),
        "total_bytes": total_bytes,
        "smallest": min(known.values()),
        "largest": max(known.values()),
        "average": total_bytes // len(known),
        "failed": failed,
    }
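`summarize_sizes` above only aggregates a `{url: size_or_None}` mapping, so its contract is easy to pin down with concrete numbers. A stand-alone copy of the helper behaves like this:

```python
def summarize_sizes(sizes):
    """Given {url: size_or_None}, return a stats dict (mirror of the helper above)."""
    known = {u: s for u, s in sizes.items() if s is not None}
    failed = [u for u, s in sizes.items() if s is None]
    if not known:
        return {"sized": 0, "total": len(sizes), "total_bytes": 0,
                "smallest": 0, "largest": 0, "average": 0, "failed": failed}
    total_bytes = sum(known.values())
    return {
        "sized": len(known),
        "total": len(sizes),
        "total_bytes": total_bytes,
        "smallest": min(known.values()),
        "largest": max(known.values()),
        "average": total_bytes // len(known),   # integer average, floor division
        "failed": failed,
    }


# two sized URLs plus one the HEAD request could not size
stats = summarize_sizes({"a": 100, "b": 300, "c": None})
```

Unsizable URLs (`None` values) are excluded from every aggregate and reported separately in `failed`, which is why the CLI can print the totals and the failure list independently.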
# --------------- CLI ---------------

def _progress(done, total):
    if done % 200 == 0 or done == total:
        print(f"    {done}/{total}")


def main():
    vm = load_video_map()
    urls = [u for entry in vm.values() for u in entry.get("videos", []) if u.startswith("http")]

    print(f"[+] {len(urls)} URLs in {VIDEO_MAP_FILE}")
    print("[+] Fetching file sizes (20 threads)…\n")

    sizes = fetch_sizes(urls, workers=20, on_progress=_progress)
    stats = summarize_sizes(sizes)

    print(f"\n{'=' * 45}")
    print(f"  Sized:    {stats['sized']}/{stats['total']} files")
    print(f"  Total:    {fmt_size(stats['total_bytes'])}")
    print(f"  Smallest: {fmt_size(stats['smallest'])}")
    print(f"  Largest:  {fmt_size(stats['largest'])}")
    print(f"  Average:  {fmt_size(stats['average'])}")
    print(f"{'=' * 45}")

    if stats["failed"]:
        print(f"\n[!] {len(stats['failed'])} URL(s) could not be sized:")
        for u in stats["failed"]:
            print(f"    {u}")


if __name__ == "__main__":
    main()
603  upload.py  Normal file
@@ -0,0 +1,603 @@
"""Upload videos to PeerTube with transcoding-aware flow control.

Uploads videos one batch at a time, waits for each batch to be fully transcoded
and moved to object storage before uploading the next — preventing disk
exhaustion on the PeerTube server.

Usage:
    python upload.py                  # upload from ./downloads
    python upload.py -i /mnt/vol/dl   # custom input dir
    python upload.py --batch-size 2   # upload 2, wait, repeat
    python upload.py --dry-run        # preview without uploading
    python upload.py --skip-wait      # upload without waiting

Required (CLI flag or env var):
    --url      / PEERTUBE_URL
    --username / PEERTUBE_USER
    --channel  / PEERTUBE_CHANNEL
    --password / PEERTUBE_PASSWORD
"""

import argparse
from collections import Counter
import html
import os
from pathlib import Path
import re
import sys
import time

import requests
from dotenv import load_dotenv

from check_clashes import fmt_size, url_to_filename, VIDEO_EXTS
from download import (
    load_video_map,
    collect_urls,
    get_paths_for_mode,
    read_mode,
    MODE_ORIGINAL,
    DEFAULT_OUTPUT,
)

load_dotenv()

# ── Defaults ─────────────────────────────────────────────────────────

DEFAULT_BATCH_SIZE = 1
DEFAULT_POLL = 30
UPLOADED_FILE = ".uploaded"
PT_NAME_MAX = 120


# ── Text helpers ─────────────────────────────────────────────────────

def clean_description(raw):
    """Strip WordPress shortcodes and HTML from a description."""
    if not raw:
        return ""
    text = re.sub(r'\[/?[^\]]+\]', '', raw)
    text = re.sub(r'<[^>]+>', '', text)
    text = html.unescape(text)
    text = re.sub(r'\n{3,}', '\n\n', text).strip()
    return text[:10000]


def make_pt_name(title, fallback_filename):
    """Build a PeerTube-safe video name (3-120 chars)."""
    name = html.unescape(title).strip() if title else Path(fallback_filename).stem
    if len(name) > PT_NAME_MAX:
        name = name[: PT_NAME_MAX - 1].rstrip() + "\u2026"
    while len(name) < 3:
        name += "_"
    return name
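PeerTube rejects video names shorter than 3 or longer than 120 characters, which is what `make_pt_name` above guards against. A stand-alone copy shows the clamp at both ends (too-long titles get an ellipsis, too-short ones are padded):

```python
import html
from pathlib import Path

PT_NAME_MAX = 120  # PeerTube's upper bound on video names


def make_pt_name(title, fallback_filename):
    """Build a PeerTube-safe video name, 3-120 chars (mirror of the helper above)."""
    name = html.unescape(title).strip() if title else Path(fallback_filename).stem
    if len(name) > PT_NAME_MAX:
        # truncate to 119 chars and append an ellipsis to stay at exactly 120
        name = name[: PT_NAME_MAX - 1].rstrip() + "\u2026"
    while len(name) < 3:
        name += "_"   # pad up to the 3-char minimum
    return name
```

An empty title falls back to the file's stem, so `make_pt_name("", "clip.mp4")` gives `"clip"`, while a 2-char title like `"Hi"` comes out as `"Hi_"`.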
# ── PeerTube API ─────────────────────────────────────────────────────

def get_oauth_token(base, username, password):
    r = requests.get(f"{base}/api/v1/oauth-clients/local", timeout=15)
    r.raise_for_status()
    client = r.json()

    r = requests.post(
        f"{base}/api/v1/users/token",
        data={
            "client_id": client["client_id"],
            "client_secret": client["client_secret"],
            "grant_type": "password",
            "username": username,
            "password": password,
        },
        timeout=15,
    )
    r.raise_for_status()
    return r.json()["access_token"]


def api_headers(token):
    return {"Authorization": f"Bearer {token}"}


def get_channel_id(base, token, channel_name):
    r = requests.get(
        f"{base}/api/v1/video-channels/{channel_name}",
        headers=api_headers(token),
        timeout=15,
    )
    r.raise_for_status()
    return r.json()["id"]


def get_channel_video_names(base, token, channel_name):
    """Paginate through the channel and return a Counter of video names."""
    counts = Counter()
    start = 0
    while True:
        r = requests.get(
            f"{base}/api/v1/video-channels/{channel_name}/videos",
            params={"start": start, "count": 100},
            headers=api_headers(token),
            timeout=30,
        )
        r.raise_for_status()
        data = r.json()
        for v in data.get("data", []):
            counts[v["name"]] += 1
        start += 100
        if start >= data.get("total", 0):
            break
    return counts


CHUNK_SIZE = 10 * 1024 * 1024  # 10 MB
MAX_RETRIES = 5


def _init_resumable(base, token, channel_id, filepath, filename, name,
                    description="", nsfw=False):
    """POST to create a resumable upload session. Returns upload URL."""
    file_size = Path(filepath).stat().st_size
    metadata = {
        "name": name,
        "channelId": channel_id,
        "filename": filename,
        "nsfw": nsfw,
        "waitTranscoding": True,
        "privacy": 1,
    }
    if description:
        metadata["description"] = description

    r = requests.post(
        f"{base}/api/v1/videos/upload-resumable",
        headers={
            **api_headers(token),
            "Content-Type": "application/json",
            "X-Upload-Content-Length": str(file_size),
            "X-Upload-Content-Type": "video/mp4",
        },
        json=metadata,
        timeout=30,
    )
    r.raise_for_status()

    location = r.headers["Location"]
    if location.startswith("//"):
        location = "https:" + location
    elif location.startswith("/"):
        location = base + location
    return location, file_size


def _query_offset(upload_url, token, file_size):
    """Ask the server how many bytes it has received so far."""
    r = requests.put(
        upload_url,
        headers={
            **api_headers(token),
            "Content-Range": f"bytes */{file_size}",
            "Content-Length": "0",
        },
        timeout=15,
    )
    if r.status_code == 308:
        range_hdr = r.headers.get("Range", "")
        if range_hdr:
            return int(range_hdr.split("-")[1]) + 1
        return 0
    if r.status_code == 200:
        return file_size
    r.raise_for_status()
    return 0
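`_query_offset` above follows the Google-style resumable protocol that PeerTube's `upload-resumable` endpoint speaks: a `PUT` with `Content-Range: bytes */<total>` probes progress, and a `308` reply carries a `Range` header such as `bytes=0-10485759` naming the last byte received. The header arithmetic can be isolated in a small helper (hypothetical name, not part of upload.py):

```python
def next_offset_from_range(range_header: str, fallback: int = 0) -> int:
    """Given the Range header from a 308 response (e.g. 'bytes=0-10485759'),
    return the byte offset the next chunk should start at."""
    if not range_header:
        return fallback          # server has acknowledged nothing yet
    last_byte = int(range_header.split("-")[1])
    return last_byte + 1         # resume one past the last received byte
```

After the server acknowledges a full 10 MiB chunk (`bytes=0-10485759`), the next `Content-Range` must start at byte 10485760, which matches the `int(range_hdr.split("-")[1]) + 1` expression in both `_query_offset` and the upload loop.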
def upload_video(base, token, channel_id, filepath, name,
                 description="", nsfw=False):
    """Resumable chunked upload. Returns (ok, uuid)."""
    filepath = Path(filepath)
    filename = filepath.name
    file_size = filepath.stat().st_size

    try:
        upload_url, _ = _init_resumable(
            base, token, channel_id, filepath, filename,
            name, description, nsfw,
        )
    except Exception as e:
        print(f"    Init failed: {e}")
        return False, None

    offset = 0
    retries = 0

    with open(filepath, "rb") as f:
        while offset < file_size:
            end = min(offset + CHUNK_SIZE, file_size) - 1
            chunk_len = end - offset + 1

            f.seek(offset)
            chunk = f.read(chunk_len)

            pct = int(100 * (end + 1) / file_size)
            print(f"    {fmt_size(offset)}/{fmt_size(file_size)} ({pct}%)",
                  end="\r", flush=True)

            try:
                r = requests.put(
                    upload_url,
                    headers={
                        **api_headers(token),
                        "Content-Type": "application/octet-stream",
                        "Content-Range": f"bytes {offset}-{end}/{file_size}",
                        "Content-Length": str(chunk_len),
                    },
                    data=chunk,
                    timeout=120,
                )
            except (requests.ConnectionError, requests.Timeout) as e:
                retries += 1
                if retries > MAX_RETRIES:
                    print(f"\n    Upload failed after {MAX_RETRIES} retries: {e}")
                    return False, None
                wait = min(2 ** retries, 60)
                print(f"\n    Connection error, retry {retries}/{MAX_RETRIES} "
                      f"in {wait}s ...")
                time.sleep(wait)
                try:
                    offset = _query_offset(upload_url, token, file_size)
                except Exception:
                    pass
                continue

            if r.status_code == 308:
                range_hdr = r.headers.get("Range", "")
                if range_hdr:
                    offset = int(range_hdr.split("-")[1]) + 1
                else:
                    offset = end + 1
                retries = 0

            elif r.status_code == 200:
                print(f"    {fmt_size(file_size)}/{fmt_size(file_size)} (100%)")
                uuid = r.json().get("video", {}).get("uuid")
                return True, uuid

            elif r.status_code in (502, 503, 429):
                retry_after = int(r.headers.get("Retry-After", 10))
                retries += 1
                if retries > MAX_RETRIES:
                    print(f"\n    Upload failed: server returned {r.status_code}")
                    return False, None
                print(f"\n    Server {r.status_code}, retry in {retry_after}s ...")
                time.sleep(retry_after)
                try:
                    offset = _query_offset(upload_url, token, file_size)
                except Exception:
                    pass

            else:
                detail = r.text[:300] if r.text else str(r.status_code)
                print(f"\n    Upload failed ({r.status_code}): {detail}")
                return False, None

    print("\n    Unexpected: sent all bytes but no 200 response")
    return False, None
_STATE = {
    1: "Published",
    2: "To transcode",
    3: "To import",
    6: "Moving to object storage",
    7: "Transcoding failed",
    8: "Storage move failed",
    9: "To edit",
}


def get_video_state(base, token, uuid):
    r = requests.get(
        f"{base}/api/v1/videos/{uuid}",
        headers=api_headers(token),
        timeout=15,
    )
    r.raise_for_status()
    state = r.json()["state"]
    return state["id"], state.get("label", "")


def wait_for_published(base, token, uuid, poll_interval):
    """Block until the video reaches state 1 (Published) or a failure state."""
    started = time.monotonic()
    while True:
        elapsed = int(time.monotonic() - started)
        hours, rem = divmod(elapsed, 3600)
        mins, secs = divmod(rem, 60)
        if hours:
            elapsed_str = f"{hours}h {mins:02d}m {secs:02d}s"
        elif mins:
            elapsed_str = f"{mins}m {secs:02d}s"
        else:
            elapsed_str = f"{secs}s"

        try:
            sid, label = get_video_state(base, token, uuid)
        except requests.exceptions.RequestException as e:
            print(f"  -> Poll error ({e.__class__.__name__}) "
                  f"after {elapsed_str}, retrying in {poll_interval}s …")
            time.sleep(poll_interval)
            continue

        display = _STATE.get(sid, label or f"state {sid}")

        if sid == 1:
            print(f"  -> {display}")
            return sid
        if sid in (7, 8):
            print(f"  -> FAILED: {display}")
            return sid

        print(f"  -> {display} … {elapsed_str} elapsed (next check in {poll_interval}s)")
        time.sleep(poll_interval)
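The elapsed-time display inside `wait_for_published` reduces to a divmod chain over whole seconds. Extracted as a stand-alone helper (hypothetical name, the poll loop inlines this) it behaves like:

```python
def fmt_elapsed(elapsed: int) -> str:
    """Render whole seconds as '5s', '2m 05s', or '1h 02m 03s',
    using the same rules as the polling loop above."""
    hours, rem = divmod(elapsed, 3600)
    mins, secs = divmod(rem, 60)
    if hours:
        return f"{hours}h {mins:02d}m {secs:02d}s"
    if mins:
        return f"{mins}m {secs:02d}s"
    return f"{secs}s"
```

The widest unit that is non-zero picks the format, and the smaller fields are zero-padded to two digits so consecutive poll lines stay the same width.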
# ── State tracker ────────────────────────────────────────────────────
|
||||
|
||||
def load_uploaded(input_dir):
|
||||
path = Path(input_dir) / UPLOADED_FILE
|
||||
if not path.exists():
|
||||
return set()
|
||||
with open(path) as f:
|
||||
return {Path(line.strip()) for line in f if line.strip()}
|
||||
|
||||
|
||||
def mark_uploaded(input_dir, rel_path):
|
||||
with open(Path(input_dir) / UPLOADED_FILE, "a") as f:
|
||||
f.write(f"{rel_path}\n")
|
||||
|
||||
|
||||
# ── File / metadata helpers ─────────────────────────────────────────
|
||||
|
||||
def build_path_to_meta(video_map, input_dir):
|
||||
"""Map each expected download path (relative) to {title, description}."""
|
||||
urls = collect_urls(video_map)
|
||||
mode = read_mode(input_dir) or MODE_ORIGINAL
|
||||
paths = get_paths_for_mode(mode, urls, video_map, input_dir)
|
||||
|
||||
url_meta = {}
|
||||
for entry in video_map.values():
|
||||
t = entry.get("title", "")
|
||||
d = entry.get("description", "")
|
||||
for video_url in entry.get("videos", []):
|
||||
if video_url not in url_meta:
|
||||
url_meta[video_url] = {"title": t, "description": d}
|
||||
|
||||
result = {}
|
||||
for url, abs_path in paths.items():
|
||||
rel = Path(abs_path).relative_to(input_dir)
|
||||
meta = url_meta.get(url, {"title": "", "description": ""})
|
||||
result[rel] = {**meta, "original_filename": url_to_filename(url)}
|
||||
return result
|
||||
|
||||
|
||||
def find_videos(input_dir):
|
||||
"""Walk input_dir and return a set of relative paths for all video files."""
|
||||
found = set()
|
||||
for root, dirs, files in os.walk(input_dir):
|
||||
dirs[:] = [d for d in dirs if not d.startswith(".")]
|
||||
for f in files:
|
||||
if Path(f).suffix.lower() in VIDEO_EXTS:
|
||||
found.add((Path(root) / f).relative_to(input_dir))
|
||||
return found
|
||||
|
||||
|
||||
# ── Channel match helpers ─────────────────────────────────────────────
|
||||
|
||||
def _channel_match(rel, path_meta, existing):
|
||||
"""Return (matched, name) for a local file against the channel name set.
|
||||
|
||||
Checks both the title-derived name and the original-filename-derived name
|
||||
so that videos uploaded under either form are recognised. Extracted to
|
||||
avoid duplicating the logic between the pre-reconcile sweep and the per-
|
||||
file check inside the upload loop.
|
||||
"""
|
||||
meta = path_meta.get(rel, {})
|
||||
name = make_pt_name(meta.get("title", ""), rel.name)
|
||||
orig_fn = meta.get("original_filename", "")
|
||||
raw_name = make_pt_name("", orig_fn) if orig_fn else None
|
||||
matched = name in existing or (raw_name and raw_name != name and raw_name in existing)
|
||||
return matched, name
|
||||


# ── CLI ──────────────────────────────────────────────────────────────

def main():
    ap = argparse.ArgumentParser(
        description="Upload videos to PeerTube with transcoding-aware batching",
    )
    ap.add_argument("--input", "-i", default=DEFAULT_OUTPUT,
                    help=f"Directory with downloaded videos (default: {DEFAULT_OUTPUT})")
    ap.add_argument("--url",
                    help="PeerTube instance URL (or set PEERTUBE_URL env var)")
    ap.add_argument("--username", "-U",
                    help="PeerTube username (or set PEERTUBE_USER env var)")
    ap.add_argument("--password", "-p",
                    help="PeerTube password (or set PEERTUBE_PASSWORD env var)")
    ap.add_argument("--channel", "-C",
                    help="Channel to upload to (or set PEERTUBE_CHANNEL env var)")
    ap.add_argument("--batch-size", "-b", type=int, default=DEFAULT_BATCH_SIZE,
                    help="Videos to upload before waiting for transcoding "
                         f"(default: {DEFAULT_BATCH_SIZE})")
    ap.add_argument("--poll-interval", type=int, default=DEFAULT_POLL,
                    help=f"Seconds between state polls (default: {DEFAULT_POLL})")
    ap.add_argument("--skip-wait", action="store_true",
                    help="Upload everything without waiting for transcoding")
    ap.add_argument("--nsfw", action="store_true",
                    help="Mark videos as NSFW")
    ap.add_argument("--dry-run", "-n", action="store_true",
                    help="Preview what would be uploaded")
    args = ap.parse_args()

    url = args.url or os.environ.get("PEERTUBE_URL")
    username = args.username or os.environ.get("PEERTUBE_USER")
    channel = args.channel or os.environ.get("PEERTUBE_CHANNEL")
    password = args.password or os.environ.get("PEERTUBE_PASSWORD")

    if not args.dry_run:
        missing = [label for label, val in [
            ("--url / PEERTUBE_URL", url),
            ("--username / PEERTUBE_USER", username),
            ("--channel / PEERTUBE_CHANNEL", channel),
            ("--password / PEERTUBE_PASSWORD", password),
        ] if not val]
        if missing:
            for label in missing:
                print(f"[!] Required: {label}")
            sys.exit(1)

    # ── load metadata & scan disk ──
    video_map = load_video_map()
    path_meta = build_path_to_meta(video_map, args.input)
    on_disk = find_videos(args.input)

    unmatched = on_disk - set(path_meta.keys())
    if unmatched:
        print(f"[!] {len(unmatched)} file(s) on disk not in video_map "
              "(will use filename as title)")
        for rel in unmatched:
            path_meta[rel] = {"title": "", "description": ""}

    uploaded = load_uploaded(args.input)
    pending = sorted(rel for rel in on_disk if rel not in uploaded)

    print(f"[+] {len(on_disk)} video files in {args.input}/")
    print(f"[+] {len(uploaded)} already uploaded")
    print(f"[+] {len(pending)} pending")
    print(f"[+] Batch size: {args.batch_size}")

    if not pending:
        print("\nAll videos already uploaded.")
        return

    # ── dry run ──
    if args.dry_run:
        total_bytes = 0
        for rel in pending:
            meta = path_meta.get(rel, {})
            name = make_pt_name(meta.get("title", ""), rel.name)
            sz = (Path(args.input) / rel).stat().st_size
            total_bytes += sz
            print(f"  [{fmt_size(sz):>10}] {name}")
        print(f"\n  Total: {fmt_size(total_bytes)} across {len(pending)} videos")
        return

    # ── authenticate ──
    base = url.rstrip("/")
    if not base.startswith("http"):
        base = "https://" + base

    print(f"\n[+] Authenticating with {base} ...")
    token = get_oauth_token(base, username, password)
    print(f"[+] Authenticated as {username}")

    channel_id = get_channel_id(base, token, channel)
    print(f"[+] Channel: {channel} (id {channel_id})")

    name_counts = get_channel_video_names(base, token, channel)
    existing = set(name_counts)
    total = sum(name_counts.values())
    print(f"[+] Found {total} video(s) on channel ({len(name_counts)} unique name(s))")

    dupes = {name: count for name, count in name_counts.items() if count > 1}
    if dupes:
        print(f"[!] {len(dupes)} duplicate name(s) detected on channel:")
        for name, count in sorted(dupes.items()):
            print(f"    x{count}  {name}")

    # ── pre-reconcile: sweep all pending against channel names ────────
    # The main upload loop discovers already-uploaded videos lazily as it
    # walks the sorted pending list — meaning on a fresh run (no .uploaded
    # file) you won't know how many files are genuinely new until the loop
    # has processed everything. Doing a full sweep here, before any
    # upload starts, gives an accurate count up-front and pre-populates
    # .uploaded so that interrupted/re-run sessions skip them instantly
    # without re-checking each time.
    pre_matched = [rel for rel in pending
                   if _channel_match(rel, path_meta, existing)[0]]
    if pre_matched:
        print(f"\n[+] Pre-sweep: {len(pre_matched)} local file(s) already on channel — marking uploaded")
        for rel in pre_matched:
            mark_uploaded(args.input, rel)
        pre_set = set(pre_matched)
        pending = [rel for rel in pending if rel not in pre_set]
        print(f"[+] {len(pending)} left to upload\n")

    nsfw = args.nsfw
    total_up = 0
    batch: list[tuple[str, str]] = []  # [(uuid, name), ...]

    try:
        for rel in pending:
            # ── flush batch if full ──
            if not args.skip_wait and len(batch) >= args.batch_size:
                print(f"\n[+] Waiting for {len(batch)} video(s) to finish processing ...")
                for uuid, bname in batch:
                    print(f"\n  [{bname}]")
                    wait_for_published(base, token, uuid, args.poll_interval)
                batch.clear()

            filepath = Path(args.input) / rel
            meta = path_meta.get(rel, {})
            name = make_pt_name(meta.get("title", ""), rel.name)
            desc = clean_description(meta.get("description", ""))
            sz = filepath.stat().st_size

            if _channel_match(rel, path_meta, existing)[0]:
                print(f"\n[skip] already on channel: {name}")
                mark_uploaded(args.input, rel)
                continue

            print(f"\n[{total_up + 1}/{len(pending)}] {name}")
            print(f"    File: {rel} ({fmt_size(sz)})")

            ok, uuid = upload_video(
                base, token, channel_id, filepath, name, desc, nsfw)
            if not ok:
                continue

            print(f"    Uploaded  uuid={uuid}")
            mark_uploaded(args.input, rel)
            total_up += 1
            existing.add(name)

            if uuid:
                batch.append((uuid, name))

        # ── wait for final batch ──
        if batch and not args.skip_wait:
            print(f"\n[+] Waiting for final {len(batch)} video(s) ...")
            for uuid, bname in batch:
                print(f"\n  [{bname}]")
                wait_for_published(base, token, uuid, args.poll_interval)

    except KeyboardInterrupt:
        print(f"\n\n[!] Interrupted after {total_up} uploads. Re-run to continue.")
        sys.exit(130)

    print(f"\n{'=' * 50}")
    print(f"  Uploaded: {total_up} video(s)")
    print("  Done!")
    print(f"{'=' * 50}")


if __name__ == "__main__":
    main()