nyaa-crawler/AGENTS.md
jason ded0875e72 feat: initial full-stack nyaa-crawler implementation
- Node.js + TypeScript + Express backend using built-in node:sqlite
- React + Vite frontend with dark-themed UI
- Nyaa.si RSS polling via fast-xml-parser
- Watch list with show/episode CRUD and status tracking
- Auto-download scheduler with node-cron (configurable interval)
- .torrent file downloader with batch-release filtering
- Settings page for poll interval and quality defaults
- Dockerfile and docker-compose for Unraid deployment
- SQLite DB with migrations (shows, episodes, settings tables)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 14:00:09 -05:00

# AGENTS.md
## Mission
Build a small, dockerized web service that lets a user:
- Search and select anime releases from Nyaa.si.
- Persist a personal “watch list” of shows and their release patterns.
- Poll Nyaa (via RSS or lightweight scraping / API wrapper) for new episodes.
- Automatically download the next .torrent file for each tracked show into a host-mounted download directory.
- Track which episodes are:
  - Automatically downloaded (auto-checked),
  - Manually checked as already downloaded by the user.

Target deployment is an Unraid server using a single Docker container with a simple web UI and a lightweight persistence layer (SQLite preferred).[^1]
***
## High-level Architecture
- **Frontend**: Minimal web UI (SPA or server-rendered) for:
  - Searching Nyaa.si.
  - Adding/removing shows from the watch list.
  - Viewing episodes per show with status (pending, downloaded).
  - Manually checking episodes as downloaded.
- **Backend**:
  - HTTP API for the UI.
  - Nyaa integration (RSS and/or search scraping).
  - Scheduler/worker to periodically poll Nyaa and enqueue downloads.
  - Torrent fetcher that downloads `.torrent` files to a host-mounted directory.
- **Data store**:
  - SQLite database stored on a bind-mounted volume for easy backup and migration.
- **Containerization**:
  - Single Docker image with app + scheduler.
  - Config via environment variables.
  - Unraid-friendly: configurable ports, volume mapping for DB and torrents.[^2][^1]
***
## Functional Requirements
### 1. Nyaa Integration
- Use Nyaa's RSS endpoints for polling where possible (e.g. `https://nyaa.si/?page=rss` plus query parameters), falling back to HTML scraping or an existing wrapper library if necessary.[^3][^4][^5][^6][^7]
- Support user-driven search:
  - Input: search term (e.g. “Jujutsu Kaisen 1080p SubsPlease”).
  - Output: recent matching torrents with:
    - Title
    - Torrent ID
    - Category
    - Size
    - Magnet/torrent link URL if exposed in the feed or page.[^8][^9][^10]
- When a user “adds” an anime:
  - Store a normalized pattern to match future episodes (e.g. base title + quality/resolution + sub group).
  - Maintain a reference to the Nyaa search or RSS query that defines this feed.[^6][^3]
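
As a rough illustration of the "normalized pattern" idea above, the following sketch builds an RSS URL from a stored pattern and checks whether a feed title matches it. The `ReleasePattern` shape and both function names are hypothetical, and the `q` search parameter is an assumption that should be verified against the live feed; only `page=rss` comes from this document.

```typescript
// Hypothetical shape for a stored release pattern; field names are illustrative.
interface ReleasePattern {
  baseTitle: string; // e.g. "Jujutsu Kaisen"
  quality?: string; // e.g. "1080p"
  subGroup?: string; // e.g. "SubsPlease"
}

// Collect the non-empty parts of a pattern in a fixed order.
function patternTerms(pattern: ReleasePattern): string[] {
  return [pattern.subGroup, pattern.baseTitle, pattern.quality].filter(
    (t): t is string => Boolean(t),
  );
}

// Build an RSS URL. `page=rss` is taken from the spec; `q` is an assumed
// name for the search parameter.
function buildRssUrl(pattern: ReleasePattern, base = "https://nyaa.si/"): string {
  const query = encodeURIComponent(patternTerms(pattern).join(" "));
  return `${base}?page=rss&q=${query}`;
}

// Case-insensitive check that a feed title contains every part of the pattern.
function matchesPattern(title: string, pattern: ReleasePattern): boolean {
  const haystack = title.toLowerCase();
  return patternTerms(pattern).every((part) => haystack.includes(part.toLowerCase()));
}
```

Substring matching is deliberately naive here; a real implementation may need stricter rules so that, for example, a sequel title does not match the pattern of its first season.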
### 2. Watch List \& Episodes
- Entities:
  - **Show**: id, display name, search/RSS query, quality filter, fansub group, active flag.
  - **Episode**: id, show_id, episode_number (string or parsed integer), nyaa_torrent_id, title, status (`pending`, `downloaded_auto`, `downloaded_manual`), torrent_url, created_at, downloaded_at.
- Behavior:
  - Adding a show:
    - Run an immediate search.
    - Populate existing episodes in DB as `pending` (no download) to let the user backfill by manually checking already downloaded ones.
  - Removing a show:
    - Leave episodes in DB but mark show as inactive (no further polling).
  - Manual check:
    - User can mark an episode as already downloaded (`downloaded_manual`), no torrent action taken.
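
The add-show backfill and the manual check described above could look roughly like this in memory. This is a sketch, not the project's actual code: the `Episode` and `FeedItem` shapes are hypothetical, and recording `downloadedAt` for a manual check is an assumption the spec does not state.

```typescript
type EpisodeStatus = "pending" | "downloaded_auto" | "downloaded_manual";

// Hypothetical in-memory mirror of an episode row.
interface Episode {
  id: number;
  showId: number;
  episodeCode: string;
  torrentId: string;
  title: string;
  torrentUrl: string;
  status: EpisodeStatus;
  downloadedAt: string | null; // ISO timestamp
}

// A feed item as a Nyaa client might return it (hypothetical shape).
interface FeedItem {
  torrentId: string;
  title: string;
  torrentUrl: string;
  episodeCode: string;
}

// Backfill on "add show": every feed item not already known becomes a
// pending episode. Nothing is downloaded here.
function ingestAsPending(showId: number, existing: Episode[], items: FeedItem[]): Episode[] {
  const known = new Set(existing.map((e) => e.torrentId));
  let nextId = existing.reduce((max, e) => Math.max(max, e.id), 0) + 1;
  const added: Episode[] = [];
  for (const item of items) {
    if (known.has(item.torrentId)) continue; // skip duplicates
    known.add(item.torrentId);
    added.push({ id: nextId++, showId, ...item, status: "pending", downloadedAt: null });
  }
  return [...existing, ...added];
}

// Manual check: flips the status only; no torrent action is taken.
// Setting downloadedAt here is an assumption.
function markManual(episode: Episode, now: Date = new Date()): Episode {
  if (episode.status !== "pending") return episode; // already downloaded, leave as-is
  return { ...episode, status: "downloaded_manual", downloadedAt: now.toISOString() };
}
```

Both functions return new objects instead of mutating their inputs, which keeps them easy to test before a real database layer exists.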
### 3. Auto-Download Logic
- Periodic job (e.g. every 5 to 15 minutes, configurable):
  - For each active show:
    - Query Nyaa using its stored RSS/search parameters.[^4][^3][^6]
    - Determine the “next” episode:
      - Prefer simplest rule: highest episode number not yet marked downloaded.
      - Guard against batch torrents by using size or title pattern heuristics (e.g. skip titles containing “Batch”).
    - If the next episode's torrent is not yet in DB:
      - Create an Episode record with status `downloaded_auto`.
      - Download the `.torrent` file (NOT the media itself) into the mapped host directory.
      - Filename suggestion: `<show-slug>-ep<episode>-<torrent-id>.torrent`.
- Do not attempt to control or integrate directly with a torrent client (scope is “download the .torrent file” only).
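
The selection rule above (highest episode number not yet downloaded, skipping batch titles) and the suggested filename can be sketched as follows. The ` - 05` episode-number convention in titles is an assumption about common release naming, and all function names are hypothetical.

```typescript
interface Candidate {
  torrentId: string;
  title: string;
}

// Title heuristic from the spec: skip anything containing "batch".
function isBatch(title: string): boolean {
  return /batch/i.test(title);
}

// Assumes the common " - 05" or " - 05v2" naming convention; returns null
// when no episode number can be found.
function parseEpisodeNumber(title: string): number | null {
  const m = /\s-\s(\d{1,4})(?:v\d+)?\b/.exec(title);
  return m ? Number.parseInt(m[1], 10) : null;
}

// Pick the highest-numbered episode that is not a batch and not yet downloaded.
function pickNext(
  candidates: Candidate[],
  downloaded: Set<number>,
): { candidate: Candidate; episode: number } | null {
  let best: { candidate: Candidate; episode: number } | null = null;
  for (const c of candidates) {
    if (isBatch(c.title)) continue;
    const ep = parseEpisodeNumber(c.title);
    if (ep === null || downloaded.has(ep)) continue;
    if (best === null || ep > best.episode) best = { candidate: c, episode: ep };
  }
  return best;
}

// Lowercase, replace runs of non-alphanumerics with "-", trim stray dashes.
function slugify(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Follows the suggested pattern <show-slug>-ep<episode>-<torrent-id>.torrent.
function torrentFileName(showName: string, episode: number, torrentId: string): string {
  return `${slugify(showName)}-ep${episode}-${torrentId}.torrent`;
}
```

A size-based batch guard, which the spec also mentions, is left out because the size field format in the feed is not specified here.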
### 4. Web UI
- Views:
  - **Shows list**:
    - Add show (form: name, search query, quality, group).
    - Toggle active/inactive.
    - Quick link to show detail.
  - **Show detail**:
    - Table of episodes: episode number/title, Nyaa ID, status, timestamps.
    - Controls:
      - Manually mark individual episodes as downloaded.
      - Bulk “mark previous episodes as downloaded” helper (e.g. “mark up to episode N”).
  - **Settings**:
    - Poll interval.
    - Default quality / sub group preferences.
    - Torrent download directory (read-only display; actual path comes from environment/volume).
- UX constraints:
  - Keep it extremely simple; this is an internal tool.
  - Assume a single-user instance behind a LAN.
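
The bulk “mark up to episode N” helper could be as small as the sketch below. It assumes episode codes end in their episode number (as in “03” or “S01E03”); the `EpisodeRow` shape and function names are hypothetical.

```typescript
type Status = "pending" | "downloaded_auto" | "downloaded_manual";

// Minimal row shape for this example.
interface EpisodeRow {
  episodeCode: string;
  status: Status;
}

// Read the trailing digits of a code such as "03" or "S01E03".
function episodeNumber(code: string): number | null {
  const m = /(\d+)$/.exec(code);
  return m ? Number.parseInt(m[1], 10) : null;
}

// Mark every pending episode numbered <= upTo as manually downloaded.
// Episodes that were auto-downloaded keep their status.
function markUpTo(rows: EpisodeRow[], upTo: number): EpisodeRow[] {
  return rows.map((row): EpisodeRow => {
    const n = episodeNumber(row.episodeCode);
    const shouldMark = row.status === "pending" && n !== null && n <= upTo;
    return shouldMark ? { ...row, status: "downloaded_manual" } : row;
  });
}
```

Note that this ignores season numbers, so a multi-season show would need the comparison scoped to one season.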
***
## Non-Functional Requirements
- **Language/Stack**:
  - Prefer Node.js + TypeScript backend with a minimal React or server-rendered frontend to align with existing projects, unless you choose a simpler stack.
- **Security**:
  - App is assumed to run behind LAN; basic auth or reverse-proxy auth can be added later.
  - Do not expose any admin-only functionality without at least a simple auth hook.
- **Resilience**:
  - Polling should be robust to Nyaa timeouts and 4xx/5xx responses (retry with backoff, log errors).
  - Do not spam Nyaa with aggressive polling; default interval should be conservative (e.g. 15 minutes, configurable).
- **Observability**:
  - Minimal logging for:
    - Polling attempts.
    - New episodes found.
    - Torrent downloads started/completed or failed.
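
One way to read “retry with backoff, log errors” is a small wrapper around any async polling call. The sketch below uses exponential backoff with a cap; the base delay, cap, and attempt count are illustrative assumptions, and `withRetry` is a hypothetical helper name.

```typescript
// Delay before the retry that follows failed attempt `attempt` (0-based):
// baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 60000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Run `task`, retrying on any thrown error up to maxAttempts times in total.
// `sleep` and `log` are injectable so the wrapper can be tested without
// real timers or console output.
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
  log: (message: string) => void = () => {},
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      log(`attempt ${attempt + 1}/${maxAttempts} failed: ${String(err)}`);
      // No sleep after the final attempt; the error is rethrown below.
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

With the defaults, a failing poll waits 1 s, 2 s, then 4 s before giving up, which stays well inside a 15-minute poll interval.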
***
## Data Model (Initial)
### Tables
- `shows`
  - `id` (PK)
  - `name` (string)
  - `search_query` (string)
  - `quality` (string, nullable)
  - `sub_group` (string, nullable)
  - `rss_url` (string, nullable)
  - `is_active` (boolean, default true)
  - `created_at`, `updated_at`
- `episodes`
  - `id` (PK)
  - `show_id` (FK → shows.id)
  - `episode_code` (string, e.g. “S01E03” or “03”)
  - `title` (string)
  - `torrent_id` (string, Nyaa ID)
  - `torrent_url` (string)
  - `status` (enum: `pending`, `downloaded_auto`, `downloaded_manual`)
  - `downloaded_at` (datetime, nullable)
  - `created_at`, `updated_at`
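
A first migration for these tables might look like the sketch below. SQLite has no native boolean, enum, or datetime types, so the example stores `is_active` as an integer, `status` as text with a `CHECK` constraint, and timestamps as text. The `NOT NULL` choices, the timestamp defaults, and the `UNIQUE (show_id, torrent_id)` constraint are assumptions, not part of the model above.

```typescript
// First migration as a single SQL script (SQLite dialect).
const MIGRATION_001 = `
CREATE TABLE IF NOT EXISTS shows (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT NOT NULL,
  search_query TEXT NOT NULL,
  quality TEXT,
  sub_group TEXT,
  rss_url TEXT,
  is_active INTEGER NOT NULL DEFAULT 1,
  created_at TEXT NOT NULL DEFAULT (datetime('now')),
  updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE TABLE IF NOT EXISTS episodes (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  show_id INTEGER NOT NULL REFERENCES shows(id),
  episode_code TEXT NOT NULL,
  title TEXT NOT NULL,
  torrent_id TEXT NOT NULL,
  torrent_url TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'pending'
    CHECK (status IN ('pending', 'downloaded_auto', 'downloaded_manual')),
  downloaded_at TEXT,
  created_at TEXT NOT NULL DEFAULT (datetime('now')),
  updated_at TEXT NOT NULL DEFAULT (datetime('now')),
  UNIQUE (show_id, torrent_id) -- assumption: one row per torrent per show
);
`;

// The only capability the migration needs. A SQLite handle that exposes
// exec(sql), such as DatabaseSync from node:sqlite, should fit this shape.
interface SqlExecutor {
  exec(sql: string): unknown;
}

// Apply the migration; safe to re-run because of IF NOT EXISTS.
function migrate(db: SqlExecutor): void {
  db.exec(MIGRATION_001);
}
```

Keeping `migrate` dependent on a tiny interface rather than a concrete driver makes it straightforward to swap SQLite libraries or to test with a stub.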
***
## Container \& Unraid Integration
### Environment
- `PORT`: HTTP port to listen on (default 3000).
- `POLL_INTERVAL_SECONDS`: Polling frequency.
- `TORRENT_OUTPUT_DIR`: Inside-container path where `.torrent` files are written.
- `DATABASE_PATH`: Inside-container path to SQLite file.
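
Reading these variables could be done with a small pure function like the sketch below. Only the `PORT` default of 3000 is stated above; the other defaults (900 seconds, `/data/torrents`, `/data/db.sqlite`) are assumptions borrowed from the example compose values in this document, and `loadConfig` is a hypothetical name.

```typescript
interface AppConfig {
  port: number;
  pollIntervalSeconds: number;
  torrentOutputDir: string;
  databasePath: string;
}

// Parse a positive integer, falling back when the value is missing or invalid.
function positiveInt(raw: string | undefined, fallback: number): number {
  const n = Number.parseInt(raw ?? "", 10);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Takes the environment as an argument so it can be tested without
// touching the real process environment.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  return {
    port: positiveInt(env["PORT"], 3000),
    pollIntervalSeconds: positiveInt(env["POLL_INTERVAL_SECONDS"], 900), // assumed default
    torrentOutputDir: env["TORRENT_OUTPUT_DIR"] || "/data/torrents", // assumed default
    databasePath: env["DATABASE_PATH"] || "/data/db.sqlite", // assumed default
  };
}
```

At startup this would typically be called once as `loadConfig(process.env)`.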
### Volumes
- Map SQLite DB to persistent storage:
  - `/data/db.sqlite` → Unraid share: e.g. `/mnt/user/appdata/nyaa-watcher/db.sqlite`.[^1][^2]
- Map torrent output directory to a download share:
  - `/data/torrents` → e.g. `/mnt/user/downloads/torrents/nyaa/`.
### Ports
- Expose app port to LAN (bridge mode):
  - Container: `3000`, Host: `YOUR_PORT` (e.g. 8082).
### Example docker-compose snippet
```yaml
services:
  nyaa-watcher:
    image: your-registry/nyaa-watcher:latest
    container_name: nyaa-watcher
    restart: unless-stopped
    environment:
      - PORT=3000
      - POLL_INTERVAL_SECONDS=900
      - TORRENT_OUTPUT_DIR=/data/torrents
      - DATABASE_PATH=/data/db.sqlite
    volumes:
      - /mnt/user/appdata/nyaa-watcher:/data
      - /mnt/user/downloads/torrents/nyaa:/data/torrents
    ports:
      - "8082:3000"
```
This can be translated to an Unraid template or used via docker-compose with a Docker context pointing at the Unraid host.[^11][^2][^1]
***
## Implementation Roadmap
1. **Skeleton app**
   - Set up HTTP server, health endpoint, and a static web page.
   - Wire SQLite with migrations for `shows` and `episodes`.
2. **Nyaa client**
   - Implement RSS-based polling for a hard-coded query.
   - Parse feed, extract torrent IDs, titles, and links.[^5][^3][^4][^6]
   - Optionally evaluate an existing node `nyaa-si` wrapper as a shortcut.[^7]
3. **Watch list CRUD**
   - API endpoints + UI for managing shows.
   - Initial search → show add flow.
4. **Episode tracking**
   - When adding a show, ingest existing feed items into `episodes` as `pending`.
   - Implement manual check/mark endpoints and UI.
5. **Auto-download worker**
   - Background job to poll active shows and write `.torrent` files.
   - Update episode status to `downloaded_auto`.
6. **Dockerization \& Unraid deployment**
   - Dockerfile, volume mappings, environment configuration.
   - Test deployment on Unraid, ensure persistence and torrent file visibility.
7. **Polish**
   - Basic auth or IP allowlist if desired.
   - Guardrails against batch torrent downloads.
   - Minimal styling for the UI.
***
## Open Questions for Product Owner
- What poll interval do you consider acceptable by default (e.g. 5, 10, or 15 minutes)?
- Do you want any basic auth in front of the UI out of the box, or will this live behind an existing reverse proxy?