# AGENTS.md
## Mission
Build a small, dockerized web service that lets a user:
- Search and select anime releases from Nyaa.si.
- Persist a personal “watch list” of shows and their release patterns.
- Poll Nyaa (via RSS or lightweight scraping / API wrapper) for new episodes.
- Automatically download the next .torrent file for each tracked show into a host-mounted download directory.
- Track which episodes are:
  - Automatically downloaded (auto-checked),
  - Manually checked as already downloaded by the user.

Target deployment is an Unraid server using a single Docker container with a simple web UI and a lightweight persistence layer (SQLite preferred).[^1]
***
## High-level Architecture
- **Frontend**: Minimal web UI (SPA or server-rendered) for:
  - Searching Nyaa.si.
  - Adding/removing shows from the watch list.
  - Viewing episodes per show with status (pending, downloaded).
  - Manually checking episodes as downloaded.
- **Backend**:
  - HTTP API for the UI.
  - Nyaa integration (RSS and/or search scraping).
  - Scheduler/worker to periodically poll Nyaa and enqueue downloads.
  - Torrent fetcher that downloads `.torrent` files to a host-mounted directory.
- **Data store**:
  - SQLite database stored on a bind-mounted volume for easy backup and migration.
- **Containerization**:
  - Single Docker image with app + scheduler.
  - Config via environment variables.
  - Unraid-friendly: configurable ports, volume mapping for DB and torrents.[^2][^1]
***
## Functional Requirements
### 1. Nyaa Integration
- Use Nyaa's RSS endpoints for polling where possible (e.g. `https://nyaa.si/?page=rss` plus query parameters), falling back to HTML scraping or an existing wrapper library if necessary.[^3][^4][^5][^6][^7]
- Support user-driven search:
  - Input: search term (e.g. “Jujutsu Kaisen 1080p SubsPlease”).
  - Output: recent matching torrents with:
    - Title
    - Torrent ID
    - Category
    - Size
    - Magnet/torrent link URL if exposed in the feed or page.[^8][^9][^10]
- When a user “adds” an anime:
  - Store a normalized pattern to match future episodes (e.g. base title + quality/resolution + sub group).
  - Maintain a reference to the Nyaa search or RSS query that defines this feed.[^6][^3]
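
The RSS path described above can be sketched roughly as follows. This is a minimal illustration, not a definitive client: it assumes the feed accepts `q`, `c`, and `f` query parameters and that each `<item>` carries `<title>`, `<link>`, and `<guid>` elements whose URLs end in the numeric torrent ID. Those assumptions should be verified against a live feed. `FeedItem`, `buildRssUrl`, and `parseFeed` are hypothetical names, and a real implementation would likely use a proper XML parser instead of regular expressions.

```typescript
// Hypothetical shape for one parsed feed entry.
interface FeedItem {
  torrentId: string;
  title: string;
  torrentUrl: string;
}

// Build the polling URL. The q/c/f parameter names and the default
// category/filter values are assumptions about Nyaa's query string.
function buildRssUrl(query: string, category = "1_2", filter = "0"): string {
  const params = [
    "page=rss",
    "q=" + encodeURIComponent(query),
    "c=" + encodeURIComponent(category),
    "f=" + encodeURIComponent(filter),
  ];
  return "https://nyaa.si/?" + params.join("&");
}

// Return the text content of the first <tag> in a fragment, or null.
// XML entities such as &amp; are NOT decoded in this sketch.
function textOf(fragment: string, tag: string): string | null {
  const pattern = new RegExp("<" + tag + "(?:\\s[^>]*)?>([\\s\\S]*?)</" + tag + ">");
  const match = fragment.match(pattern);
  if (!match) return null;
  const raw = match[1] || "";
  // Strip an optional CDATA wrapper, then trim whitespace.
  return raw.replace(/^<!\[CDATA\[([\s\S]*)\]\]>$/, "$1").trim();
}

// Extract title, torrent ID, and link from every <item> in the feed.
function parseFeed(xml: string): FeedItem[] {
  const blocks = xml.match(/<item>[\s\S]*?<\/item>/g) || [];
  const items: FeedItem[] = [];
  for (const block of blocks) {
    const title = textOf(block, "title");
    const link = textOf(block, "link");
    const guid = textOf(block, "guid");
    // Assumed: the ID is the trailing number of the guid or link URL.
    const idMatch = (guid || link || "").match(/(\d+)(?:\.torrent)?$/);
    if (!title || !link || !idMatch) continue; // skip incomplete items
    items.push({ torrentId: idMatch[1] || "", title: title, torrentUrl: link });
  }
  return items;
}
```

Storing the output of `buildRssUrl` on the show record is one way to keep the reference to the query that defines its feed.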
### 2. Watch List & Episodes
- Entities:
  - **Show**: id, display name, search/RSS query, quality filter, fansub group, active flag.
  - **Episode**: id, show_id, episode_code (episode number, kept as a string or parsed to an integer), torrent_id (Nyaa ID), title, status (`pending`, `downloaded_auto`, `downloaded_manual`), torrent_url, created_at, downloaded_at.
- Behavior:
  - Adding a show:
    - Run an immediate search.
    - Populate existing episodes in DB as `pending` (no download) so the user can backfill by manually checking the ones already downloaded.
  - Removing a show:
    - Leave episodes in DB but mark the show as inactive (no further polling).
  - Manual check:
    - User can mark an episode as already downloaded (`downloaded_manual`); no torrent action is taken.
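
The add and remove behaviors above could look something like the sketch below. It works on plain in-memory objects rather than the database, and all type and function names (`Show`, `Episode`, `ingestAsPending`, `deactivateShow`) are hypothetical. Deduplicating by torrent ID is an assumption; the spec does not say how duplicates should be detected.

```typescript
type EpisodeStatus = "pending" | "downloaded_auto" | "downloaded_manual";

interface Show {
  id: number;
  name: string;
  isActive: boolean;
}

interface Episode {
  showId: number;
  torrentId: string;
  title: string;
  status: EpisodeStatus;
}

interface FeedItem {
  torrentId: string;
  title: string;
}

// Adding a show: record every feed item we have not seen yet as `pending`.
// Nothing is downloaded here; the user backfills by checking episodes off.
function ingestAsPending(showId: number, known: Episode[], items: FeedItem[]): Episode[] {
  const seen: { [torrentId: string]: boolean } = {};
  for (const ep of known) seen[ep.torrentId] = true;

  const added: Episode[] = [];
  for (const item of items) {
    if (seen[item.torrentId]) continue; // already tracked (assumed dedupe key)
    seen[item.torrentId] = true;
    added.push({
      showId: showId,
      torrentId: item.torrentId,
      title: item.title,
      status: "pending",
    });
  }
  return added;
}

// Removing a show: keep its episodes, just stop polling it.
function deactivateShow(show: Show): Show {
  return { id: show.id, name: show.name, isActive: false };
}

// The poller would only look at shows that are still active.
function pollableShows(shows: Show[]): Show[] {
  return shows.filter((show) => show.isActive);
}
```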
### 3. Auto-Download Logic
- Periodic job (e.g. every 5 to 15 minutes, configurable):
  - For each active show:
    - Query Nyaa using its stored RSS/search parameters.[^4][^3][^6]
    - Determine the “next” episode:
      - Prefer simplest rule: highest episode number not yet marked downloaded.
      - Guard against batch torrents by using size or title pattern heuristics (e.g. skip titles containing “Batch”).
    - If the next episode's torrent is not yet in DB:
      - Create an Episode record with status `downloaded_auto`.
      - Download the `.torrent` file (NOT the media itself) into the mapped host directory.
      - Filename suggestion: `<show-slug>-ep<episode>-<torrent-id>.torrent`.
- Do not attempt to control or integrate directly with a torrent client (scope is “download the .torrent file” only).
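
One possible reading of the selection rule and filename suggestion above is sketched here. It assumes release titles follow the common `[Group] Show Name - 03 (1080p)` shape, which is a guess about the feeds being tracked and not something the spec guarantees. All names (`Candidate`, `parseEpisodeNumber`, `pickNext`, `torrentFileName`) are hypothetical, and zero-padding the episode number is an illustrative choice.

```typescript
interface Candidate {
  torrentId: string;
  title: string;
}

interface Picked {
  candidate: Candidate;
  episode: number;
}

// Pull the episode number out of a title such as
// "[Group] Show - 03 (1080p)". Returns null for batches and for
// ranges like "01-12", which fail the lookahead after the number.
function parseEpisodeNumber(title: string): number | null {
  if (/batch/i.test(title)) return null; // heuristic named in the spec
  const match = title.match(/ - (\d{1,4})(?:v\d+)?(?=[\s\[(.]|$)/);
  return match ? parseInt(match[1] || "", 10) : null;
}

// Simplest rule from the spec: highest episode number not yet downloaded.
function pickNext(candidates: Candidate[], downloaded: number[]): Picked | null {
  let best: Picked | null = null;
  for (const candidate of candidates) {
    const episode = parseEpisodeNumber(candidate.title);
    if (episode === null) continue; // batch or unparseable title
    if (downloaded.indexOf(episode) !== -1) continue; // already have it
    if (best === null || episode > best.episode) {
      best = { candidate: candidate, episode: episode };
    }
  }
  return best;
}

// Lowercase, collapse non-alphanumerics to "-", trim leading/trailing "-".
function slugify(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// <show-slug>-ep<episode>-<torrent-id>.torrent, episode padded to 2 digits.
function torrentFileName(showName: string, episode: number, torrentId: string): string {
  const ep = episode < 10 ? "0" + episode : String(episode);
  return slugify(showName) + "-ep" + ep + "-" + torrentId + ".torrent";
}
```

Note that with this rule, a show whose latest episode is already downloaded will fall back to older episodes that are still `pending`, so backfilled episodes should be marked as downloaded before polling starts.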
### 4. Web UI
- Views:
  - **Shows list**:
    - Add show (form: name, search query, quality, group).
    - Toggle active/inactive.
    - Quick link to show detail.
  - **Show detail**:
    - Table of episodes: episode number/title, Nyaa ID, status, timestamps.
    - Controls:
      - Manually mark individual episodes as downloaded.
      - Bulk “mark previous episodes as downloaded” helper (e.g. “mark up to episode N”).
  - **Settings**:
    - Poll interval.
    - Default quality / sub group preferences.
    - Torrent download directory (read-only display; actual path comes from environment/volume).
- UX constraints:
  - Keep it extremely simple; this is an internal tool.
  - Assume a single-user instance on the LAN.
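
The two marking controls on the show detail view could be backed by helpers along these lines. This is a sketch under assumptions: the names (`EpisodeRow`, `markManual`, `markUpTo`) are hypothetical, episodes are assumed to carry a parsed numeric episode number, and leaving `downloaded_auto` rows untouched is a design choice made here, not something the spec states.

```typescript
type EpisodeStatus = "pending" | "downloaded_auto" | "downloaded_manual";

interface EpisodeRow {
  episodeNumber: number | null; // null when the title could not be parsed
  status: EpisodeStatus;
  downloadedAt: string | null; // ISO 8601 timestamp
}

// Single-episode control: pending -> downloaded_manual.
// Rows that are already downloaded (auto or manual) are returned unchanged.
function markManual(ep: EpisodeRow, now: Date): EpisodeRow {
  if (ep.status !== "pending") return ep;
  return {
    episodeNumber: ep.episodeNumber,
    status: "downloaded_manual",
    downloadedAt: now.toISOString(),
  };
}

// Bulk control: "mark up to episode N". Episodes without a parsed
// number are skipped because we cannot tell whether they are <= N.
function markUpTo(episodes: EpisodeRow[], n: number, now: Date): EpisodeRow[] {
  return episodes.map((ep): EpisodeRow => {
    const inRange = ep.episodeNumber !== null && ep.episodeNumber <= n;
    return inRange ? markManual(ep, now) : ep;
  });
}
```

Neither helper touches torrent files, which matches the requirement that a manual check takes no torrent action.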
***
## Non-Functional Requirements
- **Language/Stack**:
  - Prefer Node.js + TypeScript backend with a minimal React or server-rendered frontend to align with existing projects, unless you choose a simpler stack.
- **Security**:
  - App is assumed to run behind LAN; basic auth or reverse-proxy auth can be added later.
  - Do not expose any admin-only functionality without at least a simple auth hook.
- **Resilience**:
  - Polling should be robust to Nyaa timeouts and 4xx/5xx responses (retry with backoff, log errors).
  - Do not spam Nyaa with aggressive polling; default interval should be conservative (e.g. 15 minutes, configurable).
- **Observability**:
  - Minimal logging for:
    - Polling attempts.
    - New episodes found.
    - Torrent downloads started/completed or failed.
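
The "retry with backoff, log errors" requirement might be met with a small wrapper like the one below. It is a sketch, assuming exponential backoff with a cap; the spec does not prescribe a particular backoff curve, attempt count, or logger. `backoffDelayMs`, `sleep`, and `withRetry` are hypothetical names, and the wrapped task is expected to throw on timeouts and non-2xx responses.

```typescript
// Delay before the retry that follows attempt number `attempt` (0-based):
// baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 60000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => {
    setTimeout(resolve, ms);
  });
}

// Run `task`, retrying on any thrown error up to `maxAttempts` times.
// Each failure is logged; the last error is rethrown if all attempts fail.
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts = 4,
  baseMs = 1000,
  log: (message: string) => void = (message) => console.error(message),
): Promise<T> {
  let lastError: unknown = new Error("withRetry: maxAttempts must be at least 1");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      log("attempt " + (attempt + 1) + "/" + maxAttempts + " failed: " + String(err));
      // Wait before the next try, but not after the final failure.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt, baseMs));
      }
    }
  }
  throw lastError;
}
```

With the defaults, the waits between attempts are 1 s, 2 s, and 4 s, which stays well inside a 15-minute polling interval.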
***
## Data Model (Initial)
### Tables
- `shows`
  - `id` (PK)
  - `name` (string)
  - `search_query` (string)
  - `quality` (string, nullable)
  - `sub_group` (string, nullable)
  - `rss_url` (string, nullable)
  - `is_active` (boolean, default true)
  - `created_at`, `updated_at`
- `episodes`
  - `id` (PK)
  - `show_id` (FK → shows.id)
  - `episode_code` (string, e.g. “S01E03” or “03”)
  - `title` (string)
  - `torrent_id` (string, Nyaa ID)
  - `torrent_url` (string)
  - `status` (enum: `pending`, `downloaded_auto`, `downloaded_manual`)
  - `downloaded_at` (datetime, nullable)
  - `created_at`, `updated_at`
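
As a starting point for the first migration, the tables above might translate to SQLite roughly as follows. SQLite has no native boolean or enum type, so this sketch stores `is_active` as an integer and enforces `status` with a `CHECK` constraint. The `UNIQUE (show_id, torrent_id)` constraint, the `NOT NULL` choices, and the timestamp defaults are assumptions added for illustration; `EPISODE_STATUSES` and `MIGRATION_001` are hypothetical names.

```typescript
// Allowed values for episodes.status, reused to build the CHECK constraint.
const EPISODE_STATUSES = ["pending", "downloaded_auto", "downloaded_manual"];

// Quoted, comma-separated list for the SQL IN (...) clause.
const statusList = EPISODE_STATUSES.map((status) => "'" + status + "'").join(", ");

// Initial schema. Run once at startup with whichever SQLite driver is chosen.
const MIGRATION_001 = `
CREATE TABLE IF NOT EXISTS shows (
  id           INTEGER PRIMARY KEY AUTOINCREMENT,
  name         TEXT NOT NULL,
  search_query TEXT NOT NULL,
  quality      TEXT,
  sub_group    TEXT,
  rss_url      TEXT,
  is_active    INTEGER NOT NULL DEFAULT 1,  -- boolean stored as 0/1
  created_at   TEXT NOT NULL DEFAULT (datetime('now')),
  updated_at   TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE TABLE IF NOT EXISTS episodes (
  id            INTEGER PRIMARY KEY AUTOINCREMENT,
  show_id       INTEGER NOT NULL REFERENCES shows(id),
  episode_code  TEXT NOT NULL,
  title         TEXT NOT NULL,
  torrent_id    TEXT NOT NULL,
  torrent_url   TEXT NOT NULL,
  status        TEXT NOT NULL CHECK (status IN (${statusList})),
  downloaded_at TEXT,
  created_at    TEXT NOT NULL DEFAULT (datetime('now')),
  updated_at    TEXT NOT NULL DEFAULT (datetime('now')),
  UNIQUE (show_id, torrent_id)  -- assumed: one row per torrent per show
);
`;
```

Keeping removed shows as rows with `is_active = 0` lets their episodes stay valid under the foreign key.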
***
## Container & Unraid Integration
### Environment
- `PORT`: HTTP port to listen on (default 3000).
- `POLL_INTERVAL_SECONDS`: Polling frequency, in seconds.
- `TORRENT_OUTPUT_DIR`: Inside-container path where `.torrent` files are written.
- `DATABASE_PATH`: Inside-container path to the SQLite file.
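
Reading these variables could be done with a small loader such as the sketch below. Only the `PORT` default of 3000 is stated in this section; the other fallbacks (900 seconds, `/data/torrents`, `/data/db.sqlite`) are borrowed from the example values elsewhere in this document and should be treated as assumptions. `AppConfig`, `Env`, and `loadConfig` are hypothetical names. The environment is passed in as a plain object so the function stays easy to test.

```typescript
interface AppConfig {
  port: number;
  pollIntervalSeconds: number;
  torrentOutputDir: string;
  databasePath: string;
}

// Same shape as Node's process.env, declared locally to stay self-contained.
type Env = { [key: string]: string | undefined };

// Parse a positive integer, falling back when the variable is unset or blank.
function parsePositiveInt(raw: string | undefined, fallback: number, name: string): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!isFinite(value) || value <= 0 || Math.floor(value) !== value) {
    throw new Error(name + ' must be a positive integer, got "' + raw + '"');
  }
  return value;
}

function loadConfig(env: Env): AppConfig {
  return {
    port: parsePositiveInt(env["PORT"], 3000, "PORT"),
    // Assumed default: 900 s (15 minutes), the conservative interval suggested above.
    pollIntervalSeconds: parsePositiveInt(env["POLL_INTERVAL_SECONDS"], 900, "POLL_INTERVAL_SECONDS"),
    // Assumed defaults, matching the example container paths.
    torrentOutputDir: env["TORRENT_OUTPUT_DIR"] || "/data/torrents",
    databasePath: env["DATABASE_PATH"] || "/data/db.sqlite",
  };
}

// In the app entry point this would be called as: loadConfig(process.env)
```

Failing fast on a malformed value at startup is usually easier to debug in a container than silently falling back to a default.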
### Volumes
- Map SQLite DB to persistent storage:
  - `/data/db.sqlite` → Unraid share: e.g. `/mnt/user/appdata/nyaa-watcher/db.sqlite`.[^1][^2]
- Map torrent output directory to a download share:
  - `/data/torrents` → e.g. `/mnt/user/downloads/torrents/nyaa/`.
### Ports
- Expose app port to LAN (bridge mode):
  - Container: `3000`, Host: `YOUR_PORT` (e.g. 8082).
### Example docker-compose snippet
```yaml
services:
  nyaa-watcher:
    image: your-registry/nyaa-watcher:latest
    container_name: nyaa-watcher
    restart: unless-stopped
    environment:
      - PORT=3000
      - POLL_INTERVAL_SECONDS=900
      - TORRENT_OUTPUT_DIR=/data/torrents
      - DATABASE_PATH=/data/db.sqlite
    volumes:
      - /mnt/user/appdata/nyaa-watcher:/data
      - /mnt/user/downloads/torrents/nyaa:/data/torrents
    ports:
      - "8082:3000"
```
This can be translated to an Unraid template or used via docker-compose with a Docker context pointing at the Unraid host.[^11][^2][^1]
***
## Implementation Roadmap
1. **Skeleton app**
   - Set up HTTP server, health endpoint, and a static web page.
   - Wire SQLite with migrations for `shows` and `episodes`.
2. **Nyaa client**
   - Implement RSS-based polling for a hard-coded query.
   - Parse feed, extract torrent IDs, titles, and links.[^5][^3][^4][^6]
   - Optionally evaluate an existing node `nyaa-si` wrapper as a shortcut.[^7]
3. **Watch list CRUD**
   - API endpoints + UI for managing shows.
   - Initial search → show add flow.
4. **Episode tracking**
   - When adding a show, ingest existing feed items into `episodes` as `pending`.
   - Implement manual check/mark endpoints and UI.
5. **Auto-download worker**
   - Background job to poll active shows and write `.torrent` files.
   - Update episode status to `downloaded_auto`.
6. **Dockerization & Unraid deployment**
   - Dockerfile, volume mappings, environment configuration.
   - Test deployment on Unraid, ensure persistence and torrent file visibility.
7. **Polish**
   - Basic auth or IP allowlist if desired.
   - Guardrails against batch torrent downloads.
   - Minimal styling for the UI.
***
## Open Questions for Product Owner
- What poll interval do you consider acceptable by default (e.g. 5, 10, or 15 minutes)?
- Do you want any basic auth in front of the UI out of the box, or will this live behind an existing reverse proxy?