JARVIS Windows — Project Roadmap
Phase 1 — Core Stack ✅ (Week 1)
- FastAPI server with WebSocket endpoint
- Whisper STT (base model, local)
- Fish Audio TTS integration
- Claude API connection
- WebSocket manager (connect/disconnect/broadcast)
- Basic end-to-end voice loop (speak → transcribe → Claude → respond)
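The WebSocket manager's connect/disconnect/broadcast responsibilities could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `ConnectionManager`, `FakeSocket`, and `demo` are hypothetical names, and the manager assumes only that each socket exposes async `accept()` and `send_text()` methods, as FastAPI's `WebSocket` does.

```python
import asyncio


class ConnectionManager:
    """Tracks open WebSocket connections and fans messages out to all of them."""

    def __init__(self):
        self.active = set()

    async def connect(self, ws):
        # FastAPI/Starlette WebSocket objects must be accepted before use.
        await ws.accept()
        self.active.add(ws)

    def disconnect(self, ws):
        # discard() is a no-op if the socket was already removed.
        self.active.discard(ws)

    async def broadcast(self, message: str):
        # Iterate over a copy so failed sockets can be dropped mid-loop.
        for ws in list(self.active):
            try:
                await ws.send_text(message)
            except Exception:
                self.disconnect(ws)


class FakeSocket:
    """Hypothetical stand-in for a real WebSocket, used only for this demo."""

    def __init__(self):
        self.sent = []

    async def accept(self):
        pass

    async def send_text(self, message):
        self.sent.append(message)


async def demo():
    manager = ConnectionManager()
    a, b = FakeSocket(), FakeSocket()
    await manager.connect(a)
    await manager.connect(b)
    await manager.broadcast("listening")
    manager.disconnect(b)
    await manager.broadcast("speaking")
    return a.sent, b.sent
```

In a FastAPI endpoint, the same manager would typically be called from the WebSocket route: `connect` on entry and `disconnect` when the client drops.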
Phase 2 — Google Workspace Integrations (Weeks 2–3)
2a — Auth + Core Services
- Google OAuth 2.0 setup (credentials.json → token.json)
- Google Calendar: read today's events, upcoming schedule
- Gmail: unread messages, search by sender/subject
- Google Tasks: list pending, create, complete
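For the Gmail item, searching by sender or subject comes down to building a query string from Gmail's standard search operators (`is:unread`, `from:`, `subject:`). The helper below is a hypothetical sketch of that step only; `build_gmail_query` is an illustrative name, and the resulting string is assumed to be passed as the `q` parameter when listing messages through the Gmail API.

```python
def build_gmail_query(sender=None, subject=None, unread_only=False):
    """Build a Gmail search string from optional filters."""
    parts = []
    if unread_only:
        parts.append("is:unread")
    if sender:
        parts.append(f"from:{sender}")
    if subject:
        # Parentheses group several words under one operator.
        parts.append(f"subject:({subject})")
    return " ".join(parts)


query = build_gmail_query(
    sender="alice@example.com", subject="weekly report", unread_only=True
)
```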
2b — Notes + Screen
- Google Keep via gkeepapi (unofficial client), with local Markdown notes as a fallback
- Screen vision: mss screenshot + win32gui active window
- Window list enumeration
2c — System Capabilities
- Terminal control via subprocess (PowerShell + CMD)
- File system: create projects, manage folders
- Git status/operations via subprocess
- Chrome automation via Selenium
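The terminal and Git items both rely on `subprocess`. One way this might look is a single wrapper that runs an argument list without a shell, enforces a timeout, and returns structured output. `run_command` and the shape of its return value are assumptions made for illustration, not part of the roadmap.

```python
import subprocess
import sys


def run_command(argv, timeout=30):
    """Run a command without a shell and return its exit code and output."""
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "code": None, "stdout": "",
                "stderr": f"timed out after {timeout}s"}
    except FileNotFoundError:
        return {"ok": False, "code": None, "stdout": "",
                "stderr": f"executable not found: {argv[0]}"}
    return {
        "ok": proc.returncode == 0,
        "code": proc.returncode,
        "stdout": proc.stdout.strip(),
        "stderr": proc.stderr.strip(),
    }


# Portable demo: invoke the current Python interpreter.
result = run_command([sys.executable, "-c", "print('hello')"])
```

On Windows the same wrapper could be called with `["powershell", "-NoProfile", "-Command", "Get-Date"]`, `["cmd", "/c", "dir"]`, or `["git", "status", "--short"]`. Passing a list rather than a single string avoids shell quoting problems when arguments come from transcribed speech.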
Phase 3 — Claude Tool Use (Week 4)
- Define tools manifest (all capabilities as function specs)
- Tool dispatch router (route Claude tool_use to capability modules)
- Multi-turn conversation with tool results
- Error handling + tool fallbacks
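The manifest and dispatch router might be sketched as below. Each manifest entry follows the `name` / `description` / `input_schema` shape that Claude tool definitions use, and the router turns a `tool_use` block into a `tool_result` block. The two tools, their handlers, and the `dispatch` function are hypothetical placeholders; real handlers would call the capability modules from Phase 2.

```python
import json

# Manifest entries in the shape Claude expects for tool definitions.
TOOLS = [
    {
        "name": "list_tasks",
        "description": "List pending Google Tasks.",
        "input_schema": {"type": "object", "properties": {}, "required": []},
    },
    {
        "name": "run_terminal",
        "description": "Run a terminal command and return its output.",
        "input_schema": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
]


def list_tasks():
    return ["Buy milk"]  # placeholder data for the sketch


def run_terminal(command):
    raise RuntimeError("terminal disabled in this sketch")


HANDLERS = {"list_tasks": list_tasks, "run_terminal": run_terminal}


def dispatch(tool_use):
    """Route one tool_use block to its handler and build a tool_result block."""
    name = tool_use["name"]
    try:
        handler = HANDLERS.get(name)
        if handler is None:
            raise ValueError(f"unknown tool: {name}")
        content = json.dumps(handler(**tool_use["input"]))
        is_error = False
    except Exception as exc:
        # Report the failure to the model instead of crashing the voice loop.
        content = f"{type(exc).__name__}: {exc}"
        is_error = True
    return {
        "type": "tool_result",
        "tool_use_id": tool_use["id"],
        "content": content,
        "is_error": is_error,
    }
```

In a multi-turn exchange, the returned blocks would be sent back in the next user message so the model can continue with the tool output, or recover when `is_error` is true.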
Phase 4 — Dashboard + Packaging (Week 5)
- Wire WebSocket to Three.js dashboard frontend
- Live transcript display
- Assistant state indicator: idle / listening / processing / speaking
- Windows startup registration (Task Scheduler)
- .env validation on startup
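The four dashboard states could be modeled as a small state machine whose transitions are serialized for the WebSocket. The allowed transitions and the JSON message shape below are assumptions for illustration; the roadmap names only the states themselves.

```python
import json
from enum import Enum


class State(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    PROCESSING = "processing"
    SPEAKING = "speaking"


# Assumed transition table: each state maps to the states it may move to.
TRANSITIONS = {
    State.IDLE: {State.LISTENING},
    State.LISTENING: {State.PROCESSING, State.IDLE},
    State.PROCESSING: {State.SPEAKING, State.IDLE},
    State.SPEAKING: {State.IDLE, State.LISTENING},
}


class AssistantState:
    """Holds the current state and rejects transitions not in the table."""

    def __init__(self):
        self.current = State.IDLE

    def transition(self, new_state):
        if new_state not in TRANSITIONS[self.current]:
            raise ValueError(
                f"cannot go from {self.current.value} to {new_state.value}"
            )
        self.current = new_state
        # JSON payload the dashboard could receive over the WebSocket.
        return json.dumps({"type": "state", "value": new_state.value})
```

Each returned string could be broadcast to the dashboard so the frontend only ever renders a state the backend has validated.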
Phase 5 — Optional Enhancements
- Docker containerization (for Unraid or headless server)
- Wake word detection ("Hey JARVIS")
- Conversation history / memory across sessions
- Google Drive file access
- React/Electron desktop wrapper