Orin Nano
A local-first, multi-service system on Jetson Orin Nano that streams audio from a web UI to backend services (WebSockets → FastAPI → gRPC), persists session results, and stays portable across hardware.
At a Glance
- Status: In progress
- Role: Solo dev
- Timeline: Ongoing
- Highlights: real-time audio streaming UX; local-first pipeline (offline-capable); portable multi-service architecture
- Stack: React/TypeScript, Next.js, Tailwind, FastAPI, Postgres, Docker, Linux, WebSockets, gRPC
- Hardware: NVIDIA Jetson Orin Nano (8 GB), NVMe (256 GB)
Problem and Constraints
- Goal: build a reliable local pipeline for streaming and processing multi-modal content on edge hardware, with an architecture that is not tightly coupled to NVIDIA.
- Constraints: Jetson Orin Nano (8GB RAM) resource limits, local-first/offline requirement, multi-service networking complexity (service discovery/ports/health), iteration speed, and debug/observability needs for streaming pipelines.
Success Criteria
- Streaming UX
- User can start/stop streaming and see partial + final results without page reload; stable session states (idle/recording/paused/completed).
- Reliability
- No dropped connections during a 5+ minute session; graceful reconnect and safe recovery on transient failures.
- Latency (target)
- First transcript chunk returned within a defined target window (e.g., 1–3s depending on model/service), with a plan to measure and iterate.
- Portability
- Core services run on Jetson via Docker Compose with minimal configuration changes.
- Observability
- Session-scoped logs/IDs across services plus DB persistence for transcript artifacts and timings to enable replay/debug.
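The reconnect behavior named in the reliability criterion can be sketched as a bounded retry loop with exponential backoff. This is an illustrative stand-alone helper, not the project's actual code; `connect` is a stand-in for whatever opens the WebSocket, and the injectable `sleep` exists so tests can run without real delays.

```python
import time

def connect_with_backoff(connect, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Retry `connect` with exponential backoff; return its result or re-raise."""
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            # Back off before the next attempt: 0.5s, 1s, 2s, ...
            sleep(base_delay * (2 ** attempt))
```

Capping attempts keeps a flapping network from retrying forever; the caller decides whether to surface the failure to the UI or fall back to a degraded mode.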
Scope
- MVP: Web UI streams audio; FastAPI WebSocket ingestion; gRPC service-to-service streaming to the ASR backend; Postgres persistence for session metadata and transcript outputs; Docker Compose dev environment.
- Non-goals: cloud deployment/autoscaling, authentication, production-grade model benchmarking.
- Next: session replay/debug view (transcript + timings), a stronger session state machine with reconnect handling, measurement (time-to-first-chunk, dropped frames), and a clearer portability story.
Architecture
Key Decisions
- Browser streams microphone audio via WebSockets to simplify real-time client streaming and feedback loops.
- FastAPI handles WebSocket ingestion and coordinates the streaming pipeline to downstream services.
- gRPC is used for service-to-service streaming to enforce explicit contracts and improve streaming semantics across internal boundaries.
- Docker Compose orchestrates multi-service development for reproducibility and faster iteration across Jetson and laptop environments.
- Postgres persistence stores sessions + intermediate artifacts to make debugging streaming pipelines feasible and to enable replay/analysis.
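The ingestion-and-forward flow behind these decisions can be sketched framework-agnostically with a bounded `asyncio.Queue` between producer and consumer. In the real service, FastAPI's WebSocket receive loop would feed the queue and a gRPC stream would drain it; both are replaced with stand-ins here.

```python
import asyncio

async def ingest(chunks, queue: asyncio.Queue):
    """Producer: push audio chunks; a full queue applies backpressure."""
    for chunk in chunks:
        await queue.put(chunk)      # suspends while the bounded queue is full
    await queue.put(None)           # sentinel: end of stream

async def transcribe(queue: asyncio.Queue, results: list):
    """Consumer: stand-in for the gRPC streaming call to the ASR backend."""
    while (chunk := await queue.get()) is not None:
        results.append(f"partial:{chunk}")

async def run_session(chunks):
    queue: asyncio.Queue = asyncio.Queue(maxsize=4)  # bounded buffer at the boundary
    results: list = []
    await asyncio.gather(ingest(chunks, queue), transcribe(queue, results))
    return results
```

The `maxsize` bound is the important part: on an 8 GB board, an unbounded buffer between a fast browser and a slow ASR backend is a memory leak waiting to happen.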
Risks & Mitigations
Risk: Streaming instability (dropped connections, ordering issues, reconnection complexity).
Mitigation: Implement a session state machine, heartbeat/timeouts, bounded buffering/backpressure, and reconnect logic with graceful recovery paths.
Risk: Resource constraints on Jetson (memory/CPU/GPU contention causing latency spikes or crashes).
Mitigation: Set service resource budgets, reduce payload sizes, isolate heavy workloads, and add measurement to identify bottlenecks early.
Risk: Multi-service networking/config drift (ports, hostnames, env vars differ across machines).
Mitigation: Centralize configuration, standardize env var naming, add health checks, and document a runbook with common failure modes.
Risk: Hard-to-debug failures across boundaries (WebSocket ↔ API ↔ gRPC ↔ model service).
Mitigation: Use session IDs propagated across services, structured logs, DB audit tables, and basic timing metrics (e.g., time-to-first-chunk).
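The session state machine mentioned in the first mitigation can be as small as a transition table keyed by the UI states from the success criteria. A minimal sketch (event names are illustrative):

```python
# Allowed transitions; states mirror the UI (idle/recording/paused/completed).
TRANSITIONS = {
    "idle": {"start": "recording"},
    "recording": {"pause": "paused", "stop": "completed"},
    "paused": {"resume": "recording", "stop": "completed"},
    "completed": {},
}

class Session:
    def __init__(self, session_id: str):
        self.session_id = session_id
        self.state = "idle"

    def apply(self, event: str) -> str:
        """Apply an event; reject anything not allowed from the current state."""
        try:
            self.state = TRANSITIONS[self.state][event]
        except KeyError:
            raise ValueError(f"{event!r} not allowed in state {self.state!r}")
        return self.state
```

Making illegal transitions raise (rather than silently no-op) is what turns "ordering issues" into loggable events tied to a session ID.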
Tradeoffs
Decision: Use WebSockets for browser → API streaming instead of upload-a-file batch processing
Why: Enables real-time feedback and interactive UX for streaming transcription and future sensor/event pipelines.
Cost: Adds complexity around buffering, reconnection, backpressure, and long-lived connection reliability.
Decision: Use gRPC for service → service streaming instead of purely HTTP calls
Why: Defines explicit contracts (proto) and supports streaming semantics more naturally across internal boundaries.
Cost: Requires proto generation/tooling, stricter versioning discipline, and more complex local networking/debugging.
Decision: Adopt a multi-service architecture (Compose) instead of a single monolith
Why: Creates clear boundaries for portability and future expansion (different model backends, hardware targets).
Cost: More operational overhead: service discovery, env coordination, debugging across containers, and a heavier mental model.
Decision: Local-first execution (Jetson) rather than cloud-first
Why: Supports offline operation, reduces ongoing cost, and keeps data local; aligns with edge robotics use cases.
Cost: Harder model/runtime management and resource tuning; less elastic scaling compared to cloud.
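The "explicit contracts" argument for gRPC can be made concrete with a proto definition. This is a hypothetical contract, not the project's actual one; service, message, and field names are placeholders.

```proto
syntax = "proto3";

// Hypothetical contract between the API gateway and the ASR backend.
service Transcriber {
  // Bidirectional stream: audio chunks in, partial/final transcripts out.
  rpc StreamTranscribe (stream AudioChunk) returns (stream Transcript);
}

message AudioChunk {
  string session_id = 1;  // propagated for cross-service tracing
  bytes  pcm        = 2;  // raw audio frame
  uint64 seq        = 3;  // ordering across reconnects
}

message Transcript {
  string session_id = 1;
  string text       = 2;
  bool   is_final   = 3;  // false for partial results
}
```

Carrying `session_id` and `seq` inside the messages themselves is what makes cross-boundary debugging and replay possible, at the cost of the versioning discipline noted above.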
Shipping & Quality
- Deployment
- Docker Compose-based workflows for Jetson; documented run commands and environment configuration.
- Measurement
- Track session IDs, time-to-first-chunk, dropped frames/reconnect events, and basic service health to guide iteration.
- Reliability
- Health checks, graceful fallback behavior, and a runbook for debugging networking/streaming issues.
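The measurement bullet can be made concrete with session-scoped structured log records. A minimal sketch, with illustrative field and event names and injectable `emit`/`clock` so it can be tested without real I/O:

```python
import json
import time

def make_logger(emit=print, clock=time.monotonic):
    """Return a logger that stamps every record with a session ID and elapsed ms."""
    start = clock()

    def log(session_id: str, event: str, **fields):
        record = {
            "session_id": session_id,
            "event": event,  # e.g. "first_chunk", "reconnect", "dropped_frame"
            "elapsed_ms": round((clock() - start) * 1000, 1),
            **fields,
        }
        emit(json.dumps(record))
        return record

    return log
```

One JSON line per event, always carrying the session ID, is enough to reconstruct time-to-first-chunk and reconnect counts later from the persisted logs.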
Impact
- Built a product-shaped vertical slice: browser streaming → API ingestion → gRPC service boundary → persistence, creating a foundation for edge AI workflows.
- Improved ability to debug multi-service streaming behavior by persisting sessions and transcript artifacts for analysis.
Retro and Improvements
- Plan work as vertical slices (MVP → reliability → observability → portability) to reduce scope churn.
- Add instrumentation earlier (session IDs everywhere + timing metrics) to shorten debug loops.
- Write a concise runbook: how to run, common failure modes, and how to validate each boundary.