Orin Nano
A local-first, multi-service system on Jetson Orin Nano that streams audio from a web UI to backend services (WebSockets → FastAPI → gRPC), persists session results, and stays portable across hardware.
At a Glance
- Status: In progress
- Role: Solo dev
- Timeline: Ongoing
- Highlights: real-time audio streaming UX; local-first pipeline (offline-capable); portable multi-service architecture
- Stack: React/TypeScript, Next.js, Tailwind, FastAPI, Postgres, Docker, Linux, WebSockets, gRPC
- Hardware: NVIDIA Jetson Orin Nano (8 GB), NVMe (256 GB)
Problem and Constraints
- Goal: build a reliable local pipeline for streaming and processing multi-modal content on edge hardware, with an architecture that is not tightly coupled to NVIDIA.
- Constraints: Jetson Orin Nano (8GB RAM) resource limits, local-first/offline requirement, multi-service networking complexity (service discovery/ports/health), iteration speed, and debug/observability needs for streaming pipelines.
Success Criteria
- Streaming UX
- User can start/stop streaming and see partial + final results without page reload; stable session states (idle/recording/paused/completed).
- Reliability
- No dropped connections during a 5+ minute session; graceful reconnect and safe recovery on transient failures.
- Latency (target)
- First transcript chunk returned within a defined target window (e.g., 1–3s depending on model/service), with a plan to measure and iterate.
- Portability
- Core services run on Jetson via Docker Compose with minimal configuration changes.
- Observability
- Session-scoped logs/IDs across services plus DB persistence for transcript artifacts and timings to enable replay/debug.
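The reconnect behavior named in the reliability criterion can be sketched as a bounded retry loop with exponential backoff. This is an illustrative stand-alone helper, not the project's actual code; `connect` is a stand-in for whatever opens the WebSocket, and the injectable `sleep` exists so tests can run without real delays.

```python
import time

def connect_with_backoff(connect, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Retry `connect` with exponential backoff; return its result or re-raise."""
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            # Back off before the next attempt: 0.5s, 1s, 2s, ...
            sleep(base_delay * (2 ** attempt))
```

Capping attempts keeps a flapping network from retrying forever; the caller decides whether to surface the failure to the UI or fall back to a degraded mode.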
Scope
- MVP: Web UI streams audio; FastAPI WebSocket ingestion; gRPC service-to-service streaming to the ASR backend; Postgres persistence for session metadata and transcript outputs; Docker Compose dev environment.
- Non-goals: cloud deployment/autoscaling, authentication, production-grade model benchmarking.
- Next: session replay/debug view (transcript + timings), a stronger session state machine with reconnect handling, measurement (time-to-first-chunk, dropped frames), and a clearer portability story.
Architecture
Key Decisions
- Browser streams microphone audio via WebSockets to simplify real-time client streaming and feedback loops.
- FastAPI handles WebSocket ingestion and coordinates the streaming pipeline to downstream services.
- gRPC is used for service-to-service streaming to enforce explicit contracts and improve streaming semantics across internal boundaries.
- Docker Compose orchestrates multi-service development for reproducibility and faster iteration across Jetson and laptop environments.
- Postgres persistence stores sessions + intermediate artifacts to make debugging streaming pipelines feasible and to enable replay/analysis.
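The ingestion-and-forward flow behind these decisions can be sketched framework-agnostically with a bounded `asyncio.Queue` between producer and consumer. In the real service, FastAPI's WebSocket receive loop would feed the queue and a gRPC stream would drain it; both are replaced with stand-ins here.

```python
import asyncio

async def ingest(chunks, queue: asyncio.Queue):
    """Producer: push audio chunks; a full queue applies backpressure."""
    for chunk in chunks:
        await queue.put(chunk)      # suspends while the bounded queue is full
    await queue.put(None)           # sentinel: end of stream

async def transcribe(queue: asyncio.Queue, results: list):
    """Consumer: stand-in for the gRPC streaming call to the ASR backend."""
    while (chunk := await queue.get()) is not None:
        results.append(f"partial:{chunk}")

async def run_session(chunks):
    queue: asyncio.Queue = asyncio.Queue(maxsize=4)  # bounded buffer at the boundary
    results: list = []
    await asyncio.gather(ingest(chunks, queue), transcribe(queue, results))
    return results
```

The `maxsize` bound is the important part: on an 8 GB board, an unbounded buffer between a fast browser and a slow ASR backend is a memory leak waiting to happen.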
Risks & Mitigations
Risk: Streaming instability (dropped connections, ordering issues, reconnection complexity).
Mitigation: Implement a session state machine, heartbeat/timeouts, bounded buffering/backpressure, and reconnect logic with graceful recovery paths.
Risk: Resource constraints on Jetson (memory/CPU/GPU contention causing latency spikes or crashes).
Mitigation: Set service resource budgets, reduce payload sizes, isolate heavy workloads, and add measurement to identify bottlenecks early.
Risk: Multi-service networking/config drift (ports, hostnames, env vars differ across machines).
Mitigation: Centralize configuration, standardize env var naming, add health checks, and document a runbook with common failure modes.
Risk: Hard-to-debug failures across boundaries (WebSocket ↔ API ↔ gRPC ↔ model service).
Mitigation: Use session IDs propagated across services, structured logs, DB audit tables, and basic timing metrics (e.g., time-to-first-chunk).
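The session state machine mentioned in the first mitigation can be as small as a transition table keyed by the UI states from the success criteria. A minimal sketch (event names are illustrative):

```python
# Allowed transitions; states mirror the UI (idle/recording/paused/completed).
TRANSITIONS = {
    "idle": {"start": "recording"},
    "recording": {"pause": "paused", "stop": "completed"},
    "paused": {"resume": "recording", "stop": "completed"},
    "completed": {},
}

class Session:
    def __init__(self, session_id: str):
        self.session_id = session_id
        self.state = "idle"

    def apply(self, event: str) -> str:
        """Apply an event; reject anything not allowed from the current state."""
        try:
            self.state = TRANSITIONS[self.state][event]
        except KeyError:
            raise ValueError(f"{event!r} not allowed in state {self.state!r}")
        return self.state
```

Making illegal transitions raise (rather than silently no-op) is what turns "ordering issues" into loggable events tied to a session ID.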
Tradeoffs
Decision: Use WebSockets for browser → API streaming instead of upload-a-file batch processing
Why: Enables real-time feedback and interactive UX for streaming transcription and future sensor/event pipelines.
Cost: Adds complexity around buffering, reconnection, backpressure, and long-lived connection reliability.
Decision: Use gRPC for service → service streaming instead of purely HTTP calls
Why: Defines explicit contracts (proto) and supports streaming semantics more naturally across internal boundaries.
Cost: Requires proto generation/tooling, stricter versioning discipline, and more complex local networking/debugging.
Decision: Adopt a multi-service architecture (Compose) instead of a single monolith
Why: Creates clear boundaries for portability and future expansion (different model backends, hardware targets).
Cost: More operational overhead: service discovery, env coordination, debugging across containers, and a heavier mental model.
Decision: Local-first execution (Jetson) rather than cloud-first
Why: Supports offline operation, reduces ongoing cost, and keeps data local; aligns with edge robotics use cases.
Cost: Harder model/runtime management and resource tuning; less elastic scaling compared to cloud.
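The "explicit contracts" argument for gRPC can be made concrete with a proto definition. This is a hypothetical contract, not the project's actual one; service, message, and field names are placeholders.

```proto
syntax = "proto3";

// Hypothetical contract between the API gateway and the ASR backend.
service Transcriber {
  // Bidirectional stream: audio chunks in, partial/final transcripts out.
  rpc StreamTranscribe (stream AudioChunk) returns (stream Transcript);
}

message AudioChunk {
  string session_id = 1;  // propagated for cross-service tracing
  bytes  pcm        = 2;  // raw audio frame
  uint64 seq        = 3;  // ordering across reconnects
}

message Transcript {
  string session_id = 1;
  string text       = 2;
  bool   is_final   = 3;  // false for partial results
}
```

Carrying `session_id` and `seq` inside the messages themselves is what makes cross-boundary debugging and replay possible, at the cost of the versioning discipline noted above.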
Shipping & Quality
- Deployment
- Docker Compose-based workflows for Jetson; documented run commands and environment configuration.
- Measurement
- Track session IDs, time-to-first-chunk, dropped frames/reconnect events, and basic service health to guide iteration.
- Reliability
- Health checks, graceful fallback behavior, and a runbook for debugging networking/streaming issues.
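The measurement bullet can be made concrete with session-scoped structured log records. A minimal sketch, with illustrative field and event names and injectable `emit`/`clock` so it can be tested without real I/O:

```python
import json
import time

def make_logger(emit=print, clock=time.monotonic):
    """Return a logger that stamps every record with a session ID and elapsed ms."""
    start = clock()

    def log(session_id: str, event: str, **fields):
        record = {
            "session_id": session_id,
            "event": event,  # e.g. "first_chunk", "reconnect", "dropped_frame"
            "elapsed_ms": round((clock() - start) * 1000, 1),
            **fields,
        }
        emit(json.dumps(record))
        return record

    return log
```

One JSON line per event, always carrying the session ID, is enough to reconstruct time-to-first-chunk and reconnect counts later from the persisted logs.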
Impact
- Built a product-shaped vertical slice: browser streaming → API ingestion → gRPC service boundary → persistence, creating a foundation for edge AI workflows.
- Improved ability to debug multi-service streaming behavior by persisting sessions and transcript artifacts for analysis.
Retro and Improvements
- Plan work as vertical slices (MVP → reliability → observability → portability) to reduce scope churn.
- Add instrumentation earlier (session IDs everywhere + timing metrics) to shorten debug loops.
- Write a concise runbook: how to run, common failure modes, and how to validate each boundary.