Orin Nano

A local-first, multi-service system on Jetson Orin Nano that streams audio from a web UI to backend services (WebSockets → FastAPI → gRPC), persists session results, and stays portable across hardware.

At a Glance

Status
In progress
Role
Solo Dev
Timeline
Ongoing
Primary outcomes
  • Real-time audio streaming UX
  • Local-first pipeline (offline-capable)
  • Portable multi-service architecture
Stack
  • React/TypeScript
  • Next.js
  • Tailwind
  • FastAPI
  • Postgres
  • Docker
  • Linux
  • WebSockets
  • gRPC
  • NVIDIA Jetson Orin Nano (8GB)
  • NVMe (256GB)

Problem and Constraints

  • Goal: build a reliable local pipeline for streaming and processing multi-modal content on edge hardware, with an architecture that is not tightly coupled to NVIDIA.
  • Constraints: Jetson Orin Nano (8GB RAM) resource limits, local-first/offline requirement, multi-service networking complexity (service discovery/ports/health), iteration speed, and debug/observability needs for streaming pipelines.

Success Criteria

Streaming UX
User can start/stop streaming and see partial + final results without page reload; stable session states (idle/recording/paused/completed).
Reliability
No dropped connections during a 5+ minute session; graceful reconnect and safe recovery on transient failures.
Latency (target)
First transcript chunk returned within a defined target window (e.g., 1–3s depending on model/service), with a plan to measure and iterate.
Portability
Core services run on Jetson and on a development laptop from the same Docker Compose setup, with minimal configuration changes.
Observability
Session-scoped logs/IDs across services plus DB persistence for transcript artifacts and timings to enable replay/debug.
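
The session states named under Streaming UX can be pinned down as a small state machine. The following is a minimal sketch in Python, not the project's actual implementation: the states come from the criteria above, while the allowed transitions and the names `Session`, `TRANSITIONS`, and `InvalidTransition` are assumptions made for illustration.

```python
from enum import Enum


class SessionState(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    PAUSED = "paused"
    COMPLETED = "completed"


# Allowed transitions. This table is an assumption; the document only
# names the four states, not which moves between them are legal.
TRANSITIONS = {
    SessionState.IDLE: {SessionState.RECORDING},
    SessionState.RECORDING: {SessionState.PAUSED, SessionState.COMPLETED},
    SessionState.PAUSED: {SessionState.RECORDING, SessionState.COMPLETED},
    SessionState.COMPLETED: set(),
}


class InvalidTransition(Exception):
    """Raised when a requested state change is not listed in TRANSITIONS."""


class Session:
    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self.state = SessionState.IDLE

    def transition(self, target: SessionState) -> None:
        # Reject anything the table does not explicitly allow, so a stray
        # "start" after "stop" fails loudly instead of corrupting the session.
        if target not in TRANSITIONS[self.state]:
            raise InvalidTransition(f"{self.state.value} -> {target.value}")
        self.state = target
```

Keeping the table as data makes it cheap to review and to mirror on the UI side, so both ends agree on what "paused" can lead to.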

Scope

MVP
Web UI streams audio; FastAPI WebSocket ingestion; gRPC service-to-service streaming to ASR backend; Postgres persistence for session metadata + transcript outputs; Docker Compose dev environment.
Non-goals
Cloud deployment/autoscaling, authentication, production-grade model benchmarking.
Next
Session replay/debug view (transcript + timings), stronger session state machine + reconnect, measurement (time-to-first-token, dropped frames), and a clearer portability story.

Architecture

Key Decisions

  • Browser streams microphone audio via WebSockets to simplify real-time client streaming and feedback loops.
  • FastAPI handles WebSocket ingestion and coordinates the streaming pipeline to downstream services.
  • gRPC is used for service-to-service streaming to enforce explicit contracts (proto definitions) and to get native client, server, and bidirectional streaming across internal boundaries.
  • Docker Compose orchestrates multi-service development for reproducibility and faster iteration across Jetson and laptop environments.
  • Postgres persistence stores sessions + intermediate artifacts to make debugging streaming pipelines feasible and to enable replay/analysis.
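
One way to picture the hand-off between WebSocket ingestion and the downstream stream is a bounded queue that sits between the two. The sketch below uses only `asyncio` from the standard library and is an illustration under assumptions, not the project's code: `AudioBridge`, `push`, `close`, `chunks`, and the `max_chunks` default are hypothetical names and values.

```python
import asyncio
from typing import AsyncIterator, List, Optional


class AudioBridge:
    """Bounded hand-off between an ingestion loop and a downstream consumer."""

    def __init__(self, max_chunks: int = 32) -> None:
        # A bounded queue: when it is full, push() waits, which slows the
        # producer down instead of letting memory grow without limit.
        self._queue: "asyncio.Queue[Optional[bytes]]" = asyncio.Queue(maxsize=max_chunks)

    async def push(self, chunk: bytes) -> None:
        await self._queue.put(chunk)

    async def close(self) -> None:
        # None is used as an end-of-stream sentinel.
        await self._queue.put(None)

    async def chunks(self) -> AsyncIterator[bytes]:
        # Async generator that yields chunks in arrival order until closed.
        while True:
            chunk = await self._queue.get()
            if chunk is None:
                return
            yield chunk


async def demo() -> List[bytes]:
    bridge = AudioBridge(max_chunks=2)

    async def producer() -> None:
        # Stands in for the WebSocket receive loop.
        for i in range(5):
            await bridge.push(bytes([i]))
        await bridge.close()

    task = asyncio.create_task(producer())
    # Stands in for the downstream consumer of the audio stream.
    received = [chunk async for chunk in bridge.chunks()]
    await task
    return received
```

In a real service, `push` would presumably be called from the WebSocket receive loop and `chunks()` would feed the request stream of the gRPC call; both wirings are left out here because they depend on the project's proto definitions.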

Risks & Mitigations

  • Risk: Streaming instability (dropped connections, ordering issues, reconnection complexity).

    Mitigation: Implement a session state machine, heartbeat/timeouts, bounded buffering/backpressure, and reconnect logic with graceful recovery paths.

  • Risk: Resource constraints on Jetson (memory/CPU/GPU contention causing latency spikes or crashes).

    Mitigation: Set service resource budgets, reduce payload sizes, isolate heavy workloads, and add measurement to identify bottlenecks early.

  • Risk: Multi-service networking/config drift (ports, hostnames, env vars differ across machines).

    Mitigation: Centralize configuration, standardize env var naming, add health checks, and document a runbook with common failure modes.

  • Risk: Hard-to-debug failures across boundaries (WebSocket ↔ API ↔ gRPC ↔ model service).

    Mitigation: Use session IDs propagated across services, structured logs, DB audit tables, and basic timing metrics (e.g., time-to-first-chunk).
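
The session ID propagation and structured logging mentioned in the last mitigation could look roughly like this. It is a standard-library-only sketch; the names `session_id_var`, `SessionJsonFormatter`, `grpc_metadata`, and the `x-session-id` metadata key are assumptions chosen for illustration, not names taken from the project.

```python
import contextvars
import json
import logging
import uuid
from typing import Tuple

# Holds the current session ID for whatever task is handling the session.
session_id_var: contextvars.ContextVar = contextvars.ContextVar(
    "session_id", default="-"
)


def new_session_id() -> str:
    return uuid.uuid4().hex


class SessionJsonFormatter(logging.Formatter):
    """Formats each log record as one JSON object tagged with the session ID."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "level": record.levelname,
                "logger": record.name,
                "session_id": session_id_var.get(),
                "message": record.getMessage(),
            }
        )


def grpc_metadata() -> Tuple[Tuple[str, str], ...]:
    # Key/value pairs in the shape gRPC call metadata uses; the key name
    # "x-session-id" is hypothetical.
    return (("x-session-id", session_id_var.get()),)
```

With the ID stored in a context variable, log lines emitted anywhere inside a session's handler carry the same `session_id` without threading it through every function signature, and the same value can be attached to outgoing calls.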

Tradeoffs

  • Decision: Use WebSockets for browser → API streaming instead of upload-a-file batch processing

    Why: Enables real-time feedback and interactive UX for streaming transcription and future sensor/event pipelines.

    Cost: Adds complexity around buffering, reconnection, backpressure, and long-lived connection reliability.

  • Decision: Use gRPC for service → service streaming instead of purely HTTP calls

    Why: Defines explicit contracts (proto) and supports streaming semantics more naturally across internal boundaries.

    Cost: Requires proto generation/tooling, stricter versioning discipline, and more complex local networking/debugging.

  • Decision: Adopt a multi-service architecture (Compose) instead of a single monolith

    Why: Creates clear boundaries for portability and future expansion (different model backends, hardware targets).

    Cost: More operational overhead: service discovery, env coordination, debugging across containers, and a heavier mental model.

  • Decision: Local-first execution (Jetson) rather than cloud-first

    Why: Supports offline operation, reduces ongoing cost, and keeps data local; aligns with edge robotics use cases.

    Cost: Harder model/runtime management and resource tuning; less elastic scaling compared to cloud.
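
The env coordination cost of the multi-service decision can be contained by loading every service address from one place and failing fast when something is missing. A minimal sketch follows; the variable names `ASR_GRPC_TARGET`, `DATABASE_URL`, and `WS_MAX_BUFFERED_CHUNKS`, and the `Settings` fields, are hypothetical and not taken from the project.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    asr_grpc_target: str          # e.g. "asr:50051" inside a Compose network
    database_url: str
    ws_max_buffered_chunks: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Fail at startup with one clear message rather than at first use.
    required = ("ASR_GRPC_TARGET", "DATABASE_URL")
    missing = [name for name in required if name not in env]
    if missing:
        raise RuntimeError("missing required env vars: " + ", ".join(missing))
    return Settings(
        asr_grpc_target=env["ASR_GRPC_TARGET"],
        database_url=env["DATABASE_URL"],
        # Optional tuning knob with an assumed default.
        ws_max_buffered_chunks=int(env.get("WS_MAX_BUFFERED_CHUNKS", "32")),
    )
```

Passing the environment in as a mapping keeps the loader testable and makes it easy to compare the Jetson and laptop configurations side by side.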

Shipping & Quality

Deployment
Docker Compose-based workflows for Jetson; documented run commands and environment configuration.
Measurement
Track session IDs, time-to-first-chunk, dropped frames/reconnect events, and basic service health to guide iteration.
Reliability
Health checks, graceful fallback behavior, and a runbook for debugging networking/streaming issues.
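
The measurements listed above (time-to-first-chunk, dropped frames, reconnect events) fit in a small per-session record. This is a sketch under assumptions: `SessionMetrics` and its field names are hypothetical, and the injectable `clock` exists only so the timing logic can be checked without real delays.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SessionMetrics:
    session_id: str
    # Monotonic clock by default so wall-clock adjustments cannot skew timings.
    clock: Callable[[], float] = time.monotonic
    started_at: Optional[float] = None
    first_chunk_at: Optional[float] = None
    reconnects: int = 0
    dropped_frames: int = 0

    def start(self) -> None:
        self.started_at = self.clock()

    def on_chunk(self) -> None:
        # Only the first result chunk matters for time-to-first-chunk.
        if self.first_chunk_at is None:
            self.first_chunk_at = self.clock()

    def on_reconnect(self) -> None:
        self.reconnects += 1

    def on_dropped_frame(self, count: int = 1) -> None:
        self.dropped_frames += count

    @property
    def time_to_first_chunk(self) -> Optional[float]:
        # None until both the start and the first chunk have been seen.
        if self.started_at is None or self.first_chunk_at is None:
            return None
        return self.first_chunk_at - self.started_at
```

A record like this could be written to Postgres when the session completes, next to the transcript artifacts, so that latency can be compared across runs.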

Impact

  • Built a product-shaped vertical slice: browser streaming → API ingestion → gRPC service boundary → persistence, creating a foundation for edge AI workflows.
  • Improved ability to debug multi-service streaming behavior by persisting sessions and transcript artifacts for analysis.

Retro and Improvements

  • Plan work as vertical slices (MVP → reliability → observability → portability) to reduce scope churn.
  • Add instrumentation earlier (session IDs everywhere + timing metrics) to shorten debug loops.
  • Write a concise runbook: how to run, common failure modes, and how to validate each boundary.