AI in Sound System Diagnostics: From Measurement Data to Actionable Fixes
TL;DR
SonaVyx's AI diagnostic engine uses a two-tier approach: Tier 1 is a rule-based engine that runs instantly for free, checking frequency flatness (±6 dB tolerance), phase alignment, RT60 against room-type targets, STI thresholds, SPL uniformity, and noise floor. Tier 2 uses Claude Sonnet 4 (claude-sonnet-4-20250514) with structured JSON output to provide deeper analysis, EQ recommendations with specific frequency/gain/Q values, and prioritized fix lists. The system generates a health score from 0-100 across six categories. All measurement data stays on-device — only processed metrics (not raw audio) are sent to the AI when requested.
The Interpretation Gap in Audio Measurement
Professional audio measurement tools produce enormous amounts of data: frequency response curves with thousands of data points, phase traces, coherence plots, impulse responses, RT60 values across octave bands, STI scores, and SPL distributions. The data is precise. The problem is interpretation.
A sound engineer with 20 years of experience looks at a transfer function and instantly sees the 6 dB dip at 250 Hz is a comb filter from a reflection 2.3 meters away. A venue manager with no acoustics background sees a squiggly line. AI bridges this gap.
Tier 1: Rule-Based Analysis (Instant, Free)
SonaVyx's first diagnostic tier runs entirely on-device using deterministic rules. It completes in under 100 ms and costs nothing to operate. The diagnostic engine checks six categories:
1. Frequency Balance
The engine computes deviation from a target curve across four bands: sub-bass (20-80 Hz), low-mid (80-500 Hz), mid-high (500-4000 Hz), and high (4000-20000 Hz). Deviations exceeding ±6 dB are flagged as problems. A buildup above +8 dB in the 200-500 Hz range triggers a specific "muddy sound" diagnosis with a recommendation to check boundary coupling and low-mid accumulation.
2. Phase Alignment
Phase deviation from linear at the crossover frequency between system components indicates alignment issues. The engine checks for polarity inversion (180° offset) and time misalignment (frequency-dependent phase slope). A phase offset exceeding 90° at the crossover triggers a system alignment recommendation.
3. Coverage Uniformity
When multiple measurement positions are available, SPL standard deviation across positions quantifies coverage uniformity. A standard deviation above 4 dB suggests coverage issues — either speaker aiming, delay timing, or architectural interference.
4. Noise Floor
The A-weighted noise floor is compared against NC (Noise Criteria) curves appropriate for the venue type. A worship space with NC-35 ambient noise is acceptable; the same level in a recording studio (target NC-15) is a critical problem. The engine references NC, NR, and PNC curves for the comparison.
5. Reverberation
RT60 values are compared against target ranges for 10 venue types. A lecture hall should be 0.6-0.8 s; a concert hall 1.5-2.2 s. Values outside the target range by more than 30% generate a warning with a link to the treatment calculator.
6. Speech Intelligibility
STI scores below 0.50 (the "Fair" threshold per IEC 60268-16) trigger a warning. Below 0.45 is flagged as critical, especially for emergency announcement systems where regulatory minimums apply.
Health Score Computation
Each category scores 0-100 based on how far measured values deviate from targets. The overall health score is a weighted average:
- Frequency Balance: 25%
- Phase Alignment: 20%
- Coverage: 15%
- Noise Floor: 15%
- Reverberation: 15%
- Intelligibility: 10%
A score above 85 indicates a well-tuned system. Between 60-85 means correctable issues exist. Below 60 indicates significant problems requiring attention.
Tier 2: Claude Sonnet 4 Deep Analysis (Pro)
When the free tier teaser shows interesting findings, Pro users can request a full AI analysis. This sends processed measurement metrics (not raw audio) to Claude Sonnet 4 (claude-sonnet-4-20250514) with a structured prompt.
The AI receives:
- Frequency response data (magnitude and phase)
- Coherence values per frequency band
- RT60 per octave band
- STI score and per-band MTF
- SPL statistics (Leq, Lmax, percentiles)
- Problem detector results (feedback, hum, polarity, comb filtering, clipping, THD+N)
- Equipment scan results (if available)
- Venue type and dimensions (if provided)
Structured JSON Output
The AI returns a structured JSON response containing:
- Summary: 2-3 sentence plain-English assessment
- Problems: Prioritized list with severity (critical/warning/info), affected frequency range, root cause, and specific fix
- EQ Recommendations: Parametric EQ bands with center frequency (Hz), gain (dB), and Q factor
- Category Scores: Refined scores for each of the 6 categories
Prompt caching via cache_control: ephemeral on the system prompt reduces API cost by approximately 90% for repeated analyses, bringing the per-analysis cost to roughly $0.003 for cached requests versus $0.03 for cold requests.
Pattern Recognition vs Rule Engine
The AI tier excels at recognizing patterns the rule engine cannot encode:
- Comb filter signatures: Periodic nulls in the frequency response that indicate a specific reflection delay — the AI calculates the implied distance and suggests physical remediation
- Equipment-specific issues: Recognizing that a frequency response dip at 2.5 kHz combined with a THD spike matches a known crossover problem in a specific speaker model
- Compound problems: When muddy low-mids AND poor intelligibility AND high RT60 all appear together, the AI prioritizes acoustic treatment over EQ because the root cause is reverberation, not system tuning
What AI Cannot Replace
AI diagnostics augment but do not replace engineering judgment. The system cannot:
- Physically reposition speakers or microphones
- Detect issues outside the measurement bandwidth (e.g., structural vibration below 20 Hz)
- Account for artistic intent (a mixing engineer may want more low-end for a specific genre)
- Verify that recommended changes were physically implemented correctly
The before/after comparison tool closes this loop — measure, apply AI recommendations, measure again, and verify improvement with objective metrics.
Privacy: On-Device First
All Tier 1 analysis runs entirely in the browser via Rust WASM. No audio data leaves the device. Tier 2 sends only processed metrics (frequency/magnitude/phase arrays, scalar values) — never raw audio samples. The privacy policy details exactly what data is transmitted.
For users who need fully offline operation, the problem detection suite (7 detectors) and all measurement tools work without any network connection using the WASM engine and optional ONNX edge ML models.
Try It Now
Open this measurement tool in your browser — free, no download required.
Frequently Asked Questions
Last updated: March 19, 2026