IEC 60268-16: The Standard Behind STI
TL;DR
IEC 60268-16 Edition 5 defines the Speech Transmission Index (STI), an objective measure of speech intelligibility through any transmission channel. The full STI computation uses 7 octave bands (125-8000 Hz) × 14 modulation frequencies (0.63-12.5 Hz) = 98 modulation transfer function (MTF) values. Each MTF value is converted to an apparent signal-to-noise ratio, normalized to a 0-1 transmission index, and averaged within its octave band. The band values are then weighted with the male speech factors ($\alpha_k$ from Table 4), corrected for inter-band redundancy ($\beta_k$ from Table 5), and combined into a single 0-1 index. STIPA (Table F.1) reduces the measurement to 14 MTF values using a specially designed test signal. STI can also be computed from the impulse response per clause A.4, enabling room-only intelligibility prediction without the PA system.
Standard Structure
IEC 60268-16 Ed.5 (2020) is the definitive international standard for objective speech intelligibility measurement. It applies to any transmission channel: room acoustics, PA systems, telephone networks, hearing aids, voice alarm systems, and intercoms.
Key sections:
- Clause 4: MTF definition and computation
- Clause 5: STI computation from MTF values
- Clause 6: Measurement procedures (full and simplified)
- Annex A: STI from impulse response (computational method)
- Annex D: STI from SNR and RT60 (estimation method)
- Annex F: STIPA test signal specification (Table F.1)
The Modulation Transfer Function
Speech intelligibility depends on temporal modulations — the amplitude variations that carry phonemic information. These modulations occur at rates between 0.63 and 12.5 Hz (corresponding to syllable rates and consonant/vowel transitions).
The MTF m(F, fmod) measures how well the transmission channel preserves these modulations at each octave band center frequency F and modulation frequency fmod:
- m = 1.0: Modulation perfectly preserved
- m = 0.5: Modulation reduced to half its original depth
- m = 0.0: Modulation completely destroyed
Two mechanisms reduce MTF: reverberation (smears modulations in time) and noise (fills in modulation troughs). The combined effect per clause 4.3:
$$m(F, f_{mod}) = \frac{1}{\sqrt{1 + \left(\dfrac{2\pi f_{mod} T}{13.8}\right)^{2}}} \times \frac{1}{1 + 10^{-(S-N)/10}}$$
Where T is the reverberation time and (S-N) is the signal-to-noise ratio at band F.
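As a rough illustration, the combined reverberation and noise effect can be evaluated directly. The sketch below is a minimal example, not the standard's reference implementation: the function name `mtf`, the example reverberation time of 1.2 s, and the 15 dB signal-to-noise ratio are all assumptions chosen for demonstration. The reverberation term is written as 1/sqrt(1 + (2π·fmod·T/13.8)²).

```python
import math

# The 14 modulation frequencies, in one-third-octave steps from 0.63 to 12.5 Hz.
MODULATION_FREQS = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                    3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]


def mtf(f_mod, rt60, snr_db):
    """Modulation transfer value for one band and one modulation frequency.

    f_mod  -- modulation frequency in Hz
    rt60   -- reverberation time T of the band in seconds
    snr_db -- signal-to-noise ratio (S - N) of the band in dB
    """
    # Reverberation smears modulations: low-pass behaviour in f_mod.
    reverb_term = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_mod * rt60 / 13.8) ** 2)
    # Noise fills the modulation troughs: independent of f_mod.
    noise_term = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb_term * noise_term


# Hypothetical band: T = 1.2 s, S - N = 15 dB.
m_values = [mtf(f, rt60=1.2, snr_db=15.0) for f in MODULATION_FREQS]
for f, m in zip(MODULATION_FREQS, m_values):
    print(f"{f:5.2f} Hz  m = {m:.3f}")
```

Because only the reverberation term depends on the modulation frequency, the printed values fall steadily from 0.63 Hz to 12.5 Hz, while the noise term scales the whole row by a constant factor.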
From MTF to STI (Clause 5)
Step 1: Apparent SNR
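Each MTF value is converted to an apparent signal-to-noise ratio: $\mathrm{SNR}_{app}(k, j) = 10 \log_{10}\left(\dfrac{m_{kj}}{1 - m_{kj}}\right)$, where k indexes the octave band and j the modulation frequency. The result is clamped to the range -15 to +15 dB.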
Step 2: Band Average
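Each clamped apparent SNR is first mapped to a transmission index between 0 and 1: $TI_{kj} = (\mathrm{SNR}_{app}(k, j) + 15) / 30$. For each octave band k, the 14 TI values (or 2 for STIPA) are then arithmetically averaged to produce $MTI_k$ (Modulation Transmission Index per band).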
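Steps 1 and 2 can be sketched in a few lines. This is a minimal illustration with hypothetical function names (`apparent_snr`, `transmission_index`, `band_mti`); it rescales the clamped apparent SNR from the -15..+15 dB range to 0..1 before averaging, and it omits any further corrections the standard may apply to the MTF values beforehand.

```python
import math


def apparent_snr(m):
    """Apparent SNR in dB for one MTF value, clamped to [-15, +15] dB."""
    # Guard the endpoints, where m / (1 - m) is zero or undefined.
    if m <= 0.0:
        return -15.0
    if m >= 1.0:
        return 15.0
    snr = 10.0 * math.log10(m / (1.0 - m))
    return max(-15.0, min(15.0, snr))


def transmission_index(snr_db):
    """Rescale a clamped apparent SNR (-15..+15 dB) to the range 0..1."""
    return (snr_db + 15.0) / 30.0


def band_mti(m_values):
    """Average the transmission indices of one octave band (14 values, or 2 for STIPA)."""
    tis = [transmission_index(apparent_snr(m)) for m in m_values]
    return sum(tis) / len(tis)


# Hypothetical MTF values for a single band, full STI (14 modulation frequencies).
example_band = [0.95, 0.93, 0.90, 0.87, 0.83, 0.78, 0.72,
                0.66, 0.59, 0.52, 0.45, 0.38, 0.32, 0.27]
print(f"MTI = {band_mti(example_band):.3f}")
```

An MTF value of 0.5 corresponds to an apparent SNR of 0 dB and therefore a transmission index of 0.5, which is a handy sanity check for any implementation.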
Step 3: Redundancy Correction (Table 5)
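Adjacent octave bands carry partially redundant information (the correlation between 500 Hz and 1000 Hz content is higher than between 125 Hz and 8000 Hz). The correction subtracts a redundancy term for each of the six adjacent band pairs from the weighted sum over the seven bands:
$$STI = \sum_{k=1}^{7} \alpha_k \, MTI_k - \sum_{k=1}^{6} \beta_k \sqrt{MTI_k \times MTI_{k+1}}$$
Here $\alpha_k$ are the band weighting factors described in Step 4 and $\beta_k$ are the redundancy factors for bands k and k+1.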
Step 4: Male Speech Weighting (Table 4)
The weighting factors αk reflect the relative importance of each band for male speech intelligibility:
| Band (Hz) | 125 | 250 | 500 | 1000 | 2000 | 4000 | 8000 |
|---|---|---|---|---|---|---|---|
| α (male) | 0.085 | 0.127 | 0.230 | 0.233 | 0.309 | 0.224 | 0.173 |
The 2000 Hz band has the highest weight — this is the region containing the most speech-discriminating information (sibilants, plosives). Female speech weighting differs slightly (higher weight at 4000-8000 Hz).
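Steps 3 and 4 together amount to a weighted sum minus a redundancy term. The sketch below uses the male α values from the table above. The β values are not listed in this article: the ones in the code are the figures commonly quoted for male speech and should be checked against Table 5 of the standard before any real use. The function name `sti_from_mti` is hypothetical.

```python
import math

# Male weighting factors from the table above (125 Hz ... 8000 Hz).
ALPHA_MALE = [0.085, 0.127, 0.230, 0.233, 0.309, 0.224, 0.173]

# ASSUMPTION: male redundancy factors for the six adjacent band pairs
# (125/250 ... 4000/8000), as commonly quoted; verify against Table 5.
BETA_MALE = [0.085, 0.078, 0.065, 0.011, 0.047, 0.095]


def sti_from_mti(mti, alpha=ALPHA_MALE, beta=BETA_MALE):
    """Combine seven per-band MTI values into a single STI value."""
    if len(mti) != 7:
        raise ValueError("expected 7 MTI values, one per octave band")
    weighted = sum(a * x for a, x in zip(alpha, mti))
    redundancy = sum(b * math.sqrt(mti[k] * mti[k + 1])
                     for k, b in enumerate(beta))
    # Keep the result inside the defined 0-1 range despite rounding.
    return min(1.0, max(0.0, weighted - redundancy))


# Hypothetical per-band MTI values for a moderately reverberant room.
print(f"STI = {sti_from_mti([0.55, 0.58, 0.60, 0.62, 0.65, 0.68, 0.70]):.2f}")
```

With these values the α factors sum to 1.381 and the β factors to 0.381, so a channel with the same MTI in every band returns that MTI as its STI.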
STIPA: The Practical Method (Annex F)
Full STI measurement with 98 MTF values requires either an impulse response or an impractically long test signal. STIPA reduces this to 14 values — 2 per octave band — using a specially designed multi-band modulated test signal (Table F.1).
The STIPA test signal must be played for at least 15 seconds to ensure statistically reliable modulation depth extraction. SonaVyx's STI tool generates this signal in Rust WASM at 48 kHz and extracts the MTF using the full IEC 60268-16 computation chain.
STI from Impulse Response (Clause A.4)
When an impulse response is available, STI can be computed without the STIPA test signal:
$$m(F, f_{mod}) = \frac{\left| \int_{0}^{\infty} h_F^{2}(t) \, e^{-j 2\pi f_{mod} t} \, dt \right|}{\int_{0}^{\infty} h_F^{2}(t) \, dt}$$
Where hF(t) is the impulse response band-pass filtered to octave band F. This method yields the room's contribution to intelligibility without including PA system non-linearity or self-noise. It is useful for predicting STI from room acoustic measurements before the PA system is installed.
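For a sampled impulse response, the two integrals become sums over samples. The following sketch assumes the input has already been band-pass filtered to one octave band (the filtering itself is not shown), and the names `mtf_from_impulse_response` and `synthetic_decay` are hypothetical. The synthetic decay is only a smooth exponential envelope, used to check the result against the closed-form reverberation term 1/sqrt(1 + (2π·fmod·T/13.8)²).

```python
import cmath
import math


def mtf_from_impulse_response(h, sample_rate, f_mod):
    """MTF at modulation frequency f_mod from a band-filtered impulse response.

    h           -- sequence of samples, assumed already filtered to one octave band
    sample_rate -- sampling rate in Hz
    f_mod       -- modulation frequency in Hz
    """
    numerator = 0j
    denominator = 0.0
    for n, sample in enumerate(h):
        energy = sample * sample          # squared impulse response
        t = n / sample_rate
        numerator += energy * cmath.exp(-2j * math.pi * f_mod * t)
        denominator += energy
    if denominator == 0.0:
        raise ValueError("impulse response contains no energy")
    return abs(numerator) / denominator


def synthetic_decay(rt60, sample_rate, duration):
    """Exponential envelope whose energy falls by about 60 dB in rt60 seconds."""
    count = int(duration * sample_rate)
    # Energy decays as exp(-13.8 t / T), so amplitude decays as exp(-6.9 t / T).
    return [math.exp(-6.9 * n / (sample_rate * rt60)) for n in range(count)]


h = synthetic_decay(rt60=1.0, sample_rate=4000, duration=3.0)
measured = mtf_from_impulse_response(h, 4000, f_mod=2.0)
predicted = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * 2.0 * 1.0 / 13.8) ** 2)
print(f"from impulse response: {measured:.4f}, closed form: {predicted:.4f}")
```

For an ideal exponential decay the two numbers agree closely. A measured room response generally departs from that ideal, which is the reason to compute the MTF from the impulse response rather than from the reverberation time alone.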
STI from RT60 and SNR (Annex D)
The simplest estimation method uses only the reverberation time and signal-to-noise ratio per band. This is an approximation — it assumes exponential decay and diffuse field conditions. SonaVyx implements this as a quick-check tool on the STI page: enter RT60 and ambient noise levels, and get an estimated STI without any acoustic measurement.
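An estimate of this kind can be assembled by chaining the reverberation and noise MTF model with the apparent SNR, band averaging, and weighting steps. The sketch below is an illustration under stated assumptions and is not SonaVyx's implementation: `estimate_sti` is a hypothetical name, the reverberation term is taken as 1/sqrt(1 + (2π·fmod·T/13.8)²), the apparent SNR is rescaled to 0..1 as (SNR + 15)/30, and the β values are the commonly quoted male factors, which should be verified against Table 5.

```python
import math

MOD_FREQS = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
             3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]
ALPHA_MALE = [0.085, 0.127, 0.230, 0.233, 0.309, 0.224, 0.173]
# ASSUMPTION: commonly quoted male redundancy factors; verify against Table 5.
BETA_MALE = [0.085, 0.078, 0.065, 0.011, 0.047, 0.095]


def estimate_sti(rt60_by_band, snr_db_by_band):
    """Estimate STI from per-band RT60 (s) and SNR (dB), 125 Hz to 8 kHz."""
    if len(rt60_by_band) != 7 or len(snr_db_by_band) != 7:
        raise ValueError("expected 7 octave-band values, 125 Hz to 8 kHz")
    mti = []
    for rt60, snr_db in zip(rt60_by_band, snr_db_by_band):
        ti_sum = 0.0
        for f_mod in MOD_FREQS:
            reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_mod * rt60 / 13.8) ** 2)
            noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
            # Keep m strictly inside (0, 1) so the logarithm stays defined.
            m = min(max(reverb * noise, 1e-6), 1.0 - 1e-6)
            snr_app = max(-15.0, min(15.0, 10.0 * math.log10(m / (1.0 - m))))
            ti_sum += (snr_app + 15.0) / 30.0
        mti.append(ti_sum / len(MOD_FREQS))
    weighted = sum(a * x for a, x in zip(ALPHA_MALE, mti))
    redundancy = sum(b * math.sqrt(mti[k] * mti[k + 1])
                     for k, b in enumerate(BETA_MALE))
    return min(1.0, max(0.0, weighted - redundancy))


# Hypothetical lecture hall: RT60 falls with frequency, SNR is 20 dB in every band.
rt60 = [1.4, 1.3, 1.2, 1.1, 1.0, 0.9, 0.7]
print(f"estimated STI = {estimate_sti(rt60, [20.0] * 7):.2f}")
```

Since the model assumes exponential decay and a diffuse field, the output is best read as a screening figure to be confirmed by measurement.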
Quality Ratings
| STI | CIS | Rating | Example Context |
|---|---|---|---|
| 0.00-0.30 | 0.00-0.48 | Bad | Unintelligible, e.g. emergency PA failure |
| 0.30-0.45 | 0.48-0.65 | Poor | Very reverberant space, high noise |
| 0.45-0.60 | 0.65-0.78 | Fair | Minimum acceptable for announcement |
| 0.60-0.75 | 0.78-0.88 | Good | Classroom, conference room target |
| 0.75-1.00 | 0.88-1.00 | Excellent | Studio, courtroom, critical listening |
CIS (Common Intelligibility Scale) is a logarithmic transform of STI, CIS = 1 + log10(STI), intended to place results from different intelligibility measures on one common scale. For example, STI 0.50 corresponds to CIS 0.70. Both are reported by SonaVyx.
Regulatory Requirements
Many building codes and fire safety regulations mandate minimum STI for voice alarm systems:
- BS 5839-8 (UK): STI ≥ 0.50 for Category L systems
- EN 54-16 (EU): References IEC 60268-16 for VA system verification
- NFPA 72 (USA): Does not mandate STI directly but references intelligibility requirements that correspond to approximately STI ≥ 0.50
- AS 1670.4 (Australia): STI ≥ 0.50 for emergency warning systems
The room analysis workflow includes STI as part of the comprehensive assessment, and the AI diagnostic flags intelligibility issues with specific remediation paths.
Last updated: March 19, 2026