The deepSTRF metrics paradigm
This note documents how deepSTRF computes losses and performance metrics on
predictions produced by encoding models, and what invariants any metric
function must respect. It is the third leg of the data/model/metrics
contract, alongside data_paradigm.md and
model_paradigm.md, and is the single reference for
the public functional API in deepSTRF.metrics.
1. Scope: a functional, NaN-aware, single-axis API
deepSTRF.metrics ships pure functions that map predictions and ground
truth onto either a per-neuron tensor (reduction='none') or a single
scalar (reduction='mean'|'sum'). Two flavours:
Flavour |
Examples |
Gradient? |
NaN handling |
|---|---|---|---|
Losses |
|
yes |
Internal: NaN-positions are dropped before reducing |
Metrics |
|
no — |
Internal, same as losses |
Out of scope for v1 (deferred to follow-ups, listed at the end):
torchmetrics-style stateful module API, feature_decorrelation,
pop_corrcoef. Anything population-aggregated. Cross-batch accumulators.
2. Input shape conventions
The metrics API matches the data and model contracts exactly. There is no “reshape your tensors before calling a metric” step.
Tensor |
Shape |
Source |
|---|---|---|
|
|
model |
|
|
dataloader output, NaN-padded |
|
|
typically |
|
|
dataloader output, derived from |
The R-axis is the only friction point. Most metrics need a PSTH (one
ground-truth time series per neuron); the repeat-aware metrics
(signal_power, noise_power, snr, normalized_corrcoef) need the raw
repeats. The two functional signatures handle this without a class
hierarchy:
# prediction-vs-PSTH metrics: gt may be the PSTH (B, N, 1, T) OR raw responses (B, N, R, T).
# When R > 1 the metric collapses to PSTH internally via nanmean(dim=2, keepdim=True).
corrcoef(pred, gt, mask=None, reduction='mean') # (B, N, 1, T) × (B, N, 1|R, T)
fve(pred, gt, mask=None, reduction='mean') # (B, N, 1, T) × (B, N, 1|R, T)
mse_loss(pred, gt, mask=None, reduction='mean') # (B, N, 1, T) × (B, N, 1|R, T)
poisson_loss(pred, gt, mask=None, reduction='mean') # (B, N, 1, T) × (B, N, 1|R, T)
# repeat-aware metrics: ground truth keeps its R dimension.
signal_power(responses, mask=None, reduction='mean') # (B, N, R, T)
noise_power(responses, mask=None, reduction='mean') # (B, N, R, T)
snr(responses, mask=None, reduction='mean') # (B, N, R, T)
normalized_corrcoef(pred, responses, method='schoppe', # (B, N, 1, T) × (B, N, R, T)
mask=None, reduction='mean')
# eval-only frequency-domain metric: needs a regular grid, NaN-free, PSTH-only
coherence(pred, gt_psth, dt_ms, reduction='mean') # (B, N, 1, T) × (B, N, 1, T)
The R = 1 singleton on pred mirrors what the model emits. We do not
silently squeeze it — that would break a future probabilistic model that
populates the axis with multiple samples.
Auto-PSTH collapse on the gt argument
The four prediction-vs-PSTH metrics (mse_loss, poisson_loss,
corrcoef, fve) accept gt in either canonical shape:
(B, N, 1, T)— pre-computed PSTH or a deliberately chosen single-trial target — used as-is.(B, N, R, T)withR > 1— raw responses — collapsed to PSTH internally viananmean(dim=2, keepdim=True)before the formula is applied.
This means the canonical training step is
loss = mse_loss(pred, responses) # one line, auto-PSTH
with no caller-side responses.nanmean(...) boilerplate. NaN-padded
missing-trial slabs flow through the nanmean cleanly: a position where
every repeat is NaN comes out NaN at the collapsed PSTH and is dropped
by the metric’s NaN-derived mask (§4). When gt is already
(B, N, 1, T) the collapse is a no-op, so passing a custom target
(e.g. for a non-PSTH fitting objective) keeps working unchanged.
The repeat-aware metrics (signal_power, noise_power, snr,
normalized_corrcoef) and coherence do not auto-collapse: the
first three need the per-trial structure for their formulas, and
coherence is documented as PSTH-only and NaN-intolerant (§6.8).
3. The reduction over time happens inside
Every metric computes one scalar per neuron by reducing across the entire batch and time axis, in a way that depends on the metric’s mathematical definition. The user does not pick this. Specifically:
corrcoef,fve,normalized_corrcoef,coherence,signal_power,noise_power,snrall flatten(B, T)(andR, when present) into one long pseudo-time series per neuron, and compute their formula on that. Boolean indexing withvalid_maskmakes this safe even when stim durations vary or some neurons are uncorded for some stims.mse_lossandpoisson_losscompute the per-element residual, then applyreductionover the valid positions only.
This matches Schoppe (2016)’s original derivation, which assumes one concatenated long signal per neuron. Per-stim CC values that are then averaged across stims (the “Pennington recipe”, §6.4) are a separate quantity; we do not ship it in v1, and document the difference if a future follow-up adds it.
The takeaway: the time axis is collapsed by the metric’s formula, not by
reduction. What reduction controls is the neuron axis only.
4. NaN handling
The data paradigm uses NaN as the universal sentinel for missingness
(data_paradigm.md §3–4). All metrics in deepSTRF.metrics are
NaN-aware by default and require no preprocessing:
loss = mse_loss(pred, gt_psth) # picks up NaNs from gt_psth automatically
cc = corrcoef(pred, gt_psth) # same
sp = signal_power(responses) # NaN-padded repeats handled internally
Internally, each metric derives valid = ~gt.isnan() (or the analogous
mask on responses) and uses boolean indexing per-neuron — i.e. for
neuron n:
pred_n = pred[..., n, :, :][valid[..., n, :, :]] # 1-D
gt_n = gt [..., n, :, :][valid[..., n, :, :]] # 1-D
result[n] = formula(pred_n, gt_n)
This is the same recipe data_paradigm.md §6 recommends for the training
loop. Per-neuron flattening keeps every metric robust to:
Uncorded
(stim, neuron)pairs (full-NaN slabs inresponses).Variable
T_sacross stims (right-NaN-padded byneural_collate).Variable
R_{s, n}across neurons within a stim (right-NaN-padded).
mask= override
For cases where the canonical “valid iff not NaN” rule is not what the
user wants — for example, masking out the first 50 ms of every stim to
exclude onset transients — every metric accepts an optional mask kwarg:
mask = ~gt_psth.isnan() & (time_idx >= onset_offset)
cc = corrcoef(pred, gt_psth, mask=mask)
mask must be a bool tensor broadcastable to the ground-truth tensor’s
shape. If mask=None (the default), the metric falls back to
~gt.isnan(). If mask is provided, it replaces (not augments) the
NaN-derived mask: the user is now responsible for keeping the
NaN-positions out — typically by &ing them in as shown above. We
document this explicitly because the alternative (“mask is always
intersected with ~isnan”) is harder to reason about when the user
deliberately wants to mark NaN positions as valid (impossible — they would
contaminate the result; we do not protect against this).
Loud failure on misuse
If the user passes a mask that marks a NaN position as valid, the
metric’s per-element computation is performed at that position
((pred − NaN)**2 = NaN, gt · log(pred + ε) with gt = NaN = NaN,
etc.). NaN propagates through the per-neuron sum into the per-neuron
output. The result is then a visibly NaN per-neuron value — not a
silently-dropped or zero-contaminated number. This matches the
loud-failure ethos of the data paradigm (data_paradigm.md §4): users
spot misuse via NaN in their reported metric, not via subtle drift.
Implementation note: this is achieved by masked_fill(~valid, 0.0) → sum / count rather than nanmean, which would silently drop NaN.
Boolean indexing vs multiplicative masking
Two natural ways to compute a masked reduction:
# (a) Boolean indexing — flat 1-D vector of valid positions
ssr = ((pred[valid] - gt[valid]) ** 2).mean()
# (b) Multiplicative masking — preserves axis structure
per_element = (pred - gt.nan_to_num(0.0)) ** 2 * valid.float()
per_axis = per_element.sum(dim=AXIS) / valid.float().sum(dim=AXIS)
The shipped metrics use boolean indexing internally for the per-neuron formula, because:
The flat 1-D form keeps
cov/var/ Pearson correlation textbook-clean (no per-row.nansum()denominators).It is numerically robust against
0 · NaN = NaNtraps that multiplicative masking can hit when NaN has leaked intopred.
Multiplicative masking is the right choice when you need to preserve the
(B, N, T) axis structure for further aggregation (e.g. per-neuron
losses with custom batch reduction outside the metric API). In that
case, work directly with the dataloader’s valid_mask rather than
calling a deepSTRF metric — the metrics are designed to terminate at
per-neuron scalars.
5. Reduction semantics (PyTorch convention)
Every public metric has a reduction keyword with three values, matching
torch.nn.functional.mse_loss and friends:
|
Returned shape |
Meaning |
|---|---|---|
|
|
one value per neuron |
|
|
mean across neurons |
|
|
sum across neurons |
Reduction is over the neuron axis only. Reduction over time is decided
by the metric’s mathematical formula and is never controllable via
reduction (§3 above).
Edge case: when one or more neurons have zero valid positions under
the chosen mask (e.g. no val data for a particular cell), the per-neuron
result is NaN. Under 'mean' and 'sum' we use torch.nanmean /
torch.nansum so the scalar reduction stays well-defined. Under
'none', the NaN is preserved — the caller can see which cells were
dropped.
6. Per-metric definitions
Each metric is documented with its formula, shape contract, edge cases,
and reference. Where the existing implementation has a known bug or
band-aid (abs(ttrc), capitalized 'None', etc.), the rewrite either
fixes or removes the band-aid; this section is the authoritative spec.
6.1 mse_loss(pred, gt, mask=None, reduction='mean')
Boolean-masked mean squared error.
For each neuron, take the per-element squared residual, drop positions
where mask is False, and reduce.
mse_n = mean_{(b,t) ∈ valid_n} (pred[b, n, 0, t] - gt[b, n, 0, t])²
reduction='none': returns(N,).reduction='mean': scalar,nanmean(mse_n, dim=0).reduction='sum': scalar,nansum(mse_n, dim=0).
Differentiable. No @torch.no_grad.
6.2 poisson_loss(pred, gt, mask=None, reduction='mean', log_input=False, validate_input=False, eps=1e-8)
Negative Poisson log-likelihood, dropping the log(gt!) term. Two
parameterisations of the prediction are supported, matching the
canonical-link logic of generalised linear models:
log_input=False (default): pred is the rate λ
poisson_n = mean_{(b,t) ∈ valid_n} pred[b, n, 0, t] - gt[b, n, 0, t] · log(pred[b, n, 0, t] + ε)
Requires pred ≥ 0 for the log to be meaningful. The implementation
silently clamps pred to ≥ ε inside the log to prevent NaN from
propagating when the model temporarily emits a small negative
prediction. The linear pred term outside the log keeps its sign,
so the gradient still pushes negative predictions toward positive
values. With validate_input=True, the loss instead raises
ValueError on the first negative prediction at a masked-in position
— useful for debugging an output-activation bug, but adds a per-step
CPU sync, so it is opt-in.
log_input=True: pred is the log-rate η = log(λ)
poisson_n = mean_{(b,t) ∈ valid_n} exp(pred[b, n, 0, t]) - gt[b, n, 0, t] · pred[b, n, 0, t]
Derivation: starting from the Poisson NLL λ − gt · log(λ) and
substituting λ = exp(η), the loss becomes exp(η) − gt · η, well-defined
for any real-valued η. This is the canonical link for the Poisson
distribution in GLM language — same role that the logit link plays for
Bernoulli. Practically: pair log_input=True with any readout
(Linear, ParametricSigmoid, etc.) and read the model output as
log-rate. To recover rate for visualisation or downstream metrics
expecting rate (e.g. corrcoef against a count-valued PSTH),
exp(pred) first.
Stirling term
The full Poisson NLL has a gt · log(gt) − gt + 0.5·log(2π·gt) term
(Stirling’s approximation to log(gt!)). It is not added by
default because:
It is constant in
pred, so it does not affect optimisation.It is meaningless when
gtis non-integer (e.g. trial-averaged PSTH binned counts — the typical deepSTRF target).log(gt!)is defined only on non-negative integers.
Users who want the full likelihood for AIC/BIC model comparison on integer count targets can add the term outside the loss.
Pairing with output activations
Output activation |
Recommended |
Why |
|---|---|---|
|
|
Output already |
|
|
Same; default zoo activation (NS1, NAT4, …) |
|
|
Same |
|
|
Same |
|
|
Output unbounded; treat as log-rate |
|
|
Output already |
Any with |
|
Output may go negative |
The default log_input=False matches PyTorch’s
torch.nn.PoissonNLLLoss(log_input=False) and the historical deepSTRF
behaviour.
6.3 corrcoef(pred, gt, mask=None, reduction='mean')
Pearson correlation coefficient between prediction and PSTH, computed per-neuron over all valid positions:
ρ_n = corr(pred_n, gt_n)
where pred_n, gt_n are the 1-D vectors of valid (b, t) positions for
neuron n (§4). Uses torch.corrcoef, denominator regularized with a
small floor to avoid 0/0 on constant predictions.
mean/sumreduce overN.A neuron with zero variance in either
pred_norgt_nreturns NaN for that index — propagation rules from §5 apply.
6.4 normalized_corrcoef(pred, responses, method='schoppe', mask=None, reduction='mean')
Noise-corrected correlation coefficient. Two formulations are shipped; they target the same quantity with different bias profiles.
The shape contract differs from corrcoef: ground truth is the raw
repeat-bearing tensor (B, N, R, T), not the PSTH. The PSTH is computed
internally as responses.nanmean(dim=2, keepdim=True), but the per-trial
information is needed for the noise correction.
method='schoppe'
Schoppe et al. 2016, “Measuring the Performance of Neural Models”, Frontiers in Neuroscience.
CCnorm_n = cov(pred_n, psth_n) / sqrt( var(pred_n) · SP_n )
where pred_n, psth_n are the 1-D vectors of valid (b, t) positions
for neuron n (concatenated across stims via boolean indexing, §3),
and SP_n is the Sahani–Linden signal power for neuron n (§6.5,
length-weighted across stims). The aggregation convention for the
denominator is chosen to match the natural sample-count weighting that
the numerator’s concatenated cov and var perform automatically — see
§11. Replaces the noise-inflated var(psth) in the denominator of the
raw Pearson correlation with the unbiased signal-only variance estimator.
Cells with SP_n ≤ 0 (the noise-floor regime where the unbiased
estimator fails) return NaN.
method='hsu'
Hsu, Borst & Theunissen 2004, with the CCmax formulation popularized
by Schoppe et al. 2016.
CCnorm_n = corr(pred_n, psth_n) / CCmax_n
CCmax_{b,n} = sqrt( 2 · ρ_half_{b,n} / (1 + ρ_half_{b,n}) ) (Spearman–Brown, per stim)
ρ_half_{b,n} = E_{i,j}[ corr( psth_half_{i,b,n}, psth_half_{j,b,n} ) ] (over disjoint half-trial splits)
CCmax_n = sum_b ( T_b · CCmax_{b,n} ) / sum_b T_b (length-weighted, §11)
Computed per-stim by sampling up to ccmax_iters=126 disjoint
half-trial pairs, then length-weighted across stims with T_b as the
weight. Same robustness convention as signal_power (§6.5). Per-stim
cells with ρ_half ≤ 0 (worse-than-chance ceiling — too noisy to
estimate) are dropped from the weighted average rather than
contaminating it.
Single-trial degenerate case
When R = 1 (or every cell of responses[..., n, :, :] has only one
valid repeat after masking), the noise-correction factor is undefined:
both SP_n and CCmax_n would require ≥ 2 trials. In that case
normalized_corrcoef returns the raw corrcoef(pred, psth) for that
neuron. This is a worst-case assumption: with no information about
trial-to-trial variability, we cannot know how much of the prediction
gap is noise vs model error, so we report the un-corrected number. It is
the same convention used by the existing implementation and by
Schoppe (2016) §3.5.
Pennington 2023 — not shipped in v1
The Pennington & David (2023) noise-corrected prediction correlation
divides by sqrt(TTRC) where TTRC is the average pairwise trial-pair
correlation. The existing implementation includes this as
pennington_prediction_correlation, with an abs(ttrc) band-aid that
the audit flags as wrong. Re-derivation: TTRC can be slightly negative
in the low-SNR regime and Pennington’s paper is silent on what to do; the
field’s typical fix is to clip-to-ε with a warning, but this changes the
reported number for poorly-driven cells in a way that has been
benchmarked in deepSTRF. We defer the Pennington method to a
follow-up rather than ship a rushed fix.
6.5 signal_power(responses, mask=None, reduction='mean')
Sahani & Linden 2003, NIPS, “How linear are auditory cortical responses?”.
For one (b, n) cell with R valid repeats, the variance of the
multi-trial response decomposes as
TP_{b,n} = E_r [ var_t( y_{r, b, n}(t) ) ] = SP_{b,n} + NP_{b,n} (single-trial variance)
var_t( E_r[ y_{r, b, n}(t) ] ) = SP_{b,n} + NP_{b,n} / R (variance of the PSTH)
Solving for SP_{b,n}:
SP_{b,n} = ( R · var_t(psth_{b,n}) - TP_{b,n} ) / (R - 1)
The per-neuron quantity is the length-weighted average of the per-stim estimates:
SP_n = sum_b ( T_b · SP_{b,n} ) / sum_b T_b
where T_b is the number of valid time bins for stim b. This is the
BLUE estimator under the assumption that every per-stim SP_{b,n} is
unbiased: weighting by sample count gives the minimum-variance combiner.
It also matches the natural sample weighting that cov and var apply
in the numerator of corrcoef / normalized_corrcoef (which run over
the full concatenated (b, t) series), so a 500 ms clip drives the
denominator the same way it drives the numerator. See §11 for the
derivation.
A given (b, n) cell is included iff it has ≥ 2 valid repeats and ≥ 2
valid time bins. Cells without any qualifying stim return NaN under
reduction='none'.
6.6 noise_power(responses, mask=None, reduction='mean')
NP_{b,n} = TP_{b,n} - SP_{b,n}
NP_n = sum_b ( T_b · NP_{b,n} ) / sum_b T_b
Same length-weighted aggregation as signal_power.
6.7 snr(responses, mask=None, reduction='mean')
SNR_n = SP_n / NP_n
Returns +inf for noiseless cells (NP_n = 0) — caller decides whether
to filter.
6.8 coherence(pred, gt, dt_ms, reduction='mean')
Magnitude-squared coherence between prediction and PSTH per neuron, as a
scalar per neuron computed as the mean over frequency bins. Uses
scipy.signal.coherence internally, so this is eval-only (no
gradient).
Coherence requires a regular time grid and does not tolerate NaN.
The caller is responsible for picking a NaN-free subset (e.g. one fully
recorded validation stim, or pre-flatten across stims of equal length).
We document this explicitly and raise ValueError if NaN appears in the
input. Future follow-up: a chunked-coherence variant that handles
NaN-padded variable-length stims.
This metric is preserved largely as it was — the existing implementation
is correct for its supported shape — but adapted to the new
(B, N, 1, T) × (B, N, 1, T) contract.
7. Recommended training-loop pattern
Mirroring data_paradigm.md §6, the canonical loop now looks like:
from deepSTRF.metrics import mse_loss, corrcoef, normalized_corrcoef
for batch in loader: # batch is a dict
stims, responses = batch['stims'], batch['responses']
pred = model(stims) # (B, N, 1, T)
loss = mse_loss(pred, responses) # auto-PSTH inside
loss.backward()
optimizer.step()
if val_step:
cc = corrcoef(pred, responses, reduction='none') # (N,) auto-PSTH
cc_norm = normalized_corrcoef(pred, responses, # (N,) needs raw repeats
method='schoppe',
reduction='none')
Notes:
No manual masking and no manual PSTH. Pass
responsesdirectly to any prediction-vs-PSTH metric; the auto-PSTH collapse (§2) and the NaN-derived mask (§4) are both internal. The dataloader’svalid_maskis informational at this point.Pass
responses.nanmean(dim=2, keepdim=True)explicitly only when you want a non-default target — a custom objective, a single-trial fit, etc. The(B, N, 1, T)shape is treated as authoritative and passed through unchanged.reduction='none'is the default for val metrics. Per-neuron numbers are usually what you want for diagnostics (best/worst cells, distribution plots). Aggregate yourself withcc.nanmean()if a single number is needed.Use
mask=only when overriding the default NaN rule. The 90 % case (mask =~isnan) is automatic.
9. Known bugs in the existing implementation (audit)
This section is the rewrite checklist. The existing metrics/performance.py
and metrics/losses.py will be rewritten — not patched in place — to
satisfy the contracts above. Issues found during the 2026-04-28 audit:
@torch.no_grad()on losses.NegativePoissonLogLikelihoodis correctly ann.Module(nono_grad), but every “performance” function inperformance.pyhas@torch.no_grad(). Splitting losses vs metrics into two modules makes the contract obvious.No NaN handling anywhere. Silently produces NaN if any input element is NaN. Critical given the data paradigm.
covarianceuses(T - 1)denominator;var(dim=-1)is Bessel-corrected by default. Mixing biased and unbiased estimators in the same formula introduces a small but real bias. Pick one (Bessel) and apply consistently.reduction='None'(capitalized) incorrelation_coefficientandnormalized_correlation_coefficientis a typo. Switch to PyTorch convention'none'|'mean'|'sum'.pennington_prediction_correlationusesabs(ttrc)— a band-aid for sign issues that shouldn’t appear in a correctly-derived formula. Re-derive (or defer; see §6.4).compute_CCmaxuses1/sqrt(cchalf²)=1/|cchalf|instead of1/cchalf. Same band-aid pattern as #5; equivalent to|ρ_half|in the Spearman–Brown formula. Replace with the textbook formsqrt(2·ρ_half / (1+ρ_half))and clip-or-NaN low-SNR cells.fill_missing_repeatsis a dataset-side helper that the modernized NaN-sentinel path obsoletes. Drop from metrics.coherencereturns a(B, N, F)tensor that is then averaged over B, not over F — different from the per-neuron-scalar contract here. Adapt to §6.8.NegativePoissonLogLikelihoodcomputeslog(-prediction + 1e-9)— negation insidelogis a bug. Should belog(prediction + ε). Also uses'None'(capitalized) and computes mean over(-2, -1)then over batch, which is'mean'over all elements — fine for the loss but the reduction semantics are inconsistent with the new convention.
10. Deferred items
These were considered for v1 and explicitly cut. Restoration will re-open this doc.
Module API (torchmetrics-style stateful
update()/compute()). Useful for streaming val metrics across an epoch without materializing every batch. The functional API is enough for v1; module API is easy to add later as a thin wrapper.feature_decorrelation/ redundancy-reduction regularizer. Not part of the original deepSTRF scope and uses a different design pattern (forward hooks onmodel.core).pop_corrcoef(correlation of population-summed PSTH). Uncommon metric; no immediate use case.Pennington-style noise-corrected prediction correlation (§6.4). Re-derivation pending; current
abs(ttrc)is wrong, simple ε-clip is not vetted.Per-stim-then-averaged metrics. Some of the literature reports
mean_s CC(stim_s)rather thanCC(concat_s). Different quantity; not shipped in v1.Chunked-coherence for NaN-padded variable-length stims (§6.8).
11. Why length-weighting? (BLUE derivation)
Several v1 metrics combine per-stim estimates into a per-neuron number:
signal_power, noise_power, snr, and normalized_corrcoef(method='hsu').
The aggregation across stims is length-weighted:
X_n = sum_b ( T_b · X_{b,n} ) / sum_b T_b
with T_b the number of valid time bins for stim b. This subsection
records why, since “average across stims” is also a valid choice and
the literature is split.
Setup
Let X_{b,n} be a per-stim estimate of some neuron-level quantity
X_n (e.g. signal power). Treat X_{b,n} as an unbiased estimator
with variance Var(X_{b,n}) ∝ 1 / T_b — i.e. estimator variance shrinks
linearly with the number of time samples used. This is the standard
behaviour of variance-of-variance estimators on stationary signals:
doubling T halves the noise on the variance estimate.
The general weighted estimator
X̂_n = sum_b ( w_b · X_{b,n} ) / sum_b w_b
is unbiased for any positive w_b summing to one. Its variance is
minimised — i.e. it is the Best Linear Unbiased Estimator (BLUE) —
when w_b ∝ 1 / Var(X_{b,n}) = T_b. So weighting by T_b extracts
the most information possible from the per-stim estimates under that
variance assumption.
Coherence with the numerator
corrcoef(pred_n, psth_n) and var(pred_n) in the Schoppe formula
operate over the concatenated valid (b, t) time series for neuron
n. A 500 ms clip contributes 10× the samples of a 50 ms clip and
correspondingly drives the numerator’s cov / var 10× harder. If we
were to combine per-stim SPs with equal weights instead of length
weights, the numerator and denominator would weight stims differently,
which makes the resulting CCnorm value awkward to interpret —
especially when comparing models on heterogeneous-length validation
sets. Length-weighting on the denominator brings everything into
agreement.
The equal-weight alternative
mean_b X_{b,n} is also defensible:
Pros: each stim contributes one answer, not one fact. If you trust short-stim estimates as much as long-stim estimates (e.g. your short stims are repeated many more times so the per-stim variance gain compensates), equal weighting reflects that.
Cons: ignores the variance structure described above; short noisy stims can pull the average around.
We pick length-weighting as the default because it composes correctly
with the natural sample-count weighting elsewhere in the pipeline. A
future kwarg weighting='length' | 'equal' could expose the choice if a
clear use case emerges.
Practical consequence
A neuron with two clips — a 500 ms validation stim with SP=10 and a
50 ms validation stim with SP=1 — gets SP_n ≈ (500·10 + 50·1)/550 ≈ 9.18 under length-weighting, vs 5.5 under equal weighting. The
length-weighted answer matches what var(psth_n) over the concatenated
time series would have given (modulo the 1/(R−1) Bessel correction);
the equal-weighted answer would produce an internally inconsistent
CCnorm.
12. References
Sahani, M. & Linden, J. F. (2003). “How linear are auditory cortical responses?” Advances in Neural Information Processing Systems (NIPS). — Original signal/noise power decomposition.
Hsu, A., Borst, A. & Theunissen, F. E. (2004). “Quantifying variability in neural responses and its application for the validation of model predictions.” Network: Computation in Neural Systems. — CCmax formulation via Spearman–Brown.
Schoppe, O., Harper, N. S., Willmore, B. D. B., King, A. J. & Schnupp, J. W. H. (2016). “Measuring the performance of neural models.” Frontiers in Computational Neuroscience. — CCnorm (signal-power formulation), reconciliation with Hsu et al.
Pennington, J. R. & David, S. V. (2023). “A convolutional neural network provides a generalizable model of natural sound coding by neural populations in auditory cortex.” PLOS Computational Biology. — TTRC-based noise-corrected prediction correlation.
Gill, P. et al. (2025). “Sound representation methods for spectro-temporal receptive field estimation.” Journal of Computational Neuroscience. — Coherence-information metric (deferred follow-up).