Squirrel-monkey auditory cortex MUA with TIMIT & monkey vocalizations (Downer 2025)
Dataset Source: Zenodo deposit 16175377 — Multi-channel auditory cortex electrophysiology in squirrel monkey (CC BY 4.0, ~29 GB).
Original Papers:
“Deep neural networks explain spiking activity in auditory cortex” — Ahmed B., Downer J.D., Malone B.J., Makin J.G. (2025), PLoS Computational Biology 21(8): e1013334.
“Temporally precise population coding of dynamic sounds by auditory cortex” — Downer J.D., Bigelow J., Runfeldt M., Malone B.J. (2021), Journal of Neurophysiology.
Dataset Details
Threshold-crossing multi-unit activity (MUA) from 1718 recording channels across 41 sessions in three squirrel monkeys (B, C, F), recorded passively while each animal listened to a battery of TIMIT speech sentences and species-specific vocalizations. Spike-sorting was not feasible at acquisition — Ahmed 2025 explicitly treats each channel as a single “multi-unit” (§Data collection, p4).
Two stimulus classes are loaded as separate Downer2025Dataset instances:
TIMIT — English speech (stimuli='timit')
499 unique sentences from the TIMIT corpus, 1.99–3.59 s each (median 3.11 s), 16 kHz mono. Each sentence is presented with 0.5 s of pre/post silence padding (befaft_s = (0.5, 0.5)); total ~25 min of stimulus per session.
Canonical per-session design (Ahmed 2025 p4): 489 sentences at 1 rep + 10 sentences at 11 reps = 599 presentations per session. Some sessions doubled the 11-rep block (so per-(stim, neuron) R can also be 2 or 22).
Test split = the 10 11-rep IDs [12, 13, 32, 43, 56, 163, 212, 218, 287, 308] (programmatically discovered from the per-session stim-code histograms, stable across all sessions).
All sessions presented all 499 sentences → the resulting (S=499, N=1718) nrn_masks matrix is fully dense (zero NaN sentinels).
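As a rough illustration of how the test IDs can be recovered from a stim-code histogram, the sketch below counts presentations in a play order and keeps the IDs seen at least 11 times. The play order is synthetic and the helper name find_repeated_stims is made up for this example; it is not the loader's own code.

```python
from collections import Counter

def find_repeated_stims(stim_codes, min_reps=11):
    """Return the sorted stimulus IDs presented at least `min_reps` times."""
    counts = Counter(stim_codes)
    return sorted(code for code, n in counts.items() if n >= min_reps)

# Synthetic play order mirroring the canonical design (an assumption for the
# demo): IDs 1..499 once each, plus 10 extra presentations of each test ID.
test_ids = [12, 13, 32, 43, 56, 163, 212, 218, 287, 308]
play_order = list(range(1, 500)) + [i for i in test_ids for _ in range(10)]

n_presentations = len(play_order)       # 489 * 1 + 10 * 11
recovered = find_repeated_stims(play_order)
print(n_presentations, recovered == test_ids)
```

A threshold of 11 also covers sessions that doubled the repeated block, since 22 reps still clears it while 2 reps does not.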
mVocs — monkey vocalizations (stimuli='mvocs')
303 unique vocalizations (grunts, screams, coos) packaged in the 28 min concat WAV MonkVocs_15Blocks.wav (41 kHz stereo, mono-averaged and resampled to 16 kHz by the loader). The play-order mVocsStimCodes defines the per-voc canonical rep count.
Canonical WAV design: 228 vocs at 1 rep and 75 vocs at variable reps (2–30 reps), 11 of which are at exactly 15 reps. The 11 at 15 reps are the paper’s test set: IDs [7, 9, 12, 15, 24, 29, 30, 33, 44, 45, 48].
Per-voc canonical duration = minimum inter-onset interval across that voc’s occurrences (excludes variable inter-stim silence). Voc lengths range 0.95–4.69 s, median ~1.8 s.
Different sessions presented different voc subsets → ~7 % sparse (37 572 / 520 554 cells are (1, 1) NaN sentinels).
mVocs stim_meta carries an extra n_reps_in_wav key so users can sub-filter beyond the binary test/estimation split.
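The canonical-duration rule can be sketched as follows, assuming the inter-onset interval of an occurrence is the gap between its onset and the next onset in the play order. That reading, the function name and the toy onsets are assumptions made for illustration, not taken from the loader.

```python
from collections import defaultdict

def canonical_durations(onsets_s, stim_codes):
    """Per-voc duration: the smallest onset-to-next-onset gap over its occurrences."""
    gaps = defaultdict(list)
    # The final presentation has no following onset, so it contributes no gap.
    for i in range(len(onsets_s) - 1):
        gaps[stim_codes[i]].append(onsets_s[i + 1] - onsets_s[i])
    return {code: min(g) for code, g in gaps.items()}

# Toy play order: voc 7 is followed by 2.5 s and 3.0 s gaps, voc 9 by 1.5 s twice.
onsets_s = [0.0, 2.5, 4.0, 7.0, 8.5]
stim_codes = [7, 9, 7, 9, 7]
durations = canonical_durations(onsets_s, stim_codes)
print(durations)
```

Taking the minimum rather than the mean is what discards the variable silence between stimuli: the shortest gap is the one with the least padding.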
By default both stim classes go through the same mel-spectrogram pipeline (audio_fs=16 000, n_mels=80, fmax=8 000, window_ms=25, compression='log1p', spec_zscore=True) so two instances are concat-compatible. These defaults match Ahmed 2025’s Kaldi-fbank convention as closely as the deepSTRF mel pipeline allows: 80 mel bands, log compression, per-band per-stim z-score, ~25 ms FFT window. The 8 kHz cap matches Ahmed 2025’s STRF baseline cochleagram (50–8000 Hz, p8).
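To make "log compression, per-band per-stim z-score" concrete, here is a dependency-free sketch that applies log1p and then standardises each mel band over the frames of one stimulus. It uses plain lists, a population standard deviation and a small eps guard; all three are assumptions for the example, and the deepSTRF pipeline may differ in those details.

```python
import math
from statistics import fmean, pstdev

def compress_and_zscore(mel_power, eps=1e-8):
    """mel_power: list of bands, each a list of non-negative frame energies."""
    out = []
    for band in mel_power:
        logged = [math.log1p(x) for x in band]   # log(1 + x) compression
        mu = fmean(logged)
        sigma = pstdev(logged)                   # population std over frames
        out.append([(v - mu) / (sigma + eps) for v in logged])
    return out

# Toy "spectrogram" for one stimulus: 2 bands x 4 frames.
mel_power = [[0.0, 1.0, 3.0, 7.0], [2.0, 2.0, 10.0, 0.5]]
spec = compress_and_zscore(mel_power)
for band in spec:
    print(round(fmean(band), 6), round(pstdev(band), 6))
```

Because the statistics are computed per band and per stimulus, every stimulus ends up on the same scale, which is part of what lets TIMIT and mVocs instances be concatenated.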
Setup
Easiest path — auto-download from Zenodo into the platformdirs cache (heads up: the archive is ~29 GB):
from deepSTRF.datasets.audio import Downer2025Dataset
ds_timit = Downer2025Dataset(stimuli='timit', download=True)
ds_mvocs = Downer2025Dataset(stimuli='mvocs', download=True)
Default cache dir is platformdirs.user_cache_dir('deepSTRF')/Downer2025, overridable via $DEEPSTRF_DATA_DIR.
If you already have the data unpacked:
<path>/sessions/<session_id>/<animal>_<session_id>_Ch<N>[suffix]_MUspk.mat
<path>/stimuli/out_sentence_details_timit_all_loudness.mat
<path>/stimuli/SqMoPhys_MVOCStimcodes.mat
<path>/stimuli/MonkVocs_15Blocks.wav
<path>/sessions_metadata.yml
just pass the path:
ds = Downer2025Dataset('/path/to/auditory_cortex_data', stimuli='timit')
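Before committing to a multi-minute load, it can help to check that a local copy matches the layout listed above. The sketch below is a standalone helper (missing_entries is a made-up name, not part of deepSTRF) and the demo session and file names are placeholders.

```python
import tempfile
from pathlib import Path

# Relative paths taken from the layout listed above.
REQUIRED_FILES = [
    "stimuli/out_sentence_details_timit_all_loudness.mat",
    "stimuli/SqMoPhys_MVOCStimcodes.mat",
    "stimuli/MonkVocs_15Blocks.wav",
    "sessions_metadata.yml",
]

def missing_entries(root):
    """List the expected entries that are absent under `root`."""
    root = Path(root)
    missing = [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]
    # At least one per-channel spike file should sit under sessions/<session_id>/.
    if not any(root.glob("sessions/*/*_MUspk.mat")):
        missing.append("sessions/<session_id>/*_MUspk.mat")
    return missing

# Demo on a throwaway directory; the session and file names are invented.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    empty_report = missing_entries(root)
    for rel in REQUIRED_FILES + ["sessions/s01/b_s01_Ch1_MUspk.mat"]:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    full_report = missing_entries(root)

print(len(empty_report), full_report)
```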
End-to-end load times on a typical workstation: ~8 min for TIMIT (1718×499), ~9 min for mVocs (1718×303).
Estimation vs test subsets
ds_test = Downer2025Dataset(stimuli='timit', subset='test') # 10 stims
ds_est = Downer2025Dataset(stimuli='timit', subset='estimation') # 489 stims
# or load everything and filter later
ds = Downer2025Dataset(stimuli='timit') # 499 stims
ds.select_stims_by_attr('split', 'test') # 10
stim_meta[s] surfaces both fields:
| Field | TIMIT example | mVoc example | Notes |
|---|---|---|---|
| | | | TIMIT sentence name, or |
| | | | Stimulus class. |
| | | (same) | MATLAB-indexed; matches the upstream |
| | | | Full waveform duration (including the TIMIT 0.5 s pre/post). |
| | | (same) | Paper-faithful. |
| | | | The paper’s per-stim count. |
| | | | Silence flanking the stim, only annotated for TIMIT. |
| | (absent) | | mVoc only: the per-voc count in the canonical WAV. |
Per-cell metadata
nrn_meta[n] keys:
| Field | Example | Notes |
|---|---|---|
| | | Filename minus |
| | | Recording date. |
| | | One of |
| | | Monkey C is the only one recorded in both hemispheres. |
| | | High-level area assignment from the YAML. |
| | | Fine area (transcribed from YAML comments). |
| | | Probe channel number. |
| | | Annotates re-mounted recordings on the same channel #. |
| | | Probe layout for that session (1, 16, 32, 48 or 64 in the paper). |
| | | 2D craniotomy coordinates (mm). |
| | | Constant; all channels are MUA. |
Filter examples:
# Just primary (core) auditory cortex, all animals
ds.select_pop_by_nrn_attr('area_group', 'core') # 909
# Same, using the Ahmed-2025-style 'primary' alias at construction time:
ds = Downer2025Dataset(stimuli='timit', areas=('primary',)) # 909
# Just A1 (most populous core sub-area)
ds.select_pop_by_nrn_attr('area', 'A1') # 813
# Only monkey B's right hemisphere
ds.select_pop_by_nrn_predicate(
    lambda n: n['animal_id'] == 'b' and n['hemisphere'] == 'RH')
Once the 1718 channels are loaded, each cell exposes both filename-derived and YAML-derived metadata. area_group comes straight from the YAML data, while the fine area label was transcribed once from the YAML comments; those comments are not machine-readable from the YAML data block, so the mapping is hard-coded in downer2025.py.
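As a mental model for the predicate filter, the sketch below runs a predicate over a list of metadata dicts and returns the indices that pass. It is an illustration only, not the library's implementation; the keys and the values 'b', 'RH', 'core' and 'A1' come from the examples above, while the remaining values are placeholders.

```python
# Toy per-cell metadata; 'c', 'LH', 'other' and 'X' are placeholder values.
nrn_meta = [
    {"animal_id": "b", "hemisphere": "RH", "area_group": "core", "area": "A1"},
    {"animal_id": "c", "hemisphere": "LH", "area_group": "core", "area": "A1"},
    {"animal_id": "c", "hemisphere": "RH", "area_group": "other", "area": "X"},
]

def select_indices(meta, predicate):
    """Indices of the cells whose metadata dict satisfies `predicate`."""
    return [i for i, n in enumerate(meta) if predicate(n)]

monkey_b_rh = select_indices(
    nrn_meta, lambda n: n["animal_id"] == "b" and n["hemisphere"] == "RH")
core = select_indices(nrn_meta, lambda n: n["area_group"] == "core")
print(monkey_b_rh, core)
```

A predicate is the most general form: attribute-equality filters such as area_group == 'core' are just the special case of a one-key comparison.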
Reproducing Ahmed 2025’s analysis cohort
The paper’s main figures use a 404-multi-unit “well-tuned to TIMIT” subset and a 489-multi-unit “well-tuned to mVocs” subset, selected by a two-stage criterion: a Wilcoxon rank-sum test of trial-to-trial response correlations against a circularly-shifted null, then a δ ≥ 0.5 effect-size filter on the same distributions (§“Trial-to-trial neural variability”, pp. 4–5). Downer2025Dataset provides an opt-in replication:
ds = Downer2025Dataset(stimuli='timit', smooth=False)
ds.compute_paper_tuning(n_resamples=10_000)
ds.select_pop_by_nrn_predicate(lambda n: n.get('ahmed2025_timit_well_tuned', False))
# → ~413 neurons (paper target: 404)
The method writes four keys per neuron — ahmed2025_<stim>_tuned, ahmed2025_<stim>_well_tuned, ahmed2025_<stim>_p_wilcoxon, ahmed2025_<stim>_delta_normalized. With n_resamples=10_000 (~11 min on TIMIT, ~4 min on mVocs) the well-tuned counts come within ~2–3 % of the paper (413 / 475 vs 404 / 489). For an exact match on the looser tuned counts (1195 / 1231) bump to n_resamples=100_000 (~80 min per mode). For most analyses, the library-canonical SNR / CCmax filter via compute_neuron_quality() is a faster paper-agnostic alternative.
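The idea of a circularly-shifted null can be illustrated with a small standalone sketch: correlate two repeats of the same stimulus, then re-correlate after rotating one of them by a random non-zero offset, which keeps its firing statistics but breaks stimulus alignment. The toy spike counts, helper names and pairing scheme are assumptions for the example; this is not the compute_paper_tuning code, and it does not reproduce the paper's δ definition.

```python
import random
from statistics import fmean

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def circular_shift(x, k):
    """Rotate a list to the right by k positions."""
    k %= len(x)
    return x[-k:] + x[:-k]

def shifted_null(trial_a, trial_b, n_resamples, rng):
    """Correlations after misaligning trial_b by random non-zero rotations."""
    n = len(trial_b)
    return [pearson(trial_a, circular_shift(trial_b, rng.randrange(1, n)))
            for _ in range(n_resamples)]

# Toy binned spike counts for two repeats of one stimulus.
trial_a = [0, 0, 1, 4, 9, 6, 2, 1, 0, 0, 0, 0]
trial_b = [0, 1, 0, 5, 8, 7, 1, 0, 1, 0, 0, 0]

observed = pearson(trial_a, trial_b)
null_rs = shifted_null(trial_a, trial_b, n_resamples=200, rng=random.Random(0))
print(observed > max(null_rs))
```

In a full analysis the observed and null correlations would be pooled over many trial pairs and compared with a rank-sum test (for example scipy.stats.ranksums) before any effect-size filter is applied.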
Remarks
MUA only. Every *_MUspk.mat file holds threshold-crossing spike times for one channel: no spike-sorting, no per-spike cluster ID. Ahmed 2025: “we refer to the source of spikes on a single channel as a ‘multi-unit’” (p4).
Suffix variants (p, s2, ps2) annotate re-mounted recordings on the same channel number and are kept as distinct cells (with distinct cell_ids). Their trial struct is identical to the plain-channel files in the same session; only the spike-time history differs.
Bandwidth cap. Both stim classes are resampled to 16 kHz and mel-filtered with fmax=8 000 to match the paper’s 50–8000 Hz cochleagram baseline. The mVocs source actually contains content up to ~20 kHz (squirrel monkey audible range is well above 20 kHz), so users wanting the full band can override audio_fs= and fmax=, but cross-mode concatenation will then refuse (different F/bandwidth between TIMIT and mVocs).
8 sessions lack a top-level *_TRIALINFO.mat file. We always read the trial struct embedded in each channel’s MUspk file instead; they’re identical across channels within a session, so this is a no-op for the well-formed sessions and a fallback for the rest.
Bad sessions flagged in sessions_metadata.yml are absent from the Zenodo upload (verified at load time). The Downer2025Dataset constructor honours the flag defensively anyway.
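For users walking the sessions directory themselves, a spike-file name of the form <animal>_<session_id>_Ch<N>[suffix]_MUspk.mat can be split with a regular expression along these lines. The pattern and the example file names are assumptions for illustration (the real session_id format is not specified here), and the loader may parse names differently.

```python
import re

# Suffix alternatives are ordered longest-first so 'ps2' is not read as 'p'.
MUSPK_RE = re.compile(
    r"^(?P<animal>[^_]+)_(?P<session_id>.+)_Ch(?P<channel>\d+)"
    r"(?P<suffix>ps2|s2|p)?_MUspk\.mat$"
)

def parse_muspk_name(filename):
    """Split a spike-file name into animal, session_id, channel and suffix."""
    m = MUSPK_RE.match(filename)
    if m is None:
        raise ValueError(f"not a MUspk file name: {filename!r}")
    fields = m.groupdict()
    fields["channel"] = int(fields["channel"])
    fields["suffix"] = fields["suffix"] or ""   # plain channels get an empty suffix
    return fields

# Hypothetical file names, used only to exercise the pattern.
remounted = parse_muspk_name("b_180413_Ch12s2_MUspk.mat")
plain = parse_muspk_name("c_190101_Ch3_MUspk.mat")
print(remounted, plain)
```

Keeping the suffix as its own field makes it straightforward to treat a re-mounted recording and the plain recording on the same channel number as separate cells.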