Squirrel-monkey auditory cortex MUA with TIMIT & monkey vocalizations (Downer 2025)

Dataset Source: Zenodo deposit 16175377 — Multi-channel auditory cortex electrophysiology in squirrel monkey (CC BY 4.0, ~29 GB).

Original Papers:

Dataset Details

Threshold-crossing multi-unit activity (MUA) from 1718 recording channels across 41 sessions in three squirrel monkeys (B, C, F), recorded passively while each animal listened to a battery of TIMIT speech sentences and species-specific vocalizations. Spike-sorting was not feasible at acquisition — Ahmed 2025 explicitly treats each channel as a single “multi-unit” (§Data collection, p4).

Two stimulus classes are loaded as separate Downer2025Dataset instances:

TIMIT — English speech (stimuli='timit')

  • 499 unique sentences from the TIMIT corpus, 1.99–3.59 s each (median 3.11 s), 16 kHz mono. Each sentence is presented with 0.5 s of pre/post silence padding (befaft_s = (0.5, 0.5)); total ~25 min of stimulus per session.

  • Canonical per-session design (Ahmed 2025 p4): 489 sentences at 1 rep + 10 sentences at 11 reps = 599 presentations per session. Some sessions doubled the 11-rep block (so per-(stim, neuron) R can also be 2 or 22).

  • Test split = the ten 11-rep IDs [12, 13, 32, 43, 56, 163, 212, 218, 287, 308] (programmatically discovered from the per-session stim-code histograms, stable across all sessions).

  • All sessions presented all 499 sentences → the resulting (S=499, N=1718) nrn_masks matrix is fully dense (zero NaN sentinels).
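The histogram-based discovery of the repeated TIMIT IDs can be illustrated with a small sketch. It assumes the play order is available as a flat list of 1-indexed stimulus codes; the synthetic `stim_codes` list and the `discover_repeated_ids` helper are hypothetical and not part of deepSTRF.

```python
from collections import Counter

# The ten repeated sentence IDs listed above (1-indexed).
TEST_IDS = [12, 13, 32, 43, 56, 163, 212, 218, 287, 308]

def discover_repeated_ids(stim_codes, min_reps=11):
    """Return the sorted stimulus IDs presented at least `min_reps` times."""
    counts = Counter(stim_codes)
    return sorted(sid for sid, n in counts.items() if n >= min_reps)

# Synthetic play order following the canonical design: every sentence once,
# plus 10 extra presentations of each test ID (11 presentations in total).
stim_codes = list(range(1, 500)) + TEST_IDS * 10

n_presentations = len(stim_codes)   # 489 * 1 + 10 * 11
recovered = discover_repeated_ids(stim_codes)
print(n_presentations, recovered == TEST_IDS)
```

Because the threshold is "at least 11", the same helper also recovers the test IDs in a session where the whole play order was doubled (2 and 22 reps).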

mVocs — monkey vocalizations (stimuli='mvocs')

  • 303 unique vocalizations (grunts, screams, coos) packaged in the 28 min concat WAV MonkVocs_15Blocks.wav (41 kHz stereo, mono-averaged and resampled to 16 kHz by the loader). The play-order mVocsStimCodes defines the per-voc canonical rep count.

  • Canonical WAV design: 228 vocs at 1 rep and 75 vocs at multiple reps (2–30), of which 11 are at exactly 15 reps (228 + 75 = 303). The 11 at 15 reps are the paper’s test set: IDs [7, 9, 12, 15, 24, 29, 30, 33, 44, 45, 48].

  • Per-voc canonical duration = minimum inter-onset interval across that voc’s occurrences (excludes variable inter-stim silence). Voc lengths range 0.95–4.69 s, median ~1.8 s.

  • Different sessions presented different voc subsets → ~7 % sparse (37 572 / 520 554 cells are (1, 1) NaN sentinels).

  • mvoc stim_meta carries an extra n_reps_in_wav key so users can sub-filter beyond the binary test/estimation split.
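The per-voc canonical duration rule (minimum inter-onset interval across a voc's occurrences) can be sketched as follows. This is an illustration under assumptions, not the loader's code: `onsets_s` and `stim_codes` are hypothetical parallel lists of onset times and vocalization IDs in play order, and the last presentation is ignored because no onset follows it.

```python
from collections import defaultdict

def canonical_durations(onsets_s, stim_codes):
    """Per-voc duration = smallest gap from one of its onsets to the next onset."""
    gaps = defaultdict(list)
    # The final presentation has no following onset, so it contributes no gap.
    for i in range(len(onsets_s) - 1):
        gaps[stim_codes[i]].append(onsets_s[i + 1] - onsets_s[i])
    return {voc: min(g) for voc, g in gaps.items()}

# Toy play order: voc 7 occurs three times with different trailing silence.
onsets_s   = [0.0, 2.5, 4.0, 6.25, 8.0, 10.0]
stim_codes = [7,   3,   7,   3,    7,   5]

durations = canonical_durations(onsets_s, stim_codes)
print(durations)
```

Taking the minimum rather than the mean is what removes the variable inter-stimulus silence: the shortest observed gap is the tightest upper bound on the vocalization's own length.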

By default both stim classes go through the same mel-spectrogram pipeline (audio_fs=16 000, n_mels=80, fmax=8 000, window_ms=25, compression='log1p', spec_zscore=True) so two instances are concat-compatible. These defaults match Ahmed 2025’s Kaldi-fbank convention as closely as the deepSTRF mel pipeline allows: 80 mel bands, log compression, per-band per-stim z-score, ~25 ms FFT window. The 8 kHz cap matches Ahmed 2025’s STRF baseline cochleagram (50–8000 Hz, p8).
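To make the last two stages of that pipeline concrete, here is a dependency-free sketch of log1p compression followed by a per-band, per-stimulus z-score. It is not the deepSTRF implementation: the `log1p_zscore` helper, the use of the population standard deviation, and the `eps` guard are assumptions made for illustration.

```python
import math
from statistics import fmean, pstdev

AUDIO_FS = 16_000   # Hz, as in the defaults above
WINDOW_MS = 25      # FFT window length in ms
window_samples = AUDIO_FS * WINDOW_MS // 1000   # samples per FFT window

def log1p_zscore(spec, eps=1e-8):
    """spec: list of mel bands, each a list of non-negative energies over time."""
    out = []
    for band in spec:
        compressed = [math.log1p(x) for x in band]        # compression='log1p'
        mu, sd = fmean(compressed), pstdev(compressed)    # statistics of this band, this stim
        out.append([(x - mu) / (sd + eps) for x in compressed])
    return out

# Toy "spectrogram": 2 bands x 4 time frames.
toy_spec = [[0.0, 1.0, 3.0, 7.0], [10.0, 10.0, 50.0, 90.0]]
z = log1p_zscore(toy_spec)
print(window_samples, [round(v, 3) for v in z[0]])
```

Normalizing each band within each stimulus puts TIMIT and mVocs inputs on the same scale, which is what concatenating the two instances relies on.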

Setup

Easiest path — auto-download from Zenodo into the platformdirs cache (heads up: the archive is ~29 GB):

from deepSTRF.datasets.audio import Downer2025Dataset

ds_timit = Downer2025Dataset(stimuli='timit', download=True)
ds_mvocs = Downer2025Dataset(stimuli='mvocs', download=True)

Default cache dir is platformdirs.user_cache_dir('deepSTRF')/Downer2025, overridable via $DEEPSTRF_DATA_DIR.

If you already have the data unpacked:

<path>/sessions/<session_id>/<animal>_<session_id>_Ch<N>[suffix]_MUspk.mat
<path>/stimuli/out_sentence_details_timit_all_loudness.mat
<path>/stimuli/SqMoPhys_MVOCStimcodes.mat
<path>/stimuli/MonkVocs_15Blocks.wav
<path>/sessions_metadata.yml

just pass the path:

ds = Downer2025Dataset('/path/to/auditory_cortex_data', stimuli='timit')

End-to-end load times on a typical workstation: ~8 min for TIMIT (1718×499), ~9 min for mVocs (1718×303).

Estimation vs test subsets

ds_test = Downer2025Dataset(stimuli='timit', subset='test')          # 10 stims
ds_est  = Downer2025Dataset(stimuli='timit', subset='estimation')    # 489 stims

# or load everything and filter later
ds = Downer2025Dataset(stimuli='timit')                              # 499 stims
ds.select_stims_by_attr('split', 'test')                              # 10

stim_meta[s] exposes the following fields, including split and n_reps_canonical:

| Field | TIMIT example | mVoc example | Notes |
| --- | --- | --- | --- |
| name | 'fadg0_si1279' | 'mvoc_007' | TIMIT sentence name, or mvoc_NNN. |
| type | 'timit' | 'mvoc' | Stimulus class. |
| stim_id | 1..499 | 1..303 | MATLAB-indexed; matches the upstream *Stimcode arrays. |
| duration_s | 2.82 | 1.95 | Full waveform duration (including the TIMIT 0.5 s pre/post). |
| split | 'test' or 'estimation' | (same) | Paper-faithful. |
| n_reps_canonical | 1 or 11 | 1 or 15 | The paper’s per-stim count. |
| befaft_s | (0.5, 0.5) | (0.0, 0.0) | Silence flanking the stim, only annotated for TIMIT. |
| n_reps_in_wav | (absent) | 1..30 | mVoc only: the per-voc count in the canonical WAV. |
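As an illustration of sub-filtering beyond the binary split, the sketch below selects estimation-set stimuli that were nonetheless repeated in the canonical WAV. The rows are toy dictionaries shaped like the stim_meta fields described above (the rep counts for stims 2 and 51 are made up), and `repeated_estimation_indices` is a hypothetical helper, not a deepSTRF method.

```python
# Toy rows shaped like stim_meta entries; values for stims 2 and 51 are invented.
stim_meta = [
    {"name": "mvoc_007", "type": "mvoc", "stim_id": 7,
     "split": "test", "n_reps_in_wav": 15},
    {"name": "mvoc_002", "type": "mvoc", "stim_id": 2,
     "split": "estimation", "n_reps_in_wav": 1},
    {"name": "mvoc_051", "type": "mvoc", "stim_id": 51,
     "split": "estimation", "n_reps_in_wav": 4},
]

def repeated_estimation_indices(stim_meta, min_reps=2):
    """Indices of estimation stims with at least `min_reps` reps in the WAV."""
    return [
        s for s, meta in enumerate(stim_meta)
        # TIMIT rows have no n_reps_in_wav key, so fall back to 1 rep.
        if meta["split"] == "estimation"
        and meta.get("n_reps_in_wav", 1) >= min_reps
    ]

selected = repeated_estimation_indices(stim_meta)
print([stim_meta[s]["name"] for s in selected])
```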

Per-cell metadata

nrn_meta[n] keys:

| Field | Example | Notes |
| --- | --- | --- |
| cell_id | 'b_180413_Ch49' | Filename minus _MUspk.mat. Unique across the 1718 channels. |
| session_id | '180413' | Recording date. |
| animal_id | 'b' | One of 'b', 'c', 'f'. |
| hemisphere | 'RH' / 'LH' | Monkey C is the only one recorded in both hemispheres. |
| area_group | 'core' / 'non-primary' | High-level area assignment from the YAML. |
| area | 'A1' / 'R' / 'ML' / 'AL' / 'CL' / 'CPB' / 'RPB' | Fine area (transcribed from YAML comments). |
| channel | 49 | Probe channel number. |
| channel_suffix | None / 'p' / 's2' / 'ps2' | Annotates re-mounted recordings on the same channel #. |
| n_channels_in_session | 16 | Probe layout for that session (1, 16, 32, 48 or 64 in the paper). |
| coord_x, coord_y | 0.4, -0.4 | 2D craniotomy coordinates (mm). |
| recording_type | 'multi-unit' | Constant; all channels are MUA. |

Filter examples:

# Just primary (core) auditory cortex, all animals
ds.select_pop_by_nrn_attr('area_group', 'core')                # 909

# Same, using the Ahmed-2025-style 'primary' alias at construction time:
ds = Downer2025Dataset(stimuli='timit', areas=('primary',))    # 909

# Just A1 (most populous core sub-area)
ds.select_pop_by_nrn_attr('area', 'A1')                         # 813

# Only monkey B's right hemisphere
ds.select_pop_by_nrn_predicate(
    lambda n: n['animal_id'] == 'b' and n['hemisphere'] == 'RH')

Once the 1718 channels are loaded, nrn_meta combines filename-derived and YAML-derived metadata. area_group comes straight from the YAML data, while the fine area label is transcribed once from the YAML comments; those comments are not machine-readable from the YAML data block, so the mapping is hard-coded in downer2025.py.
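The filename-derived part of that metadata follows from the `<animal>_<session_id>_Ch<N>[suffix]_MUspk.mat` pattern. A minimal parsing sketch, assuming a single lowercase letter for the animal, an all-digit session ID, and the three suffixes named in this document; `parse_muspk_filename` is a hypothetical helper and the second example filename is invented:

```python
import re

# <animal>_<session_id>_Ch<N>[suffix]_MUspk.mat, with suffix in {p, s2, ps2}.
_FNAME_RE = re.compile(
    r"^(?P<animal>[a-z])_(?P<session>\d+)_Ch(?P<channel>\d+)"
    r"(?P<suffix>ps2|s2|p)?_MUspk\.mat$"
)

def parse_muspk_filename(fname):
    """Split a *_MUspk.mat filename into the filename-derived nrn_meta keys."""
    m = _FNAME_RE.match(fname)
    if m is None:
        raise ValueError(f"unexpected MUspk filename: {fname!r}")
    return {
        "cell_id": fname[: -len("_MUspk.mat")],   # filename minus _MUspk.mat
        "animal_id": m["animal"],
        "session_id": m["session"],
        "channel": int(m["channel"]),
        "channel_suffix": m["suffix"],            # None when no suffix is present
    }

plain = parse_muspk_filename("b_180413_Ch49_MUspk.mat")
remounted = parse_muspk_filename("c_190101_Ch12ps2_MUspk.mat")  # invented name
print(plain["cell_id"], remounted["cell_id"])
```

Keeping the suffix inside cell_id is what makes re-mounted recordings on the same channel number distinct cells.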

Reproducing Ahmed 2025’s analysis cohort

The paper’s main figures use a 404-multi-unit “well-tuned to TIMIT” subset and a 489-multi-unit “well-tuned to mVocs” subset, selected by a two-stage criterion: a Wilcoxon rank-sum test of trial-to-trial response correlations against a circularly-shifted null, then a δ ≥ 0.5 effect-size filter on the same distributions (§“Trial-to-trial neural variability”, pp. 4–5). Downer2025Dataset provides an opt-in replication:

ds = Downer2025Dataset(stimuli='timit', smooth=False)
ds.compute_paper_tuning(n_resamples=10_000)
ds.select_pop_by_nrn_predicate(lambda n: n.get('ahmed2025_timit_well_tuned', False))
# → ~413 neurons (paper target: 404)

The method writes four keys per neuron: ahmed2025_<stim>_tuned, ahmed2025_<stim>_well_tuned, ahmed2025_<stim>_p_wilcoxon, and ahmed2025_<stim>_delta_normalized. With n_resamples=10_000 (~11 min on TIMIT, ~4 min on mVocs) the well-tuned counts come within ~2–3 % of the paper (413 / 475 for TIMIT / mVocs vs 404 / 489 in the paper). For an exact match on the looser tuned counts (1195 / 1231), raise n_resamples to 100_000 (~80 min per mode). For most analyses, the library-canonical SNR / CCmax filter via compute_neuron_quality() is a faster, paper-agnostic alternative.
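For readers who want a feel for the two-stage criterion, the sketch below shows its general shape on toy data. It is not the paper's procedure or the compute_paper_tuning implementation. Several choices are assumptions: the rank-sum p-value uses a normal approximation, δ is taken to be a pooled-SD standardized mean difference, alpha is set to 0.05, and `n_shifts` is only a loose analogue of n_resamples. All function names are hypothetical.

```python
import math
import random
from itertools import combinations
from statistics import fmean, pstdev

def pearson(x, y):
    mx, my = fmean(x), fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den > 0 else 0.0

def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, midranks)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x, i = 0.0, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2                  # mean of ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs(rank_sum_x - mean_w) / sd_w / math.sqrt(2))

def tuning_test(trials, n_shifts=10, alpha=0.05, delta_min=0.5, rng=None):
    """trials: list of equal-length binned responses to the same stimulus."""
    rng = rng or random.Random(0)
    n_bins = len(trials[0])
    true_cc, null_cc = [], []
    for a, b in combinations(trials, 2):
        true_cc.append(pearson(a, b))
        for _ in range(n_shifts):
            k = rng.randrange(1, n_bins)           # non-zero circular shift
            null_cc.append(pearson(a, b[k:] + b[:k]))
    p = rank_sum_p(true_cc, null_cc)
    # ASSUMPTION: delta = mean difference normalized by the pooled SD.
    pooled_sd = math.sqrt((pstdev(true_cc) ** 2 + pstdev(null_cc) ** 2) / 2)
    delta = (fmean(true_cc) - fmean(null_cc)) / pooled_sd if pooled_sd > 0 else 0.0
    tuned = p < alpha and delta > 0
    return {"p_wilcoxon": p, "delta_normalized": delta,
            "tuned": tuned, "well_tuned": tuned and delta >= delta_min}

# Toy multi-unit: a fixed rate profile plus independent noise on 6 trials.
rng = random.Random(0)
rate = [rng.uniform(0, 10) for _ in range(100)]
trials = [[r + rng.gauss(0, 1) for r in rate] for _ in range(6)]
result = tuning_test(trials, rng=rng)
print(result["tuned"], result["well_tuned"])
```

The circular shift preserves each trial's rate statistics while destroying its alignment to the stimulus, so the null correlations estimate what trial-to-trial agreement would look like without stimulus locking.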

Remarks

  • MUA only. Every *_MUspk.mat file holds threshold-crossing spike times for one channel — no spike-sorting, no per-spike cluster ID. Ahmed 2025: “we refer to the source of spikes on a single channel as a ‘multi-unit’” (p4).

  • Suffix variants (p, s2, ps2) annotate re-mounted recordings on the same channel number and are kept as distinct cells (with distinct cell_ids). Their trial struct is identical to the plain-channel files in the same session; only the spike-time history differs.

  • Bandwidth cap. Both stim classes are resampled to 16 kHz and mel-filtered with fmax=8 000 to match the paper’s 50–8000 Hz cochleagram baseline. The mVocs source actually contains content up to ~20 kHz (the squirrel monkey audible range extends well above 20 kHz), so users wanting the full band can override audio_fs= and fmax=. Cross-mode concatenation will then be refused, because TIMIT and mVocs no longer share the same F/bandwidth.

  • Eight sessions lack a top-level *_TRIALINFO.mat file. We always read the trial struct embedded in each channel’s MUspk file instead. These structs are identical across channels within a session, so this is a no-op for the well-formed sessions and a fallback for the rest.

  • Bad sessions flagged in sessions_metadata.yml are absent from the Zenodo upload (verified at load time). The Downer2025Dataset constructor honours the flag defensively anyway.