deepSTRF.datasets.audio package

Submodules

deepSTRF.datasets.audio.audio_dataset module

class deepSTRF.datasets.audio.audio_dataset.AudioNeuralDataset(path: str, dt_ms: float)[source]

Bases: NeuralDataset

Neural dataset class for auditory stimuli.

Stim shape is polymorphic depending on the loading mode:

  • Spectrogram mode (default): self.stims[s] is a (1, F, T) tensor where F = self.F is the frequency-band count and T is the neural time-bin count.

  • Waveform mode (opt-in, subclass-specific): self.stims[s] is a (1, T_audio) mono float32 tensor at sample rate self.audio_fs. Subclasses that support this mode expose a return_waveform=True constructor flag and set self.audio_fs to a positive int. The (1, ...) leading dim is the mono-channel axis, kept for collate compatibility (neural_collate zero-pads the last axis only).

Subclasses must additionally set self.F (the number of frequency bins in the spectrogram) in their __init__, before calling self.validate(). F stays positive even in waveform mode, so downstream models know the target spectrogram width that a wav2spec module should produce.

F

Frequency-band count of the target spectrogram. Set by the subclass.

Type:

int

audio_fs

Sample rate of the raw waveform when in waveform mode; None otherwise. Subclasses without a waveform branch leave this None.

Type:

int or None

hearing_range_hz

Optional (low, high) informational bound on the species’ canonical hearing range in Hz (e.g. (200.0, 40000.0) for ferret). Purely advisory — nothing is enforced against it; it exists so notebooks / tooling can display the range and users can choose to clamp a wav2spec’s frequency limits. None when unknown.

Type:

tuple of float or None
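Because hearing_range_hz is advisory only, any clamping is left to the user. A minimal sketch of how a notebook might clamp a front-end's frequency limits is shown here; clamp_to_hearing_range is a hypothetical helper, not part of deepSTRF, and the f_min / f_max names are assumptions about how a wav2spec module would be configured.

```python
from typing import Optional, Tuple


def clamp_to_hearing_range(
    f_min: float,
    f_max: float,
    hearing_range_hz: Optional[Tuple[float, float]],
) -> Tuple[float, float]:
    """Clamp (f_min, f_max) to the advisory hearing range, if one is known.

    Hypothetical helper: deepSTRF itself enforces nothing against
    hearing_range_hz.
    """
    if hearing_range_hz is None:
        # Range unknown: leave the requested limits untouched.
        return f_min, f_max
    low, high = hearing_range_hz
    clamped_min, clamped_max = max(f_min, low), min(f_max, high)
    if clamped_min >= clamped_max:
        raise ValueError("requested band does not overlap the hearing range")
    return clamped_min, clamped_max


# Ferret example range taken from the attribute description above.
print(clamp_to_hearing_range(50.0, 24000.0, (200.0, 40000.0)))
```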

get_F()[source]

Return the number of frequency bins in the spectrograms.

Returns:

self.F, the spectrogram frequency-band count.

Return type:

int

property hop: int | None

Audio samples per neural bin in waveform mode (None in spec mode).

hop = round(audio_fs * dt_ms / 1000). The grid-lock contract (see validate()) requires this to be an exact integer, so a wav2spec front-end’s own hop must equal this value for the audio→neural resampling to stay aligned with the response bins.
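The arithmetic behind hop can be sketched in plain Python. hop_samples below is a hypothetical stand-alone function written for illustration (the real property lives on the dataset instance), and the tolerance used for the integer check is an assumption.

```python
def hop_samples(audio_fs: int, dt_ms: float, tol: float = 1e-9) -> int:
    """Audio samples per neural bin; raise if the ratio is not an integer."""
    exact = audio_fs * dt_ms / 1000.0
    hop = round(exact)
    if hop <= 0 or abs(exact - hop) > tol:
        raise ValueError(
            f"audio_fs={audio_fs} Hz and dt_ms={dt_ms} give a non-integer "
            f"hop ({exact}); pick a rate that divides evenly"
        )
    return hop


# Default (audio_fs, dt_ms) pairs quoted for the dataset classes on this page.
for fs, dt in [(48000, 5.0), (44100, 10.0), (32000, 1.0), (24000, 1.0)]:
    print(fs, dt, hop_samples(fs, dt))
```

A wav2spec front-end configured with a different hop would produce a frame count that no longer matches the number of response bins.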

validate()[source]

Check that the instance is deepSTRF-compatible.

Subclasses should call super().validate() and then add their own checks (e.g. AudioNeuralDataset checks self.F > 0).

deepSTRF.datasets.audio.ns1 module

class deepSTRF.datasets.audio.ns1.NS1Dataset(path: str | None = None, dt_ms: float = 5.0, smooth: bool = True, download: bool = False, return_waveform: bool = False, audio_fs: int = 48000)[source]

Bases: AudioNeuralDataset

PyTorch dataset for the NS1 (Harper et al. 2016, Rahman et al. 2020) data.

119 multi/single units from primary auditory cortex (A1) of deeply anesthetized ferrets, recorded in response to 20 natural sound clips of 4.995 s each, presented 20 times per neuron. Every neuron heard every clip, so the response grid is fully dense (no NaN sentinels).

Of the 119 units, 73 pass the “single-unit at known depth” filter the original authors used (single_t in {'Yes', 'Maybe'} and depth >= 0); select_pop_by_nrn_attr() over single_t / depth_um reproduces this subset.
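As an illustration of that filter, the predicate can be written directly over nrn_meta-style dicts. This is a sketch only: the sample entries are invented, the signature of select_pop_by_nrn_attr() is not shown in this page so it is not called here, and treating an unknown depth as a negative depth_um is an assumption drawn from the depth >= 0 condition.

```python
def is_single_unit_at_known_depth(meta: dict) -> bool:
    """Mirror the filter described above on one nrn_meta dict."""
    return meta["single_t"] in {"Yes", "Maybe"} and meta["depth_um"] >= 0


# Invented example entries; real values come from the NS1 metadata file.
nrn_meta = [
    {"cell_id": "u0", "single_t": "Yes", "depth_um": 350.0},
    {"cell_id": "u1", "single_t": "No", "depth_um": 200.0},
    {"cell_id": "u2", "single_t": "Maybe", "depth_um": -1.0},  # assumed: depth unknown
    {"cell_id": "u3", "single_t": "Maybe", "depth_um": 0.0},
]

kept = [m["cell_id"] for m in nrn_meta if is_single_unit_at_known_depth(m)]
print(kept)
```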

The spectrogram tensor is precomputed at dt = 5 ms (F = 34 frequency bands, T = 999 bins); the dt_ms constructor argument is currently validated against this resolution. With return_waveform=True, stims are instead raw mono waveforms (1, T_audio) at audio_fs (aligned to T_audio = T_neural * audio_fs * dt_ms / 1000) — feed them through a model’s wav2spec front-end.

Data are freely available (no account required) and auto-fetched by NS1Dataset(download=True); the sources are listed under the download parameter.

Notes

Follows the standard deepSTRF data paradigm (see docs/_source/md/data_paradigm.md). NS1-specific metadata:

  • stim_meta dicts hold name and type.

  • nrn_meta dicts hold cell_id, area, depth_um, noise_ratio, single_n, single_t, n_electrodes and electrode_number. noise_ratio is the Sahani-Linden normalised noise power (lower = cleaner; NOT an SNR despite the legacy .mat field name). single_n is the single-unit flag from spike-snippet clustering (0/1); single_t is the manual triage label ('Yes' / 'Maybe' / 'No').

References

Harper et al. (2016). “Network receptive field modeling reveals extensive integration and multi-feature selectivity in auditory cortical neurons.” PLoS Computational Biology.

Rahman et al. (2020). “Simple transformations capture auditory input to cortex.” PNAS.

Parameters:
  • path (str, optional) – Path to the NS1 data folder containing test_data_5ms.mat, MetadataSHEnCneurons.mat, and spikesandwav/. If None, defaults to the platformdirs cache (user_cache_dir('deepSTRF') / 'NS1' — overridable via $DEEPSTRF_DATA_DIR).

  • dt_ms (float, default 5.0) – Time-bin width in ms. Must equal 5.0 — the bundled spectrogram is precomputed at this resolution. Other values would require re-spectrogramming the wavs (not implemented).

  • smooth (bool, default True) – If True, smooth PSTHs in place with a 21 ms Hanning window (Hsu, Borst & Theunissen 2004).

  • download (bool, default False) – If True and the data assets are missing under path, fetch them from their public sources (no account required) — OSF (https://osf.io/ayw2p/: metadata + spike data + wavs) and DNet GitHub (https://github.com/monzilur/DNet: the precomputed 5 ms mel-spectrogram tensor test_data_5ms.mat). Total ~160 MB, ~16 s on a fast connection. See download_ns1().

  • return_waveform (bool, default False) – If True, self.stims holds raw audio waveforms instead of precomputed spectrograms. Each self.stims[s] is a (1, T_audio) float32 tensor at audio_fs Hz, downmixed to mono, resampled from the native 48 828.125 Hz, and right-cropped / zero-padded to exactly T_neural * audio_fs * dt_ms / 1000 samples so it aligns with the 4.995 s response window. Pair with a model that has a wav2spec front-end (see deepSTRF.models.wav2spec).

  • audio_fs (int, default 48000) – Sample rate (Hz) for waveform mode. Default 48 kHz gives a clean 240 samples / 5-ms bin and a Nyquist of 24 kHz — enough to preserve the ~22.6 kHz content used in Rahman et al. 2019’s cochleagram. (Native is 48 828.125 Hz; the small downsample keeps an integer sample-per-bin factor.) Ignored when return_waveform=False.
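The waveform-length contract described for return_waveform can be sketched with plain lists. Both functions below are hypothetical illustrations rather than the loader's code, and padding on the right (the same side as the crop) is an assumption.

```python
from typing import List


def target_num_samples(t_neural: int, audio_fs: int, dt_ms: float) -> int:
    """T_audio = T_neural * audio_fs * dt_ms / 1000, requiring an integer hop."""
    hop_exact = audio_fs * dt_ms / 1000.0
    hop = round(hop_exact)
    if abs(hop_exact - hop) > 1e-9:
        raise ValueError("audio_fs * dt_ms / 1000 must be an integer")
    return t_neural * hop


def fit_to_length(samples: List[float], n: int) -> List[float]:
    """Right-crop or zero-pad (on the right, by assumption) to exactly n samples."""
    if len(samples) >= n:
        return samples[:n]
    return samples + [0.0] * (n - len(samples))


# NS1 defaults: 999 bins of 5 ms at 48 kHz.
n_target = target_num_samples(999, 48000, 5.0)
print(n_target, n_target / 48000)  # sample count and duration in seconds
```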

deepSTRF.datasets.audio.ns1.download_ns1(dest: str | None = None) str[source]

Download all NS1 data assets into dest.

Sources:

  • OSF (https://osf.io/ayw2p/, no account): the dataset README, the per-neuron metadata (.mat), and the spike + wav zip (~155 MB total).

  • DNet GitHub (https://github.com/monzilur/DNet, master branch): the precomputed 5 ms mel-spectrogram tensor test_data_5ms.mat (5.2 MB) accompanying Rahman et al. 2019 PLoS Comp Biol. NOT on OSF.

Idempotent: skips files that already exist; returns the destination path.

Parameters:

dest (str, optional) – Where to put the downloaded files. Defaults to the platformdirs cache (overridable via $DEEPSTRF_DATA_DIR).

Returns:

Absolute path to the dataset directory.

Return type:

str
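The idempotent behaviour can be pictured with a small stand-in. fetch_if_missing and fake_fetch are hypothetical names used only for this sketch; the real function performs network downloads, which are replaced here by writing a placeholder file.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def fetch_if_missing(dest: Path, name: str, fetch: Callable[[Path], None]) -> bool:
    """Call fetch(target) only when dest/name is absent; return True if fetched."""
    target = dest / name
    if target.exists():
        return False  # already on disk: skip, as an idempotent downloader would
    dest.mkdir(parents=True, exist_ok=True)
    fetch(target)
    return True


calls: List[str] = []


def fake_fetch(target: Path) -> None:
    # Stand-in for a real download: record the call and write a placeholder.
    calls.append(target.name)
    target.write_bytes(b"placeholder")


with tempfile.TemporaryDirectory() as tmp:
    dest = Path(tmp) / "NS1"
    first = fetch_if_missing(dest, "test_data_5ms.mat", fake_fetch)
    second = fetch_if_missing(dest, "test_data_5ms.mat", fake_fetch)

print(first, second, calls)
```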

deepSTRF.datasets.audio.nat4 module

class deepSTRF.datasets.audio.nat4.NAT4Dataset(path: str | None = None, area: str = 'A1', dt_ms: float = 10.0, smooth: bool = False, download: bool = False, subset: str = 'all', return_waveform: bool = False, audio_fs: int = 44100)[source]

Bases: AudioNeuralDataset

PyTorch dataset for NAT4 (Pennington & David, 2022 / 2023).

Two cortical areas: A1 (primary, 849 cells of which 777 auditory) and PEG (secondary, 398 of which 339 auditory). Pass area=...; one instance covers one area. To pool both, instantiate twice and concat_neural_datasets([a1, peg]).

There are 595 stimuli total: 18 high-rep (val, 20 trials) + 577 low-rep (est, 1 trial), each clip 1.5 s. The default time bin is dt_ms = 10 (the population recording is precomputed at fs=100 with val pre-averaged over 20 reps; per-site spike trains are at fs=1000 and downsampled to 10 ms by summing). The spectrogram has F = 18 ozgf bands and T = 150 frames per stim.

The loader reads the published NAT4 archive directly with native CSV / JSON / HDF5 parsers — no NEMS0 dependency. Data are freely available at https://doi.org/10.5281/zenodo.8044773 (no account required) and auto-fetched by NAT4Dataset(download=True).

Notes

Follows the standard deepSTRF data paradigm (see docs/_source/md/data_paradigm.md). NAT4-specific metadata:

  • stim_meta dicts hold name and subset ('est' or 'val'); the subset='all'|'est'|'val' constructor argument filters this list at load time.

  • nrn_meta dicts hold cell_id (raw NEMS id, e.g. 'ARM029a-01-1'), area, auditory (flag from the dataset’s <area>_pred_correlation.csv), and the parsed components site (e.g. 'ARM029a'), animal (3-char site prefix, e.g. 'ARM'), electrode (int) and unit_in_electrode (int). Components default to None for any cell whose id does not match the standard <site>-<elec>-<unit> scheme.
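The id parsing described in the last bullet might look roughly like the sketch below. parse_cell_id and its regular expression are assumptions inferred from the single example id given above; the loader's own parser may accept a different set of ids.

```python
import re
from typing import Dict, Optional, Union

# Assumed pattern for <site>-<elec>-<unit>, e.g. 'ARM029a-01-1'.
_CELL_ID = re.compile(r"^(?P<site>[A-Za-z]{3}\w*)-(?P<elec>\d+)-(?P<unit>\d+)$")

Component = Optional[Union[str, int]]


def parse_cell_id(cell_id: str) -> Dict[str, Component]:
    """Split a NEMS-style cell id; every component is None on a mismatch."""
    match = _CELL_ID.match(cell_id)
    if match is None:
        return {"site": None, "animal": None,
                "electrode": None, "unit_in_electrode": None}
    site = match.group("site")
    return {
        "site": site,
        "animal": site[:3],  # 3-character site prefix
        "electrode": int(match.group("elec")),
        "unit_in_electrode": int(match.group("unit")),
    }


print(parse_cell_id("ARM029a-01-1"))
```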

est responses have shape (R=1, T=150) and val responses (R=20, T=150); the (1, 1) NaN sentinel marks (stim, neuron) pairs where the cell was not recorded for that stim.

With return_waveform=True, stims are instead the raw mono waveforms (1, T_audio = T * hop) at audio_fs (hop=441 at 44.1 kHz / 10 ms) — feed them through a model’s wav2spec slot.

References

Pennington & David (2022, preprint). “Can deep learning provide a generalizable model for dynamic sound encoding in auditory cortex?”

Pennington & David (2023). “A convolutional neural network provides a generalizable model of natural sound coding by neural populations in auditory cortex.” PLOS Computational Biology.

Parameters:
  • path (str, optional) – Path to the NAT4 data folder. Defaults to the platformdirs cache.

  • area ({'A1', 'PEG'}) – Cortical area.

  • dt_ms (float, default 10.0) – Time-bin width in ms. Currently must equal 10.0; the population recording is precomputed at fs=100 and the per-site downsampling assumes a fixed 10x ratio from fs=1000.

  • smooth (bool, default False) – If True, smooth PSTHs with a 21 ms Hanning window. Off by default here because NAT4 trials are typically used as-is for STRF fitting (unlike CRCNS-AA where smoothing is the published norm).

  • download (bool, default False) – If True and the data is missing under path, fetch it from Zenodo (record 8044773).

  • subset ({'all', 'est', 'val'}, default 'all') – If 'est' or 'val', only that stimulus subset is loaded: stim_meta, stims and responses shrink accordingly, and the (more expensive) per-site spike-time pass is skipped entirely under subset='est'. The two subsets correspond to Pennington & David’s published estimation set (575 stims, R=1, from the population recording) and validation set (18 stims, R=20, from the per-site recordings) respectively. Note that 33 of the 849 A1 cells have no val data, so under subset='val' their responses are full NaN sentinels. To drop them automatically, pair the constructor arg with ds.select_pop_by_stim_attr('subset', 'val'). Alternatively, ds.select_stims_by_attr('subset', 'val') leaves the full stim bank loaded but applies the bidirectional rule, so cells without val data are hidden from __getitem__.

  • return_waveform (bool, default False) – If True, each stimulus is the raw mono waveform (1, T_audio) at audio_fs Hz instead of the precomputed ozgf cochleagram. The 593 source .wav files (44.1 kHz, 1 s of sound) are read from <path>/wav/ and embedded in the 1.5 s trial window at the recording’s pre-silence offset, then grid-locked to T_audio = T_neural * hop (hop = audio_fs * dt_ms / 1000). Feed it through a model’s wav2spec slot (e.g. CausalGammatone to reproduce the native ozgf front-end). Pass download=True to also fetch wav.zip from Zenodo.

  • audio_fs (int, default 44100) – Audio sample rate for return_waveform=True. The default 44.1 kHz is the native rate of the NAT4 wavs and gives an exact integer hop = 441 at dt_ms = 10 (no resampling). Choose any rate making audio_fs * dt_ms / 1000 an integer. Ignored unless return_waveform=True.
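The embedding step described for return_waveform can be sketched as follows. embed_in_trial is a hypothetical function, and the 0.25 s pre-silence used in the example call is a placeholder value chosen for illustration: the actual offset is not stated on this page.

```python
from typing import List


def embed_in_trial(
    wav: List[float],
    audio_fs: int,
    dt_ms: float,
    t_neural: int,
    pre_silence_s: float,
) -> List[float]:
    """Place wav inside a zero-filled trial of T_neural * hop samples."""
    hop = round(audio_fs * dt_ms / 1000.0)
    n_total = t_neural * hop                 # grid-locked trial length
    offset = round(pre_silence_s * audio_fs)
    out = [0.0] * n_total
    clip = wav[: max(0, n_total - offset)]   # crop anything past the trial end
    out[offset:offset + len(clip)] = clip
    return out


# 1 s of sound at 44.1 kHz in a 150-bin (1.5 s) trial.
# pre_silence_s=0.25 is a hypothetical offset, not the documented one.
trial = embed_in_trial([1.0] * 44100, 44100, 10.0, 150, pre_silence_s=0.25)
print(len(trial))
```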

deepSTRF.datasets.audio.nat4.download_nat4(area: str, dest: str | None = None, wav: bool = False) str[source]

Download the NAT4 release from Zenodo into dest.

Fetches the population .tgz, the per-cell auditory CSV, and the per-site .zip. The single-sites zip is unpacked into <dest>/<area>_single_sites/ so the loader finds the per-site .tgzs where it expects them.

Idempotent: skips files / dirs that already exist.

Parameters:
  • area ({'A1', 'PEG'})

  • dest (str, optional) – Defaults to default_cache_dir('NAT4') (overridable via $DEEPSTRF_DATA_DIR).

  • wav (bool, default False) – If True, also fetch and unpack wav.zip (the 593 source waveforms, 44.1 kHz / 1 s each) into <dest>/wav/ for the raw-waveform branch (NAT4Dataset(return_waveform=True)). The spectrogram-mode loader does not need it.

deepSTRF.datasets.audio.wehr module

deepSTRF.datasets.audio.asari module

deepSTRF.datasets.audio.crcns_aa1 module

class deepSTRF.datasets.audio.crcns_aa1.CRCNSAA1Dataset(path: str | None = None, areas=('Field_L', 'MLd'), stimuli=('conspecific', 'flatrip'), animals='all', dt_ms=1, smooth=True, n_mels=32, compression='cubic', window_ms: float = 10.0, return_waveform: bool = False, audio_fs: int = 32000, download: bool = False, username: str | None = None, password: str | None = None)[source]

Bases: AudioNeuralDataset

PyTorch dataset for the CRCNS-AA1 recordings.

Extracellular, spike-sorted single units of anesthetized male zebra finches: 50 cells in Field L and 50 in MLd, recorded in response to 10 clips of conspecific vocalizations and 20 clips of flat ripples (up to 5 s each, ~10 trials on average). Data are available at https://crcns.org/data-sets/aa/aa-1/about (free CRCNS account); see the AA1 README in the deepSTRF docs for the full notes.

Notes

Follows the standard deepSTRF data paradigm (see docs/_source/md/data_paradigm.md). AA1-specific metadata:

  • stims are mel-spectrograms (1, F, T_s).

  • stim_meta dicts hold name, type, sample_rate, n_samples and duration_s.

  • nrn_meta dicts hold cell_id, animal_id, area, cell_seq and rig. cell_seq is the sequential cell index parsed from the cell folder name (the n-th cell recorded); rig is the single-letter rig label when present, else None (cells “4_A” and “4_B” were recorded simultaneously, possibly in different areas).

Two cells lack conspecific responses: pipu1018_2_A (MLd) and pipu1018_2_B (Field_L).

References

Woolley et al. (2005). “Tuning for Spectro-temporal Modulations: a Mechanism for Auditory Discrimination of Natural Sound.”

Hsu et al. (2004). “Modulation power and phase spectrum of natural sounds enhance neural encoding performed by single auditory neurons.”

Singh & Theunissen (2003). “Modulation spectra of natural sounds and ethological theories of auditory processing.”

Initializes the AA1 Dataset.

Parameters:
  • path (str, optional) – Path to the AA1 data folder (containing Field_L_cells/, MLd_cells/, all_stims/). Defaults to the platformdirs cache ($DEEPSTRF_DATA_DIR overrides).

  • areas (tuple of str) – Recording sites: ‘Field_L’ or ‘MLd’.

  • stimuli (tuple of str) – Stimulus types: ‘conspecific’ or ‘flatrip’.

  • dt_ms (float) – Time step size in ms.

  • n_mels (int) – Number of mel frequency bands.

  • compression (str) – Spectrogram compression (‘cubic’, ‘log1p’, ‘none’). Ignored when return_waveform=True.

  • window_ms (float, default 10.0) – FFT analysis-window length in ms. n_fft is computed as round(window_ms * 1e-3 * sample_rate) and is decoupled from hop_length so phonemic detail is preserved at any dt_ms. Earlier versions of this dataset hardcoded n_fft = 10 * hop_length, which was benign at the default dt_ms=1 (10 ms FFT window), but at dt_ms=50 the same formula produced a 500 ms FFT window and over-smoothed every spec frame. The default window_ms=10.0 preserves bit-identical behaviour at dt_ms=1 while removing the scaling bug at coarser bins. Speech-pipeline users may prefer window_ms=25.0 (Kaldi default). Ignored when return_waveform=True.

  • return_waveform (bool, default False) – If True, self.stims[s] holds the raw audio waveform (1, T_audio) at audio_fs Hz (grid-locked to T_audio = T_neural * hop) instead of the in-loader mel spectrogram. Pair with a model whose wav2spec slot is a waveform front-end (see deepSTRF.models.wav2spec); responses are unchanged.

  • audio_fs (int, default 32000) – Sample rate for waveform mode. Default 32 kHz is the native rate of the AA1 wavs (so no resampling); other values resample and must keep audio_fs * dt_ms / 1000 an integer. Ignored unless return_waveform=True.

  • download (bool, default False) – If True and the data is missing under path, fetch the ~17 MB CRCNS-AA1 archive from the NERSC mirror (free CRCNS account required; see crcns_download) and unzip in place.

  • username (str, optional) – CRCNS username. Defaults to $CRCNS_USERNAME. Prefer the env var over passing a literal: anything in source or a notebook ends up in history, logs or VCS.

  • password (str, optional) – CRCNS password. Defaults to $CRCNS_PASSWORD. Prefer the env var over passing a literal, for the same reason.
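The effect of decoupling n_fft from the hop, as described for window_ms, can be checked with a few lines of arithmetic. The three functions below are illustrative reconstructions of the formulas quoted above, not the loader's code.

```python
def hop_length(sample_rate: int, dt_ms: float) -> int:
    """Samples per time bin."""
    return round(sample_rate * dt_ms / 1000.0)


def n_fft_legacy(sample_rate: int, dt_ms: float) -> int:
    """Old rule: the FFT window scales with the bin width."""
    return 10 * hop_length(sample_rate, dt_ms)


def n_fft_decoupled(sample_rate: int, window_ms: float = 10.0) -> int:
    """Current rule: the FFT window depends only on window_ms."""
    return round(window_ms * 1e-3 * sample_rate)


sr = 32000  # native AA1 rate quoted above
for dt_ms in (1.0, 50.0):
    legacy = n_fft_legacy(sr, dt_ms)
    current = n_fft_decoupled(sr)
    print(dt_ms, legacy, 1000 * legacy / sr, current, 1000 * current / sr)
```

At dt_ms=1 both rules give the same window, while at dt_ms=50 only the legacy rule grows to 500 ms.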

deepSTRF.datasets.audio.crcns_aa1.download_aa1(dest: str | None = None, username: str | None = None, password: str | None = None) str[source]

Download the CRCNS-AA1 archive from the NERSC mirror into dest.

Idempotent: skips the archive if already present, and skips unzipping if Field_L_cells/ already exists in dest. Returns the dataset directory.

Parameters:
  • dest (str, optional) – Defaults to the platformdirs cache (overridable via $DEEPSTRF_DATA_DIR).

  • username (str, optional) – CRCNS username. Defaults to $CRCNS_USERNAME. A free CRCNS account can be created at https://crcns.org/register.

  • password (str, optional) – CRCNS password. Defaults to $CRCNS_PASSWORD.
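The credential fallback can be sketched as below. resolve_crcns_credentials is a hypothetical helper that mirrors the documented behaviour (explicit arguments first, then the two environment variables); the error raised when neither is available is an assumption.

```python
import os
from typing import Optional, Tuple


def resolve_crcns_credentials(
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> Tuple[str, str]:
    """Prefer explicit arguments, else fall back to the environment."""
    username = username or os.environ.get("CRCNS_USERNAME")
    password = password or os.environ.get("CRCNS_PASSWORD")
    if not username or not password:
        # Assumed behaviour: fail early rather than attempt an anonymous fetch.
        raise RuntimeError(
            "set CRCNS_USERNAME and CRCNS_PASSWORD, or pass credentials explicitly"
        )
    return username, password
```

Exporting the two variables in the shell before launching Python keeps the credentials out of notebooks and version control.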

deepSTRF.datasets.audio.crcns_aa1.get_animals_ids(data_path)[source]
Takes the path of the ‘CRCNS_AA1/data/’ folder, goes through ‘Field_L/’ and ‘MLd’, and returns a list of unique animal ids. An animal id is the string preceding the first underscore of each subfolder name, e.g. ‘gg0304_4_B’ → ‘gg0304’.
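The folder-name convention can be illustrated without touching the file system. parse_cell_folder and unique_animal_ids are hypothetical helpers, and the <animal>_<seq>[_<rig>] layout is inferred from the example names on this page rather than taken from the loader.

```python
from typing import Dict, Iterable, List, Optional, Union


def parse_cell_folder(name: str) -> Dict[str, Optional[Union[str, int]]]:
    """Split a cell folder name such as 'gg0304_4_B' into its parts."""
    parts = name.split("_")
    cell_seq = int(parts[1]) if len(parts) > 1 and parts[1].isdigit() else None
    has_rig = len(parts) > 2 and len(parts[2]) == 1 and parts[2].isalpha()
    return {
        "animal_id": parts[0],  # string before the first underscore
        "cell_seq": cell_seq,   # n-th cell recorded
        "rig": parts[2] if has_rig else None,
    }


def unique_animal_ids(folder_names: Iterable[str]) -> List[str]:
    """Sorted unique animal ids across a collection of cell folder names."""
    return sorted({name.split("_", 1)[0] for name in folder_names})


print(parse_cell_folder("gg0304_4_B"))
print(unique_animal_ids(["gg0304_4_B", "gg0304_4_A", "pipu1018_2_A"]))
```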

deepSTRF.datasets.audio.crcns_aa1.get_area_cells(data_path)[source]

From the cell_regions.csv file, returns a dictionary with area labels as keys and lists of cell names as values.

deepSTRF.datasets.audio.crcns_aa1.get_stim_ids(data_path)[source]

From the cell_regions.csv file, returns a dictionary with area labels as keys and lists of cell names as values.

deepSTRF.datasets.audio.crcns_aa2 module

class deepSTRF.datasets.audio.crcns_aa2.CRCNSAA2Dataset(path: str | None = None, areas=('Field_L', 'mld', 'OV', 'CM', 'None'), stimuli=('conspecific', 'flatrip', 'songrip'), animals='all', dt_ms=1, smooth=True, n_mels=32, compression='cubic', window_ms: float = 10.0, return_waveform: bool = False, audio_fs: int = 32000, download: bool = False, username: str | None = None, password: str | None = None)[source]

Bases: AudioNeuralDataset

PyTorch dataset for the CRCNS-AA2 recordings (OV, MLd, Field L, CM).

494 extracellular, spike-sorted single units of male zebra finches, identified in OV, MLd, Field L, L1, L2a, L2b, L3 (and some with unidentified area, None). Three stimulus classes — conspecific songs (72 stims), flat ripples (20) and song ripples (25) — each presented 10-20 times, with low trial-to-trial variability. Almost all cells saw conspecific and songrip stimuli; about half saw flatrip. Population fitting-compatible. Data are available at https://crcns.org/data-sets/aa/aa-2/about (free CRCNS account).

Notes

Follows the standard deepSTRF data paradigm (see docs/_source/md/data_paradigm.md). AA2-specific metadata:

  • stims are mel-spectrograms (1, F, T_s).

  • stim_meta dicts hold name, type, sample_rate, n_samples and duration_s (the last three from data/stim_data.csv).

  • nrn_meta dicts hold cell_id, animal_id, area, cell_seq and rig (see CRCNSAA1Dataset for the cell-name format; rig is often None in AA2).

References

Gill et al. (2006). “Sound representation methods for spectro-temporal receptive field estimation.”

Amin et al. (2010). “Role of the Zebra Finch Auditory Thalamus in Generating Complex Representations for Natural Sounds.”

Initializes the AA2 Dataset.

Parameters:
  • path (str, optional) – Path to the AA2 data folder. Defaults to the platformdirs cache.

  • areas (tuple of str) – Recording sites of interest: ‘Field_L’, ‘L1’, ‘L2a’, ‘L2b’, ‘L3’, ‘mld’, ‘OV’, ‘CM’, or ‘None’.

  • stimuli (tuple of str) – Stimulus types of interest: ‘conspecific’, ‘flatrip’, ‘songrip’.

  • dt_ms (float) – Time step size in ms.

  • n_mels (int) – Number of mel frequency bands.

  • compression (str) – Spectrogram compression (‘cubic’, ‘log1p’, ‘none’). Ignored when return_waveform=True.

  • window_ms (float, default 10.0) – FFT analysis-window length in ms. n_fft is computed as round(window_ms * 1e-3 * sample_rate) and is decoupled from hop_length so phonemic detail is preserved at any dt_ms. Earlier versions of this dataset hardcoded n_fft = 10 * hop_length, which gave a benign 10 ms FFT window at dt_ms=1 but a 500 ms window at dt_ms=50. Default window_ms=10.0 preserves bit-identical behaviour at dt_ms=1 while fixing the scaling bug at coarser bins. Ignored when return_waveform=True.

  • return_waveform (bool, default False) – If True, self.stims[s] holds the raw audio waveform (1, T_audio) at audio_fs Hz (grid-locked to T_audio = T_neural * hop) instead of the in-loader mel spectrogram. Pair with a model whose wav2spec slot is a waveform front-end; responses are unchanged.

  • audio_fs (int, default 32000) – Sample rate for waveform mode. Default 32 kHz is the native rate of the AA2 wavs (no resampling); other values resample and must keep audio_fs * dt_ms / 1000 an integer. Ignored unless return_waveform=True.

  • download (bool, default False) – If True and the data is missing under path, fetch the ~30 MB worth of CRCNS-AA2 archives from the NERSC mirror (free CRCNS account required) and extract in place.

  • username (str, optional) – CRCNS username. Defaults to $CRCNS_USERNAME. Prefer the env var over passing a literal.

  • password (str, optional) – CRCNS password. Defaults to $CRCNS_PASSWORD. Prefer the env var over passing a literal.

deepSTRF.datasets.audio.crcns_aa2.download_aa2(dest: str | None = None, username: str | None = None, password: str | None = None) str[source]

Download the CRCNS-AA2 archives from the NERSC mirror into dest.

Idempotent: skips an archive if already on disk, and skips extraction of an archive if its anchor sub-tree (all_cells/, all_stims/, or stim_data.csv) already exists.

Parameters:
  • dest (str, optional) – Defaults to default_cache_dir('AA2') (overridable via $DEEPSTRF_DATA_DIR).

  • username (str, optional) – CRCNS username. Defaults to $CRCNS_USERNAME.

  • password (str, optional) – CRCNS password. Defaults to $CRCNS_PASSWORD.

deepSTRF.datasets.audio.crcns_aa2.get_animals_ids(file_path)[source]

Extracts unique animal identifiers from the first column of the ‘cell_stim_classes.csv’ file. The unique identifier is defined as the substring preceding the first underscore in the first column. The output is a list of unique identifiers.

deepSTRF.datasets.audio.crcns_aa2.get_area_cells(file_path)[source]

From the cell_regions.csv file, returns a dictionary with area labels as keys and lists of cell names as values.

deepSTRF.datasets.audio.crcns_aa2.get_stim_ids_from_folders(cells_path, verbose=False)[source]

Requires the path to the ‘all_cells/’ folder.

Returns a dictionary with the three main stim types (‘conspecific’, ‘songrip’ and ‘flatrip’) as keys and a list of unique wav names as each value.

deepSTRF.datasets.audio.crcns_aa2.get_stims_ids_from_csv(file_path)[source]

Extracts .wav file names from the first column of the ‘stim_data.csv’ file and classifies them into stimulus types. The output is a dictionary with categories as keys and lists of .wav file names as values.

deepSTRF.datasets.audio.crcns_aa2.load_stim_data_csv(file_path)[source]

Read CRCNS-AA2 stim_data.csv into a {wav_filename: {...}} dict.

Each value is {"sample_rate": Hz, "bit_depth": int, "n_samples": int, "duration_s": float}. Returned even for stims classified as “unknown” / “bengalese”, since the dataset class will simply not select those by default.

deepSTRF.datasets.audio.crcns_aa4 module

class deepSTRF.datasets.audio.crcns_aa4.CRCNSAA4Dataset(path: str | None = None, animals='all', stimuli=('song', 'call', 'mlnoise'), dt_ms=1.0, smooth=True, n_mels=32, compression='cubic', window_ms: float = 10.0, return_waveform: bool = False, audio_fs: int = 24000, download: bool = False, username: str | None = None, password: str | None = None)[source]

Bases: AudioNeuralDataset

PyTorch dataset for the CRCNS-AA4 recordings.

1401 extracellular, spike-sorted single and multi units of adult zebra finches (4 males, 2 females) in Field L, caudolateral and caudomedial mesopallium (CLM, CMM) and caudomedial nidopallium (NCM) — though units were not precisely assigned to one of these areas. Three stimulus classes (conspecific songs, calls, ripple noise), each a few seconds long and presented ~10 times. Population- and batch-compatible. Data are available at https://crcns.org/data-sets/aa/aa-4/about-aa-4 (free CRCNS account).

Notes

Follows the standard deepSTRF data paradigm (see docs/_source/md/data_paradigm.md). AA4-specific metadata:

  • stims are mel-spectrograms (1, F, T_s).

  • stim_meta dicts hold name (the stimulus md5 — the canonical identifier, since the wav filename is per-animal and not unique across the corpus), type, class and duration_s (the stim_duration attr from the h5, in seconds).

  • nrn_meta dicts hold: cell_id (h5 basename, no extension), animal_id, sex ('M' / 'F'), site (e.g. "Site1"), electrode (int 1-32, channel index across both 16-channel arrays at a site), ldepth / rdepth (left / right array depth in µm), sort_type ('single' / 'multi'; 'noise' / 'tdt' are filtered out), sort_id (online-sort int) and subsort_id (offline-sort int parsed from the trailing _ss<N>; None if absent).
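Extracting subsort_id from the trailing _ss<N> suffix could be done as in the sketch below. parse_subsort_id is a hypothetical helper, and the example ids are invented to show the suffix; they are not real AA4 file names.

```python
import re
from typing import Optional

# Trailing '_ss<N>' marks the offline sub-sort index.
_SUBSORT = re.compile(r"_ss(\d+)$")


def parse_subsort_id(cell_id: str) -> Optional[int]:
    """Return N from a trailing '_ss<N>', or None when the suffix is absent."""
    match = _SUBSORT.search(cell_id)
    return int(match.group(1)) if match else None


# Invented ids, used only to exercise the suffix rule.
print(parse_subsort_id("Site1_e12_s0_ss2"))
print(parse_subsort_id("Site1_e12_s0"))
```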

The dataset paper does not publish a per-cell brain-area assignment, so the depth + electrode-array geometry is the only anatomical proxy; nor does it document which electrode IDs (1-16 vs 17-32) map to the left vs right hemisphere — confirm with the dataset authors before deriving a hemisphere from electrode.

References

Elie & Theunissen (2015). “Meaning in the avian auditory cortex: Neural representation of communication calls.” European Journal of Neuroscience.

Elie & Theunissen (2019). “Invariant neural responses for sensory categories revealed by the time-varying information for communication calls.” PLoS Computational Biology.

Initializes the AA4 Dataset.

Parameters:
  • path (str, optional) – Path to the CRCNS_AA4/data/ folder containing one subfolder per animal (with .h5 cell files + a wavfiles/ directory of stimulus .wav files). Defaults to the platformdirs cache.

  • animals ('all' or sequence of str) – Animals to load (any subset of AA4_ANIMAL_IDS).

  • stimuli (sequence of str) – Stimulus types to keep; subset of {‘song’, ‘call’, ‘mlnoise’}.

  • dt_ms (float) – Time-bin width in ms.

  • smooth (bool) – If True, smooth PSTHs in place with a 21 ms Hanning window (Hsu, Borst & Theunissen 2004).

  • n_mels (int) – Number of mel frequency bands of the stimulus spectrogram.

  • compression ({'cubic', 'log1p', 'none'}) – Compression applied to the spectrogram (saturation effect of hair cells). Ignored when return_waveform=True.

  • window_ms (float, default 10.0) – FFT analysis-window length in ms. n_fft is computed per-stim as round(window_ms * 1e-3 * sample_rate) and is decoupled from hop_length so phonemic detail is preserved at any dt_ms. Earlier versions of this dataset hardcoded n_fft = hop * 10; at dt_ms=50 that gave a 500 ms FFT window and over-smoothed every spec frame. Default window_ms=10.0 preserves bit-identical behaviour at dt_ms=1 (n_fft=320 at sr=32 kHz) while fixing the scaling bug at coarser bins. Ignored when return_waveform=True.

  • return_waveform (bool, default False) – If True, self.stims[s] holds the raw audio waveform (1, T_audio) at audio_fs Hz (grid-locked to T_audio = T_neural * hop) instead of the in-loader mel spectrogram. Pair with a model whose wav2spec slot is a waveform front-end; responses are unchanged.

  • audio_fs (int, default 24000) – Sample rate for waveform mode. The AA4 wavs are 24414 Hz, which gives a non-integer hop at dt=1 ms; the default 24 kHz resamples to a clean hop = 24 (exactly dt=1 ms bins, slightly better than the native spec’s 0.983 ms). Other values must keep audio_fs * dt_ms / 1000 an integer. Ignored unless return_waveform=True.

  • download (bool, default False) – If True and an animal’s data is missing under path, fetch its tarball (~hundreds of MB per animal) from the NERSC mirror and untar in place. Only the animals listed in animals are downloaded — useful for quick iteration on a subset.

  • username (str, optional) – CRCNS username. Defaults to $CRCNS_USERNAME. Prefer the env var over passing a literal.

  • password (str, optional) – CRCNS password. Defaults to $CRCNS_PASSWORD. Prefer the env var over passing a literal.
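The bin-width figures quoted for audio_fs follow from simple arithmetic, sketched here. effective_bin_ms is a hypothetical helper that rounds the hop to the nearest sample and reports the bin width that rounding implies.

```python
from typing import Tuple


def effective_bin_ms(audio_fs: int, dt_ms: float) -> Tuple[int, float]:
    """Return (hop in samples, actual bin width in ms) after rounding the hop."""
    hop = round(audio_fs * dt_ms / 1000.0)
    return hop, 1000.0 * hop / audio_fs


# Native AA4 rate: 24.414 samples per ms rounds to 24, so bins are short of 1 ms.
print(effective_bin_ms(24414, 1.0))
# Resampled default: exactly 24 samples per 1 ms bin.
print(effective_bin_ms(24000, 1.0))
```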

deepSTRF.datasets.audio.crcns_aa4.download_aa4(dest: str | None = None, animals: Sequence[str] = ('BlaBro09xxF', 'GreBlu9508M', 'LblBlu2028M', 'WhiBlu5396M', 'WhiWhi4522M', 'YelBlu6903F'), username: str | None = None, password: str | None = None) str[source]

Download CRCNS-AA4 archives from the NERSC mirror into dest.

AA4 is split into one .tar.gz per animal (each is hundreds of MB); by default this fetches all 6, but animals can be narrowed to a subset. The CRCNSCode tutorial archive is also fetched (small, ~1 MB).

Idempotent: skips an archive if its animal directory already exists, skips the CRCNSCode archive if CRCNSCode/ already exists.

Parameters:
  • dest (str, optional) – Defaults to default_cache_dir('AA4') ($DEEPSTRF_DATA_DIR overrides).

  • animals (sequence of str, default all 6) – Animals to download. Must be a subset of AA4_ANIMAL_IDS.

  • username (str, optional) – CRCNS username. Defaults to $CRCNS_USERNAME.

  • password (str, optional) – CRCNS password. Defaults to $CRCNS_PASSWORD.

deepSTRF.datasets.audio.espejo module

Espejo (Lopez-Espejo et al. 2019) auditory cortex dataset.

Public Zenodo deposit (DOI 10.5281/zenodo.3445557) ships two disjoint releases — natural sounds (NAT) and vocalization-modulated noise (VMN) — recorded from awake passively-listening ferret A1. One dataset class covers both via the stimuli={'nat', 'vmn'} constructor arg; the two share no cells and have different F (18 vs 2), so they cannot be concatenated.

The on-disk format is NEMS-flavored but we parse it directly with h5py + pandas — see deepSTRF.datasets.audio._espejo_native. No nems0 dependency.

class deepSTRF.datasets.audio.espejo.EspejoDataset(path: str | None = None, stimuli: Literal['nat', 'vmn'] = 'nat', dt_ms: float = 10.0, subset: Literal['all', 'estimation', 'test'] = 'all', cells: Sequence[str] | None = None, smooth: bool = False, return_waveform: bool = False, audio_fs: int = 44100, download: bool = False)[source]

Bases: AudioNeuralDataset

PyTorch dataset for Lopez-Espejo et al. (2019) ferret A1 recordings.

Awake, passively-listening adult ferret primary auditory cortex (A1), extracellularly recorded single units. The dataset ships in two disjoint releases (no cell overlap, different stimulus dimensionality — they cannot be concatenated), selected by the stimuli argument:

  • 'nat': 93 3-second natural sounds (animal vocalizations, speech, environmental, music), stored as 18-band gammatone log-spectrograms (NEMS “ozgf”, F=18). ~540 cells across 35 sites in 6 ferrets; each site presents a subset of the stim bank.

  • 'vmn': 30 3-second vocalization-modulated noise stimuli (two narrowband noise streams modulated by independent natural-vocalization envelopes), stored as 2-band envelopes (“envelope” stimfmt, F=2). ~200 cells across 103 sites in 5 ferrets.

Both releases sample at 100 Hz (dt=10 ms native); the on-disk cochleagrams are log-compressed at source. Each occurrence epoch includes the published 0.5 s pre-stim + 0.5 s post-stim silence flanking the 3 s stimulus, so per-stim tensors are (1, F, 500) (NAT) or (1, F, 400) (VMN). The estimation / test split follows the paper’s split_by_occurrence_counts and is surfaced via the per-stim n_repeats and split metadata fields.

Data are freely available at https://doi.org/10.5281/zenodo.3445557 (no account required) and auto-fetched with download=True.

Notes

Follows the standard deepSTRF data paradigm (see docs/_source/md/data_paradigm.md). Espejo-specific metadata:

  • stim_meta dicts hold name, type ('nat' / 'vmn'), n_repeats, split ('test' / 'estimation'), duration_s and n_samples.

  • nrn_meta dicts hold cell_id, site, animal_id, channel, unit and experiment_set ('nat' / 'vmn'). unit can be None for VMN cells (2-segment cellids).

The (1, 1) NaN sentinel marks (stim, neuron) pairs the cell was not recorded for (different sites present different stim subsets). Only the pre-computed cochleagrams are in the Zenodo deposit (the raw NAT waveforms are mirrored on the LBHB bitbucket); the loader fixes dt_ms = 10.

References

Lopez Espejo, Schwartz & David (2019). “Spectral tuning of adaptation supports coding of sensory context in auditory cortex.” PLoS Computational Biology 15(10): e1007430. https://doi.org/10.1371/journal.pcbi.1007430

Parameters:
  • path (str, optional) – Path to the Espejo data folder (containing A1_natural_sounds/ and / or A1_voc_mod_noise/). Defaults to the platformdirs cache ($DEEPSTRF_DATA_DIR overrides).

  • stimuli ({'nat', 'vmn'}) – Which release to load. The two are mutually exclusive (disjoint cells, different F); to use both, instantiate twice and keep them separate.

  • dt_ms (float, default 10.0) – Time-bin width in ms. Currently fixed at 10 ms — the on-disk cochleagrams are precomputed at fs=100 Hz, and the response rasterizer aligns to that grid.

  • subset ({'all', 'estimation', 'test'}, default 'all') – If ‘estimation’ or ‘test’, only that stim subset is kept. Split follows the paper’s split_by_occurrence_counts: test = stims at max repetition count per site; estimation = stims at lower repetition counts.

  • cells (sequence of str, optional) – Whitelist of cell IDs to include (intersection with what’s on disk). None keeps all.

  • smooth (bool, default False) – If True, smooth PSTHs with a 21 ms Hanning window (Hsu / Borst / Theunissen 2004). Off by default — Espejo is typically used as-is.

  • return_waveform (bool, default False) – If True (stimuli='nat' only), hand out the raw natural-sound waveform per stim as a (1, T_audio) mono tensor at audio_fs instead of the precomputed ozgf cochleagram, for use with a learnable wav2spec model front-end. The raw wavs are not in the Zenodo deposit — they are fetched from the LBHB baphy bitbucket mirror (see the README) and cached under <path>/nat_waveforms/. The 4 s sound is inset at the published 0.5 s pre-stim silence offset so it grid-locks to the cochleagram frames (T_audio = T_neural * hop, hop = audio_fs·dt/1000). Responses are identical to cochleagram mode. VMN is unsupported (its stimuli are synthesized 2-band envelopes with no raw audio).

  • audio_fs (int, default 44100) – Sample rate for waveform mode (the native rate of the mirror wavs; ignored — and reported as None — in cochleagram mode). 44.1 kHz grid-locks at dt=10 ms (hop=441).

  • download (bool, default False) – If True and the data is missing under path, fetch the requested archive from Zenodo (record 3445557) and untar in place.
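The split rule described for subset can be sketched as follows. This is a minimal illustration, assuming one site's repetition counts are available as a plain mapping from stim name to n_repeats. The function name and the example counts are hypothetical and not part of the deepSTRF API; the loader's own implementation may differ.

```python
from typing import Dict


def split_by_repeat_count(n_repeats: Dict[str, int]) -> Dict[str, str]:
    """Label each stim of ONE site as 'test' or 'estimation'.

    Stims presented at the site's maximum repetition count become 'test';
    every stim with a lower count becomes 'estimation'.
    """
    if not n_repeats:
        return {}
    top = max(n_repeats.values())
    return {
        name: "test" if count == top else "estimation"
        for name, count in n_repeats.items()
    }


# Hypothetical site: two stims repeated 20 times, three repeated twice.
site_counts = {"stim_a": 20, "stim_b": 20, "stim_c": 2, "stim_d": 2, "stim_e": 2}
labels = split_by_repeat_count(site_counts)
```

Because the maximum is taken per site, the same stim may fall in different splits at different sites under this reading of the rule.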

deepSTRF.datasets.audio.espejo.download_espejo(stimuli: str, dest: str | None = None) str[source]

Download one Espejo stimuli set from Zenodo into dest.

Parameters:
  • stimuli ({'nat', 'vmn'})

  • dest (str, optional) – Defaults to default_cache_dir('Espejo') (overridable via $DEEPSTRF_DATA_DIR).

Returns:

The dataset root directory.

Return type:

str

Notes

Idempotent: skips the archive if already present, and skips the untar step if the expected <subdir>/ already exists. NAT is ~638 MB, VMN is ~25 MB.

deepSTRF.datasets.audio.espejo.download_espejo_nat_waveforms(names: Sequence[str], dest: str | None = None, *, progress: bool = True) Dict[str, str][source]

Fetch the raw NAT waveforms from the LBHB baphy bitbucket mirror.

Parameters:
  • names (sequence of str) – Stim names as they appear in stim_meta (STIM_<file>.wav); the STIM_ prefix is stripped to get the on-mirror filename.

  • dest (str, optional) – Parent directory; wavs are cached under <dest>/nat_waveforms/. Defaults to default_cache_dir('Espejo').

  • progress (bool, default True) – Show a tqdm bar over the (missing) downloads.

Returns:

name -> local wav path for every name found on the mirror.

Return type:

dict

Notes

Idempotent — already-cached wavs are skipped. Each filename is tried in sounds_set3/ first, then sounds/. Names found in neither are collected and surfaced by the caller (the dataset raises on genuine misses so waveform mode never silently substitutes silence).
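The lookup order described in these notes can be sketched without any network access. This is an illustration only: the helper names, the example file names and the in-memory listing are hypothetical, and the real function downloads from the mirror rather than consulting a set.

```python
from typing import Iterable, List, Optional

# Search order stated in the notes: sounds_set3/ first, then sounds/.
MIRROR_DIRS = ("sounds_set3", "sounds")


def mirror_candidates(stim_name: str) -> List[str]:
    """Strip the STIM_ prefix and list the candidate mirror paths in order."""
    prefix = "STIM_"
    filename = stim_name[len(prefix):] if stim_name.startswith(prefix) else stim_name
    return [f"{directory}/{filename}" for directory in MIRROR_DIRS]


def first_available(stim_name: str, available: Iterable[str]) -> Optional[str]:
    """Return the first candidate present in `available`, or None on a miss."""
    present = set(available)
    for candidate in mirror_candidates(stim_name):
        if candidate in present:
            return candidate
    return None


# Hypothetical mirror listing.
listing = {"sounds/cat.wav", "sounds_set3/dog.wav"}
hit = first_available("STIM_cat.wav", listing)
miss = first_available("STIM_bird.wav", listing)
```

Returning None for a miss, instead of a placeholder path, mirrors the stated behaviour that missing names are collected and reported rather than silently replaced.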

deepSTRF.datasets.audio.alice_eeg module

Alice EEG dataset adapted to the deepSTRF paradigm.

See docs/_source/md/README_Alice_EEG.md for context, citation, and the benchmark comparison against Brodbeck et al. 2023 (eLife).

class deepSTRF.datasets.audio.alice_eeg.AliceEEGDataset(path: str | None = None, subjects: Sequence[str] | None = None, dt_ms: float = 10.0, n_frequency_bands: int = 8, treat_subjects_as: str = 'neurons', hp_freq_hz: float | None = 1.0, lp_freq_hz: float | None = None, window_ms: float | None = None, fmin: float = 80.0, fmax: float | None = None, spec_backend: str = 'gaussian', download: bool = False, return_waveform: bool = False, audio_fs: int = 44100)[source]

Bases: AudioNeuralDataset

PyTorch dataset for EEG from the Alice audiobook listening paradigm.

33 human participants listened to the first chapter of Alice in Wonderland (~12.4 min) split into 12 audio segments, recorded with 61 EEG channels per subject (10-20-like montage). Bad channels and bad artifact windows (marked in the source .fif metadata) are converted to NaN at the response level. Each subject heard each segment once (R = 1). deepSTRF consumes Brodbeck et al. 2023’s restructured release (UMd PULFR 10.13016/pulf-lndn): per-subject MNE .fif files plus 12 audio segments and a word-onset table. See docs/_source/md/README_Alice_EEG.md for the full dataset notes.

The treat_subjects_as argument selects one of two layouts:

  • "neurons" (default): every (subject, channel) pair becomes a “neuron”; N = sum_s(n_channels_s) and R = 1 everywhere. Bad channels carry the structural NaN sentinel. Use corrcoef / fve.

  • "repeats": subjects are treated as repeats of a shared canonical per-channel EEG response; N = n_montage_channels (e.g. 61) and R = n_subjects. Bad (channel, subject) combinations become NaN repeat slabs. Useful for inter-subject reliability (ISC-style) via normalized_corrcoef(method='schoppe') — but note this is inter-subject reliability, not trial reliability, so the iid-trial noise model the Schoppe correction assumes does not strictly hold; treat the resulting ceiling as a group-level sanity check.
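The two layouts differ only in how subjects map onto the neuron axis N and the repeat axis R. The sketch below computes (N, R) for each mode under the simplifying assumption that every subject contributes the full 61-channel montage. The function name and the subject dictionary are hypothetical and not part of the deepSTRF API.

```python
from typing import Dict, Tuple


def layout_shape(
    channels_per_subject: Dict[str, int],
    mode: str,
    n_montage_channels: int = 61,
) -> Tuple[int, int]:
    """Return (N, R) for the 'neurons' or 'repeats' layout."""
    if mode == "neurons":
        # Every (subject, channel) pair is its own neuron, heard once.
        return sum(channels_per_subject.values()), 1
    if mode == "repeats":
        # One neuron per montage channel; each subject is one repeat.
        return n_montage_channels, len(channels_per_subject)
    raise ValueError(f"unknown mode: {mode!r}")


# Assumed: 33 subjects, each with all 61 channels present.
subjects = {f"S{i:02d}": 61 for i in range(1, 34)}
neurons_shape = layout_shape(subjects, "neurons")
repeats_shape = layout_shape(subjects, "repeats")
```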

Notes

Follows the standard deepSTRF data paradigm (see docs/_source/md/data_paradigm.md). Alice-specific metadata:

  • stims are S = 12 log-power ERB-band spectrograms (1, F, T_s) (a gammatone approximation; see _gammatone_spectrogram).

  • stim_meta dicts hold name, type, sample_rate, n_samples and duration_s.

  • nrn_meta dicts hold channel_id, subject, area and xyz in "neurons" mode; a channel-only entry in "repeats" mode.

The default spec_backend='gaussian' is a frequency-domain Gaussian approximation of Brodbeck 2023’s time-domain gammatone (Heeris) filterbank — spectrally equivalent to first order but with lower dynamic range and less time-localized transients. spec_backend='heeris' selects the paper-faithful bank (requires the optional gammatone package in the [eeg] extra). The window_ms / fmin / fmax constructor knobs control the FFT window and ERB-band edges; their defaults preserve the historical behaviour, so no existing fits change.

References

Bhattasali et al. (2020). “The Alice Datasets: fMRI & EEG Observations of Natural Language Comprehension.” LREC.

Brennan et al. (2019). “Hierarchical structure guides rapid linguistic predictions during naturalistic listening.” PLOS ONE.

Brodbeck et al. (2023). Eelbrain methods paper. eLife (Tools & Resources).

Parameters:
  • path (str, optional) – Path to the Brodbeck-restructured Alice EEG data directory (containing eeg.0/, eeg.1/, eeg.2/, stimuli/). Defaults to the platformdirs cache ($DEEPSTRF_DATA_DIR overrides).

  • subjects (sequence of str, optional) – Subject ids (e.g. ["S01", "S20"]). If None, all subjects discovered on disk are used.

  • dt_ms (float, default 10.0) – Time-bin width in ms (100 Hz with the default — matches the eelbrain paper’s analysis rate).

  • n_frequency_bands (int, default 8) – ERB-band count for the gammatone-equivalent spectrogram. Matches Brodbeck 2023 Fig 4.

  • treat_subjects_as ({"neurons", "repeats"}, default "neurons") – See the class docstring.

  • hp_freq_hz (float or None, default 1.0) – High-pass cutoff applied via raw.filter before downsampling and segmentation. The Brodbeck restructure ships data with a 0.1 Hz HP, which leaves enough slow drift across the ~12 min recording that per-segment baselines vary by >1 SD — fatal for held-out fve. Brodbeck applies 1 Hz HP in the paper’s analysis pipeline; we mirror that as the default. Pass None to skip.

  • lp_freq_hz (float or None, default None) – Optional low-pass cutoff. Useful if you want to focus on the cortical-tracking band (< 40 Hz) or the envelope-tracking band (< 8 Hz).

  • window_ms (float, optional) – FFT analysis-window length in ms for the stimulus spectrogram. None preserves the legacy n_fft=1024 default — at the audiobook sample rate (16 kHz) this gives a ~64 ms window; at 44.1 kHz, ~23 ms. Pass an explicit window_ms to override (e.g. 25.0 for the Kaldi convention). The spec pipeline is otherwise unchanged from the audit baseline — see the “Audit status” callout below before benchmarking against Brodbeck 2023.

  • fmin (float, default 80.0) – Lower ERB-band edge in Hz.

  • fmax (float, optional) – Upper ERB-band edge in Hz. None (the default) uses sr/2 (Nyquist). For speech-tracking work, pass fmax=8000 to drop bands above the speech-relevant range (this roughly matches Brodbeck 2023’s published lower-band figure but has not been empirically validated against the paper’s actual filterbank; see “Audit status”).

  • spec_backend ({'gaussian', 'heeris'}, default 'gaussian') –

    Spec-pipeline backend.

    • 'gaussian' (back-compat): frequency-domain Gaussian ERB filterbank — the deepSTRF approximation that’s been shipped to date.

    • 'heeris' (paper-faithful, requires gammatone PyPI package): time-domain Heeris filterbank, same as Brodbeck et al. 2023 via eelbrain’s gammatone_bank. See the empirical comparison at untracked/alice_eeg_spec_compare.py — Heeris has visibly sharper time-localization and broader dynamic range. Recommended when reproducing the paper.

  • download (bool, default False) – If True and the data is missing under path, fetch the four zips from the UMd DRUM mirror (~2.5 GiB total; anonymous HTTPS). Idempotent — skips any zip / unpacked subtree already present.

deepSTRF.datasets.audio.alice_eeg.download_alice_eeg(dest: str | None = None) str[source]

Download Brodbeck’s restructured Alice EEG release from UMd DRUM.

Idempotent: skips any zip that’s already on disk and any subdirectory that’s already unpacked. Returns the dataset directory.

Parameters:

dest (str, optional) – Defaults to the platformdirs cache (overridable via $DEEPSTRF_DATA_DIR).

Notes

~2.5 GiB total across four zips. Anonymous HTTPS; no auth.

deepSTRF.datasets.audio.le_2025 module

Le, Bjoring & Meliza (2025), Nature Communications — zebra finch dataset.

“The zebra finch auditory cortex reconstructs occluded syllables in conspecific song.” DOI: 10.1038/s41467-025-63182-y. Data: 10.6084/m9.figshare.29203457. Code: github.com/melizalab/auditory-restoration.

Single-unit extracellular recordings from the auditory pallium of anesthetized adult zebra finches, in response to 8 natural song motifs (and in cohort 3, 8 scrambled-syntax pseudo-motifs) presented in up to 7 variants per critical interval (CI) to probe the neural correlate of auditory restoration.

Sub-experiments (one experiment= per instance; concat for the union):

nat8a

Cohorts 1 & 2 — natural motifs (8 birds × 2 CIs × {C, G, N, GB, CB}). Cohort 1 (alpha) had a familiarity manipulation; cohort 2 (beta) did not. No masking variants.

nat8b

Cohort 3 — same natural motifs renamed nat8mk0..7, full set of 7 variants per CI (adds GM, CM).

synth8b

Cohort 3 — 8 scrambled-syntax pseudo-motifs, full variant set.

Per-CI variants:

C (Continuous)

Unmodified motif; shared across both CIs.

G (Gap)

CI replaced by silence.

N (Noise)

CI-duration noise burst in isolation.

GB (Gap + Burst)

CI replaced by noise within the motif; the illusion-inducing stimulus.

CB (Continuous + Burst)

Motif unchanged, noise added on top of the CI.

GM (Gap-Masked)

Whole motif masked, CI deleted (nat8b / synth8b only).

CM (Continuous-Masked)

Whole motif masked, CI intact (nat8b / synth8b only). CM is CI-independent, so it lives once per motif.

class deepSTRF.datasets.audio.le_2025.Le2025Dataset(path: str | Path | None = None, experiment: str = 'nat8b', dt_ms: float = 5.0, n_bands: int = 50, fmin: float = 1000.0, fmax: float = 8000.0, window_ms: float = 2.5, smooth: bool = True, keep_areas: Sequence[str] | None = None, compute_reliability: bool = True, download: bool = False, return_waveform: bool = False, audio_fs: int = 48000)[source]

Bases: AudioNeuralDataset

deepSTRF wrapper for one sub-experiment of Le, Bjoring & Meliza (2025).

Instantiate one per experiment ("nat8a" | "nat8b" | "synth8b") and concatenate with concat_neural_datasets (or ds_a + ds_b) for the union; the three experiments share no stimuli, so the bidirectional selection rule in the base class hides cross-experiment NaN-only entries automatically.

Layout on disk (as shipped on figshare):

<path>/
├── metadata/
│   ├── recordings.csv        area for nat8a-beta / nat8b / synth8b
│   ├── song-birds.csv        motif name mapping + CI timings (ms)
│   └── ephys-birds.csv       cohort, sex, age, familiarity group
├── nat8a-alpha-responses/    cohort 1 (familiarity manipulation)
├── nat8a-beta-responses/     cohort 2
├── nat8a-stimuli/            shared by alpha + beta
├── nat8b-responses/          cohort 3 (natural-syntax)
├── nat8b-stimuli/
├── synth8b-responses/        cohort 3 (scrambled-syntax)
└── synth8b-stimuli/

Two pprox schemas coexist in the archive: the legacy spec:2/pprox (used only by nat8a-alpha; spike times in ms, condition field encodes variant) and spec:2/stimtrial (everything else; spike times in s, stimulus dict carries the full filename stem). Both are handled below.

Parameters:
  • path – Filesystem path to the unpacked figshare archive (the directory that contains metadata/, *-responses/, *-stimuli/).

  • experiment – One of "nat8a" | "nat8b" | "synth8b". nat8a unifies the two cohorts (alpha + beta) that share the same stim set; use select_pop_by_nrn_attr("cohort", 1) to restrict to the familiarity sub-experiment.

  • dt_ms – Bin width in ms. Default 5; pass dt_ms=1 for paper-faithful spectrogram + response binning (the paper uses 1 ms throughout).

  • n_bands – Number of gammatone bands. Default 50, matching the paper.

  • fmin – Low edge of the gammatone filter bank, in Hz. Default 1000, the lower bound of the paper’s range.

  • fmax – High edge of the gammatone filter bank, in Hz. Default 8000, the upper bound of the paper’s range.

  • window_ms – Gammatone analysis-window width, in ms. Default 2.5, matching the paper. Hop is dt_ms.

  • smooth – If True (default), apply a 21 ms Hanning smoother to all PSTHs (Hsu / Borst / Theunissen 2004).

  • keep_areas – Optional iterable of area strings to filter on (per the per-unit area metadata field; values vary by cohort).

  • compute_reliability – If True (default), pre-compute per-neuron Sahani–Linden signal power, noise power, and SNR (length-weighted across stims) and attach them to nrn_meta. Set to False for fast iteration when reliability filtering is not needed.

  • download – If True and path=None, fetches the ~105 MB figshare archive (DOI 10.6084/m9.figshare.29203457) into the deepSTRF cache and unpacks it. Idempotent: skips the download if the unpacked tree is already present.
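The smooth option applies a 21 ms Hanning window to each PSTH. The sketch below shows one way such a smoother could work in pure Python. It rests on several assumptions that the documentation does not specify: the kernel length in bins is round(21 / dt_ms) forced to an odd count, the zero-valued Hann endpoints are dropped, the kernel is normalized to unit sum, and edges are zero-padded. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def hann_kernel(width_ms: float, dt_ms: float) -> List[float]:
    """Unit-sum Hann kernel covering roughly width_ms at bin width dt_ms."""
    n = max(1, round(width_ms / dt_ms))
    if n % 2 == 0:
        n += 1  # assumption: keep the kernel centred on a bin
    # Hann window of n + 2 points with the two zero endpoints dropped.
    taps = [0.5 - 0.5 * math.cos(2.0 * math.pi * j / (n + 1)) for j in range(1, n + 1)]
    total = sum(taps)
    return [t / total for t in taps]


def smooth_psth(psth: Sequence[float], width_ms: float = 21.0, dt_ms: float = 5.0) -> List[float]:
    """Same-length smoothing of one PSTH with zero padding at the edges."""
    kernel = hann_kernel(width_ms, dt_ms)
    half = len(kernel) // 2
    out = []
    for t in range(len(psth)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            i = t + k - half
            if 0 <= i < len(psth):
                acc += weight * psth[i]
        out.append(acc)
    return out


# A single spike in the middle of an 11-bin PSTH at dt_ms = 5.
impulse = [0.0] * 11
impulse[5] = 1.0
smoothed = smooth_psth(impulse, dt_ms=5.0)
```

With a unit-sum kernel the total spike count of an interior impulse is preserved, so smoothing changes temporal precision but not the mean rate away from the edges.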

Notes

Stim-side metadata fields:
  • name : filename stem.

  • motif : e.g. "B189" (nat8a) or "nat8mk0" (nat8b).

  • critical_interval : int (1/2) for per-CI variants, None for C and CM (CI-independent), or a string like "2a" for synth8b.

  • variant : one of VARIANTS.

  • syntax : "natural" or "scrambled".

  • experiment : "nat8a" | "nat8b" | "synth8b".

  • sample_rate_hz : native sample rate of the source wav.

  • duration_s : duration of the source wav, in seconds.

  • ci_onset_s / ci_offset_s : critical-interval bounds in seconds (NaN for C/CM and for synth8b; the CI table only covers nat8a/nat8b).

Per-neuron metadata fields:

cell_id, animal_id, animal_uuid, cohort, experiment, area, hemisphere, familiar_motifs (list of motif IDs the bird was reared with; empty unless cohort 1), sex, age_days, pprox_file.
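For orientation on what compute_reliability estimates, the sketch below implements the trial-based signal-power and noise-power estimator usually attributed to Sahani and Linden for a single stimulus and a single neuron. It is an assumption that the loader follows this exact form; the length-weighted aggregation across stims mentioned above is not shown, and the function name is hypothetical.

```python
from statistics import fmean, pvariance
from typing import Sequence, Tuple


def sahani_linden_power(trials: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (signal_power, noise_power) from equal-length repeated trials.

    "Power" is the variance over time. With n trials:
      signal = (n * P(mean response) - mean of per-trial P) / (n - 1)
      noise  = mean of per-trial P - signal
    """
    n = len(trials)
    if n < 2:
        raise ValueError("at least two trials are needed")
    mean_response = [fmean(column) for column in zip(*trials)]
    power_of_mean = pvariance(mean_response)
    mean_of_powers = fmean([pvariance(trial) for trial in trials])
    signal = (n * power_of_mean - mean_of_powers) / (n - 1)
    noise = mean_of_powers - signal
    return signal, noise


# Two identical trials: all variance is stimulus-locked signal.
signal_power, noise_power = sahani_linden_power(
    [[0.0, 1.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0]]
)
```

The signal estimate can come out negative for very unreliable responses, which is one reason an SNR derived from it is typically used as a filter rather than read as an exact quantity.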

select_critical_interval(ci) List[int][source]

Restrict iteration to one CI index (or None for C/CM variants).

select_motif(motif: str) List[int][source]

Restrict iteration to all variants of one motif.

select_restoration_quartet(motif: str, ci, variants: Sequence[str] = ('C', 'CB', 'GB', 'GM')) List[int][source]

Select the stim set used in the paper’s core restoration analysis for one (motif, CI). Defaults to C / CB / GB / GM — the four trajectories compared in Fig. 4. Returns the selected stim indices.

The C continuous and CM masked variants are CI-independent and are kept whenever they appear in variants, regardless of ci.

select_variant(variant: str) List[int][source]

Restrict iteration to one variant code, e.g. 'GB'.
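The selection rule described for select_restoration_quartet can be sketched over plain metadata dicts. This is a minimal illustration assuming each stim_meta entry carries the motif, variant and critical_interval fields listed in the Notes. The function name and the example entries (including the second motif name) are hypothetical, and the real method also restricts iteration on the dataset, which this sketch does not.

```python
from typing import Any, Dict, List, Sequence

# C and CM are CI-independent, so they are kept regardless of the requested CI.
CI_INDEPENDENT = {"C", "CM"}


def restoration_indices(
    stim_meta: Sequence[Dict[str, Any]],
    motif: str,
    ci: Any,
    variants: Sequence[str] = ("C", "CB", "GB", "GM"),
) -> List[int]:
    """Indices of stims matching one (motif, CI) for the given variants."""
    selected = []
    for idx, meta in enumerate(stim_meta):
        if meta["motif"] != motif or meta["variant"] not in variants:
            continue
        if meta["variant"] in CI_INDEPENDENT or meta["critical_interval"] == ci:
            selected.append(idx)
    return selected


# Hypothetical metadata for illustration only.
example_meta = [
    {"motif": "B189", "variant": "C", "critical_interval": None},
    {"motif": "B189", "variant": "GB", "critical_interval": 1},
    {"motif": "B189", "variant": "GB", "critical_interval": 2},
    {"motif": "B189", "variant": "CB", "critical_interval": 1},
    {"motif": "B189", "variant": "G", "critical_interval": 1},
    {"motif": "OTHER", "variant": "GB", "critical_interval": 1},
]
picked = restoration_indices(example_meta, "B189", 1)
```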

Module contents

class deepSTRF.datasets.audio.AliceEEGDataset(path: str | None = None, subjects: Sequence[str] | None = None, dt_ms: float = 10.0, n_frequency_bands: int = 8, treat_subjects_as: str = 'neurons', hp_freq_hz: float | None = 1.0, lp_freq_hz: float | None = None, window_ms: float | None = None, fmin: float = 80.0, fmax: float | None = None, spec_backend: str = 'gaussian', download: bool = False, return_waveform: bool = False, audio_fs: int = 44100)[source]

Bases: AudioNeuralDataset

Package-level re-export of deepSTRF.datasets.audio.alice_eeg.AliceEEGDataset. The description, notes, references and parameters are identical to those given for that class in the deepSTRF.datasets.audio.alice_eeg module section above.

class deepSTRF.datasets.audio.AudioNeuralDataset(path: str, dt_ms: float)[source]

Bases: NeuralDataset

Neural dataset class for auditory stimuli.

Stim shape is polymorphic depending on the loading mode:

  • Spectrogram mode (default): self.stims[s] is a (1, F, T) tensor where F = self.F is the frequency-band count and T is the neural time-bin count.

  • Waveform mode (opt-in, subclass-specific): self.stims[s] is a (1, T_audio) mono float32 tensor at sample rate self.audio_fs. Subclasses that support this mode expose a return_waveform=True constructor flag and set self.audio_fs to a positive int. The (1, ...) leading dim is the mono-channel axis, kept for collate compatibility (neural_collate zero-pads the last axis only).

Subclasses must additionally set self.F (number of frequency bins in the spectrogram — kept positive even in waveform mode so downstream models know the target spectrogram width a wav2spec module should produce) in their __init__, before calling self.validate().

F

Frequency-band count of the target spectrogram. Set by the subclass.

Type:

int

audio_fs

Sample rate of the raw waveform when in waveform mode; None otherwise. Subclasses without a waveform branch leave this None.

Type:

int or None

hearing_range_hz

Optional (low, high) informational bound on the species’ canonical hearing range in Hz (e.g. (200.0, 40000.0) for ferret). Purely advisory — nothing is enforced against it; it exists so notebooks / tooling can display the range and users can choose to clamp a wav2spec’s frequency limits. None when unknown.

Type:

tuple of float or None

get_F()[source]

Return the number of frequency bins in the spectrograms.

Returns:

self.F, the spectrogram frequency-band count.

Return type:

int

property hop: int | None

Audio samples per neural bin in waveform mode (None in spec mode).

hop = round(audio_fs * dt_ms / 1000). The grid-lock contract (see validate()) requires this to be an exact integer, so a wav2spec front-end’s own hop must equal this value for the audio→neural resampling to stay aligned with the response bins.
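A minimal sketch of the grid-lock check implied by this formula, using exact rational arithmetic so that float rounding cannot hide a non-integer hop. The function name is hypothetical and validate() may implement the check differently. The sample-rate and bin-width pairs in the example are the defaults quoted in this package's dataset signatures.

```python
from fractions import Fraction


def grid_locked_hop(audio_fs: int, dt_ms: float) -> int:
    """Return audio samples per neural bin, or raise if it is not an integer."""
    samples_per_bin = Fraction(audio_fs) * Fraction(str(dt_ms)) / 1000
    if samples_per_bin.denominator != 1:
        raise ValueError(
            f"audio_fs={audio_fs} Hz with dt_ms={dt_ms} gives a non-integer "
            f"hop of {float(samples_per_bin)} samples"
        )
    return samples_per_bin.numerator


hop_espejo = grid_locked_hop(44100, 10.0)  # Espejo defaults
hop_aa1 = grid_locked_hop(32000, 1)        # CRCNS-AA1 defaults
hop_le2025 = grid_locked_hop(48000, 5.0)   # Le 2025 defaults

# Waveform length that stays aligned with 500 neural bins at the Espejo hop.
t_audio = 500 * hop_espejo
```

A pair such as 44.1 kHz with 2.5 ms bins fails this check (110.25 samples per bin), which is the situation the contract is meant to rule out.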

validate()[source]

Check that the instance is deepSTRF-compatible.

Subclasses should call super().validate() and then add their own checks (e.g. AudioNeuralDataset checks self.F > 0).

class deepSTRF.datasets.audio.CRCNSAA1Dataset(path: str | None = None, areas=('Field_L', 'MLd'), stimuli=('conspecific', 'flatrip'), animals='all', dt_ms=1, smooth=True, n_mels=32, compression='cubic', window_ms: float = 10.0, return_waveform: bool = False, audio_fs: int = 32000, download: bool = False, username: str | None = None, password: str | None = None)[source]

Bases: AudioNeuralDataset

PyTorch dataset for the CRCNS-AA1 recordings.

Extracellular, spike-sorted single units of anesthetized male zebra finches: 50 cells in Field L and 50 in MLd, recorded in response to 10 clips of conspecific vocalizations and 20 clips of flat ripples (up to 5 s each, ~10 trials on average). Data are available at https://crcns.org/data-sets/aa/aa-1/about (free CRCNS account); see the AA1 README in the deepSTRF docs for the full notes.

Notes

Follows the standard deepSTRF data paradigm (see docs/_source/md/data_paradigm.md). AA1-specific metadata:

  • stims are mel-spectrograms (1, F, T_s).

  • stim_meta dicts hold name, type, sample_rate, n_samples and duration_s.

  • nrn_meta dicts hold cell_id, animal_id, area, cell_seq and rig. cell_seq is the sequential cell index parsed from the cell folder name (the n-th cell recorded); rig is the single-letter rig label when present, else None (cells “4_A” and “4_B” were recorded simultaneously, possibly in different areas).

Two cells lack conspecific responses: pipu1018_2_A (MLd) and pipu1018_2_B (Field_L).

References

Woolley et al. (2005). “Tuning for Spectro-temporal Modulations: a Mechanism for Auditory Discrimination of Natural Sound.”

Hsu et al. (2004). “Modulation power and phase spectrum of natural sounds enhance neural encoding performed by single auditory neurons.”

Singh & Theunissen (2003). “Modulation spectra of natural sounds and ethological theories of auditory processing.”

Initializes the AA1 Dataset.

Parameters:
  • path (str, optional) – Path to the AA1 data folder (containing Field_L_cells/, MLd_cells/, all_stims/). Defaults to the platformdirs cache ($DEEPSTRF_DATA_DIR overrides).

  • areas (tuple of str) – Recording sites: ‘Field_L’ or ‘MLd’.

  • stimuli (tuple of str) – Stimulus types: ‘conspecific’ or ‘flatrip’.

  • dt_ms (float) – Time step size in ms.

  • n_mels (int) – Number of mel frequency bands.

  • compression (str) – Spectrogram compression (‘cubic’, ‘log1p’, ‘none’). Ignored when return_waveform=True.

  • window_ms (float, default 10.0) – FFT analysis-window length in ms. n_fft is computed as round(window_ms * 1e-3 * sample_rate) and is decoupled from hop_length, so phonemic detail is preserved at any dt_ms. Earlier versions of this dataset hardcoded n_fft = 10 * hop_length, which was benign at the default dt_ms=1 (10 ms FFT window) but produced a 500 ms FFT window at dt_ms=50 and over-smoothed every spec frame. The default window_ms=10.0 preserves bit-identical behaviour at dt_ms=1 while removing the scaling bug at coarser bins. Speech-pipeline users may prefer window_ms=25.0 (Kaldi default). Ignored when return_waveform=True.

  • return_waveform (bool, default False) – If True, self.stims[s] holds the raw audio waveform (1, T_audio) at audio_fs Hz (grid-locked to T_audio = T_neural * hop) instead of the in-loader mel spectrogram. Pair with a model whose wav2spec slot is a waveform front-end (see deepSTRF.models.wav2spec); responses are unchanged.

  • audio_fs (int, default 32000) – Sample rate for waveform mode. Default 32 kHz is the native rate of the AA1 wavs (so no resampling); other values resample and must keep audio_fs * dt_ms / 1000 an integer. Ignored unless return_waveform=True.

  • download (bool, default False) – If True and the data is missing under path, fetch the ~17 MB CRCNS-AA1 archive from the NERSC mirror (free CRCNS account required; see crcns_download) and unzip in place.

  • username (str, optional) – CRCNS credentials. Default to $CRCNS_USERNAME / $CRCNS_PASSWORD. Prefer the env vars over passing literals — anything in source / a notebook ends up in history / logs / VCS.

  • password (str, optional) – CRCNS credentials. Default to $CRCNS_USERNAME / $CRCNS_PASSWORD. Prefer the env vars over passing literals — anything in source / a notebook ends up in history / logs / VCS.
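The window_ms arithmetic can be checked with a few lines. The sketch below compares the documented n_fft formula with the legacy n_fft = 10 * hop_length rule at the native 32 kHz rate. The helper names are hypothetical, and the hop_length expression is an assumption (samples per time bin, rounded) that is consistent with the hop definition used elsewhere in this package.

```python
def n_fft_from_window(window_ms: float, sample_rate: int) -> int:
    """Current rule: FFT length follows the analysis window only."""
    return round(window_ms * 1e-3 * sample_rate)


def hop_length(dt_ms: float, sample_rate: int) -> int:
    """Assumed hop: audio samples per time bin."""
    return round(dt_ms * 1e-3 * sample_rate)


def legacy_n_fft(dt_ms: float, sample_rate: int) -> int:
    """Legacy rule: FFT length tied to the hop, so it grows with dt_ms."""
    return 10 * hop_length(dt_ms, sample_rate)


SAMPLE_RATE = 32000  # native AA1 rate

current = n_fft_from_window(10.0, SAMPLE_RATE)   # independent of dt_ms
legacy_fine = legacy_n_fft(1, SAMPLE_RATE)       # dt_ms = 1
legacy_coarse = legacy_n_fft(50, SAMPLE_RATE)    # dt_ms = 50
legacy_coarse_ms = 1000 * legacy_coarse / SAMPLE_RATE
```

Both rules give 320 samples at dt_ms=1, which is why the default is bit-identical there, while the legacy rule reaches 16000 samples (500 ms) at dt_ms=50.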

class deepSTRF.datasets.audio.CRCNSAA2Dataset(path: str | None = None, areas=('Field_L', 'mld', 'OV', 'CM', 'None'), stimuli=('conspecific', 'flatrip', 'songrip'), animals='all', dt_ms=1, smooth=True, n_mels=32, compression='cubic', window_ms: float = 10.0, return_waveform: bool = False, audio_fs: int = 32000, download: bool = False, username: str | None = None, password: str | None = None)[source]

Bases: AudioNeuralDataset

PyTorch dataset for the CRCNS-AA2 recordings (OV, MLd, Field L, CM).

494 extracellular, spike-sorted single units of male zebra finches, identified in OV, MLd, Field L, L1, L2a, L2b, L3 (and some with unidentified area, None). Three stimulus classes — conspecific songs (72 stims), flat ripples (20) and song ripples (25) — each presented 10-20 times, with low trial-to-trial variability. Almost all cells saw conspecific and songrip stimuli; about half saw flatrip. Population fitting-compatible. Data are available at https://crcns.org/data-sets/aa/aa-2/about (free CRCNS account).

Notes

Follows the standard deepSTRF data paradigm (see docs/_source/md/data_paradigm.md). AA2-specific metadata:

  • stims are mel-spectrograms (1, F, T_s).

  • stim_meta dicts hold name, type, sample_rate, n_samples and duration_s (the last three from data/stim_data.csv).

  • nrn_meta dicts hold cell_id, animal_id, area, cell_seq and rig (see CRCNSAA1Dataset for the cell-name format; rig is often None in AA2).

References

Gill et al. (2006). “Sound representation methods for spectro-temporal receptive field estimation.”

Amin et al. (2010). “Role of the Zebra Finch Auditory Thalamus in Generating Complex Representations for Natural Sounds.”

Initializes the AA2 Dataset.

Parameters:
  • path (str, optional) – Path to the AA2 data folder. Defaults to the platformdirs cache.

  • areas (tuple of str) – Recording sites of interest: ‘Field_L’, ‘L1’, ‘L2a’, ‘L2b’, ‘L3’, ‘mld’, ‘OV’, ‘CM’, or ‘None’.

  • stimuli (tuple of str) – Stimulus types of interest: ‘conspecific’, ‘flatrip’, ‘songrip’.

  • dt_ms (float) – Time step size in ms.

  • n_mels (int) – Number of mel frequency bands.

  • compression (str) – Spectrogram compression (‘cubic’, ‘log1p’, ‘none’). Ignored when return_waveform=True.

  • window_ms (float, default 10.0) – FFT analysis-window length in ms. n_fft is computed as round(window_ms * 1e-3 * sample_rate) and is decoupled from hop_length so phonemic detail is preserved at any dt_ms. Earlier versions of this dataset hardcoded n_fft = 10 * hop_length, which gave a benign 10 ms FFT window at dt_ms=1 but a 500 ms window at dt_ms=50. Default window_ms=10.0 preserves bit-identical behaviour at dt_ms=1 while fixing the scaling bug at coarser bins. Ignored when return_waveform=True.

  • return_waveform (bool, default False) – If True, self.stims[s] holds the raw audio waveform (1, T_audio) at audio_fs Hz (grid-locked to T_audio = T_neural * hop) instead of the in-loader mel spectrogram. Pair with a model whose wav2spec slot is a waveform front-end; responses are unchanged.

  • audio_fs (int, default 32000) – Sample rate for waveform mode. Default 32 kHz is the native rate of the AA2 wavs (no resampling); other values resample and must keep audio_fs * dt_ms / 1000 an integer. Ignored unless return_waveform=True.

  • download (bool, default False) – If True and the data is missing under path, fetch the ~30 MB worth of CRCNS-AA2 archives from the NERSC mirror (free CRCNS account required) and extract in place.

  • username (str, optional) – CRCNS credentials. Default to $CRCNS_USERNAME / $CRCNS_PASSWORD. Prefer env vars over passing literals.

  • password (str, optional) – CRCNS credentials. Default to $CRCNS_USERNAME / $CRCNS_PASSWORD. Prefer env vars over passing literals.
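
The three compression options can be pictured with the minimal sketch below. It is not the loader's code: the function name is hypothetical, and reading 'cubic' as a cube root is an assumption, since the parameter description only names the option.

```python
import math
from typing import List


def compress(spec: List[List[float]], mode: str = "cubic") -> List[List[float]]:
    """Apply an amplitude compression to a non-negative (F, T) spectrogram."""
    if mode == "cubic":
        fn = lambda x: x ** (1.0 / 3.0)  # assumption: 'cubic' means cube root
    elif mode == "log1p":
        fn = math.log1p  # log(1 + x), well defined at x = 0
    elif mode == "none":
        fn = lambda x: x
    else:
        raise ValueError(f"unknown compression: {mode!r}")
    return [[fn(x) for x in band] for band in spec]


spec = [[0.0, 1.0, 8.0], [27.0, 0.5, 2.0]]  # toy 2-band, 3-frame spectrogram
for mode in ("cubic", "log1p", "none"):
    print(mode, compress(spec, mode)[0])
```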

class deepSTRF.datasets.audio.CRCNSAA4Dataset(path: str | None = None, animals='all', stimuli=('song', 'call', 'mlnoise'), dt_ms=1.0, smooth=True, n_mels=32, compression='cubic', window_ms: float = 10.0, return_waveform: bool = False, audio_fs: int = 24000, download: bool = False, username: str | None = None, password: str | None = None)[source]

Bases: AudioNeuralDataset

PyTorch dataset for the CRCNS-AA4 recordings.

1401 extracellular, spike-sorted single and multi units of adult zebra finches (4 males, 2 females) in Field L, caudolateral and caudomedial mesopallium (CLM, CMM) and caudomedial nidopallium (NCM) — though units were not precisely assigned to one of these areas. Three stimulus classes (conspecific songs, calls, ripple noise), each a few seconds long and presented ~10 times. Population- and batch-compatible. Data are available at https://crcns.org/data-sets/aa/aa-4/about-aa-4 (free CRCNS account).

Notes

Follows the standard deepSTRF data paradigm (see docs/_source/md/data_paradigm.md). AA4-specific metadata:

  • stims are mel-spectrograms (1, F, T_s).

  • stim_meta dicts hold name (the stimulus md5 — the canonical identifier, since the wav filename is per-animal and not unique across the corpus), type, class and duration_s (the stim_duration attr from the h5, in seconds).

  • nrn_meta dicts hold: cell_id (h5 basename, no extension), animal_id, sex ('M' / 'F'), site (e.g. "Site1"), electrode (int 1-32, channel index across both 16-channel arrays at a site), ldepth / rdepth (left / right array depth in µm), sort_type ('single' / 'multi'; 'noise' / 'tdt' are filtered out), sort_id (online-sort int) and subsort_id (offline-sort int parsed from the trailing _ss<N>; None if absent).

The dataset paper does not publish a per-cell brain-area assignment, so the depth + electrode-array geometry is the only anatomical proxy; nor does it document which electrode IDs (1-16 vs 17-32) map to the left vs right hemisphere — confirm with the dataset authors before deriving a hemisphere from electrode.

References

Elie & Theunissen (2015). “Meaning in the avian auditory cortex: Neural representation of communication calls.” European Journal of Neuroscience.

Elie & Theunissen (2019). “Invariant neural responses for sensory categories revealed by the time-varying information for communication calls.” PLoS Computational Biology.

Initializes the AA4 Dataset.

Parameters:
  • path (str, optional) – Path to the CRCNS_AA4/data/ folder containing one subfolder per animal (with .h5 cell files + a wavfiles/ directory of stimulus .wav files). Defaults to the platformdirs cache.

  • animals ('all' or sequence of str) – Animals to load (any subset of AA4_ANIMAL_IDS).

  • stimuli (sequence of str) – Stimulus types to keep; subset of {‘song’, ‘call’, ‘mlnoise’}.

  • dt_ms (float) – Time-bin width in ms.

  • smooth (bool) – If True, smooth PSTHs in place with a 21 ms Hanning window (Hsu, Borst & Theunissen 2004).

  • n_mels (int) – Number of mel frequency bands of the stimulus spectrogram.

  • compression ({'cubic', 'log1p', 'none'}) – Compression applied to the spectrogram (saturation effect of hair cells). Ignored when return_waveform=True.

  • window_ms (float, default 10.0) – FFT analysis-window length in ms. n_fft is computed per-stim as round(window_ms * 1e-3 * sample_rate) and is decoupled from hop_length so phonemic detail is preserved at any dt_ms. Earlier versions of this dataset hardcoded n_fft = hop * 10; at dt_ms=50 that gave a 500 ms FFT window and over-smoothed every spec frame. Default window_ms=10.0 preserves bit-identical behaviour at dt_ms=1 (n_fft=320 at sr=32 kHz) while fixing the scaling bug at coarser bins. Ignored when return_waveform=True.

  • return_waveform (bool, default False) – If True, self.stims[s] holds the raw audio waveform (1, T_audio) at audio_fs Hz (grid-locked to T_audio = T_neural * hop) instead of the in-loader mel spectrogram. Pair with a model whose wav2spec slot is a waveform front-end; responses are unchanged.

  • audio_fs (int, default 24000) – Sample rate for waveform mode. The AA4 wavs are 24414 Hz, which gives a non-integer hop at dt=1 ms; the default 24 kHz resamples to a clean hop = 24 (exactly dt=1 ms bins, slightly better than the native spec’s 0.983 ms). Other values must keep audio_fs * dt_ms / 1000 an integer. Ignored unless return_waveform=True.

  • download (bool, default False) – If True and an animal’s data is missing under path, fetch its tarball (~hundreds of MB per animal) from the NERSC mirror and untar in place. Only the animals listed in animals are downloaded — useful for quick iteration on a subset.

  • username (str, optional) – CRCNS credentials. Default to $CRCNS_USERNAME / $CRCNS_PASSWORD. Prefer env vars over passing literals.

  • password (str, optional) – CRCNS credentials. Default to $CRCNS_USERNAME / $CRCNS_PASSWORD. Prefer env vars over passing literals.
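
The audio_fs constraint can be checked by hand. The sketch below is a hypothetical helper (not part of deepSTRF) that tests whether audio_fs * dt_ms / 1000 is an integer, and reproduces the 0.983 ms figure quoted for the native 24414 Hz rate under the assumption that the native hop is rounded to 24 samples.

```python
def grid_locked_hop(audio_fs: int, dt_ms: float, tol: float = 1e-9) -> int:
    """Return audio samples per neural bin, or raise if it is not an integer."""
    hop = audio_fs * dt_ms / 1000
    if abs(hop - round(hop)) > tol:
        raise ValueError(
            f"audio_fs={audio_fs}, dt_ms={dt_ms} gives a non-integer hop {hop}"
        )
    return round(hop)


native_fs = 24414
rounded_hop = round(native_fs * 1.0 / 1000)     # nearest whole hop at 1 ms
native_bin_ms = 1000 * rounded_hop / native_fs  # effective bin width it implies

print(grid_locked_hop(24000, 1.0))  # the default rate grid-locks
print(round(native_bin_ms, 3))      # effective bin width at the native rate
```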

class deepSTRF.datasets.audio.CRCNSAC1Dataset(path: str | None = None, experimenter: None | str | Iterable[str] = ('wehr', 'asari'), sites: None | str | Iterable[str] = ('A1', 'MGB'), dt_ms: float = 5.0, fmin: float = 100.0, fmax: float = 45000.0, bins_per_octave: int = 6, window_ms: float | None = None, detrend_med_ms: float = 100.0, detrend_gauss_ms: float = 10.0, gating: RepeatGating | None = None, return_waveform: bool = False, audio_fs: int = 96000, download: bool = False, username: str | None = None, password: str | None = None)[source]

Bases: AudioNeuralDataset

Unified loader for the Wehr + Asari subsets of CRCNS-AC1.

Both subsets are intracellular Vm recordings from the anaesthetised rat auditory pathway (Wehr in A1, whole-cell, sf=4 kHz; Asari in A1 + MGB, whole-cell + cell-attached, sf=10 kHz), and both record natural-sound responses with multi-trial repeats per stimulus. The loader deduplicates stimuli across cells (via the shared NaN-sentinel paradigm) so the same waveform never gets a duplicate spectrogram when it was presented to multiple cells.

Parameters:
  • path (str, optional) – Directory holding (or about to hold) the three CRCNS-AC1 zips and their extracted contents. Defaults to default_cache_dir('CRCNS_AC1') (overridable via $DEEPSTRF_DATA_DIR).

  • experimenter (str or iterable of str, optional) – 'wehr', 'asari', or both. Default loads both.

  • sites (str or iterable of str, optional) –

    'A1' and/or 'MGB'. Default loads both. Wehr is all-A1; Asari has both areas.

    The signal type is not a free choice — it is determined by each cell’s recording mode, because the recording mode dictates what signal physically exists:

    • whole-cell (Wehr A1, Asari A1) → 'subthresh': MedGauss-detrended membrane potential in mV (signed). Action potentials were blocked (Wehr) or not analysed (Asari A1, per the paper); the synaptic input is the signal. Pair with MSE.

    • cell-attached (Asari MGB) → 'spikes': a Hann-smoothed spike-rate PSTH (non-negative). There is no intracellular Vm in cell-attached mode. Pair with Poisson.

    Each cell carries its resolved type in nrn_meta['signal_type']; self.signal_type is that type if the loaded cohort is homogeneous, else 'mixed' (loading A1 + MGB together mixes signed-mV and spike-rate neurons — filter by site / signal_type before training one model across them).

  • dt_ms (float, default 5.0) – Output time-bin width in ms. The Goertzel STFT is parametrised to produce its frames at exactly this resolution (no two-step compute-then-downsample); the response is average-pooled to match.

  • fmin (float) – Spectrogram frequency range in Hz. Defaults to the Asari 2025 layout: (100.0, 45000.0). Pass fmax=25600.0 to recover the Wehr 2024 setting (49 bands).

  • fmax (float) – Spectrogram frequency range in Hz. Defaults to the Asari 2025 layout: (100.0, 45000.0). Pass fmax=25600.0 to recover the Wehr 2024 setting (49 bands).

  • bins_per_octave (int, default 6) – Spectrogram spectral density. With the defaults this yields F=53.

  • window_ms (float, optional) – STFT analysis-window length in ms. Defaults to 2 * dt_ms (legacy MATLAB overlap=2).

  • detrend_med_ms (float, default 100.0) – Median-filter window (ms) for the MedGauss baseline subtracted from each Vm trace. Larger windows remove only slow drift and preserve more low-frequency response dynamics; smaller windows detrend more aggressively. The 100 ms default matches the Rançon 2024/2025 pipeline; the choice is robust (residuals barely change between 100 and 1000 ms because the response is dominated by fast PSP transients).

  • detrend_gauss_ms (float, default 10.0) – Gaussian-smoothing σ (ms) applied to the median-filtered baseline before subtraction.

  • gating (RepeatGating, optional) – Per-repeat artifact-rejection thresholds. Default values gate out repeats with derivative-MAD jumps and excessive dynamic range; see _crcns_ac1_native.RepeatGating.

  • return_waveform (bool, default False) – If True, hand out the raw stimulus waveform per stim as a (1, T_audio) mono tensor resampled to audio_fs instead of the in-loader log-spectrogram, for use with a learnable wav2spec model front-end. Because the source waveforms have heterogeneous sample rates (Asari at 97656 Hz, Wehr differs), they are all resampled to the single audio_fs so the grid-lock (T_audio = T_neural * hop, hop = audio_fs * dt_ms / 1000) holds dataset-wide. The offset is 0: the waveform starts at stimulus onset, matching the response trace. Responses are identical to spectrogram mode.

  • audio_fs (int, default 96000) – Common sample rate the waveforms are resampled to in waveform mode (ignored in spectrogram mode, where self.audio_fs is None). The default 96 kHz exceeds twice the default fmax (45 kHz) so no in-band content is lost, and grid-locks cleanly for any integer dt_ms (96000/1000 = 96 samples per ms).

  • download (bool, default False) – If True, fetch the three archives via download_ac1() before extraction. Requires CRCNS credentials.

  • username (str, optional) – CRCNS credentials. Default to $CRCNS_USERNAME / $CRCNS_PASSWORD env vars.

  • password (str, optional) – CRCNS credentials. Default to $CRCNS_USERNAME / $CRCNS_PASSWORD env vars.
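
The band counts quoted for fmin, fmax and bins_per_octave (F=53 with the defaults, 49 with fmax=25600.0) are consistent with counting log-spaced band centres from fmin up to fmax. The sketch below uses that counting rule as an assumption; it reproduces the two documented values but is not taken from the loader, and the function name is hypothetical.

```python
import math


def n_log_bands(fmin: float, fmax: float, bins_per_octave: int) -> int:
    """Count log-spaced band centres from fmin up to fmax, inclusive."""
    octaves = math.log2(fmax / fmin)
    # Small epsilon guards against round-off when fmax/fmin is a power of two.
    return math.floor(bins_per_octave * octaves + 1e-9) + 1


print(n_log_bands(100.0, 45000.0, 6))  # default layout
print(n_log_bands(100.0, 25600.0, 6))  # fmax=25600.0 variant

# The default audio_fs must sit above twice the default fmax.
assert 96000 > 2 * 45000
```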

Notes

deepSTRF data paradigm — see docs/_source/md/data_paradigm.md. Per-stim metadata:

  • stim_meta dicts hold experimenter, category, idx (Wehr) or class_n / segments / segment_files (Asari), description, duration_s.

  • nrn_meta dicts hold experimenter, session, animal_id, penetration, date, site, recording_type, signal_type ('subthresh' / 'spikes', derived from the recording mode), species, plus _wehr_cell_idx for Wehr cells (used with WEHR_VALID_NEURONS / WEHR_NEURONS_SPLIT_NATURAL for Rançon-paper reproducibility).
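
The cohort-level signal_type rule described under sites (the shared type when the cohort is homogeneous, 'mixed' otherwise) can be sketched on plain metadata dicts. The helper and the toy entries are hypothetical; only the signal_type and site keys and the MSE / Poisson pairing come from the text above.

```python
from typing import Dict, List

# Loss pairing stated in the parameter description.
LOSS_FOR = {"subthresh": "MSE", "spikes": "Poisson"}


def cohort_signal_type(nrn_meta: List[Dict[str, str]]) -> str:
    """Return the shared signal type, or 'mixed' for a heterogeneous cohort."""
    types = {m["signal_type"] for m in nrn_meta}
    if not types:
        raise ValueError("empty cohort")
    return types.pop() if len(types) == 1 else "mixed"


nrn_meta = [
    {"site": "A1", "signal_type": "subthresh"},
    {"site": "A1", "signal_type": "subthresh"},
    {"site": "MGB", "signal_type": "spikes"},
]
print(cohort_signal_type(nrn_meta))  # A1 + MGB together

a1_only = [m for m in nrn_meta if m["site"] == "A1"]
kind = cohort_signal_type(a1_only)
print(kind, LOSS_FOR[kind])
```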

References

Machens, Wehr & Zador (2004). J. Neurosci. 24(5):1089-1100.

Asari & Zador (2009). J. Neurophysiol. 102(5):2638-2656.

Rançon, Masquelier & Cottereau (2025). Commun. Biol. 8:1456.

class deepSTRF.datasets.audio.Downer2025Dataset(path: str | None = None, stimuli: Literal['timit', 'mvocs'] = 'timit', dt_ms: float = 5.0, n_mels: int = 80, compression: Literal['cubic', 'log1p', 'none'] = 'log1p', spec_zscore: bool = True, smooth: bool = True, subset: Literal['all', 'estimation', 'test'] = 'all', animals: str | Sequence[str] = 'all', areas: Sequence[str] | None = None, sessions: Sequence[str] | None = None, audio_fs: int = 16000, fmax: int = 8000, window_ms: float = 25.0, return_waveform: bool = False, download: bool = False, _enumerate_only: bool = False)[source]

Bases: AudioNeuralDataset

Squirrel-monkey auditory cortex (Downer 2025 / Ahmed 2025).

Multi-unit threshold-crossing spike trains from 41 sessions across 3 animals (B, C, F), recorded passively while the animal listened to TIMIT speech and monkey vocalizations. 1718 multi-units total (one per recording channel).

Two stim modes are loaded independently:

  • stimuli='timit': 499 unique English sentences (489 single-rep + 10 with 11 reps; the 10 form the canonical test subset per Ahmed 2025).

  • stimuli='mvocs': 303 unique monkey vocalizations (292 single-rep + 11 with 15 reps; the 11 form the canonical test subset).

Both modes share the same recording channels but the per-session response counts differ, so a (cell, stim) pair has (1, 1) NaN where the channel was in a session that did not play that stim.

By default, both stim modes go through the same mel pipeline at audio_fs=16000 / fmax=8000 to match the Ahmed 2025 baseline (cochleagram capped at 8 kHz) and to allow two instances of this class (one per stim mode) to be concatenated via deepSTRF.utils.concat_neural_datasets. Pass return_waveform=True to hand out the raw source waveform per stim (grid-locked to the neural bins) instead, for use with a learnable wav2spec model front-end.

Notes

Phase-1 skeleton — stim and response loading are not yet implemented (will land in subsequent commits). Instantiating with _enumerate_only=True populates self.nrn_meta and self.N_neurons for inspection.

Parameters:
  • path (str, optional) – Path to the unpacked dataset root (the directory containing sessions/ and stimuli/). Defaults to default_cache_dir('Downer2025').

  • stimuli ({'timit', 'mvocs'}) – Which stim class to load. The two share recording channels but are loaded independently; concatenate two instances if you want both.

  • dt_ms (float, default 5.0) – Neural time-bin width. The paper’s main analysis uses 50 ms; 5 ms matches NS1 and gives users the freedom to re-bin.

  • n_mels (int, default 80) – Number of mel bands. Matches Ahmed 2025’s Kaldi fbank (num_mel_bins=80) by default.

  • compression ({'cubic', 'log1p', 'none'}, default 'log1p') – Spectrogram amplitude compression. log1p matches Ahmed 2025’s log-mel (Kaldi fbank). The other deepSTRF audio datasets default to 'cubic'; we picked log1p here to match the paper as closely as possible.

  • spec_zscore (bool, default True) – If True, z-score each spectrogram per-band over its own time axis (= Ahmed 2025’s normalize() helper). Boosts contrast in higher-frequency bands that would otherwise be flattened by the log compression.

  • smooth (bool, default True) – Hsu 2004 21 ms PSTH smoothing.

  • subset ({'all', 'estimation', 'test'}) – 'estimation' keeps the single-rep stims, 'test' keeps the canonical high-rep subset (10 TIMIT IDs or 11 mVocs IDs).

  • animals ('all' or iterable of {'b','c','f'})

  • areas (iterable of str, optional) – Coarse labels {'core' | 'primary', 'non-primary'} or fine labels {'A1', 'R', 'ML', 'AL', 'CL', 'CPB', 'RPB'}. 'primary' is an alias for 'core'. None = no filter.

  • sessions (iterable of session-id strings, or None.)

  • audio_fs (int, default 16000) – Common sample rate both stim classes are resampled to before mel. mVocs source is 41 kHz stereo; TIMIT is already 16 kHz.

  • fmax (int, default 8000) – Mel-band high cutoff. Matches Ahmed 2025’s cochleagram.

  • window_ms (float, default 25.0) – FFT analysis-window length in ms (Kaldi default). The window is decoupled from the hop so phonemic detail is preserved regardless of dt_ms. Earlier versions of this dataset hardcoded n_fft = 10 * hop which produced a 500 ms FFT window at dt_ms=50 and over-smoothed the spectrogram so badly that a closed-form ridge STRF only reached cc_norm ≈ 0.40 on the well-tuned cohort. With window_ms=25 the same fit reaches cc_norm ≈ 0.53 — matching Ahmed 2025’s paper-reported STRF baseline.

  • return_waveform (bool, default False) – If True, hand out the raw source waveform per stim as a (1, T_audio) mono tensor at audio_fs instead of the Kaldi-fbank spectrogram, for use with a learnable wav2spec model front-end. The waveform is grid-locked to the neural bins (T_audio = T_neural * hop with hop = audio_fs * dt_ms / 1000) and right-padded / cropped to the canonical stim length. The source audio is already speech-onset aligned (TIMIT befaft silence trimmed; mVocs snippet starts at voc onset), so there is no pre-silence offset. Responses are identical to spectrogram mode. Note the default audio_fs=16000 is band-limited to 8 kHz — pass a higher audio_fs if you want a learnable front-end to see energy above the paper’s cochleagram cutoff.

  • download (bool, default False) – Fetch the 29 GB Zenodo archive (record 16175377) if missing and unzip it. Idempotent — both the download and the unzip steps are skipped when their outputs already exist.

  • _enumerate_only (bool, default False) – Phase-1 internal flag. Populates self.nrn_meta and self.N_neurons then returns, skipping stim and response loading. Will be removed once phases 2–3 are in.
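
As an illustration of what spec_zscore does (z-scoring each band over its own time axis), here is a minimal pure-Python sketch. The function name, the use of the population standard deviation and the eps guard for silent bands are assumptions, not details taken from the loader or from Ahmed 2025.

```python
from statistics import fmean, pstdev
from typing import List


def zscore_per_band(spec: List[List[float]], eps: float = 1e-8) -> List[List[float]]:
    """Z-score each frequency band of an (F, T) spectrogram over time."""
    out = []
    for band in spec:
        mu = fmean(band)
        sd = pstdev(band)  # assumption: population standard deviation
        out.append([(x - mu) / (sd + eps) for x in band])
    return out


spec = [
    [1.0, 2.0, 3.0, 4.0],      # loud low band
    [0.01, 0.02, 0.03, 0.04],  # faint high band, same shape
    [0.5, 0.5, 0.5, 0.5],      # constant band maps to zeros
]
for band in zscore_per_band(spec):
    print([round(x, 3) for x in band])
```

After normalisation the loud and faint bands carry the same contrast, which is the effect the parameter description refers to.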

attach_ahmed2025_well_tuned() → int[source]

Write the precomputed Ahmed 2025 well-tuned booleans into nrn_meta.

Reads AHMED2025_WELL_TUNED_TIMIT / ..._MVOCS (module constants generated with compute_paper_tuning(n_resamples=10_000, dt_ms_analysis=50.0, seed=0, alpha=0.05, delta=0.5) on the full 1718-channel population) and tags each nrn_meta[i] with:

ahmed2025_<stim>_well_tuned : bool

where <stim> matches self.stimuli ('timit' or 'mvocs'). Cells not in the precomputed list are tagged False.

Returns:

Number of cells flagged True in the current selection.

Return type:

int

Notes

  • For an exact reproduction of the paper’s 404 / 489 well-tuned counts, instantiate at dt_ms=50.0, smooth=False and call compute_paper_tuning(n_resamples=100_000) — the precomputed lists land at 417 / 476, within ~3 % of the paper.

  • This method does not run any computation; it’s pure metadata attachment.

compute_paper_tuning(n_resamples: int = 10000, dt_ms_analysis: float = 50.0, seed: int = 0, alpha: float = 0.05, delta: float = 0.5, verbose: bool = True) → dict[source]

Replicate Ahmed 2025’s tuned / well-tuned multi-unit criterion.

For each neuron and each of the M test-split stims, the method randomly samples a pair of reps (with replacement across iterations but never the same rep within a pair), concatenates them into long sequences U and V (each of length sum_s T_s_coarse), and computes their Pearson correlation. The null distribution is built the same way but with each V circularly shifted by a random offset before correlating. n_resamples iterations yield two empirical distributions per neuron.

tuned – one-sided Wilcoxon rank-sum test (true > null) at alpha (default 0.05). Paper target: 1195 (TIMIT) / 1231 (mVocs).

well_tuned – additionally requires (mean(true) - mean(null)) / std(null) >= delta (default 0.5). Paper target: 404 (TIMIT) / 489 (mVocs).

Writes four floats / booleans to each nrn_meta[i], prefixed by the current stim mode:

ahmed2025_{timit|mvocs}_tuned            : bool
ahmed2025_{timit|mvocs}_well_tuned       : bool
ahmed2025_{timit|mvocs}_p_wilcoxon       : float
ahmed2025_{timit|mvocs}_delta_normalized : float

Parameters:
  • n_resamples (int, default 10_000) – Pair-resamplings per neuron. The paper used 100_000; 10_000 is ~10x faster and gives a stable Wilcoxon (the criterion only cares about the rank order of the two distributions).

  • dt_ms_analysis (float, default 50.0) – Coarse bin width for the long-sequence correlations. Must be an integer multiple of self.dt. Paper used 50 ms for the main results, 20 ms for Fig 4.

  • seed (int, default 0) – RNG seed for reproducibility.

  • alpha (float) – Significance and effect-size thresholds (see above).

  • delta (float) – Significance and effect-size thresholds (see above).

  • verbose (bool, default True) – Show a tqdm progress bar.

Returns:

summary – Aggregate counts {'tuned': int, 'well_tuned': int, 'n_with_data': int, 'stimuli': str}.

Return type:

dict

Notes

Re-bins self.responses to dt_ms_analysis on the fly via summing; if smooth=True was passed to the constructor the smoothing has already been applied at self.dt. For the strictest paper match instantiate with smooth=False — though in practice the 21 ms Hanning smoothing × subsequent 50 ms re-binning makes the smoothing nearly invisible.
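
The resampling procedure described above can be sketched in pure Python as follows. This is a simplified illustration rather than the method's implementation: it omits the Wilcoxon rank-sum test and the re-binning step, the function names are hypothetical, and restricting the circular shift to non-zero offsets is an assumption.

```python
import random
from statistics import fmean, pstdev
from typing import List, Tuple


def pearson(u: List[float], v: List[float]) -> float:
    mu, mv = fmean(u), fmean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


def tuning_distributions(
    reps_per_stim: List[List[List[float]]], n_resamples: int, seed: int = 0
) -> Tuple[List[float], List[float]]:
    """reps_per_stim[s][r] is the binned response to stim s on repeat r."""
    rng = random.Random(seed)
    true_cc, null_cc = [], []
    for _ in range(n_resamples):
        u, v, v_shifted = [], [], []
        for reps in reps_per_stim:
            i, j = rng.sample(range(len(reps)), 2)  # two distinct repeats
            u += reps[i]
            v += reps[j]
            k = rng.randrange(1, len(reps[j]))  # assumption: non-zero shift
            v_shifted += reps[j][k:] + reps[j][:k]
        true_cc.append(pearson(u, v))
        null_cc.append(pearson(u, v_shifted))
    return true_cc, null_cc


def delta_normalized(true_cc: List[float], null_cc: List[float]) -> float:
    return (fmean(true_cc) - fmean(null_cc)) / pstdev(null_cc)


# Toy neuron: 3 stims, 5 repeats each, stimulus-locked signal plus noise.
rng = random.Random(1)
templates = [[rng.random() for _ in range(40)] for _ in range(3)]
reps_per_stim = [
    [[x + 0.1 * rng.gauss(0, 1) for x in t] for _ in range(5)] for t in templates
]
true_cc, null_cc = tuning_distributions(reps_per_stim, n_resamples=200)
print(delta_normalized(true_cc, null_cc) >= 0.5)  # passes the delta threshold
```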

class deepSTRF.datasets.audio.EspejoDataset(path: str | None = None, stimuli: Literal['nat', 'vmn'] = 'nat', dt_ms: float = 10.0, subset: Literal['all', 'estimation', 'test'] = 'all', cells: Sequence[str] | None = None, smooth: bool = False, return_waveform: bool = False, audio_fs: int = 44100, download: bool = False)[source]

Bases: AudioNeuralDataset

PyTorch dataset for Lopez-Espejo et al. (2019) ferret A1 recordings.

Awake, passively-listening adult ferret primary auditory cortex (A1), extracellularly recorded single units. The dataset ships in two disjoint releases (no cell overlap, different stimulus dimensionality — they cannot be concatenated), selected by the stimuli argument:

  • 'nat': 93 3-second natural sounds (animal vocalizations, speech, environmental, music), stored as 18-band gammatone log-spectrograms (NEMS “ozgf”, F=18). ~540 cells across 35 sites in 6 ferrets; each site presents a subset of the stim bank.

  • 'vmn': 30 3-second vocalization-modulated noise stimuli (two narrowband noise streams modulated by independent natural-vocalization envelopes), stored as 2-band envelopes (“envelope” stimfmt, F=2). ~200 cells across 103 sites in 5 ferrets.

Both releases sample at 100 Hz (dt=10 ms native); the on-disk cochleagrams are log-compressed at source. Each occurrence epoch includes the published 0.5 s pre-stim + 0.5 s post-stim silence flanking the 3 s stimulus, so per-stim tensors are (1, F, 500) (NAT) or (1, F, 400) (VMN). The estimation / test split follows the paper’s split_by_occurrence_counts and is surfaced via the per-stim n_repeats and split metadata fields.

Data are freely available at https://doi.org/10.5281/zenodo.3445557 (no account required) and auto-fetched with download=True.

Notes

Follows the standard deepSTRF data paradigm (see docs/_source/md/data_paradigm.md). Espejo-specific metadata:

  • stim_meta dicts hold name, type ('nat' / 'vmn'), n_repeats, split ('test' / 'estimation'), duration_s and n_samples.

  • nrn_meta dicts hold cell_id, site, animal_id, channel, unit and experiment_set ('nat' / 'vmn'). unit can be None for VMN cells (2-segment cellids).

The (1, 1) NaN sentinel marks (stim, neuron) pairs the cell was not recorded for (different sites present different stim subsets). Only the pre-computed cochleagrams are in the Zenodo deposit (the raw NAT waveforms are mirrored on the LBHB bitbucket); the loader fixes dt_ms = 10.

References

Lopez Espejo, Schwartz & David (2019). “Spectral tuning of adaptation supports coding of sensory context in auditory cortex.” PLoS Computational Biology 15(10): e1007430. https://doi.org/10.1371/journal.pcbi.1007430

Parameters:
  • path (str, optional) – Path to the Espejo data folder (containing A1_natural_sounds/ and / or A1_voc_mod_noise/). Defaults to the platformdirs cache ($DEEPSTRF_DATA_DIR overrides).

  • stimuli ({'nat', 'vmn'}) – Which release to load. The two are mutually exclusive (disjoint cells, different F); to use both, instantiate twice and keep them separate.

  • dt_ms (float, default 10.0) – Time-bin width in ms. Currently fixed at 10 ms — the on-disk cochleagrams are precomputed at fs=100 Hz, and the response rasterizer aligns to that grid.

  • subset ({'all', 'estimation', 'test'}, default 'all') – If ‘estimation’ or ‘test’, only that stim subset is kept. Split follows the paper’s split_by_occurrence_counts: test = stims at max repetition count per site; estimation = stims at lower repetition counts.

  • cells (sequence of str, optional) – Whitelist of cell IDs to include (intersection with what’s on disk). None keeps all.

  • smooth (bool, default False) – If True, smooth PSTHs with a 21 ms Hanning window (Hsu / Borst / Theunissen 2004). Off by default — Espejo is typically used as-is.

  • return_waveform (bool, default False) – If True (stimuli='nat' only), hand out the raw natural-sound waveform per stim as a (1, T_audio) mono tensor at audio_fs instead of the precomputed ozgf cochleagram, for use with a learnable wav2spec model front-end. The raw wavs are not in the Zenodo deposit — they are fetched from the LBHB baphy bitbucket mirror (see the README) and cached under <path>/nat_waveforms/. The 4 s sound is inset at the published 0.5 s pre-stim silence offset so it grid-locks to the cochleagram frames (T_audio = T_neural * hop, hop = audio_fs·dt/1000). Responses are identical to cochleagram mode. VMN is unsupported (its stimuli are synthesized 2-band envelopes with no raw audio).

  • audio_fs (int, default 44100) – Sample rate for waveform mode (the native rate of the mirror wavs; ignored — and reported as None — in cochleagram mode). 44.1 kHz grid-locks at dt=10 ms (hop=441).

  • download (bool, default False) – If True and the data is missing under path, fetch the requested archive from Zenodo (record 3445557) and untar in place.
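
The inset described under return_waveform can be pictured as placing the sound into a zero buffer of T_neural * hop samples at the pre-stim offset. The helper below is a hypothetical sketch of that bookkeeping under the stated numbers (500 frames, hop=441, 0.5 s offset, 4 s sound); it is not the loader's code.

```python
from typing import List


def inset_waveform(
    wave: List[float], n_bins: int, hop: int, audio_fs: int, offset_s: float
) -> List[float]:
    """Place `wave` into a silent buffer of n_bins * hop samples at offset_s."""
    total = n_bins * hop
    start = round(offset_s * audio_fs)
    if start >= total:
        raise ValueError("offset lies beyond the end of the buffer")
    end = min(start + len(wave), total)  # crop if the sound overruns
    out = [0.0] * total
    out[start:end] = wave[: end - start]
    return out


audio_fs, dt_ms = 44100, 10.0
hop = round(audio_fs * dt_ms / 1000)  # 441 samples per 10 ms frame
sound = [1.0] * (4 * audio_fs)        # stand-in for a 4 s waveform
buf = inset_waveform(sound, n_bins=500, hop=hop, audio_fs=audio_fs, offset_s=0.5)
print(len(buf), buf.index(1.0), len(buf) - 1 - buf[::-1].index(1.0))
```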

class deepSTRF.datasets.audio.Le2025Dataset(path: str | Path | None = None, experiment: str = 'nat8b', dt_ms: float = 5.0, n_bands: int = 50, fmin: float = 1000.0, fmax: float = 8000.0, window_ms: float = 2.5, smooth: bool = True, keep_areas: Sequence[str] | None = None, compute_reliability: bool = True, download: bool = False, return_waveform: bool = False, audio_fs: int = 48000)[source]

Bases: AudioNeuralDataset

deepSTRF wrapper for one sub-experiment of Le, Bjoring & Meliza (2025).

Instantiate one per experiment ("nat8a" | "nat8b" | "synth8b") and concatenate with concat_neural_datasets (or ds_a + ds_b) for the union; the three experiments share no stimuli, so the bidirectional selection rule in the base class hides cross-experiment NaN-only entries automatically.

Layout on disk (as shipped on figshare):

<path>/
├── metadata/
│   ├── recordings.csv        area for nat8a-beta / nat8b / synth8b
│   ├── song-birds.csv        motif name mapping + CI timings (ms)
│   └── ephys-birds.csv       cohort, sex, age, familiarity group
├── nat8a-alpha-responses/    cohort 1 (familiarity manipulation)
├── nat8a-beta-responses/     cohort 2
├── nat8a-stimuli/            shared by alpha + beta
├── nat8b-responses/          cohort 3 (natural-syntax)
├── nat8b-stimuli/
├── synth8b-responses/        cohort 3 (scrambled-syntax)
└── synth8b-stimuli/

Two pprox schemas coexist in the archive: the legacy spec:2/pprox (used only by nat8a-alpha; spike times in ms, condition field encodes variant) and spec:2/stimtrial (everything else; spike times in s, stimulus dict carries the full filename stem). Both are handled below.
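
Handling the two schemas mostly comes down to a unit conversion. The sketch below normalises spike times to seconds given a schema label; the function name is hypothetical, and how the label is read from each file is not specified here, so it is passed in as a plain string.

```python
from typing import List

MS_SCHEMAS = ("spec:2/pprox",)     # legacy schema: spike times in ms
S_SCHEMAS = ("spec:2/stimtrial",)  # newer schema: spike times in s


def spikes_to_seconds(spike_times: List[float], schema: str) -> List[float]:
    """Return spike times in seconds regardless of the source schema."""
    if schema in MS_SCHEMAS:
        return [t / 1000.0 for t in spike_times]
    if schema in S_SCHEMAS:
        return list(spike_times)
    raise ValueError(f"unrecognised schema: {schema!r}")


print(spikes_to_seconds([12.0, 250.0], "spec:2/pprox"))
print(spikes_to_seconds([0.012, 0.25], "spec:2/stimtrial"))
```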

Parameters:
  • path – Filesystem path to the unpacked figshare archive (the directory that contains metadata/, *-responses/, *-stimuli/).

  • experiment – One of "nat8a" | "nat8b" | "synth8b". nat8a unifies the two cohorts (alpha + beta) that share the same stim set; use select_pop_by_nrn_attr("cohort", 1) to restrict to the familiarity sub-experiment.

  • dt_ms – Bin width in ms. Default 5; pass dt_ms=1 for paper-faithful spectrogram + response binning (the paper uses 1 ms throughout).

  • n_bands – Number of gammatone bands. Default 50, matching the paper.

  • fmin – Low / high edges of the gammatone filter bank, in Hz. Defaults 1000 / 8000 — the paper’s range.

  • fmax – Low / high edges of the gammatone filter bank, in Hz. Defaults 1000 / 8000 — the paper’s range.

  • window_ms – Gammatone analysis-window width, in ms. Default 2.5, matching the paper. Hop is dt_ms.

  • smooth – If True (default), apply a 21 ms Hanning smoother to all PSTHs (Hsu / Borst / Theunissen 2004).

  • keep_areas – Optional iterable of area strings to filter on (per the per-unit area metadata field; values vary by cohort).

  • compute_reliability – If True (default), pre-compute per-neuron Sahani–Linden signal power, noise power, and SNR (length-weighted across stims) and attach them to nrn_meta. Set to False for fast iteration when reliability filtering is not needed.

  • download – If True and path=None, fetches the ~105 MB figshare archive (DOI 10.6084/m9.figshare.29203457) into the deepSTRF cache and unpacks it. Idempotent: skips the download if the unpacked tree is already present.
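
The 21 ms Hanning smoother named under smooth can be sketched as a normalised Hann kernel applied to the PSTH. Everything below is illustrative: the function names, the rounding of the kernel to an odd number of bins, the unit-sum normalisation and the zero padding at the edges are assumptions rather than the loader's behaviour.

```python
import math
from typing import List


def hann_kernel(width_ms: float, dt_ms: float) -> List[float]:
    """Unit-sum Hann window spanning roughly width_ms."""
    n = max(3, round(width_ms / dt_ms))
    if n % 2 == 0:
        n += 1  # keep the kernel centred on a bin
    w = [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]
    total = sum(w)
    return [x / total for x in w]


def smooth_psth(psth: List[float], dt_ms: float, width_ms: float = 21.0) -> List[float]:
    """Same-length smoothing with zero padding at both edges."""
    kernel = hann_kernel(width_ms, dt_ms)
    half = len(kernel) // 2
    out = []
    for t in range(len(psth)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + k - half
            if 0 <= idx < len(psth):
                acc += w * psth[idx]
        out.append(acc)
    return out


psth = [0.0] * 101
psth[50] = 1.0  # a single spike in the middle of the trial
smoothed = smooth_psth(psth, dt_ms=1.0)
print(len(hann_kernel(21.0, 1.0)), round(sum(smoothed), 6))
```

Because the kernel sums to one, the smoothing spreads each spike over neighbouring bins without changing the total spike count away from the edges.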

Notes

Stim-side metadata fields:
  • name : filename stem.

  • motif : e.g. "B189" (nat8a) or "nat8mk0" (nat8b).

  • critical_interval : int (1/2) for per-CI variants, None for C and CM (CI-independent), or a string like "2a" for synth8b.

  • variant : one of VARIANTS.

  • syntax : "natural" or "scrambled".

  • experiment : "nat8a" | "nat8b" | "synth8b".

  • sample_rate_hz : native sample rate of the source wav.

  • duration_s : duration of the source wav, in seconds.

  • ci_onset_s / ci_offset_s : critical-interval bounds in seconds (NaN for C/CM and for synth8b; the CI table only covers nat8a/nat8b).

Per-neuron metadata fields:

cell_id, animal_id, animal_uuid, cohort, experiment, area, hemisphere, familiar_motifs (list of motif IDs the bird was reared with; empty unless cohort 1), sex, age_days, pprox_file.

select_critical_interval(ci) List[int][source]

Restrict iteration to one CI index (or None for C/CM variants).

select_motif(motif: str) List[int][source]

Restrict iteration to all variants of one motif.

select_restoration_quartet(motif: str, ci, variants: Sequence[str] = ('C', 'CB', 'GB', 'GM')) List[int][source]

Select the stim set used in the paper’s core restoration analysis for one (motif, CI). Defaults to C / CB / GB / GM — the four trajectories compared in Fig. 4. Returns the selected stim indices.

The C (continuous) and CM (masked) variants are CI-independent and are kept whenever they appear in variants, regardless of ci.
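The selection rule described for select_restoration_quartet can be sketched on a plain list of stim_meta-style dicts. This is only an illustration of the rule as documented: the function restoration_quartet and the example records (including the motif "B2") are hypothetical, and the real method operates on the dataset's own state.

```python
def restoration_quartet(stim_meta, motif, ci, variants=("C", "CB", "GB", "GM")):
    """Return indices of stims matching one (motif, CI) restoration set."""
    ci_independent = {"C", "CM"}  # kept regardless of the requested CI
    selected = []
    for index, meta in enumerate(stim_meta):
        if meta["motif"] != motif or meta["variant"] not in variants:
            continue
        if meta["variant"] in ci_independent or meta["critical_interval"] == ci:
            selected.append(index)
    return selected


# Hypothetical metadata records, for illustration only.
example_meta = [
    {"motif": "B189", "variant": "C", "critical_interval": None},
    {"motif": "B189", "variant": "CB", "critical_interval": 1},
    {"motif": "B189", "variant": "CB", "critical_interval": 2},
    {"motif": "B189", "variant": "GB", "critical_interval": 1},
    {"motif": "B189", "variant": "GM", "critical_interval": 1},
    {"motif": "B2", "variant": "CB", "critical_interval": 1},
]
quartet = restoration_quartet(example_meta, "B189", 1)  # [0, 1, 3, 4]
```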

select_variant(variant: str) List[int][source]

Restrict iteration to one variant code, e.g. 'GB'.

class deepSTRF.datasets.audio.NAT4Dataset(path: str | None = None, area: str = 'A1', dt_ms: float = 10.0, smooth: bool = False, download: bool = False, subset: str = 'all', return_waveform: bool = False, audio_fs: int = 44100)[source]

Bases: AudioNeuralDataset

PyTorch dataset for NAT4 (Pennington & David, 2022 / 2023).

Two cortical areas: A1 (primary, 849 cells of which 777 auditory) and PEG (secondary, 398 of which 339 auditory). Pass area=...; one instance covers one area. To pool both, instantiate twice and call concat_neural_datasets([a1, peg]).

There are 595 stimuli total: 18 high-rep (val, 20 trials) + 577 low-rep (est, 1 trial), each clip 1.5 s. The default time bin is dt_ms = 10 (the population recording is precomputed at fs=100 with val pre-averaged over 20 reps; per-site spike trains are at fs=1000 and downsampled to 10 ms by summing). The spectrogram has F = 18 ozgf bands and T = 150 frames per stim.

The loader reads the published NAT4 archive directly with native CSV / JSON / HDF5 parsers — no NEMS0 dependency. Data are freely available at https://doi.org/10.5281/zenodo.8044773 (no account required) and auto-fetched by NAT4Dataset(download=True).

Notes

Follows the standard deepSTRF data paradigm (see docs/_source/md/data_paradigm.md). NAT4-specific metadata:

  • stim_meta dicts hold name and subset ('est' or 'val'); the subset='all'|'est'|'val' constructor argument filters this list at load time.

  • nrn_meta dicts hold cell_id (raw NEMS id, e.g. 'ARM029a-01-1'), area, auditory (flag from the dataset’s <area>_pred_correlation.csv), and the parsed components site (e.g. 'ARM029a'), animal (3-char site prefix, e.g. 'ARM'), electrode (int) and unit_in_electrode (int). Components default to None for any cell whose id does not match the standard <site>-<elec>-<unit> scheme.
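The <site>-<elec>-<unit> decomposition of cell_id can be sketched with a regular expression. This is a minimal illustration, not the loader's parser: the function parse_cell_id and the exact pattern are assumptions, chosen so that the documented example 'ARM029a-01-1' yields the documented components and any non-matching id falls back to None.

```python
import re

# Assumed pattern: a site token without hyphens, then electrode and unit numbers.
_CELL_ID = re.compile(r"^(?P<site>[^-]+)-(?P<elec>\d+)-(?P<unit>\d+)$")


def parse_cell_id(cell_id):
    """Split a NEMS-style cell id into its components (hypothetical helper)."""
    match = _CELL_ID.match(cell_id)
    if match is None:
        # Ids outside the standard scheme keep every component as None.
        return {"site": None, "animal": None,
                "electrode": None, "unit_in_electrode": None}
    site = match.group("site")
    return {
        "site": site,
        "animal": site[:3],  # 3-character site prefix
        "electrode": int(match.group("elec")),
        "unit_in_electrode": int(match.group("unit")),
    }


parsed = parse_cell_id("ARM029a-01-1")
```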

est responses have shape (R=1, T=150) and val responses (R=20, T=150); the (1, 1) NaN sentinel marks (stim, neuron) pairs where the cell was not recorded for that stim.

With return_waveform=True, stims are instead the raw mono waveforms (1, T_audio = T * hop) at audio_fs (hop=441 at 44.1 kHz / 10 ms) — feed them through a model’s wav2spec slot.

References

Pennington & David (2022, preprint). “Can deep learning provide a generalizable model for dynamic sound encoding in auditory cortex?”

Pennington & David (2023). “A convolutional neural network provides a generalizable model of natural sound coding by neural populations in auditory cortex.” PLOS Computational Biology.

Parameters:
  • path (str, optional) – Path to the NAT4 data folder. Defaults to the platformdirs cache.

  • area ({'A1', 'PEG'}) – Cortical area.

  • dt_ms (float, default 10.0) – Time-bin width in ms. Currently must equal 10.0; the population recording is precomputed at fs=100 and the per-site downsampling assumes a fixed 10x ratio from fs=1000.

  • smooth (bool, default False) – If True, smooth PSTHs with a 21 ms Hanning window. Off by default here because NAT4 trials are typically used as-is for STRF fitting (unlike CRCNS-AA where smoothing is the published norm).

  • download (bool, default False) – If True and the data is missing under path, fetch it from Zenodo (record 8044773).

  • subset ({'all', 'est', 'val'}, default 'all') – If 'est' or 'val', only that stimulus subset is loaded: stim_meta / stims / responses shrink accordingly, and the (more expensive) per-site spike-time pass is skipped entirely under subset='est'. The two subsets correspond to Pennington & David’s published estimation set (575 stims, R=1, from the population recording) and validation set (18 stims, R=20, from the per-site recordings) respectively. Note that 33 of the 849 A1 cells have no val data, so under subset='val' their responses are full NaN sentinels. Pair the constructor arg with ds.select_pop_by_stim_attr('subset', 'val') to drop them automatically. The idiomatic alternative is ds.select_stims_by_attr('subset', 'val'), which leaves the full stim bank loaded but applies the bidirectional rule, so cells without val data are hidden from __getitem__.

  • return_waveform (bool, default False) – If True, each stimulus is the raw mono waveform (1, T_audio) at audio_fs Hz instead of the precomputed ozgf cochleagram. The 593 source .wav files (44.1 kHz, 1 s of sound) are read from <path>/wav/ and embedded in the 1.5 s trial window at the recording’s pre-silence offset, then grid-locked to T_audio = T_neural * hop (hop = audio_fs * dt_ms / 1000). Feed it through a model’s wav2spec slot (e.g. CausalGammatone to reproduce the native ozgf front-end). Pass download=True to also fetch wav.zip from Zenodo.

  • audio_fs (int, default 44100) – Audio sample rate for return_waveform=True. The default 44.1 kHz is the native rate of the NAT4 wavs and gives an exact integer hop = 441 at dt_ms = 10 (no resampling). Choose any rate making audio_fs * dt_ms / 1000 an integer. Ignored unless return_waveform=True.
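The integer-hop requirement on audio_fs can be checked before constructing a dataset. The sketch below is a standalone illustration of the arithmetic hop = audio_fs * dt_ms / 1000; the helper name hop_samples is hypothetical and the loader's own validation may differ in detail.

```python
def hop_samples(audio_fs, dt_ms):
    """Audio samples per neural bin; raises if the ratio is not an integer."""
    samples = audio_fs * dt_ms / 1000
    hop = round(samples)
    if abs(samples - hop) > 1e-9:
        raise ValueError(
            f"audio_fs={audio_fs} and dt_ms={dt_ms} give a non-integer hop ({samples})"
        )
    return hop


hop = hop_samples(44100, 10.0)  # 441 samples per 10 ms bin
t_audio = 150 * hop             # 150 neural bins -> 66150 audio samples
```

A rate such as 22050 Hz with 5 ms bins would give 110.25 samples per bin and is rejected by this check.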

class deepSTRF.datasets.audio.NS1Dataset(path: str | None = None, dt_ms: float = 5.0, smooth: bool = True, download: bool = False, return_waveform: bool = False, audio_fs: int = 48000)[source]

Bases: AudioNeuralDataset

PyTorch dataset for the NS1 (Harper et al. 2016, Rahman et al. 2020) data.

119 multi/single units from primary auditory cortex (A1) of deeply anesthetized ferrets, recorded in response to 20 natural sound clips of 4.995 s each, presented 20 times per neuron. Every neuron heard every clip, so the response grid is fully dense (no NaN sentinels).

Of the 119 units, 73 pass the “single-unit at known depth” filter the original authors used (single_t in {'Yes', 'Maybe'} and depth >= 0); select_pop_by_nrn_attr() over single_t / depth_um reproduces this subset.
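The "single-unit at known depth" criterion can be written as a predicate over nrn_meta-style dicts. This is a sketch of the documented rule only: the helper name and the example records are hypothetical, and the actual call signature of select_pop_by_nrn_attr() is not shown here.

```python
def is_single_unit_at_known_depth(meta):
    """True when the manual triage label and the depth pass the documented filter."""
    return meta["single_t"] in {"Yes", "Maybe"} and meta["depth_um"] >= 0


# Hypothetical records, for illustration only.
example_units = [
    {"cell_id": "u0", "single_t": "Yes", "depth_um": 350.0},
    {"cell_id": "u1", "single_t": "No", "depth_um": 500.0},
    {"cell_id": "u2", "single_t": "Maybe", "depth_um": 0.0},
    {"cell_id": "u3", "single_t": "Yes", "depth_um": -1.0},
]
kept = [i for i, meta in enumerate(example_units)
        if is_single_unit_at_known_depth(meta)]
```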

The spectrogram tensor is precomputed at dt = 5 ms (F = 34 frequency bands, T = 999 bins); the dt_ms constructor argument is currently validated against this resolution. With return_waveform=True, stims are instead raw mono waveforms (1, T_audio) at audio_fs (aligned to T_audio = T_neural * audio_fs * dt_ms / 1000) — feed them through a model’s wav2spec front-end.

Data are freely available (no account required) and auto-fetched by NS1Dataset(download=True); the sources are listed under the download parameter.

Notes

Follows the standard deepSTRF data paradigm (see docs/_source/md/data_paradigm.md). NS1-specific metadata:

  • stim_meta dicts hold name and type.

  • nrn_meta dicts hold cell_id, area, depth_um, noise_ratio, single_n, single_t, n_electrodes and electrode_number. noise_ratio is the Sahani-Linden normalised noise power (lower = cleaner; NOT an SNR despite the legacy .mat field name). single_n is the single-unit flag from spike-snippet clustering (0/1); single_t is the manual triage label ('Yes'/'Maybe'/'No').

References

Harper et al. (2016). “Network receptive field modeling reveals extensive integration and multi-feature selectivity in auditory cortical neurons.” PLoS Computational Biology.

Rahman et al. (2020). “Simple transformations capture auditory input to cortex.” PNAS.

Parameters:
  • path (str, optional) – Path to the NS1 data folder containing test_data_5ms.mat, MetadataSHEnCneurons.mat, and spikesandwav/. If None, defaults to the platformdirs cache (user_cache_dir('deepSTRF') / 'NS1' — overridable via $DEEPSTRF_DATA_DIR).

  • dt_ms (float, default 5.0) – Time-bin width in ms. Must equal 5.0 — the bundled spectrogram is precomputed at this resolution. Other values would require re-spectrogramming the wavs (not implemented).

  • smooth (bool, default True) – If True, smooth PSTHs in place with a 21 ms Hanning window (Hsu, Borst & Theunissen 2004).

  • download (bool, default False) – If True and the data assets are missing under path, fetch them from their public sources (no account required) — OSF (https://osf.io/ayw2p/: metadata + spike data + wavs) and DNet GitHub (https://github.com/monzilur/DNet: the precomputed 5 ms mel-spectrogram tensor test_data_5ms.mat). Total ~160 MB, ~16 s on a fast connection. See download_ns1().

  • return_waveform (bool, default False) – If True, self.stims holds raw audio waveforms instead of precomputed spectrograms. Each self.stims[s] is a (1, T_audio) float32 tensor at audio_fs Hz, downmixed to mono, resampled from the native 48 828.125 Hz, and right-cropped / zero-padded to exactly T_neural * audio_fs * dt_ms / 1000 samples so it aligns with the 4.995 s response window. Pair with a model that has a wav2spec front-end (see deepSTRF.models.wav2spec).

  • audio_fs (int, default 48000) – Sample rate (Hz) for waveform mode. Default 48 kHz gives a clean 240 samples per 5 ms bin and a Nyquist of 24 kHz, enough to preserve the ~22.6 kHz content used in the Rahman et al. (2020) cochleagram. (Native is 48 828.125 Hz; the small downsample keeps an integer sample-per-bin factor.) Ignored when return_waveform=False.
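The alignment step described under return_waveform comes down to forcing each waveform to exactly T_neural * audio_fs * dt_ms / 1000 samples. The sketch below shows that arithmetic on a plain list; fit_to_grid is a hypothetical helper, resampling is left out, and padding on the right is an assumption.

```python
def fit_to_grid(samples, n_bins, audio_fs, dt_ms):
    """Right-crop or zero-pad `samples` to n_bins * audio_fs * dt_ms / 1000 samples."""
    target = round(n_bins * audio_fs * dt_ms / 1000)
    if len(samples) >= target:
        return samples[:target]                            # crop the tail
    return samples + [0.0] * (target - len(samples))       # pad the tail with zeros


# NS1 defaults: 999 bins of 5 ms at 48 kHz -> 239760 samples (4.995 s).
aligned = fit_to_grid([0.1] * 239000, n_bins=999, audio_fs=48000, dt_ms=5.0)
```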

class deepSTRF.datasets.audio.Wingert2026Dataset(path: str | None = None, area: None | str | Iterable[str] = None, site: None | str | Iterable[str] = None, dt_ms: float = 10.0, subset: str = 'all', smooth: bool = False, log_compress: bool = True, log_offset: float = -1.0, download: bool = False, include_unlabeled: bool = False, return_waveform: bool = False, audio_fs: int = 44100, prestim_ms: float = 1000.0, _enumerate_only: bool = False)[source]

Bases: AudioNeuralDataset

PyTorch dataset for Wingert et al. 2026 (Nat Neurosci).

A high-density ferret auditory-cortex recording library: 2 128 A1 + 746 PEG + 217 AC + 37 HC single units across 67 recording sites (68 cell_list siteid groups, since SLJ032a’s two-probe recording contributes two siteids: A-probe 'SLJ032a' and B-probe 'SLJ032a-B'). Stimuli are 20–22 s sequences of crossfaded natural sound segments (Audioset Core 3 Complete + Pro Sound Effects). Each site presents ~100 estimation stims (single-rep) and 1–6 test stims (R ranging from 5 to 30 across sites).

The release ships gammatone-gram spectrograms (“cochleagrams”) precomputed at fs = 100 Hz (10 ms bins), F = 32 log-spaced bands from 200 Hz to 20 kHz. The values in stim.h5 are the raw (linear) gammatone-gram; the loader reproduces the paper’s preprocessing on top of them — log compression log(10·x + 1) then per-band minmax to [0, 1] (see log_compress argument). Responses are per-neuron minmax-normalised. This matches aud_subspace_fit_demo.ipynb (NEMS log_compress + normalize('minmax')) to float32 precision. Two stim-duration cohorts coexist in the released data:

  • 47 sites at T = 2000 bins (20 s, no silence flanks);

  • 21 sites at T = 2200 bins (22 s = 1 s pre + 20 s sound + 1 s post).

The deepSTRF data paradigm supports ragged T natively — the per-stim tensor keeps its own time length and collate zero-pads on the right.
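Right zero-padding of ragged time lengths can be sketched on plain lists. This illustrates the idea only: pad_right is a hypothetical helper, it works on 1-D sequences rather than (1, F, T) tensors, and the actual collate function's signature and return values are not shown here.

```python
def pad_right(sequences, fill=0.0):
    """Zero-pad every sequence on the right to the longest length in the batch."""
    lengths = [len(seq) for seq in sequences]
    longest = max(lengths)
    padded = [list(seq) + [fill] * (longest - len(seq)) for seq in sequences]
    # The original lengths are returned so padded bins can be masked later.
    return padded, lengths


# Two toy "stims" with different time lengths.
batch, lengths = pad_right([[1.0, 2.0, 3.0], [4.0]])
```

Keeping the original lengths alongside the padded batch makes it possible to exclude the padded bins from a loss.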

The loader reads the published archive directly with native CSV / JSON / HDF5 parsers — no nems0 dependency. Data are open access at https://doi.org/10.5281/zenodo.18331549 and auto-fetched by Wingert2026Dataset(download=True).

Notes

Follows the standard deepSTRF data paradigm (see docs/_source/md/data_paradigm.md). Wingert-specific metadata:

  • stim_meta dicts hold name (e.g. 'STIM_seq0032.wav'), subset ('est' for STIM_seq*, 'val' for STIM_00*), and site (the cell_list-canonical site id this stim was presented at). The same source wav can appear under multiple (name, site) pairs because each session re-rasterizes its own copy and the two duration cohorts produce different-shape tensors.

  • nrn_meta dicts hold cell_id, site (from cell_list.csv, authoritative), area, layer, depth, narrow, celltype, sw, goodpred, and the parsed animal / electrode / unit_in_electrode components.

The published cell counts hold whenever the cohort uses the standard A1 + PEG filter; AC and HC are exposed but documented as less-curated.

References

Wingert et al. (2026). “Convolutional neural network models describe the encoding subspace of local circuits in auditory cortex.” Nature Neuroscience. https://doi.org/10.1038/s41593-026-02216-0

Parameters:
  • path (str, optional) – Path to the unpacked dataset root (the directory containing recordings/ and cell_list.csv). Defaults to default_cache_dir('Wingert2026').

  • area (str or iterable of str, optional) – Restrict to one or more cortical areas: any of 'A1', 'PEG', 'AC', 'HC'. None (default) loads every area-labelled cell; cells with area=NaN in cell_list.csv (131 cells, presumably sort-failed) are excluded unless include_unlabeled=True.

  • site (str or iterable of str, optional) – Restrict to one or more cell_list siteid values (e.g. 'CLT027c', 'SLJ032a-B', 'PRN018a'). None (default) loads every site that survives the area filter.

  • dt_ms (float, default 10.0) – Time-bin width in ms. Currently must equal 10.0 — the published gammatone-gram is precomputed at fs = 100 and a future down-binning helper is out of v1 scope.

  • subset ({'all', 'est', 'val'}, default 'all') – 'est' keeps only the single-rep STIM_seq* estimation stims; 'val' keeps only the high-rep STIM_00* test stims. The bidirectional select rule applies — cells whose site did not present any retained stim are masked out of __getitem__ automatically.

  • smooth (bool, default False) – If True, smooth PSTHs with a 21 ms Hanning window via self.smooth_responses(window_ms=21.0).

  • log_compress (bool, default True) – If True, apply the David-lab log compression log((x + d) / d) with d = 10**log_offset to the raw (linear) gammatone-gram before normalisation, reproducing the nems.preprocessing.normalization.log_compress step in the paper’s pipeline. Set False to feed the raw linear gtgram.

  • log_offset (float, default -1.0) – Offset exponent for log_compress (d = 10**log_offset). The paper uses -1 (i.e. d = 0.1, so the transform is log(10·x + 1)). Ignored when log_compress=False.

  • download (bool, default False) – If True, fetch recordings.zip + cell_list.csv from Zenodo (record 18331549) if missing. In spectrogram mode the 8 GB wav.zip is NOT fetched (the loader uses the precomputed gtgrams in stim.h5); it is only needed for return_waveform=True.

  • include_unlabeled (bool, default False) – If True, also include the 131 cells in cell_list.csv that lack an area label (and therefore also lack layer / depth / narrow / celltype). These come from three otherwise-unrepresented PRN sessions (PRN010b, PRN011b, PRN020b) and have area=None, layer=None, depth=None, etc. in nrn_meta. goodpred is still populated. The default False matches the paper’s analysis cohort.

  • return_waveform (bool, default False) – If True, each stimulus is the raw mono waveform (1, T_audio) at audio_fs Hz instead of the precomputed gammatone-gram. The source seq*.wav files (44.1 kHz) are read from <path>/wav/ and inset at the recording’s prestim_ms pre-silence offset inside the trial window, then grid-locked to T_audio = T_neural * hop (hop = audio_fs * dt_ms / 1000). Feed it through a model’s wav2spec slot (e.g. CausalGammatone to reproduce the native front-end). Pass download=True to also fetch wav.zip from Zenodo.

  • audio_fs (int, default 44100) – Audio sample rate for return_waveform=True. The default 44.1 kHz is the native rate of the source wavs and gives an exact integer hop = 441 at dt_ms = 10 (no resampling). Choose any rate making audio_fs * dt_ms / 1000 an integer. Ignored unless return_waveform=True.

  • prestim_ms (float, default 1000.0) – Pre-stimulus silence (ms) before the sound onset in the trial window, used only when return_waveform=True to inset the wav so it aligns with the gammatone-gram frames (= response bins). The default 1000 ms (= 100 bins at dt=10 ms) was recovered empirically and is constant across all sites (the gtgram’s leading silence is not in the epoch table). Ignored unless return_waveform=True.

  • _enumerate_only (bool, default False) – Internal flag for tests: populate nrn_meta and N_neurons only, skip the (~1 minute) per-site .tgz read pass. Subclasses of this loader should not rely on it.
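The stimulus preprocessing controlled by log_compress and log_offset can be checked numerically. The sketch below applies log((x + d) / d) with d = 10**log_offset and then min-max scales one band; both helper names are hypothetical, plain lists stand in for tensors, and the natural logarithm is an assumption (the base cancels out under min-max scaling).

```python
import math


def log_compress(x, log_offset=-1.0):
    """log((x + d) / d) with d = 10**log_offset; equals log(10*x + 1) at -1."""
    d = 10.0 ** log_offset
    return math.log((x + d) / d)


def minmax(band):
    """Scale one frequency band to [0, 1]; a constant band maps to zeros."""
    lo, hi = min(band), max(band)
    if hi == lo:
        return [0.0 for _ in band]
    return [(value - lo) / (hi - lo) for value in band]


raw_band = [0.0, 1.0, 3.0]  # toy linear gammatone-gram values for one band
scaled = minmax([log_compress(value) for value in raw_band])
```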

deepSTRF.datasets.audio.download_ac1(dest: str | None = None, *, username: str | None = None, password: str | None = None) str[source]

Fetch the three CRCNS-AC1 archives from the NERSC mirror.

Requires a free CRCNS account (https://crcns.org/register). Credentials can be passed explicitly or sourced from $CRCNS_USERNAME / $CRCNS_PASSWORD. Idempotent: skips archives that already exist on disk; extraction is handled lazily on first dataset instantiation.

Returns the destination directory.
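Credential lookup for the CRCNS download can be sketched as follows. This is an illustration only: resolve_crcns_credentials is a hypothetical helper, and giving explicit arguments precedence over $CRCNS_USERNAME / $CRCNS_PASSWORD is an assumption about the ordering.

```python
import os


def resolve_crcns_credentials(username=None, password=None, environ=None):
    """Return (username, password), falling back to environment variables."""
    env = os.environ if environ is None else environ
    username = username or env.get("CRCNS_USERNAME")
    password = password or env.get("CRCNS_PASSWORD")
    if not username or not password:
        raise RuntimeError(
            "CRCNS credentials missing: pass username/password or set "
            "CRCNS_USERNAME and CRCNS_PASSWORD"
        )
    return username, password


# A dict stands in for os.environ so the example does not depend on the shell.
creds = resolve_crcns_credentials(
    environ={"CRCNS_USERNAME": "alice", "CRCNS_PASSWORD": "secret"}
)
```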

deepSTRF.datasets.audio.download_alice_eeg(dest: str | None = None) str[source]

Download Brodbeck’s restructured Alice EEG release from UMd DRUM.

Idempotent: skips any zip that’s already on disk and any subdirectory that’s already unpacked. Returns the dataset directory.

Parameters:

dest (str, optional) – Defaults to the platformdirs cache (overridable via $DEEPSTRF_DATA_DIR).

Notes

~2.5 GiB total across four zips. Anonymous HTTPS; no auth.

deepSTRF.datasets.audio.download_downer2025(dest: str | None = None) str[source]

Download the Downer 2025 / Ahmed 2025 archive from Zenodo.

The archive (auditory_cortex_data.zip, ~29 GB) ships in Zenodo record 10.5281/zenodo.16175377 and unzips to <dest>/auditory_cortex_data/ — the standard layout the Downer2025Dataset constructor expects.

Parameters:

dest (str, optional) – Parent directory the archive is downloaded into. Defaults to default_cache_dir('Downer2025') (overridable via $DEEPSTRF_DATA_DIR).

Returns:

Path to the extracted auditory_cortex_data/ directory.

Return type:

str

Notes

Idempotent: skips the zip download if already present, and skips the unzip step if the expected auditory_cortex_data/sessions/ subdirectory already exists.

Heads up: the archive is ~29 GB on disk; the unpacked directory is also ~29 GB. Allow ~60 GB total during extraction (zip plus contents); you can delete the zip once unpacking is complete.

deepSTRF.datasets.audio.download_espejo(stimuli: str, dest: str | None = None) str[source]

Download one Espejo stimulus set from Zenodo into dest.

Parameters:
  • stimuli ({'nat', 'vmn'}) – Which stimulus set to download: NAT or VMN.

  • dest (str, optional) – Defaults to default_cache_dir('Espejo') (overridable via $DEEPSTRF_DATA_DIR).

Returns:

The dataset root directory.

Return type:

str

Notes

Idempotent: skips the archive if already present, and skips the untar step if the expected <subdir>/ already exists. NAT is ~638 MB, VMN is ~25 MB.

deepSTRF.datasets.audio.download_espejo_nat_waveforms(names: Sequence[str], dest: str | None = None, *, progress: bool = True) Dict[str, str][source]

Fetch the raw NAT waveforms from the LBHB baphy bitbucket mirror.

Parameters:
  • names (sequence of str) – Stim names as they appear in stim_meta (STIM_<file>.wav); the STIM_ prefix is stripped to get the on-mirror filename.

  • dest (str, optional) – Parent directory; wavs are cached under <dest>/nat_waveforms/. Defaults to default_cache_dir('Espejo').

  • progress (bool, default True) – Show a tqdm bar over the (missing) downloads.

Returns:

name -> local wav path for every name found on the mirror.

Return type:

dict

Notes

Idempotent — already-cached wavs are skipped. Each filename is tried in sounds_set3/ first, then sounds/. Names found in neither are collected and surfaced by the caller (the dataset raises on genuine misses so waveform mode never silently substitutes silence).
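The name-to-filename mapping and lookup order described above can be sketched as a pure function. The helper mirror_candidates and the example stim name are hypothetical; only the STIM_ prefix stripping and the sounds_set3/ then sounds/ order come from the description, and no network access is performed.

```python
def mirror_candidates(stim_name, folders=("sounds_set3", "sounds")):
    """Return the relative paths to try, in order, for one stim name."""
    prefix = "STIM_"
    filename = stim_name[len(prefix):] if stim_name.startswith(prefix) else stim_name
    return [f"{folder}/{filename}" for folder in folders]


# Hypothetical stim name, for illustration only.
candidates = mirror_candidates("STIM_example_sound.wav")
```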

deepSTRF.datasets.audio.download_wingert2026(dest: str | None = None, wav: bool = False) str[source]

Download the Wingert 2026 release from Zenodo into dest.

Fetches recordings.zip (~4.35 GB of per-site .tgz archives, the only large file the spectrogram loader needs) and cell_list.csv (~5.4 MB of per-cell metadata). Does NOT fetch models.zip (published CNN / LN / subspace fits, not used by deepSTRF).

Idempotent — skips files / dirs that already exist.

Parameters:
  • dest (str, optional) – Defaults to default_cache_dir('Wingert2026') (overridable via $DEEPSTRF_DATA_DIR).

  • wav (bool, default False) – If True, also fetch and unpack wav.zip (~3.7 GB of source waveforms, 44.1 kHz) into <dest>/wav/ for the raw-waveform branch (Wingert2026Dataset(return_waveform=True)). The spectrogram-mode loader does not need it.

Returns:

The destination directory.

Return type:

str