deepSTRF.datasets.audio package
Submodules
deepSTRF.datasets.audio.audio_dataset module
- class deepSTRF.datasets.audio.audio_dataset.AudioNeuralDataset(path: str, dt_ms: float)[source]
Bases: NeuralDataset
Neural dataset class for auditory stimuli.
Stim shape is polymorphic depending on the loading mode:
Spectrogram mode (default): self.stims[s] is a (1, F, T) tensor where F = self.F is the frequency-band count and T is the neural time-bin count.
Waveform mode (opt-in, subclass-specific): self.stims[s] is a (1, T_audio) mono float32 tensor at sample rate self.audio_fs. Subclasses that support this mode expose a return_waveform=True constructor flag and set self.audio_fs to a positive int. The (1, ...) leading dim is the mono-channel axis, kept for collate compatibility (neural_collate zero-pads the last axis only).
Subclasses must additionally set self.F in their __init__, before calling self.validate(). self.F is the number of frequency bins in the spectrogram; it is kept positive even in waveform mode so downstream models know the target spectrogram width a wav2spec module should produce.
- F
Frequency-band count of the target spectrogram. Set by the subclass.
- Type:
int
- audio_fs
Sample rate of the raw waveform when in waveform mode; None otherwise. Subclasses without a waveform branch leave this None.
- Type:
int or None
- hearing_range_hz
Optional (low, high) informational bound on the species’ canonical hearing range in Hz (e.g. (200.0, 40000.0) for ferret). Purely advisory: nothing is enforced against it; it exists so notebooks / tooling can display the range and users can choose to clamp a wav2spec’s frequency limits. None when unknown.
- Type:
tuple of float or None
- get_F()[source]
Return the number of frequency bins in the spectrograms.
- Returns:
self.F, the spectrogram frequency-band count.
- Return type:
int
- property hop: int | None
Audio samples per neural bin in waveform mode (None in spec mode). hop = round(audio_fs * dt_ms / 1000). The grid-lock contract (see validate()) requires audio_fs * dt_ms / 1000 to be an exact integer, so a wav2spec front-end’s own hop must equal this value for the audio→neural resampling to stay aligned with the response bins.
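The hop relation can be checked with a few lines of plain Python. This is a minimal sketch, not the library's implementation: hop_samples is a hypothetical helper name, and using exact fractions for the integrality test is an assumption about how one might enforce the grid-lock contract.

```python
from fractions import Fraction
from typing import Optional


def hop_samples(audio_fs: Optional[int], dt_ms: float) -> Optional[int]:
    """Audio samples per neural bin, or None in spectrogram mode (hypothetical helper)."""
    if audio_fs is None:
        return None
    # Exact rational arithmetic avoids float round-off in the integrality check.
    hop = Fraction(audio_fs) * Fraction(str(dt_ms)) / 1000
    if hop.denominator != 1:
        raise ValueError(
            f"audio_fs={audio_fs} Hz with dt_ms={dt_ms} gives a non-integer hop ({float(hop)})"
        )
    return int(hop)


print(hop_samples(48000, 5.0))   # NS1 default: 240
print(hop_samples(44100, 10.0))  # NAT4 / Espejo default: 441
```

A rate such as 24414 Hz at dt_ms = 1 raises here, which is the situation the AA4 loader avoids by resampling to 24 kHz.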
deepSTRF.datasets.audio.ns1_drc module
- class deepSTRF.datasets.audio.ns1.NS1Dataset(path: str | None = None, dt_ms: float = 5.0, smooth: bool = True, download: bool = False, return_waveform: bool = False, audio_fs: int = 48000)[source]
Bases: AudioNeuralDataset
PyTorch dataset for the NS1 (Harper et al. 2016, Rahman et al. 2020) data.
119 multi/single units from primary auditory cortex (A1) of deeply anesthetized ferrets, recorded in response to 20 natural sound clips of 4.995 s each, presented 20 times per neuron. Every neuron heard every clip, so the response grid is fully dense (no NaN sentinels).
Of the 119 units, 73 pass the “single-unit at known depth” filter the original authors used (single_t in {'Yes', 'Maybe'} and depth >= 0); select_pop_by_nrn_attr() over single_t / depth_um reproduces this subset.
The spectrogram tensor is precomputed at dt = 5 ms (F = 34 frequency bands, T = 999 bins); the dt_ms constructor argument is currently validated against this resolution. With return_waveform=True, stims are instead raw mono waveforms (1, T_audio) at audio_fs (aligned to T_audio = T_neural * audio_fs * dt_ms / 1000); feed them through a model’s wav2spec front-end.
Data are freely available (no account required) and auto-fetched by NS1Dataset(download=True):
https://osf.io/ayw2p/: metadata, raw spike and wav data.
https://github.com/monzilur/DNet: precomputed 5 ms mel spectrogram.
Notes
Follows the standard deepSTRF data paradigm (see docs/_source/md/data_paradigm.md). NS1-specific metadata:
stim_meta dicts hold name and type.
nrn_meta dicts hold cell_id, area, depth_um, noise_ratio, single_n, single_t, n_electrodes and electrode_number. noise_ratio is the Sahani-Linden normalised noise power (lower = cleaner; NOT an SNR despite the legacy .mat field name). single_n is the single-unit flag from spike-snippet clustering (0/1); single_t is the manual triage label (‘Yes’ / ‘Maybe’ / ‘No’).
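The “single-unit at known depth” criterion can be expressed directly over nrn_meta-style dicts. The sketch below uses made-up entries and a hypothetical helper name; it only illustrates the rule (single_t in {'Yes', 'Maybe'} and depth >= 0) and does not call select_pop_by_nrn_attr(). Encoding an unknown depth as a negative number is an assumption made for the example.

```python
from typing import Dict, List

# Made-up metadata entries for illustration; real values come from the dataset.
nrn_meta: List[Dict] = [
    {"cell_id": "a", "single_t": "Yes", "depth_um": 600.0},
    {"cell_id": "b", "single_t": "No", "depth_um": 400.0},
    {"cell_id": "c", "single_t": "Maybe", "depth_um": -1.0},  # assumed "unknown depth"
    {"cell_id": "d", "single_t": "Maybe", "depth_um": 0.0},
]


def is_single_unit_at_known_depth(meta: Dict) -> bool:
    """Apply the triage-label and depth criterion to one neuron's metadata."""
    return meta["single_t"] in {"Yes", "Maybe"} and meta["depth_um"] >= 0


kept_ids = [m["cell_id"] for m in nrn_meta if is_single_unit_at_known_depth(m)]
print(kept_ids)
```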
References
Harper et al. (2016). “Network receptive field modeling reveals extensive integration and multi-feature selectivity in auditory cortical neurons.” PLoS Computational Biology.
Rahman et al. (2020). “Simple transformations capture auditory input to cortex.” PNAS.
- Parameters:
path (str, optional) – Path to the NS1 data folder containing test_data_5ms.mat, MetadataSHEnCneurons.mat, and spikesandwav/. If None, defaults to the platformdirs cache (user_cache_dir('deepSTRF') / 'NS1', overridable via $DEEPSTRF_DATA_DIR).
dt_ms (float, default 5.0) – Time-bin width in ms. Must equal 5.0, because the bundled spectrogram is precomputed at this resolution. Other values would require re-spectrogramming the wavs (not implemented).
smooth (bool, default True) – If True, smooth PSTHs in place with a 21 ms Hanning window (Hsu, Borst & Theunissen 2004).
download (bool, default False) – If True and the data assets are missing under path, fetch them from their public sources (no account required): OSF (https://osf.io/ayw2p/: metadata + spike data + wavs) and DNet GitHub (https://github.com/monzilur/DNet: the precomputed 5 ms mel-spectrogram tensor test_data_5ms.mat). Total ~160 MB, ~16 s on a fast connection. See download_ns1().
return_waveform (bool, default False) – If True, self.stims holds raw audio waveforms instead of precomputed spectrograms. Each self.stims[s] is a (1, T_audio) float32 tensor at audio_fs Hz, downmixed to mono, resampled from the native 48 828.125 Hz, and right-cropped / zero-padded to exactly T_neural * audio_fs * dt_ms / 1000 samples so it aligns with the 4.995 s response window. Pair with a model that has a wav2spec front-end (see deepSTRF.models.wav2spec).
audio_fs (int, default 48000) – Sample rate (Hz) for waveform mode. The default 48 kHz gives a clean 240 samples per 5 ms bin and a Nyquist of 24 kHz, enough to preserve the ~22.6 kHz content used in Rahman et al. 2019’s cochleagram. (Native is 48 828.125 Hz; the small downsample keeps an integer sample-per-bin factor.) Ignored when return_waveform=False.
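The length arithmetic for waveform mode can be sketched as follows. fit_to_grid is a hypothetical helper operating on a plain list rather than a tensor, and padding on the right-hand side is an assumption; the source only states that the waveform is right-cropped or zero-padded to the target length.

```python
from typing import List


def fit_to_grid(wave: List[float], t_neural: int, hop: int) -> List[float]:
    """Crop or zero-pad a mono waveform to exactly t_neural * hop samples."""
    n_target = t_neural * hop
    if len(wave) >= n_target:
        return wave[:n_target]  # right-crop: drop the trailing samples
    return wave + [0.0] * (n_target - len(wave))  # zero-pad at the end (assumed side)


# NS1 defaults: T = 999 bins of 5 ms at 48 kHz, i.e. hop = 240 samples per bin.
t_neural, hop, audio_fs = 999, 240, 48000
aligned = fit_to_grid([0.1] * 250_000, t_neural, hop)
print(len(aligned), len(aligned) / audio_fs)  # sample count and duration in seconds
```

With these values the waveform holds 239760 samples, which spans the 4.995 s response window.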
- deepSTRF.datasets.audio.ns1.download_ns1(dest: str | None = None) str[source]
Download all NS1 data assets into dest.
Sources:
OSF (https://osf.io/ayw2p/, no account): the dataset README, the per-neuron metadata (.mat), and the spike + wav zip (~155 MB total).
DNet GitHub (https://github.com/monzilur/DNet, master branch): the precomputed 5 ms mel-spectrogram tensor test_data_5ms.mat (5.2 MB) accompanying Rahman et al. 2019 PLoS Comp Biol. NOT on OSF.
Idempotent: skips files that already exist; returns the destination path.
- Parameters:
dest (str, optional) – Where to put the downloaded files. Defaults to the platformdirs cache (overridable via $DEEPSTRF_DATA_DIR).
- Returns:
Absolute path to the dataset directory.
- Return type:
str
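The idempotent behaviour described for download_ns1() (skip files that already exist, return the destination path) follows a common pattern. The sketch below illustrates that pattern only: fetch_missing, fake_fetch and the file names are hypothetical, and no network access is performed.

```python
import os
import tempfile
from typing import Callable, Iterable, List


def fetch_missing(dest: str, filenames: Iterable[str],
                  fetch: Callable[[str, str], None]) -> str:
    """Fetch each file into dest unless it is already there; return dest."""
    os.makedirs(dest, exist_ok=True)
    for name in filenames:
        target = os.path.join(dest, name)
        if os.path.exists(target):
            continue  # already present: skip, so reruns do no extra work
        fetch(name, target)
    return os.path.abspath(dest)


# Stand-in for a real download: writes a placeholder file and records the call.
calls: List[str] = []


def fake_fetch(name: str, target: str) -> None:
    calls.append(name)
    with open(target, "w") as fh:
        fh.write("placeholder")


with tempfile.TemporaryDirectory() as tmp:
    files = ["metadata.mat", "spectrogram.mat"]  # hypothetical file names
    first = fetch_missing(tmp, files, fake_fetch)
    second = fetch_missing(tmp, files, fake_fetch)  # second run fetches nothing

print(calls)
```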
deepSTRF.datasets.audio.nat4 module
- class deepSTRF.datasets.audio.nat4.NAT4Dataset(path: str | None = None, area: str = 'A1', dt_ms: float = 10.0, smooth: bool = False, download: bool = False, subset: str = 'all', return_waveform: bool = False, audio_fs: int = 44100)[source]
Bases: AudioNeuralDataset
PyTorch dataset for NAT4 (Pennington & David, 2022 / 2023).
Two cortical areas: A1 (primary, 849 cells of which 777 auditory) and PEG (secondary, 398 of which 339 auditory). Pass area=...; one instance covers one area. To pool both, instantiate twice and call concat_neural_datasets([a1, peg]).
There are 595 stimuli total: 18 high-rep (val, 20 trials) + 577 low-rep (est, 1 trial), each clip 1.5 s. The default time bin is dt_ms = 10 (the population recording is precomputed at fs=100 with val pre-averaged over 20 reps; per-site spike trains are at fs=1000 and downsampled to 10 ms by summing). The spectrogram has F = 18 ozgf bands and T = 150 frames per stim.
The loader reads the published NAT4 archive directly with native CSV / JSON / HDF5 parsers, with no NEMS0 dependency. Data are freely available at https://doi.org/10.5281/zenodo.8044773 (no account required) and auto-fetched by NAT4Dataset(download=True).
Notes
Follows the standard deepSTRF data paradigm (see docs/_source/md/data_paradigm.md). NAT4-specific metadata:
stim_meta dicts hold name and subset ('est' or 'val'); the subset='all'|'est'|'val' constructor argument filters this list at load time.
nrn_meta dicts hold cell_id (raw NEMS id, e.g. 'ARM029a-01-1'), area, auditory (flag from the dataset’s <area>_pred_correlation.csv), and the parsed components site (e.g. 'ARM029a'), animal (3-char site prefix, e.g. 'ARM'), electrode (int) and unit_in_electrode (int). Components default to None for any cell whose id does not match the standard <site>-<elec>-<unit> scheme.
est responses have shape (R=1, T=150) and val responses (R=20, T=150); the (1, 1) NaN sentinel marks (stim, neuron) pairs where the cell was not recorded for that stim.
With return_waveform=True, stims are instead the raw mono waveforms (1, T_audio = T * hop) at audio_fs (hop=441 at 44.1 kHz / 10 ms); feed them through a model’s wav2spec slot.
References
Pennington & David (2022, preprint). “Can deep learning provide a generalizable model for dynamic sound encoding in auditory cortex?”
Pennington & David (2023). “A convolutional neural network provides a generalizable model of natural sound coding by neural populations in auditory cortex.” PLOS Computational Biology.
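The <site>-<elec>-<unit> decomposition of cell_id described in the Notes can be sketched with a regular expression. parse_cell_id and its pattern are assumptions for illustration, not the loader's actual parser; only the field names and the 'ARM029a-01-1' example come from the description above.

```python
import re
from typing import Dict, Union

# Assumed pattern: anything without a hyphen, then two hyphen-separated integers.
_CELL_ID = re.compile(r"^([^-]+)-(\d+)-(\d+)$")


def parse_cell_id(cell_id: str) -> Dict[str, Union[str, int, None]]:
    """Split a NEMS-style cell id into site, animal, electrode and unit."""
    match = _CELL_ID.match(cell_id)
    if match is None:
        # Non-standard ids: every parsed component defaults to None.
        return {"site": None, "animal": None,
                "electrode": None, "unit_in_electrode": None}
    site, elec, unit = match.groups()
    return {"site": site, "animal": site[:3],
            "electrode": int(elec), "unit_in_electrode": int(unit)}


print(parse_cell_id("ARM029a-01-1"))
```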
- Parameters:
path (str, optional) – Path to the NAT4 data folder. Defaults to the platformdirs cache.
area ({'A1', 'PEG'}) – Cortical area.
dt_ms (float, default 10.0) – Time-bin width in ms. Currently must equal 10.0; the population recording is precomputed at fs=100 and the per-site downsampling assumes a fixed 10x ratio from fs=1000.
smooth (bool, default False) – If True, smooth PSTHs with a 21 ms Hanning window. Off by default here because NAT4 trials are typically used as-is for STRF fitting (unlike CRCNS-AA where smoothing is the published norm).
download (bool, default False) – If True and the data is missing under path, fetch it from Zenodo (record 8044773).
subset ({'all', 'est', 'val'}, default 'all') – If ‘est’ or ‘val’, only that stimulus subset is loaded: stim_meta / stims / responses shrink accordingly, and the (more expensive) per-site spike-time pass is skipped entirely under subset='est'. The two subsets correspond to Pennington & David’s published estimation set (575 stims, R=1, from the population recording) and validation set (18 stims, R=20, from the per-site recordings) respectively. Note that 33 of the 849 A1 cells have no val data; under subset='val' their responses are full NaN sentinels. Pair the constructor arg with ds.select_pop_by_stim_attr('subset', 'val') to drop them automatically. The idiomatic alternative is ds.select_stims_by_attr('subset', 'val'), which leaves the full stim bank loaded but applies the bidirectional rule, so cells without val data are hidden from __getitem__.
return_waveform (bool, default False) – If True, each stimulus is the raw mono waveform (1, T_audio) at audio_fs Hz instead of the precomputed ozgf cochleagram. The 593 source .wav files (44.1 kHz, 1 s of sound) are read from <path>/wav/ and embedded in the 1.5 s trial window at the recording’s pre-silence offset, then grid-locked to T_audio = T_neural * hop (hop = audio_fs * dt_ms / 1000). Feed it through a model’s wav2spec slot (e.g. CausalGammatone to reproduce the native ozgf front-end). Pass download=True to also fetch wav.zip from Zenodo.
audio_fs (int, default 44100) – Audio sample rate for return_waveform=True. The default 44.1 kHz is the native rate of the NAT4 wavs and gives an exact integer hop = 441 at dt_ms = 10 (no resampling). Choose any rate making audio_fs * dt_ms / 1000 an integer. Ignored unless return_waveform=True.
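The fixed 10x downsampling of per-site spike trains (fs=1000 to 10 ms bins, by summing) amounts to the following. This is a sketch under stated assumptions: bin_spikes is a hypothetical name, the input is a plain list of per-millisecond spike counts, and dropping any incomplete trailing bin is an assumption.

```python
from typing import List


def bin_spikes(train_1khz: List[int], factor: int = 10) -> List[int]:
    """Sum consecutive groups of `factor` samples into one coarser bin."""
    n_bins = len(train_1khz) // factor  # an incomplete trailing bin is dropped (assumed)
    return [sum(train_1khz[b * factor:(b + 1) * factor]) for b in range(n_bins)]


# A 1.5 s trial at 1 kHz is 1500 samples; one spike every 4 ms for illustration.
trial = [1 if t % 4 == 0 else 0 for t in range(1500)]
binned = bin_spikes(trial)
print(len(binned), sum(binned) == sum(trial))
```

Summing preserves spike counts, so the 1500-sample trial becomes the 150 frames per stimulus quoted above.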
- deepSTRF.datasets.audio.nat4.download_nat4(area: str, dest: str | None = None, wav: bool = False) str[source]
Download the NAT4 release from Zenodo into dest.
Fetches the population .tgz, the per-cell auditory CSV, and the per-site .zip. The single-sites zip is unpacked into <dest>/<area>_single_sites/ so the loader finds the per-site .tgzs where it expects them.
Idempotent: skips files / dirs that already exist.
- Parameters:
area ({'A1', 'PEG'})
dest (str, optional) – Defaults to default_cache_dir('NAT4') (overridable via $DEEPSTRF_DATA_DIR).
wav (bool, default False) – If True, also fetch and unpack wav.zip (the 593 source waveforms, 44.1 kHz / 1 s each) into <dest>/wav/ for the raw-waveform branch (NAT4Dataset(return_waveform=True)). The spectrogram-mode loader does not need it.
deepSTRF.datasets.audio.wehr module
deepSTRF.datasets.audio.asari module
deepSTRF.datasets.audio.crcns_aa1 module
- class deepSTRF.datasets.audio.crcns_aa1.CRCNSAA1Dataset(path: str | None = None, areas=('Field_L', 'MLd'), stimuli=('conspecific', 'flatrip'), animals='all', dt_ms=1, smooth=True, n_mels=32, compression='cubic', window_ms: float = 10.0, return_waveform: bool = False, audio_fs: int = 32000, download: bool = False, username: str | None = None, password: str | None = None)[source]
Bases: AudioNeuralDataset
PyTorch dataset for the CRCNS-AA1 recordings.
Extracellular, spike-sorted single units of anesthetized male zebra finches: 50 cells in Field L and 50 in MLd, recorded in response to 10 clips of conspecific vocalizations and 20 clips of flat ripples (up to 5 s each, ~10 trials on average). Data are available at https://crcns.org/data-sets/aa/aa-1/about (free CRCNS account); see the AA1 README in the deepSTRF docs for the full notes.
Notes
Follows the standard deepSTRF data paradigm (see docs/_source/md/data_paradigm.md). AA1-specific metadata:
stims are mel-spectrograms (1, F, T_s).
stim_meta dicts hold name, type, sample_rate, n_samples and duration_s.
nrn_meta dicts hold cell_id, animal_id, area, cell_seq and rig. cell_seq is the sequential cell index parsed from the cell folder name (the n-th cell recorded); rig is the single-letter rig label when present, else None (cells “4_A” and “4_B” were recorded simultaneously, possibly in different areas).
Two cells lack conspecific responses: pipu1018_2_A (MLd) and pipu1018_2_B (Field_L).
References
Woolley et al. (2005). “Tuning for Spectro-temporal Modulations: a Mechanism for Auditory Discrimination of Natural Sound.”
Hsu et al. (2004). “Modulation power and phase spectrum of natural sounds enhance neural encoding performed by single auditory neurons.”
Singh & Theunissen (2003). “Modulation spectra of natural sounds and ethological theories of auditory processing.”
Initializes the AA1 Dataset.
- Parameters:
path (str, optional) – Path to the AA1 data folder (containing Field_L_cells/, MLd_cells/, all_stims/). Defaults to the platformdirs cache ($DEEPSTRF_DATA_DIR overrides).
areas (tuple of str) – Recording sites: ‘Field_L’ or ‘MLd’.
stimuli (tuple of str) – Stimulus types: ‘conspecific’ or ‘flatrip’.
dt_ms (float) – Time step size in ms.
n_mels (int) – Number of mel frequency bands.
compression (str) – Spectrogram compression (‘cubic’, ‘log1p’, ‘none’). Ignored when return_waveform=True.
window_ms (float, default 10.0) – FFT analysis-window length in ms. n_fft is computed as round(window_ms * 1e-3 * sample_rate) and is decoupled from hop_length so phonemic detail is preserved at any dt_ms. Earlier versions of this dataset hardcoded n_fft = 10 * hop_length, which is benign at the default dt_ms=1 (10 ms FFT window), but at dt_ms=50 the same formula produced a 500 ms FFT window and over-smoothed every spec frame. The default window_ms=10.0 preserves bit-identical behaviour at dt_ms=1 while removing the scaling bug at coarser bins. Speech-pipeline users may prefer window_ms=25.0 (Kaldi default). Ignored when return_waveform=True.
return_waveform (bool, default False) – If True, self.stims[s] holds the raw audio waveform (1, T_audio) at audio_fs Hz (grid-locked to T_audio = T_neural * hop) instead of the in-loader mel spectrogram. Pair with a model whose wav2spec slot is a waveform front-end (see deepSTRF.models.wav2spec); responses are unchanged.
audio_fs (int, default 32000) – Sample rate for waveform mode. Default 32 kHz is the native rate of the AA1 wavs (so no resampling); other values resample and must keep audio_fs * dt_ms / 1000 an integer. Ignored unless return_waveform=True.
download (bool, default False) – If True and the data is missing under path, fetch the ~17 MB CRCNS-AA1 archive from the NERSC mirror (free CRCNS account required; see crcns_download) and unzip in place.
username (str, optional) – CRCNS credentials. Default to $CRCNS_USERNAME / $CRCNS_PASSWORD. Prefer the env vars over passing literals: anything in source or a notebook ends up in history, logs or VCS.
password (str, optional) – CRCNS credentials. Default to $CRCNS_USERNAME / $CRCNS_PASSWORD. Prefer the env vars over passing literals: anything in source or a notebook ends up in history, logs or VCS.
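The window_ms arithmetic is easy to verify by hand. The sketch below compares the legacy rule (n_fft = 10 * hop_length) with the current one (n_fft = round(window_ms * 1e-3 * sample_rate)) at the native 32 kHz rate; the three function names are hypothetical and exist only for this illustration.

```python
def hop_length(sample_rate: int, dt_ms: float) -> int:
    """Samples per time bin."""
    return round(sample_rate * dt_ms / 1000)


def n_fft_legacy(sample_rate: int, dt_ms: float) -> int:
    """Old rule: the FFT window scales with the bin width."""
    return 10 * hop_length(sample_rate, dt_ms)


def n_fft_current(sample_rate: int, window_ms: float = 10.0) -> int:
    """Current rule: the FFT window depends on window_ms only."""
    return round(window_ms * 1e-3 * sample_rate)


sr = 32000
for dt_ms in (1, 50):
    legacy = n_fft_legacy(sr, dt_ms)
    current = n_fft_current(sr)
    # Window durations in ms: legacy grows with dt_ms, current stays at 10 ms.
    print(dt_ms, legacy, 1000 * legacy / sr, current, 1000 * current / sr)
```

At dt_ms = 1 both rules give n_fft = 320 (10 ms); at dt_ms = 50 the legacy rule gives 16000 samples (500 ms) while the current rule stays at 320.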
- deepSTRF.datasets.audio.crcns_aa1.download_aa1(dest: str | None = None, username: str | None = None, password: str | None = None) str[source]
Download the CRCNS-AA1 archive from the NERSC mirror into dest.
Idempotent: skips the archive if already present, and skips unzipping if Field_L_cells/ already exists in dest. Returns the dataset directory.
- Parameters:
dest (str, optional) – Defaults to the platformdirs cache (overridable via $DEEPSTRF_DATA_DIR).
username (str, optional) – Default to $CRCNS_USERNAME / $CRCNS_PASSWORD. Free CRCNS account at https://crcns.org/register.
password (str, optional) – Default to $CRCNS_USERNAME / $CRCNS_PASSWORD. Free CRCNS account at https://crcns.org/register.
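The credential fallback (explicit arguments first, then $CRCNS_USERNAME / $CRCNS_PASSWORD) can be sketched as below. resolve_crcns_credentials is a hypothetical helper, and the injectable env mapping is an assumption added so that the example runs without touching the real environment.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_crcns_credentials(
    username: Optional[str] = None,
    password: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Return (username, password), falling back to environment variables."""
    env = os.environ if env is None else env
    username = username or env.get("CRCNS_USERNAME")
    password = password or env.get("CRCNS_PASSWORD")
    if not username or not password:
        raise RuntimeError(
            "Set CRCNS_USERNAME and CRCNS_PASSWORD, or pass both explicitly."
        )
    return username, password


# Placeholder values only; real credentials should live in the environment.
fake_env = {"CRCNS_USERNAME": "alice", "CRCNS_PASSWORD": "not-a-real-password"}
print(resolve_crcns_credentials(env=fake_env)[0])
```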
- deepSTRF.datasets.audio.crcns_aa1.get_animals_ids(data_path)[source]
Takes the path of the ‘CRCNS_AA1/data/’ folder, goes through ‘Field_L/’ and ‘MLd’, and returns a list of unique animal ids. An animal id is the string preceding the first underscore of each subfolder name, e.g. ‘gg0304_4_B’ → ‘gg0304’.
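The id rule (everything before the first underscore) is shown below on a list of folder names instead of a real directory. Both helper names are hypothetical, and returning the ids sorted is an assumption; the description above only promises a list of unique ids.

```python
from typing import Iterable, List


def animal_id_from_folder(folder_name: str) -> str:
    """Return the substring preceding the first underscore."""
    return folder_name.split("_", 1)[0]


def unique_animal_ids(folder_names: Iterable[str]) -> List[str]:
    """Collect unique animal ids; sorted here for a stable order (assumed)."""
    return sorted({animal_id_from_folder(name) for name in folder_names})


folders = ["gg0304_4_B", "gg0304_4_A", "pipu1018_2_A", "pipu1018_2_B"]
print(unique_animal_ids(folders))
```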
deepSTRF.datasets.audio.crcns_aa2 module
- class deepSTRF.datasets.audio.crcns_aa2.CRCNSAA2Dataset(path: str | None = None, areas=('Field_L', 'mld', 'OV', 'CM', 'None'), stimuli=('conspecific', 'flatrip', 'songrip'), animals='all', dt_ms=1, smooth=True, n_mels=32, compression='cubic', window_ms: float = 10.0, return_waveform: bool = False, audio_fs: int = 32000, download: bool = False, username: str | None = None, password: str | None = None)[source]
Bases: AudioNeuralDataset
PyTorch dataset for the CRCNS-AA2 recordings (OV, MLd, Field L, CM).
494 extracellular, spike-sorted single units of male zebra finches, identified in OV, MLd, Field L, L1, L2a, L2b, L3 (and some with unidentified area, None). Three stimulus classes, namely conspecific songs (72 stims), flat ripples (20) and song ripples (25), were each presented 10-20 times, with low trial-to-trial variability. Almost all cells saw conspecific and songrip stimuli; about half saw flatrip. Population fitting-compatible. Data are available at https://crcns.org/data-sets/aa/aa-2/about (free CRCNS account).
Notes
Follows the standard deepSTRF data paradigm (see docs/_source/md/data_paradigm.md). AA2-specific metadata:
stims are mel-spectrograms (1, F, T_s).
stim_meta dicts hold name, type, sample_rate, n_samples and duration_s (the last three from data/stim_data.csv).
nrn_meta dicts hold cell_id, animal_id, area, cell_seq and rig (see CRCNSAA1Dataset for the cell-name format; rig is often None in AA2).
References
Gill et al. (2006). “Sound representation methods for spectro-temporal receptive field estimation.”
Amin et al. (2010). “Role of the Zebra Finch Auditory Thalamus in Generating Complex Representations for Natural Sounds.”
Initializes the AA2 Dataset.
- Parameters:
path (str, optional) – Path to the AA2 data folder. Defaults to the platformdirs cache.
areas (tuple of str) – Recording sites of interest: ‘Field_L’, ‘L1’, ‘L2a’, ‘L2b’, ‘L3’, ‘mld’, ‘OV’, ‘CM’, or ‘None’.
stimuli (tuple of str) – Stimulus types of interest: ‘conspecific’, ‘flatrip’, ‘songrip’.
dt_ms (float) – Time step size in ms.
n_mels (int) – Number of mel frequency bands.
compression (str) – Spectrogram compression (‘cubic’, ‘log1p’, ‘none’). Ignored when return_waveform=True.
window_ms (float, default 10.0) – FFT analysis-window length in ms. n_fft is computed as round(window_ms * 1e-3 * sample_rate) and is decoupled from hop_length so phonemic detail is preserved at any dt_ms. Earlier versions of this dataset hardcoded n_fft = 10 * hop_length, which gave a benign 10 ms FFT window at dt_ms=1 but a 500 ms window at dt_ms=50. Default window_ms=10.0 preserves bit-identical behaviour at dt_ms=1 while fixing the scaling bug at coarser bins. Ignored when return_waveform=True.
return_waveform (bool, default False) – If True, self.stims[s] holds the raw audio waveform (1, T_audio) at audio_fs Hz (grid-locked to T_audio = T_neural * hop) instead of the in-loader mel spectrogram. Pair with a model whose wav2spec slot is a waveform front-end; responses are unchanged.
audio_fs (int, default 32000) – Sample rate for waveform mode. Default 32 kHz is the native rate of the AA2 wavs (no resampling); other values resample and must keep audio_fs * dt_ms / 1000 an integer. Ignored unless return_waveform=True.
download (bool, default False) – If True and the data is missing under path, fetch the ~30 MB worth of CRCNS-AA2 archives from the NERSC mirror (free CRCNS account required) and extract in place.
username (str, optional) – CRCNS credentials. Default to $CRCNS_USERNAME / $CRCNS_PASSWORD. Prefer env vars over passing literals.
password (str, optional) – CRCNS credentials. Default to $CRCNS_USERNAME / $CRCNS_PASSWORD. Prefer env vars over passing literals.
- deepSTRF.datasets.audio.crcns_aa2.download_aa2(dest: str | None = None, username: str | None = None, password: str | None = None) str[source]
Download the CRCNS-AA2 archives from the NERSC mirror into dest.
Idempotent: skips an archive if already on disk, and skips extraction of an archive if its anchor sub-tree (all_cells/, all_stims/, or stim_data.csv) already exists.
- Parameters:
dest (str, optional) – Defaults to default_cache_dir('AA2') (overridable via $DEEPSTRF_DATA_DIR).
username (str, optional) – Default to $CRCNS_USERNAME / $CRCNS_PASSWORD.
password (str, optional) – Default to $CRCNS_USERNAME / $CRCNS_PASSWORD.
- deepSTRF.datasets.audio.crcns_aa2.get_animals_ids(file_path)[source]
Extracts unique animal identifiers from the first column of the ‘cell_stim_classes.csv’ file. The unique identifier is defined as the substring preceding the first underscore in the first column. The output is a list of unique identifiers.
- deepSTRF.datasets.audio.crcns_aa2.get_area_cells(file_path)[source]
From the cell_regions.csv file, returns a dictionary with area labels as keys and lists of cell names as values.
- deepSTRF.datasets.audio.crcns_aa2.get_stim_ids_from_folders(cells_path, verbose=False)[source]
Takes the ‘all_cells/’ path and returns a dictionary with the three main stim types ‘conspecific’, ‘songrip’ and ‘flatrip’ as keys, and a list of unique wav names as each value.
- deepSTRF.datasets.audio.crcns_aa2.get_stims_ids_from_csv(file_path)[source]
Extracts .wav file names from the first column of the ‘stim_data.csv’ file and classifies them into stimulus types. The output is a dictionary with categories as keys and lists of .wav file names as values.
- deepSTRF.datasets.audio.crcns_aa2.load_stim_data_csv(file_path)[source]
Read CRCNS-AA2 stim_data.csv into a {wav_filename: {...}} dict.
Each value is {"sample_rate": Hz, "bit_depth": int, "n_samples": int, "duration_s": float}. Returned even for stims classified as “unknown” / “bengalese”, since the dataset class will simply not select those by default.
deepSTRF.datasets.audio.crcns_aa4 module
- class deepSTRF.datasets.audio.crcns_aa4.CRCNSAA4Dataset(path: str | None = None, animals='all', stimuli=('song', 'call', 'mlnoise'), dt_ms=1.0, smooth=True, n_mels=32, compression='cubic', window_ms: float = 10.0, return_waveform: bool = False, audio_fs: int = 24000, download: bool = False, username: str | None = None, password: str | None = None)[source]
Bases: AudioNeuralDataset
PyTorch dataset for the CRCNS-AA4 recordings.
1401 extracellular, spike-sorted single and multi units of adult zebra finches (4 males, 2 females) in Field L, caudolateral and caudomedial mesopallium (CLM, CMM) and caudomedial nidopallium (NCM) — though units were not precisely assigned to one of these areas. Three stimulus classes (conspecific songs, calls, ripple noise), each a few seconds long and presented ~10 times. Population- and batch-compatible. Data are available at https://crcns.org/data-sets/aa/aa-4/about-aa-4 (free CRCNS account).
Notes
Follows the standard deepSTRF data paradigm (see docs/_source/md/data_paradigm.md). AA4-specific metadata:
stims are mel-spectrograms (1, F, T_s).
stim_meta dicts hold name (the stimulus md5, which is the canonical identifier because the wav filename is per-animal and not unique across the corpus), type, class and duration_s (the stim_duration attr from the h5, in seconds).
nrn_meta dicts hold: cell_id (h5 basename, no extension), animal_id, sex ('M' / 'F'), site (e.g. "Site1"), electrode (int 1-32, channel index across both 16-channel arrays at a site), ldepth / rdepth (left / right array depth in µm), sort_type ('single' / 'multi'; 'noise' / 'tdt' are filtered out), sort_id (online-sort int) and subsort_id (offline-sort int parsed from the trailing _ss<N>; None if absent).
The dataset paper does not publish a per-cell brain-area assignment, so the depth + electrode-array geometry is the only anatomical proxy. Nor does it document which electrode IDs (1-16 vs 17-32) map to the left vs right hemisphere; confirm with the dataset authors before deriving a hemisphere from electrode.
References
Elie & Theunissen (2015). “Meaning in the avian auditory cortex: Neural representation of communication calls.” European Journal of Neuroscience.
Elie & Theunissen (2019). “Invariant neural responses for sensory categories revealed by the time-varying information for communication calls.” PLoS Computational Biology.
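The subsort_id rule (an integer parsed from a trailing _ss<N>, None if absent) can be sketched with a regular expression. parse_subsort_id is a hypothetical helper and the example ids are invented for illustration; only the _ss<N> suffix convention comes from the Notes above.

```python
import re
from typing import Optional

_SUBSORT = re.compile(r"_ss(\d+)$")  # matches only at the very end of the id


def parse_subsort_id(cell_id: str) -> Optional[int]:
    """Return the offline-sort index from a trailing _ss<N>, or None."""
    match = _SUBSORT.search(cell_id)
    return int(match.group(1)) if match else None


# Invented ids, used only to exercise the suffix rule.
print(parse_subsort_id("Site1_e12_s0_ss2"))
print(parse_subsort_id("Site1_e12_s0"))
```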
Initializes the AA4 Dataset.
- Parameters:
path (str, optional) – Path to the CRCNS_AA4/data/ folder containing one subfolder per animal (with .h5 cell files + a wavfiles/ directory of stimulus .wav files). Defaults to the platformdirs cache.
animals ('all' or sequence of str) – Animals to load (any subset of AA4_ANIMAL_IDS).
stimuli (sequence of str) – Stimulus types to keep; subset of {‘song’, ‘call’, ‘mlnoise’}.
dt_ms (float) – Time-bin width in ms.
smooth (bool) – If True, smooth PSTHs in place with a 21 ms Hanning window (Hsu, Borst & Theunissen 2004).
n_mels (int) – Number of mel frequency bands of the stimulus spectrogram.
compression ({'cubic', 'log1p', 'none'}) – Compression applied to the spectrogram (saturation effect of hair cells). Ignored when return_waveform=True.
window_ms (float, default 10.0) – FFT analysis-window length in ms. n_fft is computed per-stim as round(window_ms * 1e-3 * sample_rate) and is decoupled from hop_length so phonemic detail is preserved at any dt_ms. Earlier versions of this dataset hardcoded n_fft = hop * 10; at dt_ms=50 that gave a 500 ms FFT window and over-smoothed every spec frame. Default window_ms=10.0 preserves bit-identical behaviour at dt_ms=1 (n_fft=320 at sr=32 kHz) while fixing the scaling bug at coarser bins. Ignored when return_waveform=True.
return_waveform (bool, default False) – If True, self.stims[s] holds the raw audio waveform (1, T_audio) at audio_fs Hz (grid-locked to T_audio = T_neural * hop) instead of the in-loader mel spectrogram. Pair with a model whose wav2spec slot is a waveform front-end; responses are unchanged.
audio_fs (int, default 24000) – Sample rate for waveform mode. The AA4 wavs are 24414 Hz, which gives a non-integer hop at dt=1 ms; the default 24 kHz resamples to a clean hop = 24 (exactly dt=1 ms bins, slightly better than the native spec’s 0.983 ms). Other values must keep audio_fs * dt_ms / 1000 an integer. Ignored unless return_waveform=True.
download (bool, default False) – If True and an animal’s data is missing under path, fetch its tarball (~hundreds of MB per animal) from the NERSC mirror and untar in place. Only the animals listed in animals are downloaded, which is useful for quick iteration on a subset.
username (str, optional) – CRCNS credentials. Default to $CRCNS_USERNAME / $CRCNS_PASSWORD. Prefer env vars over passing literals.
password (str, optional) – CRCNS credentials. Default to $CRCNS_USERNAME / $CRCNS_PASSWORD. Prefer env vars over passing literals.
- deepSTRF.datasets.audio.crcns_aa4.download_aa4(dest: str | None = None, animals: Sequence[str] = ('BlaBro09xxF', 'GreBlu9508M', 'LblBlu2028M', 'WhiBlu5396M', 'WhiWhi4522M', 'YelBlu6903F'), username: str | None = None, password: str | None = None) str[source]
Download CRCNS-AA4 archives from the NERSC mirror into dest.
AA4 is split into one .tar.gz per animal (each is hundreds of MB); by default this fetches all 6, but animals can be narrowed to a subset. The CRCNSCode tutorial archive is also fetched (small, ~1 MB).
Idempotent: skips an archive if its animal directory already exists, skips the CRCNSCode archive if CRCNSCode/ already exists.
- Parameters:
dest (str, optional) – Defaults to default_cache_dir('AA4') ($DEEPSTRF_DATA_DIR overrides).
animals (sequence of str, default all 6) – Animals to download. Must be a subset of AA4_ANIMAL_IDS.
username (str, optional) – Default to $CRCNS_USERNAME / $CRCNS_PASSWORD.
password (str, optional) – Default to $CRCNS_USERNAME / $CRCNS_PASSWORD.
deepSTRF.datasets.audio.espejo module
Espejo (Lopez-Espejo et al. 2019) auditory cortex dataset.
Public Zenodo deposit (DOI 10.5281/zenodo.3445557) ships two
disjoint releases — natural sounds (NAT) and vocalization-modulated
noise (VMN) — recorded from awake passively-listening ferret A1. One
dataset class covers both via the stimuli={'nat', 'vmn'} constructor
arg; the two share no cells and have different F (18 vs 2), so they
cannot be concatenated.
The on-disk format is NEMS-flavored but we parse it directly with
h5py + pandas — see deepSTRF.datasets.audio._espejo_native.
No nems0 dependency.
- class deepSTRF.datasets.audio.espejo.EspejoDataset(path: str | None = None, stimuli: Literal['nat', 'vmn'] = 'nat', dt_ms: float = 10.0, subset: Literal['all', 'estimation', 'test'] = 'all', cells: Sequence[str] | None = None, smooth: bool = False, return_waveform: bool = False, audio_fs: int = 44100, download: bool = False)[source]
Bases: AudioNeuralDataset
PyTorch dataset for Lopez-Espejo et al. (2019) ferret A1 recordings.
Awake, passively-listening adult ferret primary auditory cortex (A1), extracellularly recorded single units. The dataset ships in two disjoint releases (no cell overlap, different stimulus dimensionality, so they cannot be concatenated), selected by the stimuli argument:
'nat': 93 3-second natural sounds (animal vocalizations, speech, environmental, music), stored as 18-band gammatone log-spectrograms (NEMS “ozgf”, F=18). ~540 cells across 35 sites in 6 ferrets; each site presents a subset of the stim bank.
'vmn': 30 3-second vocalization-modulated noise stimuli (two narrowband noise streams modulated by independent natural-vocalization envelopes), stored as 2-band envelopes (“envelope” stimfmt, F=2). ~200 cells across 103 sites in 5 ferrets.
Both releases sample at 100 Hz (dt=10 ms native); the on-disk cochleagrams are log-compressed at source. Each occurrence epoch includes the published 0.5 s pre-stim + 0.5 s post-stim silence flanking the 3 s stimulus, so per-stim tensors are (1, F, 500) (NAT) or (1, F, 400) (VMN). The estimation / test split follows the paper’s split_by_occurrence_counts and is surfaced via the per-stim n_repeats and split metadata fields.
Data are freely available at https://doi.org/10.5281/zenodo.3445557 (no account required) and auto-fetched with download=True.
Notes
Follows the standard deepSTRF data paradigm (see docs/_source/md/data_paradigm.md). Espejo-specific metadata:
stim_meta dicts hold name, type ('nat' / 'vmn'), n_repeats, split ('test' / 'estimation'), duration_s and n_samples.
nrn_meta dicts hold cell_id, site, animal_id, channel, unit and experiment_set ('nat' / 'vmn'). unit can be None for VMN cells (2-segment cellids).
The (1, 1) NaN sentinel marks (stim, neuron) pairs the cell was not recorded for (different sites present different stim subsets). Only the pre-computed cochleagrams are in the Zenodo deposit (the raw NAT waveforms are mirrored on the LBHB bitbucket); the loader fixes dt_ms = 10.
References
Lopez Espejo, Schwartz & David (2019). “Spectral tuning of adaptation supports coding of sensory context in auditory cortex.” PLoS Computational Biology 15(10): e1007430. https://doi.org/10.1371/journal.pcbi.1007430
- Parameters:
path (str, optional) – Path to the Espejo data folder (containing A1_natural_sounds/ and / or A1_voc_mod_noise/). Defaults to the platformdirs cache ($DEEPSTRF_DATA_DIR overrides).
stimuli ({'nat', 'vmn'}) – Which release to load. The two are mutually exclusive (disjoint cells, different F); to use both, instantiate twice and keep them separate.
dt_ms (float, default 10.0) – Time-bin width in ms. Currently fixed at 10 ms: the on-disk cochleagrams are precomputed at fs=100 Hz, and the response rasterizer aligns to that grid.
subset ({'all', 'estimation', 'test'}, default 'all') – If 'estimation' or 'test', only that stim subset is kept. Split follows the paper’s split_by_occurrence_counts: test = stims at max repetition count per site; estimation = stims at lower repetition counts.
cells (sequence of str, optional) – Whitelist of cell IDs to include (intersection with what’s on disk). None keeps all.
smooth (bool, default False) – If True, smooth PSTHs with a 21 ms Hanning window (Hsu / Borst / Theunissen 2004). Off by default, since Espejo is typically used as-is.
return_waveform (bool, default False) – If True (stimuli='nat' only), hand out the raw natural-sound waveform per stim as a (1, T_audio) mono tensor at audio_fs instead of the precomputed ozgf cochleagram, for use with a learnable wav2spec model front-end. The raw wavs are not in the Zenodo deposit; they are fetched from the LBHB baphy bitbucket mirror (see the README) and cached under <path>/nat_waveforms/. The 4 s sound is inset at the published 0.5 s pre-stim silence offset so it grid-locks to the cochleagram frames (T_audio = T_neural * hop, hop = audio_fs·dt/1000). Responses are identical to cochleagram mode. VMN is unsupported (its stimuli are synthesized 2-band envelopes with no raw audio).
audio_fs (int, default 44100) – Sample rate for waveform mode (the native rate of the mirror wavs). Ignored, and reported as None, in cochleagram mode. 44.1 kHz grid-locks at dt=10 ms (hop=441).
download (bool, default False) – If True and the data is missing under path, fetch the requested archive from Zenodo (record 3445557) and untar in place.
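The split rule described for subset can be illustrated with a small sketch. This is not the loader's code nor the paper's split_by_occurrence_counts; the helper name split_site_by_repeats and the stim names and counts are hypothetical, and the sketch only handles the stimuli of a single site.

```python
from typing import Dict


def split_site_by_repeats(n_repeats: Dict[str, int]) -> Dict[str, str]:
    """Label each stim of ONE site as 'test' or 'estimation'.

    Hypothetical helper: stims presented at the site's maximum repetition
    count are labelled 'test'; stims with fewer repetitions are labelled
    'estimation'. If every stim has the same count, all end up in 'test'.
    """
    if not n_repeats:
        return {}
    max_reps = max(n_repeats.values())
    return {
        name: "test" if reps == max_reps else "estimation"
        for name, reps in n_repeats.items()
    }


# Made-up repetition counts for one recording site.
site_counts = {"STIM_a.wav": 20, "STIM_b.wav": 20, "STIM_c.wav": 2, "STIM_d.wav": 1}
labels = split_site_by_repeats(site_counts)
```

In the dataset itself the resulting label is exposed through the per-stim split metadata field, so this kind of computation should not be needed in user code.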
- deepSTRF.datasets.audio.espejo.download_espejo(stimuli: str, dest: str | None = None) → str[source]
Download one Espejo stimuli set from Zenodo into dest.
- Parameters:
stimuli ({'nat', 'vmn'})
dest (str, optional) – Defaults to default_cache_dir('Espejo') (overridable via $DEEPSTRF_DATA_DIR).
- Returns:
The dataset root directory.
- Return type:
str
Notes
Idempotent: skips the archive if already present, and skips the untar step if the expected <subdir>/ already exists. NAT is ~638 MB, VMN is ~25 MB.
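The idempotency contract in the note above (skip the download if the archive exists, skip the untar if the subdirectory exists) can be sketched as follows. This is an illustration under stated assumptions, not the body of download_espejo: fetch_and_unpack, the injected fetch callable and the demo file names are all hypothetical, and the "download" is faked with a locally built tarball.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import Callable, List


def fetch_and_unpack(dest: Path, archive_name: str, subdir: str,
                     fetch: Callable[[Path], None]) -> Path:
    """Idempotent fetch + untar (hypothetical helper, not deepSTRF code).

    `fetch(target)` must write the archive to `target`; it stands in for
    the real network download.
    """
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / archive_name
    if not archive.exists():            # skip the download if already present
        fetch(archive)
    if not (dest / subdir).is_dir():    # skip the untar if already unpacked
        with tarfile.open(archive) as tar:
            tar.extractall(dest)
    return dest


# Demo: a fake "download" that builds a tiny tarball on the fly.
fetch_calls: List[str] = []


def fake_fetch(target: Path) -> None:
    fetch_calls.append(target.name)
    with tempfile.TemporaryDirectory() as scratch:
        member = Path(scratch) / "A1_demo"
        member.mkdir()
        (member / "readme.txt").write_text("demo")
        with tarfile.open(target, "w:gz") as tar:
            tar.add(member, arcname="A1_demo")


with tempfile.TemporaryDirectory() as cache:
    root = fetch_and_unpack(Path(cache), "demo.tgz", "A1_demo", fake_fetch)
    # Second call finds both the archive and the unpacked tree: a no-op.
    root = fetch_and_unpack(Path(cache), "demo.tgz", "A1_demo", fake_fetch)
    unpacked_ok = (root / "A1_demo" / "readme.txt").read_text() == "demo"
```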
- deepSTRF.datasets.audio.espejo.download_espejo_nat_waveforms(names: Sequence[str], dest: str | None = None, *, progress: bool = True) → Dict[str, str][source]
Fetch the raw NAT waveforms from the LBHB baphy bitbucket mirror.
- Parameters:
names (sequence of str) – Stim names as they appear in stim_meta (STIM_<file>.wav); the STIM_ prefix is stripped to get the on-mirror filename.
dest (str, optional) – Parent directory; wavs are cached under <dest>/nat_waveforms/. Defaults to default_cache_dir('Espejo').
progress (bool, default True) – Show a tqdm bar over the (missing) downloads.
- Returns:
name -> local wav path for every name found on the mirror.
- Return type:
dict
Notes
Idempotent: already-cached wavs are skipped. Each filename is tried in sounds_set3/ first, then sounds/. Names found in neither are collected and surfaced by the caller (the dataset raises on genuine misses so waveform mode never silently substitutes silence).
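The lookup order described in the note (strip the STIM_ prefix, try sounds_set3/, then sounds/, and collect genuine misses) might look roughly like the sketch below. It is not the library's implementation: resolve_on_mirror, mirror_filename and the injected exists predicate are hypothetical names, and the predicate stands in for whatever remote check the real function performs.

```python
from typing import Callable, Dict, Iterable, List, Tuple

MIRROR_DIRS = ("sounds_set3", "sounds")  # search order from the note above


def mirror_filename(stim_name: str) -> str:
    """Strip the STIM_ prefix to obtain the on-mirror filename."""
    prefix = "STIM_"
    return stim_name[len(prefix):] if stim_name.startswith(prefix) else stim_name


def resolve_on_mirror(
    names: Iterable[str],
    exists: Callable[[str], bool],
) -> Tuple[Dict[str, str], List[str]]:
    """Map each stim name to the first mirror path that exists.

    Returns (found, missing): `found` maps stim name -> relative mirror
    path, `missing` lists names present in neither directory.
    """
    found: Dict[str, str] = {}
    missing: List[str] = []
    for name in names:
        fname = mirror_filename(name)
        for directory in MIRROR_DIRS:
            candidate = f"{directory}/{fname}"
            if exists(candidate):
                found[name] = candidate
                break
        else:  # no directory matched
            missing.append(name)
    return found, missing


# Demo with a made-up mirror listing.
available = {"sounds_set3/cat.wav", "sounds/dog.wav"}
found, missing = resolve_on_mirror(
    ["STIM_cat.wav", "STIM_dog.wav", "STIM_ghost.wav"],
    available.__contains__,
)
```

Returning the misses explicitly, instead of silently skipping them, is what lets a caller raise on them as the note describes.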
deepSTRF.datasets.audio.alice_eeg module
Alice EEG dataset adapted to the deepSTRF paradigm.
See docs/_source/md/README_Alice_EEG.md for context, citation, and the
benchmark comparison against Brodbeck et al. 2023 (eLife).
- class deepSTRF.datasets.audio.alice_eeg.AliceEEGDataset(path: str | None = None, subjects: Sequence[str] | None = None, dt_ms: float = 10.0, n_frequency_bands: int = 8, treat_subjects_as: str = 'neurons', hp_freq_hz: float | None = 1.0, lp_freq_hz: float | None = None, window_ms: float | None = None, fmin: float = 80.0, fmax: float | None = None, spec_backend: str = 'gaussian', download: bool = False, return_waveform: bool = False, audio_fs: int = 44100)[source]
Bases: AudioNeuralDataset
PyTorch dataset for EEG from the Alice audiobook listening paradigm.
33 human participants listened to the first chapter of Alice in Wonderland (~12.4 min) split into 12 audio segments, recorded with 61 EEG channels per subject (10-20-like montage). Bad channels and bad artifact windows (marked in the source .fif metadata) are converted to NaN at the response level. Each subject heard each segment once (R = 1). deepSTRF consumes Brodbeck et al. 2023’s restructured release (UMd PULFR 10.13016/pulf-lndn): per-subject MNE .fif files plus 12 audio segments and a word-onset table. See docs/_source/md/README_Alice_EEG.md for the full dataset notes.
The treat_subjects_as argument selects one of two layouts:
- "neurons" (default): every (subject, channel) pair becomes a “neuron”; N = sum_s(n_channels_s) and R = 1 everywhere. Bad channels carry the structural NaN sentinel. Use corrcoef / fve.
- "repeats": subjects are treated as repeats of a shared canonical per-channel EEG response; N = n_montage_channels (e.g. 61) and R = n_subjects. Bad (channel, subject) combinations become NaN repeat slabs. Useful for inter-subject reliability (ISC-style) via normalized_corrcoef(method='schoppe'). Note that this is inter-subject reliability, not trial reliability, so the iid-trial noise model the Schoppe correction assumes does not strictly hold; treat the resulting ceiling as a group-level sanity check.
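To make the two layouts concrete, the sketch below computes the (N, R) pair each one implies from a per-subject channel count. The function layout_shape and the subject ids are hypothetical and only restate the formulas above; the dataset derives these values itself.

```python
from typing import Dict, Tuple


def layout_shape(
    n_channels_per_subject: Dict[str, int],
    treat_subjects_as: str = "neurons",
    n_montage_channels: int = 61,
) -> Tuple[int, int]:
    """Return (N, R) for a layout (hypothetical helper).

    "neurons": one unit per (subject, channel) pair, a single repeat.
    "repeats": one unit per montage channel, one repeat per subject.
    """
    if treat_subjects_as == "neurons":
        return sum(n_channels_per_subject.values()), 1
    if treat_subjects_as == "repeats":
        return n_montage_channels, len(n_channels_per_subject)
    raise ValueError(f"unknown layout: {treat_subjects_as!r}")


# Three made-up subjects, each recorded with the full 61-channel montage.
channels = {"S01": 61, "S02": 61, "S03": 61}
shape_neurons = layout_shape(channels, "neurons")
shape_repeats = layout_shape(channels, "repeats")
```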
Notes
Follows the standard deepSTRF data paradigm (see docs/_source/md/data_paradigm.md). Alice-specific metadata:
- stims are S = 12 log-power ERB-band spectrograms (1, F, T_s) (a gammatone approximation; see _gammatone_spectrogram).
- stim_meta dicts hold name, type, sample_rate, n_samples and duration_s.
- nrn_meta dicts hold channel_id, subject, area and xyz in "neurons" mode; a channel-only entry in "repeats" mode.
The default spec_backend='gaussian' is a frequency-domain Gaussian approximation of Brodbeck 2023’s time-domain gammatone (Heeris) filterbank: spectrally equivalent to first order, but with lower dynamic range and less time-localized transients. spec_backend='heeris' selects the paper-faithful bank (requires the optional gammatone package in the [eeg] extra). The window_ms / fmin / fmax constructor knobs control the FFT window and ERB-band edges; their defaults preserve the historical behaviour, so no existing fits change.
References
Bhattasali et al. (2020). “The Alice Datasets: fMRI & EEG Observations of Natural Language Comprehension.” LREC.
Brennan et al. (2019). “Hierarchical structure guides rapid linguistic predictions during naturalistic listening.” PLOS ONE.
Brodbeck et al. (2023). Eelbrain methods paper. eLife (Tools & Resources).
- Parameters:
path (str, optional) – Path to the Brodbeck-restructured Alice EEG data directory (containing eeg.0/, eeg.1/, eeg.2/, stimuli/). Defaults to the platformdirs cache ($DEEPSTRF_DATA_DIR overrides).
subjects (sequence of str, optional) – Subject ids (e.g. ["S01", "S20"]). If None, all subjects discovered on disk are used.
dt_ms (float, default 10.0) – Time-bin width in ms (100 Hz with the default, matching the eelbrain paper’s analysis rate).
n_frequency_bands (int, default 8) – ERB-band count for the gammatone-equivalent spectrogram. Matches Brodbeck 2023 Fig 4.
treat_subjects_as ({"neurons", "repeats"}, default "neurons") – See the class docstring.
hp_freq_hz (float or None, default 1.0) – High-pass cutoff applied via raw.filter before downsampling and segmentation. The Brodbeck restructure ships data with a 0.1 Hz HP, which leaves enough slow drift across the ~12 min recording that per-segment baselines vary by >1 SD, which is fatal for held-out fve. Brodbeck applies 1 Hz HP in the paper’s analysis pipeline; we mirror that as the default. Pass None to skip.
lp_freq_hz (float or None, default None) – Optional low-pass cutoff. Useful if you want to focus on the cortical-tracking band (< 40 Hz) or the envelope-tracking band (< 8 Hz).
window_ms (float, optional) – FFT analysis-window length in ms for the stimulus spectrogram. None preserves the legacy n_fft=1024 default: at the audiobook sample rate (16 kHz) this gives a ~64 ms window; at 44.1 kHz, ~23 ms. Pass an explicit window_ms to override (e.g. 25.0 for the Kaldi convention). The spec pipeline is otherwise unchanged from the audit baseline; see the “Audit status” callout below before benchmarking against Brodbeck 2023.
fmin (float, default 80.0) – Lower ERB-band edge in Hz.
fmax (float, optional) – Upper ERB-band edge in Hz. Defaults to sr/2 (Nyquist). For speech-tracking work, pass fmax=8000 to drop bands above the speech-relevant range (matches Brodbeck 2023’s published lower-band figure roughly; not empirically validated against the paper’s actual filterbank, see “Audit status”).
spec_backend ({'gaussian', 'heeris'}, default 'gaussian') – Spec-pipeline backend.
- 'gaussian' (back-compat): frequency-domain Gaussian ERB filterbank, the deepSTRF approximation that’s been shipped to date.
- 'heeris' (paper-faithful, requires the gammatone PyPI package): time-domain Heeris filterbank, same as Brodbeck et al. 2023 via eelbrain’s gammatone_bank. See the empirical comparison at untracked/alice_eeg_spec_compare.py: Heeris has visibly sharper time-localization and broader dynamic range. Recommended when reproducing the paper.
download (bool, default False) – If True and the data is missing under path, fetch the four zips from the UMd DRUM mirror (~2.5 GiB total; anonymous HTTPS). Idempotent: skips any zip / unpacked subtree already present.
- deepSTRF.datasets.audio.alice_eeg.download_alice_eeg(dest: str | None = None) str[source]
Download Brodbeck’s restructured Alice EEG release from UMd DRUM.
Idempotent: skips any zip that’s already on disk and any subdirectory that’s already unpacked. Returns the dataset directory.
- Parameters:
dest (str, optional) – Defaults to the platformdirs cache (overridable via $DEEPSTRF_DATA_DIR).
Notes
~2.5 GiB total across four zips. Anonymous HTTPS; no auth.
deepSTRF.datasets.audio.le_2025 module
Le, Bjoring & Meliza (2025), Nature Communications — zebra finch dataset.
“The zebra finch auditory cortex reconstructs occluded syllables in conspecific song.” DOI: 10.1038/s41467-025-63182-y. Data: 10.6084/m9.figshare.29203457. Code: github.com/melizalab/auditory-restoration.
Single-unit extracellular recordings from the auditory pallium of anesthetized adult zebra finches, in response to 8 natural song motifs (and in cohort 3, 8 scrambled-syntax pseudo-motifs) presented in up to 7 variants per critical interval (CI) to probe the neural correlate of auditory restoration.
Sub-experiments (one experiment= per instance; concat for the union):
- nat8a: Cohorts 1 & 2, natural motifs (8 birds × 2 CIs × {C, G, N, GB, CB}). Cohort 1 (alpha) had a familiarity manipulation; cohort 2 (beta) did not. No masking variants.
- nat8b: Cohort 3, same natural motifs renamed nat8mk0..7, full set of 7 variants per CI (adds GM, CM).
- synth8b: Cohort 3, 8 scrambled-syntax pseudo-motifs, full variant set.
Per-CI variants:
- C (Continuous): Unmodified motif; shared across both CIs.
- G (Gap): CI replaced by silence.
- N (Noise): CI-duration noise burst in isolation.
- GB (Gap + Burst): CI replaced by noise within the motif; the illusion-inducing stimulus.
- CB (Continuous + Burst): Motif unchanged, noise added on top of the CI.
- GM (Gap-Masked): Whole motif masked, CI deleted (nat8b / synth8b only).
- CM (Continuous-Masked): Whole motif masked, CI intact (nat8b / synth8b only). CM is CI-independent, so it lives once per motif.
- class deepSTRF.datasets.audio.le_2025.Le2025Dataset(path: str | Path | None = None, experiment: str = 'nat8b', dt_ms: float = 5.0, n_bands: int = 50, fmin: float = 1000.0, fmax: float = 8000.0, window_ms: float = 2.5, smooth: bool = True, keep_areas: Sequence[str] | None = None, compute_reliability: bool = True, download: bool = False, return_waveform: bool = False, audio_fs: int = 48000)[source]
Bases: AudioNeuralDataset
deepSTRF wrapper for one sub-experiment of Le, Bjoring & Meliza (2025).
Instantiate one per experiment ("nat8a" | "nat8b" | "synth8b") and concatenate with concat_neural_datasets (or ds_a + ds_b) for the union; the three experiments share no stimuli, so the bidirectional selection rule in the base class hides cross-experiment NaN-only entries automatically.
Layout on disk (as shipped on figshare):
    <path>/
    ├── metadata/
    │   ├── recordings.csv        area for nat8a-beta / nat8b / synth8b
    │   ├── song-birds.csv        motif name mapping + CI timings (ms)
    │   └── ephys-birds.csv       cohort, sex, age, familiarity group
    ├── nat8a-alpha-responses/    cohort 1 (familiarity manipulation)
    ├── nat8a-beta-responses/     cohort 2
    ├── nat8a-stimuli/            shared by alpha + beta
    ├── nat8b-responses/          cohort 3 (natural-syntax)
    ├── nat8b-stimuli/
    ├── synth8b-responses/        cohort 3 (scrambled-syntax)
    └── synth8b-stimuli/
Two pprox schemas coexist in the archive: the legacy spec:2/pprox (used only by nat8a-alpha; spike times in ms, condition field encodes variant) and spec:2/stimtrial (everything else; spike times in s, stimulus dict carries the full filename stem). Both are handled below.
- Parameters:
path – Filesystem path to the unpacked figshare archive (the directory that contains metadata/, *-responses/, *-stimuli/).
experiment – One of "nat8a" | "nat8b" | "synth8b". nat8a unifies the two cohorts (alpha + beta) that share the same stim set; use select_pop_by_nrn_attr("cohort", 1) to restrict to the familiarity sub-experiment.
dt_ms – Bin width in ms. Default 5; pass dt_ms=1 for paper-faithful spectrogram + response binning (the paper uses 1 ms throughout).
n_bands – Number of gammatone bands. Default 50, matching the paper.
fmin – Low edge of the gammatone filter bank, in Hz. Default 1000, the lower bound of the paper’s range.
fmax – High edge of the gammatone filter bank, in Hz. Default 8000, the upper bound of the paper’s range.
window_ms – Gammatone analysis-window width, in ms. Default 2.5, matching the paper. Hop is dt_ms.
smooth – If True (default), apply a 21 ms Hanning smoother to all PSTHs (Hsu / Borst / Theunissen 2004).
keep_areas – Optional iterable of area strings to filter on (per the per-unit area metadata field; values vary by cohort).
compute_reliability – If True (default), pre-compute per-neuron Sahani–Linden signal power, noise power, and SNR (length-weighted across stims) and attach them to nrn_meta. Set to False for fast iteration when reliability filtering is not needed.
download – If True and path=None, fetches the ~105 MB figshare archive (DOI 10.6084/m9.figshare.29203457) into the deepSTRF cache and unpacks it. Idempotent: skips the download if the unpacked tree is already present.
Notes
- Stim-side metadata fields:
name: filename stem.
motif: e.g. "B189" (nat8a) or "nat8mk0" (nat8b).
critical_interval: int (1/2) for per-CI variants, None for C and CM (CI-independent), or a string like "2a" for synth8b.
variant: one of VARIANTS.
syntax: "natural" or "scrambled".
experiment: "nat8a" | "nat8b" | "synth8b".
sample_rate_hz: native sample rate of the source wav.
duration_s: duration of the source wav, in seconds.
ci_onset_s / ci_offset_s: critical-interval bounds in seconds (NaN for C/CM and for synth8b; the CI table only covers nat8a/nat8b).
- Per-neuron metadata fields:
cell_id, animal_id, animal_uuid, cohort, experiment, area, hemisphere, familiar_motifs (list of motif IDs the bird was reared with; empty unless cohort 1), sex, age_days, pprox_file.
- select_critical_interval(ci) → List[int][source]
Restrict iteration to one CI index (or None for C/CM variants).
- select_restoration_quartet(motif: str, ci, variants: Sequence[str] = ('C', 'CB', 'GB', 'GM')) → List[int][source]
Select the stim set used in the paper’s core restoration analysis for one (motif, CI). Defaults to C / CB / GB / GM, the four trajectories compared in Fig. 4. Returns the selected stim indices.
The C continuous and CM masked variants are CI-independent and are kept whenever they appear in variants, regardless of ci.
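The selection rule above, including the CI-independent handling of C and CM, can be sketched on plain metadata dicts. This is a minimal illustration and not the method's actual body: select_quartet is a hypothetical stand-in, and the stim_meta entries below are made up, using only the motif, variant and critical_interval fields listed in the notes.

```python
from typing import Any, Dict, List, Sequence

CI_INDEPENDENT = {"C", "CM"}  # variants stored once per motif


def select_quartet(
    stim_meta: Sequence[Dict[str, Any]],
    motif: str,
    ci: Any,
    variants: Sequence[str] = ("C", "CB", "GB", "GM"),
) -> List[int]:
    """Return indices of stims matching (motif, ci) for the given variants.

    C and CM are kept whenever requested, whatever `ci` is, because they
    do not depend on the critical interval.
    """
    selected: List[int] = []
    for idx, meta in enumerate(stim_meta):
        if meta["motif"] != motif or meta["variant"] not in variants:
            continue
        if meta["variant"] in CI_INDEPENDENT or meta["critical_interval"] == ci:
            selected.append(idx)
    return selected


# Made-up metadata for illustration only.
demo_meta = [
    {"motif": "B189", "variant": "C", "critical_interval": None},   # 0
    {"motif": "B189", "variant": "GB", "critical_interval": 1},     # 1
    {"motif": "B189", "variant": "GB", "critical_interval": 2},     # 2
    {"motif": "B189", "variant": "CB", "critical_interval": 1},     # 3
    {"motif": "B189", "variant": "GM", "critical_interval": 1},     # 4
    {"motif": "B189", "variant": "G", "critical_interval": 1},      # 5
    {"motif": "B72", "variant": "C", "critical_interval": None},    # 6
]
quartet = select_quartet(demo_meta, "B189", 1)
```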
Module contents
- class deepSTRF.datasets.audio.AliceEEGDataset(path: str | None = None, subjects: Sequence[str] | None = None, dt_ms: float = 10.0, n_frequency_bands: int = 8, treat_subjects_as: str = 'neurons', hp_freq_hz: float | None = 1.0, lp_freq_hz: float | None = None, window_ms: float | None = None, fmin: float = 80.0, fmax: float | None = None, spec_backend: str = 'gaussian', download: bool = False, return_waveform: bool = False, audio_fs: int = 44100)[source]
Bases: AudioNeuralDataset
Same class as deepSTRF.datasets.audio.alice_eeg.AliceEEGDataset; see that entry above for the full description, notes, references and parameters.
- class deepSTRF.datasets.audio.AudioNeuralDataset(path: str, dt_ms: float)[source]
Bases: NeuralDataset
Neural dataset class for auditory stimuli.
Stim shape is polymorphic depending on the loading mode:
- Spectrogram mode (default): self.stims[s] is a (1, F, T) tensor where F = self.F is the frequency-band count and T is the neural time-bin count.
- Waveform mode (opt-in, subclass-specific): self.stims[s] is a (1, T_audio) mono float32 tensor at sample rate self.audio_fs. Subclasses that support this mode expose a return_waveform=True constructor flag and set self.audio_fs to a positive int. The (1, ...) leading dim is the mono-channel axis, kept for collate compatibility (neural_collate zero-pads the last axis only).
Subclasses must additionally set self.F (number of frequency bins in the spectrogram; kept positive even in waveform mode so downstream models know the target spectrogram width a wav2spec module should produce) in their __init__, before calling self.validate().
- F
Frequency-band count of the target spectrogram. Set by the subclass.
- Type:
int
- audio_fs
Sample rate of the raw waveform when in waveform mode; None otherwise. Subclasses without a waveform branch leave this None.
- Type:
int or None
- hearing_range_hz
Optional (low, high) informational bound on the species’ canonical hearing range in Hz (e.g. (200.0, 40000.0) for ferret). Purely advisory: nothing is enforced against it; it exists so notebooks / tooling can display the range and users can choose to clamp a wav2spec’s frequency limits. None when unknown.
- Type:
tuple of float or None
- get_F()[source]
Return the number of frequency bins in the spectrograms.
- Returns:
self.F, the spectrogram frequency-band count.
- Return type:
int
- property hop: int | None
Audio samples per neural bin in waveform mode (None in spec mode).
hop = round(audio_fs * dt_ms / 1000). The grid-lock contract (see validate()) requires this to be an exact integer, so a wav2spec front-end’s own hop must equal this value for the audio→neural resampling to stay aligned with the response bins.
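The grid-lock arithmetic behind hop is easy to check by hand. The sketch below is an assumption-laden illustration rather than the code in validate(): grid_locked_hop is a hypothetical helper and the 1e-9 tolerance is an arbitrary choice. It uses sample rates and bin widths that appear as defaults elsewhere on this page.

```python
def grid_locked_hop(audio_fs: int, dt_ms: float) -> int:
    """Audio samples per neural bin, or ValueError if not an exact integer.

    Hypothetical helper mirroring hop = round(audio_fs * dt_ms / 1000).
    """
    hop_exact = audio_fs * dt_ms / 1000.0
    hop = round(hop_exact)
    if hop <= 0 or abs(hop_exact - hop) > 1e-9:  # tolerance is an assumption
        raise ValueError(
            f"audio_fs={audio_fs} Hz and dt_ms={dt_ms} give a non-integer "
            f"hop of {hop_exact}"
        )
    return hop


hop_44k_10ms = grid_locked_hop(44100, 10.0)  # 441 samples per 10 ms bin
hop_32k_1ms = grid_locked_hop(32000, 1.0)    # 32 samples per 1 ms bin
hop_48k_5ms = grid_locked_hop(48000, 5.0)    # 240 samples per 5 ms bin

# T_audio = T_neural * hop: 500 neural bins at 44.1 kHz / 10 ms.
t_audio = 500 * hop_44k_10ms                 # 220500 samples

# 44.1 kHz does not grid-lock at 1 ms bins (44.1 samples per bin).
try:
    grid_locked_hop(44100, 1.0)
    locks_at_1ms = True
except ValueError:
    locks_at_1ms = False
```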
- class deepSTRF.datasets.audio.CRCNSAA1Dataset(path: str | None = None, areas=('Field_L', 'MLd'), stimuli=('conspecific', 'flatrip'), animals='all', dt_ms=1, smooth=True, n_mels=32, compression='cubic', window_ms: float = 10.0, return_waveform: bool = False, audio_fs: int = 32000, download: bool = False, username: str | None = None, password: str | None = None)[source]
Bases: AudioNeuralDataset
PyTorch dataset for the CRCNS-AA1 recordings.
Extracellular, spike-sorted single units of anesthetized male zebra finches: 50 cells in Field L and 50 in MLd, recorded in response to 10 clips of conspecific vocalizations and 20 clips of flat ripples (up to 5 s each, ~10 trials on average). Data are available at https://crcns.org/data-sets/aa/aa-1/about (free CRCNS account); see the AA1 README in the deepSTRF docs for the full notes.
Notes
Follows the standard deepSTRF data paradigm (see docs/_source/md/data_paradigm.md). AA1-specific metadata:
- stims are mel-spectrograms (1, F, T_s).
- stim_meta dicts hold name, type, sample_rate, n_samples and duration_s.
- nrn_meta dicts hold cell_id, animal_id, area, cell_seq and rig. cell_seq is the sequential cell index parsed from the cell folder name (the n-th cell recorded); rig is the single-letter rig label when present, else None (cells “4_A” and “4_B” were recorded simultaneously, possibly in different areas).
Two cells lack conspecific responses: pipu1018_2_A (MLd) and pipu1018_2_B (Field_L).
References
Woolley et al. (2005). “Tuning for Spectro-temporal Modulations: a Mechanism for Auditory Discrimination of Natural Sound.”
Hsu et al. (2004). “Modulation power and phase spectrum of natural sounds enhance neural encoding performed by single auditory neurons.”
Singh & Theunissen (2003). “Modulation spectra of natural sounds and ethological theories of auditory processing.”
Initializes the AA1 Dataset.
- Parameters:
path (str, optional) – Path to the AA1 data folder (containing Field_L_cells/, MLd_cells/, all_stims/). Defaults to the platformdirs cache ($DEEPSTRF_DATA_DIR overrides).
areas (tuple of str) – Recording sites: 'Field_L' or 'MLd'.
stimuli (tuple of str) – Stimulus types: 'conspecific' or 'flatrip'.
dt_ms (float) – Time step size in ms.
n_mels (int) – Number of mel frequency bands.
compression (str) – Spectrogram compression ('cubic', 'log1p', 'none'). Ignored when return_waveform=True.
window_ms (float, default 10.0) – FFT analysis-window length in ms. n_fft is computed as round(window_ms * 1e-3 * sample_rate) and is decoupled from hop_length, so phonemic detail is preserved at any dt_ms. Earlier versions of this dataset hardcoded n_fft = 10 * hop_length, which was benign at the default dt_ms=1 (10 ms FFT window) but produced a 500 ms FFT window at dt_ms=50 and over-smoothed every spec frame. The default window_ms=10.0 preserves bit-identical behaviour at dt_ms=1 while removing the scaling bug at coarser bins. Speech-pipeline users may prefer window_ms=25.0 (Kaldi default). Ignored when return_waveform=True.
return_waveform (bool, default False) – If True, self.stims[s] holds the raw audio waveform (1, T_audio) at audio_fs Hz (grid-locked to T_audio = T_neural * hop) instead of the in-loader mel spectrogram. Pair with a model whose wav2spec slot is a waveform front-end (see deepSTRF.models.wav2spec); responses are unchanged.
audio_fs (int, default 32000) – Sample rate for waveform mode. Default 32 kHz is the native rate of the AA1 wavs (so no resampling); other values resample and must keep audio_fs * dt_ms / 1000 an integer. Ignored unless return_waveform=True.
download (bool, default False) – If True and the data is missing under path, fetch the ~17 MB CRCNS-AA1 archive from the NERSC mirror (free CRCNS account required; see crcns_download) and unzip in place.
username (str, optional) – CRCNS username. Defaults to $CRCNS_USERNAME. Prefer the env var over passing a literal: anything in source / a notebook ends up in history / logs / VCS.
password (str, optional) – CRCNS password. Defaults to $CRCNS_PASSWORD. Prefer the env var over passing a literal, for the same reason as username.
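The window_ms arithmetic can be verified with a few lines. The sketch below only re-derives the numbers quoted in the window_ms description at the 32 kHz native AA1 rate; the helper names are hypothetical and the legacy formula is reproduced from the description, not from the loader's source.

```python
def hop_length(sample_rate: int, dt_ms: float) -> int:
    """Samples per time bin."""
    return round(sample_rate * dt_ms / 1000)


def n_fft_from_window(window_ms: float, sample_rate: int) -> int:
    """Current rule: FFT size follows window_ms, independent of dt_ms."""
    return round(window_ms * 1e-3 * sample_rate)


def legacy_n_fft(sample_rate: int, dt_ms: float) -> int:
    """Earlier rule quoted above: FFT size tied to the hop."""
    return 10 * hop_length(sample_rate, dt_ms)


SAMPLE_RATE = 32000  # native AA1 rate

# At dt_ms=1 both rules agree: 320 samples, i.e. a 10 ms window.
new_1ms = n_fft_from_window(10.0, SAMPLE_RATE)
old_1ms = legacy_n_fft(SAMPLE_RATE, 1)

# At dt_ms=50 the legacy rule balloons to 16000 samples (500 ms),
# while the window_ms rule stays at 320 samples.
new_50ms = n_fft_from_window(10.0, SAMPLE_RATE)
old_50ms = legacy_n_fft(SAMPLE_RATE, 50)
old_50ms_window_ms = 1000 * old_50ms / SAMPLE_RATE
```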
- class deepSTRF.datasets.audio.CRCNSAA2Dataset(path: str | None = None, areas=('Field_L', 'mld', 'OV', 'CM', 'None'), stimuli=('conspecific', 'flatrip', 'songrip'), animals='all', dt_ms=1, smooth=True, n_mels=32, compression='cubic', window_ms: float = 10.0, return_waveform: bool = False, audio_fs: int = 32000, download: bool = False, username: str | None = None, password: str | None = None)[source]
Bases: AudioNeuralDataset
PyTorch dataset for the CRCNS-AA2 recordings (OV, MLd, Field L, CM).
494 extracellular, spike-sorted single units of male zebra finches, identified in OV, MLd, Field L, L1, L2a, L2b, L3 (and some with unidentified area, None). Three stimulus classes (conspecific songs (72 stims), flat ripples (20) and song ripples (25)), each presented 10-20 times, with low trial-to-trial variability. Almost all cells saw conspecific and songrip stimuli; about half saw flatrip. Population fitting-compatible. Data are available at https://crcns.org/data-sets/aa/aa-2/about (free CRCNS account).
Notes
Follows the standard deepSTRF data paradigm (see docs/_source/md/data_paradigm.md). AA2-specific metadata:
- stims are mel-spectrograms (1, F, T_s).
- stim_meta dicts hold name, type, sample_rate, n_samples and duration_s (the last three from data/stim_data.csv).
- nrn_meta dicts hold cell_id, animal_id, area, cell_seq and rig (see CRCNSAA1Dataset for the cell-name format; rig is often None in AA2).
References
Gill et al. (2006). “Sound representation methods for spectro-temporal receptive field estimation.”
Amin et al. (2010). “Role of the Zebra Finch Auditory Thalamus in Generating Complex Representations for Natural Sounds.”
Initializes the AA2 Dataset.
- Parameters:
path (
str, optional) – Path to the AA2 data folder. Defaults to the platformdirs cache.areas (
tupleofstr) – Recording sites of interest: ‘Field_L’, ‘L1’, ‘L2a’, ‘L2b’, ‘L3’, ‘mld’, ‘OV’, ‘CM’, or ‘None’.stimuli (
tupleofstr) – Stimulus types of interest: ‘conspecific’, ‘flatrip’, ‘songrip’.dt_ms (
float) – Time step size in ms.n_mels (
int) – Number of mel frequency bands.compression (
str) – Spectrogram compression (‘cubic’, ‘log1p’, ‘none’). Ignored whenreturn_waveform=True.window_ms (
float, default10.0) – FFT analysis-window length in ms.n_fftis computed asround(window_ms * 1e-3 * sample_rate)and is decoupled from ``hop_length`` so phonemic detail is preserved at anydt_ms. Earlier versions of this dataset hardcodedn_fft = 10 * hop_length, which gave a benign 10 ms FFT window atdt_ms=1but a 500 ms window atdt_ms=50. Defaultwindow_ms=10.0preserves bit-identical behaviour atdt_ms=1while fixing the scaling bug at coarser bins. Ignored whenreturn_waveform=True.return_waveform (
bool, defaultFalse) – If True,self.stims[s]holds the raw audio waveform(1, T_audio)ataudio_fsHz (grid-locked toT_audio = T_neural * hop) instead of the in-loader mel spectrogram. Pair with a model whosewav2specslot is a waveform front-end; responses are unchanged.audio_fs (
int, default32000) – Sample rate for waveform mode. Default 32 kHz is the native rate of the AA2 wavs (no resampling); other values resample and must keepaudio_fs * dt_ms / 1000an integer. Ignored unlessreturn_waveform=True.download (
bool, defaultFalse) – If True and the data is missing underpath, fetch the ~30 MB worth of CRCNS-AA2 archives from the NERSC mirror (free CRCNS account required) and extract in place.username (
str, optional) – CRCNS credentials. Default to$CRCNS_USERNAME/$CRCNS_PASSWORD. Prefer env vars over passing literals.password (
str, optional) – CRCNS credentials. Default to$CRCNS_USERNAME/$CRCNS_PASSWORD. Prefer env vars over passing literals.
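The decoupling of the FFT window from the hop described under window_ms can be checked with a few lines of arithmetic. This is a minimal sketch, not the loader's code: the helper names n_fft_from_window and legacy_n_fft are hypothetical, and only the two formulas quoted above are taken from the documentation.

```python
def n_fft_from_window(window_ms: float, sample_rate: int) -> int:
    """FFT length for a fixed analysis window, independent of the hop."""
    return round(window_ms * 1e-3 * sample_rate)


def legacy_n_fft(dt_ms: float, sample_rate: int) -> int:
    """Old coupling quoted above: n_fft = 10 * hop_length."""
    hop_length = round(dt_ms * 1e-3 * sample_rate)
    return 10 * hop_length


sample_rate = 32000  # native AA2 rate quoted in the audio_fs entry
for dt_ms in (1.0, 50.0):
    new = n_fft_from_window(10.0, sample_rate)
    old = legacy_n_fft(dt_ms, sample_rate)
    # Window length in ms implied by each n_fft.
    print(dt_ms, new, 1000 * new / sample_rate, old, 1000 * old / sample_rate)
```

With the fixed 10 ms window, n_fft stays the same at both bin widths, while the legacy coupling grows with dt_ms.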
- class deepSTRF.datasets.audio.CRCNSAA4Dataset(path: str | None = None, animals='all', stimuli=('song', 'call', 'mlnoise'), dt_ms=1.0, smooth=True, n_mels=32, compression='cubic', window_ms: float = 10.0, return_waveform: bool = False, audio_fs: int = 24000, download: bool = False, username: str | None = None, password: str | None = None)[source]
Bases:
AudioNeuralDatasetPyTorch dataset for the CRCNS-AA4 recordings.
1401 extracellular, spike-sorted single and multi units of adult zebra finches (4 males, 2 females) in Field L, caudolateral and caudomedial mesopallium (CLM, CMM) and caudomedial nidopallium (NCM) — though units were not precisely assigned to one of these areas. Three stimulus classes (conspecific songs, calls, ripple noise), each a few seconds long and presented ~10 times. Population- and batch-compatible. Data are available at https://crcns.org/data-sets/aa/aa-4/about-aa-4 (free CRCNS account).
Notes
Follows the standard deepSTRF data paradigm (see docs/_source/md/data_paradigm.md). AA4-specific metadata:
- stims are mel-spectrograms (1, F, T_s).
- stim_meta dicts hold name (the stimulus md5, which is the canonical identifier because the wav filename is per-animal and not unique across the corpus), type, class and duration_s (the stim_duration attr from the h5, in seconds).
- nrn_meta dicts hold: cell_id (h5 basename, no extension), animal_id, sex ('M' / 'F'), site (e.g. "Site1"), electrode (int 1-32, channel index across both 16-channel arrays at a site), ldepth / rdepth (left / right array depth in µm), sort_type ('single' / 'multi'; 'noise' / 'tdt' are filtered out), sort_id (online-sort int) and subsort_id (offline-sort int parsed from the trailing _ss<N>; None if absent).
The dataset paper does not publish a per-cell brain-area assignment, so the depth + electrode-array geometry is the only anatomical proxy. It also does not document which electrode IDs (1-16 vs 17-32) map to the left vs right hemisphere; confirm with the dataset authors before deriving a hemisphere from electrode.
References
Elie & Theunissen (2015). “Meaning in the avian auditory cortex: Neural representation of communication calls.” European Journal of Neuroscience.
Elie & Theunissen (2019). “Invariant neural responses for sensory categories revealed by the time-varying information for communication calls.” PLoS Computational Biology.
Initializes the AA4 Dataset.
- Parameters:
path (
str, optional) – Path to theCRCNS_AA4/data/folder containing one subfolder per animal (with.h5cell files + awavfiles/directory of stimulus.wavfiles). Defaults to the platformdirs cache.animals (
'all'orsequenceofstr) – Animals to load (any subset ofAA4_ANIMAL_IDS).stimuli (
sequenceofstr) – Stimulus types to keep; subset of {‘song’, ‘call’, ‘mlnoise’}.dt_ms (
float) – Time-bin width in ms.smooth (
bool) – If True, smooth PSTHs in place with a 21 ms Hanning window (Hsu, Borst & Theunissen 2004).n_mels (
int) – Number of mel frequency bands of the stimulus spectrogram.compression (
{'cubic', 'log1p', 'none'}) – Compression applied to the spectrogram (saturation effect of hair cells). Ignored whenreturn_waveform=True.window_ms (
float, default10.0) – FFT analysis-window length in ms.n_fftis computed per-stim asround(window_ms * 1e-3 * sample_rate)and is decoupled from ``hop_length`` so phonemic detail is preserved at anydt_ms. Earlier versions of this dataset hardcodedn_fft = hop * 10— atdt_ms=50that gave a 500 ms FFT window and over-smoothed every spec frame. Defaultwindow_ms=10.0preserves bit-identical behaviour atdt_ms=1(n_fft=320 at sr=32 kHz) while fixing the scaling bug at coarser bins. Ignored whenreturn_waveform=True.return_waveform (
bool, defaultFalse) – If True,self.stims[s]holds the raw audio waveform(1, T_audio)ataudio_fsHz (grid-locked toT_audio = T_neural * hop) instead of the in-loader mel spectrogram. Pair with a model whosewav2specslot is a waveform front-end; responses are unchanged.audio_fs (
int, default24000) – Sample rate for waveform mode. The AA4 wavs are 24414 Hz, which gives a non-integer hop at dt=1 ms; the default 24 kHz resamples to a cleanhop = 24(exactly dt=1 ms bins, slightly better than the native spec’s 0.983 ms). Other values must keepaudio_fs * dt_ms / 1000an integer. Ignored unlessreturn_waveform=True.download (
bool, defaultFalse) – If True and an animal’s data is missing underpath, fetch its tarball (~hundreds of MB per animal) from the NERSC mirror and untar in place. Only the animals listed inanimalsare downloaded — useful for quick iteration on a subset.username (
str, optional) – CRCNS credentials. Default to$CRCNS_USERNAME/$CRCNS_PASSWORD. Prefer env vars over passing literals.password (
str, optional) – CRCNS credentials. Default to$CRCNS_USERNAME/$CRCNS_PASSWORD. Prefer env vars over passing literals.
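The constraint that audio_fs * dt_ms / 1000 be an integer can be sketched as a small check. The function name hop_samples is hypothetical and the tolerance is an assumption; the loader's own validation may differ.

```python
def hop_samples(audio_fs: int, dt_ms: float) -> int:
    """Audio samples per neural bin; raises if the grid-lock cannot hold."""
    hop = audio_fs * dt_ms / 1000
    if abs(hop - round(hop)) > 1e-9:  # assumed tolerance
        raise ValueError(f"audio_fs={audio_fs}, dt_ms={dt_ms}: non-integer hop {hop}")
    return round(hop)


# The default AA4 waveform rate gives a clean hop of 24 samples at dt_ms=1.
hop = hop_samples(24000, 1.0)
t_neural = 1500                 # illustrative number of neural bins
t_audio = t_neural * hop        # grid-lock: T_audio = T_neural * hop

# The native AA4 rate does not grid-lock at dt_ms=1.
try:
    hop_samples(24414, 1.0)
    native_ok = True
except ValueError:
    native_ok = False

# Bin width implied by a 24-sample hop at the native rate, in ms.
native_bin_ms = 1000 * 24 / 24414
```

The last line reproduces the 0.983 ms figure quoted for the native spectrogram.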
- class deepSTRF.datasets.audio.CRCNSAC1Dataset(path: str | None = None, experimenter: None | str | Iterable[str] = ('wehr', 'asari'), sites: None | str | Iterable[str] = ('A1', 'MGB'), dt_ms: float = 5.0, fmin: float = 100.0, fmax: float = 45000.0, bins_per_octave: int = 6, window_ms: float | None = None, detrend_med_ms: float = 100.0, detrend_gauss_ms: float = 10.0, gating: RepeatGating | None = None, return_waveform: bool = False, audio_fs: int = 96000, download: bool = False, username: str | None = None, password: str | None = None)[source]
Bases:
AudioNeuralDatasetUnified loader for the Wehr + Asari subsets of CRCNS-AC1.
Both subsets are intracellular Vm in anaesthetised rat auditory pathway (Wehr in A1: whole-cell, sf=4 kHz; Asari in A1 + MGB: whole-cell + cell-attached, sf=10 kHz), and both record natural-sound responses with multi-trial repeats per stimulus. The loader deduplicates stimuli across cells (via the shared NaN-sentinel paradigm) so the same waveform never gets a duplicate spectrogram when it was presented to multiple cells.
- Parameters:
path (
str, optional) – Directory holding (or about to hold) the three CRCNS-AC1 zips and their extracted contents. Defaults todefault_cache_dir( 'CRCNS_AC1')(overridable via$DEEPSTRF_DATA_DIR).experimenter (
stroriterableofstr, optional) –'wehr','asari', or both. Default loads both.sites (
str or iterable of str, optional) – 'A1' and/or 'MGB'. Default loads both. Wehr is all-A1; Asari has both areas.
The signal type is not a free choice: it is determined by each cell’s recording mode, because the recording mode dictates what signal physically exists:
- whole-cell (Wehr A1, Asari A1) → 'subthresh': MedGauss-detrended membrane potential in mV (signed). Action potentials were blocked (Wehr) or not analysed (Asari A1, per the paper); the synaptic input is the signal. Pair with MSE.
- cell-attached (Asari MGB) → 'spikes': a Hann-smoothed spike-rate PSTH (non-negative). There is no intracellular Vm in cell-attached mode. Pair with Poisson.
Each cell carries its resolved type in nrn_meta['signal_type']; self.signal_type is that type if the loaded cohort is homogeneous, else 'mixed' (loading A1 + MGB together mixes signed-mV and spike-rate neurons, so filter by site / signal_type before training one model across them).
dt_ms (
float, default5.0) – Output time-bin width in ms. The Goertzel STFT is parametrised to produce its frames at exactly this resolution (no two-step compute-then-downsample); the response is average-pooled to match.fmin (
float) – Spectrogram frequency range in Hz. Defaults to the Asari 2025 layout:(100.0, 45000.0). Passfmax=25600.0to recover the Wehr 2024 setting (49 bands).fmax (
float) – Spectrogram frequency range in Hz. Defaults to the Asari 2025 layout:(100.0, 45000.0). Passfmax=25600.0to recover the Wehr 2024 setting (49 bands).bins_per_octave (
int, default6) – Spectrogram spectral density. With the defaults this yieldsF=53.window_ms (
float, optional) – STFT analysis-window length in ms. Defaults to2 * dt_ms(legacy MATLABoverlap=2).detrend_med_ms (
float, default100.0) – Median-filter window (ms) for the MedGauss baseline subtracted from each Vm trace. Larger windows remove only slow drift and preserve more low-frequency response dynamics; smaller windows detrend more aggressively. The 100 ms default matches the Rançon 2024/2025 pipeline; the choice is robust (residuals barely change between 100 and 1000 ms because the response is dominated by fast PSP transients).detrend_gauss_ms (
float, default10.0) – Gaussian-smoothing σ (ms) applied to the median-filtered baseline before subtraction.gating (
RepeatGating, optional) – Per-repeat artifact-rejection thresholds. Default values gate out repeats with derivative-MAD jumps and excessive dynamic range; see_crcns_ac1_native.RepeatGating.return_waveform (
bool, default False) – If True, hand out the raw stimulus waveform per stim as a (1, T_audio) mono tensor resampled to audio_fs instead of the in-loader log-spectrogram, for use with a learnable wav2spec model front-end. Because the source waveforms have heterogeneous sample rates (Asari at 97656 Hz, Wehr differs), they are all resampled to the single audio_fs so the grid-lock (T_audio = T_neural * hop, hop = audio_fs * dt_ms / 1000) holds dataset-wide. The offset is 0: the waveform starts at stimulus onset, matching the response trace. Responses are identical to spectrogram mode.
audio_fs (
int, default96000) – Common sample rate the waveforms are resampled to in waveform mode (ignored in spectrogram mode, whereself.audio_fsisNone). The default 96 kHz exceeds twice the defaultfmax(45 kHz) so no in-band content is lost, and grid-locks cleanly for any integerdt_ms(96000/1000 = 96 samples per ms).download (
bool, defaultFalse) – If True, fetch the three archives viadownload_ac1()before extraction. Requires CRCNS credentials.username (
str, optional) – CRCNS credentials. Default to$CRCNS_USERNAME/$CRCNS_PASSWORDenv vars.password (
str, optional) – CRCNS credentials. Default to$CRCNS_USERNAME/$CRCNS_PASSWORDenv vars.
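The band counts quoted for fmin, fmax and bins_per_octave (F=53 with the defaults, 49 with fmax=25600.0) are consistent with a log-spaced layout that counts both end points. The formula below is an assumption chosen because it reproduces those two documented numbers; the loader's actual band placement may differ, and n_log_bands is a hypothetical name.

```python
import math


def n_log_bands(fmin: float, fmax: float, bins_per_octave: int) -> int:
    """Assumed layout: log-spaced centres starting at fmin, both ends counted."""
    octaves = math.log2(fmax / fmin)
    # Small epsilon guards against float error when the span is a whole
    # number of octaves (e.g. 100 Hz to 25600 Hz is exactly 8 octaves).
    return math.floor(bins_per_octave * octaves + 1e-9) + 1


f_default = n_log_bands(100.0, 45000.0, 6)   # documented default range
f_wehr = n_log_bands(100.0, 25600.0, 6)      # documented "Wehr 2024" setting
print(f_default, f_wehr)
```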
Notes
deepSTRF data paradigm — see
docs/_source/md/data_paradigm.md. Per-stim metadata:stim_metadicts holdexperimenter,category,idx(Wehr) orclass_n/segments/segment_files(Asari),description,duration_s.nrn_metadicts holdexperimenter,session,animal_id,penetration,date,site,recording_type,signal_type('subthresh'/'spikes', derived from the recording mode),species, plus_wehr_cell_idxfor Wehr cells (used withWEHR_VALID_NEURONS/WEHR_NEURONS_SPLIT_NATURALfor Rançon-paper reproducibility).
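The rule for the cohort-level signal_type (the shared type if homogeneous, else 'mixed') and the suggested loss pairing can be sketched on plain metadata dicts. This is an illustration under stated assumptions: the helper names and the toy nrn_meta entries are hypothetical, and only the site and signal_type keys come from the documentation above.

```python
def cohort_signal_type(nrn_meta):
    """Return the shared signal type, or 'mixed' if the cohort is heterogeneous."""
    types = {m["signal_type"] for m in nrn_meta}
    if not types:
        raise ValueError("empty cohort")
    return types.pop() if len(types) == 1 else "mixed"


# Loss pairing suggested in the parameter description above.
LOSS_FOR = {"subthresh": "mse", "spikes": "poisson"}

# Toy metadata; real entries carry many more fields.
nrn_meta = [
    {"site": "A1", "signal_type": "subthresh"},
    {"site": "A1", "signal_type": "subthresh"},
    {"site": "MGB", "signal_type": "spikes"},
]

all_cells = cohort_signal_type(nrn_meta)
a1_only = cohort_signal_type([m for m in nrn_meta if m["site"] == "A1"])
loss = LOSS_FOR[a1_only]
```

Filtering by site before resolving the type is what makes a single loss function appropriate for the whole cohort.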
References
Machens, Wehr & Zador (2004). J. Neurosci. 24(5):1089-1100. Asari & Zador (2009). J. Neurophysiol. 102(5):2638-2656. Rançon, Masquelier & Cottereau (2025). Commun. Biol. 8:1456.
- class deepSTRF.datasets.audio.Downer2025Dataset(path: str | None = None, stimuli: Literal['timit', 'mvocs'] = 'timit', dt_ms: float = 5.0, n_mels: int = 80, compression: Literal['cubic', 'log1p', 'none'] = 'log1p', spec_zscore: bool = True, smooth: bool = True, subset: Literal['all', 'estimation', 'test'] = 'all', animals: str | Sequence[str] = 'all', areas: Sequence[str] | None = None, sessions: Sequence[str] | None = None, audio_fs: int = 16000, fmax: int = 8000, window_ms: float = 25.0, return_waveform: bool = False, download: bool = False, _enumerate_only: bool = False)[source]
Bases:
AudioNeuralDatasetSquirrel-monkey auditory cortex (Downer 2025 / Ahmed 2025).
Multi-unit threshold-crossing spike trains from 41 sessions across 3 animals (B, C, F), recorded passively while the animal listened to TIMIT speech and monkey vocalizations. 1718 multi-units total (one per recording channel).
Two stim modes are loaded independently:
- stimuli='timit': 499 unique English sentences (489 single-rep + 10 with 11 reps; the 10 form the canonical test subset per Ahmed 2025).
- stimuli='mvocs': 303 unique monkey vocalizations (292 single-rep + 11 with 15 reps; the 11 form the canonical test subset).
Both modes share the same recording channels but the per-session response counts differ, so a (cell, stim) pair has a (1, 1) NaN where the channel was in a session that did not play that stim.
By default, both stim modes go through the same mel pipeline at audio_fs=16000 / fmax=8000 to match the Ahmed 2025 baseline (cochleagram capped at 8 kHz) and to allow two instances of this class (one per stim mode) to be concatenated via deepSTRF.utils.concat_neural_datasets. Pass return_waveform=True to hand out the raw source waveform per stim (grid-locked to the neural bins) instead, for use with a learnable wav2spec model front-end.
Notes
Phase-1 skeleton: stim and response loading are not yet implemented (will land in subsequent commits). Instantiating with _enumerate_only=True populates self.nrn_meta and self.N_neurons for inspection.
- Parameters:
path (
str, optional) – Path to the unpacked dataset root (the directory containingsessions/andstimuli/). Defaults todefault_cache_dir('Downer2025').stimuli (
{'timit', 'mvocs'}) – Which stim class to load. The two share recording channels but are loaded independently; concatenate two instances if you want both.dt_ms (
float, default5.0) – Neural time-bin width. The paper’s main analysis uses 50 ms; 5 ms matches NS1 and gives users the freedom to re-bin.n_mels (
int, default80) – Number of mel bands. Matches Ahmed 2025’s Kaldi fbank (num_mel_bins=80) by default.compression (
{'cubic', 'log1p', 'none'}, default'log1p') – Spectrogram amplitude compression.log1pmatches Ahmed 2025’s log-mel (Kaldi fbank). The other deepSTRF audio datasets default to'cubic'; we pickedlog1phere to match the paper as closely as possible.spec_zscore (
bool, defaultTrue) – If True, z-score each spectrogram per-band over its own time axis (= Ahmed 2025’snormalize()helper). Boosts contrast in higher-frequency bands that would otherwise be flattened by the log compression.smooth (
bool, defaultTrue) – Hsu 2004 21 ms PSTH smoothing.subset (
{'all', 'estimation', 'test'}) –'estimation'keeps the single-rep stims,'test'keeps the canonical high-rep subset (10 TIMIT IDs or 11 mVocs IDs).animals (
'all' or iterable of {'b', 'c', 'f'})
areas (
iterable of {'core' | 'primary', 'non-primary'} or fine labels) – Fine labels are {'A1', 'R', 'ML', 'AL', 'CL', 'CPB', 'RPB'}. 'primary' is an alias for 'core'. None = no filter.
sessions (
iterable of session-id strings, or None.)
audio_fs (
int, default16000) – Common sample rate both stim classes are resampled to before mel. mVocs source is 41 kHz stereo; TIMIT is already 16 kHz.fmax (
int, default8000) – Mel-band high cutoff. Matches Ahmed 2025’s cochleagram.window_ms (
float, default25.0) – FFT analysis-window length in ms (Kaldi default). The window is decoupled from the hop so phonemic detail is preserved regardless ofdt_ms. Earlier versions of this dataset hardcodedn_fft = 10 * hopwhich produced a 500 ms FFT window atdt_ms=50and over-smoothed the spectrogram so badly that a closed-form ridge STRF only reached cc_norm ≈ 0.40 on the well-tuned cohort. Withwindow_ms=25the same fit reaches cc_norm ≈ 0.53 — matching Ahmed 2025’s paper-reported STRF baseline.return_waveform (
bool, defaultFalse) – If True, hand out the raw source waveform per stim as a(1, T_audio)mono tensor ataudio_fsinstead of the Kaldi-fbank spectrogram, for use with a learnablewav2specmodel front-end. The waveform is grid-locked to the neural bins (T_audio = T_neural * hopwithhop = audio_fs * dt_ms / 1000) and right-padded / cropped to the canonical stim length. The source audio is already speech-onset aligned (TIMITbefaftsilence trimmed; mVocs snippet starts at voc onset), so there is no pre-silence offset. Responses are identical to spectrogram mode. Note the defaultaudio_fs=16000is band-limited to 8 kHz — pass a higheraudio_fsif you want a learnable front-end to see energy above the paper’s cochleagram cutoff.download (
bool, defaultFalse) – Fetch the 29 GB Zenodo archive (record 16175377) if missing and unzip it. Idempotent — both the download and the unzip steps are skipped when their outputs already exist._enumerate_only (
bool, defaultFalse) – Phase-1 internal flag. Populatesself.nrn_metaandself.N_neuronsthen returns, skipping stim and response loading. Will be removed once phases 2–3 are in.
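The spec_zscore option (z-scoring each band over its own time axis) can be sketched on a nested list standing in for an (F, T) spectrogram. This is a minimal illustration, not the loader's code: the function name, the use of the population standard deviation and the eps guard are all assumptions.

```python
from statistics import fmean, pstdev


def zscore_per_band(spec, eps=1e-8):
    """spec is a list of F bands, each a list of T values."""
    out = []
    for band in spec:
        mu = fmean(band)
        sd = pstdev(band)  # assumed: population std over the time axis
        out.append([(x - mu) / (sd + eps) for x in band])
    return out


# Two bands with very different offsets and ranges.
spec = [[0.0, 1.0, 2.0, 3.0], [10.0, 10.5, 11.0, 11.5]]
z = zscore_per_band(spec)
band_means = [fmean(b) for b in z]
band_stds = [pstdev(b) for b in z]
```

After normalization both bands have zero mean and unit spread, which is why a low-energy band is no longer flattened relative to a high-energy one.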
- attach_ahmed2025_well_tuned() int[source]
Write the precomputed Ahmed 2025 well-tuned booleans into nrn_meta.
Reads AHMED2025_WELL_TUNED_TIMIT / ..._MVOCS (module constants generated with compute_paper_tuning(n_resamples=10_000, dt_ms_analysis=50.0, seed=0, alpha=0.05, delta=0.5) on the full 1718-channel population) and tags each nrn_meta[i] with:
ahmed2025_<stim>_well_tuned : bool
where <stim> matches self.stimuli ('timit' or 'mvocs'). Cells not in the precomputed list are tagged False.
- Returns:
Number of cells flagged True in the current selection.
- Return type:
int
Notes
For an exact reproduction of the paper’s 404 / 489 well-tuned counts, instantiate at dt_ms=50.0, smooth=False and call compute_paper_tuning(n_resamples=100_000); the precomputed lists land at 417 / 476, within ~3 % of the paper.
This method does not run any computation; it’s pure metadata attachment.
- compute_paper_tuning(n_resamples: int = 10000, dt_ms_analysis: float = 50.0, seed: int = 0, alpha: float = 0.05, delta: float = 0.5, verbose: bool = True) dict[source]
Replicate Ahmed 2025’s tuned / well-tuned multi-unit criterion.
For each neuron and each of the M test-split stims, the method randomly samples a pair of reps (with replacement across iterations but never the same rep within a pair), concatenates them into long sequences U and V (each of length sum_s T_s_coarse), and computes their Pearson correlation. The null distribution is built the same way but with each V circularly shifted by a random offset before correlating. n_resamples iterations yield two empirical distributions per neuron.
- tuned: one-sided Wilcoxon rank-sum test (true > null) at alpha (default 0.05). Paper target: 1195 (TIMIT) / 1231 (mVocs).
- well_tuned: additionally requires (mean(true) - mean(null)) / std(null) >= delta (default 0.5). Paper target: 404 (TIMIT) / 489 (mVocs).
Writes four fields (two booleans, two floats) to each nrn_meta[i], prefixed by the current stim mode:
ahmed2025_{timit|mvocs}_tuned : bool
ahmed2025_{timit|mvocs}_well_tuned : bool
ahmed2025_{timit|mvocs}_p_wilcoxon : float
ahmed2025_{timit|mvocs}_delta_normalized : float
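The pair-resampling procedure and the effect-size part of the well-tuned criterion can be sketched with the standard library. This is an illustration under stated assumptions, not the method's implementation: the function names are hypothetical, the circular offset is drawn over the whole concatenated V and excludes zero, the Wilcoxon rank-sum step is omitted, and the data are synthetic.

```python
import math
import random
from statistics import fmean, pstdev


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    mu, mv = fmean(u), fmean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den if den > 0 else 0.0


def resample_correlations(reps_per_stim, n_resamples, rng):
    """reps_per_stim[s][r] is the coarse-binned response of rep r to stim s."""
    true, null = [], []
    for _ in range(n_resamples):
        u, v = [], []
        for reps in reps_per_stim:
            i, j = rng.sample(range(len(reps)), 2)  # two distinct reps
            u.extend(reps[i])
            v.extend(reps[j])
        k = rng.randrange(1, len(v))  # assumed: nonzero circular offset
        true.append(pearson(u, v))
        null.append(pearson(u, v[k:] + v[:k]))
    return true, null


def delta_normalized(true, null):
    """Effect size used by the well-tuned criterion described above."""
    return (fmean(true) - fmean(null)) / pstdev(null)


# Synthetic stimulus-locked neuron: 3 stims, 5 noisy reps of a fixed template.
rng = random.Random(0)
templates = [[rng.random() for _ in range(40)] for _ in range(3)]
reps_per_stim = [
    [[x + rng.gauss(0.0, 0.1) for x in t] for _ in range(5)] for t in templates
]
true, null = resample_correlations(reps_per_stim, 200, rng)
delta = delta_normalized(true, null)
passes_effect_size = delta >= 0.5
```

A strongly stimulus-locked unit like this one clears the delta threshold by a wide margin; a full replication would additionally run the one-sided rank-sum test on the two distributions.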
- Parameters:
n_resamples (
int, default10_000) – Pair-resamplings per neuron. The paper used 100_000; 10_000 is ~10x faster and gives a stable Wilcoxon (the criterion only cares about the rank order of the two distributions).dt_ms_analysis (
float, default50.0) – Coarse bin width for the long-sequence correlations. Must be an integer multiple ofself.dt. Paper used 50 ms for the main results, 20 ms for Fig 4.seed (
int, default0) – RNG seed for reproducibility.alpha (
float) – Significance and effect-size thresholds (see above).delta (
float) – Significance and effect-size thresholds (see above).verbose (
bool, defaultTrue) – Show a tqdm progress bar.
- Returns:
summary – Aggregate counts {'tuned': int, 'well_tuned': int, 'n_with_data': int, 'stimuli': str}.
- Return type:
dict
Notes
Re-bins self.responses to dt_ms_analysis on the fly via summing; if smooth=True was passed to the constructor, the smoothing has already been applied at self.dt. For the strictest paper match instantiate with smooth=False, though in practice the 21 ms Hanning smoothing followed by the 50 ms re-binning makes the smoothing nearly invisible.
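Re-binning by summing, together with the requirement that dt_ms_analysis be an integer multiple of the native bin width, can be sketched as follows. The function name is hypothetical, and dropping a trailing partial bin is an assumption; the method may handle the remainder differently.

```python
def rebin_by_sum(x, dt_ms, dt_ms_analysis):
    """Sum consecutive fine bins into coarse bins of width dt_ms_analysis."""
    ratio = dt_ms_analysis / dt_ms
    factor = round(ratio)
    if factor < 1 or abs(ratio - factor) > 1e-9:
        raise ValueError("dt_ms_analysis must be an integer multiple of dt_ms")
    n_coarse = len(x) // factor  # assumed: trailing partial bin is dropped
    return [sum(x[i * factor:(i + 1) * factor]) for i in range(n_coarse)]


# 20 fine bins at 5 ms become 2 coarse bins at 50 ms.
fine = list(range(20))
coarse = rebin_by_sum(fine, 5.0, 50.0)

try:
    rebin_by_sum(fine, 5.0, 12.0)
    rejected = False
except ValueError:
    rejected = True
```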
- class deepSTRF.datasets.audio.EspejoDataset(path: str | None = None, stimuli: Literal['nat', 'vmn'] = 'nat', dt_ms: float = 10.0, subset: Literal['all', 'estimation', 'test'] = 'all', cells: Sequence[str] | None = None, smooth: bool = False, return_waveform: bool = False, audio_fs: int = 44100, download: bool = False)[source]
Bases:
AudioNeuralDatasetPyTorch dataset for Lopez-Espejo et al. (2019) ferret A1 recordings.
Awake, passively-listening adult ferret primary auditory cortex (A1), extracellularly recorded single units. The dataset ships in two disjoint releases (no cell overlap and different stimulus dimensionality, so they cannot be concatenated), selected by the stimuli argument:
- 'nat': 93 3-second natural sounds (animal vocalizations, speech, environmental, music), stored as 18-band gammatone log-spectrograms (NEMS “ozgf”, F=18). ~540 cells across 35 sites in 6 ferrets; each site presents a subset of the stim bank.
- 'vmn': 30 3-second vocalization-modulated noise stimuli (two narrowband noise streams modulated by independent natural-vocalization envelopes), stored as 2-band envelopes (“envelope” stimfmt, F=2). ~200 cells across 103 sites in 5 ferrets.
Both releases sample at 100 Hz (dt=10 ms native); the on-disk cochleagrams are log-compressed at source. Each occurrence epoch includes the published 0.5 s pre-stim + 0.5 s post-stim silence flanking the 3 s stimulus, so per-stim tensors are (1, F, 500) (NAT) or (1, F, 400) (VMN). The estimation / test split follows the paper’s split_by_occurrence_counts and is surfaced via the per-stim n_repeats and split metadata fields.
Data are freely available at https://doi.org/10.5281/zenodo.3445557 (no account required) and auto-fetched with download=True.
Notes
Follows the standard deepSTRF data paradigm (see
docs/_source/md/data_paradigm.md). Espejo-specific metadata:stim_metadicts holdname,type('nat'/'vmn'),n_repeats,split('test'/'estimation'),duration_sandn_samples.nrn_metadicts holdcell_id,site,animal_id,channel,unitandexperiment_set('nat'/'vmn').unitcan beNonefor VMN cells (2-segment cellids).
The
(1, 1)NaN sentinel marks(stim, neuron)pairs the cell was not recorded for (different sites present different stim subsets). Only the pre-computed cochleagrams are in the Zenodo deposit (the raw NAT waveforms are mirrored on the LBHB bitbucket); the loader fixesdt_ms = 10.References
Lopez Espejo, Schwartz & David (2019). “Spectral tuning of adaptation supports coding of sensory context in auditory cortex.” PLoS Computational Biology 15(10): e1007430. https://doi.org/10.1371/journal.pcbi.1007430
- Parameters:
path (
str, optional) – Path to the Espejo data folder (containingA1_natural_sounds/and / orA1_voc_mod_noise/). Defaults to the platformdirs cache ($DEEPSTRF_DATA_DIRoverrides).stimuli (
{'nat', 'vmn'}) – Which release to load. The two are mutually exclusive (disjoint cells, different F); to use both, instantiate twice and keep them separate.dt_ms (
float, default10.0) – Time-bin width in ms. Currently fixed at 10 ms — the on-disk cochleagrams are precomputed at fs=100 Hz, and the response rasterizer aligns to that grid.subset (
{'all', 'estimation', 'test'}, default'all') – If ‘estimation’ or ‘test’, only that stim subset is kept. Split follows the paper’ssplit_by_occurrence_counts: test = stims at max repetition count per site; estimation = stims at lower repetition counts.cells (
sequenceofstr, optional) – Whitelist of cell IDs to include (intersection with what’s on disk). None keeps all.smooth (
bool, defaultFalse) – If True, smooth PSTHs with a 21 ms Hanning window (Hsu / Borst / Theunissen 2004). Off by default — Espejo is typically used as-is.return_waveform (
bool, defaultFalse) – If True (stimuli='nat'only), hand out the raw natural-sound waveform per stim as a(1, T_audio)mono tensor ataudio_fsinstead of the precomputed ozgf cochleagram, for use with a learnablewav2specmodel front-end. The raw wavs are not in the Zenodo deposit — they are fetched from the LBHB baphy bitbucket mirror (see the README) and cached under<path>/nat_waveforms/. The 4 s sound is inset at the published 0.5 s pre-stim silence offset so it grid-locks to the cochleagram frames (T_audio = T_neural * hop,hop = audio_fs·dt/1000). Responses are identical to cochleagram mode. VMN is unsupported (its stimuli are synthesized 2-band envelopes with no raw audio).audio_fs (
int, default44100) – Sample rate for waveform mode (the native rate of the mirror wavs; ignored — and reported asNone— in cochleagram mode). 44.1 kHz grid-locks at dt=10 ms (hop=441).download (
bool, defaultFalse) – If True and the data is missing underpath, fetch the requested archive from Zenodo (record 3445557) and untar in place.
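The split rule given for subset (test = stims at the maximum repetition count for a site, estimation = the rest) can be sketched on a plain mapping of stimulus name to repeat count. The function name split_by_max_repeats and the toy counts are hypothetical; this is not the paper's split_by_occurrence_counts implementation.

```python
def split_by_max_repeats(n_repeats):
    """n_repeats maps stim name -> repetition count at one site."""
    top = max(n_repeats.values())
    return {
        name: ("test" if count == top else "estimation")
        for name, count in n_repeats.items()
    }


# Toy site: two single-presentation stims and one heavily repeated stim.
site_counts = {"stim_a": 1, "stim_b": 1, "stim_c": 20}
split = split_by_max_repeats(site_counts)
test_stims = sorted(name for name, s in split.items() if s == "test")
```

Under this sketch a site where every stim has the same count would label all of them 'test'; whether the loader special-cases that situation is not stated above.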
- class deepSTRF.datasets.audio.Le2025Dataset(path: str | Path | None = None, experiment: str = 'nat8b', dt_ms: float = 5.0, n_bands: int = 50, fmin: float = 1000.0, fmax: float = 8000.0, window_ms: float = 2.5, smooth: bool = True, keep_areas: Sequence[str] | None = None, compute_reliability: bool = True, download: bool = False, return_waveform: bool = False, audio_fs: int = 48000)[source]
Bases:
AudioNeuralDatasetdeepSTRF wrapper for one sub-experiment of Le, Bjoring & Meliza (2025).
Instantiate one per experiment ("nat8a" | "nat8b" | "synth8b") and concatenate with concat_neural_datasets (or ds_a + ds_b) for the union; the three experiments share no stimuli, so the bidirectional selection rule in the base class hides cross-experiment NaN-only entries automatically.
Layout on disk (as shipped on figshare):
<path>/
├── metadata/
│   ├── recordings.csv           area for nat8a-beta / nat8b / synth8b
│   ├── song-birds.csv           motif name mapping + CI timings (ms)
│   └── ephys-birds.csv          cohort, sex, age, familiarity group
├── nat8a-alpha-responses/       cohort 1 (familiarity manipulation)
├── nat8a-beta-responses/        cohort 2
├── nat8a-stimuli/               shared by alpha + beta
├── nat8b-responses/             cohort 3 (natural-syntax)
├── nat8b-stimuli/
├── synth8b-responses/           cohort 3 (scrambled-syntax)
└── synth8b-stimuli/
Two pprox schemas coexist in the archive: the legacy spec:2/pprox (used only by nat8a-alpha; spike times in ms, condition field encodes variant) and spec:2/stimtrial (everything else; spike times in s, stimulus dict carries the full filename stem). Both are handled below.
- Parameters:
path – Filesystem path to the unpacked figshare archive (the directory that contains
metadata/,*-responses/,*-stimuli/).experiment – One of
"nat8a"|"nat8b"|"synth8b".nat8aunifies the two cohorts (alpha + beta) that share the same stim set; useselect_pop_by_nrn_attr("cohort", 1)to restrict to the familiarity sub-experiment.dt_ms – Bin width in ms. Default 5; pass
dt_ms=1for paper-faithful spectrogram + response binning (the paper uses 1 ms throughout).n_bands – Number of gammatone bands. Default 50, matching the paper.
fmin – Low / high edges of the gammatone filter bank, in Hz. Defaults 1000 / 8000 — the paper’s range.
fmax – Low / high edges of the gammatone filter bank, in Hz. Defaults 1000 / 8000 — the paper’s range.
window_ms – Gammatone analysis-window width, in ms. Default 2.5, matching the paper. Hop is
dt_ms.smooth – If True (default), apply a 21 ms Hanning smoother to all PSTHs (Hsu / Borst / Theunissen 2004).
keep_areas – Optional iterable of area strings to filter on (per the per-unit
areametadata field; values vary by cohort).compute_reliability – If True (default), pre-compute per-neuron Sahani–Linden signal power, noise power, and SNR (length-weighted across stims) and attach them to
nrn_meta. Set toFalsefor fast iteration when reliability filtering is not needed.download – If
Trueandpath=None, fetches the ~105 MB figshare archive (DOI10.6084/m9.figshare.29203457) into the deepSTRF cache and unpacks it. Idempotent: skips the download if the unpacked tree is already present.
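The reliability quantities named under compute_reliability follow the Sahani and Linden estimator, which can be sketched for a single stimulus. This is an illustration only: the function name is hypothetical, the length-weighted pooling across stims mentioned above is left out, and the loader's exact conventions may differ.

```python
from statistics import fmean, pvariance


def signal_noise_power(trials):
    """trials is a list of N >= 2 equal-length response traces to one stim."""
    n = len(trials)
    if n < 2:
        raise ValueError("need at least two trials")
    mean_resp = [fmean(col) for col in zip(*trials)]
    p_mean = pvariance(mean_resp)                    # power of the trial average
    p_trials = fmean([pvariance(t) for t in trials])  # average single-trial power
    signal = (n * p_mean - p_trials) / (n - 1)
    noise = p_trials - signal
    return signal, noise


# Perfectly repeatable response: all power is signal.
sig_clean, noise_clean = signal_noise_power([[0, 1, 0, 1], [0, 1, 0, 1]])

# Partly repeatable response: power splits between signal and noise.
sig_mixed, noise_mixed = signal_noise_power([[1, 2, 3, 4], [2, 1, 4, 3]])
snr_mixed = sig_mixed / noise_mixed
```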
Notes
- Stim-side metadata fields:
name: filename stem.
motif: e.g. "B189" (nat8a) or "nat8mk0" (nat8b).
critical_interval: int (1/2) for per-CI variants, None for C and CM (CI-independent), or a string like "2a" for synth8b.
variant: one of VARIANTS.
syntax: "natural" or "scrambled".
experiment: "nat8a" | "nat8b" | "synth8b".
sample_rate_hz: native sample rate of the source wav.
duration_s: duration of the source wav, in seconds.
ci_onset_s / ci_offset_s: critical-interval bounds in seconds (NaN for C/CM and for synth8b; the CI table only covers nat8a/nat8b).
- Per-neuron metadata fields:
cell_id, animal_id, animal_uuid, cohort, experiment, area, hemisphere, familiar_motifs (list of motif IDs the bird was reared with; empty unless cohort 1), sex, age_days, pprox_file.
- select_critical_interval(ci) List[int][source]
Restrict iteration to one CI index (or
Nonefor C/CM variants).
- select_restoration_quartet(motif: str, ci, variants: Sequence[str] = ('C', 'CB', 'GB', 'GM')) List[int][source]
Select the stim set used in the paper’s core restoration analysis for one (motif, CI). Defaults to C / CB / GB / GM — the four trajectories compared in Fig. 4. Returns the selected stim indices.
The
Ccontinuous andCMmasked variants are CI-independent and are kept whenever they appear invariants, regardless ofci.
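The selection rule described above (match the motif and CI, but always keep the CI-independent C and CM variants when requested) can be sketched on plain stim_meta dicts. The function below is a hypothetical stand-alone helper, not the method itself, and the toy metadata only uses the motif, variant and critical_interval fields listed in the Notes.

```python
CI_INDEPENDENT = {"C", "CM"}


def restoration_quartet_indices(stim_meta, motif, ci,
                                variants=("C", "CB", "GB", "GM")):
    """Return indices of stims for one (motif, CI) restoration comparison."""
    keep = []
    for idx, meta in enumerate(stim_meta):
        if meta["motif"] != motif or meta["variant"] not in variants:
            continue
        # C / CM carry critical_interval=None and are kept for any ci.
        if meta["variant"] in CI_INDEPENDENT or meta["critical_interval"] == ci:
            keep.append(idx)
    return keep


stim_meta = [
    {"motif": "B189", "variant": "C", "critical_interval": None},
    {"motif": "B189", "variant": "CB", "critical_interval": 1},
    {"motif": "B189", "variant": "CB", "critical_interval": 2},
    {"motif": "B189", "variant": "GB", "critical_interval": 1},
    {"motif": "B189", "variant": "GM", "critical_interval": 1},
    {"motif": "B189", "variant": "CM", "critical_interval": None},
    {"motif": "OTHER", "variant": "C", "critical_interval": None},
]
quartet = restoration_quartet_indices(stim_meta, "B189", 1)
```

CM is excluded here only because it is not in the default variants tuple; passing it explicitly would keep it regardless of ci.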
- class deepSTRF.datasets.audio.NAT4Dataset(path: str | None = None, area: str = 'A1', dt_ms: float = 10.0, smooth: bool = False, download: bool = False, subset: str = 'all', return_waveform: bool = False, audio_fs: int = 44100)[source]
Bases:
AudioNeuralDatasetPyTorch dataset for NAT4 (Pennington & David, 2022 / 2023).
Two cortical areas: A1 (primary, 849 cells of which 777 auditory) and PEG (secondary, 398 of which 339 auditory). Pass area=...; one instance covers one area. To pool both, instantiate twice and call concat_neural_datasets([a1, peg]).
There are 595 stimuli total: 18 high-rep (val, 20 trials) + 577 low-rep (est, 1 trial), each clip 1.5 s. The default time bin is dt_ms = 10 (the population recording is precomputed at fs=100 with val pre-averaged over 20 reps; per-site spike trains are at fs=1000 and downsampled to 10 ms by summing). The spectrogram has F = 18 ozgf bands and T = 150 frames per stim.
The loader reads the published NAT4 archive directly with native CSV / JSON / HDF5 parsers, with no NEMS0 dependency. Data are freely available at https://doi.org/10.5281/zenodo.8044773 (no account required) and auto-fetched by NAT4Dataset(download=True).
Notes
Follows the standard deepSTRF data paradigm (see
docs/_source/md/data_paradigm.md). NAT4-specific metadata:stim_metadicts holdnameandsubset('est'or'val'); thesubset='all'|'est'|'val'constructor argument filters this list at load time.nrn_metadicts holdcell_id(raw NEMS id, e.g.'ARM029a-01-1'),area,auditory(flag from the dataset’s<area>_pred_correlation.csv), and the parsed componentssite(e.g.'ARM029a'),animal(3-char site prefix, e.g.'ARM'),electrode(int) andunit_in_electrode(int). Components default toNonefor any cell whose id does not match the standard<site>-<elec>-<unit>scheme.
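The parsing of cell_id into site, animal, electrode and unit_in_electrode can be sketched with a regular expression. The pattern is an assumption built from the single example id quoted above; the loader's actual parser may accept other forms.

```python
import re

# Assumed pattern for the '<site>-<elec>-<unit>' scheme.
_CELL_ID = re.compile(r"^(?P<site>[A-Za-z0-9]+)-(?P<elec>\d+)-(?P<unit>\d+)$")


def parse_cell_id(cell_id):
    """Split a NEMS-style cell id; components are None if it does not match."""
    m = _CELL_ID.match(cell_id)
    if m is None:
        return {"site": None, "animal": None,
                "electrode": None, "unit_in_electrode": None}
    site = m["site"]
    return {
        "site": site,
        "animal": site[:3],  # 3-char site prefix
        "electrode": int(m["elec"]),
        "unit_in_electrode": int(m["unit"]),
    }


parsed = parse_cell_id("ARM029a-01-1")
unparsed = parse_cell_id("not a standard id")
```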
estresponses have shape(R=1, T=150)andvalresponses(R=20, T=150); the(1, 1)NaN sentinel marks(stim, neuron)pairs where the cell was not recorded for that stim.With
return_waveform=True,stimsare instead the raw mono waveforms(1, T_audio = T * hop)ataudio_fs(hop=441 at 44.1 kHz / 10 ms) — feed them through a model’swav2specslot.References
Pennington & David (2022, preprint). “Can deep learning provide a generalizable model for dynamic sound encoding in auditory cortex?”
Pennington & David (2023). “A convolutional neural network provides a generalizable model of natural sound coding by neural populations in auditory cortex.” PLOS Computational Biology.
- Parameters:
path (str, optional) – Path to the NAT4 data folder. Defaults to the platformdirs cache.
area ({'A1', 'PEG'}) – Cortical area.
dt_ms (float, default 10.0) – Time-bin width in ms. Currently must equal 10.0; the population recording is precomputed at fs=100 and the per-site downsampling assumes a fixed 10x ratio from fs=1000.
smooth (bool, default False) – If True, smooth PSTHs with a 21 ms Hanning window. Off by default here because NAT4 trials are typically used as-is for STRF fitting (unlike CRCNS-AA where smoothing is the published norm).
download (bool, default False) – If True and the data is missing under path, fetch it from Zenodo (record 8044773).
subset ({'all', 'est', 'val'}, default 'all') – If 'est' or 'val', only that stimulus subset is loaded: stim_meta / stims / responses shrink accordingly, and the (more expensive) per-site spike-time pass is skipped entirely under subset='est'. The two subsets correspond to Pennington & David’s published estimation set (575 stims, R=1, from the population recording) and validation set (18 stims, R=20, from the per-site recordings) respectively. Note that 33 of the 849 A1 cells have no val data; under subset='val' their responses are full NaN sentinels. Pair the constructor arg with ds.select_pop_by_stim_attr('subset', 'val') to drop them automatically. The idiomatic alternative, ds.select_stims_by_attr('subset', 'val'), leaves the full stim bank loaded but applies the bidirectional rule, so cells without val data are hidden from __getitem__.
return_waveform (bool, default False) – If True, each stimulus is the raw mono waveform (1, T_audio) at audio_fs Hz instead of the precomputed ozgf cochleagram. The 593 source .wav files (44.1 kHz, 1 s of sound) are read from <path>/wav/ and embedded in the 1.5 s trial window at the recording’s pre-silence offset, then grid-locked to T_audio = T_neural * hop (hop = audio_fs * dt_ms / 1000). Feed it through a model’s wav2spec slot (e.g. CausalGammatone to reproduce the native ozgf front-end). Pass download=True to also fetch wav.zip from Zenodo.
audio_fs (int, default 44100) – Audio sample rate for return_waveform=True. The default 44.1 kHz is the native rate of the NAT4 wavs and gives an exact integer hop = 441 at dt_ms = 10 (no resampling). Choose any rate making audio_fs * dt_ms / 1000 an integer. Ignored unless return_waveform=True.
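The grid-lock rule on audio_fs can be checked before constructing a dataset. The following is a minimal sketch, not the loader’s own validation code: grid_lock_hop is a hypothetical helper name, and the 150-bin trial length is simply the 1.5 s window divided by the 10 ms bin width.

```python
from fractions import Fraction


def grid_lock_hop(audio_fs: int, dt_ms: float) -> int:
    """Return audio samples per neural bin, or raise if it is not an integer.

    Hypothetical helper for illustration; not part of deepSTRF.
    """
    # Exact rational arithmetic avoids float round-off in audio_fs * dt_ms / 1000.
    hop = Fraction(audio_fs) * Fraction(str(dt_ms)) / 1000
    if hop.denominator != 1:
        raise ValueError(
            f"audio_fs={audio_fs} Hz with dt_ms={dt_ms} gives a non-integer hop "
            f"({float(hop)})"
        )
    return int(hop)


hop = grid_lock_hop(44100, 10.0)  # native NAT4 rate at 10 ms bins
t_neural = 150                    # 1.5 s trial window / 10 ms bins
t_audio = t_neural * hop          # waveform length that lines up with the bins
```

A rate such as 44100 Hz at dt_ms = 5 would raise here, because it yields 220.5 samples per bin.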
- class deepSTRF.datasets.audio.NS1Dataset(path: str | None = None, dt_ms: float = 5.0, smooth: bool = True, download: bool = False, return_waveform: bool = False, audio_fs: int = 48000)[source]
Bases: AudioNeuralDataset
PyTorch dataset for the NS1 (Harper et al. 2016, Rahman et al. 2020) data.
119 multi/single units from primary auditory cortex (A1) of deeply anesthetized ferrets, recorded in response to 20 natural sound clips of 4.995 s each, presented 20 times per neuron. Every neuron heard every clip, so the response grid is fully dense (no NaN sentinels).
Of the 119 units, 73 pass the “single-unit at known depth” filter the original authors used (single_t in {'Yes', 'Maybe'} and depth >= 0); select_pop_by_nrn_attr() over single_t / depth_um reproduces this subset.
The spectrogram tensor is precomputed at dt = 5 ms (F = 34 frequency bands, T = 999 bins); the dt_ms constructor argument is currently validated against this resolution. With return_waveform=True, stims are instead raw mono waveforms (1, T_audio) at audio_fs (aligned to T_audio = T_neural * audio_fs * dt_ms / 1000); feed them through a model’s wav2spec front-end.
Data are freely available (no account required) and auto-fetched by NS1Dataset(download=True):
https://osf.io/ayw2p/: metadata, raw spike and wav data.
https://github.com/monzilur/DNet: precomputed 5 ms mel spectrogram.
Notes
Follows the standard deepSTRF data paradigm (see docs/_source/md/data_paradigm.md). NS1-specific metadata:
stim_meta dicts hold name and type.
nrn_meta dicts hold cell_id, area, depth_um, noise_ratio, single_n, single_t, n_electrodes and electrode_number. noise_ratio is the Sahani-Linden normalised noise power (lower = cleaner; NOT an SNR despite the legacy .mat field name). single_n is the single-unit flag from spike-snippet clustering (0/1); single_t is the manual triage label ('Yes' / 'Maybe' / 'No').
References
Harper et al. (2016). “Network receptive field modeling reveals extensive integration and multi-feature selectivity in auditory cortical neurons.” PLoS Computational Biology.
Rahman et al. (2020). “Simple transformations capture auditory input to cortex.” PNAS.
- Parameters:
path (str, optional) – Path to the NS1 data folder containing test_data_5ms.mat, MetadataSHEnCneurons.mat, and spikesandwav/. If None, defaults to the platformdirs cache (user_cache_dir('deepSTRF') / 'NS1', overridable via $DEEPSTRF_DATA_DIR).
dt_ms (float, default 5.0) – Time-bin width in ms. Must equal 5.0, since the bundled spectrogram is precomputed at this resolution. Other values would require re-spectrogramming the wavs (not implemented).
smooth (bool, default True) – If True, smooth PSTHs in place with a 21 ms Hanning window (Hsu, Borst & Theunissen 2004).
download (bool, default False) – If True and the data assets are missing under path, fetch them from their public sources (no account required): OSF (https://osf.io/ayw2p/: metadata + spike data + wavs) and DNet GitHub (https://github.com/monzilur/DNet: the precomputed 5 ms mel-spectrogram tensor test_data_5ms.mat). Total ~160 MB, ~16 s on a fast connection. See download_ns1().
return_waveform (bool, default False) – If True, self.stims holds raw audio waveforms instead of precomputed spectrograms. Each self.stims[s] is a (1, T_audio) float32 tensor at audio_fs Hz, downmixed to mono, resampled from the native 48 828.125 Hz, and right-cropped / zero-padded to exactly T_neural * audio_fs * dt_ms / 1000 samples so it aligns with the 4.995 s response window. Pair with a model that has a wav2spec front-end (see deepSTRF.models.wav2spec).
audio_fs (int, default 48000) – Sample rate (Hz) for waveform mode. The default 48 kHz gives a clean 240 samples per 5 ms bin and a Nyquist of 24 kHz, enough to preserve the ~22.6 kHz content used in the Rahman et al. (2020) cochleagram. (Native is 48 828.125 Hz; the small downsample keeps an integer sample-per-bin factor.) Ignored when return_waveform=False.
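The right-crop / zero-pad step described for return_waveform can be illustrated on plain Python lists. This is a sketch under stated assumptions, not the loader’s implementation: fit_to_grid is a hypothetical name, padding is assumed to go on the right (trailing silence), and the two clip lengths are made up for the demonstration.

```python
from typing import List


def fit_to_grid(wave: List[float], t_neural: int, hop: int) -> List[float]:
    """Right-crop or right-zero-pad `wave` to exactly t_neural * hop samples.

    Hypothetical helper; assumes padding is appended at the end of the clip.
    """
    target = t_neural * hop
    if len(wave) >= target:
        return wave[:target]                     # drop trailing samples
    return wave + [0.0] * (target - len(wave))   # append trailing silence


hop = 48000 * 5 // 1000        # 240 samples per 5 ms bin at the default audio_fs
t_neural = 999                 # NS1 response bins per clip
long_clip = [0.1] * 240_000    # hypothetical resampled clip, slightly too long
short_clip = [0.1] * 239_000   # hypothetical resampled clip, slightly too short

cropped = fit_to_grid(long_clip, t_neural, hop)
padded = fit_to_grid(short_clip, t_neural, hop)
```

Both results have 999 * 240 samples, so every response bin maps to exactly one hop of audio.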
- class deepSTRF.datasets.audio.Wingert2026Dataset(path: str | None = None, area: None | str | Iterable[str] = None, site: None | str | Iterable[str] = None, dt_ms: float = 10.0, subset: str = 'all', smooth: bool = False, log_compress: bool = True, log_offset: float = -1.0, download: bool = False, include_unlabeled: bool = False, return_waveform: bool = False, audio_fs: int = 44100, prestim_ms: float = 1000.0, _enumerate_only: bool = False)[source]
Bases: AudioNeuralDataset
PyTorch dataset for Wingert et al. 2026 (Nat Neurosci).
A high-density ferret auditory-cortex recording library: 2 128 A1 + 746 PEG + 217 AC + 37 HC single units across 67 recording sites (68 cell_list siteid groups, since SLJ032a’s two-probe recording contributes two siteids: A-probe 'SLJ032a' and B-probe 'SLJ032a-B'). Stimuli are 20–22 s sequences of crossfaded natural sound segments (Audioset Core 3 Complete + Pro Sound Effects); each site presents ~100 estimation stims (single-rep) and 1–6 test stims (R ranging from 5 to 30 across sites).
The release ships gammatone-gram spectrograms (“cochleagrams”) precomputed at fs = 100 Hz (10 ms bins), F = 32 log-spaced bands from 200 Hz to 20 kHz. The values in stim.h5 are the raw (linear) gammatone-gram; the loader reproduces the paper’s preprocessing on top of them: log compression log(10·x + 1), then per-band minmax to [0, 1] (see the log_compress argument). Responses are per-neuron minmax-normalised. This matches aud_subspace_fit_demo.ipynb (NEMS log_compress + normalize('minmax')) to float32 precision.
Two stim-duration cohorts coexist in the released data:
47 sites at T = 2000 bins (20 s, no silence flanks);
21 sites at T = 2200 bins (22 s = 1 s pre + 20 s sound + 1 s post).
The deepSTRF data paradigm supports ragged T natively — the per-stim tensor keeps its own time length and collate zero-pads on the right.
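Right zero-padding of ragged time axes can be sketched on nested lists. The example below is illustrative only and is not neural_collate: pad_right is a hypothetical name, each spectrogram is simplified to F rows of T values (the leading mono axis is dropped), and the two cohorts are scaled down 100x to keep the toy batch small.

```python
from typing import List

Spec = List[List[float]]  # one spectrogram as F rows of T values


def pad_right(specs: List[Spec]) -> List[Spec]:
    """Zero-pad every spectrogram on the time axis to the longest T in the batch."""
    t_max = max(len(spec[0]) for spec in specs)
    return [[row + [0.0] * (t_max - len(row)) for row in spec] for spec in specs]


# Toy batch with F = 2 bands and one stim from each duration cohort.
short = [[1.0] * 20 for _ in range(2)]   # stands in for T = 2000
long = [[1.0] * 22 for _ in range(2)]    # stands in for T = 2200
batch = pad_right([short, long])
```

After padding, both stims share T = 22, and the shorter one carries zeros in its last two bins.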
The loader reads the published archive directly with native CSV / JSON / HDF5 parsers, with no nems0 dependency. Data are open access at https://doi.org/10.5281/zenodo.18331549 and auto-fetched by Wingert2026Dataset(download=True).
Notes
Follows the standard deepSTRF data paradigm (see docs/_source/md/data_paradigm.md). Wingert-specific metadata:
stim_meta dicts hold name (e.g. 'STIM_seq0032.wav'), subset ('est' for STIM_seq*, 'val' for STIM_00*), and site (the cell_list-canonical site id this stim was presented at). The same source wav can appear under multiple (name, site) pairs because each session re-rasterizes its own copy and the two duration cohorts produce different-shape tensors.
nrn_meta dicts hold cell_id, site (from cell_list.csv, authoritative), area, layer, depth, narrow, celltype, sw, goodpred, and the parsed animal / electrode / unit_in_electrode components.
The published cell counts hold whenever the cohort uses the standard A1 + PEG filter; AC and HC are exposed but documented as less-curated.
References
Wingert et al. (2026). “Convolutional neural network models describe the encoding subspace of local circuits in auditory cortex.” Nature Neuroscience. https://doi.org/10.1038/s41593-026-02216-0
- Parameters:
path (str, optional) – Path to the unpacked dataset root (the directory containing recordings/ and cell_list.csv). Defaults to default_cache_dir('Wingert2026').
area (str or iterable of str, optional) – Restrict to one or more cortical areas: any of 'A1', 'PEG', 'AC', 'HC'. None (default) loads every area-labelled cell; cells with area=NaN in cell_list.csv (131 cells, presumably sort-failed) are excluded unless include_unlabeled=True.
site (str or iterable of str, optional) – Restrict to one or more cell_list siteid values (e.g. 'CLT027c', 'SLJ032a-B', 'PRN018a'). None (default) loads every site that survives the area filter.
dt_ms (float, default 10.0) – Time-bin width in ms. Currently must equal 10.0: the published gammatone-gram is precomputed at fs = 100 and a future down-binning helper is out of v1 scope.
subset ({'all', 'est', 'val'}, default 'all') – 'est' keeps only the single-rep STIM_seq* estimation stims; 'val' keeps only the high-rep STIM_00* test stims. The bidirectional select rule applies: cells whose site did not present any retained stim are masked out of __getitem__ automatically.
smooth (bool, default False) – If True, smooth PSTHs with a 21 ms Hanning window via self.smooth_responses(window_ms=21.0).
log_compress (bool, default True) – If True, apply the David-lab log compression log((x + d) / d) with d = 10**log_offset to the raw (linear) gammatone-gram before normalisation, reproducing the nems.preprocessing.normalization.log_compress step in the paper’s pipeline. Set False to feed the raw linear gtgram.
log_offset (float, default -1.0) – Offset exponent for log_compress (d = 10**log_offset). The paper uses -1 (i.e. d = 0.1, so the transform is log(10·x + 1)). Ignored when log_compress=False.
download (bool, default False) – If True, fetch recordings.zip + cell_list.csv from Zenodo (record 18331549) if missing. The 8 GB wav.zip is NOT fetched (the loader uses the precomputed gtgrams in stim.h5).
include_unlabeled (bool, default False) – If True, also include the 131 cells in cell_list.csv that lack an area label (and therefore also lack layer / depth / narrow / celltype). These come from three otherwise-unrepresented PRN sessions (PRN010b, PRN011b, PRN020b) and have area=None, layer=None, depth=None, etc. in nrn_meta. goodpred is still populated. The default False matches the paper’s analysis cohort.
return_waveform (bool, default False) – If True, each stimulus is the raw mono waveform (1, T_audio) at audio_fs Hz instead of the precomputed gammatone-gram. The source seq*.wav files (44.1 kHz) are read from <path>/wav/ and inset at the recording’s prestim_ms pre-silence offset inside the trial window, then grid-locked to T_audio = T_neural * hop (hop = audio_fs * dt_ms / 1000). Feed it through a model’s wav2spec slot (e.g. CausalGammatone to reproduce the native front-end). Pass download=True to also fetch wav.zip from Zenodo.
audio_fs (int, default 44100) – Audio sample rate for return_waveform=True. The default 44.1 kHz is the native rate of the source wavs and gives an exact integer hop = 441 at dt_ms = 10 (no resampling). Choose any rate making audio_fs * dt_ms / 1000 an integer. Ignored unless return_waveform=True.
prestim_ms (float, default 1000.0) – Pre-stimulus silence (ms) before the sound onset in the trial window, used only in return_waveform=True to inset the wav so it aligns with the gammatone-gram frames (= response bins). The default 1000 ms (= 100 bins at dt=10 ms) was recovered empirically and is constant across all sites (the gtgram’s leading silence is not in the epoch table). Ignored unless return_waveform=True.
_enumerate_only (bool, default False) – Internal flag for tests: populate nrn_meta and N_neurons only, skip the (~1 minute) per-site .tgz read pass. Subclasses of this loader should not rely on it.
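The log_compress / log_offset pair and the per-band minmax step can be written out on a single frequency band. This is a minimal sketch rather than the loader’s code: the function names are illustrative, log is assumed to be the natural logarithm, the band values are made up, and mapping a constant band to zeros is an assumption of this example.

```python
import math
from typing import List


def log_compress(x: float, log_offset: float = -1.0) -> float:
    """Compute log((x + d) / d) with d = 10**log_offset (natural log assumed)."""
    d = 10.0 ** log_offset
    return math.log((x + d) / d)


def minmax(band: List[float]) -> List[float]:
    """Rescale one frequency band to [0, 1]; a constant band maps to zeros (assumption)."""
    lo, hi = min(band), max(band)
    if hi == lo:
        return [0.0] * len(band)
    return [(v - lo) / (hi - lo) for v in band]


band = [0.0, 0.5, 2.0, 9.9]                    # hypothetical linear gtgram values
compressed = [log_compress(v) for v in band]   # same as log(10 * v + 1) at offset -1
normalised = minmax(compressed)
```

With log_offset = -1 the offset is d = 0.1, so log((x + 0.1) / 0.1) reduces to log(10·x + 1), which is the form quoted in the class description.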
- deepSTRF.datasets.audio.download_ac1(dest: str | None = None, *, username: str | None = None, password: str | None = None) str[source]
Fetch the three CRCNS-AC1 archives from the NERSC mirror.
Requires a free CRCNS account (https://crcns.org/register). Credentials can be passed explicitly or sourced from $CRCNS_USERNAME / $CRCNS_PASSWORD. Idempotent: skips archives that already exist on disk; extraction is handled lazily on first dataset instantiation.
Returns the destination directory.
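One way to combine explicit credentials with the environment variables is sketched below. It is not download_ac1’s actual logic: resolve_credentials is a hypothetical helper, and the precedence (explicit arguments win over the environment) is an assumption.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_credentials(
    username: Optional[str] = None,
    password: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Return (username, password), preferring explicit arguments (assumed precedence)."""
    env = os.environ if env is None else env
    user = username or env.get("CRCNS_USERNAME")
    pwd = password or env.get("CRCNS_PASSWORD")
    if not user or not pwd:
        raise ValueError(
            "CRCNS credentials missing: pass them explicitly or set "
            "CRCNS_USERNAME / CRCNS_PASSWORD"
        )
    return user, pwd
```

Passing env explicitly keeps the helper easy to test without touching the real process environment.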
- deepSTRF.datasets.audio.download_alice_eeg(dest: str | None = None) str[source]
Download Brodbeck’s restructured Alice EEG release from UMd DRUM.
Idempotent: skips any zip that’s already on disk and any subdirectory that’s already unpacked. Returns the dataset directory.
- Parameters:
dest (str, optional) – Defaults to the platformdirs cache (overridable via $DEEPSTRF_DATA_DIR).
Notes
~2.5 GiB total across four zips. Anonymous HTTPS; no auth.
- deepSTRF.datasets.audio.download_downer2025(dest: str | None = None) str[source]
Download the Downer 2025 / Ahmed 2025 archive from Zenodo.
The archive (auditory_cortex_data.zip, ~29 GB) ships in Zenodo record 10.5281/zenodo.16175377 and unzips to <dest>/auditory_cortex_data/, the standard layout the Downer2025Dataset constructor expects.
- Parameters:
dest (str, optional) – Parent directory the archive is downloaded into. Defaults to default_cache_dir('Downer2025') (overridable via $DEEPSTRF_DATA_DIR).
- Returns:
Path to the extracted auditory_cortex_data/ directory.
- Return type:
str
Notes
Idempotent: skips the zip download if already present, and skips the unzip step if the expected auditory_cortex_data/sessions/ subdirectory already exists.
Heads up: the archive is ~29 GB on disk; the unpacked directory is also ~29 GB. Allow ~60 GB total during extraction (zip plus contents); you can delete the zip once unpacking is complete.
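Given the disk-space note above, a pre-flight check before starting the download may be worthwhile. The sketch below uses only the standard library; has_room is a hypothetical helper, not part of deepSTRF, and the 60 GB figure is taken from the note.

```python
import shutil

GB = 10**9


def has_room(dest: str, required_gb: float) -> bool:
    """Return True if the filesystem holding `dest` has at least `required_gb` free."""
    free_bytes = shutil.disk_usage(dest).free
    return free_bytes >= required_gb * GB


# ~29 GB zip plus ~29 GB unpacked, rounded up as in the note above.
REQUIRED_GB = 60
enough = has_room(".", REQUIRED_GB)
```

Here "." stands in for the intended destination directory; the path must already exist for shutil.disk_usage to succeed.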
- deepSTRF.datasets.audio.download_espejo(stimuli: str, dest: str | None = None) str[source]
Download one Espejo stimulus set from Zenodo into dest.
- Parameters:
stimuli ({'nat', 'vmn'}) – Which stimulus set to download.
dest (str, optional) – Defaults to default_cache_dir('Espejo') (overridable via $DEEPSTRF_DATA_DIR).
- Returns:
The dataset root directory.
- Return type:
str
Notes
Idempotent: skips the archive if already present, and skips the untar step if the expected <subdir>/ already exists. NAT is ~638 MB, VMN is ~25 MB.
- deepSTRF.datasets.audio.download_espejo_nat_waveforms(names: Sequence[str], dest: str | None = None, *, progress: bool = True) Dict[str, str][source]
Fetch the raw NAT waveforms from the LBHB baphy bitbucket mirror.
- Parameters:
names (sequence of str) – Stim names as they appear in stim_meta (STIM_<file>.wav); the STIM_ prefix is stripped to get the on-mirror filename.
dest (str, optional) – Parent directory; wavs are cached under <dest>/nat_waveforms/. Defaults to default_cache_dir('Espejo').
progress (bool, default True) – Show a tqdm bar over the (missing) downloads.
- Returns:
name -> local wav path for every name found on the mirror.
- Return type:
dict
Notes
Idempotent: already-cached wavs are skipped. Each filename is tried in sounds_set3/ first, then sounds/. Names found in neither are collected and surfaced by the caller (the dataset raises on genuine misses so waveform mode never silently substitutes silence).
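The prefix stripping and two-folder fallback can be sketched without any network access. In this illustration locate is a hypothetical function, and available is a made-up mapping that stands in for the mirror’s directory listings; the real downloader resolves files over HTTP.

```python
from typing import Dict, List, Mapping, Sequence, Set, Tuple

FOLDERS = ("sounds_set3", "sounds")  # search order described in the notes


def locate(
    names: Sequence[str], available: Mapping[str, Set[str]]
) -> Tuple[Dict[str, str], List[str]]:
    """Map each stim name to the first folder holding its file; collect the misses."""
    found: Dict[str, str] = {}
    missing: List[str] = []
    for name in names:
        # Strip the STIM_ prefix to get the on-mirror filename.
        filename = name[len("STIM_"):] if name.startswith("STIM_") else name
        for folder in FOLDERS:
            if filename in available.get(folder, set()):
                found[name] = f"{folder}/{filename}"
                break
        else:
            missing.append(name)  # in neither folder: surfaced to the caller
    return found, missing


# Hypothetical listings, for illustration only.
available = {"sounds_set3": {"a.wav"}, "sounds": {"a.wav", "b.wav"}}
found, missing = locate(["STIM_a.wav", "STIM_b.wav", "STIM_c.wav"], available)
```

Returning the misses instead of skipping them silently mirrors the behaviour described above, where the dataset raises on genuine misses.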
- deepSTRF.datasets.audio.download_wingert2026(dest: str | None = None, wav: bool = False) str[source]
Download the Wingert 2026 release from Zenodo into dest.
Fetches recordings.zip (~4.35 GB of per-site .tgz archives, the only large file the spectrogram loader needs) and cell_list.csv (~5.4 MB of per-cell metadata). Does NOT fetch models.zip (published CNN / LN / subspace fits, not used by deepSTRF).
Idempotent: skips files / dirs that already exist.
- Parameters:
dest (str, optional) – Defaults to default_cache_dir('Wingert2026') (overridable via $DEEPSTRF_DATA_DIR).
wav (bool, default False) – If True, also fetch and unpack wav.zip (~3.7 GB of source waveforms, 44.1 kHz) into <dest>/wav/ for the raw-waveform branch (Wingert2026Dataset(return_waveform=True)). The spectrogram-mode loader does not need it.
- Returns:
The destination directory.
- Return type:
str