CRCNS AA1 Dataset
Dataset Source: AA1 Dataset
Citation:
Theunissen, FE; Hauber, ME; Woolley, SMN; Gill, P; Shaevitz, SS; Amin, Noopur; Hsu, A; Singh, NC; Grace, GA; Fremouw, T; Zhang, Junli; Cassey, P; Doupe, AJ; David, SV (2009): Single-unit recordings from two auditory areas in male zebra finches. CRCNS.org.
http://dx.doi.org/10.6080/K0F769GP
Papers Using the Dataset:
“Tuning for Spectro-temporal Modulations: a Mechanism for Auditory Discrimination of Natural Sound” by Woolley S., Fremouw T., Hsu A. and Theunissen F.E.
“Modulation and phase spectrum of natural sounds enhance neural discrimination performed by single auditory neurons” by Hsu A., Woolley S., Fremouw T. and Theunissen F.E.
“Modulation spectra of natural sounds and ethological theories of auditory processing” by Singh N. and Theunissen F.E.
Dataset Details
Population fitting: ✅
Description of Stimuli:
10 clips of conspecific vocalizations and 20 clips of flat ripples, up to 5 s duration.
Sampled at 32 kHz with 16-bit precision
Up to 10 response trials for a given sound
Description of Neurons:
Total Number of Neurons: 100 (50 MLd + 50 Field L)
Extracellular single-unit recordings from anesthetized male zebra finches
Available data:
Full Python preprocessing.
One very simple .txt file for each cell (unit) response:
Spike timestamp relative to stimulus onset
Each line corresponds to one response trial
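The per-cell file layout described above can be read with a few lines of Python. This is a minimal sketch, assuming whitespace-separated timestamps on each line and treating a blank line as a trial without spikes; the function name `parse_spike_trials` and the example content are hypothetical and should be checked against the actual files.

```python
from typing import List


def parse_spike_trials(text: str) -> List[List[float]]:
    """Parse one cell's response text: one line per trial, spike times per line.

    Assumption: timestamps are whitespace-separated numbers, and a blank
    line stands for a trial in which no spike was recorded.
    """
    trials = []
    for line in text.splitlines():
        trials.append([float(tok) for tok in line.split()])
    return trials


# Hypothetical file content: three trials, the second one without spikes.
example = "-12.5 3.0 47.25\n\n8.0 15.5\n"
trials = parse_spike_trials(example)
```

Negative values in this sketch stand for spikes recorded before stimulus onset, which the preprocessing later discards.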
Processing needed (Dataset class __init__() method):
Transforming the sound waveform (.wav file) into a 32-band spectrogram.
Choosing neurons based on their recording site and stimulus type.
Transforming the spike times of each repeat of each stimulus into PSTHs
Remove pre-onset spikes
Align trials temporally
Pad/cut to the right (present/future time steps) so that trials have the same duration
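The spike-to-PSTH steps listed above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the deepSTRF implementation: spike times are taken to be in milliseconds, the PSTH is a trial-averaged spike count per bin, and the helper names `trials_to_psth` and `pad_or_cut` are hypothetical.

```python
import numpy as np


def trials_to_psth(trials, dt_ms, n_bins):
    """Bin spike times (assumed ms, relative to onset) into a trial-averaged PSTH."""
    edges = np.arange(n_bins + 1) * dt_ms  # bin edges starting at stimulus onset
    counts = np.zeros((len(trials), n_bins))
    for i, spikes in enumerate(trials):
        spikes = np.asarray(spikes, dtype=float)
        spikes = spikes[spikes >= 0]  # remove pre-onset spikes
        counts[i], _ = np.histogram(spikes, bins=edges)
    return counts.mean(axis=0)  # average spike count per bin over trials


def pad_or_cut(psth, n_bins):
    """Zero-pad or truncate on the right so the PSTH has exactly n_bins steps."""
    out = np.zeros(n_bins)
    n = min(len(psth), n_bins)
    out[:n] = psth[:n]
    return out


# Two hypothetical trials, binned at 5 ms over 4 bins (0 to 20 ms).
demo_trials = [[-3.0, 1.0, 7.0, 7.5], [2.0, 12.0]]
psth = trials_to_psth(demo_trials, dt_ms=5, n_bins=4)
```

Because every trial is binned against the same onset-relative edges, the trials are aligned in time by construction; padding on the right only adds silent future time steps.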
Benchmark results
| Area | Model backbone | Rank | Remarks | Params / nrn | Perfs | Paper (backbone) |
|---|---|---|---|---|---|---|
| Field L | StateNet | 🥇 | GRU, pop | 24,900 | / 71.0 | |
| | Transformer | 🥈 | pop | 29,109 | / 65.5 | |
| | 2D-CNN | 🥉 | pop | 26,915 | / 65.0 | |
| MLd | StateNet | 🥇 | Mamba, pop | 32,334 | / 73.4 | |
| | 2D-CNN | 🥈 | pop | 29,109 | / 68.9 | |
| | Transformer | 🥉 | pop | 34,475 | / 68.3 | |
Setup
Requirements: a CRCNS account (the data host requires login).
Easiest path — auto-download via the CRCNS NERSC mirror:
from deepSTRF.datasets.audio import CRCNSAA1Dataset
ds = CRCNSAA1Dataset(
    download=True, dt_ms=5,
    crcns_username="your_username",
    crcns_password="your_password",
)
Alternatively, set $CRCNS_USERNAME and $CRCNS_PASSWORD in the environment and
omit the credential kwargs. The default cache directory is
platformdirs.user_cache_dir('deepSTRF')/CRCNS_AA1, overridable via
$DEEPSTRF_DATA_DIR. download=True is idempotent, so it is safe to leave enabled on repeated runs.
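The environment-variable route can be pictured with the following sketch. It is not the library's code: the helper `resolve_crcns_credentials` is hypothetical, and the precedence shown (explicit arguments first, then the environment) is an assumption about how such a fallback is typically written.

```python
import os


def resolve_crcns_credentials(username=None, password=None):
    """Return (username, password), preferring explicit arguments over the environment.

    Assumption: this precedence is illustrative only, not confirmed behavior.
    """
    username = username or os.environ.get("CRCNS_USERNAME")
    password = password or os.environ.get("CRCNS_PASSWORD")
    if not username or not password:
        raise ValueError(
            "CRCNS credentials missing: pass them explicitly or set "
            "CRCNS_USERNAME and CRCNS_PASSWORD"
        )
    return username, password


# Placeholder values; in practice these would be exported in the shell.
os.environ["CRCNS_USERNAME"] = "your_username"
os.environ["CRCNS_PASSWORD"] = "your_password"
creds = resolve_crcns_credentials()
```

Keeping credentials in the environment avoids hard-coding them in scripts or notebooks that may be shared.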
If you already have the data laid out manually:
Download crcns-aa1.zip at the original dataset repository.
Extract all_stims/, Field_L_cells/, MLd_cells/ into a data/ folder.
ds = CRCNSAA1Dataset('/path/to/data', dt_ms=5).
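Before pointing the dataset class at a manually prepared folder, it can help to verify the layout. The check below is a small sketch using only the standard library; the helper name `missing_subdirs` is hypothetical, and it only tests that the three folders named above exist, not their contents.

```python
import tempfile
from pathlib import Path

# Sub-folders the manual layout is expected to contain.
EXPECTED_DIRS = ("all_stims", "Field_L_cells", "MLd_cells")


def missing_subdirs(data_dir):
    """Return the expected sub-folders that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


# Demonstration on a temporary folder that only contains all_stims/.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "all_stims").mkdir()
    demo_missing = missing_subdirs(tmp)
```

An empty list from `missing_subdirs('/path/to/data')` would indicate that the three folders are in place.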
Filtering
Each stim_meta dict carries name, type ("conspecific" or
"flatrip"), sample_rate, n_samples, duration_s. Each
nrn_meta dict carries cell_id, area ("Field_L" or
"MLd"), animal_id, cell_seq, rig. These fields can be combined with the
base-class selection API:
ds.select_pop_by_nrn_attr("area", "MLd") # only MLd cells
ds.select_stims_by_attr("type", "conspecific") # only conspecific stims
# (auto-hides 2 cells with
# no conspecific data)
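To make the selection semantics concrete, the sketch below filters plain lists of metadata dicts by one attribute. It is a stand-alone illustration, not the deepSTRF selection API: `filter_by_attr` and the example entries are hypothetical, and only the field names come from the description above.

```python
def filter_by_attr(meta_list, key, value):
    """Return indices of the metadata dicts whose `key` equals `value`."""
    return [i for i, meta in enumerate(meta_list) if meta.get(key) == value]


# Hypothetical metadata entries using the documented field names.
nrn_meta = [
    {"cell_id": "cell_a", "area": "MLd"},
    {"cell_id": "cell_b", "area": "Field_L"},
    {"cell_id": "cell_c", "area": "MLd"},
]
stim_meta = [
    {"name": "stim_1", "type": "conspecific"},
    {"name": "stim_2", "type": "flatrip"},
]

mld_idx = filter_by_attr(nrn_meta, "area", "MLd")            # MLd cells only
consp_idx = filter_by_attr(stim_meta, "type", "conspecific")  # conspecific stims only
```

Returning indices rather than copies keeps the neuron and stimulus selections independent, so they can be combined in either order.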