CRCNS AA2 Dataset
Dataset Source: AA2 Dataset
Citation:
Theunissen, Frederic E.; Gill, Patrick; Noopur, Amin; Zhang, Junli; Woolley, Sarah M. N.; Fremouw, Thane (2011): Single-unit recordings from multiple auditory areas in male zebra finches. CRCNS.org.
http://dx.doi.org/10.6080/K0JW8BSC
Papers Using the Dataset:
“Sound representation methods for spectro-temporal receptive field estimation” (2006) by Patrick Gill, Junli Zhang, Sarah M. N. Woolley, Thane Fremouw and Frédéric E. Theunissen
“Role of the Zebra Finch Auditory Thalamus in Generating Complex Representations for Natural Sounds” (2010) by Noopur Amin, Patrick Gill, Frederic E Theunissen
Dataset Details
Population fitting: ✅
Description of Stimuli:
72 clips of conspecific vocalizations, 20 clips of flat ripples, and 25 clips of song ripples up to 5 s duration.
Sampled at 32 kHz with 16-bit precision
Typically 10-20 response trials per stimulus
Description of Neurons:
Extracellular single-unit recordings from 57 male zebra finches
Total Number of Neurons: 494
143 MLd
59 OV
189 L
    17 L1
    53 L2a
    42 L2b
    43 L3
    31 others (“L”)
37 CM
66 others (“None”)
Available data:
Full Python preprocessing.
One very simple .txt file for each cell (unit) response:
Spike timestamps relative to stimulus onset
Each line corresponds to one response trial
Processing needed (done in the Dataset class __init__() method):
Transforming the sound waveform (.wav file) into a 32-band spectrogram.
Choosing neurons based on their recording site, stimulus type, and animal.
Transforming the spike times of each repeat of each stimulus into PSTHs:
Removing pre-onset spikes
Aligning trials temporally
Padding/cutting to the right (present/future time steps) so that all trials have the same duration
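The PSTH steps above can be sketched in a few lines of NumPy. This is an illustration only, not the deepSTRF implementation: the function name spikes_to_psth and its duration_ms argument are hypothetical, spike times are assumed to be in milliseconds relative to stimulus onset, and the PSTH is assumed to be the trial average of binned spike counts.

```python
import numpy as np

def spikes_to_psth(trials, duration_ms, dt_ms=5.0):
    """Bin per-trial spike times into a trial-averaged PSTH.

    trials: list of sequences of spike times (assumed ms, relative to onset).
    duration_ms: common duration every trial is padded/cut to.
    """
    n_bins = int(np.ceil(duration_ms / dt_ms))
    # Shared bin edges starting at stimulus onset align all trials in time.
    edges = np.arange(n_bins + 1) * dt_ms
    counts = np.zeros((len(trials), n_bins))
    for i, spike_times in enumerate(trials):
        t = np.asarray(spike_times, dtype=float)
        t = t[t >= 0.0]  # remove pre-onset spikes
        # Spikes past the last edge are dropped (cut to the right);
        # short trials simply leave trailing bins at zero (pad to the right).
        counts[i], _ = np.histogram(t, bins=edges)
    return counts.mean(axis=0)

# Two made-up trials, binned at 5 ms over a 20 ms window.
example_trials = [[-3.0, 1.0, 7.0, 12.0], [2.0, 30.0]]
psth = spikes_to_psth(example_trials, duration_ms=20.0, dt_ms=5.0)
```

Using one set of bin edges for every trial handles alignment and padding/cutting in a single step, at the cost of silently discarding spikes outside the window.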
Setup
Requirements: a CRCNS account.
Easiest path — auto-download via the CRCNS NERSC mirror:
from deepSTRF.datasets.audio import CRCNSAA2Dataset
ds = CRCNSAA2Dataset(
    download=True, dt_ms=5,
    crcns_username="your_username",
    crcns_password="your_password",
)
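Alternatively, set the $CRCNS_USERNAME and $CRCNS_PASSWORD environment
variables instead of passing credentials in code. The default cache directory
is platformdirs.user_cache_dir('deepSTRF')/CRCNS_AA2, overridable via
$DEEPSTRF_DATA_DIR. download=True is idempotent, so repeated calls are safe.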
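To keep credentials out of source files, the two environment variables can be checked before the dataset is constructed. The helper below is a hypothetical convenience (crcns_credentials is not part of deepSTRF); it only reads the variables named above and fails early with a clear message if one is missing.

```python
import os

def crcns_credentials(env=None):
    """Return (username, password) from the environment, or raise if unset."""
    env = os.environ if env is None else env
    names = ("CRCNS_USERNAME", "CRCNS_PASSWORD")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return env["CRCNS_USERNAME"], env["CRCNS_PASSWORD"]

# Usage sketch (requires deepSTRF and a CRCNS account, so left commented out):
# from deepSTRF.datasets.audio import CRCNSAA2Dataset
# username, password = crcns_credentials()
# ds = CRCNSAA2Dataset(download=True, dt_ms=5,
#                      crcns_username=username, crcns_password=password)
```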
To lay out the data manually instead:
Download from the dataset page.
Extract all_stims/ and all_cells/ into a data/ folder.
Point the dataset at that folder:
ds = CRCNSAA2Dataset('/path/to/data', dt_ms=5)
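Before pointing the dataset at a manually prepared folder, it can help to confirm that both extracted subfolders are present. The check below is a small sketch using only the standard library; missing_subfolders is a hypothetical helper, and it tests only for the all_stims/ and all_cells/ directories mentioned above, not for their contents.

```python
from pathlib import Path

def missing_subfolders(data_dir, required=("all_stims", "all_cells")):
    """Return the names of required subfolders that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in required if not (root / name).is_dir()]

# Usage sketch: an empty list means the expected layout is in place.
# problems = missing_subfolders('/path/to/data')
# if problems:
#     raise FileNotFoundError(f"Missing in data folder: {problems}")
```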
Filtering
Each stim_meta dict carries name (the stimulus identifier), type
("conspecific" or "songrip"; the latter covers reversed-song / pitch-shifted
controls), sample_rate, n_samples, and duration_s (the last three are read
from data/stim_data.csv).
Each nrn_meta dict carries cell_id (the raw cell name from the dataset),
animal_id, area ("MLd", "OV", "L", "CM", or one of the smaller secondary
areas; see AA1 for the parsing details), cell_seq (the within-animal cell
index), and rig (often None in AA2).
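As an illustration of what these records look like, the snippet below builds a few made-up stim_meta and nrn_meta dicts with the field names listed above and summarises them with plain Python. All values are invented for the example, and count_by is a hypothetical helper rather than a deepSTRF function.

```python
from collections import Counter

# Made-up records that only mirror the documented field names.
stim_metas = [
    {"name": "stim_a", "type": "conspecific", "sample_rate": 32000,
     "n_samples": 64000, "duration_s": 2.0},
    {"name": "stim_b", "type": "songrip", "sample_rate": 32000,
     "n_samples": 48000, "duration_s": 1.5},
]
nrn_metas = [
    {"cell_id": "cell_a", "animal_id": "bird_1", "area": "MLd", "cell_seq": 1, "rig": None},
    {"cell_id": "cell_b", "animal_id": "bird_1", "area": "OV", "cell_seq": 2, "rig": None},
    {"cell_id": "cell_c", "animal_id": "bird_2", "area": "MLd", "cell_seq": 1, "rig": None},
]

def count_by(records, key):
    """Count how many metadata dicts share each value of `key`."""
    return Counter(record[key] for record in records)

area_counts = count_by(nrn_metas, "area")
conspecific_names = [s["name"] for s in stim_metas if s["type"] == "conspecific"]
```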
The full selection API from the data paradigm doc
is available on AA2: filter neurons by metadata (select_pop_by_nrn_attr)
or by stim coverage (select_pop_by_stim_attr), filter stims by metadata
(select_stims_by_attr), and rely on the bidirectional rule so that
narrowing the stim space automatically hides cells with no responses
left in it.
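The bidirectional rule can be pictured with a toy model in plain Python. This is not the deepSTRF API and does not call the select_* methods, whose signatures are not given here; visible_cells and the data below are hypothetical and only show the idea that a cell stays visible while it has at least one response inside the selected stimulus set.

```python
def visible_cells(responses, selected_stims):
    """Return cells that still have a response to at least one selected stimulus.

    responses: dict mapping cell id -> set of stimulus names with recorded responses.
    """
    selected = set(selected_stims)
    return sorted(cell for cell, stims in responses.items() if stims & selected)

# Made-up coverage: which cell responded to which stimulus.
responses = {
    "cell_a": {"song_1", "song_2"},
    "cell_b": {"ripple_1"},
    "cell_c": {"song_2", "ripple_1"},
}
stim_types = {"song_1": "conspecific", "song_2": "conspecific", "ripple_1": "songrip"}

# Narrow the stimulus space to conspecific clips; cell_b has nothing left and drops out.
conspecific = [name for name, kind in stim_types.items() if kind == "conspecific"]
remaining = visible_cells(responses, conspecific)
```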