CRCNS AA4 Dataset
Dataset Source: AA4 Dataset
Citation:
Elie J E and Theunissen F E (2019), Simultaneous extracellular recordings of avian auditory neurons in zebra finches presented with all the repertoire of vocalizations used by this species for vocal communication. CRCNS.org.
http://dx.doi.org/10.6080/K00C4T06
Papers Using the Dataset:
“Meaning in the avian auditory cortex: Neural representation of communication calls” (2015) by Julie Elie and Frédéric Theunissen
“Invariant neural responses for sensory categories revealed by the time-varying information for communication calls” (2019) by Julie Elie and Frédéric Theunissen
Dataset Details
Population fitting: ✅
Batching: ✅
Description of Stimuli: A total of 170 different clips of conspecific vocalizations (songs and calls) and clips of artificial ripple noise, up to 3 s duration.
Sampled at 24.4 kHz
Transformed into 32-band mel spectrograms followed by a compression function
Around 10 response trials on average for a given stimulus
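The stimulus representation described above can be sketched in plain NumPy. This is a minimal illustration, not the deepSTRF preprocessing code: the source only states "32-band mel spectrograms followed by a compression function", so the FFT size (512), the Hann window, the 5 ms hop, the HTK-style mel formula, and the `log1p` compression are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel variants exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # n_mels triangular filters need n_mels + 2 edge frequencies
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, center, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def mel_spectrogram(wave, sr, n_mels=32, dt_ms=5.0, n_fft=512):
    """Return a compressed mel spectrogram of shape (n_mels, n_frames)."""
    hop = int(round(sr * dt_ms / 1000.0))          # one frame per time bin
    window = np.hanning(n_fft)
    padded = np.pad(np.asarray(wave, dtype=float), (n_fft // 2, n_fft // 2))
    n_frames = 1 + (padded.size - n_fft) // hop
    frames = np.stack([padded[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log1p(mel).T                          # assumed compression
```

Any monotonic compression (logarithm, power law) could be substituted for `log1p`; the choice here is only for illustration.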
Description of Neurons:
Extracellular single-unit recordings from 4 male and 2 female zebra finches
Anesthetized subjects
Total Number of units: 1401 (including 914 single units)
Targeted avian auditory cortical regions included:
Field L (including the thalamorecipient L2 and the primary auditory regions L1 and L3),
caudolateral and caudomedial mesopallium (CLM and CMM),
caudomedial nidopallium (NCM)
However, neurons were not individually assigned to one of these specific regions.
| Animal | Sex | #units | #stims |
|---|---|---|---|
| BlaBro09xxF | F | 151 | 130 |
| GreBlu9508M | M | 355 | 130 |
| LblBlu2028M | M | 53 | 137 |
| WhiBlu5396M | M | 198 | 73 |
| WhiWhi4522M | M | 304 | 131 |
| YelBlu6903F | F | 282 | 129 |
Available data:
Full Python preprocessing.
One folder for each animal subject, containing several .h5 files of neural recordings (one for each unit)
Processing needed (Dataset constructor):
Transforming the sound waveform (.wav file) into a 32-band spectrogram.
Choosing neurons based on stimulus type and animal.
Transforming the spike times of each repeat of each stimulus into PSTHs:
Removing pre-onset spikes
Aligning trials temporally
Padding/cutting on the right (present/future time steps) so that all trials have the same duration
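The spike-time-to-PSTH steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the dataset constructor itself: the function name `trials_to_psth`, its arguments, and the convention that each trial comes with its own stimulus onset time are hypothetical.

```python
import numpy as np

def trials_to_psth(spike_times_s, onsets_s, duration_s, dt_ms=5.0):
    """Bin spike times (seconds) of several trials into a common time grid.

    spike_times_s: list of 1-D sequences, one per trial (absolute times).
    onsets_s:      stimulus onset time of each trial (assumed to be known).
    Returns (counts, psth): per-trial counts of shape (n_trials, n_bins)
    and their across-trial mean.
    """
    dt = dt_ms / 1000.0
    n_bins = int(round(duration_s / dt))
    edges = np.arange(n_bins + 1) * dt
    counts = np.zeros((len(spike_times_s), n_bins))
    for i, (spikes, onset) in enumerate(zip(spike_times_s, onsets_s)):
        aligned = np.asarray(spikes, dtype=float) - onset   # align to onset
        aligned = aligned[aligned >= 0.0]                   # drop pre-onset spikes
        # Spikes past the last edge are cut; short trials stay zero-padded.
        counts[i], _ = np.histogram(aligned, bins=edges)
    return counts, counts.mean(axis=0)
```

Dividing the returned `psth` by the bin width in seconds would give a firing rate in Hz rather than a mean spike count per bin.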
Setup
Requirements: a CRCNS account.
Easiest path — auto-download via the CRCNS NERSC mirror:
from deepSTRF.datasets.audio import CRCNSAA4Dataset, AA4_ANIMAL_IDS
ds = CRCNSAA4Dataset(
    download=True, dt_ms=5,
    crcns_username="your_username",
    crcns_password="your_password",
)
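Alternatively, set the $CRCNS_USERNAME and $CRCNS_PASSWORD environment
variables instead of passing the credentials as arguments. The default cache
directory is platformdirs.user_cache_dir('deepSTRF')/CRCNS_AA4, which can be
overridden via $DEEPSTRF_DATA_DIR. download=True is idempotent, so it is safe
to leave enabled on repeated runs.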
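To keep credentials out of scripts, they can be read from the environment before the dataset is constructed. The helper below is a hypothetical sketch, not part of deepSTRF; in particular, the precedence (explicit arguments win over environment variables) is an assumption.

```python
import os

def resolve_crcns_credentials(username=None, password=None, env=None):
    """Return (username, password), falling back to environment variables.

    Hypothetical helper: explicit arguments take precedence over
    $CRCNS_USERNAME / $CRCNS_PASSWORD (assumed precedence).
    """
    env = os.environ if env is None else env
    username = username or env.get("CRCNS_USERNAME")
    password = password or env.get("CRCNS_PASSWORD")
    if not username or not password:
        raise ValueError(
            "CRCNS credentials missing: pass them explicitly or set "
            "CRCNS_USERNAME and CRCNS_PASSWORD"
        )
    return username, password
```

The returned pair could then be passed as `crcns_username` and `crcns_password` to `CRCNSAA4Dataset`.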
If you already have the data laid out manually, the data/ folder should
look like this:
data/
|____ BlaBro09xxF/
|____ GreBlu9508M/
|____ LblBlu2028M/
|____ WhiBlu5396M/
|____ YelBlu6903F/
|______ ...
ds = CRCNSAA4Dataset('/path/to/data', stimuli=('song', 'call'),
                     animals=(AA4_ANIMAL_IDS[0],))
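Before pointing the dataset at a manually prepared folder, it can help to check which animal folders are present. The sketch below is hypothetical and independent of deepSTRF: the animal IDs are copied from the table above, and searching recursively for `.h5` files is an assumption about how the files are nested inside each folder.

```python
from pathlib import Path

# Animal IDs as listed in the dataset table
EXPECTED_ANIMALS = (
    "BlaBro09xxF", "GreBlu9508M", "LblBlu2028M",
    "WhiBlu5396M", "WhiWhi4522M", "YelBlu6903F",
)

def summarize_layout(root):
    """Map each expected animal ID to its number of .h5 files.

    A value of None means the animal folder is missing under `root`.
    """
    root = Path(root)
    summary = {}
    for animal in EXPECTED_ANIMALS:
        folder = root / animal
        if folder.is_dir():
            summary[animal] = sum(1 for _ in folder.rglob("*.h5"))
        else:
            summary[animal] = None
    return summary
```

Calling `summarize_layout('/path/to/data')` returns a dict that makes missing or empty animal folders easy to spot.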
Filtering
Each stim_meta dict carries name (the stimulus md5 — the canonical
identifier; the wav filename is per-animal and not unique across the
corpus), type (e.g. "song", "call"), class (broader category), and
duration_s. Each nrn_meta dict carries cell_id (the basename
of the source h5 file), animal_id (one of AA4_ANIMAL_IDS), sex
("M" or "F" — last char of animal_id), site (recording site
label, e.g. "Site1"), electrode (int 1-32 across both hemisphere
arrays, 16 channels each in 5/6 birds; 1 array in the 6th),
ldepth / rdepth (left- and right-array depth in µm at this site),
sort_type ("single" or "multi"; "noise" / "tdt" are filtered out
at load), sort_id (online-sort id, int), and subsort_id (offline
spike-sorting id parsed from the trailing _ss<N> of the filename;
None if absent).
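Two of the derived fields above follow simple rules that can be sketched in plain Python: `sex` is the last character of `animal_id`, and `subsort_id` comes from a trailing `_ss<N>` in the filename. The helper names and the example cell names below are hypothetical; the actual filename pattern in the dataset may differ.

```python
import re

def parse_subsort_id(cell_name):
    """Return the integer after a trailing '_ss', or None if absent.

    An optional '.h5' extension is tolerated (assumption).
    """
    match = re.search(r"_ss(\d+)(?:\.h5)?$", cell_name)
    return int(match.group(1)) if match else None

def sex_from_animal_id(animal_id):
    """Return 'M' or 'F', taken from the last character of the animal ID."""
    return animal_id[-1]

# Hypothetical metadata records, shaped like the nrn_meta dicts described above
nrn_meta = [
    {"cell_id": "Site1_e12_s0_ss3", "animal_id": "YelBlu6903F", "sort_type": "single"},
    {"cell_id": "Site2_e05_s1", "animal_id": "GreBlu9508M", "sort_type": "multi"},
]
for meta in nrn_meta:
    meta["sex"] = sex_from_animal_id(meta["animal_id"])
    meta["subsort_id"] = parse_subsort_id(meta["cell_id"])

# Keep only single units recorded in female birds
female_singles = [m["cell_id"] for m in nrn_meta
                  if m["sex"] == "F" and m["sort_type"] == "single"]
```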
The dataset paper does not publish a per-cell brain-area assignment,
so neurons cannot be filtered by area — the natural axis to slice by is
animal_id. Otherwise the full selection API from
the data paradigm doc
is available: select neurons by metadata (select_pop_by_nrn_attr),
select stims by metadata (select_stims_by_attr), and the bidirectional
rule auto-hides cells that have no responses to the current stim
selection.
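The idea behind the bidirectional rule can be illustrated without the library. The sketch below does not use the deepSTRF API (the signatures of `select_pop_by_nrn_attr` and `select_stims_by_attr` are not given here); it is a hypothetical stand-in that assumes responses are tracked as trial counts per (cell, stimulus) pair, and that "bidirectional" means stimuli without responses from the selected cells are hidden as well.

```python
def apply_selection(responses, selected_cells, selected_stims):
    """Return the (cells, stims) that remain visible after a selection.

    responses: dict mapping (cell_id, stim_name) -> number of recorded trials.
    A cell is hidden if it has no trials for any selected stimulus, and
    (assumed) a stimulus is hidden if no selected cell responded to it.
    """
    cells, stims = set(selected_cells), set(selected_stims)
    pairs = {(c, s) for (c, s), n_trials in responses.items()
             if n_trials > 0 and c in cells and s in stims}
    visible_cells = sorted({c for c, _ in pairs})
    visible_stims = sorted({s for _, s in pairs})
    return visible_cells, visible_stims
```

For example, with `responses = {("a", "s1"): 10, ("b", "s2"): 8}`, selecting both cells but only stimulus `"s1"` leaves cell `"a"` visible and hides cell `"b"`.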