CRCNS AA4 Dataset

Dataset Source: AA4 Dataset

Citation:

 Elie J E and Theunissen F E (2019), Simultaneous extracellular recordings of avian auditory neurons in zebra finches presented with all the repertoire of vocalizations used by this species for vocal communication. CRCNS.org.
http://dx.doi.org/10.6080/K00C4T06

Papers Using the Dataset:

Dataset Details

Population fitting:

Batching:

Description of Stimuli: A total of 170 different clips of conspecific vocalizations (songs and calls) and clips of artificial ripple noise, up to 3 s duration.

  • Sample rate: 24.4 kHz

  • Transformed into 32-band mel spectrograms followed by a compression function

  • Around 10 response trials on average for a given stimulus
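The stimulus transform described above can be sketched with NumPy alone. This is a minimal illustration, not the deepSTRF implementation: the text only specifies 32 mel bands followed by a compression function, so the FFT size, the HTK-style mel formula, the 5 ms hop (borrowed from dt_ms=5 in the Setup example), the rounded 24400 Hz rate and log(1 + x) as the compression are all assumptions.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel variants exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    # n_mels triangles need n_mels + 2 edge frequencies
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges_hz[i:i + 3]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def mel_spectrogram(wave, sr, dt_ms=5.0, n_mels=32, n_fft=512):
    """Return a log-compressed mel spectrogram of shape (n_mels, n_frames)."""
    wave = np.asarray(wave, dtype=float)
    hop = int(round(sr * dt_ms / 1000.0))          # samples per time bin
    n_frames = int(np.ceil(wave.size / hop))
    padded = np.pad(wave, (0, n_fft))              # zero-pad so every frame is full
    idx = hop * np.arange(n_frames)[:, None] + np.arange(n_fft)[None, :]
    frames = padded[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log1p(mel).T                         # assumed compression: log(1 + x)


# Usage on a synthetic 1 s, 1 kHz tone (hypothetical input, not dataset audio)
sr = 24400
tone = np.sin(2 * np.pi * 1000.0 * np.arange(sr) / sr)
spec = mel_spectrogram(tone, sr)
```

With a 5 ms hop, each spectrogram column lines up with one response bin, which is why the hop is tied to dt_ms in this sketch.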

Description of Neurons:

  • Extracellular recordings (single and multi units) from 4 male and 2 female zebra finches

  • Anesthetized subjects

  • Total number of units: 1401 (including 914 single units)

  • Targeted avian auditory cortical regions included:

    • Field L (including the thalamorecipient L2 and the primary auditory regions L1 and L3),

    • caudolateral and caudomedial mesopallium (CLM and CMM),

    • caudomedial nidopallium (NCM)

  • However, neurons were not individually assigned to one of these regions.

Animal        Sex   #units   #stims
BlaBro09xxF   F     151      130
GreBlu9508M   M     355      130
LblBlu2028M   M     53       137
WhiBlu5396M   M     198      73
WhiWhi4522M   M     304      131
YelBlu6903F   F     282      129

Available data:

  • Full Python preprocessing.

  • One folder for each animal subject, containing several .h5 files of neural recordings (one for each unit)
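Given that layout (one folder per animal, one .h5 file per unit), a quick sanity check is to count the unit files in each folder. The helper below is a hypothetical sketch using only the standard library. It assumes the .h5 files sit directly inside each animal folder, which the description suggests but does not spell out.

```python
from pathlib import Path


def count_unit_files(data_dir):
    """Return {animal_folder_name: number of .h5 files directly inside it}."""
    root = Path(data_dir)
    return {
        sub.name: sum(1 for _ in sub.glob("*.h5"))
        for sub in sorted(root.iterdir())
        if sub.is_dir()
    }
```

Calling count_unit_files('/path/to/data') gives per-animal counts to compare against the table above. They need not match exactly if some files are excluded at load time.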

Processing needed (Dataset constructor):

  • Transforming the sound waveform (.wav file) into a 32-band spectrogram.

  • Choosing neurons based on stimulus type and animal.

  • Transforming the spike times of each repeat of each stimulus into PSTHs

    • Remove pre-onset spikes

    • Align trials temporally

    • Pad/cut to the right (present/future time steps) so that trials have the same duration
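The spike-time steps listed above can be sketched as follows. This is an illustration under stated assumptions rather than the dataset constructor itself: the function name, the per-trial onsets argument and the choice to zero-fill bins past the last spike are all hypothetical, and spike times are assumed to be in seconds.

```python
import numpy as np


def trials_to_psth(trials, onsets, duration_s, dt_ms=5.0):
    """Bin spike times into a trial-averaged PSTH.

    trials     : list of 1-D sequences of spike times (s), one per repeat
    onsets     : stimulus onset time (s) for each repeat
    duration_s : common duration to which every trial is padded or cut
    Returns (psth, counts): psth has shape (n_bins,), counts (n_trials, n_bins).
    """
    n_bins = int(round(duration_s * 1000.0 / dt_ms))
    t_end = n_bins * dt_ms / 1000.0
    counts = np.zeros((len(trials), n_bins))
    for i, (spikes, onset) in enumerate(zip(trials, onsets)):
        aligned = np.asarray(spikes, dtype=float) - onset   # align to stimulus onset
        aligned = aligned[aligned >= 0.0]                   # remove pre-onset spikes
        # A fixed bin range cuts late spikes and leaves empty bins at zero,
        # so every trial ends up with the same number of time steps.
        counts[i], _ = np.histogram(aligned, bins=n_bins, range=(0.0, t_end))
    return counts.mean(axis=0), counts


# Usage with two made-up repeats of one stimulus
trials = [[0.99, 1.001, 1.006], [2.001, 2.012, 2.03]]
psth, counts = trials_to_psth(trials, onsets=[1.0, 2.0], duration_s=0.02)
```

Zero-filling is only one option for the padding step. Masking the padded bins instead would avoid treating "no recording" as "no spikes".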

Setup

Requirements: a CRCNS account.

Easiest path — auto-download via the CRCNS NERSC mirror:

from deepSTRF.datasets.audio import CRCNSAA4Dataset, AA4_ANIMAL_IDS

ds = CRCNSAA4Dataset(
    download=True, dt_ms=5,
    crcns_username="your_username",
    crcns_password="your_password",
)

Alternatively set $CRCNS_USERNAME / $CRCNS_PASSWORD. Default cache dir is platformdirs.user_cache_dir('deepSTRF')/CRCNS_AA4, overridable via $DEEPSTRF_DATA_DIR. download=True is idempotent.
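A sketch of the environment-variable route is shown below. It assumes, as the sentence above implies, that the constructor picks up $CRCNS_USERNAME and $CRCNS_PASSWORD when the credential arguments are omitted. The two helper functions are hypothetical and not part of deepSTRF; they only fail early with a readable message when a variable is missing.

```python
import os


def missing_crcns_credentials(env=None):
    """Return the names of the credential variables that are unset or empty."""
    env = os.environ if env is None else env
    return [k for k in ("CRCNS_USERNAME", "CRCNS_PASSWORD") if not env.get(k)]


def load_aa4_from_env(dt_ms=5):
    """Build the dataset using credentials taken from the environment."""
    missing = missing_crcns_credentials()
    if missing:
        raise RuntimeError("Set these variables first: " + ", ".join(missing))
    # Imported lazily so the check above works even without deepSTRF installed
    from deepSTRF.datasets.audio import CRCNSAA4Dataset
    return CRCNSAA4Dataset(download=True, dt_ms=dt_ms)
```

Keeping credentials in the environment avoids hard-coding a password in notebooks or scripts that may be shared.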

If you already have the data laid out manually, the data/ folder should look like this:

data/
 |____ BlaBro09xxF/
 |____ GreBlu9508M/
 |____ LblBlu2028M/
 |____ WhiBlu5396M/
 |____ WhiWhi4522M/
 |____ YelBlu6903F/
        |______ ...

Then pass that folder to the constructor:

ds = CRCNSAA4Dataset('/path/to/data', stimuli=('song', 'call'),
                     animals=(AA4_ANIMAL_IDS[0],))

Filtering

Each stim_meta dict carries:

  • name: the stimulus md5, which is the canonical identifier (the wav filename is per-animal and not unique across the corpus)

  • type: e.g. "song", "call"

  • class: broader category

  • duration_s

Each nrn_meta dict carries:

  • cell_id: the basename of the source h5 file

  • animal_id: one of AA4_ANIMAL_IDS

  • sex: "M" or "F" (last character of animal_id)

  • site: recording site label, e.g. "Site1"

  • electrode: integer from 1 to 32 across both hemisphere arrays (16 channels each) in 5 of the 6 birds; the 6th bird has a single array

  • ldepth / rdepth: left- and right-array depth in µm at this site

  • sort_type: "single" or "multi" ("noise" and "tdt" units are filtered out at load)

  • sort_id: online-sort id (int)

  • subsort_id: offline spike-sorting id parsed from the trailing _ss<N> of the filename; None if absent
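Two of these fields are derived from strings, and the derivation can be sketched as below. The helper names and the example cell names are hypothetical. The regex simply encodes the stated rule (a trailing _ss<N> on the filename), and the sketch tolerates an optional .h5 extension because the text does not say whether cell_id keeps it.

```python
import re

# Trailing "_ss<N>", optionally followed by a ".h5" extension
_SUBSORT_RE = re.compile(r"_ss(\d+)(?:\.h5)?$")


def parse_sex(animal_id):
    """Sex is the last character of the animal id, e.g. 'YelBlu6903F' -> 'F'."""
    sex = animal_id[-1]
    if sex not in ("M", "F"):
        raise ValueError(f"unexpected animal id: {animal_id!r}")
    return sex


def parse_subsort_id(cell_id):
    """Return the offline spike-sorting id as an int, or None if absent."""
    match = _SUBSORT_RE.search(cell_id)
    return int(match.group(1)) if match else None


# Usage with made-up cell names (real filenames may look different)
print(parse_sex("YelBlu6903F"))
print(parse_subsort_id("example_unit_ss2"))
print(parse_subsort_id("example_unit"))
```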

The dataset paper does not publish a per-cell brain-area assignment, so neurons cannot be filtered by area — the natural axis to slice by is animal_id. Otherwise the full selection API from the data paradigm doc is available: select neurons by metadata (select_pop_by_nrn_attr), select stims by metadata (select_stims_by_attr), and the bidirectional rule auto-hides cells that have no responses to the current stim selection.
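The selection logic described above can be illustrated in plain Python over metadata dicts. This is not the deepSTRF API: the signatures of select_pop_by_nrn_attr and select_stims_by_attr are not given here, so both functions below are hypothetical stand-ins, and the responses mapping (cell id to the stimulus names it has trials for) is an assumed data structure.

```python
def filter_by_attrs(metas, **wanted):
    """Keep the metadata dicts whose fields equal every wanted key/value."""
    return [m for m in metas if all(m.get(k) == v for k, v in wanted.items())]


def cells_with_responses(responses, selected_stims):
    """Keep cells that responded to at least one currently selected stimulus."""
    selected = set(selected_stims)
    return [cell for cell, stims in responses.items() if selected & set(stims)]


# Made-up metadata for illustration only
nrn_meta = [
    {"cell_id": "a", "animal_id": "YelBlu6903F", "sort_type": "single"},
    {"cell_id": "b", "animal_id": "YelBlu6903F", "sort_type": "multi"},
    {"cell_id": "c", "animal_id": "GreBlu9508M", "sort_type": "single"},
]
responses = {"a": {"stim1", "stim2"}, "b": {"stim3"}, "c": {"stim2"}}

single_units = filter_by_attrs(nrn_meta, animal_id="YelBlu6903F", sort_type="single")
visible = cells_with_responses(responses, ["stim1", "stim2"])
```

The second function mirrors the idea behind the bidirectional rule: narrowing the stimulus selection can shrink the set of visible cells, here dropping cell "b".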