CRCNS AA2 Dataset

Dataset Source: AA2 Dataset

Citation:

Theunissen, Frederic E.; Gill, Patrick; Noopur, Amin; Zhang, Junli; Woolley, Sarah M. N.; Fremouw, Thane (2011): Single-unit recordings from multiple auditory areas in male zebra finches. CRCNS.org.
http://dx.doi.org/10.6080/K0JW8BSC

Papers Using the Dataset:

“Sound representation methods for spectro-temporal receptive field estimation” (2006) by Patrick Gill, Junli Zhang, Sarah M. N. Woolley, Thane Fremouw and Frédéric E. Theunissen
“Role of the Zebra Finch Auditory Thalamus in Generating Complex Representations for Natural Sounds” (2010) by Noopur Amin, Patrick Gill, Frederic E Theunissen

Dataset Details

Population fitting: ✅

Description of Stimuli:

72 clips of conspecific vocalizations, 20 clips of flat ripples, and 25 clips of song ripples, each up to 5 s long.
Sampled at 32 kHz with 16-bit precision
10 to 20 response trials per stimulus
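A minimal sketch, assuming mono or multi-channel 16-bit PCM .wav files, of how the 32 kHz waveforms could be turned into a 32-band spectrogram with one frame every dt_ms. The helper names, the 16 ms Hann window, the linear band spacing between 250 Hz and 8 kHz, and the log compression are illustrative assumptions, not deepSTRF's actual settings.

```python
import wave

import numpy as np


def read_wav_16bit(path_or_file):
    """Read a 16-bit PCM .wav file; return (float samples in [-1, 1), sample rate)."""
    with wave.open(path_or_file, "rb") as f:
        if f.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        sr = f.getframerate()
        n_channels = f.getnchannels()
        raw = f.readframes(f.getnframes())
    x = np.frombuffer(raw, dtype="<i2").astype(np.float64) / 32768.0
    if n_channels > 1:  # average channels down to mono
        x = x.reshape(-1, n_channels).mean(axis=1)
    return x, sr


def band_spectrogram(x, sr, dt_ms=5, n_bands=32, win_ms=16, fmin=250.0, fmax=8000.0):
    """Log power in n_bands linearly spaced bands, one frame every dt_ms.

    Returns an array of shape (n_bands, n_frames). With the defaults at 32 kHz,
    every band contains at least three FFT bins.
    """
    hop = int(round(sr * dt_ms / 1000))
    win = int(round(sr * win_ms / 1000))
    n_frames = int(np.ceil(len(x) / hop))
    # Zero-pad the end so that the last frame is complete.
    xp = np.pad(x, (0, n_frames * hop + win - len(x)))
    frames = np.lib.stride_tricks.sliding_window_view(xp, win)[::hop][:n_frames]
    power = np.abs(np.fft.rfft(frames * np.hanning(win), axis=1)) ** 2
    freqs = np.fft.rfftfreq(win, d=1.0 / sr)
    edges = np.linspace(fmin, fmax, n_bands + 1)
    band_idx = np.digitize(freqs, edges) - 1  # bins outside [fmin, fmax) are ignored
    out = np.empty((n_bands, n_frames))
    for b in range(n_bands):
        # Average the FFT bins that fall inside band b.
        out[b] = power[:, band_idx == b].mean(axis=1)
    return np.log(out + 1e-12)
```

For a 5 s clip and dt_ms=5 this yields a (32, 1000) array, which lines up frame by frame with PSTHs binned at the same dt_ms.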

Description of Neurons:

Extracellular single-unit recordings from 57 male zebra finches
Total Number of Neurons: 494
- 143 MLd
- 59 OV
- 189 L
  - 17 L1
  - 53 L2a
  - 42 L2b
  - 43 L3
  - 31 others (“L”)
- 37 CM
- 66 others (“None”)
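As a quick consistency check, the top-level area counts above add up to the stated total of 494 neurons. The snippet below also prints each area's share of the population; "None" stands for the cells without an assigned area.

```python
# Neuron counts per top-level recording area, copied from the list above.
area_counts = {"MLd": 143, "OV": 59, "L": 189, "CM": 37, "None": 66}

total = sum(area_counts.values())
print(total)  # 494

# Fraction of the population recorded in each area.
for area, n in area_counts.items():
    print(f"{area}: {n / total:.1%}")
```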

Available data:

Full Python preprocessing is provided.
One plain .txt file per cell (unit) response:
- Spike timestamps relative to stimulus onset
- Each line corresponds to one response trial
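A minimal parser for one such file, assuming the spike times on each line are whitespace-separated numbers (in ms) and that a trial without spikes appears as a blank line. Both function names are hypothetical and not part of deepSTRF.

```python
import numpy as np


def parse_spike_lines(lines):
    """Turn the lines of one response file into a list of per-trial spike-time arrays.

    Assumption: whitespace-separated numeric spike times on each line; a blank
    line is kept as a trial with no spikes. Negative times are pre-onset spikes.
    """
    return [np.array([float(v) for v in line.split()]) for line in lines]


def read_spike_file(path):
    """Read one response file from disk (one line per trial)."""
    with open(path) as f:
        return parse_spike_lines(f)
```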

Processing needed (done in the Dataset class __init__() method):

Transforming the sound waveform (.wav file) into a 32-band spectrogram.
Selecting neurons based on their recording site, stimulus type, and animal.
Transforming the spike times of each repeat of each stimulus into PSTHs:
- Remove pre-onset spikes
- Align trials temporally
- Pad or cut on the right (later time steps) so that all trials have the same duration
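The PSTH steps above can be sketched in NumPy as follows, assuming spike times in milliseconds relative to stimulus onset and a common duration chosen by the caller (for example the stimulus length). trials_to_binned is a hypothetical helper; deepSTRF's own implementation may pad or align differently.

```python
import numpy as np


def trials_to_binned(trials, dt_ms, duration_ms):
    """Bin each trial's spike times (ms from stimulus onset) on a shared time axis.

    Returns spike counts of shape (n_trials, n_bins), n_bins = ceil(duration_ms / dt_ms).
    """
    n_bins = int(np.ceil(duration_ms / dt_ms))
    edges = np.arange(n_bins + 1) * dt_ms  # same bins for every trial: aligned to onset
    binned = np.zeros((len(trials), n_bins))
    for i, spikes in enumerate(trials):
        spikes = np.asarray(spikes, dtype=float)
        # Drop pre-onset spikes (t < 0) and cut spikes past the common duration.
        spikes = spikes[(spikes >= 0) & (spikes < edges[-1])]
        binned[i], _ = np.histogram(spikes, bins=edges)
    # Bins without spikes stay zero, which pads short trials on the right.
    return binned
```

Averaging the result over axis 0 gives the trial-averaged PSTH; dividing the counts by dt_ms / 1000 converts them to a firing rate in Hz.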

Setup

Requirements: a CRCNS account.

The easiest path is to auto-download via the CRCNS NERSC mirror:

from deepSTRF.datasets.audio import CRCNSAA2Dataset

ds = CRCNSAA2Dataset(
    download=True, dt_ms=5,
    crcns_username="your_username",
    crcns_password="your_password",
)

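Alternatively, set the $CRCNS_USERNAME and $CRCNS_PASSWORD environment variables instead of passing the credentials. The default cache directory is platformdirs.user_cache_dir('deepSTRF')/CRCNS_AA2, which can be overridden via $DEEPSTRF_DATA_DIR. download=True is idempotent, so it is safe to leave it on for later runs.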
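The lookup order described above can be sketched as two small helpers. They are hypothetical and only mirror the documented behaviour; the sketch assumes that explicit arguments take precedence over the environment variables, and that $DEEPSTRF_DATA_DIR replaces the platformdirs root while the CRCNS_AA2 subfolder is appended in both cases.

```python
import os
from pathlib import Path


def resolve_credentials(username=None, password=None):
    """Explicit arguments first, then $CRCNS_USERNAME / $CRCNS_PASSWORD (assumed order)."""
    username = username or os.environ.get("CRCNS_USERNAME")
    password = password or os.environ.get("CRCNS_PASSWORD")
    if not username or not password:
        raise RuntimeError(
            "CRCNS credentials missing: pass them or set $CRCNS_USERNAME / $CRCNS_PASSWORD"
        )
    return username, password


def resolve_cache_dir(dataset_subdir="CRCNS_AA2"):
    """$DEEPSTRF_DATA_DIR if set, else platformdirs' user cache dir for 'deepSTRF'."""
    root = os.environ.get("DEEPSTRF_DATA_DIR")
    if root is None:
        from platformdirs import user_cache_dir  # only needed for the default location
        root = user_cache_dir("deepSTRF")
    return Path(root) / dataset_subdir
```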

If you already have the data laid out manually:

1. Download the data from the dataset page.
2. Extract all_stims/ and all_cells/ into a data/ folder.
3. Load it with ds = CRCNSAA2Dataset('/path/to/data', dt_ms=5)
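Before pointing the dataset at a manually prepared folder, a quick check of the expected layout can save a confusing error later. This is a hypothetical helper, not part of deepSTRF; it only tests for the two subfolders named in the steps above.

```python
from pathlib import Path


def check_manual_layout(data_dir):
    """Return data_dir as a Path after checking that it holds all_stims/ and all_cells/."""
    data_dir = Path(data_dir)
    missing = [name for name in ("all_stims", "all_cells") if not (data_dir / name).is_dir()]
    if missing:
        raise FileNotFoundError(f"{data_dir} is missing: {', '.join(missing)}")
    return data_dir
```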

Filtering

Each stim_meta dict carries:
- name: stimulus identifier
- type: "conspecific" or "songrip" (the song-ripple stimuli, i.e. synthetic ripple sounds with song-like modulation statistics)
- sample_rate, n_samples, duration_s: read from data/stim_data.csv

Each nrn_meta dict carries:
- cell_id: the raw cell name from the dataset
- animal_id
- area: "MLd", "OV", "L", "CM", or one of the smaller secondary areas (see AA1 for the parsing details)
- cell_seq: within-animal cell index
- rig: often None in AA2
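To make the metadata fields concrete, here is a plain-Python sketch of filtering such dicts. The records (names, animals, sizes) are made up for illustration, and select is a hypothetical helper, not deepSTRF's selection API.

```python
# Hypothetical records shaped like the stim_meta / nrn_meta dicts described above.
stim_meta = [
    {"name": "stim_a", "type": "conspecific", "sample_rate": 32000, "n_samples": 64000, "duration_s": 2.0},
    {"name": "stim_b", "type": "conspecific", "sample_rate": 32000, "n_samples": 96000, "duration_s": 3.0},
    {"name": "stim_c", "type": "songrip", "sample_rate": 32000, "n_samples": 160000, "duration_s": 5.0},
]
nrn_meta = [
    {"cell_id": "cell_1", "animal_id": "bird1", "area": "MLd", "cell_seq": 1, "rig": None},
    {"cell_id": "cell_2", "animal_id": "bird1", "area": "OV", "cell_seq": 2, "rig": None},
    {"cell_id": "cell_3", "animal_id": "bird2", "area": "MLd", "cell_seq": 1, "rig": None},
    {"cell_id": "cell_4", "animal_id": "bird2", "area": "L", "cell_seq": 2, "rig": None},
]


def select(records, **criteria):
    """Keep records whose fields match every criterion.

    A criterion value is either a single allowed value or a collection of allowed values.
    """
    def matches(rec):
        for key, wanted in criteria.items():
            allowed = wanted if isinstance(wanted, (set, frozenset, list, tuple)) else {wanted}
            if rec.get(key) not in allowed:
                return False
        return True

    return [rec for rec in records if matches(rec)]


mld_cells = select(nrn_meta, area="MLd")
bird1_thalamic_or_midbrain = select(nrn_meta, area={"MLd", "OV"}, animal_id="bird1")
ripples = select(stim_meta, type="songrip")
```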

AA2 supports the full selection API described in the data paradigm doc: filter neurons by metadata (select_pop_by_nrn_attr) or by stimulus coverage (select_pop_by_stim_attr), filter stimuli by metadata (select_stims_by_attr), and rely on the bidirectional rule, whereby narrowing the stimulus space automatically hides cells that have no responses left in it.
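The bidirectional rule can be illustrated with plain sets, assuming each cell is described by the set of stimuli it has responses to. The helper names are hypothetical and this is not deepSTRF's implementation; covered_stims sketches the reverse direction (fewer cells leave fewer stimuli with data), which is an assumption about what "bidirectional" covers.

```python
def restrict_to_stims(responses, kept_stims):
    """Keep only responses to kept_stims; cells left with none are hidden.

    responses: dict mapping cell_id -> set of stimulus names the cell has responses to.
    """
    kept = set(kept_stims)
    restricted = {cell: stims & kept for cell, stims in responses.items()}
    return {cell: stims for cell, stims in restricted.items() if stims}


def covered_stims(responses):
    """Reverse direction: the stimuli that at least one remaining cell responds to."""
    return set().union(*responses.values())


responses = {"c1": {"s1", "s2"}, "c2": {"s3"}, "c3": {"s2", "s3"}}
visible = restrict_to_stims(responses, {"s1", "s2"})  # c2 disappears: it only has s3
```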