CRCNS AA2 Dataset

Dataset Source: AA2 Dataset

Citation:

Theunissen, Frederic E.; Gill, Patrick; Noopur, Amin; Zhang, Junli; Woolley, Sarah M. N.; Fremouw, Thane (2011): Single-unit recordings from multiple auditory areas in male zebra finches. CRCNS.org.
http://dx.doi.org/10.6080/K0JW8BSC

Papers Using the Dataset:

Dataset Details

Population fitting:

Description of Stimuli:

  • 72 clips of conspecific vocalizations, 20 clips of flat ripples, and 25 clips of song ripples up to 5 s duration.

  • Sampled at 32 kHz with 16-bit precision

  • Up to 20 response trials per stimulus (typically 10 to 20)

Description of Neurons:

  • Extracellular single-unit recordings from 57 male zebra finches

  • Total Number of Neurons: 494

    • 143 MLd

    • 59 OV

    • 189 L

      • 17 L1

      • 53 L2a

      • 42 L2b

      • 43 L3

      • 31 others (“L”)

    • 37 CM

    • 66 others (“None”)

Available data:

  • Full Python preprocessing.

  • One very simple .txt file for each cell (unit) response:

    • Spike timestamp relative to stimulus onset

    • Each line corresponds to one response trial
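As a rough illustration of the file layout described above, the sketch below parses one cell's response text into a list of trials. It assumes that spike timestamps on a line are separated by whitespace and that an empty line denotes a trial with no spikes; neither detail is stated here, and `parse_spike_trials` / `load_spike_trials` are hypothetical helper names, not part of deepSTRF.

```python
from pathlib import Path


def parse_spike_trials(text):
    """Parse response text: one trial per line, whitespace-separated spike times.

    Assumption: an empty line is a trial without spikes.
    """
    trials = []
    for line in text.splitlines():
        # Each token is a spike timestamp relative to stimulus onset.
        trials.append([float(token) for token in line.split()])
    return trials


def load_spike_trials(path):
    """Read one cell's .txt response file from disk (hypothetical helper)."""
    return parse_spike_trials(Path(path).read_text())


# Made-up example content: three trials, the second one silent.
example_text = "-12.5 3.0 48.2\n\n7.1 950.0\n"
example_trials = parse_spike_trials(example_text)
```

Negative timestamps in this example stand for spikes recorded before stimulus onset.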

Processing needed (Dataset class __init__() method):

  • Transforming the sound waveform (.wav file) into a 32-band spectrogram.

  • Choosing neurons based on their recording site, stimulus type, and animal.

  • Transforming the spike times of each repeat of each stimulus into PSTHs

    • Remove pre-onset spikes

    • Align trials temporally

    • Pad/cut to the right (present/future time steps) so that trials have the same duration
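The spike-to-PSTH steps above can be sketched in plain Python as follows. This is a minimal illustration, not the deepSTRF implementation: it assumes spike times are in milliseconds relative to stimulus onset, uses hypothetical function names (`bin_trials`, `psth`), and returns the mean spike count per bin rather than a firing rate.

```python
import math


def bin_trials(trials, dt_ms, duration_ms):
    """Bin each trial into spike counts on a common time grid.

    Bin 0 starts at stimulus onset, so all trials are aligned. Pre-onset
    spikes (t < 0) are dropped. Spikes past duration_ms are cut, and trials
    that end early are implicitly zero-padded to the right.
    """
    n_bins = math.ceil(duration_ms / dt_ms)
    rows = []
    for spikes in trials:
        row = [0] * n_bins
        for t in spikes:
            if t < 0:
                continue  # remove pre-onset spikes
            b = int(t // dt_ms)
            if b < n_bins:  # cut anything beyond the stimulus window
                row[b] += 1
        rows.append(row)
    return rows


def psth(rows):
    """Average the binned trials into a PSTH (mean spike count per bin)."""
    if not rows:
        return []
    n_trials = len(rows)
    return [sum(column) / n_trials for column in zip(*rows)]


# Made-up trials: 5 ms bins over a 50 ms window give 10 bins per trial.
binned = bin_trials([[-12.5, 3.0, 48.2], [], [7.1, 950.0]], dt_ms=5, duration_ms=50)
mean_counts = psth(binned)
```

Dividing each bin by the bin width in seconds would convert the mean counts to spikes per second, if a rate is preferred.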

Setup

Requirements: a CRCNS account.

Easiest path — auto-download via the CRCNS NERSC mirror:

from deepSTRF.datasets.audio import CRCNSAA2Dataset

ds = CRCNSAA2Dataset(
    download=True, dt_ms=5,
    crcns_username="your_username",
    crcns_password="your_password",
)

Alternatively, set the $CRCNS_USERNAME and $CRCNS_PASSWORD environment variables instead of passing credentials in code. The default cache directory is platformdirs.user_cache_dir('deepSTRF')/CRCNS_AA2, overridable via $DEEPSTRF_DATA_DIR. download=True is idempotent.
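To keep credentials out of scripts, one option is to read them from the environment and fail early when they are missing. The helper below is a hypothetical sketch (`resolve_crcns_credentials` is not a deepSTRF function); only the two variable names come from the text above, and how the library itself resolves them is not shown here.

```python
import os


def resolve_crcns_credentials(username=None, password=None, env=None):
    """Return (username, password), falling back to environment variables.

    Explicit arguments win; otherwise CRCNS_USERNAME / CRCNS_PASSWORD are
    read from `env` (defaults to os.environ).
    """
    env = os.environ if env is None else env
    username = username or env.get("CRCNS_USERNAME")
    password = password or env.get("CRCNS_PASSWORD")
    if not username or not password:
        raise ValueError(
            "CRCNS credentials missing: pass them explicitly or set "
            "CRCNS_USERNAME and CRCNS_PASSWORD"
        )
    return username, password


# Example with a stand-in environment mapping (placeholder values).
user, pwd = resolve_crcns_credentials(
    env={"CRCNS_USERNAME": "your_username", "CRCNS_PASSWORD": "your_password"}
)
```

The resulting pair could then be passed as crcns_username and crcns_password to the constructor call shown earlier.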

If you already have the data laid out manually:

  1. Download from the dataset page.

  2. Extract all_stims/ and all_cells/ into a data/ folder.

  3. Point the dataset at that folder: ds = CRCNSAA2Dataset('/path/to/data', dt_ms=5)

Filtering

Each stim_meta dict carries:

  • name: stimulus identifier

  • type: "conspecific" or "songrip" (the latter is reversed-song / pitch-shifted controls)

  • sample_rate, n_samples, duration_s: all three taken from data/stim_data.csv

Each nrn_meta dict carries:

  • cell_id: the raw cell name from the dataset

  • animal_id

  • area: "MLd", "OV", "L", "CM", or one of the smaller secondary areas (see AA1 for the parsing details)

  • cell_seq: within-animal cell index

  • rig: often None in AA2

The full selection API from the data paradigm doc is available on AA2: filter neurons by metadata (select_pop_by_nrn_attr) or by stim coverage (select_pop_by_stim_attr), filter stims by metadata (select_stims_by_attr), and rely on the bidirectional rule so that narrowing the stim space automatically hides cells with no responses left in it.
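The selection logic can be illustrated on plain dicts without touching the library. The sketch below does not call select_pop_by_nrn_attr, select_pop_by_stim_attr, or select_stims_by_attr, whose signatures are not given here; `select_by_attr` and `cells_with_coverage` are hypothetical stand-ins, and all cell, animal, and stimulus names are made up.

```python
def select_by_attr(metas, **attrs):
    """Keep the metadata dicts whose fields match every given attribute."""
    return [m for m in metas if all(m.get(k) == v for k, v in attrs.items())]


def cells_with_coverage(nrn_metas, responses, stim_names):
    """Bidirectional rule: hide cells with no responses to the kept stimuli.

    `responses` maps cell_id -> set of stimulus names that have trials.
    """
    kept = set(stim_names)
    return [m for m in nrn_metas if responses.get(m["cell_id"], set()) & kept]


# Made-up metadata in the shape described above.
nrn_metas = [
    {"cell_id": "cell_a", "animal_id": "bird_1", "area": "MLd"},
    {"cell_id": "cell_b", "animal_id": "bird_1", "area": "OV"},
    {"cell_id": "cell_c", "animal_id": "bird_2", "area": "MLd"},
]
stim_metas = [
    {"name": "stim_1", "type": "conspecific"},
    {"name": "stim_2", "type": "songrip"},
]
responses = {
    "cell_a": {"stim_1", "stim_2"},
    "cell_b": {"stim_1"},
    "cell_c": {"stim_2"},
}

# Filter neurons by metadata.
mld_cells = select_by_attr(nrn_metas, area="MLd")

# Narrow the stimulus space, then drop cells left without any response.
songrip_names = [s["name"] for s in select_by_attr(stim_metas, type="songrip")]
visible = cells_with_coverage(nrn_metas, responses, songrip_names)
visible_ids = [m["cell_id"] for m in visible]
```

In this toy example, cell_b only responded to a conspecific stimulus, so restricting the stimuli to "songrip" hides it.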