CRCNS AA1 Dataset

Dataset Source: AA1 Dataset

Citation:

Theunissen, FE; Hauber, ME; Woolley, SMN; Gill, P; Shaevitz, SS; Amin, Noopur; Hsu, A; Singh, NC; Grace, GA; Fremouw, T; Zhang, Junli; Cassey, P; Doupe, AJ; David, SV (2009): Single-unit recordings from two auditory areas in male zebra finches. CRCNS.org.
http://dx.doi.org/10.6080/K0F769GP

Papers Using the Dataset:

Dataset Details

Population fitting:

Description of Stimuli:

  • 10 clips of conspecific vocalizations and 20 clips of flat ripples, up to 5 s duration.

  • Sampled at 32 kHz with 16-bit precision

  • Up to 10 response trials for a given sound

Description of Neurons:

  • Total Number of Neurons: 100 (50 MLd + 50 Field L)

  • Extracellular single-unit recordings from anesthetized male zebra finches

Available data:

  • Full Python preprocessing.

  • One very simple .txt file for each cell (unit) response:

    • Spike timestamps relative to stimulus onset

    • Each line corresponds to one response trial
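As an illustration of the file layout described above, here is a minimal parsing sketch. It assumes spike times on a line are whitespace-separated numbers and that an empty line is a trial with no spikes; neither detail is stated here, and the time unit is not specified either. The function name parse_spike_trials is hypothetical and is not part of deepSTRF.

```python
from typing import List


def parse_spike_trials(text: str) -> List[List[float]]:
    """Parse the contents of one cell's response file.

    Assumption: one trial per line, spike times separated by whitespace.
    An empty line is kept as a trial that contains no spikes.
    """
    trials = []
    for line in text.splitlines():
        trials.append([float(token) for token in line.split()])
    return trials


# Made-up file contents with three trials (the second one is empty).
example = "-12.5 3.0 48.2\n\n7.1 150.0\n"
trials = parse_spike_trials(example)
print(len(trials), trials[0])
```

Negative values, if present, would be spikes recorded before stimulus onset.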

Processing needed (Dataset class __init__() method):

  • Transforming the sound waveform (.wav file) into a 32-band spectrogram.

  • Choosing neurons based on their recording site and stimulus type.

  • Transforming the spike times of each repeat of each stimulus into PSTHs

    • Remove pre-onset spikes

    • Align trials temporally

    • Pad/cut to the right (present/future time steps) so that trials have the same duration
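The PSTH steps listed above can be sketched with NumPy as follows. This is not the Dataset class's actual code: the function name trials_to_psth, the millisecond unit, and the choice to average raw counts per bin are all assumptions made for illustration.

```python
import numpy as np


def trials_to_psth(trials, duration_ms, dt_ms=5.0):
    """Average binned spike counts over trials (spikes per bin).

    trials: list of sequences of spike times, assumed to be in ms
            relative to stimulus onset.
    duration_ms: common duration that every trial is cut or padded to.
    """
    n_bins = int(np.ceil(duration_ms / dt_ms))
    # Shared bin edges starting at onset keep all trials aligned in time.
    edges = np.arange(n_bins + 1) * dt_ms
    counts = np.zeros((len(trials), n_bins))
    for i, spikes in enumerate(trials):
        spikes = np.asarray(spikes, dtype=float)
        spikes = spikes[spikes >= 0.0]  # remove pre-onset spikes
        # Spikes past the last edge are ignored (cut to the right);
        # bins without spikes stay at zero (pad to the right).
        counts[i], _ = np.histogram(spikes, bins=edges)
    return counts.mean(axis=0)


# Two made-up trials, 20 ms window, 5 ms bins.
psth = trials_to_psth([[-10.0, 2.0, 7.0, 12.0], [3.0, 30.0]], duration_ms=20.0)
print(psth)
```

Dividing the result by the bin width in seconds would turn the counts into a firing rate in spikes per second.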

Benchmark results

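| Area    | Model backbone | Rank | Remarks    | Params / nrn | Perfs (CCraw / CCnorm) [%] | Paper (backbone)  |
|---------|----------------|------|------------|--------------|----------------------------|-------------------|
| Field L | StateNet       | 🥇   | GRU, pop   | 24,900       | / 71.0                     | Rançon et al.     |
|         | Transformer    | 🥈   | pop        | 29,109       | / 65.5                     | Rançon et al.     |
|         | 2D-CNN         | 🥉   | pop        | 26,915       | / 65.0                     | Pennington et al. |
| MLd     | StateNet       | 🥇   | Mamba, pop | 32,334       | / 73.4                     | Rançon et al.     |
|         | 2D-CNN         | 🥈   | pop        | 29,109       | / 68.9                     | Rançon et al.     |
|         | Transformer    | 🥉   | pop        | 34,475       | / 68.3                     | Pennington et al. |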

Setup

Requirements: a CRCNS account (the data host requires login).

Easiest path — auto-download via the CRCNS NERSC mirror:

from deepSTRF.datasets.audio import CRCNSAA1Dataset

ds = CRCNSAA1Dataset(
    download=True, dt_ms=5,
    crcns_username="your_username",
    crcns_password="your_password",
)

Alternatively, set $CRCNS_USERNAME / $CRCNS_PASSWORD in the environment and omit the credential kwargs. The default cache directory is platformdirs.user_cache_dir('deepSTRF')/CRCNS_AA1, overridable via $DEEPSTRF_DATA_DIR. download=True is idempotent, so calling it again has no further effect.
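The fallback from keyword arguments to environment variables might look roughly like the sketch below. The helper resolve_crcns_credentials is hypothetical and is not a deepSTRF function; that explicit arguments take precedence over the environment is also an assumption.

```python
import os


def resolve_crcns_credentials(username=None, password=None, env=None):
    """Hypothetical helper: use explicit arguments, else environment variables."""
    env = os.environ if env is None else env
    # Assumption: explicitly passed values win over the environment.
    username = username or env.get("CRCNS_USERNAME")
    password = password or env.get("CRCNS_PASSWORD")
    if not username or not password:
        raise ValueError(
            "CRCNS credentials missing: pass them explicitly or set "
            "CRCNS_USERNAME and CRCNS_PASSWORD"
        )
    return username, password


# Placeholder values passed through a plain dict instead of the real environment.
fake_env = {"CRCNS_USERNAME": "your_username", "CRCNS_PASSWORD": "your_password"}
print(resolve_crcns_credentials(env=fake_env)[0])
```

Keeping credentials in the environment avoids hard-coding them in scripts or notebooks.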

To lay the data out manually instead:

  1. Download crcns-aa1.zip from the original dataset repository.

  2. Extract all_stims/, Field_L_cells/, MLd_cells/ into a data/ folder.

  3. Point the dataset class at that folder: ds = CRCNSAA1Dataset('/path/to/data', dt_ms=5)

Filtering

Each stim_meta dict carries name, type ("conspecific" or "flatrip"), sample_rate, n_samples, duration_s. Each nrn_meta dict carries cell_id, area ("Field_L" or "MLd"), animal_id, cell_seq, rig. Combined with the base-class selection API:

ds.select_pop_by_nrn_attr("area", "MLd")           # only MLd cells
ds.select_stims_by_attr("type", "conspecific")     # only conspecific stims
                                                   # (auto-hides 2 cells with
                                                   # no conspecific data)
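To make the idea of attribute-based selection concrete, the sketch below filters a list of metadata dicts in plain Python. It only illustrates the concept and is not how the base-class selection API is implemented; the helper indices_where and the cell_id values are made up.

```python
# Made-up metadata entries using the "cell_id" and "area" keys described above.
nrn_meta = [
    {"cell_id": "cell_a", "area": "MLd"},
    {"cell_id": "cell_b", "area": "Field_L"},
    {"cell_id": "cell_c", "area": "MLd"},
]


def indices_where(meta_list, key, value):
    """Return the indices of the dicts whose `key` entry equals `value`."""
    return [i for i, meta in enumerate(meta_list) if meta.get(key) == value]


mld_idx = indices_where(nrn_meta, "area", "MLd")
print(mld_idx)  # [0, 2]
```

The same pattern applies to stim_meta, for example matching the "type" key against "conspecific".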