deepSTRF.datasets package
Subpackages
- deepSTRF.datasets.audio package
- Submodules
- deepSTRF.datasets.audio.audio_dataset module
- deepSTRF.datasets.audio.ns1_drc module
- deepSTRF.datasets.audio.nat4 module
- deepSTRF.datasets.audio.wehr module
- deepSTRF.datasets.audio.asari module
- deepSTRF.datasets.audio.crcns_aa1 module
- deepSTRF.datasets.audio.crcns_aa2 module
- deepSTRF.datasets.audio.crcns_aa4 module
- deepSTRF.datasets.audio.espejo module
- deepSTRF.datasets.audio.alice_eeg module
- deepSTRF.datasets.audio.le_2025 module
- Module contents
  - AliceEEGDataset
  - AudioNeuralDataset
  - CRCNSAA1Dataset
  - CRCNSAA2Dataset
  - CRCNSAA4Dataset
  - CRCNSAC1Dataset
  - Downer2025Dataset
  - EspejoDataset
  - Le2025Dataset
  - NAT4Dataset
  - NS1Dataset
  - Wingert2026Dataset
  - download_ac1()
  - download_alice_eeg()
  - download_downer2025()
  - download_espejo()
  - download_espejo_nat_waveforms()
  - download_wingert2026()
- deepSTRF.datasets.video package
- Submodules
- deepSTRF.datasets.video.video_dataset module
  - VideoNeuralDataset
    - VideoNeuralDataset.add_noise_to_videos()
    - VideoNeuralDataset.change_spatial_resolution()
    - VideoNeuralDataset.change_temporal_resolution()
    - VideoNeuralDataset.change_video_temporal_resolution()
    - VideoNeuralDataset.from_rgb_to_grayscale()
    - VideoNeuralDataset.get_dt()
    - VideoNeuralDataset.get_responses_signal_power()
    - VideoNeuralDataset.normalize_responses()
    - VideoNeuralDataset.normalize_videos()
    - VideoNeuralDataset.smooth_responses()
    - VideoNeuralDataset.split_into_clips()
- Module contents
  - VideoNeuralDataset
    - VideoNeuralDataset.add_noise_to_videos()
    - VideoNeuralDataset.change_spatial_resolution()
    - VideoNeuralDataset.change_temporal_resolution()
    - VideoNeuralDataset.change_video_temporal_resolution()
    - VideoNeuralDataset.from_rgb_to_grayscale()
    - VideoNeuralDataset.get_dt()
    - VideoNeuralDataset.get_responses_signal_power()
    - VideoNeuralDataset.normalize_responses()
    - VideoNeuralDataset.normalize_videos()
    - VideoNeuralDataset.smooth_responses()
    - VideoNeuralDataset.split_into_clips()
Submodules
deepSTRF.datasets.neural_dataset module
- class deepSTRF.datasets.neural_dataset.NeuralDataset(path: str, dt_ms: float)
Bases: Dataset, ABC
General base class for datasets of sensory neural responses.
deepSTRF datasets are triply ragged: stimulus duration varies, the repeat count varies per (stim, neuron) pair, and stim/neuron coverage is sparse. They are stored as Python lists of tensors, with NaN used as the single channel for encoding missingness. See docs/_source/md/data_paradigm.md for the full rationale, collate behaviour, and recommended loss pattern.
Subclass contract
A concrete subclass must populate the following attributes in its __init__ and then call self.validate() as its last line:
- self.stims: list of length S, each element a stimulus tensor of modality-specific shape (audio spectrogram: (1, F, T_stim); audio waveform: (1, T_stim); video: (1, H, W, T_stim)). T_stim may vary across stimuli, and may differ from the response time axis when the stim sampling rate is finer than the neural rate (e.g. raw waveforms vs spike counts).
- self.responses: list of length S, each element itself a list of length N. responses[s][n] is a (R_{s,n}, T_resp_s) float tensor of spike counts per repeat × time bin at the dataset's neural dt_ms, or a (1, 1) NaN tensor if neuron n did not hear stim s. T_resp_s is the per-stim response length; in spectrogram mode it equals the last axis of self.stims[s].
- self.stim_meta: list of length S, per-stim metadata dicts.
- self.nrn_meta: list of length N, per-neuron metadata dicts.
- self.N_neurons: int, must equal len(self.nrn_meta).
Derived attributes (no explicit population needed):
- self.nrn_masks: (S, N) bool tensor, derived from the NaN sentinels in self.responses. nrn_masks[s, n] is True iff neuron n has real data for stim s. Implemented as a @property computed from self.responses, so there is no separately stored mask to keep in sync (see the nrn_masks property for the caching caveat).
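The storage layout and the mask derivation described above can be sketched in plain Python. This is an illustration only: nested lists stand in for tensors, and is_sentinel and derive_masks are hypothetical helper names, not part of deepSTRF.

```python
import math

NAN = float("nan")
SENTINEL = [[NAN]]  # stand-in for the (1, 1) NaN tensor of a missing pair


def is_sentinel(resp):
    """True if resp is the (1, 1) NaN placeholder (hypothetical helper)."""
    return len(resp) == 1 and len(resp[0]) == 1 and math.isnan(resp[0][0])


def derive_masks(responses):
    """Return an S x N nested list of bools: True where real data exists."""
    return [[not is_sentinel(r) for r in per_stim] for per_stim in responses]


# responses[s][n] is an (R, T) block; R and T vary, coverage is sparse.
responses = [
    # stim 0 (T=3): neuron 0 has 2 repeats, neuron 1 was not recorded
    [[[0.0, 1.0, 2.0], [1.0, 0.0, 1.0]], SENTINEL],
    # stim 1 (T=2): neuron 0 has 1 repeat, neuron 1 has 3 repeats
    [[[3.0, 0.0]], [[1.0, 1.0], [0.0, 2.0], [2.0, 0.0]]],
]
masks = derive_masks(responses)
```

Because the mask is computed from the responses themselves, the NaN sentinel is the only place where missingness is recorded.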
Opt-in per-neuron quality metrics
Call self.compute_neuron_quality() after construction to populate each nrn_meta[i] with two scalars derived from the responses themselves: Sahani-Linden 'snr' and Hsu/Spearman-Brown 'ccmax'. These are useful as predicate-filter inputs (e.g. ds.select_pop_by_nrn_predicate(lambda n: n['snr'] > 0.5)). The call is opt-in rather than automatic because CCmax is \(O(S \cdot N \cdot R^2 \cdot \text{max\_iters})\) in the worst case and can take from a few seconds to tens of seconds on the big datasets.
Key invariants
- Stim tensors never contain NaN. Batch-level collate zero-pads them on the right along T.
- Response tensors may contain NaN. Use self.nrn_masks (dataset level) or the derived valid_mask from collate (batch level).
- Response-side preprocessing (smooth_responses, normalize_responses, any user-written transform) must be NaN-aware: either use nanmean / nanstd etc., or apply the mask before reducing.
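To see why the last invariant matters, compare a naive mean with a NaN-aware one. This is a minimal pure-Python sketch; nanmean here is a local stand-in for the tensor-library function of the same name.

```python
import math


def nanmean(values):
    """Mean over the non-NaN entries; NaN if nothing is left."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else float("nan")


# Flattened responses of one neuron, with one NaN sentinel mixed in.
flat = [2.0, 4.0, float("nan"), 6.0]

naive = sum(flat) / len(flat)  # a single NaN poisons the whole reduction
aware = nanmean(flat)          # the sentinel is ignored
```

A single sentinel is enough to turn the naive statistic into NaN, which would then propagate into every normalized response.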
- compute_neuron_quality(max_ccmax_iters: int = 126) -> dict
Write per-neuron SNR and CCmax scalars into self.nrn_meta.
For each neuron, adds two keys to self.nrn_meta[i]:
- 'snr' (float): Sahani-Linden signal-to-noise ratio \(\text{SP}_n / \text{NP}_n\), with the two terms length-weighted across stims (weight = number of valid time bins per stim). NaN when the neuron has no stim with \(R_{b,n} \ge 2\) and \(T_b \ge 2\).
- 'ccmax' (float): Hsu/Spearman-Brown noise ceiling (capped at max_ccmax_iters random half-splits per (stim, neuron)), length-weighted across stims with \(R_{b,n} \ge 2\). Falls back to 1.0 when the neuron has zero such stims (R=1 everywhere, so no normalization is possible and cc_norm = cc_raw). NaN if every contributing stim has \(\rho_{\text{half}} \le 0\) (signal too weak to estimate the ceiling).
Both scalars use the length-weighted aggregation convention from metrics_paradigm.md §11, matching what corrcoef / normalized_corrcoef would compute on the concatenated-over-stims time axis.
- Parameters:
max_ccmax_iters (int, default 126) – Maximum number of random half-splits per (stim, neuron) for the CCmax estimate. C(R, R/2) blows up for R > 10; 126 matches the default of compute_CCmax().
- Returns:
{'snr': Tensor[N], 'ccmax': Tensor[N]} for inspection. The same values are written into self.nrn_meta as plain Python floats.
- Return type:
dict
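The length-weighted SNR can be illustrated for a single neuron as follows. This is a sketch under stated assumptions, not the library's code: it uses the usual Sahani-Linden estimators with population variance over time as "power", nested lists instead of tensors, and hypothetical function names. Returning NaN when the pooled noise power is zero is a choice made for this sketch.

```python
from statistics import pvariance


def signal_noise_power(repeats):
    """Sahani-Linden signal and noise power for one (stim, neuron) pair.

    repeats: list of R equal-length traces, with R >= 2 and T >= 2.
    """
    n_rep = len(repeats)
    n_bins = len(repeats[0])
    mean_resp = [sum(r[t] for r in repeats) / n_rep for t in range(n_bins)]
    power_of_mean = pvariance(mean_resp)                     # power of the PSTH
    mean_power = sum(pvariance(r) for r in repeats) / n_rep  # mean single-trial power
    signal = (n_rep * power_of_mean - mean_power) / (n_rep - 1)
    noise = mean_power - signal
    return signal, noise


def length_weighted_snr(stims):
    """SP / NP with both terms weighted by the number of time bins per stim."""
    sp_sum = np_sum = weight = 0.0
    for repeats in stims:
        if len(repeats) < 2 or len(repeats[0]) < 2:
            continue  # needs R >= 2 and T >= 2 to contribute
        n_bins = len(repeats[0])
        signal, noise = signal_noise_power(repeats)
        sp_sum += n_bins * signal
        np_sum += n_bins * noise
        weight += n_bins
    if weight == 0 or np_sum == 0:
        return float("nan")  # sketch-level choice for degenerate cases
    return (sp_sum / weight) / (np_sum / weight)


stim_a = [[0.0, 4.0, 0.0, 4.0], [0.0, 2.0, 0.0, 2.0]]  # R=2, T=4
stim_b = [[1.0, 3.0], [3.0, 1.0]]                      # R=2, T=2
stim_c = [[5.0, 5.0, 5.0]]                             # R=1, skipped
snr = length_weighted_snr([stim_a, stim_b, stim_c])
```

Weighting the signal and noise terms separately before taking the ratio is what makes the result agree with a computation on the concatenated time axis, as opposed to averaging per-stim ratios.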
Notes
Opt-in (not auto-called in __init__). CCmax is \(O(S \cdot N \cdot R^2 \cdot \text{max\_iters})\) in the worst case; on big datasets (AA4, NAT4, Espejo NAT) this can take tens of seconds. Call once after dataset construction; results live on nrn_meta for subsequent filter-API calls.
Memory profile: streams one stim at a time, peaking at the largest single-stim (N, R_s, T_s) slab. The earlier implementation pre-built a global (S, N, R_max, T_max) padded tensor and ran out of memory on Downer 2025 TIMIT (~54 GB); the per-stim streaming variant produces bit-identical numbers with a much smaller working set.
Examples
>>> ds = CRCNSAA1Dataset(...)
>>> ds.compute_neuron_quality()
>>> ds.select_pop_by_nrn_predicate(lambda n: n['snr'] > 0.5)
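The need for a cap on half-splits, and the step from a half-split correlation to a ceiling, can be sketched as below. The ceiling formula is the commonly quoted Spearman-Brown form associated with Hsu et al. (2004) and is an assumption here; the exact expression used by compute_CCmax() is not given in this page. Both function names are hypothetical.

```python
import math


def n_half_splits(n_repeats):
    """Number of ways to pick floor(R/2) repeats out of R, i.e. C(R, R//2)."""
    return math.comb(n_repeats, n_repeats // 2)


def ccmax_from_half_corr(rho_half):
    """Spearman-Brown style ceiling from the half-split correlation.

    Assumed form: sqrt(2 * rho / (1 + rho)); NaN when rho <= 0, mirroring
    the "signal too weak" case described above.
    """
    if rho_half <= 0:
        return float("nan")
    return math.sqrt(2.0 * rho_half / (1.0 + rho_half))


# The count of exhaustive half-splits grows quickly with the repeat count.
counts = {r: n_half_splits(r) for r in (4, 8, 10, 20)}

ceiling_perfect = ccmax_from_half_corr(1.0)
ceiling_weak = ccmax_from_half_corr(1.0 / 3.0)
```

For R = 10 there are already 252 ordered half-splits (126 if a split and its mirror image are counted once), which is why random sampling up to max_ccmax_iters is used instead of exhaustive enumeration.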
- get_N()
Return the total number of selectable neurons.
- Returns:
self.N_neurons (the full population size, not the current selection).
- Return type:
int
- get_S()
Return the total number of stimuli in the dataset.
- Returns:
Number of stimuli presented to the whole neural population.
- Return type:
int
- get_nrn_meta()
Return metadata for each currently selected neuron.
- Returns:
The nrn_meta dicts for the neurons in the current selection self.I.
- Return type:
list of dict
- normalize_responses(method: str = 'max', stim_indices: Sequence[int] | None = None, eps: float = 1e-08) -> dict
Normalize self.responses in place, per neuron.
Statistics are computed on a chosen stim subset (typically train + val) and applied to all stims, mirroring standardize_stims. (1, 1) NaN sentinels for structurally missing (stim, neuron) pairs are preserved unchanged.
- Parameters:
method ({'max', 'zscore'}, default 'max') –
- 'max': divide each neuron's responses by their max across all (s, r, t) in stim_indices. Preserves non-negativity; on the stims in stim_indices the range [0, max] is mapped to [0, 1]. Natural for rate-coded / spike-count targets where 0 is meaningful.
- 'zscore': subtract the per-neuron mean and divide by the per-neuron std, both computed NaN-aware over the same flat (s, r, t) samples. Maps signed continuous targets (EEG, LFP) to zero mean and unit variance on the stims in stim_indices.
stim_indices (sequence of int, optional) – Stims used to compute statistics. If None, all stims are used.
eps (float, default 1e-8) – Floor on the divisor.
- Returns:
{'method': str, 'scale': Tensor[N], 'offset': Tensor[N], 'stim_indices': list | None}, also stored on self.response_normalization. scale is the divisor (max or std); offset is the subtracted location (0 for 'max', mean for 'zscore').
- Return type:
dict
Notes
Not idempotent: calling twice double-normalizes.
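The 'max' mode can be sketched on nested lists as follows. This is an illustration of the documented behaviour (statistics from a stim subset, applied to all stims, sentinels untouched) and not the library's implementation; the helper names are hypothetical, and the treatment of a neuron with no samples in the subset is a sketch-level choice.

```python
import math


def is_sentinel(resp):
    """True for the (1, 1) NaN placeholder of a missing (stim, neuron) pair."""
    return len(resp) == 1 and len(resp[0]) == 1 and math.isnan(resp[0][0])


def normalize_max(responses, stim_indices=None, eps=1e-8):
    """Divide each neuron's responses by its max over the chosen stims."""
    n_stims, n_neurons = len(responses), len(responses[0])
    idx = range(n_stims) if stim_indices is None else stim_indices
    scale = []
    for n in range(n_neurons):
        vals = [v for s in idx for rep in responses[s][n]
                for v in rep if not math.isnan(v)]
        scale.append(max(max(vals, default=0.0), eps))  # eps floors the divisor
    for s in range(n_stims):          # apply to ALL stims, held-out included
        for n in range(n_neurons):
            if is_sentinel(responses[s][n]):
                continue              # sentinels are preserved unchanged
            responses[s][n] = [[v / scale[n] for v in rep]
                               for rep in responses[s][n]]
    return scale


nan = float("nan")
responses = [
    [[[0.0, 2.0], [4.0, 0.0]], [[nan]]],   # stim 0: neuron 1 missing
    [[[1.0, 1.0, 1.0]], [[0.0, 5.0, 0.0]]],  # stim 1
    [[[8.0, 0.0]], [[10.0, 0.0]]],           # stim 2: held out
]
scale = normalize_max(responses, stim_indices=[0, 1])
```

Note that the held-out stim ends up with values above 1, since its maximum did not contribute to the statistics.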
- property nrn_masks: Tensor
True iff neuron n has real data for stim s.
Derived from the NaN sentinels in self.responses, which are the single source of truth. Lazy-cached on first access: subsequent reads are O(1). Callers that structurally mutate the response list (replace a real tensor with a (1, 1) NaN sentinel, or vice versa) should call self._invalidate_nrn_masks() afterwards so that the mask is rebuilt on the next access. Shape-preserving mutations (smooth_responses, normalization, etc.) leave the mask unchanged and do not require invalidation.
For a bare dataset (no populated responses), returns an empty (0, N_neurons) tensor.
- Type:
Derived (S, N) bool tensor
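The lazy-cache-plus-invalidation pattern described for this property can be sketched with a toy class. TinyDataset and invalidate_masks are hypothetical names for illustration; only the caching behaviour is taken from the text above.

```python
import math


class TinyDataset:
    """Toy stand-in showing a lazily cached, derived mask."""

    def __init__(self, responses):
        self.responses = responses
        self._mask_cache = None

    @staticmethod
    def _is_sentinel(resp):
        return len(resp) == 1 and len(resp[0]) == 1 and math.isnan(resp[0][0])

    @property
    def nrn_masks(self):
        if self._mask_cache is None:  # built on first access only
            self._mask_cache = [
                [not self._is_sentinel(r) for r in per_stim]
                for per_stim in self.responses
            ]
        return self._mask_cache

    def invalidate_masks(self):
        self._mask_cache = None


ds = TinyDataset([[[[1.0, 2.0]], [[float("nan")]]]])  # 1 stim, 2 neurons
before = ds.nrn_masks

# Structural mutation: the sentinel of neuron 1 is replaced by real data.
ds.responses[0][1] = [[0.0, 3.0]]
stale = ds.nrn_masks      # still the cached value
ds.invalidate_masks()
fresh = ds.nrn_masks      # rebuilt from the current responses
```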
- reset_pop_selection()
Clear the population selection so all neurons are eligible again.
Mirror of reset_stim_selection. Restores self.I to its empty default (interpreted as "no neuron-side restriction" by _selected()).
- select_neuron(neuron_index: int)
Restrict the selection to a single neuron.
- Parameters:
neuron_index (int) – Index into the full population, in [0, N_neurons).
- select_pop_by_nrn_attr(attribute_name: str, value)
Select neurons whose nrn_meta[attribute_name] == value.
Neurons whose metadata dict does not contain attribute_name are silently skipped. This lets a single filter call work against a concatenated dataset that pools sources with different metadata schemas. For example, AA1's area key is not present on AA4 neurons, so calling select_pop_by_nrn_attr("area", "Field_L") on the concatenation keeps only AA1 neurons in Field L, with no KeyError.
- Parameters:
attribute_name (str) – Key looked up in each nrn_meta dict.
value – Required value for an exact (==) match.
- Returns:
Indices of the selected neurons. Also stored in self.I.
- Return type:
list of int
See also
select_pop_by_nrn_predicate: threshold / range / compound queries.
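The skip-on-missing-key rule can be sketched on a plain list of metadata dicts. select_by_attr is a hypothetical helper, and the "source" key is invented for the example.

```python
def select_by_attr(meta_list, attribute_name, value):
    """Indices whose dict has attribute_name with an exactly equal value.

    Dicts lacking the key are skipped instead of raising KeyError.
    """
    return [
        i for i, meta in enumerate(meta_list)
        if attribute_name in meta and meta[attribute_name] == value
    ]


# Pooled metadata from two sources with different schemas.
nrn_meta = [
    {"area": "Field_L", "source": "AA1"},
    {"area": "MLd", "source": "AA1"},
    {"source": "AA4"},                     # no "area" key at all
    {"area": "Field_L", "source": "AA1"},
]
selected = select_by_attr(nrn_meta, "area", "Field_L")
```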
- select_pop_by_nrn_predicate(predicate)
Select neurons whose nrn_meta dict satisfies predicate.
More flexible than select_pop_by_nrn_attr(): takes any callable that maps a single nrn_meta dict to a truthy / falsy value, so threshold queries on continuous attributes ("snr > 0.5"), range queries ("200 <= depth_um <= 800") and compound conditions ("area in {'Field_L', 'MLd'} and auditory") are expressible. The *_attr siblings remain available for the equality-only case.
Neurons whose predicate raises KeyError or TypeError are silently skipped, following the same convention as select_pop_by_nrn_attr(), so a single predicate works on a concatenated dataset whose sources carry heterogeneous metadata schemas. Note: this means a typo in the predicate (referencing a wrong key) will silently select no neurons rather than raising; use nrn.get(key, default) in the predicate for explicit-default semantics.
- Parameters:
predicate (callable(dict) -> bool) – Tested on each nrn_meta[i] dict.
- Returns:
Indices of selected neurons. Also stored in self.I.
- Return type:
list[int]
Examples
>>> ds.select_pop_by_nrn_predicate(lambda n: n.get("snr", 0) > 0.5)
>>> ds.select_pop_by_nrn_predicate(lambda n: n["area"] in {"Field_L", "MLd"})
>>> ds.select_pop_by_nrn_predicate(
...     lambda n: 200 <= n.get("depth_um", -1) <= 800
... )
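The forgiving exception handling, including the typo pitfall noted above, can be reproduced with a small stand-alone filter. select_by_predicate is a hypothetical helper operating on a plain list of dicts, not the dataset method itself.

```python
def select_by_predicate(meta_list, predicate):
    """Indices whose dict satisfies predicate; KeyError/TypeError means skip."""
    selected = []
    for i, meta in enumerate(meta_list):
        try:
            if predicate(meta):
                selected.append(i)
        except (KeyError, TypeError):
            continue  # missing key or incomparable value: silently skipped
    return selected


nrn_meta = [
    {"snr": 0.8, "depth_um": 300},
    {"snr": 0.2},
    {"snr": None, "depth_um": 900},  # None > 0.5 raises TypeError
    {"depth_um": 500},               # no "snr" key: KeyError
]

good = select_by_predicate(nrn_meta, lambda n: n["snr"] > 0.5)
typo = select_by_predicate(nrn_meta, lambda n: n["snrr"] > 0.5)
in_range = select_by_predicate(
    nrn_meta, lambda n: 200 <= n.get("depth_um", -1) <= 800
)
```

The misspelled key produces an empty selection with no error, so an unexpectedly empty result is worth checking against the predicate's key names.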
- select_pop_by_stim_attr(attribute_name: str, value)
Select neurons with >=1 non-null response to stimuli matching a given attribute.
Looks up stimuli whose stim_meta[attribute_name] == value and keeps only neurons whose nrn_masks is True for at least one of them. Stims missing the key are silently skipped (same convention as select_pop_by_nrn_attr()).
- Parameters:
attribute_name (str) – Key looked up in each stim_meta dict.
value – Required value for an exact (==) match.
- Returns:
Indices of the selected neurons. Also stored in self.I.
- Return type:
list of int
- select_pop_by_stim_predicate(predicate)
Select neurons with >=1 non-null response to stimuli matching predicate.
Predicate variant of select_pop_by_stim_attr(). Looks up stimuli whose stim_meta dict satisfies predicate and keeps only neurons whose nrn_masks is True for at least one of them. Stims whose predicate raises KeyError or TypeError are silently skipped, following the same forgiving convention as select_pop_by_nrn_predicate().
- Parameters:
predicate (callable(dict) -> bool) – Tested on each stim_meta[s] dict.
- Returns:
Indices of selected neurons. Also stored in self.I.
- Return type:
list[int]
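The two stim-side population filters combine a stim lookup with the (S, N) mask. A minimal sketch on nested lists, with a hypothetical helper name and an invented "type" metadata key:

```python
def select_pop_by_stim(stim_meta, masks, predicate):
    """Neurons with at least one valid response to a stim matching predicate."""
    matching = []
    for s, meta in enumerate(stim_meta):
        try:
            if predicate(meta):
                matching.append(s)
        except (KeyError, TypeError):
            continue  # stims with a missing key are skipped
    n_neurons = len(masks[0]) if masks else 0
    return [n for n in range(n_neurons)
            if any(masks[s][n] for s in matching)]


stim_meta = [{"type": "song"}, {"type": "noise"}, {}, {"type": "song"}]
masks = [                      # masks[s][n]: True where real data exists
    [True, False, False],      # stim 0 (song)
    [False, True, False],      # stim 1 (noise)
    [True, True, True],        # stim 2 (no "type" key)
    [False, False, False],     # stim 3 (song), heard by no neuron
]
song_neurons = select_pop_by_stim(stim_meta, masks, lambda m: m["type"] == "song")
noise_neurons = select_pop_by_stim(stim_meta, masks, lambda m: m["type"] == "noise")
```

Neuron 2 has data only for the stim without a "type" key, so it is selected by neither query.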
- select_population(neuron_indices)
Restrict the selection to the listed neurons.
- Parameters:
neuron_indices (sequence of int) – Indices into the full population, each in [0, N_neurons).
- select_stim(stim_index: int)
Restrict iteration to a single stimulus index.
Pairs with the bidirectional rule in _selected(): cells whose only valid responses lie outside the selected stim are auto-hidden from __getitem__.
- Parameters:
stim_index (int) – Index into the stim space, in [0, S).
- select_stims(stim_indices)
Restrict iteration to the listed stimulus indices.
- Parameters:
stim_indices (sequence of int) – Indices into the stim space, each in [0, S).
- select_stims_by_attr(attribute_name: str, value)
Restrict iteration to stimuli matching stim_meta[attribute_name] == value.
Stims whose metadata dict does not contain attribute_name are silently skipped, following the same convention as select_pop_by_nrn_attr(), so a single call works on a concatenated dataset whose sources have heterogeneous stim metadata schemas.
- Parameters:
attribute_name (str) – Key looked up in each stim_meta dict.
value – Required value for an exact (==) match.
- Returns:
Indices of the selected stims. Also stored in self.S_sel.
- Return type:
list of int
- select_stims_by_predicate(predicate)
Restrict iteration to stimuli whose stim_meta satisfies predicate.
Predicate variant of select_stims_by_attr(). Takes any callable mapping a single stim_meta dict to truthy / falsy, so threshold and compound queries on continuous attributes ("duration_s > 2.0", "sample_rate >= 24000") become expressible.
Stims whose predicate raises KeyError or TypeError are silently skipped, following the same forgiving convention as select_pop_by_nrn_predicate(). Use sm.get(key, default) in the predicate for explicit-default semantics.
- Parameters:
predicate (callable(dict) -> bool) – Tested on each stim_meta[s] dict.
- Returns:
Indices of selected stims. Also stored in self.S_sel.
- Return type:
list[int]
Examples
>>> ds.select_stims_by_predicate(lambda s: s.get("duration_s", 0) >= 2.0)
>>> ds.select_stims_by_predicate(lambda s: s["type"] in {"song", "call"})
- smooth_responses(window_ms: float = 21.0) -> None
Temporally smooth each non-NaN response in place with a Hanning window.
- Parameters:
window_ms (float, default 21.0) – Full width of the Hanning window in ms. Rounded to the nearest odd number of self.dt bins.
Notes
Follows Hsu, Borst & Theunissen (2004) for reducing PSTH estimator variance, a common preprocessing step across spike-count datasets.
(1, 1) NaN-sentinel responses (neurons that did not hear a given stim) are preserved unchanged.
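One way to realise "nearest odd number of bins" and the smoothing itself is sketched below in pure Python. Several details are assumptions made for the sketch and may differ from the library: the rounding formula, the unit-sum normalization of the kernel, the use of the non-zero interior samples of a Hann window, and zero padding at the trace edges. All three function names are hypothetical.

```python
import math


def odd_window_bins(window_ms, dt_ms):
    """Nearest odd bin count for a window of window_ms at resolution dt_ms."""
    n = 2 * round((window_ms / dt_ms - 1) / 2) + 1
    return max(1, n)


def hann_kernel(n_bins):
    """Unit-sum Hann kernel with n_bins non-zero taps."""
    if n_bins == 1:
        return [1.0]
    # Interior samples of a Hann window of length n_bins + 2 (zero ends dropped).
    taps = [0.5 - 0.5 * math.cos(2 * math.pi * (k + 1) / (n_bins + 1))
            for k in range(n_bins)]
    total = sum(taps)
    return [w / total for w in taps]


def smooth(trace, kernel):
    """Centered convolution with zero padding at both edges."""
    half = len(kernel) // 2
    out = []
    for t in range(len(trace)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = t + k - half
            if 0 <= j < len(trace):
                acc += w * trace[j]
        out.append(acc)
    return out


bins_1ms = odd_window_bins(21.0, 1.0)   # 21 ms at 1 ms bins
bins_5ms = odd_window_bins(21.0, 5.0)   # 21 ms at 5 ms bins
kernel = hann_kernel(3)
smoothed = smooth([0.0, 0.0, 4.0, 0.0, 0.0], kernel)
```

An odd bin count keeps the kernel centered on the current bin, so smoothing does not shift response latencies.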
- standardize_stims(stim_indices: Sequence[int] | None = None, per_band: bool = True, eps: float = 1e-08) -> dict
Standardize self.stims in place: (x − mean) / std.
Statistics are computed over the stims selected by stim_indices (typically train + validation indices) and applied to all stims in the dataset. The held-out test stims are therefore transformed with the same train+val statistics, which prevents leakage of test-set first-order moments into the standardisation while still ensuring that train / val / test all live in the same standardised space.
- Parameters:
stim_indices (sequence of int, optional) – Indices of stims to compute statistics from. If None (default), statistics are computed over all stims. This is equivalent to "no held-out test set": useful for single-split exploratory analysis, but it introduces a small (first-order) leakage if a test set is held out downstream.
per_band (bool, default True) – If True, statistics are per-frequency-band (axis -2): mean / std are tensors of shape broadcastable to (C, F, 1). If False, a single scalar mean and std are computed over the whole concatenated stim tensor.
eps (float, default 1e-8) – Floor on std to avoid division by zero on constant bands.
- Returns:
{'mean': Tensor, 'std': Tensor, 'per_band': bool, 'stim_indices': list | None}, also stored on self.stim_normalization for inspection (e.g. to fold into a model kernel for STRF visualisation).
- Return type:
dict
Notes
Not idempotent: calling twice double-standardizes. To re-do with different statistics, rebuild the dataset.
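The fit-on-a-subset, apply-to-all pattern with per-band statistics can be sketched as follows. Stims are represented as F x T nested lists (the channel axis is dropped for brevity), population standard deviation is assumed, and fit_band_stats / apply_band_stats are hypothetical names; unlike the method above, this sketch returns new lists instead of modifying in place.

```python
from statistics import fmean, pstdev


def fit_band_stats(stims, stim_indices, eps=1e-8):
    """Per-band mean and std pooled over time and over the chosen stims."""
    n_bands = len(stims[0])
    mean, std = [], []
    for f in range(n_bands):
        samples = [v for s in stim_indices for v in stims[s][f]]
        mean.append(fmean(samples))
        std.append(max(pstdev(samples), eps))  # eps floors constant bands
    return mean, std


def apply_band_stats(stims, mean, std):
    """Standardize every stim, including those not used for the statistics."""
    return [
        [[(v - mean[f]) / std[f] for v in band] for f, band in enumerate(stim)]
        for stim in stims
    ]


stims = [
    [[1.0, 3.0], [2.0, 6.0]],                      # stim 0: F=2, T=2
    [[1.0, 3.0, 1.0, 3.0], [2.0, 6.0, 2.0, 6.0]],  # stim 1: F=2, T=4
    [[5.0], [8.0]],                                # stim 2: held out, T=1
]
mean, std = fit_band_stats(stims, stim_indices=[0, 1])
standardized = apply_band_stats(stims, mean, std)
```

The held-out stim is mapped with the train statistics only, so its values need not have zero mean or unit variance.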
Module contents
- class deepSTRF.datasets.NeuralDataset(path: str, dt_ms: float)
Bases: Dataset, ABC
General base class for datasets of sensory neural responses.