---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.16.4
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---

Null distributions to test spatial firing
============

Null distributions are ubiquitous in neuroscience. The brain can be incredibly noisy, and thus we need adequate statistical quantification to separate meaningful observations from random fluctuations. While statistics can be daunting, designing the right statistical test most often comes down to devising a good null distribution. A null distribution should reflect the expected behavior of the system under the assumption of no real effect, without introducing unnecessary assumptions or simplifications. We can then compare our observations to such a null distribution to decide whether there is an effect. If you want to learn the ins and outs of statistical analysis of neural data, we'd advise picking up a [book](https://link.springer.com/content/pdf/10.1007/978-1-4614-9602-1.pdf) about it!

In this tutorial, we will use Pynapple's randomization module to generate various null distributions to test whether the activity of neurons is modulated by position.

```{code-cell} ipython3
:tags: [remove-output]

# we'll import the packages we're going to use
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
import pynapple as nap
import seaborn as sns
from scipy.signal import correlate2d
from scipy.ndimage import (
    maximum_filter,
    label,
    center_of_mass,
    rotate,
    gaussian_filter,
)
import xarray as xr
from pathlib import Path
from pynwb import NWBHDF5IO
from dandi.dandiapi import DandiAPIClient
import fsspec
from fsspec.implementations.cached import CachingFileSystem
import h5py

# some configuration, you can ignore this
custom_params = {"axes.spines.right": False, "axes.spines.top": False}
sns.set_theme(style="ticks", palette="colorblind", font_scale=1.5, rc=custom_params)
xr.set_options(display_expand_attrs=False)
```
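Before we move to real recordings, it may help to see the shuffle-test logic on a toy example. The sketch below is entirely synthetic and is our own illustration (all names and numbers in it are made up, not part of the dataset): we generate a "firing" signal that depends on a binary covariate, destroy that dependence by shuffling, and read off an empirical p-value.

```{code-cell} ipython3
# a minimal, synthetic illustration of the shuffle-test logic
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)     # a made-up binary covariate
signal = rng.poisson(2.0 + 1.5 * labels)   # "firing" that depends on the covariate

# the observed statistic: difference in mean signal between the two labels
observed = signal[labels == 1].mean() - signal[labels == 0].mean()

# the null distribution: shuffle the labels to break the relation, recompute
null = np.array([
    signal[shuffled == 1].mean() - signal[shuffled == 0].mean()
    for shuffled in (rng.permutation(labels) for _ in range(1000))
])

# the empirical p-value: how often the null is at least as extreme
p = (null >= observed).mean()
print(f"observed = {observed:.2f}, p = {p:.3f}")
```

The rest of the tutorial follows exactly this recipe, with spike trains in place of `signal`, position in place of `labels`, and spatial information as the statistic.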
***
Downloading data from DANDI
---------------------------

We will start by downloading some data from DANDI. To do so, we need a dandiset ID and a file path. In this tutorial we will use data from [dandiset 000582](https://dandiarchive.org/dandiset/000582/draft), which was published in:

*Sargolini, Francesca, et al. "Conjunctive representation of position, direction, and velocity in entorhinal cortex." Science 312.5774 (2006): 758-762*

This dataset contains electrophysiology recordings from neurons in the entorhinal cortex, which are known to modulate their firing with position.

```{code-cell} ipython3
:tags: [remove-output]

dandiset = "000582"
session = "sub-10884/sub-10884_ses-03080402_behavior+ecephys.nwb"

with DandiAPIClient() as client:
    asset = client.get_dandiset(dandiset, "draft").get_asset_by_path(session)
    s3_url = asset.get_content_url(follow_redirects=1, strip_query=True)

fs = CachingFileSystem(
    fs=fsspec.filesystem("http"),
    cache_storage=str(Path("~/.caches/nwb-cache").expanduser()),
)
io = NWBHDF5IO(file=h5py.File(fs.open(s3_url, "rb")), load_namespaces=True)
nwb = nap.NWBFile(io.read(), lazy_loading=False)

units = nwb["units"]
position = nwb["SpatialSeriesLED1"]
```

***
Spatial tuning curves
---------------------

We can then use [`compute_tuning_curves`](pynapple.process.tuning_curves.compute_tuning_curves) to compute the tuning curve of each neuron's spikes with respect to the position of the animal:

```{code-cell} ipython3
tuning_curves = nap.compute_tuning_curves(
    data=units,
    features=position,
    feature_names=["X", "Y"],
    range=[(-50, 50), (-50, 50)],
    bins=40,
)

# optional smoothing of the tuning curves:
# smooth the rates and a binary validity mask separately, then renormalize,
# so that NaN (unvisited) bins do not drag the smoothed rates down
def gaussian_filter_nan(a, sigma):
    v = np.where(np.isnan(a), 0, a)
    w = gaussian_filter((~np.isnan(a)).astype(float), sigma)
    return gaussian_filter(v, sigma) / w

tuning_curves_smooth = tuning_curves.copy()
tuning_curves_smooth.values = gaussian_filter_nan(tuning_curves.values, sigma=(0, 2, 2))

# Setting some xarray.DataArray attributes, for beauty
tuning_curves_smooth.name = "firing rate"
tuning_curves_smooth.attrs["unit"] = "Hz"
tuning_curves_smooth.coords["X"].attrs["unit"] = "cm"
tuning_curves_smooth.coords["Y"].attrs["unit"] = "cm"

tuning_curves
```

Next, we can compute our metric of interest. In this case, we are interested in how the neurons' firing is modulated by position. A metric that is typically used to quantify this is the mutual information between firing rate and position, often called [spatial information](https://proceedings.neurips.cc/paper/1992/hash/5dd9db5e033da9c6fb5ba83c7a7ebea9-Abstract.html) in this specific case:

```{code-cell} ipython3
spatial_information = nap.compute_mutual_information(tuning_curves)
spatial_information
```
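To make the metric concrete, here is a hand-rolled version of the Skaggs et al. spatial-information score, $\mathrm{SI} = \sum_i p_i \frac{\lambda_i}{\bar{\lambda}} \log_2 \frac{\lambda_i}{\bar{\lambda}}$, where $p_i$ is the occupancy probability of spatial bin $i$, $\lambda_i$ the firing rate in that bin, and $\bar{\lambda} = \sum_i p_i \lambda_i$ the overall mean rate. This is our own sketch, not pynapple's internals: we recompute the occupancy with the same bins and range as above, so the values should approximately match those returned by `compute_mutual_information`.

```{code-cell} ipython3
# a hand-rolled Skaggs spatial-information score (our own sketch, not
# pynapple internals); occupancy is recomputed with the same binning as above
occupancy, _, _ = np.histogram2d(
    position.values[:, 0],
    position.values[:, 1],
    bins=40,
    range=[(-50, 50), (-50, 50)],
)
p_occ = occupancy / occupancy.sum()  # occupancy probability per spatial bin

si = []
for rates in tuning_curves.values:  # one (40, 40) rate map per unit
    visited = p_occ > 0
    r = np.nan_to_num(rates[visited])  # rates in visited bins (guard against NaN)
    p = p_occ[visited]
    mean_rate = np.sum(p * r)          # overall mean rate
    ratio = r / mean_rate
    nz = ratio > 0                     # 0 * log(0) is taken as 0
    si.append(np.sum(p[nz] * ratio[nz] * np.log2(ratio[nz])))

np.round(si, 2)  # bits/spike; should be close to the values above
```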
We can visualize all that together:

```{code-cell} ipython3
g = tuning_curves_smooth.plot(col="unit")
for ax, (unit_idx, unit_id) in zip(g.axs.flat, enumerate(units)):
    ax.set_title(f"unit={unit_id}\nSI={spatial_information['bits/spike'][unit_idx]:.2f}")
plt.show()
```

Some of these units show clear spatial firing (e.g. unit 3), with localized firing fields and relatively high spatial information values. For units like unit 0, however, we cannot rule out that the pattern we see is just caused by random activity.

***
Randomization
-------------

Next, we can start thinking about null distributions. Our ultimate goal is to be able to say that a neuron's activity is significantly modulated by position. To do that, we want to contrast it with the activity of a neuron that is not modulated by position at all. But how do we pick that neuron? We do not want to stray too far from the firing patterns of our neuron of interest, so we can't just pick something arbitrary. In practice, we often take the activity of our neuron of interest and shuffle it somehow, thereby breaking the relation with our variable of interest (position) while keeping its general firing statistics as much as possible. Pynapple provides four methods for such shuffling:

- [`jitter_timestamps`](pynapple.process.randomize.jitter_timestamps): jitters timestamps independently by random amounts uniformly drawn from a given range.
- [`resample_timestamps`](pynapple.process.randomize.resample_timestamps): uniformly resamples all timestamps within the time support.
- [`shift_timestamps`](pynapple.process.randomize.shift_timestamps): shifts all timestamps by a random amount drawn from a given range, wrapping the end of the time support to the beginning.
- [`shuffle_ts_intervals`](pynapple.process.randomize.shuffle_ts_intervals): randomizes timestamps by shuffling the intervals between them.

Let's apply them to an example neuron:

```{code-cell} ipython3
jitter = nap.jitter_timestamps(units[3], max_jitter=2.0)
resample = nap.resample_timestamps(units[3])
shift = nap.shift_timestamps(units[3], min_shift=10.0)
interval_shuffle = nap.shuffle_ts_intervals(units[3])
```

and plot them to look at their effect on spatial firing:

```{code-cell} ipython3
fig, axs = plt.subplots(3, 5, figsize=(10, 6))
for (ax_spikes, ax_tc, ax_corr), (randomization_type, data) in zip(
    axs.T,
    [
        ("true", units[3]),
        ("jitter", jitter),
        ("resample", resample),
        ("shift", shift),
        ("interval_shuffle", interval_shuffle),
    ],
):
    # Plot the trajectory and spike positions as (Y, X), so that the panel
    # matches the imshow orientation of the tuning curves below
    ax_spikes.plot(position[:, 1], position[:, 0], color="black", linewidth=0.5, zorder=-1)
    positional_spikes = data.value_from(position)
    ax_spikes.scatter(
        positional_spikes[:, 1], positional_spikes[:, 0], color="red", s=7, edgecolor="none", zorder=1
    )
    ax_spikes.yaxis.set_inverted(True)
    ax_spikes.axis("off")
    ax_spikes.set_title(randomization_type.replace("_", "\n"))

    # Plot tuning curve
    tuning_curve = nap.compute_tuning_curves(
        data, features=position, feature_names=["X", "Y"], range=[(-50, 50), (-50, 50)], bins=40
    )
    tuning_curve.values = gaussian_filter_nan(tuning_curve.values, sigma=(0, 2, 2))
    ax_tc.imshow(tuning_curve.values[0], cmap="viridis", extent=(-50, 50, -50, 50))
    ax_tc.axis("off")

    # Plot 2D autocorrelation
    ax_corr.imshow(correlate2d(tuning_curve.values[0], tuning_curve.values[0]), cmap="viridis")
    ax_corr.axis("off")

fig.text(0.09, 0.44, "tuning\ncurve", rotation=90, ha="center")
fig.text(0.09, 0.12, "auto\ncorrelation", rotation=90, ha="center")
plt.show()
```

We also visualize the [two-dimensional autocorrelation](https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.correlate2d.html) of the tuning curves here, to highlight which patterns each method breaks or preserves. Each of these methods has a different effect:

- **Jittering**: jitters each spike in time, which in turn jitters its position. We set the maximal jitter to 2 seconds, which likely matches the timescale of the movements of the animal. The tuning curve and autocorrelation have not changed much, meaning we have not broken the relation between spiking and behavior. Jittering is thus probably not a good randomization method in this case, unless we increase the maximal jitter.
- **Resampling**: gives us a uniform spread over position; the relation between spiking and behavior is gone entirely.
- **Shifting**: has the advantage of keeping the temporal relations between spikes, while only breaking the relation with position (see the sketch below).
- **Shuffling interspike intervals**: looks similar to shifting; it keeps the overall interspike interval distribution.
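For intuition, a circular shift can be written in a few lines of numpy. This is a rough sketch of the idea rather than pynapple's exact implementation (we assume a single epoch in the time support; `offset` and `manual_shift` are our own names):

```{code-cell} ipython3
# a rough sketch of a circular shift (not pynapple's exact implementation):
# every spike moves by the same random offset, and spikes pushed past the
# end of the recording wrap around to the beginning
start, end = units.time_support.start[0], units.time_support.end[0]
duration = end - start

offset = np.random.uniform(10.0, duration - 10.0)  # comparable to min_shift=10.0
manual_shift = nap.Ts(np.sort((units[3].t - start + offset) % duration + start))
manual_shift
```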
***
Null distributions
------------------

We will opt for random shifts. For each neuron, we generate `N` pseudo-neurons and compute the mutual/spatial information for each of them. The resulting distributions of values are what we call the null distributions.

```{code-cell} ipython3
N = 500
shuffles = [nap.shift_timestamps(units, min_shift=20.0) for _ in range(N)]
null_distributions = np.stack(
    [
        nap.compute_mutual_information(
            nap.compute_tuning_curves(
                data=shuffle,
                features=position,
                range=[(-50, 50), (-50, 50)],
                bins=40,
            )
        )["bits/spike"].values
        for shuffle in shuffles
    ],
    axis=1,
)
null_distributions.shape
```

***
Testing
------------------

Finally, we can apply the test. From a statistical viewpoint, the null distribution is an empirical distribution of the spatial information values that would be expected if the neuron's firing were unrelated to position. We want to determine how unlikely it is that the observed spatial information arose from this null distribution. Given a significance level, α, we assess whether the observed value lies in the extreme tail of the null distribution. If the probability (the p-value) of obtaining a spatial information value **as large as or larger than** the observed one is less than α, we reject the null hypothesis:

```{code-cell} ipython3
alpha = 0.01  # significance level
p_values = np.sum(
    null_distributions.T >= spatial_information['bits/spike'].values, axis=0
) / null_distributions.shape[1]
# units whose firing is significantly modulated by position
spatial_units = tuning_curves.coords["unit"][p_values < alpha]
```

We can also compute the exact spatial information values corresponding to our alpha, i.e. the null distribution percentiles beyond which we call an observed value significant:

```{code-cell} ipython3
thresholds = np.nanpercentile(null_distributions, (1 - alpha) * 100, axis=1)
```

Finally, we can visualize everything together:

```{code-cell} ipython3
g = tuning_curves_smooth.plot(col="unit")
for ax, (unit_idx, unit_id) in zip(g.axs.flat, enumerate(units)):
    null = null_distributions[unit_idx]
    score = spatial_information['bits/spike'][unit_idx]
    pval = p_values[unit_idx]
    ax.set_title(f"unit={unit_id}\nSI={score:.2f}\np={pval:.6f}{'*' if score > thresholds[unit_idx] else ''}")
    ax_hist = inset_axes(ax, width="100%", height="40%", loc="lower center",
                         bbox_to_anchor=(0, -1, 1, 1), bbox_transform=ax.transAxes, borderpad=0)
    ax_hist.hist(null, histtype="stepfilled", edgecolor="none", bins=30)
    ax_hist.axvline(thresholds[unit_idx], color="black")
    ax_hist.axvline(score, color="red", linestyle="--")
    ax_hist.yaxis.set_visible(False)
    ax_hist.spines['left'].set_visible(False)
    if unit_idx == 0:
        ax_hist.set_xlabel("spatial\ninformation\n(bits/spike)")
plt.show()
```

We can see that all our units are significantly modulated by position, some more so than others. Visualizing the null distributions (blue histograms), the percentile thresholds (black lines), and the observed values (dashed red lines) gives us more nuance than the binary test outcome alone. We now also have approximate p-values to report; note that with `N` shuffles the smallest non-zero empirical p-value we can resolve is 1/`N` (here 1/500 = 0.002), so a p-value of exactly 0 is best reported as p < 1/`N`.

:::{card} Authors
^^^
Wolf De Wulf
:::