Introduction to pynapple#
The goal of this tutorial is to quickly learn enough about pynapple to get started with data analysis. This tutorial assumes familiarity with the basics functionalities of numpy.
You can check how to install pynapple here.
Important
By default, pynapple will assume a time units in seconds when passing timestamps array or time parameters such as bin size (unless specified with the time_units
argument)
Importing pynapple#
The convention is to import pynapple with a namespace:
import pynapple as nap
Show code cell content
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
custom_params = {"axes.spines.right": False, "axes.spines.top": False}
sns.set_theme(style="ticks", palette="colorblind", font_scale=1.5, rc=custom_params)
Instantiating pynapple objects#
nap.Tsd
: 1-dimensional time serie#
If you have a 1-dimensional time series, you use the nap.Tsd
object. The arguments t
and d
are the arguments for timestamps and data.
tsd = nap.Tsd(t=np.arange(100), d=np.random.rand(100))
print(tsd)
Time (s)
---------- ----------
0.0 0.710534
1.0 0.451926
2.0 0.42989
3.0 0.136094
4.0 0.455707
5.0 0.00984743
6.0 0.349025
...
93.0 0.242343
94.0 0.747614
95.0 0.305528
96.0 0.876093
97.0 0.662669
98.0 0.934741
99.0 0.676959
dtype: float64, shape: (100,)
nap.TsdFrame
: 2-dimensional time series#
If you have a 2-dimensional time series, you use the nap.TsdFrame
object. The arguments t
and d
are the arguments for timestamps and data. You can add the argument columns
to label each columns.
tsdframe = nap.TsdFrame(
t=np.arange(100), d=np.random.rand(100, 3), columns=["a", "b", "c"]
)
print(tsdframe)
Time (s) a b c
---------- ------- ------- -------
0.0 0.38741 0.40025 0.39976
1.0 0.2831 0.12855 0.77292
2.0 0.41067 0.02348 0.3287
3.0 0.44481 0.63053 0.0731
4.0 0.98673 0.69324 0.19789
5.0 0.14002 0.00886 0.21845
6.0 0.99148 0.0529 0.29856
...
93.0 0.38078 0.28268 0.5801
94.0 0.27425 0.48065 0.70444
95.0 0.71903 0.17434 0.26413
96.0 0.92325 0.03991 0.87744
97.0 0.32048 0.22797 0.89867
98.0 0.73399 0.051 0.8617
99.0 0.56563 0.28941 0.5132
dtype: float64, shape: (100, 3)
nap.TsdTensor
: n-dimensionals time series#
If you have larger than 2 dimensions time series (typically movies), you use the nap.TsdTensor
object . The arguments t
and d
are the arguments for timestamps and data.
tsdtensor = nap.TsdTensor(
t=np.arange(100), d=np.random.rand(100, 3, 4)
)
print(tsdtensor)
Time (s)
---------- -----------------------------
0.0 [[0.234497 ... 0.575374] ...]
1.0 [[0.724289 ... 0.384916] ...]
2.0 [[0.036109 ... 0.124581] ...]
3.0 [[0.705208 ... 0.385647] ...]
4.0 [[0.893765 ... 0.720703] ...]
5.0 [[0.118928 ... 0.230826] ...]
6.0 [[0.339164 ... 0.141345] ...]
...
93.0 [[0.225432 ... 0.009658] ...]
94.0 [[0.995422 ... 0.474253] ...]
95.0 [[0.354843 ... 0.372272] ...]
96.0 [[0.673888 ... 0.396115] ...]
97.0 [[0.159119 ... 0.565946] ...]
98.0 [[0.129401 ... 0.780345] ...]
99.0 [[0.796631 ... 0.246424] ...]
dtype: float64, shape: (100, 3, 4)
nap.IntervalSet
: intervals#
The IntervalSet
object stores multiple epochs with a common time unit in a table format. The epochs are strictly non-overlapping. Both start
and end
arguments are necessary.
epochs = nap.IntervalSet(start=[0, 10], end=[5, 15])
print(epochs)
start end
0 0 5
1 10 15
shape: (2, 2), time unit: sec.
nap.Ts
: timestamps#
Ts
object stores timestamps data (typically spike times or reward times). The argument t
for timestamps is necessary.
ts = nap.Ts(t=np.sort(np.random.uniform(0, 100, 10)))
print(ts)
Time (s)
1.382846402
12.868862127
15.532096866
21.06465528
23.430290334
42.604321878
64.699475227
80.079956014
98.109809161
99.728955778
shape: 10
nap.TsGroup
: group of timestamps#
TsGroup
is a dictionnary that stores multiple time series with different time stamps (.i.e. a group of neurons with different spike times from one session). The first argument data
can be a dictionnary of Ts
, Tsd
or numpy 1d array.
data = {
0: nap.Ts(t=np.sort(np.random.uniform(0, 100, 1000))),
1: nap.Ts(t=np.sort(np.random.uniform(0, 100, 2000))),
2: nap.Ts(t=np.sort(np.random.uniform(0, 100, 3000))),
}
tsgroup = nap.TsGroup(data)
print(tsgroup, "\n")
Index rate
------- -------
0 10.0091
1 20.0182
2 30.0273
Interaction between pynapple objects#
Time support : attribute of time series#
A key feature of how pynapple manipulates time series is an inherent time support object defined for Ts
, Tsd
, TsdFrame
and TsGroup
objects. The time support object is defined as an IntervalSet
that provides the time serie with a context. For example, the restrict operation will automatically update the time support object for the new time series. Ideally, the time support object should be defined for all time series when instantiating them. If no time series is given, the time support is inferred from the start and end of the time series.
In this example, a Tsd
is instantiated with and without a time support of intervals 0 to 5 seconds. Notice how the shape of the Tsd
varies.
time_support = nap.IntervalSet(start=0, end=2)
print(time_support)
start end
0 0 2
shape: (1, 2), time unit: sec.
Without time support :
print(nap.Tsd(t=[0, 1, 2, 3, 4], d=[0, 1, 2, 3, 4]))
Time (s)
---------- --
0 0
1 1
2 2
3 3
4 4
dtype: int64, shape: (5,)
With time support :
print(
nap.Tsd(
t=[0, 1, 2, 3, 4], d=[0, 1, 2, 3, 4],
time_support = time_support
)
)
Time (s)
---------- --
0 0
1 1
2 2
dtype: int64, shape: (3,)
The time support object has become an attribute of the time series. Depending on the operation applied to the time series, it will be updated.
tsd = nap.Tsd(
t=np.arange(10), d=np.random.randn(10),
time_support = time_support
)
print(tsd.time_support)
start end
0 0 2
shape: (1, 2), time unit: sec.
Restricting time series to epochs#
The central function of pynapple is the restrict
method of Ts
, Tsd
, TsdFrame
and TsGroup
. The argument is an IntervalSet
object. Only time points within the intervals of the IntervalSet
object are returned. The time support of the time series is updated accordingly.
tsd = nap.Tsd(t=np.arange(10), d=np.random.randn(10))
ep = nap.IntervalSet(start=[0, 7], end=[3.5, 12.4])
print(ep)
start end
0 0 3.5
1 7 12.4
shape: (2, 2), time unit: sec.
From :
print(tsd)
Time (s)
---------- ----------
0 1.46394
1 -2.80092
2 0.541031
3 -1.72587
4 0.164169
5 0.330627
6 0.0936461
7 -2.11306
8 1.43268
9 -1.13819
dtype: float64, shape: (10,)
to :
new_tsd = tsd.restrict(ep)
print(new_tsd)
Time (s)
---------- ---------
0 1.46394
1 -2.80092
2 0.541031
3 -1.72587
7 -2.11306
8 1.43268
9 -1.13819
dtype: float64, shape: (7,)
Numpy & pynapple#
Pynapple relies on numpy to store the data. Pynapple objects behave very similarly to numpy and numpy functions can be applied directly
tsdtensor = nap.TsdTensor(t=np.arange(100), d=np.random.rand(100, 3, 4))
If a numpy function preserves the time axis, a pynapple object is returned.
In this example, averaging a TsdTensor
along the second dimension returns a TsdFrame
:
print(
np.mean(tsdtensor, 1)
)
Time (s) 0 1 2 3
---------- ------- ------- ------- -------
0.0 0.33013 0.4627 0.73006 0.61229
1.0 0.59123 0.32815 0.49654 0.57221
2.0 0.29982 0.51111 0.48429 0.56353
3.0 0.40017 0.45744 0.80144 0.32528
4.0 0.36662 0.53841 0.70089 0.53777
5.0 0.62258 0.26711 0.26807 0.59745
6.0 0.61172 0.23249 0.3372 0.30009
...
93.0 0.81727 0.41856 0.65856 0.6714
94.0 0.77032 0.24557 0.41969 0.34449
95.0 0.67379 0.43365 0.2186 0.29866
96.0 0.42773 0.80362 0.47628 0.66186
97.0 0.54201 0.27513 0.52487 0.59144
98.0 0.48353 0.35594 0.53852 0.39497
99.0 0.73219 0.59482 0.76217 0.57961
dtype: float64, shape: (100, 4)
Averaging along the time axis will return a numpy array object:
print(
np.mean(tsdtensor, 0)
)
[[0.52766686 0.48547596 0.50788029 0.48104031]
[0.47919194 0.49809763 0.42413034 0.53757321]
[0.47441644 0.52088139 0.49983902 0.48014173]]
Slicing objects#
Slicing time series and intervals#
Ts
, Tsd
, TsdFrame
, TsdTensor
and IntervalSet
can be sliced similar to numpy array:
tsdframe = nap.TsdFrame(t=np.arange(10)/10, d=np.random.randn(10,4))
print(tsdframe)
Time (s) 0 1 2 3
---------- -------- -------- -------- --------
0 -0.39591 -0.60141 -0.86688 2.5233
0.1 0.25785 -1.33609 0.47324 0.5882
0.2 -0.85842 -0.71523 -0.38574 0.39879
0.3 0.38997 2.03509 0.50608 0.26818
0.4 -0.15638 -1.30329 -0.42698 -0.79085
0.5 2.22772 -1.08029 1.22277 -1.35992
0.6 1.34214 -1.34964 0.50856 0.55342
0.7 0.54623 -1.07726 1.51729 -1.49546
0.8 1.89048 -0.00772 0.30488 -0.57731
0.9 1.51653 -0.51721 -0.95661 0.92173
dtype: float64, shape: (10, 4)
print(tsdframe[4:7])
Time (s) 0 1 2 3
---------- -------- -------- -------- --------
0.4 -0.15638 -1.30329 -0.42698 -0.79085
0.5 2.22772 -1.08029 1.22277 -1.35992
0.6 1.34214 -1.34964 0.50856 0.55342
dtype: float64, shape: (3, 4)
print(tsdframe[:,0])
Time (s)
---------- ---------
0 -0.395906
0.1 0.257853
0.2 -0.858416
0.3 0.389966
0.4 -0.156377
0.5 2.22772
0.6 1.34214
0.7 0.546231
0.8 1.89048
0.9 1.51653
dtype: float64, shape: (10,)
ep = nap.IntervalSet(start=[0, 10, 20], end=[4, 15, 25])
print(ep)
start end
0 0 4
1 10 15
2 20 25
shape: (3, 2), time unit: sec.
print(ep[0:2])
start end
0 0 4
1 10 15
shape: (2, 2), time unit: sec.
print(ep[1])
start end
0 10 15
shape: (1, 2), time unit: sec.
Slicing TsGroup#
TsGroup
object can be indexed to return directly the timestamp object or sliced to return a new TsGroup
.
Indexing :
print(tsgroup[0], "\n")
Time (s)
0.033461938
0.035438823
0.159498739
0.306561755
0.439499861
0.622941105
0.647758662
...
99.147117633
99.158925088
99.252197591
99.59365335
99.750737203
99.788755465
99.833181822
shape: 1000
Slicing :
print(tsgroup[[0, 2]])
Index rate
------- -------
0 10.0091
2 30.0273
Core functions#
Objects have methods that can help transform and refine time series. This is a non exhaustive list.
Binning : counting events#
Time series objects have the count
method that count the number of timestamps. This is typically used when counting number of spikes within a particular bin over multiple intervals. The returned object is a Tsd
or TsdFrame
with the timestamps being the center of the bins.
count = tsgroup.count(1)
print(count)
Time (s) 0 1 2
------------ --- --- ---
0.504847274 9 23 29
1.504847274 12 23 24
2.504847274 13 16 42
3.504847274 10 19 22
4.504847274 10 15 26
5.504847274 12 21 28
6.504847274 8 21 28
...
93.504847274 14 21 26
94.504847274 8 21 29
95.504847274 8 18 29
96.504847274 15 15 29
97.504847274 13 22 26
98.504847274 8 16 23
99.504847274 8 15 33
dtype: float64, shape: (100, 3)
Thresholding#
Some time series have specific methods. The threshold
method of Tsd
returns a new Tsd
with all the data above or below a given value.
tsd = nap.Tsd(t=np.arange(10), d=np.random.rand(10))
print(tsd)
print(tsd.threshold(0.5))
Time (s)
---------- ---------
0 0.0278595
1 0.96828
2 0.731412
3 0.0266841
4 0.54859
5 0.452376
6 0.559601
7 0.965966
8 0.355655
9 0.552187
dtype: float64, shape: (10,)
Time (s)
---------- --------
1 0.96828
2 0.731412
4 0.54859
6 0.559601
7 0.965966
9 0.552187
dtype: float64, shape: (6,)
An important aspect of the tresholding is that the time support get updated based on the time points remaining. To get the epochs above/below a certain threshold, you can access the time support of the returned object.
print(tsd.time_support)
print(tsd.threshold(0.5, "below").time_support)
start end
0 0 9
shape: (1, 2), time unit: sec.
start end
0 0 0.5
1 2.5 3.5
2 4.5 5.5
3 7.5 8.5
shape: (4, 2), time unit: sec.
Time-bin averaging of data#
Many analyses requires to bring time series to the same rates and same dimensions. A quick way to downsample a time series to match in size for example a count array is to bin average. The bin_average
method takes a bin size in unit of time.
tsdframe = nap.TsdFrame(t=np.arange(0, 100)/10, d=np.random.randn(100,3))
print(tsdframe)
Time (s) 0 1 2
---------- -------- -------- --------
0.0 0.43367 1.38706 0.21329
0.1 1.79608 0.25768 -0.04883
0.2 1.6077 2.68275 -0.38209
0.3 -0.76207 2.27212 1.21925
0.4 0.262 0.8983 2.24452
0.5 0.99237 -1.0853 1.41603
0.6 -1.09002 -1.6514 0.31946
...
9.3 1.00751 -0.6393 -0.95774
9.4 0.52662 -0.56586 -0.03753
9.5 -0.78718 -2.43421 0.74212
9.6 0.04621 -0.00157 0.74224
9.7 -0.2099 -0.45933 -1.06439
9.8 0.72455 0.36496 -1.41628
9.9 -1.35717 -0.86257 -0.36837
dtype: float64, shape: (100, 3)
Here we go from a timepoint every 100ms to a timepoint every second.
print(tsdframe.bin_average(1))
Time (s) 0 1 2
---------- -------- -------- --------
0.5 0.23526 0.4733 0.40567
1.5 0.16819 -0.15535 -0.68163
2.5 0.1132 0.1296 0.45998
3.5 -0.09965 -0.01567 -0.01143
4.5 -0.11681 -0.3322 0.03363
5.5 0.27711 -0.25568 0.12754
6.5 -0.29875 -0.00513 -0.04388
7.5 0.00548 -0.3909 -0.82779
8.5 -0.05084 -0.27196 0.39056
9.5 -0.35515 -0.56115 -0.36527
dtype: float64, shape: (10, 3)
Loading data#
See here for more details about loading data.
Loading NWB#
Pynapple supports by default the NWB standard.
NWB files can be loaded with :
nwb = nap.load_file("path/to/my.nwb")
or directly with the NWBFile
class:
nwb = nap.NWBFile("path/to/my.nwb")
print(nwb)
my.nwb
┍━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━┑
│ Keys │ Type │
┝━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━┥
│ units │ TsGroup │
┕━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━┙
The returned object behaves like a dictionnary. The first column indicates the keys. The second column indicate the object type.
print(nwb['units'])
Index rate location group
------- ------ ---------- -------
0 1.0 brain 0
1 1.0 brain 0
2 1.0 brain 0
Overview of advanced analysis#
The process
module of pynapple contains submodules that group methods that can be applied for high level analysis. All of the method are directly available from the nap
namespace.
Important
Some functions have been doubled given the nature of the data. For instance, computing a 1d tuning curves from spiking activity requires the nap.compute_1d_tuning_curves
. The same function for calcium imaging data which is a continuous time series is available with nap.compute_1d_tuning_curves_continuous
.
This module computes correlograms of discrete events, for example the cross-correlograms of a population of neurons.
The decoding module perfoms bayesian decoding given a set of tuning curves and a TsGroup
.
Bandpass, lowpass, highpass or bandstop filtering can be done to any time series using either Butterworth filter or windowed-sinc convolution.
The perievent module has a set of functions to center time series and timestamps data around a particular events.
The randomize module holds multiple technique to shuffle timestamps in order to create surrogate datasets.
The spectrum module contains the methods to return the (mean) power spectral density of a time series.
Tuning curves of neurons based on spiking or calcium activity can be computed.
The wavelets module performs Morlet wavelets decomposition of a time series.