NWB & Lazy-loading
Pynapple currently provides loaders for two data formats:
- npz with a special structure. You can check this notebook for a description of the methods for saving/loading npz files.
- NWB
This notebook focuses on the NWB format. Additionally it demonstrates the capabilities of pynapple for lazy-loading different formats.
Let's import libraries.
import numpy as np
import pynapple as nap
import os
import requests, math
import tqdm
import zipfile
Here we download the data.
project_path = "MyProject"
if project_path not in os.listdir("."):
    r = requests.get("https://osf.io/a9n6r/download", stream=True)
    block_size = 1024*1024
    with open(project_path+".zip", 'wb') as f:
        # true division so that math.ceil rounds the chunk count up
        for data in tqdm.tqdm(r.iter_content(block_size), unit='MB', unit_scale=True,
            total=math.ceil(int(r.headers.get('content-length', 0))/block_size)):
            f.write(data)
    with zipfile.ZipFile(project_path+".zip", 'r') as zip_ref:
        zip_ref.extractall(".")
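The `total` argument of the progress bar is the number of chunks the download is expected to yield, that is, the content length divided by the block size, rounded up. A minimal sketch of that arithmetic, using a hypothetical helper `n_blocks` that is not part of pynapple or of the original script:

```python
import math


def n_blocks(content_length: int, block_size: int = 1024 * 1024) -> int:
    """Number of chunks of `block_size` bytes needed to cover `content_length` bytes."""
    # True division followed by ceil rounds a partial last chunk up to a full one.
    return math.ceil(content_length / block_size)


print(n_blocks(3 * 1024 * 1024))      # a whole number of chunks
print(n_blocks(3 * 1024 * 1024 + 1))  # one extra byte requires one more chunk
```

Using floor division (`//`) before `math.ceil` would drop the partial last chunk, so the progress bar would end one step short.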
NWB
When loading an NWB file, pynapple will walk through it and test the compatibility of each data structure with a pynapple object. If a data structure is incompatible, pynapple will ignore it. The class that deals with reading NWB files is nap.NWBFile. You can pass the path to an NWB file or directly an opened NWB file. Alternatively, you can use the function nap.load_file.
Note
Creating the NWB file is outside the scope of pynapple. The NWB file used here was created beforehand. Multiple tools exist to create NWB files automatically. You can check neuroconv, NWBGuide or even NWBmatic.
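The loading step described above can be sketched as follows. The path is the one used later in this notebook and assumes the downloaded archive has been extracted into the working directory. The `os.path.exists` guard is an addition for this sketch, so that the snippet does nothing when the file is absent.

```python
import os

path = "MyProject/sub-A2929/A2929-200711/pynapplenwb/A2929-200711.nwb"

if os.path.exists(path):  # guard added for this sketch
    # Imported here so the sketch is a no-op without the dataset.
    import pynapple as nap

    data = nap.load_file(path)  # nap.NWBFile(path) is the class-based alternative
    print(data)
```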
Out:
/mnt/home/gviejo/mambaforge/envs/pynapple/lib/python3.11/site-packages/hdmf/utils.py:668: UserWarning: Ignoring cached namespace 'core' version 2.6.0-alpha because version 2.7.0 is already loaded.
return func(args[0], **pargs)
A2929-200711
┍━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━┑
│ Keys │ Type │
┝━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━┥
│ units │ TsGroup │
│ position_time_support │ IntervalSet │
│ epochs │ IntervalSet │
│ z │ Tsd │
│ y │ Tsd │
│ x │ Tsd │
│ rz │ Tsd │
│ ry │ Tsd │
│ rx │ Tsd │
┕━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━┙
Pynapple will give you a table with all the entries of the NWB file that are compatible with a pynapple object.
When parsing the NWB file, nothing is loaded. The NWBFile keeps track of the position of the data within the NWB file with a key. You can see it with the attribute key_to_id.
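A short sketch of how this mapping might be inspected, under the same assumptions as the rest of the notebook (file path taken from this tutorial, existence guard added here):

```python
import os

path = "MyProject/sub-A2929/A2929-200711/pynapplenwb/A2929-200711.nwb"

if os.path.exists(path):  # guard added for this sketch
    # Imported here so the sketch is a no-op without the dataset.
    import pynapple as nap

    data = nap.NWBFile(path)
    # Each key of the table is mapped to an identifier inside the NWB file.
    print(data.key_to_id)
```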
Out:
{'units': '70deb090-70e9-4c1d-ac9c-55abca5e41eb', 'position_time_support': '86c8ad5d-bcbd-48dc-a78b-c6004d7fd027', 'epochs': 'a4a46a1d-34b9-4436-9be1-cfb06f95fffe', 'z': 'a5762b8b-f500-491d-842f-2fe48f4239dd', 'y': 'a2a21e58-580a-41ef-82a3-62ee1661c72b', 'x': '7df5a354-2c36-4ee6-8562-5cbefe2ea7fd', 'rz': '0eff6663-f138-4537-af3e-4f1b1dcf0c4d', 'ry': 'dad1c314-4cee-4f15-9adf-aa447404aaca', 'rx': 'a57cb94a-0f84-4334-a23d-8f744014e3ab'}
Loading an entry will get pynapple to read the data.
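As a sketch, an entry is retrieved by indexing the NWB object with one of the keys listed in the table. The key `'z'` is taken from that table; the existence guard is an addition for this example.

```python
import os

path = "MyProject/sub-A2929/A2929-200711/pynapplenwb/A2929-200711.nwb"

if os.path.exists(path):  # guard added for this sketch
    # Imported here so the sketch is a no-op without the dataset.
    import pynapple as nap

    data = nap.NWBFile(path)
    z = data["z"]  # build the pynapple object for the 'z' entry
    print(z)
```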
Out:
Time (s)
---------- ---------
670.6407 -0.195725
670.649 -0.19511
670.65735 -0.194674
670.66565 -0.194342
...
1199.96995 -0.00237
1199.97825 -0.003156
1199.9866 -0.003821
1199.99495 -0.004435
dtype: float32, shape: (63527,)
Internally, the NWBFile object has replaced the pointer to the data with the actual data.
While it looks like pynapple has loaded the data, it has in fact not done so yet. By default, calling the NWB object returns data backed by an HDF5 dataset.
Warning
New in 0.6.6
Out:
Notice that the time array is always loaded.
Out:
This is very useful for large datasets that do not fit in memory. You can then get a chunk of the data, and only that chunk will actually be loaded.
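One way to request such a chunk is the `get` method of the time series, sketched here. The 670 to 680 s window is an assumption inferred from the timestamps of the output shown in this notebook, and the existence guard is an addition for this example.

```python
import os

path = "MyProject/sub-A2929/A2929-200711/pynapplenwb/A2929-200711.nwb"

if os.path.exists(path):  # guard added for this sketch
    # Imported here so the sketch is a no-op without the dataset.
    import pynapple as nap

    data = nap.NWBFile(path)
    z = data["z"]
    z_chunk = z.get(670, 680)  # restrict to the window between 670 s and 680 s
    print(z_chunk)
```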
Out:
Time (s)
---------- ---------
670.6407 -0.195725
670.649 -0.19511
670.65735 -0.194674
670.66565 -0.194342
...
679.9735 0.062756
679.98185 0.06277
679.99015 0.062819
679.9985 0.062878
dtype: float32, shape: (1124,)
Data are now loaded.
Out:
You can still apply any high-level function of pynapple. For example, here we compute some tuning curves without preloading the dataset.
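A possible version of that computation is sketched here with `nap.compute_1d_tuning_curves`. The choice of `'y'` as the feature and of 10 bins is an assumption (the 10 rows of the output are consistent with 10 bins), not something stated in the text. The existence guard is again an addition.

```python
import os

path = "MyProject/sub-A2929/A2929-200711/pynapplenwb/A2929-200711.nwb"

if os.path.exists(path):  # guard added for this sketch
    # Imported here so the sketch is a no-op without the dataset.
    import pynapple as nap

    data = nap.NWBFile(path)
    # Firing rate of each unit as a function of the feature, in 10 bins (assumed).
    tc = nap.compute_1d_tuning_curves(data["units"], data["y"], 10)
    print(tc)
```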
Out:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
0.012548 2.841894 13.038401 3.308084 11.673396 6.045551 16.059311 7.589572 1.327709 3.002263 0.902544 3.986856 4.169603 0.578076 0.783199 0.126804
0.056355 7.510898 3.452621 10.658762 3.425617 8.972957 6.619774 33.125875 1.489063 2.762097 0.192884 5.180861 5.713220 3.128576 0.806255 0.262322
0.100161 0.000000 0.000000 0.000000 0.000000 6.667138 50.003534 0.000000 3.333569 3.333569 0.000000 10.000707 0.000000 0.000000 6.667138 0.000000
0.143968 0.000000 0.000000 0.000000 0.000000 0.000000 48.003393 0.000000 18.001272 0.000000 0.000000 0.000000 6.000424 0.000000 12.000848 0.000000
0.187774 0.000000 0.000000 0.000000 7.500530 0.000000 30.002121 0.000000 0.000000 7.500530 0.000000 15.001060 7.500530 0.000000 0.000000 0.000000
0.231581 0.000000 7.500530 0.000000 30.002121 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 7.500530 0.000000 0.000000 0.000000 0.000000
0.275387 0.000000 75.005301 7.500530 60.004241 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 7.500530 0.000000 0.000000 0.000000 0.000000
0.319194 0.000000 97.506892 0.000000 45.003181 0.000000 0.000000 0.000000 0.000000 15.001060 0.000000 0.000000 7.500530 7.500530 0.000000 0.000000
0.363000 0.000000 100.807125 0.000000 38.402714 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 9.600679 33.602375 0.000000 0.000000 0.000000
0.406807 0.000000 31.581179 1.263247 29.054685 0.000000 16.422213 0.000000 0.000000 0.000000 2.526494 2.526494 7.579483 0.000000 0.000000 0.000000
Warning
Care is still needed when calling any pynapple function on a memory map. Pynapple does not implement any batching internally, so calling a high-level function on a dataset that does not fit in memory will likely cause a memory error.
To change this behavior, you can pass lazy_loading=False when instantiating the NWBFile class.
path = "MyProject/sub-A2929/A2929-200711/pynapplenwb/A2929-200711.nwb"
data = nap.NWBFile(path, lazy_loading=False)
z = data['z']
print(type(z.d))
Out:
/mnt/home/gviejo/mambaforge/envs/pynapple/lib/python3.11/site-packages/hdmf/utils.py:668: UserWarning: Ignoring cached namespace 'core' version 2.6.0-alpha because version 2.7.0 is already loaded.
return func(args[0], **pargs)
<class 'numpy.ndarray'>
Numpy memory map
In fact, pynapple can work with any type of memory map. Here we read a binary file with np.memmap.
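To try the idea without the dataset, here is a minimal sketch on a small synthetic file. The file name `toy.eeg`, the channel count and the sampling rate are made up for illustration; only `np.memmap` and standard-library calls are used.

```python
import os
import tempfile

import numpy as np

n_channels = 4    # hypothetical channel count
frequency = 1250  # Hz, hypothetical sampling rate

with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy.eeg")
    # Write 1000 samples x 4 channels of int16, one row per sample.
    original = np.arange(1000 * n_channels, dtype=np.int16).reshape(-1, n_channels)
    original.tofile(toy_path)

    # Recover the number of samples from the file size alone.
    bytes_size = np.dtype(np.int16).itemsize  # 2 bytes per value
    n_samples = os.path.getsize(toy_path) // (n_channels * bytes_size)

    # Map the file read-only; data are paged in only when the array is indexed.
    fp = np.memmap(toy_path, np.int16, "r", shape=(n_samples, n_channels))
    first_rows = np.array(fp[:3])  # copies only the requested rows into memory
    timestep = np.arange(n_samples) / frequency

    print(n_samples, fp.shape, type(fp).__name__)
    del fp  # release the mapping before the temporary directory is removed
```

The sample count follows from the file size because the file holds nothing but consecutive `int16` values, `n_channels` per sample.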
eeg_path = "MyProject/sub-A2929/A2929-200711/A2929-200711.eeg"
frequency = 1250 # Hz
n_channels = 16
with open(eeg_path, 'rb') as f:
    startoffile = f.seek(0, 0)
    endoffile = f.seek(0, 2)
bytes_size = 2
n_samples = int((endoffile-startoffile)/n_channels/bytes_size)
duration = n_samples/frequency
interval = 1/frequency
fp = np.memmap(eeg_path, np.int16, 'r', shape = (n_samples, n_channels))
timestep = np.arange(0, n_samples)/frequency
print(type(fp))
Out:
Instantiating a pynapple TsdFrame will keep the data as a memory map.
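The instantiation can be sketched as follows. The snippet recomputes the memory map so that it stands on its own; using `os.path.getsize` to obtain the file size and guarding on the file's existence are choices made for this sketch.

```python
import os

import numpy as np

eeg_path = "MyProject/sub-A2929/A2929-200711/A2929-200711.eeg"
frequency = 1250  # Hz
n_channels = 16

if os.path.exists(eeg_path):  # guard added for this sketch
    # Imported here so the sketch is a no-op without the dataset.
    import pynapple as nap

    bytes_size = np.dtype(np.int16).itemsize
    n_samples = os.path.getsize(eeg_path) // (n_channels * bytes_size)
    fp = np.memmap(eeg_path, np.int16, "r", shape=(n_samples, n_channels))
    timestep = np.arange(n_samples) / frequency

    # The memory map is passed as-is; no copy of the recording is requested here.
    eeg = nap.TsdFrame(t=timestep, d=fp)
    print(eeg)
```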
Out:
Time (s) 0 1 2 3 4 ...
---------- ---- ---- ---- ---- ---- -----
0.0 1003 836 1075 681 918 ...
0.0008 968 781 984 613 878 ...
0.0016 869 683 880 515 770 ...
0.0024 886 717 903 528 789 ...
...
1199.9928 -161 -175 -289 -183 -208 ...
1199.9936 -286 -320 -484 -323 -387 ...
1199.9944 -533 -448 -577 -403 -476 ...
1199.9952 -443 -334 -380 -266 -323 ...
dtype: int16, shape: (1499995, 16)
We can check the type of eeg.values.
Out:
Zarr
It is also possible to use higher-level libraries like zarr, although not directly.
import zarr
data = zarr.zeros((10000, 5), chunks=(1000, 5), dtype='i4')
timestep = np.arange(len(data))
tsdframe = nap.TsdFrame(t=timestep, d=data)
Out:
/mnt/home/gviejo/pynapple/pynapple/core/utils.py:196: UserWarning: Converting 'd' to numpy.array. The provided array was of type 'Array'.
warnings.warn(
As the warning suggests, data is converted to a numpy array.
Out:
To maintain a zarr array, you can set the argument load_array to False.
Out:
Within pynapple, numpy memory maps are recognized as numpy arrays while zarr arrays are not.
print(type(fp), "Is np.ndarray? ", isinstance(fp, np.ndarray))
print(type(data), "Is np.ndarray? ", isinstance(data, np.ndarray))
Out:
As with numpy memory maps, you can use pynapple functions directly.
Out:
Time (s) 0 1 2 3 4
---------- --- --- --- --- ---
0.0 0 0 0 0 0
1.0 0 0 0 0 0
2.0 0 0 0 0 0
3.0 0 0 0 0 0
...
7.0 0 0 0 0 0
8.0 0 0 0 0 0
9.0 0 0 0 0 0
10.0 0 0 0 0 0
dtype: int32, shape: (11, 5)
group = nap.TsGroup({0:nap.Ts(t=[10, 20, 30])})
sta = nap.compute_event_trigger_average(group, tsdframe, 1, (-2, 3))
print(type(tsdframe.values))
print("\n")
print(sta)
Out:
<class 'zarr.core.Array'>
Time (s)
---------- -----------------
-2 [[0. ... 0.] ...]
-1 [[0. ... 0.] ...]
0 [[0. ... 0.] ...]
1 [[0. ... 0.] ...]
2 [[0. ... 0.] ...]
3 [[0. ... 0.] ...]
dtype: float64, shape: (6, 1, 5)