Input-output & lazy-loading

Pynapple provides loaders for the NWB format.

Each pynapple object can be saved as an NPZ file with a specific structure and loaded back from it.

In addition, the Folder class helps you walk through a set of nested folders to load/save npz/nwb files.

NWB

When loading an NWB file, pynapple walks through it and tests the compatibility of each data structure with a pynapple object. If a data structure is incompatible, pynapple ignores it. The class that handles reading NWB files is nap.NWBFile. You can pass it either the path to an NWB file or an already opened NWB file. Alternatively, you can use the function nap.load_file.

Note

Creating the NWB file is outside the scope of pynapple. The NWB files used here were created beforehand. Multiple tools exist to create NWB files automatically, such as neuroconv, NWBGuide, or NWBmatic.

import numpy as np
import pynapple as nap

data = nap.load_file(nwb_path)  # nwb_path: path to an existing NWB file

print(data)

/home/runner/.local/lib/python3.12/site-packages/hdmf/spec/namespace.py:583: UserWarning: Ignoring the following cached namespace(s) because another version is already loaded:
core - cached version: 2.4.0, loaded version: 2.8.0
The loaded extension(s) may not be compatible with the cached extension(s) in the file. Please check the extension documentation and ignore this warning if these versions are compatible.
  self.warn_for_ignored_namespaces(ignored_namespaces)

A2929-200711
┍━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━┑
│ Keys                  │ Type        │
┝━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━┥
│ units                 │ TsGroup     │
│ position_time_support │ IntervalSet │
│ epochs                │ IntervalSet │
│ z                     │ Tsd         │
│ y                     │ Tsd         │
│ x                     │ Tsd         │
│ rz                    │ Tsd         │
│ ry                    │ Tsd         │
│ rx                    │ Tsd         │
┕━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━┙

Pynapple gives you a table of all the entries of the NWB file that are compatible with a pynapple object. When the NWB file is parsed, nothing is loaded. The NWBFile class keeps track of the position of the data within the NWB file with a key. You can see it with the attribute key_to_id.

data.key_to_id

{'units': 'de078912-a093-4e6b-b23c-8647885a7188',
 'position_time_support': '69ab7512-2d0a-45ba-af61-fae750c37f53',
 'epochs': '2cca3914-ede4-41b8-b500-502465cbed95',
 'z': 'e2142cf8-59d9-4637-b55d-796871c38e47',
 'y': 'c5a8bba7-2699-4450-aae1-6f5a05309828',
 'x': '2ba4ba5e-c6dc-4b9c-b69c-d0763240df46',
 'rz': '682f8edf-91cb-4c15-bdf2-5cc5327803f0',
 'ry': '57874425-e60e-4fe1-8bd3-7cb06d2adbfc',
 'rx': '48597919-cce7-4724-9f72-db6e164c3c3f'}
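The bookkeeping shown above, where keys map to identifiers and nothing is read until an entry is requested, can be pictured with a small toy container. This is a minimal sketch of the general pattern, not pynapple's implementation: the class LazyRegistry, the loader function, and the fake identifiers are all hypothetical.

```python
class LazyRegistry:
    """Toy container: maps keys to identifiers, loads an entry on first access."""

    def __init__(self, key_to_id, loader):
        self.key_to_id = dict(key_to_id)   # key -> identifier in the file
        self._loader = loader              # callable: identifier -> data
        self._cache = {}                   # entries that were actually read

    def keys(self):
        return list(self.key_to_id)

    def __getitem__(self, key):
        if key not in self._cache:
            # First access: resolve the identifier and read the entry.
            self._cache[key] = self._loader(self.key_to_id[key])
        return self._cache[key]


# Hypothetical "file": identifiers pointing at stored values.
fake_file = {"id-001": [0.1, 0.2, 0.3], "id-002": [1.0, 2.0]}
reads = []  # records which identifiers were read


def read_entry(identifier):
    reads.append(identifier)
    return fake_file[identifier]


registry = LazyRegistry({"z": "id-001", "y": "id-002"}, read_entry)
print(registry.keys())   # listing the keys reads nothing
print(reads)
z_entry = registry["z"]        # first access triggers the read
z_entry_again = registry["z"]  # second access is served from the cache
print(reads)
```

Listing the keys leaves the record of reads empty; only the first access to an entry touches the underlying storage.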

Loading an entry will get pynapple to read the data.

z = data['z']

print(data['z'])

Time (s)
----------  ---------
670.6407    -0.195725
670.649     -0.19511
670.65735   -0.194674
670.66565   -0.194342
670.674     -0.194059
670.68235   -0.193886
670.69065   -0.193676
...
1199.94495   0.000398
1199.95325  -0.000552
1199.9616   -0.001479
1199.96995  -0.00237
1199.97825  -0.003156
1199.9866   -0.003821
1199.99495  -0.004435
dtype: float64, shape: (63527,)

Internally, the NWBFile class has replaced the pointer to the data with the actual object.

Although it looks as if pynapple has loaded the data, it has not. By default, the values of an object returned by the NWBFile class are backed by an HDF5 dataset.

print(type(z.values))

<class 'h5py._hl.dataset.Dataset'>

Notice that the time array is always loaded.

print(type(z.index.values))

<class 'numpy.ndarray'>

This is very useful for large datasets that do not fit in memory. You can then get a chunk of the data, and only that chunk is actually loaded.

z_chunk = z.get(670, 680) # getting 10s of data.

print(z_chunk)

Time (s)
----------  ---------
670.6407    -0.195725
670.649     -0.19511
670.65735   -0.194674
670.66565   -0.194342
670.674     -0.194059
670.68235   -0.193886
670.69065   -0.193676
...
679.9485     0.062836
679.95685    0.062831
679.96515    0.062789
679.9735     0.062756
679.98185    0.06277
679.99015    0.062819
679.9985     0.062878
dtype: float64, shape: (1124,)

Data are now loaded.

print(type(z_chunk.values))

<class 'numpy.ndarray'>
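The pattern above (timestamps in memory, values on disk until a time window is requested) can be sketched with plain numpy. This is only an illustration under stated assumptions: a numpy.memmap stands in for the HDF5 dataset, and get_chunk is a hypothetical helper, not the pynapple get method.

```python
import os
import tempfile

import numpy as np

# Write a small array to disk so that it can be memory-mapped.
path = os.path.join(tempfile.mkdtemp(), "values.dat")
n = 1000
np.arange(n, dtype="float64").tofile(path)

# Timestamps live in memory; values stay on disk until they are indexed.
t = np.arange(n, dtype="float64") * 0.5          # one sample every 0.5 s
values = np.memmap(path, dtype="float64", mode="r", shape=(n,))


def get_chunk(t, values, start, end):
    """Return the timestamps and in-memory values with start <= t <= end."""
    i0 = np.searchsorted(t, start, side="left")
    i1 = np.searchsorted(t, end, side="right")
    # np.array copies the requested slice into an ordinary in-memory array.
    return t[i0:i1], np.array(values[i0:i1])


t_chunk, v_chunk = get_chunk(t, values, 10.0, 20.0)
print(type(values).__name__, type(v_chunk).__name__, v_chunk.shape)
```

Because the timestamps are already in memory, the window is located with a binary search and only the matching rows are read from disk.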

You can still apply any high-level function of pynapple. For example, here we compute tuning curves without preloading the dataset.

tc = nap.compute_1d_tuning_curves(data['units'], data['y'], 10)

Warning

Care is still needed when calling any pynapple function on a memory map. Pynapple does not implement any batching internally. Calling a high-level pynapple function on a dataset that does not fit in memory will likely cause a memory error.
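When a dataset is too large, one option is to do the batching by hand before handing results to pynapple. The sketch below computes a column-wise mean one block of rows at a time on a numpy.memmap; the file name, the shape, and the helper chunked_mean are assumptions made for the illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical on-disk dataset: 10,000 samples by 4 channels.
path = os.path.join(tempfile.mkdtemp(), "signal.dat")
shape = (10_000, 4)
rng = np.random.default_rng(0)
rng.standard_normal(shape).astype("float32").tofile(path)

signal = np.memmap(path, dtype="float32", mode="r", shape=shape)


def chunked_mean(array, chunk_size=1_000):
    """Column-wise mean computed one block of rows at a time."""
    total = np.zeros(array.shape[1], dtype="float64")
    for start in range(0, array.shape[0], chunk_size):
        # Only this block of rows is copied into memory.
        block = np.asarray(array[start:start + chunk_size], dtype="float64")
        total += block.sum(axis=0)
    return total / array.shape[0]


mean_by_chunks = chunked_mean(signal)
print(mean_by_chunks)
```

The same loop structure applies to any reduction that can be accumulated block by block, such as sums, counts, or histograms.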

To disable lazy loading, you can pass lazy_loading=False when instantiating the NWBFile class.

data = nap.NWBFile(nwb_path, lazy_loading=False)

z = data['z']

print(type(z.d))

<class 'numpy.ndarray'>

Saving as NPZ

Pynapple objects have save methods to save them as npz files.

tsd = nap.Tsd(t=np.arange(10), d=np.arange(10))
tsd.save("my_tsd.npz")

print(nap.load_file("my_tsd.npz"))

Time (s)
----------  --
0            0
1            1
2            2
3            3
4            4
5            5
6            6
7            7
8            8
9            9
dtype: int64, shape: (10,)

To be loaded by pynapple, an NPZ file must contain a particular set of keys.

print(np.load("my_tsd.npz"))

NpzFile 'my_tsd.npz' with keys: t, d, start, end, type
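To make the key layout concrete, the sketch below writes an NPZ file with the same key names using numpy alone and reads them back. The key names come from the listing above; what each key holds (time support bounds, a type name stored as a string) is an assumption, so a file written this way is not guaranteed to be accepted by the pynapple loader.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), "sketch_tsd.npz")

t = np.arange(10, dtype="float64")   # timestamps
d = np.arange(10)                    # values

# Key names mirror the listing above; the stored contents are assumptions.
np.savez(
    path,
    t=t,
    d=d,
    start=np.array([t[0]]),          # assumed: start of the time support
    end=np.array([t[-1]]),           # assumed: end of the time support
    type=np.array(["Tsd"]),          # assumed: name of the object type
)

with np.load(path) as npz:
    keys = sorted(npz.files)
    type_name = str(npz["type"][0])
print(keys, type_name)
```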

When a pynapple object has metadata, the metadata are added to the NPZ file.

tsgroup = nap.TsGroup({
    0:nap.Ts(t=[0,1,2]),
    1:nap.Ts(t=[0,1,2])
    }, metadata={"my_label":["a", "b"]})
tsgroup.save("group")

print(np.load("group.npz", allow_pickle=True))

NpzFile 'group.npz' with keys: type, _metadata, t, index, keys...

By default, they are added within the _metadata key:

print(dict(np.load("group.npz", allow_pickle=True))["_metadata"])

{'my_label': array(['a', 'b'], dtype='<U1')}

Memory map

Numpy memory map

Pynapple can work with numpy.memmap.

print(type(data))

<class 'numpy.memmap'>

Instantiating a pynapple TsdFrame will keep the data as a memory map.

eeg = nap.TsdFrame(t=timestep, d=data)

print(eeg)

Time (s)           0         1         2
----------  --------  --------  --------
         -0.18984   1.06459  -0.70633
         -1.33765  -0.08391   0.21352
          1.07157  -0.05164  -0.07782
         -0.50591   1.87033  -0.77583
         -0.22411   1.64142  -0.08435
          0.41989  -1.02416   1.44958
         -0.43207   0.64304   0.51506
         -1.58572   0.47507  -1.72942
          1.42899   0.19608   1.28719
         -1.12584   1.24789   2.22934
dtype: float32, shape: (10, 3)

We can check the type of eeg.values.

print(type(eeg.values))

<class 'numpy.memmap'>
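The cells in this subsection use data and timestep without showing how they were created. One plausible way to obtain them is sketched below; the file name and the random content are assumptions, while the shape and dtype match the printed output.

```python
import os
import tempfile

import numpy as np

# Hypothetical file name; shape and dtype match the output printed above.
path = os.path.join(tempfile.mkdtemp(), "memmap.dat")
data = np.memmap(path, dtype="float32", mode="w+", shape=(10, 3))

# Writing into the memory map stores the values in the file.
data[:] = np.random.default_rng(0).standard_normal((10, 3))
data.flush()

timestep = np.arange(10)             # assumed: one timestamp per row

print(type(data), data.shape, data.dtype)
```

With these two variables defined, the nap.TsdFrame(t=timestep, d=data) call shown earlier has everything it needs.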

Zarr

It is also possible to use higher-level libraries such as zarr, although not directly.

import zarr
zarr_array = zarr.zeros((10000, 5), chunks=(1000, 5), dtype='i4')
timestep = np.arange(len(zarr_array))

tsdframe = nap.TsdFrame(t=timestep, d=zarr_array)

/home/runner/.local/lib/python3.12/site-packages/pynapple/core/utils.py:196: UserWarning: Converting 'd' to numpy.array. The provided array was of type 'Array'.
  warnings.warn(

As the warning suggests, zarr_array is converted to a numpy array.

print(type(tsdframe.d))

<class 'numpy.ndarray'>

To maintain a zarr array, you can change the argument load_array to False.

tsdframe = nap.TsdFrame(t=timestep, d=zarr_array, load_array=False)

print(type(tsdframe.d))

<class 'zarr.core.Array'>

Within pynapple, numpy memory maps are recognized as numpy arrays while zarr arrays are not.

print(type(data), "Is np.ndarray? ", isinstance(data, np.ndarray))
print(type(zarr_array), "Is np.ndarray? ", isinstance(zarr_array, np.ndarray))

<class 'numpy.memmap'> Is np.ndarray?  True
<class 'zarr.core.Array'> Is np.ndarray?  False
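The difference comes from subclassing: numpy.memmap is a subclass of numpy.ndarray, whereas an array-like object from another library only shares part of the interface. The sketch below illustrates this with a hypothetical ChunkedArray class; the two check functions are assumptions written for the example, and the checks pynapple actually performs may differ.

```python
import numpy as np


class ChunkedArray:
    """Hypothetical array-like object that is not a numpy array."""

    def __init__(self, shape, dtype):
        self.shape = shape
        self.dtype = np.dtype(dtype)
        self.ndim = len(shape)

    def __getitem__(self, index):
        # A real chunked store would read from disk here.
        return np.zeros(self.shape, dtype=self.dtype)[index]


def needs_conversion(array):
    """True when the input is not a numpy array (or a subclass of it)."""
    return not isinstance(array, np.ndarray)


def looks_array_like(array):
    """Duck-typing check on the attributes an indexable array usually has."""
    names = ("shape", "dtype", "ndim", "__getitem__")
    return all(hasattr(array, name) for name in names)


chunked = ChunkedArray((100, 5), "int32")
print(issubclass(np.memmap, np.ndarray))   # memory maps subclass ndarray
print(needs_conversion(chunked), looks_array_like(chunked))
```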

Similar to numpy memory map, you can use pynapple functions directly.

ep = nap.IntervalSet(0, 10)
tsdframe.restrict(ep)

Time (s)      0    1    2    3    4
----------  ---  ---  ---  ---  ---
0             0    0    0    0    0
1             0    0    0    0    0
2             0    0    0    0    0
3             0    0    0    0    0
4             0    0    0    0    0
5             0    0    0    0    0
6             0    0    0    0    0
7             0    0    0    0    0
8             0    0    0    0    0
9             0    0    0    0    0
10            0    0    0    0    0
dtype: int32, shape: (11, 5)

group = nap.TsGroup({0:nap.Ts(t=[10, 20, 30])})

sta = nap.compute_event_trigger_average(group, tsdframe, 1, (-2, 3))

print(type(tsdframe.values))
print("\n")
print(sta)

<class 'zarr.core.Array'>


Time (s)
----------  -----------------
-2          [[0. ... 0.] ...]
-1          [[0. ... 0.] ...]
0           [[0. ... 0.] ...]
1           [[0. ... 0.] ...]
2           [[0. ... 0.] ...]
3           [[0. ... 0.] ...]
dtype: float64, shape: (6, 1, 5)

Navigating a dataset

We can load a folder containing multiple animals and sessions with the Folder class. The function nap.load_folder provides a shortcut.

project = nap.load_folder(project_path)

print(project)

/home/runner/.local/lib/python3.12/site-packages/hdmf/spec/namespace.py:583: UserWarning: Ignoring the following cached namespace(s) because another version is already loaded:
core - cached version: 2.6.0-alpha, loaded version: 2.8.0
The loaded extension(s) may not be compatible with the cached extension(s) in the file. Please check the extension documentation and ignore this warning if these versions are compatible.
  self.warn_for_ignored_namespaces(ignored_namespaces)

📂 MyProject
└── 📂 sub-A2929

The pynapple IO offers a convenient way of visualizing and navigating a folder-based dataset. To visualize the whole hierarchy of folders, you can call the view property or the expand function.

project.view

📂 MyProject
└── 📂 sub-A2929
    └── 📂 A2929-200711
        ├── 📂 derivatives
        │   ├── position.npz    |        TsdFrame
        │   ├── spikes.npz      |        TsGroup
        │   ├── wake_ep.npz     |        IntervalSet
        │   └── sleep_ep.npz    |        IntervalSet
        ├── 📂 pynapplenwb
        │   └── A2929-200711    |        NWB file
        ├── stimulus-fish.npz       |        IntervalSet
        └── x_plus_1.npz    |        Tsd

Here it shows all the subjects (in this case only A2929), all the sessions, and all the derivatives folders. It also shows all the NPZ files that contain a pynapple object, as well as the NWB files.

The project object behaves like a nested dictionary. It is then easy to loop over and navigate a hierarchy of folders when doing analyses. In this case, we will take only the session A2929-200711.

session = project["sub-A2929"]["A2929-200711"]
print(session)

📂 A2929-200711
├── 📂 derivatives
├── 📂 pynapplenwb
├── stimulus-fish.npz       |        IntervalSet
└── x_plus_1.npz    |        Tsd
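The nested-dictionary behaviour can be pictured with a short standard-library sketch that turns a directory tree into nested dictionaries. The scan function and the placeholder files are hypothetical; this is not how the pynapple Folder class is implemented, only an illustration of the access pattern.

```python
import tempfile
from pathlib import Path


def scan(folder):
    """Map each sub-folder to a nested dict and each .npz/.nwb file to its path."""
    tree = {}
    for entry in sorted(Path(folder).iterdir()):
        if entry.is_dir():
            tree[entry.name] = scan(entry)
        elif entry.suffix in {".npz", ".nwb"}:
            tree[entry.stem] = entry
    return tree


# Build a small hypothetical project on disk (empty placeholder files).
root = Path(tempfile.mkdtemp()) / "MyProject"
session_dir = root / "sub-A2929" / "A2929-200711"
(session_dir / "derivatives").mkdir(parents=True)
(session_dir / "derivatives" / "spikes.npz").touch()
(session_dir / "x_plus_1.npz").touch()

project_tree = scan(root)
session_tree = project_tree["sub-A2929"]["A2929-200711"]
print(sorted(session_tree))
print(session_tree["derivatives"]["spikes"].name)
```

Chained key lookups then mirror the folder hierarchy, which is what makes looping over subjects and sessions straightforward.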

The Folder view gives the path to any object. It can then be easily loaded.

print(project["sub-A2929"]["A2929-200711"]["pynapplenwb"]["A2929-200711"])

A2929-200711
┍━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━┑
│ Keys                  │ Type        │
┝━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━┥
│ units                 │ TsGroup     │
│ position_time_support │ IntervalSet │
│ epochs                │ IntervalSet │
│ z                     │ Tsd         │
│ y                     │ Tsd         │
│ x                     │ Tsd         │
│ rz                    │ Tsd         │
│ ry                    │ Tsd         │
│ rx                    │ Tsd         │
┕━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━┙

JSON sidecar file

A good practice for sharing datasets is to write as much metainformation as possible. Following the BIDS specification, every data file should be accompanied by a JSON sidecar file.

This is possible using the Folder class of pynapple with the argument description.

epoch = nap.IntervalSet(start=np.array([0, 3]), end=np.array([1, 6]))
session.save("stimulus-fish", epoch, description="Fish pictures to V1")

It is then possible to read the description with the doc method of the Folder object.

session.doc("stimulus-fish")

╭─ MyProject/sub-A2929/A2929-200711/stimulus-fish.npz ─╮
│ time : 2025-06-16 21:59:43.152769                    │
│ info : Fish pictures to V1                           │
╰──────────────────────────────────────────────────────╯
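For readers curious about what a sidecar amounts to, the sketch below writes and reads a JSON file placed next to a data file using only the standard library. The field names time and info are taken from the output above; the helper functions and the naming of the sidecar after the data file are assumptions, not the format pynapple is guaranteed to write.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


def write_sidecar(data_path, description):
    """Write <name>.json next to the data file with a timestamp and a description."""
    sidecar_path = Path(data_path).with_suffix(".json")
    content = {"time": str(datetime.now()), "info": description}
    sidecar_path.write_text(json.dumps(content, indent=4))
    return sidecar_path


def read_sidecar(data_path):
    """Return the sidecar content of a data file as a dictionary."""
    return json.loads(Path(data_path).with_suffix(".json").read_text())


# Hypothetical data file; only its name matters for this sketch.
data_path = Path(tempfile.mkdtemp()) / "stimulus-fish.npz"
data_path.touch()

sidecar = write_sidecar(data_path, "Fish pictures to V1")
meta = read_sidecar(data_path)
print(sidecar.name, meta["info"])
```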