Browsing the object tree


In this section, we will learn how to browse the tree and retrieve data, as well as meta-information about the actual data.

You will find a working version of all the code in this section in examples/tutorial1-2.py.

from pathlib import Path

temp_dir = Path(".temp")
temp_dir.mkdir(exist_ok=True)

from tables import *

Traversing the object tree

Let's start by opening the file we created in the Getting started with PyTables section:

h5file = open_file(temp_dir/"tutorial1.h5", mode="a")

This time, we have opened the file in "a"ppend mode, which we use to add more information to the file.
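Here is a minimal sketch of how the modes differ. It assumes a throwaway file in a temporary directory, and the file and group names are hypothetical. "a" creates the file if it is missing; otherwise it reopens the file read-write without truncating it, so nodes from an earlier session survive. "w", by contrast, starts from an empty file.

```python
import os
import tempfile

import tables

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "append_demo.h5")  # hypothetical file name

    # Session 1: the file does not exist yet, so "a" creates it.
    with tables.open_file(path, mode="a", title="Append demo") as h5:
        h5.create_group("/", "first", "Written in session 1")

    # Session 2: "a" reopens the existing file without truncating it.
    with tables.open_file(path, mode="a") as h5:
        h5.create_group("/", "second", "Written in session 2")
        group_names = sorted(g._v_name for g in h5.list_nodes("/", classname="Group"))

    # Session 3: "w" truncates, so everything written so far is discarded.
    with tables.open_file(path, mode="w") as h5:
        n_after_w = len(h5.list_nodes("/"))

print(group_names, n_after_w)  # ['first', 'second'] 0
```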

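Following the Python tradition, PyTables offers powerful introspection capabilities: you can easily ask for information about any component of the object tree, and you can also search the tree.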

To start with, you can get a first overview of the object tree by simply printing the existing File instance:

print(h5file)

.temp/tutorial1.h5 (File) 'Test file'
Last modif.: '2024-12-03T09:38:12+00:00'
Object Tree: 
/ (RootGroup) 'Test file'
/columns (Group) 'Pressure and Name'
/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'
/detector (Group) 'Detector information'
/detector/readout (Table(10,)) 'Readout example'

It looks like all our objects are there. Now let's use the File iterator to list all the nodes in the object tree:

for node in h5file:
    print(node)

/ (RootGroup) 'Test file'
/columns (Group) 'Pressure and Name'
/detector (Group) 'Detector information'
/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'
/detector/readout (Table(10,)) 'Readout example'
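Iterating over a File visits both groups and leaves, so it pairs naturally with isinstance() checks. The sketch below rebuilds a tree with the same paths as tutorial1.h5, but with made-up contents and a hypothetical temporary file name. It then splits the nodes into groups and leaves by their _v_pathname:

```python
import os
import tempfile

import numpy as np
import tables

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "iter_demo.h5")  # hypothetical file name
    with tables.open_file(path, mode="w", title="Test file") as h5:
        # Same paths as tutorial1.h5, with made-up contents.
        columns = h5.create_group("/", "columns", "Pressure and Name")
        h5.create_array(columns, "pressure", np.array([25.0, 36.0, 49.0]))
        h5.create_array(columns, "name", [b"p5", b"p6", b"p7"])
        h5.create_group("/", "detector", "Detector information")

        group_paths, leaf_paths = [], []
        for node in h5:  # iterating a File walks every node
            if isinstance(node, tables.Group):
                group_paths.append(node._v_pathname)
            else:  # everything else in this tree is a Leaf
                leaf_paths.append(node._v_pathname)

print("groups:", sorted(group_paths))
print("leaves:", sorted(leaf_paths))
```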

To list only the groups in the tree, use the walk_groups() method:

for group in h5file.walk_groups():
    print(group)

/ (RootGroup) 'Test file'
/columns (Group) 'Pressure and Name'
/detector (Group) 'Detector information'
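walk_groups() also takes a starting path, which restricts the walk to one branch. The sketch below uses a hypothetical hierarchy (/a, /a/b, /a/b/c and /z). The results are sorted before printing, so the point is which groups get visited, not the traversal order:

```python
import os
import tempfile

import tables

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "walk_groups_demo.h5")  # hypothetical file name
    with tables.open_file(path, mode="w") as h5:
        # createparents=True also creates the missing /a and /a/b groups.
        h5.create_group("/a/b", "c", createparents=True)
        h5.create_group("/", "z")

        whole_tree = [g._v_pathname for g in h5.walk_groups()]     # starts at "/"
        a_branch = [g._v_pathname for g in h5.walk_groups("/a")]  # /a and below

print(sorted(whole_tree))  # ['/', '/a', '/a/b', '/a/b/c', '/z']
print(sorted(a_branch))    # ['/a', '/a/b', '/a/b/c']
```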

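Note that tables.File.walk_groups() actually returns an iterator, not a list of objects. Combining this iterator with the tables.File.list_nodes() method is very powerful. Let's see an example that lists all the arrays in the tree: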

for group in h5file.walk_groups("/"):
    for array in h5file.list_nodes(group, classname='Array'):
        print(array)

/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'

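tables.File.list_nodes() returns a list containing all the nodes hanging from a specific Group. If the classname keyword is specified, the method filters out every node that is not an instance of that class or of one of its descendants. Here we asked only for Array instances. There is also an iterator counterpart, tables.File.iter_nodes(), which can be handy in some situations, for example when dealing with groups that contain a large number of nodes.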
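The sketch below contrasts the two methods on a hypothetical group holding 100 small arrays and one subgroup. list_nodes() hands back a complete list, while iter_nodes() yields the children one at a time. Both accept the same classname filter:

```python
import os
import tempfile

import numpy as np
import tables

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "list_vs_iter.h5")  # hypothetical file name
    with tables.open_file(path, mode="w") as h5:
        many = h5.create_group("/", "many")
        for i in range(100):
            h5.create_array(many, f"arr{i:03d}", np.arange(3))
        h5.create_group(many, "sub")

        # list_nodes() returns every child (groups and leaves) as a list.
        n_children = len(h5.list_nodes(many))
        # iter_nodes() yields the children one by one; here only Arrays.
        n_arrays = sum(1 for _ in h5.iter_nodes(many, classname="Array"))
        # The same filter works with list_nodes() and a path string.
        subgroups = [g._v_name for g in h5.list_nodes("/many", classname="Group")]

print(n_children, n_arrays, subgroups)  # 101 100 ['sub']
```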

Both calls can be combined with the special tables.File.walk_nodes() method. For example:

for array in h5file.walk_nodes("/", "Array"):
    print(array)

/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'

This is a handy shortcut when working interactively.
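As a quick consistency check, the sketch below uses a hypothetical tree with arrays at two depths. It verifies that walk_nodes(where, classname) selects the same nodes as the combination of walk_groups() and list_nodes() used earlier:

```python
import os
import tempfile

import numpy as np
import tables

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "walk_nodes_demo.h5")  # hypothetical file name
    with tables.open_file(path, mode="w") as h5:
        h5.create_array("/", "top", np.zeros(2))
        # createparents=True builds /g1 and /g1/g2 on the way.
        h5.create_array("/g1/g2", "deep", np.ones(2), createparents=True)

        # The explicit two-step version...
        nested = sorted(
            node._v_pathname
            for group in h5.walk_groups("/")
            for node in h5.list_nodes(group, classname="Array")
        )
        # ...and the one-call shortcut.
        shortcut = sorted(node._v_pathname for node in h5.walk_nodes("/", "Array"))

print(nested == shortcut, shortcut)  # True ['/g1/g2/deep', '/top']
```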

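Finally, we will list all the tables.Leaf instances in the /detector group, i.e. the tables.Table and tables.Array instances. Note that only one instance of the tables.Table class (namely readout) will be selected in this group, as should be the case: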

for leaf in h5file.root.detector._f_walknodes('Leaf'):
    print(leaf)

/detector/readout (Table(10,)) 'Readout example'

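Here we called the _f_walknodes() method on the group itself, reaching the group through a natural naming path specification (h5file.root.detector).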
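Group._f_walknodes() is the group-level counterpart of File.walk_nodes(). In the sketch below, a hypothetical /detector group holds a table (with a made-up one-column description), an array and an empty subgroup. Both spellings select the same two leaves. The "Leaf" filter matches tables and arrays alike while skipping the subgroup:

```python
import os
import tempfile

import numpy as np
import tables


class Readout(tables.IsDescription):
    energy = tables.Float64Col()  # hypothetical single-column layout


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "walknodes_demo.h5")  # hypothetical file name
    with tables.open_file(path, mode="w") as h5:
        det = h5.create_group("/", "detector", "Detector information")
        h5.create_table(det, "readout", Readout, "Readout example")
        h5.create_array(det, "calibration", np.arange(4.0))
        h5.create_group(det, "sub")  # a Group, so never matched by "Leaf"

        # Group-level method reached through natural naming...
        via_group = sorted(n._v_pathname for n in h5.root.detector._f_walknodes("Leaf"))
        # ...and the File-level method with an explicit path.
        via_file = sorted(n._v_pathname for n in h5.walk_nodes("/detector", "Leaf"))
        kinds = sorted(type(n).__name__ for n in h5.root.detector._f_walknodes("Leaf"))

print(via_group)  # ['/detector/calibration', '/detector/readout']
print(kinds)      # ['Array', 'Table']
```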

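Of course, you can make more sophisticated node selections with these powerful methods. But first, let's look at some important PyTables object instance variables.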

Setting and getting user attributes

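PyTables provides an easy and concise way to complement the meaning of the node objects in the tree through the AttributeSet class. You can access this object through the standard attribute attrs in tables.Leaf nodes, and through _v_attrs in tables.Group nodes.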

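For example, suppose we want to save the date on which the data in the /detector/readout table was gathered, as well as the temperature during the gathering process: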

table = h5file.root.detector.readout
table.attrs.gath_date = "Wed, 06/12/2003 18:33"
table.attrs.temperature = 18.4
table.attrs.temp_scale = "Celsius"

Now, let's set a somewhat more complex attribute on the /detector group:

detector = h5file.root.detector
detector._v_attrs.stuff = [5, (2.3, 4.5), "Integer and tuple"]

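Note that the AttributeSet instance is accessed through the _v_attrs attribute, because detector is a Group node. In general, you can save any standard Python data structure as an attribute of a node. Objects with no native HDF5 representation, such as this list, are pickled.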
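To check that these attributes really persist, the sketch below writes the attribute values used in this section to a hypothetical throwaway file. The table uses a made-up one-column description. The sketch closes the file and reads the values back in read-only mode. Note that the list comes back with its nested tuple intact:

```python
import os
import tempfile

import tables


class Readout(tables.IsDescription):
    energy = tables.Float64Col()  # hypothetical single-column layout


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "attrs_demo.h5")  # hypothetical file name

    # Write attributes on a Leaf (attrs) and on a Group (_v_attrs).
    with tables.open_file(path, mode="w") as h5:
        det = h5.create_group("/", "detector")
        table = h5.create_table(det, "readout", Readout)
        table.attrs.temperature = 18.4
        table.attrs.temp_scale = "Celsius"
        det._v_attrs.stuff = [5, (2.3, 4.5), "Integer and tuple"]

    # Reopen read-only: the values are read back from disk.
    with tables.open_file(path, mode="r") as h5:
        temperature = h5.root.detector.readout.attrs.temperature
        temp_scale = h5.root.detector.readout.attrs.temp_scale
        stuff = h5.root.detector._v_attrs.stuff

print(temperature, temp_scale, stuff)
```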

Retrieving the attributes is just as simple:

table.attrs.gath_date

'Wed, 06/12/2003 18:33'

table.attrs.temperature

18.4

table.attrs.temp_scale

'Celsius'

detector._v_attrs.stuff

[5, (2.3, 4.5), 'Integer and tuple']

You can probably guess how to delete attributes:

del table.attrs.gath_date
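A slightly fuller sketch of deletion, on a hypothetical throwaway table, covers three points. The in operator tells you whether an attribute exists. Deleting a missing attribute raises AttributeError. The dictionary-style spellings attrs[name] and del attrs[name] work as well:

```python
import os
import tempfile

import tables


class Readout(tables.IsDescription):
    energy = tables.Float64Col()  # hypothetical single-column layout


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "del_attrs_demo.h5")  # hypothetical file name
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "readout", Readout)
        attrs = table.attrs
        attrs.gath_date = "Wed, 06/12/2003 18:33"
        before = "gath_date" in attrs  # membership test on the AttributeSet

        del attrs.gath_date  # attribute-style deletion
        after = "gath_date" in attrs

        # Deleting an attribute that no longer exists raises AttributeError.
        try:
            del attrs.gath_date
            second_delete = "no error"
        except AttributeError:
            second_delete = "AttributeError"

        # Dictionary-style access works as well.
        attrs["temperature"] = 18.4
        del attrs["temperature"]
        user_left = attrs._f_list("user")

print(before, after, second_delete, user_left)
```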

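If you want to examine the current attribute set of /detector/readout, you can print its representation (if you are in a Unix Python console with the rlcompleter module active, try hitting the TAB key twice):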

table.attrs

/detector/readout._v_attrs (AttributeSet), 22 attributes:
   [CLASS := 'TABLE',
    FIELD_0_FILL := 0,
    FIELD_0_NAME := 'ADCcount',
    FIELD_1_FILL := 0,
    FIELD_1_NAME := 'TDCcount',
    FIELD_2_FILL := 0.0,
    FIELD_2_NAME := 'energy',
    FIELD_3_FILL := 0,
    FIELD_3_NAME := 'grid_i',
    FIELD_4_FILL := 0,
    FIELD_4_NAME := 'grid_j',
    FIELD_5_FILL := 0,
    FIELD_5_NAME := 'idnumber',
    FIELD_6_FILL := b'',
    FIELD_6_NAME := 'name',
    FIELD_7_FILL := 0.0,
    FIELD_7_NAME := 'pressure',
    NROWS := 10,
    TITLE := 'Readout example',
    VERSION := '2.7',
    temp_scale := 'Celsius',
    temperature := 18.4]

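As you can see, this shows all the attributes, including the system ones. The _f_list() method returns a list of all the attributes, or of only the user or only the system ones: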

print(table.attrs._f_list("all"))

['CLASS', 'FIELD_0_FILL', 'FIELD_0_NAME', 'FIELD_1_FILL', 'FIELD_1_NAME', 'FIELD_2_FILL', 'FIELD_2_NAME', 'FIELD_3_FILL', 'FIELD_3_NAME', 'FIELD_4_FILL', 'FIELD_4_NAME', 'FIELD_5_FILL', 'FIELD_5_NAME', 'FIELD_6_FILL', 'FIELD_6_NAME', 'FIELD_7_FILL', 'FIELD_7_NAME', 'NROWS', 'TITLE', 'VERSION', 'temp_scale', 'temperature']

print(table.attrs._f_list("user"))

['temp_scale', 'temperature']

print(table.attrs._f_list("sys"))

['CLASS', 'FIELD_0_FILL', 'FIELD_0_NAME', 'FIELD_1_FILL', 'FIELD_1_NAME', 'FIELD_2_FILL', 'FIELD_2_NAME', 'FIELD_3_FILL', 'FIELD_3_NAME', 'FIELD_4_FILL', 'FIELD_4_NAME', 'FIELD_5_FILL', 'FIELD_5_NAME', 'FIELD_6_FILL', 'FIELD_6_NAME', 'FIELD_7_FILL', 'FIELD_7_NAME', 'NROWS', 'TITLE', 'VERSION']
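The "user" and "sys" lists together make up "all", so _f_list("user") is a convenient way to capture only your own metadata. Here is a sketch on a hypothetical throwaway table that copies the user attributes into a plain dict:

```python
import os
import tempfile

import tables


class Readout(tables.IsDescription):
    energy = tables.Float64Col()  # hypothetical single-column layout


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "list_attrs_demo.h5")  # hypothetical file name
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "readout", Readout, "Readout example")
        table.attrs.temperature = 18.4
        table.attrs.temp_scale = "Celsius"

        all_names = table.attrs._f_list("all")
        user_names = table.attrs._f_list("user")
        sys_names = table.attrs._f_list("sys")
        # Copy only the user metadata into an ordinary dict.
        user_attrs = {name: table.attrs[name] for name in user_names}

print(sys_names)
print(user_attrs)
```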

You can also rename attributes:

table.attrs._f_rename("temp_scale", "tempScale")
print(table.attrs._f_list())

['tempScale', 'temperature']
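A short sketch on a hypothetical throwaway table confirms that after _f_rename() the value is available only under the new name. getattr() with a default is a convenient way to probe for the old name without triggering an exception:

```python
import os
import tempfile

import tables


class Readout(tables.IsDescription):
    energy = tables.Float64Col()  # hypothetical single-column layout


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rename_attrs_demo.h5")  # hypothetical file name
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "readout", Readout)
        table.attrs.temp_scale = "Celsius"
        table.attrs._f_rename("temp_scale", "tempScale")

        user_names = table.attrs._f_list()  # the default attrset is "user"
        old = getattr(table.attrs, "temp_scale", None)  # old name is gone
        new = table.attrs.tempScale

print(user_names, old, new)
```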

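Since PyTables 2.0, you are also allowed to set, delete or rename system attributes. After a rename, the attribute is reachable only under its new name, so accessing the old one raises an AttributeError: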

table.attrs._f_rename("VERSION", "version")
table.attrs.VERSION

AttributeError: Attribute 'VERSION' does not exist in node: '/detector/readout'

table.attrs.version

'2.7'

table.attrs._f_rename("version", "VERSION")
table.attrs.VERSION

'2.7'

Attributes are a useful mechanism for adding persistent (meta) information to your data.

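Starting with PyTables 3.9.0, you can also set, delete or rename attributes on individual columns. The API is designed to behave in the same way as attributes on a table: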

table.cols.pressure.attrs['units'] = 'kPa'
table.cols.energy.attrs['units'] = 'MeV'

Getting object metadata

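Each object in PyTables carries metadata about the data in the file. Normally, this meta-information is accessible through the node instance variables. Let's take a look at some examples: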

print("Object:", table)

Object: /detector/readout (Table(10,)) 'Readout example'

print("Table name:", table.name)

Table name: readout

print("Table title:", table.title)

Table title: Readout example

print("Number of rows in table:", table.nrows)

Number of rows in table: 10

for name in table.colnames:
    print(name, ':= %s, %s' % (table.coldtypes[name], table.coldtypes[name].shape))

ADCcount := uint16, ()
TDCcount := uint8, ()
energy := float64, ()
grid_i := int32, ()
grid_j := int32, ()
idnumber := int64, ()
name := |S16, ()
pressure := float32, ()
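The same metadata can also be gathered programmatically. This sketch assumes a reduced, hypothetical three-column version of the tutorial's particle description, filled with three made-up rows:

```python
import os
import tempfile

import tables


class Particle(tables.IsDescription):
    # Hypothetical three-column subset of the tutorial's description.
    ADCcount = tables.UInt16Col()
    energy = tables.Float64Col()
    name = tables.StringCol(16)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "table_meta_demo.h5")  # hypothetical file name
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "readout", Particle, "Readout example")
        row = table.row
        for i in range(3):
            row["ADCcount"] = 256 * i
            row["energy"] = float(i) ** 2
            row["name"] = f"Particle: {i:6d}"
            row.append()
        table.flush()

        # Collect the metadata printed above into one dict.
        meta = {
            "name": table.name,
            "title": table.title,
            "nrows": table.nrows,
            "dtypes": {col: table.coldtypes[col] for col in table.colnames},
        }

for key, value in meta.items():
    print(key, "=", value)
```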

To examine the metadata of the /columns/pressure Array object:

pressureObject = h5file.get_node("/columns", "pressure")
print("Info on the object:", repr(pressureObject))

Info on the object: /columns/pressure (Array(3,)) 'Pressure column selection'
  atom := Float64Atom(shape=(), dflt=0.0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := None

print("  shape: ==>", pressureObject.shape)

shape: ==> (3,)

print("  title: ==>", pressureObject.title)

title: ==> Pressure column selection

print("  atom: ==>", pressureObject.atom)

atom: ==> Float64Atom(shape=(), dflt=0.0)

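Note that we used the get_node() method to access a node in the tree, instead of natural naming. Both are useful, and depending on the context you will prefer one or the other. get_node() has the advantage that it can get a node from a path name string (as in this example). Through its classname argument, it can also check that the node found at that location is an instance of a given class. In general, however, I find natural naming more elegant and easier to use, especially with the name completion available in interactive consoles. Try combining natural naming with the completion capabilities of most Python consoles, and see how pleasant browsing the object tree can be (well, as pleasant as such an activity can get).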
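The sketch below shows the access styles side by side, on a hypothetical throwaway file with the same /columns/pressure path. A full path, a parent path plus a child name, and natural naming all reach the same node. With classname, get_node() raises tables.NoSuchNodeError when the node it finds belongs to a different class:

```python
import os
import tempfile

import numpy as np
import tables

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "get_node_demo.h5")  # hypothetical file name
    with tables.open_file(path, mode="w") as h5:
        cols = h5.create_group("/", "columns", "Pressure and Name")
        h5.create_array(cols, "pressure", np.array([25.0, 36.0, 49.0]),
                        "Pressure column selection")

        # Three ways of reaching the same node.
        by_path = h5.get_node("/columns/pressure")
        by_parent_and_name = h5.get_node("/columns", "pressure")
        by_natural_naming = h5.root.columns.pressure
        paths = {n._v_pathname for n in (by_path, by_parent_and_name, by_natural_naming)}

        # classname makes get_node() check the class of the node it finds.
        title = h5.get_node("/columns/pressure", classname="Array").title
        wrong_class_error = None
        try:
            h5.get_node("/columns/pressure", classname="Table")
        except tables.NoSuchNodeError as exc:
            wrong_class_error = type(exc).__name__

print(paths, title, wrong_class_error)
```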

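If you look at the atom attribute of pressureObject (Float64Atom in the output above), you can verify that it is a float64 array. From its shape attribute, you can deduce that the array on disk is one-dimensional and has 3 elements.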
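The element type can also be checked programmatically through the atom. atom.type gives the PyTables type name, and atom.dtype gives the matching NumPy dtype. Here is a sketch on a hypothetical throwaway array holding the same values:

```python
import os
import tempfile

import numpy as np
import tables

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "atom_demo.h5")  # hypothetical file name
    with tables.open_file(path, mode="w") as h5:
        arr = h5.create_array("/", "pressure", np.array([25.0, 36.0, 49.0]))
        atom_type = arr.atom.type    # PyTables type name of each element
        item_dtype = arr.atom.dtype  # matching NumPy dtype
        shape = arr.shape            # shape of the dataset on disk

print(atom_type, item_dtype, shape, len(shape))
```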

Reading data from Array objects

Once you have found the desired Array, use its read() method to retrieve the data:

pressureArray = pressureObject.read()
pressureArray

array([25., 36., 49.])

print("pressureArray is an object of type:", type(pressureArray))

pressureArray is an object of type: <class 'numpy.ndarray'>

nameArray = h5file.root.columns.name.read()
print("nameArray is an object of type:", type(nameArray))

nameArray is an object of type: <class 'list'>

print("Data on arrays nameArray and pressureArray:")
for i in range(pressureObject.shape[0]):
    print(nameArray[i], "-->", pressureArray[i])

Data on arrays nameArray and pressureArray:
b'Particle:      5' --> 25.0
b'Particle:      6' --> 36.0
b'Particle:      7' --> 49.0

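The output of the type() calls shows that tables.Array.read() returned a genuine NumPy object for the pressureObject instance. Reading the /columns/name array, in contrast, returned a native Python list (here, of byte strings). The type of the saved object is stored on disk as an HDF5 attribute of the object, named FLAVOR. This attribute is then read as array meta-information (accessible through the Array.attrs.FLAVOR variable), which allows the data to be converted back to the original object type when it is read. This provides a way to save a large variety of objects as arrays, with the guarantee that you will later be able to recover them in their original form.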
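The flavor is also exposed as the leaf's flavor property, and you can set it to change how later reads are returned. In the sketch below, on a hypothetical throwaway file, an array created from a list of byte strings gets the "python" flavor. After switching it to "numpy", read() returns an ndarray instead:

```python
import os
import tempfile

import numpy as np
import tables

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "flavor_demo.h5")  # hypothetical file name
    with tables.open_file(path, mode="w") as h5:
        names = h5.create_array("/", "name", [b"Particle:      5", b"Particle:      6"])
        pressure = h5.create_array("/", "pressure", np.array([25.0, 36.0]))
        flavors = (names.flavor, pressure.flavor)

        as_list = names.read()     # a list, because the saved object was a list
        names.flavor = "numpy"     # rewrite the flavor stored for this leaf
        as_ndarray = names.read()  # now a NumPy array of fixed-width bytes

print(flavors, type(as_list).__name__, type(as_ndarray).__name__)
```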

h5file.close()