9. Iterators

Both Series and Frame offer a variety of iterators (all generators) for flexible traversal of axes and values. In addition, all iterators have a family of apply methods for applying functions to the values iterated. In all cases, alternate “items” versions of iterators are provided; these methods return pairs of (index, value).

9.1. Element Iterators

9.1.1. Series

Series.iter_element()
Series.iter_element().apply(func, dtype)
Series.iter_element().apply_pool(func, dtype, max_workers, chunksize, use_threads)
Series.iter_element().apply_iter(func)
Series.iter_element().apply_iter_items(func)

Iterate over the values of the Series, or expose static_frame.IterNodeDelegate for function application.

>>> s = sf.Series((1, 2, 67, 62, 27, 14), index=('Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune'))
>>> [x for x in s.iter_element()]
[1, 2, 67, 62, 27, 14]

>>> s.iter_element().apply(lambda v: v > 20)
<Series>
<Index>
Earth    False
Mars     False
Jupiter  True
Saturn   True
Uranus   True
Neptune  False
<<U7>    <bool>
>>> [x for x in s.iter_element().apply_iter(lambda v: v > 20)]
[False, False, True, True, True, False]

Series.iter_element_items()
Series.iter_element_items().apply(func)
Series.iter_element_items().apply_pool(func, dtype, max_workers, chunksize, use_threads)
Series.iter_element_items().apply_iter(func)
Series.iter_element_items().apply_iter_items(func)

Iterate over pairs of index and values of the Series, or expose static_frame.IterNodeDelegate for function application.

>>> s = sf.Series((1, 2, 67, 62, 27, 14), index=('Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune'))

>>> [x for x in s.iter_element_items()]
[('Earth', 1), ('Mars', 2), ('Jupiter', 67), ('Saturn', 62), ('Uranus', 27), ('Neptune', 14)]

>>> s.iter_element_items().apply(lambda k, v: v if 'u' in k else None)
<Series>
<Index>
Earth    None
Mars     None
Jupiter  67
Saturn   62
Uranus   27
Neptune  14
<<U7>    <object>

>>> [x for x in s.iter_element_items().apply_iter_items(lambda k, v: k.upper() if v > 20 else None)]
[('Earth', None), ('Mars', None), ('Jupiter', 'JUPITER'), ('Saturn', 'SATURN'), ('Uranus', 'URANUS'), ('Neptune', None)]

Deviations from Pandas

The functionality of Pandas pd.Series.map() and pd.Series.apply() can both be obtained with Series.iter_element().apply(). When given a mapping, Series.iter_element().apply() will pass original values unchanged if they are not found in the mapping. This deviates from pd.Series.map(), which fills unmapped values with NaN.
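
The contrast described above can be modelled without either library. The sketch below uses a plain dict and a list; map_pass_through and map_fill_nan are hypothetical helper names that imitate the two behaviours as stated here, not functions from StaticFrame or Pandas.

```python
import math

def map_pass_through(values, mapping):
    # Values missing from the mapping are returned unchanged,
    # as described above for Series.iter_element().apply().
    return [mapping[v] if v in mapping else v for v in values]

def map_fill_nan(values, mapping):
    # Values missing from the mapping become NaN,
    # as described above for pd.Series.map().
    return [mapping.get(v, math.nan) for v in values]

values = [1, 2, 67]
mapping = {1: 'one', 2: 'two'}

passed = map_pass_through(values, mapping)
filled = map_fill_nan(values, mapping)
print(passed)  # ['one', 'two', 67]
print(filled)  # ['one', 'two', nan]
```

In the pass-through version the unmapped value 67 keeps its original type, so a result can mix mapped and unmapped values.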

9.1.2. Frame

Frame.iter_element()
Frame.iter_element().apply(func)
Frame.iter_element().apply_pool(func, dtype, max_workers, chunksize, use_threads)
Frame.iter_element().apply_iter(func)
Frame.iter_element().apply_iter_items(func)

Iterate over the values of the Frame, or expose static_frame.IterNodeDelegate for function application.

>>> f = sf.Frame.from_dict(dict(diameter=(12756, 6792, 142984), mass=(5.97, 0.642, 1898)), index=('Earth', 'Mars', 'Jupiter'), dtypes=dict(diameter=np.int64))

>>> [x for x in f.iter_element()]
[12756, 5.97, 6792, 0.642, 142984, 1898.0]

>>> f.iter_element().apply(lambda x: x ** 2)
<Frame>
<Index> diameter    mass                <<U8>
<Index>
Earth   162715536   35.640899999999995
Mars    46131264    0.41216400000000003
Jupiter 20444424256 3602404.0
<<U7>   <object>    <object>

Frame.iter_element_items()
Frame.iter_element_items().apply(func)
Frame.iter_element_items().apply_pool(func, dtype, max_workers, chunksize, use_threads)
Frame.iter_element_items().apply_iter(func)
Frame.iter_element_items().apply_iter_items(func)

Iterate over pairs of index / column coordinates and values of the Frame, or expose static_frame.IterNodeDelegate for function application.

>>> f = sf.Frame.from_dict(dict(diameter=(12756, 6792, 142984), mass=(5.97, 0.642, 1898)), index=('Earth', 'Mars', 'Jupiter'))

>>> [x for x in f.iter_element_items()]
[(('Earth', 'diameter'), 12756), (('Earth', 'mass'), 5.97), (('Mars', 'diameter'), 6792), (('Mars', 'mass'), 0.642), (('Jupiter', 'diameter'), 142984), (('Jupiter', 'mass'), 1898.0)]

>>> f.iter_element_items().apply(lambda k, v: v ** 2 if k[0] == 'Mars' else None)
<Frame>
<Index> diameter mass                <<U8>
<Index>
Earth   None     None
Mars    46131264 0.41216400000000003
Jupiter None     None
<<U7>   <object> <object>

Deviations from Pandas

The functionality of Pandas pd.DataFrame.applymap() can be obtained with Frame.iter_element().apply(), though the latter accepts both callables and mapping objects.

9.2. Axis Iterators

Axis iterators are available on Frame to support iterating on rows or columns as NumPy arrays, named tuples, or Series. Alternative items functions are also available to pair values with the appropriate axis label (either columns or index).

Frame.iter_array(axis)
Frame.iter_array(axis).apply(func)
Frame.iter_array(axis).apply_pool(func, dtype, max_workers, chunksize, use_threads)
Frame.iter_array(axis).apply_iter(func)
Frame.iter_array(axis).apply_iter_items(func)

Iterate over NumPy arrays of Frame axis, where axis 0 iterates column data and axis 1 iterates row data. The returned static_frame.IterNodeDelegate exposes interfaces for function application.

>>> f = sf.Frame.from_dict(dict(diameter=(12756, 6792, 142984), mass=(5.97, 0.642, 1898)), index=('Earth', 'Mars', 'Jupiter'), dtypes=dict(diameter=np.int64))
>>> f
<Frame>
<Index> diameter mass      <<U8>
<Index>
Earth   12756    5.97
Mars    6792     0.642
Jupiter 142984   1898.0
<<U7>   <int64>  <float64>

>>> [x.tolist() for x in f.iter_array(axis=0)]
[[12756, 6792, 142984], [5.97, 0.642, 1898.0]]

>>> [x.tolist() for x in f.iter_array(axis=1)]
[[12756.0, 5.97], [6792.0, 0.642], [142984.0, 1898.0]]

>>> f.iter_array(axis=0).apply(np.sum)
<Series>
<Index>
diameter 162532.0
mass     1904.612
<<U8>    <float64>

Frame.iter_array_items(axis)
Frame.iter_array_items(axis).apply(func)
Frame.iter_array_items(axis).apply_pool(func, dtype, max_workers, chunksize, use_threads)
Frame.iter_array_items(axis).apply_iter(func)
Frame.iter_array_items(axis).apply_iter_items(func)

Iterate over pairs of label, NumPy array, per Frame axis, where axis 0 iterates column data and axis 1 iterates row data. The returned static_frame.IterNodeDelegate exposes interfaces for function application.

>>> f = sf.Frame.from_dict(dict(diameter=(12756, 6792, 142984), mass=(5.97, 0.642, 1898)), index=('Earth', 'Mars', 'Jupiter'))

>>> [x for x in f.iter_array_items(axis=0)]
[('diameter', array([ 12756,   6792, 142984])), ('mass', array([5.970e+00, 6.420e-01, 1.898e+03]))]

>>> [x for x in f.iter_array_items(axis=1)]
[('Earth', array([1.2756e+04, 5.9700e+00])), ('Mars', array([6.792e+03, 6.420e-01])), ('Jupiter', array([142984.,   1898.]))]

>>> f.iter_array_items(axis=1).apply(lambda k, v: v.sum() if k == 'Earth' else 0)
<Series>
<Index>
Earth    12761.97
Mars     0.0
Jupiter  0.0
<<U7>    <float64>

Frame.iter_tuple(axis)
Frame.iter_tuple(axis).apply(func)
Frame.iter_tuple(axis).apply_pool(func, dtype, max_workers, chunksize, use_threads)
Frame.iter_tuple(axis).apply_iter(func)
Frame.iter_tuple(axis).apply_iter_items(func)

Iterate over NamedTuples of Frame axis, where axis 0 iterates column data and axis 1 iterates row data. The returned static_frame.IterNodeDelegate exposes interfaces for function application.

>>> f = sf.Frame.from_dict(dict(diameter=(12756, 6792, 142984), mass=(5.97, 0.642, 1898)), index=('Earth', 'Mars', 'Jupiter'))

>>> [x for x in f.iter_tuple(axis=0)]
[Axis(Earth=12756, Mars=6792, Jupiter=142984), Axis(Earth=5.97, Mars=0.642, Jupiter=1898.0)]

>>> [x for x in f.iter_tuple(axis=1)]
[Axis(diameter=12756.0, mass=5.97), Axis(diameter=6792.0, mass=0.642), Axis(diameter=142984.0, mass=1898.0)]

>>> f.iter_tuple(1).apply(lambda nt: nt.mass / (4 / 3 * np.pi * (nt.diameter * 0.5) ** 3))
<Series>
<Index>
Earth    5.49328558e-12
Mars     3.91330208e-12
Jupiter  1.24003876e-12
<<U7>    <float64>

Frame.iter_tuple_items(axis)
Frame.iter_tuple_items(axis).apply(func)
Frame.iter_tuple_items(axis).apply_pool(func, dtype, max_workers, chunksize, use_threads)
Frame.iter_tuple_items(axis).apply_iter(func)
Frame.iter_tuple_items(axis).apply_iter_items(func)

Iterate over pairs of label, NamedTuple, per Frame axis, where axis 0 iterates column data and axis 1 iterates row data. The returned static_frame.IterNodeDelegate exposes interfaces for function application.

>>> f = sf.Frame.from_dict(dict(diameter=(12756, 6792, 142984), mass=(5.97, 0.642, 1898)), index=('Earth', 'Mars', 'Jupiter'))

>>> [x for x in f.iter_tuple_items(axis=0)]
[('diameter', Axis(Earth=12756, Mars=6792, Jupiter=142984)), ('mass', Axis(Earth=5.97, Mars=0.642, Jupiter=1898.0))]

>>> [x for x in f.iter_tuple_items(axis=1)]
[('Earth', Axis(diameter=12756.0, mass=5.97)), ('Mars', Axis(diameter=6792.0, mass=0.642)), ('Jupiter', Axis(diameter=142984.0, mass=1898.0))]

>>> f.iter_tuple_items(axis=1).apply(lambda k, v: v.diameter if k == 'Earth' else 0)
<Series>
<Index>
Earth    12756.0
Mars     0.0
Jupiter  0.0
<<U7>    <float64>

Frame.iter_series(axis)
Frame.iter_series(axis).apply(func)
Frame.iter_series(axis).apply_pool(func, dtype, max_workers, chunksize, use_threads)
Frame.iter_series(axis).apply_iter(func)
Frame.iter_series(axis).apply_iter_items(func)

Iterate over Series of Frame axis, where axis 0 iterates column data and axis 1 iterates row data. The returned static_frame.IterNodeDelegate exposes interfaces for function application.

>>> f = sf.Frame.from_dict(dict(diameter=(12756, 6792, 142984), mass=(5.97, 0.642, 1898)), index=('Earth', 'Mars', 'Jupiter'), dtypes=dict(diameter=np.int64))

>>> next(iter(f.iter_series(axis=0)))
<Series>
<Index>
Earth    12756
Mars     6792
Jupiter  142984
<<U7>    <int64>

>>> next(iter(f.iter_series(axis=1)))
<Series>
<Index>
diameter 12756.0
mass     5.97
<<U8>    <float64>

>>> f.iter_series(0).apply(lambda s: s.mean())
<Series>
<Index>
diameter 54177.333333333336
mass     634.8706666666667
<<U8>    <float64>

Frame.iter_series_items(axis)
Frame.iter_series_items(axis).apply(func)
Frame.iter_series_items(axis).apply_pool(func, dtype, max_workers, chunksize, use_threads)
Frame.iter_series_items(axis).apply_iter(func)
Frame.iter_series_items(axis).apply_iter_items(func)

Iterate over pairs of label, Series, per Frame axis, where axis 0 iterates column data and axis 1 iterates row data. The returned static_frame.IterNodeDelegate exposes interfaces for function application.

>>> f = sf.Frame.from_dict(dict(diameter=(12756, 6792, 142984), mass=(5.97, 0.642, 1898)), index=('Earth', 'Mars', 'Jupiter'))

>>> [(k, v.mean()) for k, v in f.iter_series_items(0)]
[('diameter', 54177.333333333336), ('mass', 634.8706666666667)]

>>> [(k, v.max()) for k, v in f.iter_series_items(1)]
[('Earth', 12756.0), ('Mars', 6792.0), ('Jupiter', 142984.0)]

>>> f.iter_series_items(0).apply(lambda k, v: v.mean() if k == 'diameter' else v.sum())
<Series>
<Index>
diameter 54177.333333333336
mass     1904.612
<<U8>    <float64>

Deviations from Pandas

The functionality of Pandas pd.DataFrame.itertuples() can be obtained with Frame.iter_tuple(axis=1). The functionality of Pandas pd.DataFrame.iterrows(), which yields pairs of index label and row Series, can be obtained with Frame.iter_series_items(axis=1). The functionality of Pandas pd.DataFrame.iteritems(), which yields pairs of column label and column Series, can be obtained with Frame.iter_series_items(axis=0). The functionality of Pandas pd.DataFrame.apply(axis) can be obtained with Frame.iter_series(axis).apply().

9.3. Group Iterators

9.3.1. Series

Series.iter_group(key)
Series.iter_group(key).apply(func)
Series.iter_group(key).apply_pool(func, dtype, max_workers, chunksize, use_threads)
Series.iter_group(key).apply_iter(func)
Series.iter_group(key).apply_iter_items(func)

Iterator of Series formed from groups of unique values in a Series.

>>> s = sf.Series((0, 0, 1, 2), index=('Mercury', 'Venus', 'Earth', 'Mars'), dtype=np.int64)
>>> next(iter(s.iter_group()))
<Series>
<Index>
Mercury  0
Venus    0
<<U7>    <int64>
>>> [x.values.tolist() for x in s.iter_group()]
[[0, 0], [1], [2]]

Series.iter_group_items(key)
Series.iter_group_items(key).apply(func)
Series.iter_group_items(key).apply_pool(func, dtype, max_workers, chunksize, use_threads)
Series.iter_group_items(key).apply_iter(func)
Series.iter_group_items(key).apply_iter_items(func)

Iterator of pairs of group value and the Series formed from groups of unique values in a Series.

>>> s = sf.Series((0, 0, 1, 2), index=('Mercury', 'Venus', 'Earth', 'Mars'))
>>> [(k, v.index.values.tolist()) for k, v in iter(s.iter_group_items()) if k > 0]
[(1, ['Earth']), (2, ['Mars'])]

Series.iter_group_index(depth_level)
Series.iter_group_index(depth_level).apply(func)
Series.iter_group_index(depth_level).apply_pool(func, dtype, max_workers, chunksize, use_threads)
Series.iter_group_index(depth_level).apply_iter(func)
Series.iter_group_index(depth_level).apply_iter_items(func)

Iterator of Series formed from groups of unique Index labels.

Series.iter_group_index_items(depth_level)
Series.iter_group_index_items(depth_level).apply(func)
Series.iter_group_index_items(depth_level).apply_pool(func, dtype, max_workers, chunksize, use_threads)
Series.iter_group_index_items(depth_level).apply_iter(func)
Series.iter_group_index_items(depth_level).apply_iter_items(func)

Iterator of pairs of group value and Series formed from groups of unique Index labels.
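
Neither index-grouping iterator has an example above, so here is a rough pure-Python model of the idea. It assumes a hierarchical index whose labels are tuples and a depth_level that picks one position in each tuple. The function group_by_index_depth and the 'inner' / 'outer' labels are hypothetical; the sketch does not call StaticFrame, and the ordering and container types StaticFrame yields may differ.

```python
from collections import defaultdict

def group_by_index_depth(items, depth_level=0):
    """Yield (group label, sub-mapping) pairs, grouping on one depth of tuple labels."""
    groups = defaultdict(dict)
    for label, value in items.items():
        # label is a tuple such as ('inner', 'Earth'); pick the requested depth
        groups[label[depth_level]][label] = value
    yield from groups.items()

# hypothetical two-level labels: (region, planet) -> value
s = {
    ('inner', 'Earth'): 1,
    ('inner', 'Mars'): 2,
    ('outer', 'Jupiter'): 67,
    ('outer', 'Saturn'): 62,
}

# analogous to the "items" variant: pairs of group label and group
grouped_items = [(k, sorted(v.values())) for k, v in group_by_index_depth(s, 0)]
print(grouped_items)  # [('inner', [1, 2]), ('outer', [62, 67])]

# analogous to the plain variant: groups only
grouped = [sorted(v.values()) for _, v in group_by_index_depth(s, 0)]
print(grouped)  # [[1, 2], [62, 67]]
```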

9.3.2. Frame

Frame.iter_group(key, axis)
Frame.iter_group(key, axis).apply(func)
Frame.iter_group(key, axis).apply_pool(func, dtype, max_workers, chunksize, use_threads)
Frame.iter_group(key, axis).apply_iter(func)
Frame.iter_group(key, axis).apply_iter_items(func)

Iterate over groups (as Frames) based on unique values found in the column specified by key. If axis is 0, subgroups of rows are returned and key selects columns; if axis is 1, subgroups of columns are returned and key selects rows.

>>> f = sf.Frame.from_dict(dict(mass=(0.33, 4.87, 5.97, 0.642), moons=(0, 0, 1, 2)), index=('Mercury', 'Venus', 'Earth', 'Mars'), dtypes=dict(moons=np.int64))
>>> next(iter(f.iter_group('moons')))
<Frame>
<Index> mass      moons   <<U5>
<Index>
Mercury 0.33      0
Venus   4.87      0
<<U7>   <float64> <int64>
>>> [x.shape for x in f.iter_group('moons')]
[(2, 2), (1, 2), (1, 2)]

Frame.iter_group_items(key, axis)
Frame.iter_group_items(key, axis).apply(func)
Frame.iter_group_items(key, axis).apply_pool(func, dtype, max_workers, chunksize, use_threads)
Frame.iter_group_items(key, axis).apply_iter(func)
Frame.iter_group_items(key, axis).apply_iter_items(func)

Iterator over pairs of group value and groups (as Frame) based on unique values found in the column specified by key. If axis is 0, subgroups of rows are returned and key selects columns; if axis is 1, subgroups of columns are returned and key selects rows.

>>> f = sf.Frame.from_dict(dict(mass=(0.33, 4.87, 5.97, 0.642), moons=(0, 0, 1, 2)), index=('Mercury', 'Venus', 'Earth', 'Mars'))
>>> [(k, v.index.values.tolist(), v['mass'].mean()) for k, v in f.iter_group_items('moons')]
[(0, ['Mercury', 'Venus'], 2.6), (1, ['Earth'], 5.97), (2, ['Mars'], 0.642)]

Frame.iter_group_index(depth_level, axis)
Frame.iter_group_index(depth_level, axis).apply(func)
Frame.iter_group_index(depth_level, axis).apply_pool(func, dtype, max_workers, chunksize, use_threads)
Frame.iter_group_index(depth_level, axis).apply_iter(func)
Frame.iter_group_index(depth_level, axis).apply_iter_items(func)

Iterate over groups (as Frame) based on unique labels found in the index specified by depth_level. If axis is 0, subgroups of rows are returned and depth_level selects columns; if axis is 1, subgroups of columns are returned and depth_level selects rows.

Frame.iter_group_index_items(depth_level, axis)
Frame.iter_group_index_items(depth_level, axis).apply(func)
Frame.iter_group_index_items(depth_level, axis).apply_pool(func, dtype, max_workers, chunksize, use_threads)
Frame.iter_group_index_items(depth_level, axis).apply_iter(func)
Frame.iter_group_index_items(depth_level, axis).apply_iter_items(func)

Iterator over pairs of group value and groups (as Frame) based on unique labels found in the index specified by depth_level. If axis is 0, subgroups of rows are returned and depth_level selects columns; if axis is 1, subgroups of columns are returned and depth_level selects rows.

10. Function Application to Iterators

The static_frame.IterNode attributes of static_frame.Frame and static_frame.Series return, when called, static_frame.IterNodeDelegate instances. These instances support iteration via static_frame.IterNodeDelegate.__iter__() and expose a number of methods for function application.
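
The relationship between the node and its delegate can be pictured with a toy model. The classes ToyNode and ToyDelegate below are hypothetical stand-ins written for illustration only; they are not the library's implementation, and apply() here returns a dict rather than a Series or Frame so that the sketch has no dependencies.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple

class ToyDelegate:
    """Toy stand-in for IterNodeDelegate: iterable, with apply-style helpers."""

    def __init__(self, func_items: Callable[[], Iterable[Tuple[Any, Any]]], yield_items: bool):
        self._func_items = func_items
        self._yield_items = yield_items

    def __iter__(self) -> Iterator[Any]:
        # Yield (label, value) pairs or bare values, depending on the yield type.
        for k, v in self._func_items():
            yield (k, v) if self._yield_items else v

    def _call(self, func, k, v):
        # "items" iterators pass both label and value to the function.
        return func(k, v) if self._yield_items else func(v)

    def apply_iter(self, func):
        for k, v in self._func_items():
            yield self._call(func, k, v)

    def apply_iter_items(self, func):
        for k, v in self._func_items():
            yield k, self._call(func, k, v)

    def apply(self, func):
        # A dict of label -> result stands in for the returned container.
        return dict(self.apply_iter_items(func))


class ToyNode:
    """Toy stand-in for IterNode: calling it returns a fresh delegate."""

    def __init__(self, data: dict, yield_items: bool):
        self._data = data
        self._yield_items = yield_items

    def __call__(self) -> ToyDelegate:
        return ToyDelegate(self._data.items, self._yield_items)


data = {'Earth': 1, 'Mars': 2, 'Jupiter': 67}
iter_element = ToyNode(data, yield_items=False)
iter_element_items = ToyNode(data, yield_items=True)

values = list(iter_element())        # bare values
pairs = list(iter_element_items())   # (label, value) pairs
flags = iter_element().apply(lambda v: v > 20)
labelled = list(iter_element_items().apply_iter_items(lambda k, v: k.upper() if v > 20 else None))
```

Each call to the node builds a new delegate, which is why the calling parentheses are needed before iterating or applying a function.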

class IterNode(*, container: FrameOrSeries, function_values: Callable[[...], Iterable[Any]], function_items: Callable[[...], Iterable[Tuple[Any, Any]]], yield_type: static_frame.core.iter_node.IterNodeType, apply_type: static_frame.core.iter_node.IterNodeApplyType = <IterNodeApplyType.SERIES_ITEMS: 1>)

Interface to a type of iteration on static_frame.Series and static_frame.Frame.

class IterNodeDelegate(func_values: Callable[[...], Iterable[Any]], func_items: Callable[[...], Iterable[Tuple[Any, Any]]], yield_type: static_frame.core.iter_node.IterNodeType, apply_constructor: Callable[[...], FrameOrSeries])

Delegate returned from static_frame.IterNode, providing iteration as well as a family of apply methods.

IterNodeDelegate.__iter__() → Union[Iterator[Any], Iterator[Tuple[Any, Any]]]

Return a generator based on the yield type.

IterNodeDelegate.apply(func: Union[Callable[[...], Any], Mapping[Hashable, Any], Series], *, dtype: Union[str, numpy.dtype, type, None] = None) → FrameOrSeries

Apply passed function to each object iterated, where the object depends on the creation of this instance.

Parameters
  • func – A function, or a mapping object that defines __getitem__. If a mapping is given, all values must be found in the mapping.

  • dtype – Type used to create the returned array.

IterNodeDelegate.apply_pool(func: Union[Callable[[...], Any], Mapping[Hashable, Any], Series], *, dtype: Union[str, numpy.dtype, type, None] = None, max_workers: Optional[int] = None, chunksize: int = 1, use_threads: bool = False) → FrameOrSeries

Apply passed function to each object iterated, where the object depends on the creation of this instance. Employ parallel processing with either the ProcessPoolExecutor or ThreadPoolExecutor.

Parameters
  • func – A function, or a mapping object that defines __getitem__. If a mapping is given, all values must be found in the mapping.

  • dtype – Type used to create the returned array.

  • max_workers – Passed to the pool executor, where None defaults to the max number of machine processes.

  • chunksize – Passed to the pool executor.

  • use_threads – When True, the ThreadPoolExecutor will be used rather than the default ProcessPoolExecutor.
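
How these parameters reach the executor can be sketched with the standard library alone. The function apply_pool_sketch below is a hypothetical illustration of the pattern described above, not StaticFrame's implementation; it returns a plain list rather than a Series or Frame.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def apply_pool_sketch(values, func, *, max_workers=None, chunksize=1, use_threads=False):
    """Map func over values in a pool and return the results in input order."""
    pool_cls = ThreadPoolExecutor if use_threads else ProcessPoolExecutor
    with pool_cls(max_workers=max_workers) as executor:
        # chunksize only affects ProcessPoolExecutor; ThreadPoolExecutor ignores it.
        return list(executor.map(func, values, chunksize=chunksize))

def is_large(v):
    # A module-level function, so it can be pickled if a process pool is used.
    return v > 20

# Threads are used here so the sketch runs without a __main__ guard.
results = apply_pool_sketch([1, 2, 67, 62, 27, 14], is_large, max_workers=2, use_threads=True)
print(results)  # [False, False, True, True, True, False]
```

With a process pool, the function and the values must be picklable, so lambdas such as those in the earlier examples generally only work with use_threads=True.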

IterNodeDelegate.apply_iter(func: Union[Callable[[...], Any], Mapping[Hashable, Any], Series]) → Generator[Any, None, None]

Generator that applies the passed function to each element iterated and yields the result.

Parameters
  • func – A function, or a mapping object that defines __getitem__. If a mapping is given, all values must be found in the mapping.

IterNodeDelegate.apply_iter_items(func: Union[Callable[[...], Any], Mapping[Hashable, Any], Series]) → Generator[Tuple[Any, Any], None, None]

Generator that applies the passed function to each element iterated and yields pairs of the element's label and the result.

Parameters
  • func – A function, or a mapping object that defines __getitem__ and __contains__. If a mapping is given and a value is not found in the mapping, the value is returned unchanged (this deviates from Pandas Series.map, which inserts NaNs).