1. About StaticFrame
StaticFrame is an alternative dataframe library built on an immutable data model. StaticFrame is not a drop-in replacement for Pandas. While some conventions and API components are directly borrowed from Pandas, some are completely different, either by necessity (due to the immutable data model) or by choice (offering more uniform, less redundant, and more explicit interfaces). As StaticFrame does not support in-place mutation, architectures that made significant use of mutability in Pandas will require refactoring.
Please assist in development by reporting bugs or requesting features. We are a welcoming community and appreciate all feedback! Visit GitHub Issues. To get started contributing to StaticFrame, see Contributing.
2. About Immutability
The Series and Frame store data in immutable NumPy arrays. Once created, array values cannot be changed. StaticFrame manages NumPy arrays, setting the ndarray.flags.writeable attribute to False on all managed and returned NumPy arrays.
>>> import static_frame as sf
>>> import numpy as np
>>> s = sf.Series((67, 62, 27, 14), index=('Jupiter', 'Saturn', 'Uranus', 'Neptune'), dtype=np.int64)
>>> s #doctest: +NORMALIZE_WHITESPACE
<Series>
<Index>
Jupiter 67
Saturn 62
Uranus 27
Neptune 14
<<U7> <int64>
>>> s['Jupiter'] = 68
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'Series' object does not support item assignment
>>> s.iloc[0] = 68
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'InterfaceGetItem' object does not support item assignment
>>> s.values[0] = 68
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: assignment destination is read-only
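The last error above comes from NumPy itself rather than from StaticFrame. As a minimal sketch of that mechanism, using only NumPy (the variable names are illustrative and this is not StaticFrame's internal code), a plain array can be frozen by clearing its writeable flag:

```python
import numpy as np

# Build an ordinary, mutable array, then freeze it.
values = np.array([67, 62, 27, 14], dtype=np.int64)
values.flags.writeable = False

# Element assignment on a read-only array raises ValueError.
try:
    values[0] = 68
    write_failed = False
except ValueError:
    write_failed = True

# A basic slice is a view, and it inherits the read-only flag from its base.
view = values[1:]
view_is_writeable = view.flags.writeable
```

Because views inherit the flag, slicing a frozen array does not open a path to mutating the underlying data.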
To mutate values in a Series or Frame, a copy must be made. Convenient functional interfaces to assign to a copy are provided, using conventions familiar to NumPy and Pandas users.
>>> s.assign['Jupiter'](69)
<Series>
<Index>
Jupiter 69
Saturn 62
Uranus 27
Neptune 14
<<U7> <int64>
>>> s.assign['Uranus':](s['Uranus':] - 2)
<Series>
<Index>
Jupiter 67
Saturn 62
Uranus 25
Neptune 12
<<U7> <int64>
>>> s.assign.iloc[[0, 3]]((68, 11))
<Series>
<Index>
Jupiter 68
Saturn 62
Uranus 27
Neptune 11
<<U7> <int64>
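The idea behind assigning to a copy can be sketched with NumPy alone. The helper below, `assign_copy`, is hypothetical and is not how StaticFrame implements `assign`. It only illustrates the pattern: copy, write into the copy, then freeze the result before returning it.

```python
import numpy as np

def assign_copy(values, key, new):
    """Return a new read-only array with `new` written at `key` (hypothetical helper)."""
    result = values.copy()          # a copy is writeable even if the source is not
    result[key] = new               # mutate only the private copy
    result.flags.writeable = False  # freeze before handing it out
    return result

original = np.array([67, 62, 27, 14], dtype=np.int64)
original.flags.writeable = False

# Mirrors the positional assignment above: set positions 0 and 3.
updated = assign_copy(original, [0, 3], (68, 11))
```

The original array is left untouched, and the caller receives a new array that is itself immutable.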
Immutable data has the overwhelming benefit of providing confidence that a client of a Series or Frame cannot mutate its data. This removes the need for many defensive copies, and forces clients to make copies only when absolutely necessary.
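The defensive-copy point can be illustrated with a small NumPy-only sketch. The function `demean_in_place` is a made-up example of a careless routine that overwrites its input; it is not part of StaticFrame.

```python
import numpy as np

def demean_in_place(values):
    """A careless routine that overwrites its input (illustrative only)."""
    values -= values.mean()
    return values

# With mutable data, the caller must copy defensively to protect its own state.
mutable = np.array([1.0, 2.0, 3.0])
result = demean_in_place(mutable.copy())

# With read-only data no copy is needed: the attempted write fails loudly.
frozen = np.array([1.0, 2.0, 3.0])
frozen.flags.writeable = False
try:
    demean_in_place(frozen)
    mutated = True
except ValueError:
    mutated = False
```

In the mutable case, safety depends on every caller remembering to copy. In the read-only case, the data is protected regardless of what the callee attempts.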
There is no guarantee that using immutable data will produce correct code or more resilient and robust libraries. It is true, however, that using immutable data removes countless opportunities for introducing flaws in data processing routines and libraries.
3. History
The ideas behind StaticFrame developed out of years of work with Pandas and related tabular data structures by the Investment Systems team at Research Affiliates, LLC. In May of 2017 Christopher Ariza proposed the basic model to the Investment Systems team and began implementation. The first public release was in May 2018.
4. Articles
2022: StaticFrame from the Ground Up: Getting Started with Immutable DataFrames
2022: Using Higher-Order Containers to Efficiently Process 7,163 (or More) DataFrames
5. Presentations
The following presentations and interviews describe StaticFrame in greater depth.
PyData Global 2021: “Why Datetimes Need Units: Avoiding a Y2262 Problem & Harnessing the Power of NumPy’s datetime64”: https://www.youtube.com/watch?v=jdnr7sgxCQI
PyData LA 2019: “The Best Defense is not a Defensive Copy” (lightning talk starting at 18:25): https://youtu.be/_WXMs8o9Gdw
PyData LA 2019: “Fitting Many Dimensions into One: The Promise of Hierarchical Indices for Data Beyond Two Dimensions”: https://youtu.be/xX8tXSNDpmE
PyCon US 2019: “A Less Kind, Less Gentle DataFrame” (lightning talk starting at 53:00): https://pyvideo.org/pycon-us-2019/friday-lightning-talksbreak-pycon-2019.html
Talk Python to Me, interview: https://talkpython.fm/episodes/show/204/staticframe-like-pandas-but-safer
PyData LA 2018: “StaticFrame: An Immutable Alternative to Pandas”: https://pyvideo.org/pydata-la-2018/staticframe-an-immutable-alternative-to-pandas.html
6. Contributors
These members of the Investment Systems team have contributed greatly to the design of StaticFrame:
Brandt Bucher
Charles Burkland
Guru Devanla
John Hawk
John McCloskey
Adam Kay
Mark LeMoine
Myrl Marmarelis
Tom Rutherford
Yu Tomita
Quang Vu
Thanks also for additional contributions from GitHub users:
https://github.com/InvestmentSystems/static-frame/graphs/contributors