7. Assignment / Dropping / Masking

Series and Frame provide interface attributes for exposing assignment-like operations, dropping data, and producing masks. Each interface attribute exposes a root __getitem__ interface, as well as __getitem__ interfaces on loc and iloc attributes, exposing the full range of selection approaches.

7.1. Assignment

The assign-to-copy interfaces permit expressive assignment to new containers with the same flexibility as Pandas and NumPy. As all underlying data is immutable, the caller will not be mutated. With Frame objects, the minimum amount of data will be copied to the new Frame, depending on the type of assignment and the organization of the underlying TypeBlocks.

7.1.1. Series

Series.assign[key](value)
Series.assign.loc[key](value)
Series.assign.iloc[key](value)

Replace the values specified by the key with value.

Parameters
  • key – A selector, either a label, a list of labels, a slice of labels, or a Boolean array. The root __getitem__ takes loc labels, loc takes loc labels, and iloc takes integer indices.

  • value – The value to be assigned. Can be a single value, an iterable of values, or a Series.

Returns

static_frame.Series

>>> s = sf.Series.from_items((('Venus', 108.2), ('Earth', 149.6), ('Saturn', 1433.5)))
>>> s
<Series>
<Index>
Venus    108.2
Earth    149.6
Saturn   1433.5
<<U6>    <float64>
>>> s.assign['Earth'](150)
<Series>
<Index>
Venus    108.2
Earth    150.0
Saturn   1433.5
<<U6>    <float64>
>>> s.assign['Earth':](0)
<Series>
<Index>
Venus    108.2
Earth    0.0
Saturn   0.0
<<U6>    <float64>
>>> s.assign.loc[s < 150](0)
<Series>
<Index>
Venus    0.0
Earth    0.0
Saturn   1433.5
<<U6>    <float64>
>>> s.assign.iloc[-1](0)
<Series>
<Index>
Venus    108.2
Earth    149.6
Saturn   0.0
<<U6>    <float64>

7.1.2. Frame

Frame.assign[key](value)
Frame.assign.loc[key](value)
Frame.assign.iloc[key](value)

Replace the values specified by the key with value.

Parameters
  • key – A selector, either a label, a list of labels, a slice of labels, or a Boolean array. The root __getitem__ takes loc labels, loc takes loc labels, and iloc takes integer indices. The root __getitem__ interface is a column selector; loc and iloc interfaces accept one or two arguments, for either row selection or row and column selection (respectively).

  • value – The value to be assigned. Can be a single value, an iterable of values, a Series, or a Frame.

Returns

static_frame.Frame

>>> f = sf.Frame.from_dict(dict(diameter=(12756, 6792, 142984), mass=(5.97, 0.642, 1898)), index=('Earth', 'Mars', 'Jupiter'), dtypes=dict(diameter=np.int64))
>>> f
<Frame>
<Index> diameter mass      <<U8>
<Index>
Earth   12756    5.97
Mars    6792     0.642
Jupiter 142984   1898.0
<<U7>   <int64>  <float64>
>>> f.assign['mass'](f['mass'] * .001)
<Frame>
<Index> diameter mass               <<U8>
<Index>
Earth   12756    0.00597
Mars    6792     0.000642
Jupiter 142984   1.8980000000000001
<<U7>   <int64>  <float64>
>>> f.assign.loc['Mars', 'mass'](0)
<Frame>
<Index> diameter mass      <<U8>
<Index>
Earth   12756    5.97
Mars    6792     0.0
Jupiter 142984   1898.0
<<U7>   <int64>  <float64>
>>> f.assign.loc['Mars':, 'diameter'](0)
<Frame>
<Index> diameter mass      <<U8>
<Index>
Earth   12756    5.97
Mars    0        0.642
Jupiter 0        1898.0
<<U7>   <int64>  <float64>
>>> f.assign.loc[f['diameter'] > 10000, 'mass'](0)
<Frame>
<Index> diameter mass      <<U8>
<Index>
Earth   12756    0.0
Mars    6792     0.642
Jupiter 142984   0.0
<<U7>   <int64>  <float64>

7.2. Dropping Data

While data from a Series or Frame can be excluded through common selection interfaces, in some cases it is more efficient and readable to specify what to drop rather than what to keep. The drop interfaces return new containers, efficiently removing the values specified by the key. For Frame, removal of rows and columns can happen simultaneously.

7.2.1. Series

Series.drop[key]
Series.drop.loc[key]
Series.drop.iloc[key]

Remove the values specified by the key.

Parameters

key – A selector, either a label, a list of labels, a slice of labels, or a Boolean array. The root __getitem__ takes loc labels, loc takes loc labels, and iloc takes integer indices.

Returns

static_frame.Series

>>> s = sf.Series((0, 0, 1, 2), index=('Mercury', 'Venus', 'Earth', 'Mars'), dtype=np.int64)
>>> s
<Series>
<Index>
Mercury  0
Venus    0
Earth    1
Mars     2
<<U7>    <int64>
>>> s.drop[s < 1]
<Series>
<Index>
Earth    1
Mars     2
<<U7>    <int64>
>>> s.drop[['Mercury', 'Mars']]
<Series>
<Index>
Venus    0
Earth    1
<<U7>    <int64>
>>> s.drop.iloc[-2:]
<Series>
<Index>
Mercury  0
Venus    0
<<U7>    <int64>

7.2.2. Frame

Frame.drop[key]
Frame.drop.loc[key]
Frame.drop.iloc[key]

Remove the values specified by the key.

Parameters

key – A selector, either a label, a list of labels, a slice of labels, or a Boolean array. The root __getitem__ takes loc labels, loc takes loc labels, and iloc takes integer indices. The root __getitem__ interface is a column selector; loc and iloc interfaces accept one or two arguments, for either row selection or row and column selection (respectively).

Returns

static_frame.Frame

>>> f = sf.Frame.from_dict(dict(diameter=(12756, 142984, 120536), temperature=(15, -110, -140)), index=('Earth', 'Jupiter', 'Saturn'), dtypes=dict(diameter=np.int64, temperature=np.int64))
>>> f
<Frame>
<Index> diameter temperature <<U11>
<Index>
Earth   12756    15
Jupiter 142984   -110
Saturn  120536   -140
<<U7>   <int64>  <int64>
>>> f.drop['diameter']
<Frame>
<Index> temperature <<U11>
<Index>
Earth   15
Jupiter -110
Saturn  -140
<<U7>   <int64>
>>> f.drop.loc[f['temperature'] < 0]
<Frame>
<Index> diameter temperature <<U11>
<Index>
Earth   12756    15
<<U7>   <int64>  <int64>
>>> f.drop.iloc[-1, -1]
<Frame>
<Index> diameter <<U11>
<Index>
Earth   12756
Jupiter 142984
<<U7>   <int64>

7.3. Masking Data

While Boolean Series and Frame can be created directly or with comparison operators (or functions like isin()), in some cases it is desirable to directly specify a mask through the common selection idioms.

7.3.1. Series

Series.mask[key]
Series.mask.loc[key]
Series.mask.iloc[key]

Mask (set to True) the values specified by the key and return a Boolean Series.

Parameters

key – A selector, either a label, a list of labels, a slice of labels, or a Boolean array. The root __getitem__ takes loc labels, loc takes loc labels, and iloc takes integer indices.

Returns

static_frame.Series

7.3.2. Frame

Frame.mask[key]
Frame.mask.loc[key]
Frame.mask.iloc[key]

Mask (set to True) the values specified by the key and return a Boolean Frame.

Parameters

key – A selector, either a label, a list of labels, a slice of labels, or a Boolean array. The root __getitem__ takes loc labels, loc takes loc labels, and iloc takes integer indices. The root __getitem__ interface is a column selector; loc and iloc interfaces accept one or two arguments, for either row selection or row and column selection (respectively).

Returns

static_frame.Frame

7.4. Creating a Masked Array

NumPy masked arrays permit blocking out problematic data (e.g., NaNs) while maintaining compatibility with nearly all NumPy operations.

https://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html
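
For readers unfamiliar with masked arrays, here is a small NumPy-only sketch of the idea: a NaN poisons an ordinary mean, while the masked equivalent skips the masked entry. The sample values are illustrative.

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0])

# an ordinary reduction propagates the NaN
plain_mean = values.mean()

# masked_invalid masks NaN and infinite entries
masked = np.ma.masked_invalid(values)
masked_mean = masked.mean()

print(plain_mean)            # nan
print(masked.mask.tolist())  # [False, True, False]
print(masked_mean)           # 2.0
```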

7.4.1. Series

Series.masked_array[key]
Series.masked_array.loc[key]
Series.masked_array.iloc[key]

Mask (set to True) the values specified by the key and return a NumPy MaskedArray.

Parameters

key – A selector, either a label, a list of labels, a slice of labels, or a Boolean array. The root __getitem__ takes loc labels, loc takes loc labels, and iloc takes integer indices.

Returns

np.ma.MaskedArray

7.4.2. Frame

Frame.masked_array[key]
Frame.masked_array.loc[key]
Frame.masked_array.iloc[key]

Mask (set to True) the values specified by the key and return a NumPy MaskedArray.

Parameters

key – A selector, either a label, a list of labels, a slice of labels, or a Boolean array. The root __getitem__ takes loc labels, loc takes loc labels, and iloc takes integer indices. The root __getitem__ interface is a column selector; loc and iloc interfaces accept one or two arguments, for either row selection or row and column selection (respectively).

Returns

np.ma.MaskedArray