Detail: Batch: Accessor Regular Expression

Batch.via_re(pattern, flags, /).search(pos, endpos)

via_re = <function Batch.via_re>

InterfaceRe.search(pos=0, endpos=None)

Scan through the string looking for the first location where this regular expression produces a match; return True if one is found, else False. Note that this is different from finding a zero-length match at some point in the string.

Parameters:

pos – Gives an index in the string where the search is to start; it defaults to 0.
endpos – Limits how far the string will be searched; it will be as if the string is endpos characters long.

>>> import numpy as np
>>> import static_frame as sf
>>> bt = sf.Batch((('i', sf.Frame(np.arange(6).reshape(3,2), index=('p', 'q', 'r'), columns=('a', 'b'), name='x')), ('j', sf.Frame.from_fields(((10, 2, np.nan, 2), ('qrs ', 'XYZ', '', '123'), ('1517-01-01', '1517-04-01', 'NaT', '1517-04-01')), columns=('a', 'b', 'c'), dtypes=dict(c=np.datetime64), name='x'))))
>>> bt
<Batch max_workers=None>
>>> bt.via_re('[X123]').search().to_frame()
<Frame>
<Index>                   a      b      c        <<U1>
<IndexHierarchy>
i                p        False  True   nan
i                q        True   True   nan
i                r        False  False  nan
j                0        True   False  True
j                1        True   True   True
j                2        False  False  False
j                3        True   True   True
<<U1>            <object> <bool> <bool> <object>
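
The effect of pos and endpos can be sketched on plain strings with Python's standard re module. This is only an illustration, under the assumption that via_re applies a compiled pattern to the string form of each element; the sample values are taken from column b of Frame j above, and the variable names are hypothetical.

```python
import re

pattern = re.compile('[X123]')
values = ['qrs ', 'XYZ', '', '123']  # column b of Frame j

# One Boolean per element: did the pattern match anywhere in the string?
found = [pattern.search(v) is not None for v in values]
# [False, True, False, True]

# pos=1 starts the scan at index 1, so the leading 'X' of 'XYZ' is skipped.
found_from_1 = [pattern.search(v, pos=1) is not None for v in values]
# [False, False, False, True]

# endpos=0 treats every string as if it were zero characters long.
found_to_0 = [pattern.search(v, pos=0, endpos=0) is not None for v in values]
# [False, False, False, False]
```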

Batch.via_re(pattern, flags, /).match(pos, endpos)

via_re = <function Batch.via_re>

InterfaceRe.match(pos=0, endpos=None)

Return True if zero or more characters at the beginning of the string match this regular expression, else False. Note that this is different from a zero-length match.

Parameters:

pos – Gives an index in the string where the search is to start; it defaults to 0.
endpos – Limits how far the string will be searched; it will be as if the string is endpos characters long.

>>> bt = sf.Batch((('i', sf.Frame(np.arange(6).reshape(3,2), index=('p', 'q', 'r'), columns=('a', 'b'), name='x')), ('j', sf.Frame.from_fields(((10, 2, np.nan, 2), ('qrs ', 'XYZ', '', '123'), ('1517-01-01', '1517-04-01', 'NaT', '1517-04-01')), columns=('a', 'b', 'c'), dtypes=dict(c=np.datetime64), name='x'))))
>>> bt
<Batch max_workers=None>
>>> bt.via_re('[X123]').match().to_frame()
<Frame>
<Index>                   a      b      c        <<U1>
<IndexHierarchy>
i                p        False  True   nan
i                q        True   True   nan
i                r        False  False  nan
j                0        True   False  True
j                1        True   True   True
j                2        False  False  False
j                3        True   True   True
<<U1>            <object> <bool> <bool> <object>
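
The example data happens to give the same result for search() and match(). The difference shows up when the matching character is not at the start of the string. The following sketch uses only the standard re module on hypothetical sample strings, again assuming element-wise application of the compiled pattern.

```python
import re

pattern = re.compile('[X123]')
values = ['a1', '1a', 'XYZ']  # hypothetical sample strings

# search() accepts a match at any position.
searched = [pattern.search(v) is not None for v in values]
# [True, True, True]

# match() only accepts a match that begins at the start position.
matched = [pattern.match(v) is not None for v in values]
# [False, True, True]

# With pos=1, match() anchors at index 1 instead of index 0.
matched_pos1 = [pattern.match(v, pos=1) is not None for v in values]
# [True, False, False]
```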

Batch.via_re(pattern, flags, /).fullmatch(pos, endpos)

via_re = <function Batch.via_re>

InterfaceRe.fullmatch(pos=0, endpos=None)

If the whole string matches this regular expression, return True, else False. Note that this is different from a zero-length match.

Parameters:

pos – Gives an index in the string where the search is to start; it defaults to 0.
endpos – Limits how far the string will be searched; it will be as if the string is endpos characters long.

>>> bt = sf.Batch((('i', sf.Frame(np.arange(6).reshape(3,2), index=('p', 'q', 'r'), columns=('a', 'b'), name='x')), ('j', sf.Frame.from_fields(((10, 2, np.nan, 2), ('qrs ', 'XYZ', '', '123'), ('1517-01-01', '1517-04-01', 'NaT', '1517-04-01')), columns=('a', 'b', 'c'), dtypes=dict(c=np.datetime64), name='x'))))
>>> bt
<Batch max_workers=None>
>>> bt.via_re('123').fullmatch().to_frame()
<Frame>
<Index>                   a      b      c        <<U1>
<IndexHierarchy>
i                p        False  False  nan
i                q        False  False  nan
i                r        False  False  nan
j                0        False  False  False
j                1        False  False  False
j                2        False  False  False
j                3        False  True   False
<<U1>            <object> <bool> <bool> <object>
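
As a rough comparison of match() and fullmatch(), the sketch below uses the standard re module on hypothetical strings. It assumes, as before, that the accessor applies the compiled pattern to each element's string form.

```python
import re

pattern = re.compile('123')
values = ['123', '1234', '0123', '']  # hypothetical sample strings

# match() only requires the pattern to match at the beginning.
matched = [pattern.match(v) is not None for v in values]
# [True, True, False, False]

# fullmatch() requires the pattern to consume the entire string.
full = [pattern.fullmatch(v) is not None for v in values]
# [True, False, False, False]

# endpos=3 makes '1234' behave as if it were '123'.
full_endpos3 = [pattern.fullmatch(v, pos=0, endpos=3) is not None for v in values]
# [True, True, False, False]
```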

Batch.via_re(pattern, flags, /).split(maxsplit)

via_re = <function Batch.via_re>

InterfaceRe.split(maxsplit=0)

Split the string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern is also returned as part of the resulting tuple.

Parameters:

maxsplit – If nonzero, at most maxsplit splits occur, and the remainder of the string is returned as the final element of the tuple.

>>> bt = sf.Batch((('i', sf.Frame(np.arange(6).reshape(3,2), index=('p', 'q', 'r'), columns=('a', 'b'), name='x')), ('j', sf.Frame.from_fields(((10, 2, np.nan, 2), ('qrs ', 'XYZ', '', '123'), ('1517-01-01', '1517-04-01', 'NaT', '1517-04-01')), columns=('a', 'b', 'c'), dtypes=dict(c=np.datetime64), name='x'))))
>>> bt
<Batch max_workers=None>
>>> bt.via_re('[X123]').split().to_frame()
<Frame>
<Index>                   a           b                c                    <<U1>
<IndexHierarchy>
i                p        ('0',)      ('', '')         nan
i                q        ('', '')    ('', '')         nan
i                r        ('4',)      ('5',)           nan
j                0        ('', '0.0') ('qrs ',)        ('', '5', '7-0', '-…
j                1        ('', '.0')  ('', 'YZ')       ('', '5', '7-04-0',…
j                2        ('nan',)    ('',)            ('NaT',)
j                3        ('', '.0')  ('', '', '', '') ('', '5', '7-04-0',…
<<U1>            <object> <object>    <object>         <object>
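
The roles of maxsplit and of capturing parentheses can be sketched on a single string with the standard re module. The conversion to a tuple here is an assumption made to mirror the tuples shown in the output above; the patterns are hypothetical.

```python
import re

text = '1517-04-01'

# No capturing group: only the pieces between separators are returned.
plain = tuple(re.compile('-').split(text))
# ('1517', '04', '01')

# maxsplit=1: one split, with the remainder kept as the final element.
limited = tuple(re.compile('-').split(text, maxsplit=1))
# ('1517', '04-01')

# A capturing group puts the separator text into the result as well.
grouped = tuple(re.compile('(-)').split(text))
# ('1517', '-', '04', '-', '01')
```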

Batch.via_re(pattern, flags, /).findall(pos, endpos)

via_re = <function Batch.via_re>

InterfaceRe.findall(pos=0, endpos=None)

Return all non-overlapping matches of pattern in string, as a tuple of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a tuple of groups; this will be a tuple of tuples if the pattern has more than one group. Empty matches are included in the result.

Parameters:

pos – Gives an index in the string where the search is to start; it defaults to 0.
endpos – Limits how far the string will be searched; it will be as if the string is endpos characters long.

>>> bt = sf.Batch((('i', sf.Frame(np.arange(6).reshape(3,2), index=('p', 'q', 'r'), columns=('a', 'b'), name='x')), ('j', sf.Frame.from_fields(((10, 2, np.nan, 2), ('qrs ', 'XYZ', '', '123'), ('1517-01-01', '1517-04-01', 'NaT', '1517-04-01')), columns=('a', 'b', 'c'), dtypes=dict(c=np.datetime64), name='x'))))
>>> bt
<Batch max_workers=None>
>>> bt.via_re('[X123]').findall().to_frame()
<Frame>
<Index>                   a        b               c                    <<U1>
<IndexHierarchy>
i                p        ()       ('1',)          nan
i                q        ('2',)   ('3',)          nan
i                r        ()       ()              nan
j                0        ('1',)   ()              ('1', '1', '1', '1')
j                1        ('2',)   ('X',)          ('1', '1', '1')
j                2        ()       ()              ()
j                3        ('2',)   ('1', '2', '3') ('1', '1', '1')
<<U1>            <object> <object> <object>        <object>
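
How groups change the shape of the result can be sketched with the standard re module on one string. The patterns below are hypothetical, and wrapping the result in a tuple is an assumption made to mirror the output shown above.

```python
import re

text = '1517-04-01'

# No groups: each element is the full text of a match.
no_group = tuple(re.compile(r'\d+').findall(text))
# ('1517', '04', '01')

# One group: each element is the text of that group only.
one_group = tuple(re.compile(r'-(\d+)').findall(text))
# ('04', '01')

# Two groups: each element is itself a tuple, one entry per group.
two_groups = tuple(re.compile(r'(\d)(\d)-').findall(text))
# (('1', '7'), ('0', '4'))

# pos=5 starts scanning after the first hyphen.
from_pos = tuple(re.compile(r'\d+').findall(text, pos=5))
# ('04', '01')
```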

Batch.via_re(pattern, flags, /).sub(repl, count)

via_re = <function Batch.via_re>

InterfaceRe.sub(repl, count=0)

Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern is not found, the string is returned unchanged.

Parameters:

repl – A string or a function; if it is a string, any backslash escapes in it are processed.
count – The maximum number of pattern occurrences to be replaced; must be a non-negative integer. If omitted or zero, all occurrences are replaced.

>>> bt = sf.Batch((('i', sf.Frame(np.arange(6).reshape(3,2), index=('p', 'q', 'r'), columns=('a', 'b'), name='x')), ('j', sf.Frame.from_fields(((10, 2, np.nan, 2), ('qrs ', 'XYZ', '', '123'), ('1517-01-01', '1517-04-01', 'NaT', '1517-04-01')), columns=('a', 'b', 'c'), dtypes=dict(c=np.datetime64), name='x'))))
>>> bt
<Batch max_workers=None>
>>> bt.via_re('[X123]').sub('==').to_frame()
<Frame>
<Index>                   a     b      c              <<U1>
<IndexHierarchy>
i                p        0     ==     nan
i                q        ==    ==     nan
i                r        4     5      nan
j                0        ==0.0 qrs    ==5==7-0==-0==
j                1        ==.0  ==YZ   ==5==7-04-0==
j                2        nan          NaT
j                3        ==.0  ====== ==5==7-04-0==
<<U1>            <object> <<U5> <<U6>  <object>
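
The two forms of repl and the count limit can be sketched with the standard re module. This assumes the accessor forwards repl and count to the compiled pattern for each element; the callable and the inputs other than '123' are hypothetical.

```python
import re

pattern = re.compile('[X123]')

# count omitted: every occurrence is replaced.
all_replaced = pattern.sub('==', '123')
# '======'

# count=1: only the leftmost occurrence is replaced.
first_only = pattern.sub('==', '123', count=1)
# '==23'

# A function repl receives each match object and returns the replacement text.
bracketed = pattern.sub(lambda m: f'<{m.group(0)}>', 'XYZ 12')
# '<X>YZ <1><2>'

# Backslash escapes in a string repl are processed: \g<0> is the whole match.
doubled = pattern.sub(r'\g<0>\g<0>', 'XYZ')
# 'XXYZ'
```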

Batch.via_re(pattern, flags, /).subn(repl, count)

via_re = <function Batch.via_re>

InterfaceRe.subn(repl, count=0)

Perform the same operation as sub(), but return a tuple (new_string, number_of_subs_made).

Parameters:

repl – A string or a function; if it is a string, any backslash escapes in it are processed.
count – The maximum number of pattern occurrences to be replaced; must be a non-negative integer. If omitted or zero, all occurrences are replaced.

>>> bt = sf.Batch((('i', sf.Frame(np.arange(6).reshape(3,2), index=('p', 'q', 'r'), columns=('a', 'b'), name='x')), ('j', sf.Frame.from_fields(((10, 2, np.nan, 2), ('qrs ', 'XYZ', '', '123'), ('1517-01-01', '1517-04-01', 'NaT', '1517-04-01')), columns=('a', 'b', 'c'), dtypes=dict(c=np.datetime64), name='x'))))
>>> bt
<Batch max_workers=None>
>>> bt.via_re('[X123]').subn('==', 1).to_frame()
<Frame>
<Index>                   a            b           c                  <<U1>
<IndexHierarchy>
i                p        ('0', 0)     ('==', 1)   nan
i                q        ('==', 1)    ('==', 1)   nan
i                r        ('4', 0)     ('5', 0)    nan
j                0        ('==0.0', 1) ('qrs ', 0) ('==517-01-01', 1)
j                1        ('==.0', 1)  ('==YZ', 1) ('==517-04-01', 1)
j                2        ('nan', 0)   ('', 0)     ('NaT', 0)
j                3        ('==.0', 1)  ('==23', 1) ('==517-04-01', 1)
<<U1>            <object> <object>     <object>    <object>
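
The (new_string, number_of_subs_made) pairs can be reproduced for column b of Frame j with the standard re module. This is a sketch under the assumption that each element is processed independently as a string.

```python
import re

pattern = re.compile('[X123]')
values = ['qrs ', 'XYZ', '', '123']  # column b of Frame j

# count=1 caps the number of substitutions per string at one.
limited = [pattern.subn('==', v, count=1) for v in values]
# [('qrs ', 0), ('==YZ', 1), ('', 0), ('==23', 1)]

# Without count, the second element reports every substitution made.
unlimited = [pattern.subn('==', v) for v in values]
# [('qrs ', 0), ('==YZ', 1), ('', 0), ('======', 3)]
```

The count in each pair tells apart a string that was left unchanged because nothing matched from one whose replacement happens to equal the original text.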