Splitter Classes
linesep provides a set of classes (called splitters) for splitting strings into chunks, inspired by the IncrementalEncoder and IncrementalDecoder classes of the codecs module. Input is fed to a splitter instance one piece at a time, and the segments split from the input so far are (depending on the methods used) either returned immediately or else retrievable from the splitter afterwards. This is useful when you have a data source that is neither a string nor a filehandle.
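For readers unfamiliar with the codecs classes that inspired this design, the following standard-library sketch shows the same incremental pattern: input arrives in arbitrary pieces, and the object holds back whatever it cannot yet emit. The chunk boundaries here are made up for illustration.

```python
import codecs

# An incremental decoder keeps partial input between calls, just as a
# splitter keeps a partial segment until its end has been seen.
decoder = codecs.getincrementaldecoder("utf-8")()

# Illustrative chunks: the two-byte encoding of "é" is cut in half.
chunks = [b"caf", b"\xc3", b"\xa9 au lait"]

pieces = [decoder.decode(chunk) for chunk in chunks]
# Signal the end of input so that anything still buffered is flushed.
pieces.append(decoder.decode(b"", final=True))

text = "".join(pieces)
print(pieces)
print(text)
```

The second call returns an empty string because the decoder is still waiting for the rest of the character, which is comparable to a splitter returning nothing until a separator arrives.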
If the input is in the form of an iterable, a splitter can be used to iterate over it and yield each segment:
>>> import linesep
>>> splitter = linesep.SeparatedSplitter("|", retain=True)
>>> input_data = ["one|two|thr", "ee|four|", "five||six"]
>>> for item in splitter.itersplit(input_data):
...     print(repr(item))
...
'one'
'|'
'two'
'|'
'three'
'|'
'four'
'|'
'five'
'|'
''
'|'
'six'
Alternatively, input can be provided to the splitter one piece at a time by passing it to the split() method, which returns all newly split-off items:
>>> splitter = linesep.TerminatedSplitter("\0", retain=False)
>>> splitter.split("foo\0bar\0baz")
['foo', 'bar']
>>> splitter.split("\0quux\0gnusto\0", final=True)
['baz', 'quux', 'gnusto']
At a lower level, input can be provided to the feed() method, and the output can be retrieved with get() or getall():
>>> splitter = linesep.UniversalNewlineSplitter(retain=True, translate=True)
>>> splitter.feed("foo\nbar\r\nbaz")
>>> splitter.nonempty
True
>>> splitter.get()
'foo\n'
>>> splitter.nonempty
True
>>> splitter.get()
'bar\n'
>>> splitter.nonempty
False
>>> splitter.get()
Traceback (most recent call last):
...
SplitterEmptyError: No items available in splitter
>>> splitter.close()
>>> splitter.nonempty
True
>>> splitter.get()
'baz'
>>> splitter.nonempty
False
As with the *_preceded, *_separated, and *_terminated functions, strings passed to splitters may be either binary or text. However, the input to a single instance of a splitter must be either all binary or all text, and the output type will match.
Splitters
- class linesep.Splitter
New in version 0.4.0.
Abstract base class for all splitters. The abstract methods are an implementation detail; this class is exported only for isinstance() and typing purposes and should not be subclassed by users.
Splitter and its subclasses are generic in AnyStr; i.e., they should be written in type annotations as SplitterClass[AnyStr], SplitterClass[str], or SplitterClass[bytes], as appropriate.
- feed(data: AnyStr) -> None
Split input data. Any segments or separators extracted can afterwards be retrieved by calling get() or getall().
- Raises
SplitterClosedError – if close() has already been called on this splitter
- get() -> AnyStr
Retrieve the next unfetched item that has been split from the input.
- Raises
SplitterEmptyError – if there are no items currently available
- split(data: AnyStr, final: bool = False) -> list[AnyStr]
Split input data and return all items thus extracted. Set final to True if this is the last chunk of input.
Note that, if a previous call to feed() was not followed by enough calls to get() to retrieve all items, any items left over from the previous round of input will be prepended to the list returned by this method.
- Raises
SplitterClosedError – if close() has already been called on this splitter
- close() -> None
Indicate to the splitter that the end of input has been reached. No further calls to feed() or split() may be made after calling this method unless reset() or setstate() is called in between.
Depending on the internal state, calling this method may cause more segments or separators to be split from unprocessed input; be sure to fetch them with get() or getall().
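To make it concrete why close() can release additional items, here is a minimal sketch of an incremental splitter. TinyTerminatedSplitter is a hypothetical stand-in written for this illustration; it is not linesep's implementation and handles only str input without the retain option.

```python
from collections import deque


class TinyTerminatedSplitter:
    """Hypothetical illustration only; not how linesep is implemented."""

    def __init__(self, separator):
        if not separator:
            raise ValueError("separator must be non-empty")
        self._sep = separator
        self._buf = ""          # input not yet known to be a complete segment
        self._items = deque()   # complete segments waiting to be fetched
        self._closed = False

    def feed(self, data):
        if self._closed:
            raise RuntimeError("splitter is closed")
        self._buf += data
        # Everything before the last separator is complete; the remainder
        # stays buffered because more of it may still arrive.
        *done, self._buf = self._buf.split(self._sep)
        self._items.extend(done)

    def close(self):
        self._closed = True
        # End of input: an unterminated remainder becomes the final segment.
        if self._buf:
            self._items.append(self._buf)
            self._buf = ""

    def getall(self):
        items = list(self._items)
        self._items.clear()
        return items


splitter = TinyTerminatedSplitter("\r\n")
splitter.feed("foo\r")          # the separator is cut in half: nothing yet
first = splitter.getall()
splitter.feed("\nbar")          # the separator completes, releasing "foo"
second = splitter.getall()
splitter.close()                # "bar" is only released at end of input
third = splitter.getall()
print(first, second, third)
```

The buffered remainder is the "internal state" at issue: until close() is called, the sketch cannot know whether "bar" is a whole segment or the start of a longer one.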
- reset() -> None
Reset the splitter to its initial state, as though a new instance with the same parameters were constructed.
- getstate() -> SplitterState
Retrieve a representation of the splitter’s current state.
- setstate(state: SplitterState) -> None
Restore the state of the splitter to what it was when the corresponding getstate() call was made.
- itersplit(iterable: Iterable) -> Iterator
Feed each element of iterable as input to the splitter and yield each item produced.
None of the splitter’s other methods should be called while iterating over the yielded values.
The splitter’s state is saved and reset before processing the iterable, and the saved state is restored at the end. If you break out of the resulting iterator early, the splitter will be in an undefined state until you reset it.
- async aitersplit(aiterable: AsyncIterable) -> AsyncIterator
Like itersplit(), but for asynchronous iterators.
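The shape of asynchronous iteration over chunks can be sketched with the standard library alone. The names aiter_terminated and produce below are hypothetical helpers invented for this illustration, not part of linesep; the sketch assumes str input and terminated-style splitting without retained separators.

```python
import asyncio
from typing import AsyncIterable, AsyncIterator


async def aiter_terminated(achunks: AsyncIterable[str], sep: str) -> AsyncIterator[str]:
    """Hypothetical helper: yield sep-terminated segments from async chunks."""
    buf = ""
    async for chunk in achunks:
        buf += chunk
        # Keep the possibly incomplete tail; emit everything before it.
        *done, buf = buf.split(sep)
        for item in done:
            yield item
    if buf:
        # The source is exhausted, so the tail is the last segment.
        yield buf


async def produce() -> AsyncIterator[str]:
    """Stand-in for a real asynchronous data source."""
    for chunk in ["foo\0bar\0baz", "\0quux\0gnusto\0"]:
        await asyncio.sleep(0)
        yield chunk


async def main() -> list:
    return [item async for item in aiter_terminated(produce(), "\0")]


result = asyncio.run(main())
print(result)
```

The chunks are the same ones used in the split() example earlier in this document, and the collected items come out in the same order.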
- class linesep.ParagraphSplitter(retain: bool = False, translate: bool = True)
New in version 0.5.0.
A splitter that splits segments terminated by one or more blank lines (i.e., lines containing only a line ending), where lines are terminated by the ASCII newline sequences "\n", "\r\n", and "\r".
- class linesep.PrecededSplitter(separator: AnyStr, retain: bool = False)
New in version 0.4.0.
A splitter that splits segments preceded by a given string.
A separator at the beginning of the input simply starts the first segment, and a separator at the end of the input creates an empty trailing segment. Two adjacent separators always create an empty segment between them.
- Parameters
- Raises
ValueError – if separator is an empty string
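The edge cases described for preceded segments can be sketched for a complete string as follows. preceded_segments is a hypothetical helper that follows the prose above with separators dropped; it is not the library's code, and the result for empty input is an assumption.

```python
def preceded_segments(s: str, sep: str) -> list:
    """Hypothetical whole-string sketch of "preceded" splitting, separators dropped."""
    if not sep:
        raise ValueError("separator must be non-empty")
    if not s:
        return []  # assumption: empty input yields no segments
    parts = s.split(sep)
    if s.startswith(sep):
        # A leading separator merely starts the first segment, so the
        # empty string that str.split() puts in front of it is dropped.
        parts = parts[1:]
    return parts


leading_and_trailing = preceded_segments("|a||b|", "|")
no_leading = preceded_segments("a|b", "|")
print(leading_and_trailing, no_leading)
```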
- class linesep.SeparatedSplitter(separator: AnyStr, retain: bool = False)
New in version 0.4.0.
A splitter that splits segments separated by a given string.
A separator at the beginning of the input creates an empty leading segment, and a separator at the end of the input creates an empty trailing segment. Two adjacent separators always create an empty segment between them.
Note that, when retain is true, separators are returned as separate items, alternating with segments (unlike TerminatedSplitter and PrecededSplitter, where separators are appended/prepended to the segments). In a list returned by split() or getall(), the segments will be the items at the even indices (starting at 0), and the separators will be at the odd indices (assuming you’re calling get() the right number of times and not leaving any output unfetched).
- Parameters
- Raises
ValueError – if separator is an empty string
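The even/odd layout lends itself to extended slicing. The list below is copied from the itersplit() example output near the top of this document (SeparatedSplitter("|", retain=True)) rather than produced by calling the library, so the sketch runs with plain Python.

```python
# Items as listed in the itersplit() example for
# SeparatedSplitter("|", retain=True).
items = ["one", "|", "two", "|", "three", "|", "four", "|",
         "five", "|", "", "|", "six"]

segments = items[0::2]     # even indices: the segments
separators = items[1::2]   # odd indices: the separators

# With retain=True nothing is discarded, so joining the items
# reconstructs the concatenated input.
rebuilt = "".join(items)

print(segments)
print(separators)
print(rebuilt)
```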
- class linesep.TerminatedSplitter(separator: AnyStr, retain: bool = False)
New in version 0.4.0.
A splitter that splits segments terminated by a given string.
A separator at the beginning of the input creates an empty leading segment, and a separator at the end of the input simply terminates the last segment. Two adjacent separators always create an empty segment between them.
- Parameters
- Raises
ValueError – if separator is an empty string
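For comparison, the terminated rules can be sketched for a complete string in a few lines. terminated_segments is a hypothetical helper that follows the prose above with separators dropped; it is not the library's code.

```python
def terminated_segments(s: str, sep: str) -> list:
    """Hypothetical whole-string sketch of "terminated" splitting, separators dropped."""
    if not sep:
        raise ValueError("separator must be non-empty")
    parts = s.split(sep)
    if parts[-1] == "":
        # A trailing separator only terminates the last segment; it does
        # not open a new, empty one.
        parts.pop()
    return parts


with_leading = terminated_segments("|a||b|", "|")
unterminated = terminated_segments("a|b", "|")
empty = terminated_segments("", "|")
print(with_leading, unterminated, empty)
```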
- class linesep.UnicodeNewlineSplitter(retain: bool = False, translate: bool = True)
New in version 0.5.0.
A splitter that splits segments terminated by the same set of line endings as recognized by the str.splitlines() method. Note that, unlike other splitters, this class is not generic and is only usable on str values, not bytes.
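Since the set of line endings is defined by reference to str.splitlines(), a quick standard-library check shows what that set includes beyond the ASCII newlines, and why there is no direct bytes equivalent. The sample strings are made up for illustration.

```python
# str.splitlines() recognizes more than "\n", "\r\n", and "\r"; for
# example, vertical tab (\x0b) and LINE SEPARATOR (\u2028) also end lines.
text = "one\ntwo\r\nthree\x0bfour\u2028five"
str_lines = text.splitlines()

# bytes.splitlines() only splits on the ASCII newline sequences, so the
# vertical tab stays inside the last item.
data = b"one\ntwo\r\nthree\x0bfour"
bytes_lines = data.splitlines()

print(str_lines)
print(bytes_lines)
```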
Utilities
- linesep.get_newline_splitter(newline: Optional[str] = None, retain: bool = False) -> Splitter[str]
New in version 0.4.0.
Return a splitter for splitting on newlines following the same rules as the newline option to open().
Specifically:
- If newline is None, a splitter that splits on all ASCII newlines and converts them to "\n" is returned.
- If newline is "" (the empty string), a splitter that splits on all ASCII newlines and leaves them as-is is returned.
- If newline is "\n", "\r\n", or "\r", a splitter that splits on the given string is returned.
- If newline is any other value, a ValueError is raised.
Note that this function is limited to splitting on strs and does not support bytes.
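The rules being mirrored are those of the newline option itself, which can be observed with the standard library. This sketch uses io.TextIOWrapper, which accepts the same newline argument as open(), over an in-memory buffer; read_lines and the sample data are illustrative, and the sketch shows the reference behavior rather than calling get_newline_splitter().

```python
import io

RAW = b"a\r\nb\rc\n"  # illustrative data mixing all three ASCII newlines


def read_lines(newline):
    """Read RAW as text lines under the given newline mode."""
    with io.TextIOWrapper(io.BytesIO(RAW), encoding="ascii", newline=newline) as fp:
        return fp.readlines()


translated = read_lines(None)     # split on every newline, translate to "\n"
untranslated = read_lines("")     # split on every newline, keep as-is
crlf_only = read_lines("\r\n")    # split only on the given string

print(translated)
print(untranslated)
print(crlf_only)
```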
- class linesep.SplitterState
New in version 0.4.0.
A representation of the internal state of a splitter, returned by getstate(). This can be passed to setstate() to restore the splitter’s internal state to what it was previously.
A given SplitterState should only be passed to the setstate() method of a splitter of the same class and with the same constructor arguments as the splitter that produced the SplitterState; otherwise, the behavior is undefined.
Instances of this class should be treated as opaque objects and should not be inspected, nor should any observed property be relied upon to be the same in future library versions.