Splitter Classes

linesep provides a set of classes (called splitters) for splitting strings that arrive in chunks, inspired by the IncrementalEncoder and IncrementalDecoder classes of the codecs module. Input is fed to a splitter instance one piece at a time, and the segments split from the input so far are (depending on the methods used) either returned immediately or else retrievable from the splitter afterwards. This is useful when you have a data source that is neither a string nor a filehandle.
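
For readers unfamiliar with the codecs classes mentioned above, the following sketch shows the incremental pattern that splitters borrow. It uses only the standard library and makes no linesep calls; the sample bytes are made up for illustration. Input arrives in pieces, incomplete data is held back until more arrives, and a final call flushes whatever remains.

```python
import codecs

# An incremental UTF-8 decoder from the standard library.
decoder = codecs.getincrementaldecoder("utf-8")()

# The two-byte encoding of "é" (0xC3 0xA9) is cut in half across the chunks.
chunks = [b"caf\xc3", b"\xa9 au lait"]

# The first call returns "caf" and holds back the lone 0xC3 byte;
# the second call completes the character.
pieces = [decoder.decode(chunk) for chunk in chunks]

# final=True signals end of input, much like a splitter's close().
pieces.append(decoder.decode(b"", final=True))

print(pieces)
```

A splitter applies the same idea to separators instead of byte sequences: a piece of input that might still grow is kept internally until the next piece, or the end of input, settles the question.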

If the input is in the form of an iterable, a splitter can be used to iterate over it and yield each segment:

>>> import linesep
>>> splitter = linesep.SeparatedSplitter("|", retain=True)
>>> input_data = ["one|two|thr", "ee|four|", "five||six"]
>>> for item in splitter.itersplit(input_data):
...     print(repr(item))
...
'one'
'|'
'two'
'|'
'three'
'|'
'four'
'|'
'five'
'|'
''
'|'
'six'

Alternatively, input can be provided to the splitter one piece at a time by passing it to the split() method, which returns all newly split-off items:

>>> splitter = linesep.TerminatedSplitter("\0", retain=False)
>>> splitter.split("foo\0bar\0baz")
['foo', 'bar']
>>> splitter.split("\0quux\0gnusto\0", final=True)
['baz', 'quux', 'gnusto']

At a lower level, input can be provided to the feed() method, and the output can be retrieved with get() or getall():

>>> splitter = linesep.UniversalNewlineSplitter(retain=True, translate=True)
>>> splitter.feed("foo\nbar\r\nbaz")
>>> splitter.nonempty
True
>>> splitter.get()
'foo\n'
>>> splitter.nonempty
True
>>> splitter.get()
'bar\n'
>>> splitter.nonempty
False
>>> splitter.get()
Traceback (most recent call last):
    ...
SplitterEmptyError: No items available in splitter
>>> splitter.close()
>>> splitter.nonempty
True
>>> splitter.get()
'baz'
>>> splitter.nonempty
False

As with the *_preceded, *_separated, and *_terminated functions, strings passed to splitters may be either binary or text. However, the input to a single instance of a splitter must be either all binary or all text, and the output type will match.

Splitters

class linesep.Splitter

New in version 0.4.0.

Abstract base class for all splitters. The abstract methods are an implementation detail; this class is exported only for isinstance() and typing purposes and should not be subclassed by users.

Splitter and its subclasses are generic in AnyStr; i.e., they should be written in type annotations as SplitterClass[AnyStr], SplitterClass[str], or SplitterClass[bytes], as appropriate.

feed(data: AnyStr) → None

Split input data. Any segments or separators extracted can afterwards be retrieved by calling get() or getall().

Raises

SplitterClosedError – if close() has already been called on this splitter

get() → AnyStr

Retrieve the next unfetched item that has been split from the input.

Raises

SplitterEmptyError – if there are no items currently available

property nonempty: bool

Whether a subsequent call to get() would return an item

getall() → list[AnyStr]

Retrieve all unfetched items that have been split from the input

split(data: AnyStr, final: bool = False) → list[AnyStr]

Split input data and return all items thus extracted. Set final to True if this is the last chunk of input.

Note that, if a previous call to feed() was not followed by enough calls to get() to retrieve all items, any items left over from the previous round of input will be prepended to the list returned by this method.

Raises

SplitterClosedError – if close() has already been called on this splitter

close() → None

Indicate to the splitter that the end of input has been reached. No further calls to feed() or split() may be made after calling this method unless reset() or setstate() is called in between.

Depending on the internal state, calling this method may cause more segments or separators to be split from unprocessed input; be sure to fetch them with get() or getall().
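
To illustrate why close() can release additional items, here is a minimal sketch of a terminated splitter written from scratch. ToyTerminatedSplitter is a hypothetical stand-in, not linesep's implementation, and it covers only the retain=False case with str input. The point it makes is that text after the last terminator cannot be emitted until the splitter knows that no more input is coming.

```python
from collections import deque


class ToyTerminatedSplitter:
    """Hypothetical illustration only; NOT how linesep is implemented."""

    def __init__(self, terminator: str) -> None:
        if not terminator:
            raise ValueError("terminator must be non-empty")
        self._terminator = terminator
        self._buffer = ""       # input not yet known to be a complete segment
        self._items = deque()   # complete segments waiting to be fetched
        self._closed = False

    def feed(self, data: str) -> None:
        if self._closed:
            raise ValueError("splitter is closed")
        self._buffer += data
        # Everything before the last terminator is complete; the rest
        # stays buffered because more input might extend it.
        *complete, self._buffer = self._buffer.split(self._terminator)
        self._items.extend(complete)

    def close(self) -> None:
        # End of input: the buffered remainder becomes the final segment.
        if not self._closed:
            self._closed = True
            if self._buffer:
                self._items.append(self._buffer)
                self._buffer = ""

    def getall(self) -> list:
        items = list(self._items)
        self._items.clear()
        return items


toy = ToyTerminatedSplitter("\0")
toy.feed("foo\0bar\0baz")
before_close = toy.getall()  # "baz" is still buffered at this point
toy.close()
after_close = toy.getall()   # close() released "baz"
print(before_close, after_close)
```

This mirrors the feed()/get()/close() example near the top of this page, where 'baz' becomes available only after close() is called.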

property closed: bool

Whether close() has been called on this splitter

reset() → None

Reset the splitter to its initial state, as though a new instance with the same parameters were constructed

getstate() → SplitterState

Retrieve a representation of the splitter’s current state

setstate(state: SplitterState) → None

Restore the state of the splitter to what it was when the corresponding getstate() call was made

itersplit(iterable: Iterable) → Iterator

Feed each element of iterable as input to the splitter and yield each item produced.

None of the splitter’s other methods should be called while iterating over the yielded values.

The splitter’s state is saved & reset before processing the iterable, and the saved state is restored at the end. If you break out of the resulting iterator early, the splitter will be in an undefined state unless & until you reset it.

async aitersplit(aiterable: AsyncIterable) → AsyncIterator

Like itersplit(), but for asynchronous iterables

class linesep.ParagraphSplitter(retain: bool = False, translate: bool = True)

New in version 0.5.0.

A splitter that splits segments terminated by one or more blank lines (i.e., lines containing only a line ending), where lines are terminated by the ASCII newline sequences "\n", "\r\n", and "\r".

Parameters
  • retain (bool) – Whether to include the trailing newlines in split items (True) or discard them (False, default)

  • translate (bool) – Whether to convert all newlines (both trailing and internal) to "\n" (True, default) or leave them as-is (False)

class linesep.PrecededSplitter(separator: AnyStr, retain: bool = False)

New in version 0.4.0.

A splitter that splits segments preceded by a given string.

A separator at the beginning of the input simply starts the first segment, and a separator at the end of the input creates an empty trailing segment. Two adjacent separators always create an empty segment between them.

Parameters
  • separator (AnyStr) – The string to split the input on

  • retain (bool) – Whether to include the separators in split items (True) or discard them (False, default)

Raises

ValueError – if separator is an empty string

class linesep.SeparatedSplitter(separator: AnyStr, retain: bool = False)

New in version 0.4.0.

A splitter that splits segments separated by a given string.

A separator at the beginning of the input creates an empty leading segment, and a separator at the end of the input creates an empty trailing segment. Two adjacent separators always create an empty segment between them.

Note that, when retain is true, separators are returned as separate items, alternating with segments (unlike TerminatedSplitter and PrecededSplitter, where separators are appended/prepended to the segments). In a list returned by split() or getall(), the segments will be the items at the even indices (starting at 0), and the separators will be at the odd indices (assuming you’re calling get() the right number of times and not leaving any output unfetched).
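
As a small illustration of the even/odd layout, the following snippet takes the items printed by the SeparatedSplitter("|", retain=True) example near the top of this page and separates them with plain list slicing. It makes no linesep calls; the list is copied by hand from that example's output.

```python
# Items yielded in the earlier itersplit() example, in order.
items = [
    "one", "|", "two", "|", "three", "|", "four", "|",
    "five", "|", "", "|", "six",
]

segments = items[::2]     # even indices: 0, 2, 4, ...
separators = items[1::2]  # odd indices: 1, 3, 5, ...

print(segments)
print(separators)
```

The slicing only lines up when the list starts at a segment, which is why leftover unfetched output from an earlier call would throw the indices off.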

Parameters
  • separator (AnyStr) – The string to split the input on

  • retain (bool) – Whether to include the separators in split items (True) or discard them (False, default)

Raises

ValueError – if separator is an empty string

class linesep.TerminatedSplitter(separator: AnyStr, retain: bool = False)

New in version 0.4.0.

A splitter that splits segments terminated by a given string.

A separator at the beginning of the input creates an empty leading segment, and a separator at the end of the input simply terminates the last segment. Two adjacent separators always create an empty segment between them.

Parameters
  • separator (AnyStr) – The string to split the input on

  • retain (bool) – Whether to include the separators in split items (True) or discard them (False, default)

Raises

ValueError – if separator is an empty string
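
The leading and trailing separator rules stated for PrecededSplitter, SeparatedSplitter, and TerminatedSplitter can be compared side by side with a rough, non-streaming model built on str.split(). The helper expected_segments() below is hypothetical and is not part of linesep. It assumes retain=False and that the whole input is available at once, and it is only meant to mirror the rules as described in the three class entries.

```python
def expected_segments(kind: str, text: str, sep: str) -> list[str]:
    """Hypothetical model of the documented rules; NOT a linesep function."""
    parts = text.split(sep)
    if kind == "separated":
        # Leading and trailing separators both produce empty segments.
        return parts
    if kind == "terminated":
        # A trailing separator merely ends the last segment.
        return parts[:-1] if parts[-1] == "" else parts
    if kind == "preceded":
        # A leading separator merely starts the first segment.
        return parts[1:] if parts[0] == "" else parts
    raise ValueError(f"unknown kind: {kind!r}")


text = "|a||b|"
separated = expected_segments("separated", text, "|")
terminated = expected_segments("terminated", text, "|")
preceded = expected_segments("preceded", text, "|")

print(separated)
print(terminated)
print(preceded)
```

In all three cases the two adjacent separators in the middle yield an empty segment; only the treatment of the first and last separators differs.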

class linesep.UnicodeNewlineSplitter(retain: bool = False, translate: bool = True)

New in version 0.5.0.

A splitter that splits segments terminated by the same set of line endings as recognized by the str.splitlines() method. Note that, unlike other splitters, this class is not generic and is only usable on str values, not bytes.

Parameters
  • retain (bool) – Whether to include the newlines in split items (True) or discard them (False, default)

  • translate (bool) – Whether to convert all retained newlines to "\n" (True, default) or leave them as-is (False)
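
For reference, the set of line endings recognized by str.splitlines() is wider than the three ASCII newline sequences. The snippet below uses only built-in string methods (no linesep calls) on a made-up sample string. It also shows that bytes.splitlines() recognizes only "\n", "\r\n", and "\r".

```python
# Sample text mixing ASCII newlines with vertical tab (\x0b),
# NEXT LINE (\x85), and LINE SEPARATOR (\u2028).
text = "one\ntwo\r\nthree\x0bfour\x85five\u2028six"

without_endings = text.splitlines()
with_endings = text.splitlines(keepends=True)

# The bytes method splits only on the ASCII newline sequences.
bytes_lines = text.encode("utf-8").splitlines()

print(without_endings)
print(with_endings)
print(len(bytes_lines))
```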

class linesep.UniversalNewlineSplitter(retain: bool = False, translate: bool = True)

New in version 0.4.0.

A splitter that splits segments terminated by the ASCII newline sequences "\n", "\r\n", and "\r".

Parameters
  • retain (bool) – Whether to include the newlines in split items (True) or discard them (False, default)

  • translate (bool) – Whether to convert all retained newlines to "\n" (True, default) or leave them as-is (False)

Utilities

linesep.get_newline_splitter(newline: Optional[str] = None, retain: bool = False) → Splitter[str]

New in version 0.4.0.

Return a splitter for splitting on newlines following the same rules as the newline option to open().

Specifically:

  • If newline is None, a splitter that splits on all ASCII newlines and converts them to "\n" is returned.

  • If newline is "" (the empty string), a splitter that splits on all ASCII newlines and leaves them as-is is returned.

  • If newline is "\n", "\r\n", or "\r", a splitter that splits on the given string is returned.

  • If newline is any other value, a ValueError is raised.

Note that this function is limited to splitting on strs and does not support bytes.
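
As a reminder of the open() rules being followed, the sketch below reads the same bytes through io.TextIOWrapper with different newline values. It uses only the standard library and makes no linesep calls; read_lines() is a hypothetical helper and the sample data is made up. Because file iteration keeps line endings, the results correspond to what one would expect with retain=True.

```python
import io
from typing import Optional


def read_lines(data: bytes, newline: Optional[str]) -> list:
    """Hypothetical helper: read data as text lines with the given newline mode."""
    with io.TextIOWrapper(io.BytesIO(data), encoding="ascii", newline=newline) as fp:
        return list(fp)


data = b"a\nb\r\nc\rd"

translated = read_lines(data, None)   # all ASCII newlines, converted to "\n"
untranslated = read_lines(data, "")   # all ASCII newlines, left as-is
cr_only = read_lines(data, "\r")      # only "\r" ends a line

# Any other newline value is rejected.
try:
    read_lines(data, "x")
except ValueError:
    rejected = True
else:
    rejected = False

print(translated, untranslated, cr_only, rejected)
```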

Parameters

retain (bool) – Whether the returned splitter should include the newlines in split items (True) or discard them (False, default)

class linesep.SplitterState

New in version 0.4.0.

A representation of the internal state of a splitter, returned by getstate(). This can be passed to setstate() to restore the splitter’s internal state to what it was previously.

A given SplitterState should only be passed to the setstate() method of a splitter of the same class and with the same constructor arguments as the splitter that produced the SplitterState; otherwise, the behavior is undefined.

Instances of this class should be treated as opaque objects and should not be inspected, nor should any observed property be relied upon to be the same in future library versions.

exception linesep.SplitterClosedError

Bases: ValueError

New in version 0.4.0.

Raised when feed() or split() is called on a splitter after its close() method is called

exception linesep.SplitterEmptyError

Bases: Exception

New in version 0.4.0.

Raised when get() is called on a splitter that does not have any unfetched items to return