Comparison to mPyPl
mPyPl is a project with certain similarities to liquidata.
A major architectural difference is that mPyPl uses generators to pull data
through the pipeline, while liquidata uses coroutines to push the data
through the pipeline. This is because liquidata was designed to allow easy
bifurcation of flows into independent unsynchronized branches. (liquidata will
probably also support pull-pipelines in the future.) Both mPyPl and liquidata
support synchronized, named branches by sending compound objects with named
components through the flow. The approaches of mPyPl and liquidata to managing
these names are markedly different.
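The pull/push distinction can be illustrated in plain Python. The following is a minimal sketch that uses neither package: `pull_pipeline`, `sink`, `mapper`, `fork` and `start` are hypothetical names chosen for illustration, not part of mPyPl or liquidata. It only shows why pushing with coroutines makes it easy to split a flow into independent branches.

```python
def pull_pipeline(source):
    """Pull style: the consumer iterates, and each generator asks upstream for data."""
    doubled = (x * 2 for x in source)   # nothing runs until someone iterates
    return (x + 1 for x in doubled)


def sink(collected):
    """Push-style end point: a coroutine that stores whatever is sent to it."""
    while True:
        collected.append((yield))


def mapper(fn, target):
    """Push-style stage: apply fn to each received item and send it downstream."""
    while True:
        target.send(fn((yield)))


def fork(*targets):
    """Push-style bifurcation: forward every received item to each branch."""
    while True:
        item = yield
        for target in targets:
            target.send(item)


def start(coroutine):
    """Advance a coroutine to its first yield so that it can accept send()."""
    next(coroutine)
    return coroutine


# Pull: the final consumer (list) drives the pipeline.
pulled = list(pull_pipeline(range(4)))

# Push: the source drives the pipeline, and a fork feeds two independent branches.
evens, squares = [], []
branch_a = start(mapper(lambda x: x * 2, start(sink(evens))))
branch_b = start(mapper(lambda x: x * x, start(sink(squares))))
root = start(fork(branch_a, branch_b))
for item in range(4):
    root.send(item)
```

In the push version each branch receives items directly from the fork, so the branches need not stay in step with each other: one could filter out items without affecting the other.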
Here we compare and contrast the APIs provided by the two packages.
This example appears in the quickstart for mPyPl:
import mPyPl as mp

images = (
    mp.get_files('images', ext='.jpg')
    | mp.as_field('filename')
    | mp.apply('filename', 'image', lambda x: imread(x))
    | mp.apply('filename', 'date', get_date)
    | mp.apply(['image', 'date'], 'result', lambda x: imprint(x[0], x[1]))
    | mp.select_field('result')
    | mp.as_list)
Here is its translation into liquidata:
from liquidata import pipe, source, name, get, put

images = pipe(
    get_files(...) >> source, name.filename,
    imread * get.filename >> put.image,
    get_date * get.filename >> put.date,
    imprint * get.image.date)
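Both listings above describe the same per-item computation. As a rough guide to what flows through the pipeline, here is a plain-Python sketch that uses neither package. The bodies of `imread`, `get_date` and `imprint` are hypothetical stand-ins invented for illustration (the real functions are not shown in the source), and `process` is likewise a made-up name.

```python
def imread(filename):
    """Hypothetical stand-in for an image loader."""
    return f"pixels<{filename}>"


def get_date(filename):
    """Hypothetical stand-in for a date lookup."""
    return filename.split("_")[0]


def imprint(image, date):
    """Hypothetical stand-in for stamping a date onto an image."""
    return f"{image}@{date}"


def process(filenames):
    results = []
    for filename in filenames:
        item = {"filename": filename}               # give the incoming value a name
        item["image"] = imread(item["filename"])    # select argument, map, put result back
        item["date"] = get_date(item["filename"])   # select argument, map, put result back
        args = (item["image"], item["date"])        # select two components as a tuple
        results.append(imprint(*args))              # unpack; the result travels on alone
    return results


images = process(["2020-01-01_a.jpg", "2021-06-30_b.jpg"])
```

Each item starts as a bare filename, becomes a compound object with named components, and ends as the bare result of `imprint`.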
Observations:
- liquidata highlights the high-level information about what happens in the pipeline: get_files, imread, get_date, imprint. In contrast, mPyPl buries it in the noise.
- liquidata avoids the use of strings as symbols.
- mPyPl provides a specific get_files utility; liquidata can work with any iterable source of files, but providing such sources is outside of the scope of liquidata's goals.
- mp.as_field('filename') is equivalent to name.filename.
- mp.apply serves three purposes:
  - mapping a function over the stream data
  - selecting arguments from the compound flow items
  - placing the result back in the compound flow items
  In contrast, liquidata separates these concerns:
  - mapping is done by default: no need to ask for it
  - get selects arguments
  - put places results
- mp.apply(['image', 'date'], 'result', lambda x: imprint(x[0],x[1]))
  - creates an argument tuple containing image and date
  - uses a lambda to unpack the argument tuple into the call to imprint
  - puts the result back in the compound flow under the name result
  In contrast, in imprint * get.image.date
  - get.image.date creates an argument tuple
  - the * operator unpacks the argument tuple into the call to imprint
  - the lack of put causes the result to continue downstream on its own: the other items in the compound flow are no longer needed!
- mp.select_field('result') translates to get.result in liquidata. It extracts the interesting item from the compound flow. In the liquidata version this step is not needed, because it was done implicitly in the previous step: by avoiding the use of >> put.result, the result continued down the pipe on its own, rather than being placed in the compound object along with everything else. That is to say, imprint * get.image.date is equivalent to
  imprint * get.image.date >> put.result, get.result
- mp.as_list collects the results in a list. The equivalent (which would be written out(into(list))) is missing from the liquidata version, because it's the default.
- out(into(...)) is far more general than mp.as_list, as it will work with any callable that consumes iterables, such as set, tuple, min, max, sum, sorted, collections.Counter, ... including any and all that will be written in the future.