Comparison to mPyPl
mPyPl is a project with certain similarities to
liquidata.
A major architectural difference is that mPyPl uses generators to pull data
through the pipeline, while liquidata uses coroutines to push the data
through the pipeline. This is because liquidata was designed to allow easy
bifurcation of flows into independent unsynchronized branches. (liquidata will
probably also support pull-pipelines in the future.) Both mPyPl and
liquidata support synchronized, named branches by sending compound objects
with named components through the flow. The two packages' approaches to
managing these names are markedly different.
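The pull/push distinction can be illustrated without either library. The following is a minimal plain-Python sketch, not the internals of mPyPl or liquidata: all names in it (`pull_square`, `push_map`, `push_filter`, `fork`, `sink`) are hypothetical helpers invented for the illustration. It shows why pushing makes it easy to split a flow into branches that receive different numbers of items.

```python
def numbers():
    yield from range(5)

# Pull style: each stage is a generator that asks its upstream for items.
def pull_square(upstream):
    for x in upstream:
        yield x * x

pulled = list(pull_square(numbers()))

# Push style: each stage is a coroutine that receives items via send()
# and forwards them to its downstream target(s).
def coroutine(fn):
    def start(*args, **kwargs):
        co = fn(*args, **kwargs)
        next(co)  # advance to the first yield so that send() can be used
        return co
    return start

@coroutine
def push_map(fn, target):
    while True:
        x = yield
        target.send(fn(x))

@coroutine
def push_filter(pred, target):
    while True:
        x = yield
        if pred(x):
            target.send(x)

@coroutine
def fork(*targets):
    # Bifurcation: every incoming item is pushed into each branch.
    while True:
        x = yield
        for t in targets:
            t.send(x)

@coroutine
def sink(collected):
    while True:
        collected.append((yield))

squares, evens = [], []
pipeline = fork(
    push_map(lambda x: x * x, sink(squares)),        # branch 1 keeps every item
    push_filter(lambda x: x % 2 == 0, sink(evens)),  # branch 2 drops odd items
)
for x in numbers():
    pipeline.send(x)
```

In the push version the two branches end up with different lengths, and neither has to wait for the other. In a pull design a single consumer drives the whole chain, which makes this kind of independent branching less natural.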
Here we compare and contrast the APIs provided by the two packages.
This example appears in the quickstart for mPyPl:
    import mPyPl as mp

    images = (
        mp.get_files('images',ext='.jpg')
        | mp.as_field('filename')
        | mp.apply('filename','image', lambda x: imread(x))
        | mp.apply('filename','date', get_date)
        | mp.apply(['image','date'],'result',lambda x: imprint(x[0],x[1]))
        | mp.select_field('result')
        | mp.as_list)
Here is its translation into liquidata:

    from liquidata import pipe, source, name, get, put

    images = pipe(
        get_files(...) >> source, name.filename,
        imread   * get.filename >> put.image,
        get_date * get.filename >> put.date,
        imprint  * get.image.date)
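To make the data flow concrete, here is a rough plain-Python sketch of what both pipelines compute. It uses neither library. The compound flow item is modelled as a `dict`, and `imread`, `get_date`, `imprint` and the list of filenames are hypothetical stand-ins so that the sketch runs without any image files.

```python
# Hypothetical stand-ins for the real image helpers, so the sketch runs anywhere.
def imread(filename):
    return f"<pixels of {filename}>"

def get_date(filename):
    return filename.split("_")[0]   # pretend the date is a filename prefix

def imprint(image, date):
    return f"{image} stamped {date}"

# Stands in for the files found by get_files (made-up names).
filenames = ["2020-01-01_a.jpg", "2020-02-02_b.jpg"]

results = []
for filename in filenames:
    item = {"filename": filename}               # name.filename
    item["image"] = imread(item["filename"])    # imread   * get.filename >> put.image
    item["date"] = get_date(item["filename"])   # get_date * get.filename >> put.date
    # imprint * get.image.date: no put, so only the result travels on
    results.append(imprint(item["image"], item["date"]))
```

The comments pair each step with the corresponding line of the liquidata version. The last step discards the `dict` and keeps only the value returned by `imprint`.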
Observations:
- liquidata highlights the high-level information about what happens in the pipeline: `get_files`, `imread`, `get_date`, `imprint`. In contrast, mPyPl buries it in the noise.
- liquidata avoids the use of strings as symbols.
- mPyPl provides a specific `get_files` utility; liquidata can work with any iterable source of files, but providing such sources is outside the scope of liquidata's goals.
- `mp.as_field('filename')` is equivalent to `name.filename`.
- `mp.apply` serves three purposes:
  - mapping a function over the stream data
  - selecting arguments from the compound flow items
  - placing the result back in the compound flow items

  In contrast, liquidata separates these concerns:
  - mapping is done by default: no need to ask for it
  - `get` selects arguments
  - `put` places results
- `mp.apply(['image', 'date'], 'result', lambda x: imprint(x[0],x[1]))`
  - creates an argument tuple containing `image` and `date`
  - uses a `lambda` to unpack the argument tuple into the call to `imprint`
  - puts the result back in the compound flow under the name `result`

  In contrast, in `imprint * get.image.date`
  - `get.image.date` creates an argument tuple
  - `*` unpacks the argument tuple into the call to `imprint`
  - the lack of `put` causes the result to continue downstream on its own: the other items in the compound flow are no longer needed!
- `mp.select_field('result')` translates to `get.result` in liquidata. It extracts the interesting item from the compound flow. In the liquidata version this step is not needed, because it was done implicitly in the previous step: by avoiding the use of `>> put.result`, the result continued down the pipe on its own, rather than being placed in the compound object along with everything else. That is to say, `imprint * get.image.date` is equivalent to

      imprint * get.image.date >> put.result, get.result
- `mp.as_list` collects the results in a list. The equivalent (which would be written `out(into(list))`) is missing from the liquidata version, because it is the default.
- `out(into(...))` is far more general than `mp.as_list`, as it will work with any callable that consumes iterables, such as `set`, `tuple`, `min`, `max`, `sum`, `sorted`, `collections.Counter`, ... including any and all that will be written in the future.
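The generality claimed in the last point rests on a plain Python fact: many callables accept any iterable. The sketch below illustrates the idea with a hypothetical helper named `collect`. It is not liquidata's `out` or `into`, only a stand-in showing how a single "hand the stream to a callable" mechanism covers all of these cases.

```python
from collections import Counter

def collect(stream, into=list):
    """Hand the whole stream to `into` and return whatever it produces."""
    return into(stream)

words = ["b", "a", "b", "c"]

as_list   = collect(words)                       # default, comparable to mp.as_list
as_set    = collect(words, into=set)
in_order  = collect(words, into=sorted)
smallest  = collect(words, into=min)
tally     = collect(words, into=Counter)
total_len = collect(map(len, words), into=sum)
```

Because `collect` only requires that `into` accept an iterable, a consumer written later can be used without changing the helper.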