What is this?
liquidata is a Python Embedded Domain Specific Language (EDSL) which aims to encourage and facilitate:
- increasing the signal-to-noise ratio in source code
- avoiding the use of strings to represent symbols in the API
- code reuse through composition of reusable and orthogonal components
- dataflow programming
- function composition
- lazy processing.
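The aim of avoiding strings for symbols can be illustrated with a small, stand-alone sketch of the general technique: attribute access that records names, so that something like `name.path.dirs.files` stands for the three symbols `path`, `dirs` and `files`. The class `Names` and its `bind` method are hypothetical, written for this illustration only; they are not liquidata's API or implementation.

```python
# A minimal sketch of the general technique (NOT liquidata's implementation):
# attribute access records symbol names, so no strings appear at the call site.
class Names:
    """Hypothetical helper: Names().path.dirs.files records ('path', 'dirs', 'files')."""

    def __init__(self, names=()):
        self._names = tuple(names)

    def __getattr__(self, attr):
        # Only called when normal lookup fails, so `_names` (instance attribute)
        # and `bind` (method) are never intercepted here.
        if attr.startswith('_'):
            # Leave private and dunder names to the normal attribute rules.
            raise AttributeError(attr)
        return Names(self._names + (attr,))

    def bind(self, values):
        # Pair the recorded names with positional values, e.g. one os.walk triple.
        values = tuple(values)
        if len(values) != len(self._names):
            raise ValueError(f'expected {len(self._names)} values, got {len(values)}')
        return dict(zip(self._names, values))


name = Names()
print(name.path.dirs.files.bind(('/tmp', ['sub'], ['a.py'])))
# {'path': '/tmp', 'dirs': ['sub'], 'files': ['a.py']}
```

One trade-off of this style shows up immediately: a symbol called `bind` cannot be recorded, because attribute lookup finds the method first.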
Why would I want this?
Dataflow networks
It can be helpful to think of your computations as flows through a network or graph of components. For example:
                candidates
                     |
               quick_screen
                     |
             expensive_screen -------.
                     |                \
                can dance ?        can sing ?
                     |                  |
                 hop test          pitch test
                     |                  |
                 skip test         rhythm test
                     |                  |
                 jump test              |
                     |                  |
                sum scores         sum scores
                     |                  |
             score above 210 ?  score above 140 ?
                     |                  |
              output dancers     output singers
The aim of liquidata is to let you express the idea laid out in the graph above in code that reflects the structure of the graph. A liquidata implementation of the graph might look something like this:
    select_candidates = pipe(
        { quick_screening },
        { expensive_screening },
        [ { can_sing },
          test_pitch,
          test_rhythm,
          sum_scores.pitch.rhythm,
          { score_above(140) },
          out.singers
        ],
        { can_dance },
        test_hop,
        test_skip,
        test_jump,
        sum_scores.hop.skip.jump,
        { score_above(210) },
        out.dancers)

    selected = select_candidates(candidates)

    # Do something with the results
    send_to_singer_committee(selected.singers)
    send_to_dancer_committee(selected.dancers)
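For comparison, here is one way the same graph might be written in plain Python, without liquidata. Everything the outline leaves unspecified is invented here and is hypothetical: candidates are dicts, the screens are trivial predicates, and the `test_*` steps are replaced by reading precomputed scores from each dict. What does mirror the graph is its shape: two filters, a branch that lets one candidate reach both outputs, per-branch scoring and thresholds, and two named outputs.

```python
# Plain-Python sketch of the candidate graph. All data, tests and thresholds
# are hypothetical stand-ins; only the shape of the flow mirrors the outline.
from types import SimpleNamespace


def quick_screening(c):
    return c['age'] >= 18


def expensive_screening(c):
    return c['references'] > 0


def can_sing(c):
    return 'pitch' in c


def can_dance(c):
    return 'hop' in c


def score_above(threshold):
    # Build a predicate, as `score_above(140)` does in the outline.
    return lambda total: total > threshold


def select_candidates(candidates):
    singers, dancers = [], []
    for c in candidates:
        # The two screens act as filters on the main line.
        if not (quick_screening(c) and expensive_screening(c)):
            continue
        # Branch: a candidate who passes the screens may reach both outputs.
        if can_sing(c) and score_above(140)(c['pitch'] + c['rhythm']):
            singers.append(c['name'])
        if can_dance(c) and score_above(210)(c['hop'] + c['skip'] + c['jump']):
            dancers.append(c['name'])
    # Named outputs, like `selected.singers` and `selected.dancers`.
    return SimpleNamespace(singers=singers, dancers=dancers)


candidates = [
    {'name': 'Ann', 'age': 20, 'references': 2, 'pitch': 80, 'rhythm': 70},
    {'name': 'Bob', 'age': 30, 'references': 1, 'hop': 80, 'skip': 70, 'jump': 75},
    {'name': 'Cy', 'age': 16, 'references': 3, 'pitch': 99, 'rhythm': 99},
]
selected = select_candidates(candidates)
print(selected.singers, selected.dancers)  # ['Ann'] ['Bob']
```

Note how the filtering, branching and scoring all end up tangled inside one loop body, whereas the outline keeps each step as a separate component.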
Function composition
If you feel that the signal is drowned out by the noise in code written like this
    for name in filenames:
        file_ = open(name)
        for line in file_:
            for word in line.split():
                print(word)
and that the intent is clearer in code presented like this
    pipe(source << filenames, open, join, str.split, join, sink(print))
then you might find liquidata interesting.
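To make the data flow of that one-liner concrete, here is a sketch of the same flow using only the standard library. It assumes that `join` flattens one level of nesting (a file into its lines, a list of words into individual words), which is what its placement after `open` and `str.split` suggests. Because `map` and `itertools.chain.from_iterable` are lazy, no file is opened until the first word is requested. The `opener` parameter is a hypothetical addition that makes the sketch easy to try without real files.

```python
# Stdlib-only sketch of the one-line pipe above, ASSUMING `join` flattens one
# level of nesting. Everything is lazy: files are opened only on demand.
import io
from itertools import chain


def words_in_files(filenames, opener=open):
    # `opener` is a hypothetical parameter, so the sketch can run without real files.
    lines = chain.from_iterable(map(opener, filenames))  # open, join
    return chain.from_iterable(map(str.split, lines))    # str.split, join


# Usage: in-memory "files" stand in for real ones.
fake_files = {'a.txt': 'hello world\nfoo\n', 'b.txt': 'bar baz\n'}
for word in words_in_files(fake_files, opener=lambda n: io.StringIO(fake_files[n])):
    print(word)  # sink(print)
```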
Still with me?
That was a trivial example. Let's have a look at something a little more involved.
If you are perfectly happy reading and writing code like this
    def keyword_frequency_loop(directories):
        counter = Counter()
        for directory in directories:
            for (path, dirs, files) in os.walk(directory):
                for filename in files:
                    if not filename.endswith('.py'):
                        continue
                    for line in open(os.path.join(path, filename)):
                        for name in line.split('#', maxsplit=1)[0].split():
                            if iskeyword(name):
                                counter[name] += 1
        return counter
then liquidata is probably not for you.
But if the last example leaves you wanting to extract the core meaning from the noise, and you feel that this
    all_files = os.walk, JOIN, NAME.path.dirs.files
    pick_python_files = GET.files * (JOIN, { use(str.endswith, '.py') }) >> PUT.filename
    file_contents = GET.path.filename * os.path.join, open, JOIN
    ignore_comments = use(str.split, '#', maxsplit=1), GET[0]
    pick_keywords = str.split, JOIN, { iskeyword }

    keyword_frequency_pipe = pipe(
        all_files,
        pick_python_files,
        file_contents,
        ignore_comments,
        pick_keywords,
        OUT(INTO(Counter)))
is a step in the right direction, and if you feel that abstraction should be as easy as getting the above version by extracting subsequences from this prototype
    keyword_frequency_pipe = pipe(
        os.walk, JOIN,
        NAME.path.dirs.files,
        GET.files * (JOIN, { use(str.endswith, '.py') }) >> PUT.filename,
        GET.path.filename * os.path.join,
        open, JOIN,
        use(str.split, '#', maxsplit=1),
        GET[0],
        str.split, JOIN,
        { iskeyword },
        OUT(INTO(Counter)))
then you might want to read on.
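The same decomposition can also be approximated without liquidata, using one generator function per named stage. The sketch below is a plain-Python illustration of that idea, not how liquidata works internally. The stage names are borrowed from the example above, but here `all_files` yields `(path, filename)` pairs rather than named records, and `keyword_frequency_generators` is a hypothetical name. Unlike the loop version, it closes each file once its lines have been read.

```python
# Plain-Python sketch (generators, no liquidata) of the same decomposition into
# small named stages. Each stage consumes an iterable and returns a lazy iterable.
import os
from collections import Counter
from keyword import iskeyword


def all_files(directories):
    # Yield a (path, filename) pair for every file below each directory.
    for directory in directories:
        for path, dirs, files in os.walk(directory):
            for filename in files:
                yield path, filename


def pick_python_files(pairs):
    return ((path, filename) for path, filename in pairs if filename.endswith('.py'))


def file_contents(pairs):
    # Yield the lines of each file, closing it once its lines are exhausted.
    for path, filename in pairs:
        with open(os.path.join(path, filename)) as file_:
            yield from file_


def ignore_comments(lines):
    return (line.split('#', maxsplit=1)[0] for line in lines)


def pick_keywords(lines):
    return (name for line in lines for name in line.split() if iskeyword(name))


def keyword_frequency_generators(directories):
    # Hypothetical name. Feed the output of each stage into the next one.
    stream = directories
    for stage in (all_files, pick_python_files, file_contents, ignore_comments, pick_keywords):
        stream = stage(stream)
    return Counter(stream)
```

Calling `keyword_frequency_generators(['.'])` should produce the same counts as `keyword_frequency_loop(['.'])`. The explicit loop over the stages is the kind of plumbing that `pipe` lets you leave out.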
Running these samples
- select_candidates is an outline of the solution, which omits details. As such, it is not executable.
- keyword_frequency_loop and both versions of keyword_frequency_pipe are complete, executable examples.
To run keyword_frequency_loop, you will need these imports:
    import os
    from keyword import iskeyword
    from collections import Counter
To run (either version of) keyword_frequency_pipe, you will additionally need to install liquidata and import from it like this:
    from liquidata import (pipe, name as NAME, get as GET, put as PUT,
                           join as JOIN, out as OUT, into as INTO, use)
(The liquidata components were uppercased in order to highlight them in the example.)