This seems much like pipelines in a Unix shell, and I have also seen long pipeline expressions in Dart since it has streams built in.
Pipeline expressions can be convenient, but the intermediate results are implicit, so it can be quite difficult to tell what data is flowing between one stage and the next. When working on the shell you often build them up interactively, so you see the output of each stage in the pipeline before adding another stage to the end.
But a code reviewer can't see that when they are reading a pipeline expression in checked-in source code. It's often a good idea to use local variables to name intermediate results, and perhaps write out their types explicitly if it helps readability.
It's basically the same problem as with any other giant expression. Expressions can be quite dense and it's important not to get carried away. A sequence of variable assignments lets you understand the code one step at a time. In some cases, a helper function can hide a complicated expression.
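As a rough sketch of that trade-off (in Raku, since that is the language of the post; the input data and the names @scores, @parsed, @ranked, and @names are made up for illustration), here is the same computation written once as a pipeline and once with named intermediate results:

```raku
# Hypothetical input: one "name score" pair per string.
my @scores = 'ann 3', 'bob 1', 'cy 2';

# Pipeline form: what flows between the stages is implicit.
@scores
    ==> map({ .words.List })
    ==> sort({ .[1].Int })
    ==> map({ .[0] })
    ==> my @piped;

# Named form: each intermediate result can be read (or printed) on its own.
my @parsed = @scores.map({ .words.List });   # (name, score) pairs
my @ranked = @parsed.sort({ .[1].Int });     # lowest score first
my @names  = @ranked.map({ .[0] });          # names only

say @piped;
say @names;
```

Both forms are meant to produce the same list of names; the difference is only in how much of the intermediate state a reader gets to see.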
I both agree and disagree with you here. On one hand, I agree that
> It's often a good idea to use local variables to name intermediate results, and perhaps write out their types explicitly if it helps readability.
Although I didn't make this point explicitly in the post, that's part of what I was getting at when thinking about "names as emphasis". If everything has a name, then naming an intermediate result doesn't call any attention to it. But when just a few things have a name, adding an intermediate named variable can offer a nice pause point for the reader, and emphasize that the data is now in a meaningful state for them to consider.
> When working on the shell you often build them up interactively, so you see the output of each stage in the pipeline before adding another stage to the end. … But a code reviewer can't see that when they are reading a pipeline expression in checked-in source code.
I'd hope that they can see that it's a pipeline based on the syntax/formatting of the code and can inspect the output at various points through the pipeline (either by running the program or via tests). With a shell pipeline, that might involve the tee command; with Raku ==> chains, it would probably involve adding ==> { .say; $_ }() (which prints its input and then passes it on unmodified).
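A minimal sketch of that tapping idiom, using the ==> { .say; $_ }() stage exactly as given above (the word list and the variable names are hypothetical):

```raku
# Hypothetical data for illustration.
my @words = <pear apple fig>;

@words
    ==> sort()
    ==> { .say; $_ }()     # tap: print what the sort stage produced, then pass it on
    ==> map({ .uc })
    ==> my @shouted;

say @shouted;
```

Deleting the tap stage again should leave the rest of the pipeline unchanged, which is roughly what removing a tee from a shell pipeline does.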
Yes, agreed.
If you can modify the program easily, then you can look at intermediate output. This is part of the debugging process. But in my experience, most code reviewers don’t run the program; they just read it.
If you want a test that checks the intermediate output, there would need to be a way for the test to get at it, which would probably mean breaking up the pipeline into multiple functions somehow.
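One way that could look, as a sketch only: the pipeline is split into two small subs so that a test can assert on the value passed between them. The sub names parse-scores and rank and the sample data are hypothetical, not taken from the post.

```raku
use Test;

# Hypothetical stages, split out so a test can reach the value between them.
sub parse-scores(@lines) { @lines.map({ .words.List }).List }
sub rank(@pairs)         { @pairs.sort({ .[1].Int }).map({ .[0] }).List }

my @lines = 'ann 3', 'bob 1', 'cy 2';

# The intermediate result is now something a test can check directly.
is parse-scores(@lines).elems, 3, 'one pair per input line';
is parse-scores(@lines)[1].join(' '), 'bob 1', 'pair keeps name and score';

# The end-to-end result is still checked as before.
is rank(parse-scores(@lines)).join(','), 'bob,cy,ann', 'lowest score first';

done-testing;
```

The cost is that each stage now has a name and a signature, which is exactly the extra naming the pipeline style was trying to avoid.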
This was a nice read, but the thing that strikes me about the point free style is that it seems like there is a lot of shifting of load around that ultimately I don’t think fulfills the stated aim:
> Programming in a pointfree style can make code far more readable; done correctly, it makes code less obscure rather than more.
The reason I think Raku is often derided as line noise or "write-only" is that programming in Raku fluently requires first learning a lot of new symbols and idioms (i.e., reserved names and syntactic sugar that the language provides). That is to say, Raku allows you to avoid using your own names in many situations because the built-in affordances of the language provide basic objects and functions that are quite powerful if you can keep the logic of your program in your head and are willing to adopt a functional style.
Given that the chosen example involved some text processing, there is also a lot of reliance on regex, which is itself another domain-specific language with its own special symbols (i.e., another lexicon of special names that are afforded, but require familiarity).
I agree that there is a subjective quality that the rewritten version possesses. I wouldn’t call that quality "clarity", but rather some aesthetic quality that borders on cleverness and punctiliousness. But, to the reader who is illiterate in Raku, it is absolutely line noise. That’s not a critique, and if I take the author’s word that the rewrite is not even idiomatic Raku, I dare not imagine what an idiomatic version of this program might look like.
The main issue I have with the point free style of the rewritten version is that while the flow of the program is quite clear with the use of the feed (==>) operator, I think that modifying or refactoring the point free version would be so much more difficult than the 101 version. There is a lot of context to build up in the point free version between any two ==> operators. You basically need to keep the whole program in your head to grok what any given part of the pipeline is doing. And if you want to make a change, you may have a long-distance dependency in the pipeline that would affect things somewhere far away.
Programs that are intended to be reused, or are useful enough to end up being reused, will likely need to be modified and refactored down the line. So, for one-off scripts, my critique isn’t consequential. But for any code that is going to be read (esp. by multiple different human readers), I think avoiding line noise is a much higher priority than aiming for any sort of aesthetic punctiliousness.
I think the concern about readability and density of language-specific idioms is a major reason why languages like Raku are avoided for larger programs that are intended to be used, reused, and extended/modified.
To give a brief example that occurred to me, what if I wanted to modify each version of this program to change the order that results are printed so that the players with the most wins are at the bottom rather than top of the list? Scanning the program to find the part that is relevant is so much easier because there is a very obvious line where the sort order is determined:
my @sorted = @names.sort({ %sets{$_} }).sort({ %matches{$_} }).reverse;
Whereas, in the point free style, we would want to remove this line:
==> reverse()
But, we’d basically have to read and grok the entire pipeline to establish the context in order to interpret that line. Or at least we’d have to make some educated guesses based on the named capture groups in the preceding line. If the capture groups were just named <x>, <y>, and <z>, or were just indexed by their order, I’m not sure I’d be able to understand it at all without having seen the 101 version first.
This is an entirely fair critique and raises several good points. In particular, I really like what you have to say about "aesthetic punctiliousness" (a nice phrase, by the way); I probably am too drawn to code that's "pretty"/fits my aesthetic sensibilities. I try (perhaps unsuccessfully?) to prevent this from interfering with clarity.
A couple more specific replies:
> [T]o the reader who is illiterate in Raku, [the rewritten program] is absolutely line noise.... The main issue I have with the point free style of the rewritten version is that ... modifying or refactoring the point free version would be so much more difficult than the 101 version. ... But for any code that is going to be read (esp. by multiple different human readers), I think avoiding line noise is a much higher priority than aiming for any sort of aesthetic punctiliousness.
>
> I think the concern about readability and density of language-specific idioms is a major reason why languages like Raku are avoided for larger programs that are intended to be used, reused, and extended/modified.
If I'm understanding you correctly, you believe that a program should not only be clear to someone well versed in a particular language but should also be clear to someone who is new to the language/unfamiliar with the language specific idioms. Is that correct?
That view is extremely widespread -- and I agree that it is often correct. In particular, that seems to be correct in a large team/high turnover/corporate environment: if you have a lot of different people joining a team and many/all of them aren't experts in the particular language, then readability in that sense is extremely important. (In my view, it's not at all a coincidence that the high turnover environment I described is basically a description of Google, and golang is one of the most readable languages in this sense.)
In my opinion, however, there's another way to program -- a way that isn't as good a fit for the sort of programming done in large organizations, and thus isn't as well served by mainstream languages funded by those organizations. When writing code using a low-turnover methodology, clarity of the code is still paramount, but it's clarity to someone steeped in the idioms of the language, not clarity to a newcomer. (Ok, "steeped in the idioms of the language" sounds way more pretentious than I wanted; all I really mean is "someone who has programmed in the language for 6+ months".) And that's the sort of clarity I'm aiming for. I've written about this at more length previously.
> what if I wanted to modify each version of this program to change the order that results are printed so that the players with the most wins are at the bottom rather than top of the list? Scanning the program to find the part that is relevant is so much easier because there is a very obvious line where the sort order is determined:
>
> my @sorted = @names.sort({ %sets{$_} }).sort({ %matches{$_} }).reverse;
>
> Whereas, in the point free style, we would want to remove this line:
>
> ==> reverse()
>
> But, we’d basically have to read and grok the entire pipeline to establish the context in order to interpret that line.
I don't think that's entirely fair. The way I'd put it is that instead of changing
my @sorted = @names.sort({ %sets{$_} }).sort({ %matches{$_} }).reverse;
you'd change
==> sort(...)
==> reverse()
(Yes, this concept is formatted as one line in the original and two in the revision. But it could easily have been one in the revised version if I hadn't been optimizing for narrow screens. And, in any event, it's one logical unit regardless of the linebreak.)
IMO, ==> sort(...) is just as clear a signpost for "this is the code that does the sorting" as my @sorted = ... -- maybe even a bit easier to find when skimming the code.
(By the way, .<matches>, .<sets>, and .<name> aren't named captures; they're the keys in each of the hashes in the array we're sorting. I agree 100% that giving those keys names like x, y, and z would make the code much less readable.)
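To make the comparison concrete, here is a small sketch with made-up data. The contents of %matches and %sets and the player names are hypothetical stand-ins, and the pipeline below sorts a plain list of names rather than the array of hashes used in the post, so it illustrates the edit being discussed rather than reproducing either program.

```raku
# Hypothetical tallies standing in for what the post's program computes.
my %matches = ann => 2, bob => 3, cy => 1;
my %sets    = ann => 5, bob => 6, cy => 4;
my @names   = <ann bob cy>;

# 101 style, as quoted above: most wins first.
my @sorted = @names.sort({ %sets{$_} }).sort({ %matches{$_} }).reverse;

# To put the most wins last, drop the trailing .reverse.
my @sorted-asc = @names.sort({ %sets{$_} }).sort({ %matches{$_} });

# Pipeline style: the same change amounts to deleting the ==> reverse() stage.
@names
    ==> sort({ %sets{$_} })
    ==> sort({ %matches{$_} })
    ==> my @piped-asc;

say @sorted;      # [bob ann cy]
say @sorted-asc;  # [cy ann bob]
say @piped-asc;
```

In both styles the edit is a one-token deletion; the disagreement in this thread is about how much surrounding code a reader has to understand before being confident that it is the right token.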
> If I'm understanding you correctly, you believe that a program should not only be clear to someone well versed in a particular language but should also be clear to someone who is new to the language/unfamiliar with the language specific idioms. Is that correct?
I think it’s an ideal to aspire to. It’s not possible, in practice, except for trivial programs. I don’t think I can make the argument any better than Joel Spolsky and his adage "It’s harder to read code than to write it."
> (By the way, .<matches>, .<sets>, and .<name> aren't named captures; they're the keys in each of the hashes in the array we're sorting. I agree 100% that giving those keys names like x, y, and z would make the code much less readable.)
Ah, I see now. The fact that Raku uses the same symbols <> for regex capture group identifiers and associative keys confused me (I’m obviously illiterate in Raku). I can sort of get behind the motivation of raising regex to be a first-class thing and, as such, having named capture groups be more similar to associative keys. But regexes are still a domain-specific language (DSL) within Raku, even if they are built in, so it’s still confusing. Overloading of symbols like this within DSLs makes programs even more difficult to read because you have to remember which language you’re reading in order to interpret the program correctly.
> IMO, ==> sort(…) is just as clear a signpost for "this is the code that does the sorting" as my @sorted = … -- maybe even a bit easier to find when skimming the code.
I think the main thing is that the well-chosen name @sorted makes it very clear what’s going on. Yes, the built-in function name sort() is potentially clear, but declaring @sorted by name is even clearer, IMO.
Also, thanks for linking to the "A Raku Manifesto, Part 3" piece. I think the trade-offs you discuss there are really good to keep in mind. This is something that bugs the hell out of me. I hate it when working with others who start off writing a shell script and then try to maintain it as it grows and features creep in. At some point, usually after ~3 people have touched significant parts of the code, I think it’s better to take a step back and rewrite the thing in a language that has been designed with full-scale software engineering in mind.
But, for little functions or quality of life things that you throw in your .*rc or write for personal use, I totally see the appeal of trying to optimize for mastery and individual productivity.
I find pointfree programming (also called "tacit programming") really powerful, but most/all of the descriptions I've found are way more abstract than I'd prefer. This is my attempt to talk through the advantages in the context of a real-world example.