One thing we learned from implementing job control in https://oils.pub is that the differing pipeline semantics of bash and zsh make a real difference
In bash, the last part of a pipeline runs in a forked subshell (unless shopt -s lastpipe is set)
In zsh, it runs in the current shell
$ bash -c 'echo hi | read x; echo $x' # no output: x was set in a subshell
$ zsh -c 'echo hi | read x; echo $x'
hi
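For the curious, bash can opt into the zsh behavior. A small sketch (lastpipe needs bash 4.2+, and it only takes effect when job control is off, i.e. in scripts and -c invocations):

```shell
# Run the last pipeline stage in the current shell instead of a subshell.
# (Only takes effect when job control is off, i.e. non-interactive shells.)
shopt -s lastpipe
echo hi | read x   # read now runs in this shell, so x survives
echo "$x"          # prints: hi
```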
And then that affects this case:

bash$ sleep 5 | read
^Z
[1]+  Stopped                 sleep 5 | read

zsh$ sleep 5 | read   # job control doesn't apply to this case in zsh
^Z
zsh: job can't be suspended
So yeah, the semantics of shell are not very well specified (which is one reason for OSH and YSH). I recall a bug running an Alpine Linux shell script where this difference matters -- if the last part is NOT forked, then the script doesn't run.

I think there was almost a "double bug" -- the script relied on the `read` output being "lost", even though that was likely not the intended behavior.
[0] https://gist.github.com/rrampage/5046b60ca2d040bcffb49ee38e8...
controlling terminal
session leader
job control
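If those three terms are unfamiliar, one way to poke at them on a live system is to ask ps for the relevant IDs (a sketch; assumes a procps-style ps, as on Linux):

```shell
# pid   = process ID
# pgid  = process group ID -- one group per pipeline, i.e. one per "job"
# sid   = session ID -- the session leader owns the controlling terminal
# tpgid = the terminal's current foreground process group
ps -o pid,pgid,sid,tpgid,comm -p $$
```

Suspending a job with ^Z and running this again for the stopped pipeline shows every stage sharing one pgid, which is exactly what the shell signals as a unit.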
The parser was easy in comparison.

https://www.destroyallsoftware.com/screencasts/catalog/shell...
It may also be helpful to run strace on your shell, then review the output line by line to make sure you understand each call. This is a VERY instructive exercise to do in general.
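A minimal sketch of the exercise (assumes Linux with strace installed; file name is arbitrary):

```shell
# Trace only process-management syscalls while bash runs a pipeline.
# The clone/execve/wait4 lines show each pipeline stage being forked,
# exec'd, and reaped.
strace -f -e trace=%process -o trace.txt bash -c 'echo hi | wc -c'
grep -c execve trace.txt   # count the execve calls (echo is a builtin; wc is not)
```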
The first line was always to turn off echo, and I've always wondered why that was a design decision for batch scripts. Or I'm misremembering; 30 years of separation makes it hard to remember the details.
IIRC readline uses a `char *` internally since the length of a user-edited line is fairly bounded.
Dealing with the corner cases ends up teaching you a lot about a language, and for an ancient language like the shell, it also takes you through the thinking process of the original authors and the constraints they were subject to. I found myself in this situation while writing EndBASIC, and I wrote an article about the surprises I encountered, because I found the journey fascinating: https://www.endbasic.dev/2023/01/endbasic-parsing-difficulti...
We can easily imagine it done a better way - for all the criticism of Windows, PowerShell gives a glimpse into this hypothetical future.
[0] https://github.com/lourencovales/codecrafters/blob/master/sh...
I have seen this misconception many times
In Oils, we have some pretty minor elaborations of the standard model, and it makes things a lot easier
How to Parse Shell Like a Programming Language - https://www.oilshell.org/blog/2019/02/07.html
Everything I wrote there still holds, although that post could use some minor updates (and OSH is the most bash-compatible shell, and more POSIX-compatible than /bin/sh on Debian - e.g. https://pages.oils.pub/spec-compat/2025-11-02/renamed-tmp/sp... )
---
To summarize that, I'd say that doing as much work as possible in the lexer, with regular languages and "lexer modes", drastically reduces the complexity of writing a shell parser
And it's not just one parser -- shell actually has 5 to 15 different parsers, depending on how you count
I often show this file to make that point: https://oils.pub/release/0.37.0/pub/src-tree.wwz/_gen/_tmp/m...
(linked from https://oils.pub/release/0.37.0/quality.html)
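To give a rough idea of what "many parsers" means in practice, here's a hypothetical snippet where nearly every line is handled by a different sub-grammar:

```shell
x=$(( 1 + 2 * 3 ))               # arithmetic expression language
if [[ $x =~ ^[0-9]+$ ]]; then    # [[ ]] test language, plus ERE regexes
  echo "number: $x"
fi
case $x in
  [0-9]) echo 'one digit' ;;     # glob pattern language
esac
echo {a,b}.txt                   # brace expansion language
cat <<EOF
here-docs have their own lexing rules: $x
EOF
```

Each of those constructs has its own tokens and its own grammar, which is why counting "the" shell parser is fuzzy.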
Fine-grained, heterogeneous algebraic data types also help. Shells written in C tend to use a homogeneous command* and word* kind of representation
https://oils.pub/release/0.37.0/pub/src-tree.wwz/frontend/sy... (~700 lines of type definitions)
For side projects, I have to ask myself whether I'm writing a parser or building something else. For a toy programming language, e.g., it's way more fun to start with an AST and play around, and come back to the parser if you really fall in love with the language.