Julia Evans

spent some time today working on this diagram of how the ASCII control characters work in unix

there are a lot of mistakes/missing nuance but I think it's really interesting how little structure there is. Special codes (like `3` for SIGINT) that are handled by the OS are mixed with just regular keypresses (like `13` for "enter") which are mixed with codes that are handled by the application (like `1` for Ctrl-A in readline)

(not looking for history lessons right now)

powersoffour

@b0rk this absolutely would have saved me an afternoon of work when I ran into \x0b in a data stream nested inside a newline-delimited data stream!

Julia Evans

some other things I think are interesting about the control codes:

* some of the "OS terminal driver stuff" is also sometimes handled by readline
* there are only 33 of them, so "Ctrl-1" is not a thing: only Ctrl-A to Z plus seven more (@, [, \, ], ^, _, ?)
* You can't really use `Ctrl+M` as a keyboard shortcut because that's "Enter". Same for Ctrl-I.
* a lot of the "readline" ones will work even if you aren't using readline, many libraries mimic readline's functionality
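The "only 33 of them" point can be checked mechanically. A minimal sketch, assuming the usual convention that Ctrl keeps only the low five bits of the key's ASCII code (`code & 0x1F`), with Ctrl-? as the special case that produces DEL (0x7F). The helper `ctrl` is hypothetical, and real terminal emulators vary in which physical keys they accept:

```python
# Keys with a control-code equivalent: '@', 'A'-'Z', '[', '\', ']', '^', '_' (0x40-0x5F), plus '?'.
def ctrl(key: str) -> int:
    """Return the byte a terminal conventionally sends for Ctrl+<key> (hypothetical helper)."""
    if key == "?":
        return 0x7F  # Ctrl-? is DEL, the one control code outside 0-31
    code = ord(key.upper())
    if not 0x40 <= code <= 0x5F:
        raise ValueError(f"Ctrl-{key} has no ASCII control code")
    return code & 0x1F  # keep the low 5 bits: 'C' (0x43) -> 0x03

keys = "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_?"
print(len({ctrl(k) for k in keys}))  # 33
print(ctrl("M") == ord("\r"))        # True: Ctrl-M is the same byte as Enter
print(ctrl("I") == ord("\t"))        # True: Ctrl-I is the same byte as Tab
```

Calling `ctrl("1")` raises `ValueError`, which is the "Ctrl-1 is not a thing" case.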

Athena L.M.

@b0rk several of the readline control codes are implemented as keyboard shortcuts in some common text editing widgets, particularly Ctrl+A/E

lucy 🌸

@b0rk ctrl+h is particularly weird for me because in GUI emacs that's the prefix for the help commands, but in the terminal it's backspace so I try and get help and accidentally delete my stuff

Bart Groeneveld

@b0rk That is also another hint for the 'j' in vim: go down one line, just like ctrl+j does in a terminal. Or is this just a coincidence?

Gomijacogeo

@bartavi A bit of both plus a decision by an early terminal manufacturer. catonmat.net/why-vim-uses-hjkl

ASCII had already defined the codes, and 8 (aka ^H or backspace) and 10 (aka ^J or linefeed) already had a kind of intrinsic left- and down-ness to them. 11 and 12 (^K and ^L, or vertical tab and formfeed) kind of implied motion, but had more meaning when terminals printed to paper rather than a screen.

The folks at LS decided since your fingers were already there, why not complete the arrow set.

_L4NyrlfL1I0

@bartavi @b0rk Likely coincidence. Have a look at "man ascii" and look at characters Decimal 1 to 26. Those are for the most part what you will get if it doesn't already do something special (like Ctrl+C).

For example, Ctrl+H is character 8, which is backspace (but only the backspace character). Ctrl+I is a tab, and Ctrl+J is a newline '\n'. (This means Ctrl+M is actually a carriage return '\r', but your OS likely translates both into '\n' in the input that your terminal programs receive).
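The translation in that last parenthetical is the terminal driver's `ICRNL` input flag (shown as `icrnl` in `stty -a`), normally on by default: a carriage return arriving from the terminal becomes a newline before the program reads it. A simplified model, where `apply_icrnl` is a hypothetical name and the real translation happens inside the kernel alongside related flags such as `INLCR` and `IGNCR`:

```python
CR, NL = 0x0D, 0x0A  # Ctrl-M (Enter) and Ctrl-J

def apply_icrnl(raw: bytes, icrnl: bool = True) -> bytes:
    """Model of the ICRNL input flag: map CR to NL on input."""
    if not icrnl:
        return raw  # e.g. raw mode, where the program sees the CR as-is
    return raw.replace(bytes([CR]), bytes([NL]))

print(apply_icrnl(b"ls\r"))               # b'ls\n'  (Enter / Ctrl-M)
print(apply_icrnl(b"ls\n"))               # b'ls\n'  (Ctrl-J is already a newline)
print(apply_icrnl(b"ls\r", icrnl=False))  # b'ls\r'
```

So with default settings, Ctrl-M and Ctrl-J look identical to a program reading in cooked mode.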

_L4NyrlfL1I0

@bartavi @b0rk I'm not actually completely sure what's going on with Ctrl+J and Ctrl+M, need to test that further.

Mark Dominus

@b0rk 0x7f is usually called "delete" or "DEL", to contrast it with 0x08 which is "backspace", and they are different and sometimes the terminal gets confused.

But your chart calls them both "backspace", which I think might tend to add to the amount of confusion in the universe.

Julia Evans

@mjd thanks that's a good point -- I'm very confused about what Ctrl-H does in practice -- to me it feels like 0x08 is just kind of some weird old cruft that has no real purpose but maybe that's not true

(if it does have a purpose I'd be interested to know what it is! I don't have a backspace key on my keyboard right now but my impression is that the backspace key is supposed to map to an ASCII DEL)

⛧ esoterik ⛧

@b0rk if you type "abcdef" at a prompt and then put your cursor between c and d (abc|def) then:

0x08 (backspace) usually produces ab|def

0x7f (delete) usually produces abc|ef

@mjd

Zack Weinberg

@b0rk @mjd I know that in the days of printing terminals and paper tape, the ASCII "backspace" and "delete" control codes had two clearly distinct functions.

Backspace meant literally back up the print head or tape punch one space. It *didn't* erase anything, and it was common to "overprint" two or more characters in the same space (a vestige of this survives today, some programs will emit a letter, ^H, and then _, and expect the letter to be underlined).
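Because backspace only moves the print position, a letter followed by ^H and _ puts both glyphs in one column. A minimal sketch of how a program might interpret such a stream (pagers such as `less` do something similar with old-style `man` output), assuming the convention that an underscore struck over a letter means underline and a letter struck over itself means bold. `render_overstrike` is illustrative, not any real tool's code:

```python
BS = "\b"  # ^H, 0x08

def render_overstrike(text: str) -> list[tuple[str, str]]:
    """Collapse overstruck columns into (char, style) pairs; style is '', 'bold' or 'underline'."""
    cols: list[list[str]] = []  # every glyph struck in each column, in order
    pos = 0
    for ch in text:
        if ch == BS:
            pos = max(pos - 1, 0)  # move back one column; nothing is erased
            continue
        if pos == len(cols):
            cols.append([])
        cols[pos].append(ch)
        pos += 1
    out = []
    for glyphs in cols:
        letters = [g for g in glyphs if g != "_"]
        if letters and len(letters) < len(glyphs):
            out.append((letters[-1], "underline"))  # letter plus underscore
        elif len(glyphs) > 1 and len(set(glyphs)) == 1:
            out.append((glyphs[0], "bold"))  # same glyph struck repeatedly
        else:
            out.append((glyphs[-1], ""))  # otherwise the last glyph wins
    return out

print(render_overstrike("a\b_b\bbc"))
# [('a', 'underline'), ('b', 'bold'), ('c', '')]
```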

Delete, on the other hand, meant cross out the character at the *current* print head position. That's why it's all by itself at the high end of the ASCII range: its encoding is all-bits-1, which is all seven holes punched on a 7-bit paper tape, so if you back up your tape punch and punch DEL it completely destroys whatever used to be there.
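The all-bits-1 property is easy to check: punching holes can only turn 0-bits into 1-bits, so punching DEL over any 7-bit code always leaves DEL.

```python
DEL = 0x7F  # 0b1111111: all seven holes punched

# OR-ing DEL into any 7-bit code (punching every hole over it) always yields DEL.
assert all((code | DEL) == DEL for code in range(128))
print(f"{DEL:07b}")  # 1111111
```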

I do not know how we got from there to how it is today.

Chris Siebenmann

@b0rk @mjd My memory is that there was a big schism in the physical terminal world about what the 'backspace' key generated. Many terminals had it generate Ctrl-H, but some used DEL; I believe DEC terminals such as the VT-100 were in the DEL camp, and they were very popular. This carried over to Unix workstations, with vendors doing different mappings (eg SGI made their 'backspace' key generate Ctrl-H in a default configuration; DEC used DEL of course).

Chris Siebenmann

@b0rk @mjd Today, X on Linux has a 'BackSpace' keysym and a 'Delete' keysym, generated by their respective keyboard keys, but I believe that all terminal emulators map both of them to DEL by default (xterm and gnome-terminal certainly do). Things like browsers tend to make the big BackSpace key delete characters to the left of the cursor (what you expect) and the 'Delete' key delete to the right, which can be a surprise.

Jef Poskanzer :batman:

@cks @b0rk @mjd With Unix caught in the middle of the schism, yeah. I had a small part in resolving things: at a Usenix 'ask the BSD team' session, I stood up and suggested that they deal with it by having two erase characters in the tty driver. And they said, Ok! Thus was born erase2.

Petru Ratiu

@cks @b0rk @mjd To this day my circle of IRC friends uses the mini-joke of ^H^H^H being poorly translated backspace presses even though we collectively forgot what chain of systems was needed to produce this specific "terminal mojibake".

Simon Tatham

@b0rk @mjd I see a lot of people have posted about history, but as I see it the _effects_ of the history are to create two current schools of thought.

1. the Backspace key (above Return) is 0x08, and the Del key (on the nav keypad) is 0x7F

2. Backspace is 0x7F, and Del is some sequence like ESC[3~ (similar to the rest of that keypad)

Emacs users like #2 because it frees up 0x08 to be ^H for help. But enough people like #1 that lots of software all needs a config option!
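Software that has to cope with both conventions ends up with something like the lookup below. This is a sketch that assumes the two schools exactly as described above; the table and function names are hypothetical, and real terminals send further sequences (for example with modifier keys held):

```python
# What each convention sends for the two keys.
SCHOOL_1 = {b"\x08": "Backspace", b"\x7f": "Delete"}     # Backspace = ^H, Del = ^?
SCHOOL_2 = {b"\x7f": "Backspace", b"\x1b[3~": "Delete"}  # Backspace = ^?, Del = ESC [ 3 ~

def key_name(seq: bytes, school: int) -> str:
    table = SCHOOL_1 if school == 1 else SCHOOL_2
    return table.get(seq, "other")

# The same byte means a different key depending on the configuration:
print(key_name(b"\x7f", 1))  # Delete
print(key_name(b"\x7f", 2))  # Backspace
print(key_name(b"\x08", 2))  # other (free to be ^H for Emacs help)
```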

Chris Siebenmann

@b0rk Fun trivia on the Ctrl-1 and Ctrl-M and so on thing: non-terminal graphical programs with direct access to keypress information can bind them (of course). This is especially fun for programs that can operate either in the terminal or as stand-alone graphical applications, like GNU Emacs, because you can innocently create a binding for yourself that only works in GUI mode (which you might not notice for a while if you only rarely use the editor in a terminal).

Ben Zanin

@cks @b0rk like how avy-mode¹ recommends binding Ctrl-: in its default docs², with the consequence that it's unusable by folks who use terminal emacs (or a combination of GUI and terminal emacs panes connecting to a shared emacs daemon). Unless they're wizardly enough to feel comfortable reading all documentation about it with an interpretation filter in their brain!

¹: karthinks.com/software/avy-can

²: github.com/abo-abo/avy?tab=rea

maswan

@b0rk
I use ^M occasionally, in particular the combo of ^V^M to get a newline into a vim substitution expression.

yumaikas/sakiamu

@b0rk web-based injection attacks make a bit more sense, given that this was part of the basis for stuff

Schamschula

@b0rk I also use Ctrl-K (delete from cursor to right) a lot! I just reminded myself that Ctrl-U deletes from the cursor to the left. I tend to use Ctrl-W instead.

Dan Neuman

@b0rk Ah, this is why every time I hit ctl-X it starts downloading emacs.

Hisham

@b0rk I would paint Ctrl+H (backspace) green as well. On my terminal, at least, Ctrl+H and the Backspace key are equivalent.

Steve Loughran

@b0rk what did Control-X do before emacs was invented?

Zack Weinberg

@b0rk related to the mostly lack of structure is that the ASCII spec gives functions for all of the control codes, and these mostly have nothing to do with how they are used by Unix terminal programs. The ones where there is a connection are ^H and ^? (backspace), ^J and ^M (newline), ^I (tab), ^L (form feed / clear and redraw) ^G (bell / beep), ESC (escape), and debatably ^D (end of transmission / pseudo-end of file).

You said you didn't want a history lesson but if you decide you're curious the place to start is the monograph "The Evolution of Character Codes, 1874-1968" falsedoor.com/doc/ascii_evolut

Zack Weinberg

@b0rk the readline codes are theoretically mnemonic based on the letter, like ^R for reverse search, but a lot of them are a real stretch (^Y "yank" = paste???)

Alfred M. Szmidt

@b0rk This chart is confusing, since it mixes things... many of the #GNU Readline keybindings have little to do with ASCII bindings and more to do with Emacs, which is where they came from (and Emacs came from a system that didn't use ASCII).

Alfred M. Szmidt

@b0rk C-r is not really right, it is search history in reverse, depending on your system if you are using #GNU Readline C-s will search it forward (both come from Emacs, and are probably 40 years old or more by now). Most UNIX terminals though hijack C-s to be scroll lock ...

1-bit machine goth

@b0rk minor nitpick, ^D is the EOT (End-of-Transmission) control character, which the unix terminal driver translates to an EOF (which technically isn't a character) indicator/condition if it's at the start of a line
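A simplified model of what successive `read()` calls return in canonical mode shows why EOF is a condition rather than a character: ^D hands over whatever is pending on the current line without adding anything, so at the start of a line the read returns zero bytes, which programs treat as end of file. `canonical_reads` is a hypothetical name, and the model ignores erase, kill, and signal characters:

```python
EOT = 0x04  # Ctrl-D, the usual default for the driver's EOF character
NL = 0x0A

def canonical_reads(typed: bytes) -> list[bytes]:
    """Split typed bytes into what successive read() calls would return."""
    reads, line = [], bytearray()
    for b in typed:
        if b == EOT:
            reads.append(bytes(line))  # pending bytes, or b'' (EOF) if the line is empty
            line.clear()               # the ^D itself is never delivered
        elif b == NL:
            line.append(b)
            reads.append(bytes(line))
            line.clear()
        else:
            line.append(b)
    return reads

print(canonical_reads(b"hi\n\x04"))    # [b'hi\n', b'']  -> the empty read means EOF
print(canonical_reads(b"ab\x04cd\n"))  # [b'ab', b'cd\n'] -> mid-line ^D is not EOF
```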

Jonathan Moore

@b0rk

Did you notice that the control character is the same as the corresponding letter you type with a change in the high order octal place?

Ex bell is octal 007 and G is octal 107.

I find this really useful when I can't remember how to type a particular control character, I can just look at the ASCII man page to figure it out.

Julia Evans

@Moore interesting, what's an example of a control character you might look at the ASCII man page to type in?

I know about this correspondence but I've never found a use for it

Jonathan Moore

@b0rk anything other than ^C.
Like if you want to send \r instead of \n.

I have only used it for practical reasons once or twice.
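The octal trick works because each control code differs from its key's code only in the octal-100 bit. A sketch that goes from a control byte back to the key to press (caret notation); `key_for` is a hypothetical helper, and DEL fits the same rule because 0o177 with that bit flipped is 0o077, the code for `?`:

```python
def key_for(code: int) -> str:
    """Caret notation for a control byte: which key to hold Ctrl with (hypothetical helper)."""
    if not (0 <= code < 0o40 or code == 0o177):
        raise ValueError(f"{code:#o} is not a control character")
    return "^" + chr(code ^ 0o100)  # flip the octal-100 bit: 0o015 <-> 0o115 ('M')

print(f"{ord('G'):03o} -> {ord('G') ^ 0o100:03o}")  # 107 -> 007 (BEL)
print(key_for(ord("\r")))  # ^M: type Ctrl-M to send \r instead of \n
print(key_for(ord("\n")))  # ^J
print(key_for(0o177))      # ^? (DEL)
```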

Hans Hübner

@b0rk It is worthwhile to think of "Unix" and "readline" separately.

"Unix" provides terminal driver functionality, and cooked mode is part of the driver. Originally, all line editing functionality was provided by the tty driver, and different Unix variants had different features. The BSD driver was way more feature rich than the Unix System V driver. One thing all Unix tty drivers have in common, however, is that they have programmable control characters.

1/

Julia Evans

@hanshuebner I don't really understand what you mean, I think it's obvious that "readline" exists at a very different level than the terminal driver?

what's interesting to me is that from a user’s perspective it's very unclear which things are being handled by the OS and which are passed through to the application and they're just kind of chaotically mixed together

Hans Hübner

@b0rk I don't think that it is all that obvious that readline is not part of the driver. Providing kernel-level line editing basically was a necessity in the 16bit era, but when machines got more powerful, providing these things in user mode became more common.

What that also meant is that there was no single line editing system, but each user-mode program could (and needed to) provide its own. GNU readline, albeit popular, was really hindered by its license.

Hans Hübner

@b0rk With a user-mode line editor, it is only the control characters that generate interrupts which require driver support, and modern implementations don't even need to rely on the driver to send signals (i.e. they treat Ctrl-C like any other character and send a SIGINT if they want to).

Hans Hübner

@b0rk The defaults for the control characters varied between systems. System V originally had @ as the kill character and DEL as interrupt, which was rather annoying when you used a terminal that had DEL where you would have the BS key.

When machines got faster, it became feasible to have a user-mode line editing program do the input processing. tcsh was the first popular shell to have line editing with arrow key support. ksh provided vi-style editing in a hybrid setting.

2/

Hans Hübner

@b0rk I bore you with these historical details because I think that the question "what control characters does Unix use" cannot really be answered on that level in a meaningful manner. It may make sense to discuss common driver-based and user mode line editing facilities, but generally, in the Unix world, they are all configurable and the defaults changed over time.

Other systems (e.g. VMS or DG/OS) were way less flexible, which meant that they required specific terminals.

3/3

Julia Evans

@hanshuebner hmm it feels a lot more fixed to me than that, like I know there are a lot of unixes out there but personally as someone who has only ever used Mac and Linux machines and would absolutely never change my `stty` settings in a million years, for me there are a lot of assumptions it's very fair to make about how these things are going to work

like it's hard to imagine that very many people are going to remap SIGINT to something other than Ctrl-C even if theoretically you can

Hans Hübner

@b0rk I don't disagree. I guess for most people, the fact that a terminal driver exists and can be misconfigured is just a historical nuisance and not something that they can or need to take advantage of.

Julia Evans

@hanshuebner for me it's useful to know about the terminal driver because it means that when I’m using some terrible program without line editing support (which happens more often than I'd like!) I know that the terminal driver will still let me do `Ctrl-W` or `Ctrl-U` when I make a mistake

Hans Hübner

@b0rk Absolutely! This is why I think that tty driver functionality and user-mode line editing behavior deserve separate treatment. I think it is still useful to know that the driver exists and when it is active. Also, it is worth knowing that BSD derived systems have the status character (Ctrl-T) and that incorrect setting of the erase character can bite you.

Julia Evans

@hanshuebner thanks, I just learned that Ctrl+H is sometimes backspace today

wfk

@b0rk in a literal sense, ctrl-H *is* backspace. The ERASE function in termios has historically been set to either DEL or BACKSPACE, depending on the keyboard layout and configuration of the terminal being used. This is why many terminal emulators still have a configuration setting to indicate what the Backspace/DEL/<- key on your keyboard should generate. Whether user space code honours the termios setting when running in raw mode depends on the user space code...

Julia Evans

@wfk on my machine ctrl-H is not backspace, would be curious to see the output of stty -a on your machine to see how it works there!

wfk replied to Julia

@b0rk we may be using different definitions of backspace here. I suspect you are talking about what is named the ERASE function in termios: undo the previously typed character on the line. I use backspace in the ASCII meaning, which is the ASCII code associated with ctrl-H. If your keyboard does not generate that ASCII code when you press ctrl-H, that's a seriously non-standard setup. This is a function of the terminal (emulator), not of the tty driver.

Julia Evans replied to wfk

@wfk yea I guess what I'm trying to figure out is whether 0x08 has much of a real purpose today or whether it's mostly cruft

Eli the Bearded

@b0rk @hanshuebner
As an example, the tty level driver is operating when you are in password entering mode. The readline functions will not be there, but tty level backspace, delete word, and delete line will be.

And people who want to can reset tty or readline config, but they are controlled in different ways.

(I disable emacs style editing everywhere I can use vi style.)

Raven667

@hanshuebner talking about software maintenance, the context around it changes which changes what design and abstraction makes sense. Given today's terminal context, maybe one could design a new tty layer that is much simpler by making more assumptions about the capabilities of the terminals and apps.

Stuart Langridge

@b0rk wait wait wait, sigint is signal 3 (so presumably kill -3 pid works?) and C is the third letter, and that’s why ctrl-C quits things?? How fascinating! I had no idea! Every time you do a thing I learn something :)
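The two threes turn out to be unrelated: Ctrl-C is byte 3 (ASCII ETX) because C is the third letter, but SIGINT is signal number 2 on Linux and macOS, and `kill -3` sends SIGQUIT (which the terminal driver sends for Ctrl-\ by default). Python's standard `signal` module can confirm the numbers on a POSIX system:

```python
import signal

ctrl_c = ord("C") & 0x1F   # the byte the terminal sends for Ctrl-C
print(ctrl_c)              # 3 (ASCII ETX)
print(int(signal.SIGINT))  # 2 on Linux and macOS
print(int(signal.SIGQUIT)) # 3, which is what `kill -3` sends
```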

wfk

@b0rk part of the lack of structure is due to the historical development of how these are interpreted. The kernel tty driver termios subsystem interprets some characters, depending on the termios ioctl settings. Some application level code, including, but not limited to, libreadline, changes these settings and then interprets some of these in user space. How libreadline interprets certain characters depends on the implementation and the settings (emacs vs vi mode and more).

wfk

@b0rk so, as @hanshuebner points out, a lot depends on exactly what software you are currently running, and how you have configured the various parts of the system. Now, most users will to a significant extent just use whatever default settings their system provides, and most Linux distros have very similar defaults, so most of this will be mostly similar for most users most of the time. Which makes it all the more frustrating when they run into something that is just a bit different...

Martin Jambon 🌍🌎🌏

@b0rk Ctrl-Q is useful to get unstuck after pressing Ctrl-S by accident 😄

Ölbaum

@b0rk Thanks for this diagram. So it looks like letters were assigned in alphabetical order as they thought of functions, which explains why cursor movements seem to follow no logic with respect to the keys layout: it’s because they indeed follow no logic. 🤦🏼‍♂️

Norman Wilson

@b0rk Don't be too dogmatic. These days they're all changeable, and some of us change them. I'm fond of DEL as interrupt, not just because it's historic but because it's a simple single key to smash. Sticking to @ as my kill character is more obstreperous. I am no stick-in-the-mud, though: BS for erase since I started using Unix.

I happen to know that ken used # for erase as late as the latter 1980s.

Alaric Snell-Pym

@b0rk under NetBSD (and maybe others, but sadly not Linux) ctrl+T sends a SIGINFO to the foreground process. Some long-running tools like dd will respond to this signal with a status line saying how many blocks transferred so far, and the default signal handler prints the PID and some CPU usage stats for the process! A really nice feature I miss a lot on Linux 😥
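A long-running program can offer the same kind of status line. A sketch using Python's standard `signal` module on a POSIX system, assuming SIGINFO where the platform defines it and falling back to SIGUSR1 otherwise (GNU `dd` on Linux prints its statistics on SIGUSR1). The `copied` counter and the loop are stand-ins for real work:

```python
import os
import signal

STATUS_SIGNAL = getattr(signal, "SIGINFO", signal.SIGUSR1)  # SIGINFO on the BSDs/macOS
copied = 0     # hypothetical progress counter
reports = []   # counts seen by the handler

def report(signum, frame):
    # Runs when the status signal arrives; the work loop then carries on.
    reports.append(copied)
    print(f"{copied} blocks copied so far", flush=True)

signal.signal(STATUS_SIGNAL, report)

for block in range(1000):
    copied += 1  # stand-in for copying one block
    if block == 499:
        os.kill(os.getpid(), STATUS_SIGNAL)  # simulate the user pressing Ctrl-T
```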

Vincent

@b0rk my laziness made me a Ctrl-r and Ctrl-l 💕❤️

Sandor Szücs

@itwars @b0rk yeah, I guess my most used keys in terminal 😀

jleedev

@b0rk I just discovered that Ctrl-Backspace gives ^H (not that I can think of a use for it other than "also backspace").

Vim just taught me this:

Linux-backspace
Note about Linux: By default the backspace key
produces CTRL-?, which is wrong.

Julia Evans

@jleedev haha I'm so curious about the history of that "Ctrl-?, which is wrong" statement (I have no idea why it would be wrong but obviously the person who wrote that felt strongly about it!)

Stuart Marks

@b0rk It seems mostly disused these days, but Ctrl-O used to be bound to DISCARD by default. When typed by the user, the terminal driver would discard (“flush”) all output to the terminal until the program next read from the terminal.

This was useful if some command produced a lot of unwanted output, especially on a slow terminal. I first saw this on TOPS-20, and it also appeared in BSD Unix.

Stuart Marks

@b0rk Another obscure character is Ctrl-Y (DSUSP, delayed suspend). Similar function to Ctrl-Z, which sends SIGTSTP immediately. However, Ctrl-Y is buffered up by the terminal driver with other input. When it is about to be read by the program, the driver sends SIGTSTP at that point and discards the Ctrl-Y.

I think both DSUSP and DISCARD are BSDisms. Various man pages seem to indicate they aren’t supported on Linux.

Adrian Cockcroft

@stuartmarks @b0rk I remember using Ctrl-O on vt220 terminals to shut them up. Not sure if it was a thing on paper roll tty-33 and Dec-10 because that was too long ago…

Dmitri Kalintsev

@stuartmarks @b0rk omg where were you in 86 when I was trying to get stuff done using the godawful slow-ass terminals on our uni's PDP11?

tom jennings

@b0rk

I know you said you don't want histories but for del/bs, history matters.

They did/do not perform the same function. Back space moved the physical print position (or cursor) to the previous position on the same line. This allowed overstrike, on ink to paper printing terminals, or a three-character sequence as data (A bs B).

DEL, 127 decimal, "deleted" a previously printed character. On a teletype etc, you'd back-space to physically move the print head/cursor over the previous character, then punch all the holes. The strong convention was that all holes were ignored.

In bash etc this is all silly.

Similarly so: CR, LF, VT, tab, ENQ/WRU (control E) ...

sensitiveresearch.com/Archive/

Central Illumination Agency

@tomjennings @b0rk I had never considered this stuff until two minutes ago, and now I’m intrigued.

Thank you 😊

Lars Brinkhoff

@b0rk I love how the ITS operating system shows its (ha) influence on 36% of this table.

Lars Brinkhoff

@b0rk (I'm holding back so many history lessons right now)

realdebil

@b0rk Can I ask for a more colorblind-friendly color palette? It's hard to tell "Used by readline" and "OS terminal driver" apart

doragasu

@b0rk Thanks! I know most of those, but not all. And seeing them on a clean diagram like that is cool!

Charlesflorian

@b0rk This may be a silly question, but how do some of these key combinations work with different keyboard layouts? For example, on a Swedish (mac) layout, the @-symbol is alt-gr-2; Do you type ctrl-alt-gr-2 to get the null character?

Andrew Gallagher

@Charlesflorian @b0rk pretty sure it’s just ctrl-2. On most layouts the ctrl-symbol codes are (also) mapped to ctrl-2...ctrl-8

Ajaya Agrawalla

@b0rk You are doing god's work. Thank you.

Merc

@b0rk For me, CTRL-U is "oops I messed up typing my password let me try again". Where CTRL-K is "kill from cursor to end of line", CTRL-U is "kill entire line".

This is really useful for password entry fields where you don't get stars or any other indication you've typed anything, so you don't know where your cursor is.

But, with anything involving shells or terminals, it's complicated. Apparently BASH and some other things think CTRL-U is "kill from cursor to beginning of line" and many other shells including ZSH think it's "kill entire line". I don't actually know what interpretation is in charge in a password entry field... but I still mash CTRL-U when I know I mistyped something.
