Funny story: using kilo was the final straw [1] in getting me to give up on terminals. These days I try to do all my programming atop a simple canvas I can draw pixels on.
Here's the text editor I use all the time these days (and base lots of forks off of): https://git.sr.ht/~akkartik/text2.love. 1200 LoC, proportional font, word-wrap, scrolling, clipboard, unlimited undo. Can edit Moby Dick.
[1] https://git.sr.ht/~akkartik/teliva

Someone else who eschews terminals and replaced them:

https://arcan-fe.com/2025/01/27/sunsetting-cursed-terminal-e...
I really enjoyed the plan9 way of an application slurping up the terminal window (not a real terminal anyway) and then using it as a full-fledged GUI window. No weird terminal windows floating around in the background, and you could still return to it when quitting for any logs or output.
Hey Akkartik! That's really interesting! At the moment you're still using a terminal to launch the individual apps or something else?
Whatever works! I mostly use LÖVE, and it supports both. Some reasons to run it from the terminal rather than simply double-clicking or a keyboard shortcut in the OS:
* While I'm building an app I want to run from a directory rather than a .love file.
* I want to pass additional arguments. Though I also extensively use drag and drop for filenames.
* I want to print() while debugging.

> These days I try to do all my programming atop a simple canvas I can draw pixels on.

Why?
Not GP, but the terminal is inefficient and limiting for input and UI. For one, you cannot detect key-up and key-down events, only a full key press. The press of multiple (non-modifier) keys at once can't be recognized either. There are also quirks: in many terminals your application cannot distinguish between the Tab key and Ctrl-I, because they produce the same byte. But in some (e.g. Alacritty) they can be distinguished, so if you have different keybindings for Tab and Ctrl-I your program will behave differently in different terminals.
If you want to do anything other than print unformatted text right where the cursor is, you need to emit control sequences that tell the terminal where to move the cursor or how to format the upcoming text. So you build weird strings, print them out, and then the terminal has to parse them back apart to know what to do. As you can imagine, this is kind of slow.
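For concreteness, here's the kind of string-building involved. These are standard VT100/ANSI sequences; the snippet itself is just an illustration:

    #include <stdio.h>

    int main(void) {
        /* Every "command" is a string the terminal must parse back out. */
        printf("\033[5;10H");   /* CUP: move cursor to row 5, column 10 */
        printf("\033[7m");      /* SGR 7: inverse video */
        printf("hello");
        printf("\033[0m\n");    /* SGR 0: reset attributes */
        return 0;
    }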
If you accidentally print a line that's too long, it might wrap and shift the rest of the UI. That's not too bad because it's a monospaced font, so you only have to count the Unicode characters (not bytes)...until you realize Chinese characters are rendered twice as wide. Text is weird, and in the terminal there is nothing but text. To be fair, it's still a lot simpler than proportional fonts and a lot of fun, but I definitely understand why someone would decide to just throw pixels on a canvas and not deal with the historical quirks.
I think there's lots of scope for improvements to terminals, but I feel like this is more a question of "nobody has asked for it".
There's been plenty of recent innovation in terminals (support for a variety of new underline styles to enable "squigglies" for error reporting is one example; new image support is another), and adding a code to enable more detailed key reporting, the same way we have upgraded mouse event reporting over the years, wouldn't be hard. These things tend to spread quickly.
With respect to "accidentally printing a line that's too long", you can turn off auto-wrap in any terminal that supports DECAWM (\033[?7h / \033[?7l ).
That it's "kinda slow" really shouldn't be an issue - it was fast enough for hardware a magnitude slower than today. Parsing it requires a fairly simple state machine. If can't keep up with VT100/ANSI escape sequences, your parser is doing something very wrong.
The difficulty of Unicode is fair enough, and sadly largely unavoidable, but that part is even worse in a GUI; the solution there is to measure the rendered string in code, and it's not much harder to get that right for terminals either. It'd be nice if Unicode had handled this in a nicer way (e.g. indicated width in the encoding).
For my own terminal, I'm toying with the idea of allowing proportional text with an escape code, and make use of it in my editor. If I do, it'll be strictly limited: Indicate a start and end column where the text is proportional, and leave it to the application to specify a font and figure out the width itself.
Worst case scenario would be that you send the escape, and the editor doesn't get an escape acknowledging it has been enabled back, and falls back on monospaced text and keeps working fine in a regular terminal. This way, evolving terminal capabilities can be done fairly easily with backwards compatibility.
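The negotiation pattern would look something like this; note the private mode number and the reply here are pure invention, only the request-then-timed-wait-then-fall-back structure is the point:

    #include <stdio.h>
    #include <sys/select.h>
    #include <unistd.h>

    int proportional_supported(void) {
        printf("\033[?9901h");             /* hypothetical private mode */
        fflush(stdout);

        /* Wait up to 100ms for an acknowledgment; a real implementation
           would put the tty in raw mode and parse the reply rather than
           just checking that bytes arrived. */
        fd_set fds;
        struct timeval tv = { 0, 100000 };
        FD_ZERO(&fds);
        FD_SET(STDIN_FILENO, &fds);
        return select(STDIN_FILENO + 1, &fds, NULL, NULL, &tv) > 0;
    }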
Anything is possible to fix; the question is why bother. Every fix cuts into the benefit of compatibility. The fundamental model of a wrapping/scrolling teletype isn't a good fit for the way we use computers today. (It does make sense if you work in a real text mode console. Then you are really avoiding all the complexity of a graphics stack by using hard-coded capabilities your hardware provides.)
A simple flat array of pixels seems like a much more timeless mental model to build durable software on top of. You don't have to wonder how different computers will react to a write just off the bottom right of the screen, and so on.
This isn't meant to detract from the broader point about the limitations of terminals, but a simple array of pixels is among the least efficient ways to interact with modern GPUs, especially if it doesn't support rectangular copy operations. The best way to interact with a GPU today and for the foreseeable future is through command buffers, not direct pixel access per se.
There are multiple axes of "best". The simplest, most portable, and most reproducible way to interact with a GPU is direct pixel access. Sometimes that's not fast enough, of course, but that's mainly when you're suffering from uncontrollable fits of interactivity. Most of the time, the best solution to that problem is to redesign your user interface to require less interaction: https://worrydream.com/MagicInk/
> The ubiquity of frustrating, unhelpful software interfaces has motivated decades of research into “Human-Computer Interaction.” In this paper, I suggest that the long-standing focus on “interaction” may be misguided. For a majority subset of software, called “information software,” I argue that interactivity is actually a curse for users and a crutch for designers, and users’ goals can be better satisfied through other means.
But yeah if you're playing an FPS you probably want to talk to your GPU through command buffers rather than pixel buffers.
There's going to be a compatibility-performance tradeoff here, to be sure, though the compatibility issue is going to be more with "very old platforms" and the performance issue is going to be more with "very high resolutions on very high refresh rates". So it's a question of whether you want to produce something that works well on current and past hardware vs. works well on current and future hardware, with some allowance for "can't please everybody".
I don't consider scrolling a large page to be an "uncontrollable fit of interactivity" but it's going to struggle to stay smooth using a single, simple linear array of pixels that's manipulated solely by the CPU. If you can at least work with multiple pixel buffers and operate on them at least somewhat abstractly so that even basic operations can be pushed down to the GPU, even if you don't work directly with command buffers, that will go a long way to bridging the gap between past and future, at least for 2D interfaces.
The compatibility issue is mostly going to be with future platforms that subtly change the semantics of the interfaces you're using or whose drivers have different bugs than the drivers you tested on. To take a trivial example, most GPUs don't bother to implement IEEE 754 gradual underflow.
I think you're wrong about struggling to stay smooth when scrolling a large page. Maybe it was true on the original iPhone in 02007? Or it's true of complex multilayered translucent vector art with a fixed background? But it's not true of things like text with inline images.
Let's suppose that scrolling a large page involves filling a 4K pixel buffer, 3840×2160, with 32-bit color. If you have an in-memory image of the page, this is just 2160 memcpys of the appropriate 15360-byte pixel line; you're going to be memcpy-bandwidth-limited, because figuring out where to copy the pixels from is a relatively trivial calculation by comparison. On the laptop I'm typing this on (which incidentally doesn't have a 4K screen) memcpy bandwidth to main memory (not cache) is 10.8 gigabytes per second, according to http://canonical.org/~kragen/sw/dev3/memcpycost.c. The whole pixel buffer you're filling is only 33.2 megabytes, so this takes 3.1 milliseconds. (Of one CPU core.) Even at 120fps this is less than half the time required.
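In code, the scroll is just a loop like this (names and layout illustrative):

    #include <stdint.h>
    #include <string.h>

    #define SCREEN_W 3840
    #define SCREEN_H 2160

    /* Copy the visible slice of a tall page image into the frame buffer:
       2160 memcpys of 15360 bytes each. */
    void scroll_blit(uint32_t *frame,           /* SCREEN_W * SCREEN_H */
                     const uint32_t *page,      /* SCREEN_W * page_h   */
                     int page_h, int scroll_y)  /* viewport top in page */
    {
        for (int y = 0; y < SCREEN_H; y++) {
            int src_y = scroll_y + y;
            if (src_y < 0 || src_y >= page_h) continue;
            memcpy(frame + (size_t)y * SCREEN_W,
                   page + (size_t)src_y * SCREEN_W,
                   SCREEN_W * sizeof(uint32_t));
        }
    }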
(For a large page you might want to not keep all your JPEGs decompressed in RAM, re-decoding them as required, but this is basically never done on the GPU.)
But what if the page is full of text and you have to rerender the visible part from a font atlas every frame? That's not quite as fast on the CPU, but it's still not slow enough to be a problem.
If you have a tree of glyph-index strings with page positions in memory already, finding the glyph strings that are on the screen is computationally trivial; perhaps in a 16-pixel-tall font, 2160 scan lines is 135 lines of text, each of which might contain five or six strings, and so you just have to find the 600 strings in the tree that overlap your viewport. Maybe each line has 400 glyphs in it, though 60 would be more typical, for a total of 55000 glyphs to draw.
We're going to want to render one texel per pixel to avoid fuzzing out the letters, and by the same token we can, I think, presuppose that the text is not rotated. So again in our inner loop we're memcpying, but this time from the font atlas into the pixel buffer. Maybe we're only memcpying a few pixels at a time, like an average of 8, so we end up calling memcpy 55000×16 ≈ 900k times per frame, requiring on the order of 10 million instructions, which is on the order of an extra millisecond. So maybe instead of 3 milliseconds your frame time is 4 milliseconds.
(It might actually be faster instead of slower, because the relevant parts of the font atlas are probably going to have a high data cache hit rate, so memcpy can go faster than 10 gigs a second.)
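The per-glyph inner loop would be something like this (clipping and alpha blending elided):

    #include <stdint.h>
    #include <string.h>

    struct glyph { int x, y, w, h; };   /* location and size in the atlas */

    /* One short memcpy per glyph row, from atlas to frame buffer. */
    void draw_glyph(uint32_t *frame, int frame_w,
                    const uint32_t *atlas, int atlas_w,
                    const struct glyph *g, int dst_x, int dst_y)
    {
        for (int row = 0; row < g->h; row++)
            memcpy(frame + (size_t)(dst_y + row) * frame_w + dst_x,
                   atlas + (size_t)(g->y + row) * atlas_w + g->x,
                   (size_t)g->w * sizeof(uint32_t));
    }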
I did test something similar to this in http://canonical.org/~kragen/sw/dev3/propfont.c, which runs on one core of this laptop at 84 million glyphs per second (thus about 0.7ms for our hypothetical 55000-glyph screenful) but it's doing a somewhat harder job because it's word-wrapping the text as it goes. (It's using a small font, so it takes less memcpy time per glyph.)
So maybe scrolling a 4K page might take 4 milliseconds per screen update on the CPU. If you only use one core. I would say it was "struggling to stay smooth" if the frame rate fell below 30fps, which is 33 milliseconds per frame. So you have almost an order of magnitude of performance headroom. If your window is only 1920×1080, you have 1½ orders of magnitude of headroom, 2 orders of magnitude if you're willing to use four cores.
I did some basic tests with SDL3 and SDL3_ttf, using only surfaces in CPU memory and with acceleration disabled, on my 1440p (2560×1440) 144Hz monitor, and the copying was never a bottleneck. I was concretely able to achieve an average of 3ms per frame, well under the 144Hz budget of 6.9ms per frame, to scroll a pre-rendered text box with a small border in a fullscreen window. Even at 4K resolution (though that monitor is only 60Hz), I was seeing 5-6ms per frame, still good enough for 144Hz and leaving lots of time to spare for 60Hz. I think this certainly proves that smoothly scrolling a text box, at least on a powerful desktop computer, is not an issue using only direct pixel access.
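Not my actual test code, but a minimal reconstruction of the setup - software surfaces only, never touching the renderer/GPU path:

    /* cc scroll.c $(pkg-config --cflags --libs sdl3) */
    #include <SDL3/SDL.h>

    int main(void) {
        if (!SDL_Init(SDL_INIT_VIDEO)) return 1;
        SDL_Window *win = SDL_CreateWindow("scroll test", 1280, 720, 0);
        SDL_Surface *screen = SDL_GetWindowSurface(win);

        /* Pre-render a "page" ten screens tall; a flat fill stands in
           for real text here. */
        SDL_Surface *page = SDL_CreateSurface(1280, 7200,
                                              SDL_PIXELFORMAT_XRGB8888);
        SDL_FillSurfaceRect(page, NULL, 0x00202020);

        int y = 0, running = 1;
        while (running) {
            SDL_Event e;
            while (SDL_PollEvent(&e))
                if (e.type == SDL_EVENT_QUIT) running = 0;

            SDL_Rect src = { 0, y, 1280, 720 };
            SDL_BlitSurface(page, &src, screen, NULL);  /* pure CPU copy */
            SDL_UpdateWindowSurface(win);
            y = (y + 4) % (7200 - 720);                 /* 4 px per frame */
        }
        SDL_Quit();
        return 0;
    }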
The bigger issue, though, may be rendering the text in the first place. I'm not sure how much the GPU can help there, though it is at least possible with SDL3_ttf to pass off some of the work to the GPU; I may test that as well.
> The bigger issue, though, may be rendering the text in the first place. I'm not sure how much the GPU can help there, though it is at least possible with SDL3_ttf to pass off some of the work to the GPU; I may test that as well.
Font rendering gets slow if you re-render the glyphs regularly. This becomes a challenge if you render anti-aliased glyphs at sub-pixel offsets, which makes the cost of caching them really high.
If you keep things on pixel boundaries, caching them is cheap, and so you just render each glyph once at a given size, unless severely memory constrained.
For proportional text or if you add support for ligatures etc. it can get harder, but I think for most scenarios your rendering would have a really high cache hit ratio unless you're very memory constrained.
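A sketch of what the cache amounts to, with a linear scan and crude eviction just to keep it short; the point is that whole-pixel positioning makes (codepoint, size) a sufficient key:

    #include <stdint.h>

    struct cached_glyph {
        uint32_t codepoint;
        int size_px, w, h;
        uint8_t *bitmap;    /* coverage, rasterized exactly once */
    };

    /* Stand-in for the real rasterizer (FreeType, a TTF engine, ...). */
    struct cached_glyph rasterize_glyph(uint32_t cp, int size_px);

    static struct cached_glyph cache[1024];
    static int cache_len;

    const struct cached_glyph *cache_get(uint32_t cp, int size_px) {
        for (int i = 0; i < cache_len; i++)
            if (cache[i].codepoint == cp && cache[i].size_px == size_px)
                return &cache[i];
        if (cache_len == 1024) cache_len = 0;   /* real eviction (and
                                                   freeing) elided */
        cache[cache_len] = rasterize_glyph(cp, size_px);
        return &cache[cache_len++];
    }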
My terminal is written in Ruby, and uses a TTF engine in Ruby, and while it's not super-fast, the font rendering isn't in the hot path in normal use. So while speeding up my terminal rendering is somewhere on my todo list (far down), the font rendering isn't where I'll be spending time...
Even the worst case of rendering a full screen of 4K text at a tiny font size after changing font size (and so throwing away the glyph cache) is pretty much fast enough.
I think this is pretty much the worst case scenario you'll run into on a modern system - Ruby isn't fast (though much faster than it was) - and running a pure Ruby terminal with a pure Ruby font renderer with a pure Ruby X11 client library would only get "worse" if I go crazy enough to write a pure Ruby X11 server as well (the thought has crossed my mind).
If I were to replace any of the Ruby with a C extension, the inner rendering loop, which constructs spans of text that reuse the same attributes (colors, boldness etc) and issues the appropriate X calls, would be where I'd focus, but I think that too can be made substantially faster than it currently is just by improving the algorithm used.
I think it's okay for glyph generation to be slow as long as it doesn't block redraw and immediate user feedback such as scrolling. While you can make that problem easier by throwing more horsepower at the problem, I think that to actually solve it you need to design the software so that redraw doesn't wait for glyph generation. It's a case where late answers are worse than wrong answers.
I had forgotten or didn't know that you'd also written a pure Ruby replacement for Xlib! That's pretty exciting! I'm inclined to regard X-Windows as a mistake, though. I think display servers and clients should communicate through the filesystem, by writing window images and input events to files where the other can find them. Inotify is also a botch of an API, but on Linux, inotify provides deep-submillisecond latency for filesystem change notification.
> I had forgotten or didn't know that you'd also written a pure Ruby replacement for Xlib!
That one is not all me. I've just filled in a bunch of blanks[1], mostly by specifying more packets after the original maintainer disappeared. I keep meaning to simplify it, as while it works well, I find it unnecessarily verbose. I'm also tempted to bite the bullet and write the code to auto-generate the packet handling from the XML files used for XCB.
I think there are large parts of X11 that are broken, but the more I look at my stack, and how little of X modern X clients actually use, the more tempted I am to try to write an X server as well, and see how much cruft I could strip away if I implement just what is needed to run the clients I care about (you could always run Xvnc or Xephyr or similar if you want to run some other app).
That would make it plausible to then separate the rendering backend and the X protocol implementation, and toy with simpler/cleaner protocols...
[1] https://github.com/vidarh/ruby-x11

Thanks for checking me on that!
Yeah, text rendering can get arbitrarily difficult—if you let it. Rotated and nonuniformly scaled text, Gaussian filters for drop shadows, TrueType rasterization and hinting, overlapping glyphs, glyph selection in cases where there are multiple candidate glyphs for a code point, word wrap, paragraph-filling optimization, hyphenation, etc. But I think that most of those are computations you can do less often than once per frame, still in nearly linear time, and computing over kilobytes of data rather than megabytes.
The point is well taken. I don't know much about interacting with GPUs. I don't particularly care so far about getting more performance, given the wildly fast computers I have and my use cases (I don't make or play games). I _do_ care about power efficiency; do GPUs help there? Modern GPU-based terminal implementations aren't particularly power efficient in my experience.
There are so many factors affecting power efficiency that it's hard to give a categorical answer. A lot depends on factors that vary widely: the hardware in use, the display setup (resolution and refresh rate), the quality of the drivers, the window system (composited or not), the size (cols x rows) of the terminal window, the feature set involved, etc.
The problem with a lot of GPU accelerated terminals, if I had to wager a guess, is that they draw as fast as possible. Turning off GPU acceleration likely forces things to happen much slower thanks to various bottlenecks like memory bandwidth and sharing CPU time with other processes. GPU acceleration of most GUI apps puts them in a similar position as video games. It doesn't have to happen as fast as possible, and can be throttled through e.g. v-sync or lower-than-max FPS targets or turning on and off specific display features that might tax the GPU more (e.g. if shaders get involved, alpha blending is used, etc.).
The sibling comment makes a good point about compatibility and simplicity, though those don't always translate into lower power usage.
> It doesn't have to happen as fast as possible, and can be throttled through e.g. v-sync or lower-than-max FPS targets or turning on and off specific display features that might tax the GPU more (e.g. if shaders get involved, alpha blending is used, etc.).
Exactly.
E.g. if you want to render as fast as possible, the logical way of doing it is to keep track of how many lines have been output (the easiest, but not necessarily most efficient, way is to render to a scrollback buffer) and then separately, synced to v-sync if you prefer, start rendering from whatever is at the top of the virtual text-version of the screen when the rendering starts a new frame.
Do this in two threads, and you can then render to the bitmap at whatever FPS you can handle, while you can let the app running in the terminal output text as fast as it can produce it:
If the text-output thread manages to add more than one line to the end of the buffer per frame rendered to the bitmap, your output will just scroll more than one line per frame.
You've then decoupled the decision of the FPS necessary from how fast the app you're running can output text, and frankly, your FPS needs to dip fairly low before that looks janky.
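A sketch of that decoupling, with the coarsest possible locking and illustrative names:

    #include <pthread.h>

    #define MAX_LINES 100000

    static char *scrollback[MAX_LINES];
    static int nlines;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    /* Output thread: append as fast as the app produces text. */
    void append_line(char *line) {
        pthread_mutex_lock(&lock);
        if (nlines < MAX_LINES) scrollback[nlines++] = line;
        pthread_mutex_unlock(&lock);
    }

    /* Render thread: once per (v-synced) frame, draw whatever is at the
       bottom now. If the output thread added three lines since the last
       frame, the display simply scrolls by three. */
    void render_frame(int rows) {
        pthread_mutex_lock(&lock);
        int top = nlines > rows ? nlines - rows : 0;
        for (int i = top; i < nlines; i++) {
            /* draw scrollback[i] into the bitmap */
        }
        pthread_mutex_unlock(&lock);
    }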
The reason to bother is that a lot of us prefer terminals and want to evolve them. The reason they're not evolving faster isn't that compatibility is really a problem (we see new terminal capabilities gain support fairly quickly), but that there often isn't a major perceived need for the features that people who don't use terminals much think are missing.
People don't add capabilities to try to attract people like you who don't want terminals in the first place.
Wrapping and scrolling can be turned off on any terminal newer than the vt100, or constrained to regions. I never wonder how a different computer reacts to writing off the bottom right of the screen, because that works just fine on every terminal that matters. The actual differences are relatively minor if you don't do anything esoteric.
A "simple" flat array of pixels means you have to reimplement much of a terminal, such as attribute rendering, font rendering etc. It's not a huge amount of work, but not having to is nice.
So is the network transparency, and vnc etc. is not a viable replacement.
"A simple flat array of pixels means you have to reimplement much of a terminal, such as attribute rendering, font rendering etc. It's not a huge amount of work, but not having to is nice."
For me the debate isn't about implementing a terminal vs something else. I assume one uses tools others build in either case. The question is how much the tools hinder your use case. I find a canvas or a graphical game engine (which implement fonts and so on) hinders me less in building the sorts of tools I care about building. A terminal feels like more of a hindrance.
> I assume one uses tools others build in either case
And there is the disconnect. For a terminal app, you often don't need to.
> I find a canvas or a graphical game engine (which implement fonts and so on) hinders me less in building the sorts of tools I care about building. A terminal feels like more of a hindrance.
And for me it's the opposite. The tools I build mostly works on text. A terminal provides enough that I usually don't need any extra dependencies.
The disconnect might be that I'm counting the weight of the terminal program itself. I think this makes for a more fair comparison. The terminal program is built by others, it often uses many of the same libraries for pixel graphics and font rendering.
I find this thinking bizarre. What matters is the simplicity of the API my code has to talk to.
You'd have a better argument if most people built their own terminals, like I have (mine is only ~2k lines of code, however), as then there'd be a reasonable argument you're writing that code anyway. But most people don't.
Even then I'd consider it fairly specious, because the terminal code is a one off cost to give every TUI application a simple, minimal API that also gives me the ability to display their UI on every computer system built in at least the last half a century.
I write plenty of code that requires more complex UI's too, and don't try to force those into a terminal, but I also would never consider building a more complex UI for an app that can be easily accommodated in a terminal.
Availability over ssh is indeed a good point. I've reduced my reliance on the network at the same time as I've grown disenchanted with terminals; thanks for pointing out that connection.
The rest are mutually incommensurable worldviews, and we have to agree to disagree.
I think I've said this to you before, but in case I haven't, I suspect that the best way to handle proportional text in formattable plain text is some variant of Nick Gravgaard's "elastic tabstops" https://nickgravgaard.com/elastic-tabstops/:
> Each cell ends with a tab character. A column block is a run of uninterrupted vertically adjacent cells. A column block is as wide as the widest piece of text in the cells it contains or a minimum width (plus padding). Text outside column blocks is ignored [for layout, though it is displayed].
I think the main deficiency in his proposal is that he enforces a minimum of one space of whitespace padding on the right of his column.
I think you can avoid making that padding visible.
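Here's a minimal monospaced rendition of the quoted rule, measuring in characters rather than pixels, with no Unicode width handling, and keeping the one-space minimum padding I just complained about:

    #include <stdio.h>
    #include <string.h>

    #define MAXLINES 256
    #define MAXCOLS  16
    #define MAXLEN   512

    static char cells[MAXLINES][MAXCOLS][MAXLEN]; /* tab-terminated cells */
    static char rest[MAXLINES][MAXLEN];   /* text after the last tab      */
    static int  ncells[MAXLINES];
    static int  width[MAXLINES][MAXCOLS]; /* computed column-block widths */

    int main(void) {
        char line[MAXLEN];
        int n = 0;

        for (; n < MAXLINES && fgets(line, sizeof line, stdin); n++) {
            line[strcspn(line, "\n")] = 0;
            char *p = line, *tab;
            while ((tab = strchr(p, '\t')) && ncells[n] < MAXCOLS) {
                *tab = 0;
                strcpy(cells[n][ncells[n]++], p);
                p = tab + 1;
            }
            strcpy(rest[n], p);   /* "text outside column blocks" */
        }

        /* Size each vertical run of adjacent cells to its widest cell. */
        for (int c = 0; c < MAXCOLS; c++)
            for (int i = 0; i < n; ) {
                if (ncells[i] <= c) { i++; continue; }
                int j = i, w = 0;
                for (; j < n && ncells[j] > c; j++)
                    if ((int)strlen(cells[j][c]) > w)
                        w = (int)strlen(cells[j][c]);
                while (i < j) width[i++][c] = w + 1;
            }

        for (int i = 0; i < n; i++) {
            for (int c = 0; c < ncells[i]; c++)
                printf("%-*s", width[i][c], cells[i][c]);
            printf("%s\n", rest[i]);
        }
        return 0;
    }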
In terms of making this work in a terminal, what I'd imagine would be having the app still aligning things to fixed-width column boundaries, but handling the elastic tab stops based on knowing a maximum extent of a single-width character used (you'd still need to deal with the unicode mess) and setting tab stops based on that. Adding an escape to report the font extents would be easy.
I'll have to do some experiments on this... Especially as I'm the kind of maniac who likes to align multi-line spans of code in way linters will yell at.
I was thinking you could change the terminal to implement elastic tabstops so that apps written for the elastic-tabstop terminal wouldn't have to worry about font metrics. You could use ^L and ^V as delimiters to nest a whole new table inside a table cell, giving you layout capabilities comparable to the WWW before CSS, similar to how HN is laid out. Languages like C and JS that treat all whitespace the same can have ^L and ^V inserted freely in random places.
I'm going to have to think about exactly how to do this best. My one concern would be to either handle line-wrap properly, or handle overflow.
Overflow is "easy" if you assume a line is always of a reasonable max length: Just turn off wraparound, and print the whole line.
But if you want to handle super-long lines, you'd at least want a way to "clip" the text you're sending to the terminal to avoid the cost of outputting all of it to the terminal each time you're moving past it.
Depending on how you handle it, maybe the minimum width of a glyph is instead what you worry about in cases where you don't wrap.
Let's say the terminal is 80 monospaced characters wide, and you switch on proportional mode.
If you have a long line, you just want to know the maximum number of characters you should output to guarantee that the proportional region is full.
Maybe just an escape to get the terminal to report the maximum number of proportional glyphs that will fit in a given field is enough.
The worst case scenario then is a screen full of I's or 1's or similar, where maybe you spit out twice as much text as the terminal promptly clips away, but most of the time you'd only output a few more characters than necessary, so I think that's fine.
Not sure how to cleanly extend that to allow wraparound without the app needing to be aware of the font. E.g. a text editor will want to be able to figure out which point in the text should be at the top of the screen as you scroll. Doing that with wraparound without knowing font extent would require some other way to measure. Printing a bit extra might be fine.
For non-interactive applications, none of this would matter - you'd just print everything the same way you do now, and let the terminal sort out the rest.
Maybe adding experimental support for that would be a good starting point. E.g. being able to cat a tab-separated file and get an aligned table with proportional text would be an awesome test case.
I'm now also curious how much would break if I "just" start doing proportional rendering by default when not using the alternate screen.
E.g. anything trying to do layout with spaces would obviously break, but I'm wondering if it'd affect few enough non-interactive programs that I could make it the default, and just wrap/alias the few things that break. My shell already filters the output of a number of commands to colorize them, like pstree, so having a small "blacklist" of programs I use that'd need to be switched back to monospaced, or filtered further, wouldn't necessarily be a big deal.
Damn you for adding more things to my TODO list :-)
I feel like overflow as an error-handling thing would be reasonable to handle in a variety of ways; however, if you start trying to offer HTML-like word wrap inside tables, you start experiencing difficult optimization problems. I think it's probably better to let applications split paragraphs into lines based on some kind of heuristic, embedding the newlines into the text that the terminal has to handle.
A lot of things would benefit from being able to output tab-separated output. (I feel like https://okmij.org/ftp/papers/DreamOSPaper.html used to have screenshots?) Things like pstree want a "dot leader" approach, where you tag a cell (with an escape sequence?) as being padded out to the column width not with whitespace but with more copies of its contents. In http://canonical.org/~kragen/sw/dev3/alglayout.py I'm doing it with the ~ operator, but that's just nesting stacks of hboxes and vboxes for layout and outputting just fixed-width ASCII, not using full-fledged tables and outputting pixels. (There's an OCaml version of that program if that would be more readable; I haven't written a Ruby version.)
And to make matters worse, unlike a GUI, the terminal doesn't provide any semantic information about the content it displays to the OS.
This is a problem for accessibility software, screen readers, UI automation, voice control etc.
If you want a screen reader to announce that a menu option is selected, you need some way to signal to the OS that there's a menu open, that some text is a menu option, and that the option has the "selected" state. All serious GUI frameworks let you do this (and mostly do it automatically for native controls), so does the web.
TUIs do not (and currently can not) do this. While this is not really a problem for shells or simple Unix utilities, as they just output text which you can read with a screen reader just fine, it gets really annoying with complicated, terminal-based UIs. The modern AI coding agents are very prominent examples of how not to do this right.
TUIs could be made to do this relatively easily. "All" you need is to pick an escape sequence to assign a semantic label to the following span of text, and have the terminal use whatever OS mechanism to make that available to assistive tech.
Of course, that doesn't help unless/until at least one prominent terminal actually does it and a few major terminal applications adds support for it.
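To be concrete about how small this could be, something like the following is all I mean - though I must stress the sequence below is pure invention, implemented by no terminal today:

    #include <stdio.h>

    /* Hypothetical OSC pair wrapping a span with a role and state, which
       a cooperating terminal could forward to the platform accessibility
       API. The number 7771 and the payload format are made up. */
    void semantic_span(const char *role, const char *state, const char *text) {
        printf("\033]7771;%s;%s\007%s\033]7771;end\007", role, state, text);
    }

    /* e.g. semantic_span("menuitem", "selected", "Open File..."); */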
This is a problem with every TUI out there built using ncurses. "What escape code does your terminal emit for backspace?" is a completely artificial problem at this point.
There are good reasons to deal with the terminal: I need programs built for it, or I need to interface with programs built for it. Programs that deal with 1D streams of bytes for stdin and stdout are simpler in text mode. But for anything else, I try to avoid it.
Immature, obviously. Far fewer person-hours of labor have been put in relative to what you use all the time. But I find it worthwhile to get off the constant treadmill of new versions with features I don't care about. Cutting down on complexity there creates headroom for me or you to try out new approaches I or you might care more about.
My most common development environments these days:
* A live-programming infinite surface of definitions that works well on a big screen: https://git.sr.ht/~akkartik/driver.love Has minimal syntax highlighting for just Lua comments and strings.
My own editor is an array of lines in Ruby, and in now about 8 years of using it daily, with the actual editor interacting with the buffer storage via IPC to a server holding all the buffers, it's just not been a problem.
It does become a problem if you insist on trying to open files of hundreds of MB of text, but my thinking is that I simply don't care to treat that as a text editing problem for my main editor, because files that size are usually something I only ever care to view or am better off manipulating with code.
If you want to be able to open and manipulate huge files, you're right, and then an editor using these kind of simple methods isn't for you. That's fine.
As it stands now, my editor holds every file I've ever opened and not explicitly closed in the last 8 years in memory constantly (currently, 5420 buffers; the buffer storage is persisted to disk every minute or so, so if I reboot and open the same file, any unsaved changes are still there unless I explicitly reload), and it's not even breaking the top 50 or so of memory use on my machine usually (those are all browser tabs...)
I'm not suggesting people shouldn't use "fancier" data structures when warranted. It's great some editors can handle huge files. Just that very naive approaches will work fine for a whole lot of use cases.
E.g. the 5420 open buffers in my editor currently are there because even the naive approach of never garbage collecting open buffers just hasn't become an issue yet - my available RAM has increased far faster than the size of the buffer storage so adding a mechanism for culling them just hasn't become a priority.
Oh by "more complex" operations I referred to multiple cursors and multi line regex searches. I've noticed some performance problems in my own editor but it's mostly because "lines" become fragmented, if you allocate all the lines with their own allocation, they might be far away from each other in memory. It's especially true when programming where lines are relatively short.
Regex searches and code highlight might introduce some hitches due to all of the seeking.
Kakoune has been my main editor for the past year (give or take) and uses an array of lines [0]. Ironically, multi-cursor and regex are some of the main features that made it attractive to me.
I just tested it out on the 100MB enwik8 file I have laying around and it does slow down significantly (took 4-5 seconds to load in the file and has a 1 second delay on changing a line). But that is not really the type of file you would be opening with your main editor.
There's a wildly out-of-date repo here[1] that I badly need to push updates to, with the caveat that odds are there are lots of missing pieces that'll make you struggle to get it working on your system. I wouldn't recommend it - I dumped it on GitHub mostly because why not, rather than for people to actually use.
Difficulties will include e.g. helper scripts executed to do things like stripping buffers, a dependency on rofi when you try to open files, and a number of other things that work great on my machine and not so well elsewhere.
I have about 2-3 years' worth of updates and cleanups I should get around to pushing to GitHub, including some attempts to make it slightly easier for other people to run.
The two things I think are nice and worth picking up on are the use of DRb to get client-server operation, which means the editor is "multi window" simply by virtue of spawning new separate instances of itself. It's then multi-pane/frame by relying on me running a tiling wm, so splitting the buffer horizontally and vertically is "just" a matter of a tiny helper script ensuring the window opens below/to the right of the current window respectively.
But some other things, like the syntax highlighting (using Rouge) is in need of a number of bugfixes and cleanups; I keep meaning to modify the server to keep metadata about the lines and pull the syntax highlighting out so it runs in a separate process, talking directly to the server, for example.
The core data structure (array of lines) just isn't that well suited to more complex operations.
Modern CPUs can read and write memory at dozens of gigabytes per second.
Even when CPUs were 3 orders of magnitude slower, text editors using a single array were widely used. Unless you introduce some accidentally-quadratic or worse algorithm in your operations, I don't think complex datastructures are necessary in this application.
The actual latency budget is less than a single frame to be completely unnoticeable, so you are in fact limited to less than 1 GB to move per keystroke. And each character may hold additional metadata, like syntax highlight state, so 1 GB of movable memory doesn't translate to 1 GB of text either. You are still correct in that a line-based array is enough for most cases today, but I don't think it's generally true.
> The core data structure (array of lines) just isn't that well suited to more complex operations.
Just how big (and how many lines) does your file have to be before it is a problem? And what are the complex operations that make it a problem?
(Not being argumentative - I'd really like to know!)
On my own text editor (to which I lost the sources way back in 2004) I used an array of bytes, had syntax highlighting (using single-byte start-stop codes for syntax highlighting), and used a moving "window" into the array for rendering. I never saw a latency problem back then on a Pentium Pro, even with files as large as 20MB.
I am skeptical of the piece table as used in VS Code being that much faster; right now on my 2011 desktop, a VS Code with no extra plugins has visible latency when scrolling by holding down the up/down arrow keys and a really high keyboard repeat setting. Same computer, same keyboard repeat and same file using Vim in a standard xterm/uxterm has visibly better scrolling; takes half as much time to get to the end of the file (about 10k lines).
From what I have experienced, the complex data structures used here are more about maintaining responsiveness when overall system load is high, and that may come at the cost of slightly slower performance overall. Say you used the variable "x" a thousand times in your 10k lines of code and you want to do a find-and-replace on it to give it a more descriptive name like "my_overused_variable": think about all of the memory copying that happens if all 10k lines are in a single array. If those 10k lines are in 10k arrays, each twice the size of its line, you reduce that a fair amount. It might be slower than simpler methods when system load is low, but it will stay responsive longer.
I think vim uses a gap structure, not a single array but don't remember.
I am not a programmer; my experience could very well be due to failings elsewhere in my code, and my reasoning could be hopelessly flawed, so hopefully someone will correct me if I am wrong. It has also been a while since I dug into this; the project which got me to dig into it is one of the things which got me to finally make an account on HN, and one of my first submissions was "Data Structures for Text Sequences".
VS Code used 40-60 bytes per line, so a file with 15 million single character lines balloons from 30 MB to 600+ MB. kilo uses 48 bytes per line on my 64-bit machine (though you can make it 40 if you move the last int with the other 3 ints instead of wasting space on padding for memory alignment), so it would have the same issue.
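For reference, kilo's row struct looks roughly like this (reconstructed from memory of antirez's kilo.c, so treat the details as approximate):

    typedef struct erow {
        int idx;            /* row index in the file */
        int size;           /* length of chars */
        int rsize;          /* length of render */
        char *chars;        /* raw line content */
        char *render;       /* line as displayed (tabs expanded) */
        unsigned char *hl;  /* per-byte syntax highlight class */
        int hl_oc;          /* row ends in an open multi-line comment? */
    } erow;
    /* On LP64: 3 ints + 4 bytes padding + 3 pointers + 1 int + 4 bytes
       padding = 48 bytes; grouping hl_oc with the other ints gives 40. */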
I have never seen a file like this in my life, let alone opened one. I'm sure they exist and people will want to open them in text editors instead of processing with sed/awk/Python, but now we're well into the 5-sigma of edge cases.
I played around with kilo when it was released, and eventually made a multi-buffer version with support for scripting with embedded Lua. Of course it was just a fun hack, not a serious thing - I continue to do all my real editing with Emacs - but it did mean I got to choose the best project name:
Here’s a second recommendation for that tutorial. It’s the first coding tutorial I’ve finished because it’s really good and I enjoyed building the foundational software program that my craft relies on. I don’t use that editor but it was fun to create it.
Author of hecto here, thank you for mentioning it! I wrote the first version around 5 years ago and I’m happy that people still use it. (I updated it in the meantime)
Reading through this code is a veritable rite of passage. You learn how C works, how text editors work, how VT codes work, how syntax highlighting works, how find works, and how little code it really takes to make anything when you strip away almost all conveniences, edge cases, and error handling.
I made a similar editor using Lazarus... since it has syntax highlighting components... I guess that's cheating. The more I think about it though, I wonder if Freepascal could produce a nice GUI for Neovim.
I did try to build one in Qt in C++ years ago, stopped at trying to figure out how to add Syntax Highlighting since I'm not really that much into C++. Pivoted it to work like Notepad so I was still happy with how it wound up.
Although it does cheat a bit in an effort to better handle Unicode:
> unicode-width is used to determine the displayed width of Unicode characters. Unfortunately, there is no way around it: the unicode character width table is 230 lines long.
Personally, this is the reason I don't really buy the extreme size reduction; such projects generally have to sacrifice some essential features that demand a certain, necessary amount of code.
A lot of those features are only "essential" for a subset of possible users.
My own editor exists because I realised it was possible to write an editor smaller than my Emacs configuration. While my editor lacks all kinds of features that are "essential" for lots of other people, it doesn't lack any features essential for me.
So in terms of producing a perfect all-round editor that will work for everyone, sure, editors like Kilo will always be flawed.
Their value is in providing a learning experience, something that works for the subset who don't need those features, or a basis for people to customise something just right for their needs in a compact way. E.g. my own editor has quirks that are custom-tailored to my workflow, and even to my environment.
You are right, but then there is not much reason to make it public because it can't be very useful for general users. I have lots of code that was written only for myself and I don't intend to publish at all.
There's plenty of reason to make it public as basis for others to make it their own, or to learn from.
I have lots of code I've published not because it's useful to a lot of people as-is, but because it might be helpful. And a lot of my projects are based on code written by others that was not "very useful for general users".
E.g. my editor started out based on Femto [1], a very minimalist example of how small an editor can be. It saved some time compared to starting from scratch, even though there's now practically nothing left of the original.
Similarly, my terminal relies on a Ruby rewrite of a minimalist truetype renderer that in itself would be of little value for most people, who should just use FreeType. But it was highly valuable to me - allowing me to get a working pure-Ruby TrueType renderer in a day.
Not "very useful for general users" isn't a very useful metric for whether something is worthwhile.
(While the current state of my editor isn't open, yet, largely for lack of time, various elements of it are, in the form of Ruby gems where I've extracted various functionality.)
[1] There are at least 3 editors named Femto, presumably inspired by being smaller than Nano, the way Nano followed Pico, but this is the one I started with: https://github.com/agorf/femto
Ah darn. Closing in on retirement age (will never happen, coding is too much fun for profit or charity), I've resisted building an editor, but I want to. Need to. I hacked so much on vim, emacs, eclipse, vs code, and it's all crap (the newer, the worse: all these useless gimmicks you won't use past grade school aaarrr, while lacking power user features). Can I do better? This seems a good start.
Check my YouTube channel for an experiment where I added UTF-8 support to Kilo, showcasing what was, back then, an early experiment at using LLMs to ship working code. I need to push the changes after some cleanup, but that's not enough for a Kilo 2, as it makes things more complicated without showing worthwhile programming techniques. So it's sitting on my laptop.
Funny story: using kilo was the final straw [1] in getting me to give up on terminals. These days I try to do all my programming atop a simple canvas I can draw pixels on.
Here's the text editor I use all the time these days (and base lots of forks off of): https://git.sr.ht/~akkartik/text2.love. 1200 LoC, proportional font, word-wrap, scrolling, clipboard, unlimited undo. Can edit Moby Dick.
[1] https://git.sr.ht/~akkartik/teliva
Someone else who eschews terminals and replaced them:
https://arcan-fe.com/2025/01/27/sunsetting-cursed-terminal-e...
I really enjoyed the plan9 way of an application slurping up the terminal window (not a real terminal anyway) and then using it as full fledged GUI window. No weird terminal windows floating around in the background and you still could return to it when quitting for any logs or outputs.
Hey Akkartik! That's really interesting! At the moment you're still using a terminal to launch the individual apps or something else?
Whatever works! I mostly use LÖVE, and it supports both. Some reasons to run it from the terminal rather than simply double-clicking or a keyboard shortcut in the OS:
* While I'm building an app I want to run from a directory rather than a .love file.
* I want to pass additional arguments. Though I also extensively use drag and drop for filenames.
* I want to print() while debugging.
> These days I try to do all my programming atop a simple canvas I can draw pixels on.
Why?
Not GP but the terminal is inefficient and limiting for input and UI. For one you cannot detect key-up and key-down events, only a full key press. The press of multiple (non-modifier) keys at once can't be recognized either. Also there are some quirks, like in many terminals your application cannot distinguish between the Tab key and Ctrl-I as they look the same. But in some (e.g. Alacritty) it can work, so now if you have two different keybindings for Tab & Ctrl-I your program will behave differently in different terminals.
If you want to do anything that's not printing unformatted text right where the cursor is, you need to print out control sequences that tell the terminal where to move the cursor or format the upcoming text. So you build weird strings, print them out and then the terminal has to parse the string to know what to do. As you can imagine this is kind of slow.
If you accidentally print a line that's too long it might break and shift the rest of the UI. That's not too bad because it's a monospaced font, so you only have to count the unicode symbols (not bytes)...until you realize chinese symbols are rendered twice as wide. Text is weird and in the terminal there is nothing but text. But to be fair it's still a lot simpler than proportional fonts and a lot of fun, but I definitely understand why someone would decide to just throw pixels on a canvas and not deal with the historical quirks.
I think there's lots of scope for improvements to terminals, but I feel like this is more a question of "nobody has asked for it".
There's been plenty of recent innovation in terminals (e.g. support for a variety of new types of underlines to enable "squigglies" for error reporting is an example; new image support is another), and adding a code to enable more detailed key reporting the same way we have upgraded mouse event reporting over the years wouldn't be hard, and these things tends to spread quickly.
With respect to "accidentally printing a line that's too long", you can turn off auto-wrap in any terminal that supports DECAWM (\033[?7h / \033[?7l ).
That it's "kinda slow" really shouldn't be an issue - it was fast enough for hardware a magnitude slower than today. Parsing it requires a fairly simple state machine. If can't keep up with VT100/ANSI escape sequences, your parser is doing something very wrong.
The difficulty of unicode is fair enough, and sadly largely unavoidable, but that part is even worse in a GUI; the solution there is to use code to measure the rendered string, and it's not much harder to get that right for terminals either. It'd be nice if unicode had done this in a nicer way (e.g. indicated it in the encoding).
For my own terminal, I'm toying with the idea of allowing proportional text with an escape code, and make use of it in my editor. If I do, it'll be strictly limited: Indicate a start and end column where the text is proportional, and leave it to the application to specify a font and figure out the width itself.
Worst case scenario would be that you send the escape, and the editor doesn't get an escape acknowledging it has been enabled back, and falls back on monospaced text and keeps working fine in a regular terminal. This way, evolving terminal capabilities can be done fairly easily with backwards compatibility.
Anything is possible to fix; the question is why bother. Every fix cuts into the benefit of compatibility. The fundamental model of a wrapping/scrolling teletype isn't a good fit for the way we use computers today. (It does make sense if you work in a real text mode console. Then you are really avoiding all the complexity of a graphics stack by using hard-coded capabilities your hardware provides.)
A simple flat array of pixels seems like a much more timeless mental model to build durable software on top of. You don't have to wonder how different computers will react to a write just off the bottom right of the screen, and so on.
This isn't meant to detract from the broader point about the limitations of terminals, but a simple array of pixels is among the least efficient ways to interact with modern GPUs, especially if it doesn't support rectangular copy operations. The best way to interact with a GPU today and for the foreseeable future is through command buffers, not direct pixel access per se.
There are multiple axes of "best". The simplest, most portable, and most reproducible way to interact with a GPU is direct pixel access. Sometimes that's not fast enough, of course, but that's mainly when you're suffering from uncontrollable fits of interactivity. Most of the time, the best solution to that problem is to redesign your user interface to require less interaction: https://worrydream.com/MagicInk/
> The ubiquity of frustrating, unhelpful software interfaces has motivated decades of research into “Human-Computer Interaction.” In this paper, I suggest that the long-standing focus on “interaction” may be misguided. For a majority subset of software, called “information software,” I argue that interactivity is actually a curse for users and a crutch for designers, and users’ goals can be better satisfied through other means.
But yeah if you're playing an FPS you probably want to talk to your GPU through command buffers rather than pixel buffers.
There's going to be a compatibility-performance tradeoff here, to be sure, though the compatibility issue is going to be more with "very old platforms" and the performance issue is going to be more with "very high resolutions on very high refresh rates". So it's a question of whether you want to produce something that works well on current and past hardware vs. works well on current and future hardware, with some allowance for "can't please everybody".
I don't consider scrolling a large page to be an "uncontrollable fit of interactivity" but it's going to struggle to stay smooth using a single, simple linear array of pixels that's manipulated solely by the CPU. If you can at least work with multiple pixel buffers and operate on them at least somewhat abstractly so that even basic operations can be pushed down to the GPU, even if you don't work directly with command buffers, that will go a long way to bridging the gap between past and future, at least for 2D interfaces.
The compatibility issue is mostly going to be with future platforms that subtly change the semantics of the interfaces you're using or whose drivers have different bugs than the drivers you tested on. To take a trivial example, most GPUs don't bother to implement IEEE 754 gradual underflow.
I think you're wrong about struggling to stay smooth scrolling a large page. Maybe it was true on the original iPhone in 02007? Or it's true of complex multilayered translucent vector art with a fixed background? But it's not true of things like text with inline images.
Let's suppose that scrolling a large page involves filling a 4K pixel buffer, 3840×2160, with 32-bit color. If you have an in-memory image of the page, this is just 2160 memcpys of the appropriate 15360-byte pixel line; you're going to be memcpy-bandwidth-limited, because figuring out where to copy the pixels from is a relatively trivial calculation by comparison. On the laptop I'm typing this on (which incidentally doesn't have a 4K screen) memcpy bandwidth to main memory (not cache) is 10.8 gigabytes per second, according to http://canonical.org/~kragen/sw/dev3/memcpycost.c. The whole pixel buffer you're filling is only 33.2 megabytes, so this takes 3.1 milliseconds. (Of one CPU core.) Even at 120fps this is less than half the time required.
(For a large page you might want to not keep all your JPEGs decompressed in RAM, re-decoding them as required, but this is basically never done on the GPU.)
But what if the page is full of text and you have to rerender the visible part from a font atlas every frame? That's not quite as fast on the CPU, but it's still not slow enough to be a problem.
If you have a tree of glyph-index strings with page positions in memory already, finding the glyph strings that are on the screen is computationally trivial; perhaps in an 16-pixel-tall font, 2160 scan lines is 135 lines of text, each of which might contain five or six strings, and so you just have to find the 600 strings in the tree that overlap your viewport. Maybe each line has 400 glyphs in it, though 60 would be more typical, for a total of 55000 glyphs to draw.
We're going to want to render one texel per pixel to avoid fuzzing out the letters, and by the same token we can, I think, presuppose that the text is not rotated. So again in our inner loop we're memcpying, but this time from the font atlas into the pixel buffer. Maybe we're only memcpying a few pixels at a time, like an average of 8, so we end up calling memcpy 55000×16 ≈ 900k times per frame, requiring on the order of 10 million instructions, which is on the order of an extra millisecond. So maybe instead of 3 milliseconds your frame time is 4 milliseconds.
(It might actually be faster instead of slower, because the relevant parts of the font atlas are probably going to have a high data cache hit rate, so memcpy can go faster than 10 gigs a second.)
I did test something similar to this in http://canonical.org/~kragen/sw/dev3/propfont.c, which runs on one core of this laptop at 84 million glyphs per second (thus about 0.7ms for our hypothetical 55000-glyph screenful) but it's doing a somewhat harder job because it's word-wrapping the text as it goes. (It's using a small font, so it takes less memcpy time per glyph.)
So maybe scrolling a 4K page might take 4 milliseconds per screen update on the CPU. If you only use one core. I would say it was "struggling to stay smooth" if the frame rate fell below 30fps, which is 33 milliseconds per frame. So you have almost an order of magnitude of performance headroom. If your window is only 1920×1080, you have 1½ orders of magnitude of headroom, 2 orders of magnitude if you're willing to use four cores.
I did some basic tests with SDL3 and SDL3_ttf, using only surfaces in CPU memory and with acceleration disabled, on my 2560p 144Hz monitor and the copying was never a bottleneck. I was concretely able to achieve an average of 3ms per frame, well under the 144Hz budget of 6.9ms per frame, to scroll a pre-rendered text box with a small border in a fullscreen window. Even at 4K resolution (though that monitor is only 60Hz), I was seeing 5-6 ms per frame, still good enough for 144Hz and leaving lots of time to spare for 60Hz. I think this certainly proves that smoothly scrolling a text box, at least with a powerful desktop computer, is not an issue using only direct pixel access.
The bigger issue, though, may be rendering the text in the first place. I'm not sure how much the GPU can help there, though it is at least possible with SDL3_ttf to pass off some of the work to the GPU; I may test that as well.
> The bigger issue, though, may be rendering the text in the first place. I'm not sure how much the GPU can help there, though it is at least possible with SDL3_ttf to pass off some of the work to the GPU; I may test that as well.
The font rendering gets slow if you re-render the glyphs regularly. This becomes a challenge if you render anti-aliased glyphs at sub-pixel offsets, and so make the cost of caching them get really high.
If you keep things on pixel boundaries, caching them is cheap, and so you just render each glyph once at a given size, unless severely memory constrained.
For proportional text or if you add support for ligatures etc. it can get harder, but I think for most scenarios your rendering would have a really high cache hit ratio unless you're very memory constrained.
My terminal is written in Ruby, and uses a TTF engine in Ruby, and while it's not super-fast, the font rendering isn't in the hot path in normal use and so while speeding up my terminal rendering is somewhere on my todo list (far down), the font rendering isn't where I'll spending time...
Even the worst case of rendering a full screen of text in 4k at a tiny font size after changing font size (and so throwing away the glyph cache) is pretty much fast enough.
I think this is pretty much the worst case scenario you'll run into on a modern system - Ruby isn't fast (though much faster than it was) - and running a pure Ruby terminal with a pure Ruby font renderer with a pure Ruby X11 client library would only get "worse" if I go crazy enough to write a pure Ruby X11 server as well (the thought has crossed my mind).
If I were to replace any of the Ruby with a C extension, the inner rendering loop that constructs spans of text that reuses the same attributes (colors, boldness etc) and issues the appropriate X calls would be where I'd focus, but I think that too can be made substantially faster than it currently is just by improving the algorithm used instead.
I think it's okay for glyph generation to be slow as long as it doesn't block redraw and immediate user feedback such as scrolling. While you can make that problem easier by throwing more horsepower at the problem, I think that to actually solve it you need to design the software so that redraw doesn't wait for glyph generation. It's a case where late answers are worse than wrong answers.
I had forgotten or didn't know that you'd also written a pure Ruby replacement for Xlib! That's pretty exciting! I'm inclined to regard X-Windows as a mistake, though. I think display servers and clients should communicate through the filesystem, by writing window images and input events to files where the other can find them. Inotify is also a botch of an API, but on Linux, inotify provides deep-submillisecond latency for filesystem change notification.
> I had forgotten or didn't know that you'd also written a pure Ruby replacement for Xlib!
That one is not all me. I've just filled in a bunch of blanks[1], mostly by specifying more packets after the original maintainer disappeared. I keep meaning to simplify it, as while it works well, I find it unnecessarily verbose. I'm also tempted to bite the bullet and write the code to auto-generate the packet handling from the XML files used for XCB.
I think there's large parts of X11 that are broken, but the more I'm looking at my stack, and how little modern X clients use of X, the more tempted I am to try to write an X server as well, and see how much cruft I could strip away if I just implement what is needed to run the clients I care about (you could always run Xvnc or Xephyr or similar if you want to run some other app).
That would make it plausible to then separate the rendering backend and the X protocol implementation, and toy with simpler/cleaner protocols...
[1] https://github.com/vidarh/ruby-x11
Thanks for checking me on that!
Yeah, text rendering can get arbitrarily difficult—if you let it. Rotated and nonuniformly scaled text, Gaussian filters for drop shadows, TrueType rasterization and hinting, overlapping glyphs, glyph selection in cases where there are multiple candidate glyphs for a code point, word wrap, paragraph-filling optimization, hyphenation, etc. But I think that most of those are computations you can do less often than once per frame, still in nearly linear time, and computing over kilobytes of data rather than megabytes.
The point is well taken. I don't know much about interacting with GPUs. I don't particularly care so far about getting more performance, given the wildly fast computers I have and my use cases (I don't make or play games). I _do_ care about power efficiency; do GPUs help there? Modern GPU-based terminal implementations aren't particularly power efficient in my experience.
There are so many factors affecting power efficiency that it's hard to give a categorical answer. A lot of it depends on factors that vary widely, from the hardware in use, to the display setup (resolution and refresh rate), to the quality of the drivers, to the window system (composited or not), to the size (cols x rows) of the terminal window, to the feature set involved, etc.
The problem with a lot of GPU accelerated terminals, if I had to wager a guess, is that they draw as fast as possible. Turning off GPU acceleration likely forces things to happen much slower thanks to various bottlenecks like memory bandwidth and sharing CPU time with other processes. GPU acceleration of most GUI apps puts them in a similar position as video games. It doesn't have to happen as fast as possible, and can be throttled through e.g. v-sync or lower-than-max FPS targets or turning on and off specific display features that might tax the GPU more (e.g. if shaders get involved, alpha blending is used, etc.).
The sibling comment makes a good point about compatibility and simplicity, though those don't always translate into lower power usage.
> It doesn't have to happen as fast as possible, and can be throttled through e.g. v-sync or lower-than-max FPS targets or turning on and off specific display features that might tax the GPU more (e.g. if shaders get involved, alpha blending is used, etc.).
Exactly.
E.g. if you want to render as fast as possible, the logical way of doing it is to keep track of how many lines have been output (the easiest, but not necessarily most efficient, way is to render to a scrollback buffer), and then separately - synced to v-sync if you prefer - start rendering from whatever is at the top of the virtual text-version of the screen when the renderer starts a new frame.
Do this in two threads, and you can then render to the bitmap at whatever FPS you can handle, while you can let the app running in the terminal output text as fast as it can produce it:
If the text-output thread manages to add more than one line to the end of the buffer per frame rendered to the bitmap, your output will just scroll more than one line per frame.
You've then decoupled the decision of the FPS necessary from how fast the app you're running can output text, and frankly, your FPS needs to dip fairly low before that looks janky.
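A minimal sketch of that two-thread split, assuming POSIX threads; read_line_from_pty() and draw_screen() are hypothetical stand-ins:

```c
/* The text thread appends as fast as output arrives; the render
 * thread samples the buffer once per frame, so if several lines
 * arrive per frame the display simply scrolls several lines. */
#include <pthread.h>
#include <unistd.h>

#define MAX_LINES 100000
static char *scrollback[MAX_LINES];
static int nlines;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

extern char *read_line_from_pty(void);           /* hypothetical */
extern void draw_screen(char **top, int rows);   /* hypothetical */

static void *text_thread(void *arg)
{
    (void)arg;
    for (;;) {
        char *line = read_line_from_pty();       /* unthrottled */
        pthread_mutex_lock(&lock);
        if (nlines < MAX_LINES)
            scrollback[nlines++] = line;
        pthread_mutex_unlock(&lock);
    }
}

static void *render_thread(void *arg)
{
    enum { ROWS = 50 };
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        int top = nlines > ROWS ? nlines - ROWS : 0;
        /* A real renderer would snapshot these lines and draw after
         * unlocking, so rendering never stalls the text thread. */
        draw_screen(&scrollback[top], nlines - top);
        pthread_mutex_unlock(&lock);
        usleep(16667);                           /* ~60fps; use real v-sync */
    }
}
```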
The reason to bother is that a lot of us prefer terminals and want to evolve them. The reason they're not evolving faster isn't that compatibility is really a problem - we see new terminal capabilities gain support fairly quickly - but that there often isn't a major perceived need for the features that people who don't use terminals much think are missing.
People don't add capabilities to try to attract people like you who don't want terminals in the first place.
Wrapping and scrolling can be turned off on any terminal newer than the vt100, or constrained to regions. I never wonder how a different computer reacts to writing off the bottom right of the screen, because that works just fine on every terminal that matters. The actual differences are relatively minor if you don't do anything esoteric.
A "simple" flat array of pixels means you have to reimplement much of a terminal, such as attribute rendering, font rendering etc. It's not a huge amount of work, but not having to is nice.
So is the network transparency, and vnc etc. is not a viable replacement.
"A simple flat array of pixels means you have to reimplement much of a terminal, such as attribute rendering, font rendering etc. It's not a huge amount of work, but not having to is nice."
For me the debate isn't about implementing a terminal vs something else. I assume one uses tools others build in either case. The question is how much the tools hinder your use case. I find a canvas or a graphical game engine (which implement fonts and so on) hinders me less in building the sorts of tools I care about building. A terminal feels like more of a hindrance.
> I assume one uses tools others build in either case
And there is the disconnect. For a terminal app, you often don't need to.
> I find a canvas or a graphical game engine (which implement fonts and so on) hinders me less in building the sorts of tools I care about building. A terminal feels like more of a hindrance.
And for me it's the opposite. The tools I build mostly works on text. A terminal provides enough that I usually don't need any extra dependencies.
The disconnect might be that I'm counting the weight of the terminal program itself. I think this makes for a fairer comparison. The terminal program is built by others, and it often uses many of the same libraries for pixel graphics and font rendering.
I find this thinking bizarre. What matters is the simplicity of the API my code has to talk to.
You'd have a better argument if most people built their own terminals, like I have (mine is only ~2k lines of code, however), as then there'd be a reasonable argument you're writing that code anyway. But most people don't.
Even then I'd consider it fairly specious, because the terminal code is a one-off cost to give every TUI application a simple, minimal API that also gives me the ability to display their UI on every computer system built in at least the last half a century.
I write plenty of code that requires more complex UI's too, and don't try to force those into a terminal, but I also would never consider building a more complex UI for an app that can be easily accommodated in a terminal.
I guess I'll stop here then.
Availability over ssh is indeed a good point. I've reduced my reliance on the network at the same time I've grown disenchanted with terminals; thanks for pointing out that connection.
The rest are mutually incommensurable worldviews, and we have to agree to disagree.
I think I've said this to you before, but in case I haven't, I suspect that the best way to handle proportional text in formattable plain text is some variant of Nick Gravgaard's "elastic tabstops" https://nickgravgaard.com/elastic-tabstops/:
> Each cell ends with a tab character. A column block is a run of uninterrupted vertically adjacent cells. A column block is as wide as the widest piece of text in the cells it contains or a minimum width (plus padding). Text outside column blocks is ignored [for layout, though it is displayed].
I think the main deficiency in his proposal is that he enforces a minimum of one space of whitespace padding on the right of his column.
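A minimal sketch of that width rule in C, assuming the cells have already been split on tabs; text_width() uses strlen as a stand-in for real font metrics (which is where proportional fonts would plug in):

```c
#include <string.h>

#define MAX_ROWS 1000
#define PAD 1   /* the one-space minimum padding the proposal enforces */

/* cells[row][col] is the text of that cell, or NULL if the row has
 * no cell in that column. */
extern const char *cells[MAX_ROWS][16];

static int text_width(const char *s) { return (int)strlen(s); }

/* Fill widths[row] with the shared width of column `col` on each row:
 * every maximal run of vertically adjacent cells gets the width of
 * its widest member. */
void elastic_widths(int nrows, int col, int widths[])
{
    int row = 0;
    while (row < nrows) {
        if (!cells[row][col]) { widths[row++] = 0; continue; }
        int start = row, w = 0;
        while (row < nrows && cells[row][col]) {   /* one column block */
            int cw = text_width(cells[row][col]);
            if (cw > w) w = cw;
            row++;
        }
        for (int r = start; r < row; r++)
            widths[r] = w + PAD;
    }
}
```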
I love that. Thanks.
I think you can avoid making that padding visible.
In terms of making this work in a terminal, what I'd imagine would be having the app still aligning things to fixed-width column boundaries, but handling the elastic tab stops based on knowing a maximum extent of a single-width character used (you'd still need to deal with the unicode mess) and setting tab stops based on that. Adding an escape to report the font extents would be easy.
I'll have to do some experiments on this... Especially as I'm the kind of maniac who likes to align multi-line spans of code in ways linters will yell at.
I thought you might like it!
I was thinking you could change the terminal to implement elastic tabstops so that apps written for the elastic-tabstop terminal wouldn't have to worry about font metrics. You could use ^L and ^V as delimiters to nest a whole new table inside a table cell, giving you layout capabilities comparable to the WWW before CSS, similar to how HN is laid out. Languages like C and JS that treat all whitespace the same can have ^L and ^V inserted freely in random places.
I'm going to have to think about exactly how to do this best. My one concern would be to either handle line-wrap properly, or handle overflow.
Overflow is "easy" if you assume a line is always of a reasonable max length: Just turn off wraparound, and print the whole line.
But if you want to handle super-long lines, you'd at least want a way to "clip" the text you're sending to the terminal to avoid the cost of outputting all of it to the terminal each time you're moving past it.
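For example, clipping instead of wrapping only takes the DECAWM escapes mentioned earlier in the thread:

```c
/* Clipping a long line instead of letting it wrap and shift the UI. */
#include <stdio.h>

int main(void)
{
    const char *line = "imagine this line is far wider than the terminal...";
    printf("\033[?7l");        /* DECAWM off: overflow is clipped */
    printf("%s\r\n", line);
    printf("\033[?7h");        /* DECAWM back on */
    return 0;
}
```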
Depending on the approach, maybe the minimum width of a glyph is instead what you worry about in cases where you don't wrap.
Let's say the terminal is 80 monospaced characters wide, and you switch on proportional mode.
If you have a long line, you just want to know the maximum number of characters you should output to guarantee that the proportional region is full.
Maybe just an escape to get the terminal to report the maximum number of proportional glyphs that will fit in a given field is enough.
The worst case scenario then is a screen full of I's or 1's or similar, where maybe you spit out twice as much text, which the terminal promptly clips away, but most of the time you'd only output a few more characters than necessary, so I think that's fine.
Not sure how to cleanly extend that to allow wraparound without the app needing to be aware of the font. E.g. a text editor will want to be able to figure out which point in the text should be at the top of the screen as you scroll. Doing that with wraparound without knowing font extent would require some other way to measure. Printing a bit extra might be fine.
For non-interactive applications, none of this would matter - you'd just print everything the same way you do now, and let the terminal sort out the rest.
Maybe adding experimental support for that would be a good starting point. E.g. being able to cat a tab-separated file and get an aligned table with proportional text would be an awesome test case.
I'm now also curious how much would break if I "just" started doing proportional rendering by default when not using the alternate screen.
E.g. anything trying to do layout with spaces would obviously break, but I'm wondering if it'd affect few enough non-interactive programs that I could make it the default and just wrap/alias the few things that break. My shell already filters the output of a number of commands to colorize them, like pstree, so having a small "blacklist" of programs that'd need to be switched back to monospaced, or filtered further, wouldn't necessarily be a big deal.
Damn you for adding more things to my TODO list :-)
I feel like overflow as an error-handling thing would be reasonable to handle in a variety of ways; however, if you start trying to offer HTML-like word wrap inside tables, you start experiencing difficult optimization problems. I think it's probably better to let applications split paragraphs into lines based on some kind of heuristic, embedding the newlines into the text that the terminal has to handle.
A lot of things would benefit from being able to output tab-separated output. (I feel like https://okmij.org/ftp/papers/DreamOSPaper.html used to have screenshots?) Things like pstree want a "dot leader" approach, where you tag a cell (with an escape sequence?) as being padded out to the column width not with whitespace but with more copies of its contents. In http://canonical.org/~kragen/sw/dev3/alglayout.py I'm doing it with the ~ operator, but that's just nesting stacks of hboxes and vboxes for layout and outputting just fixed-width ASCII, not using full-fledged tables and outputting pixels. (There's an OCaml version of that program if that would be more readable; I haven't written a Ruby version.)
And to make matters worse, unlike a GUI, the terminal doesn't provide any semantic information about the content it displays to the OS.
This is a problem for accessibility software, screen readers, UI automation, voice control etc.
If you want a screen reader to announce that a menu option is selected, you need some way to signal to the OS that there's a menu open, that some text is a menu option, and that the option has the "selected" state. All serious GUI frameworks let you do this (and mostly do it automatically for native controls), so does the web.
TUIs do not (and currently cannot) do this. While this is not really a problem for shells or simple Unix utilities, as they just output text which you can read with a screen reader just fine, it gets really annoying with complicated, terminal-based UIs. The modern AI coding agents are very prominent examples of how not to do this.
TUIs could be made to do this relatively easily. "All" you need is to pick an escape sequence that assigns a semantic label to the following span of text, and have the terminal use whatever OS mechanism to make that available to assistive tech.
Of course, that doesn't help unless/until at least one prominent terminal actually does it and a few major terminal applications add support for it.
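Purely as a hypothetical sketch - no terminal implements this today - such an escape could borrow the OSC framing that real extensions like OSC 8 hyperlinks use. The "7777" code and the role=/state= keys below are invented for illustration:

```c
/* HYPOTHETICAL: the framing mimics real OSC extensions, but the
 * "7777" code and the role=/state= keys are made up. */
#include <stdio.h>

static void semantic_span(const char *role, const char *state,
                          const char *text)
{
    /* OSC 7777 ; key=value ST <text> OSC 7777 ; ST   (ST = ESC \) */
    printf("\033]7777;role=%s;state=%s\033\\%s\033]7777;\033\\",
           role, state, text);
}

int main(void)
{
    /* A screen reader could then announce: "Open File, menu item, selected" */
    semantic_span("menuitem", "selected", "Open File");
    putchar('\n');
    return 0;
}
```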
Terminals are full of hacks. For example, in my terminal project linked above the Readme says this:
"Backspace is known to not work in some configurations. As a workaround, typing ctrl-h tends to work in those situations." (https://git.sr.ht/~akkartik/teliva#known-issues)
This is a problem with every TUI out there built using ncurses. "What escape code does your terminal emit for backspace?" is a completely artificial problem at this point.
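The workaround exists because the Backspace key sends either 0x08 (Ctrl-H, BS) or 0x7f (DEL) depending on the terminal's configuration; a raw-mode program can simply accept both, along the lines of:

```c
/* Treat both byte values the Backspace key can produce identically;
 * this is essentially what the ctrl-h workaround relies on. */
enum { KEY_BACKSPACE = 1000 };               /* out-of-band key code */

static int normalize_key(int c)
{
    if (c == 0x08 || c == 0x7f)  /* Ctrl-H (BS) or DEL, depending on setup */
        return KEY_BACKSPACE;
    return c;
}
```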
There are good reasons to deal with the terminal: I need programs built for it, or I need to interface with programs built for it. Programs that deal with 1D streams of bytes for stdin and stdout are simpler in text mode. But for anything else, I try to avoid it.
Sorry for jumping off topic but I came across mu recently - looks very interesting! Hope to try it out properly when I get a moment
Thank you! Hit me up any time.
This is an interesting concept; how do editors like this fare for writing code, though?
Immature, obviously. Far fewer person-hours of labor have been put in relative to what you use all the time. But I find it worthwhile to get off the constant treadmill of new versions with features I don't care about. Cutting down on complexity there creates headroom for me or you to try out new approaches I or you might care more about.
My most common development environments these days:
* A live-programming infinite surface of definitions that works well on a big screen: https://git.sr.ht/~akkartik/driver.love Has minimal syntax highlighting for just Lua comments and strings.
* An environment that lets me add hyperlinks, graphics and box-and-arrow diagrams in addition to code. Also works on mobile devices. Examples: https://akkartik.itch.io/sokoban, https://akkartik.name/post/2025-03-08-devlog, https://akkartik.name/post/2025-05-12-devlog
The second set of apps are built using the first approach.
Reminds me of Eskil's apps way back when https://www.quelsolaar.com/love/development.html
Kilo is a fun weekend project, but I learned the hard way that it's not a good base upon which to build your own text editor.
The core data structure (array of lines) just isn't that well suited to more complex operations.
Anyway here's what I built: https://github.com/lorlouis/cedit
If I were to do it again I'd use a piece table[1]. The VS code folks wrote a fantastic blog post about it some time ago[2].
[1] https://en.m.wikipedia.org/wiki/Piece_table [2] https://code.visualstudio.com/blogs/2018/03/23/text-buffer-r...
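For reference, a minimal piece-table sketch in C along the lines described in [1]: the document is a list of pieces pointing into two buffers - the read-only original file and an append-only "add" buffer - so inserting text never moves existing bytes:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum which { ORIG, ADD };

typedef struct piece {
    enum which src;
    size_t off, len;
    struct piece *next;
} Piece;

typedef struct {
    const char *orig;      /* read-only original file contents */
    char add[4096];        /* append-only edit buffer (fixed for brevity) */
    size_t add_len;
    Piece *head;
} PieceTable;

/* Insert `text` at logical position `pos` by splitting the covering
 * piece; existing text is never copied or moved. */
void pt_insert(PieceTable *pt, size_t pos, const char *text)
{
    size_t tlen = strlen(text), toff = pt->add_len;
    memcpy(pt->add + toff, text, tlen);          /* append-only */
    pt->add_len += tlen;

    Piece *mid = malloc(sizeof *mid);
    *mid = (Piece){ ADD, toff, tlen, NULL };

    Piece **pp = &pt->head;
    while (*pp && pos >= (*pp)->len) {           /* walk to covering piece */
        pos -= (*pp)->len;
        pp = &(*pp)->next;
    }
    if (*pp && pos > 0) {                        /* split: left | mid | right */
        Piece *right = malloc(sizeof *right);
        *right = **pp;                           /* keeps the old next link */
        right->off += pos;
        right->len -= pos;
        (*pp)->len = pos;
        mid->next = right;
        (*pp)->next = mid;
    } else {                                     /* insertion at a boundary */
        mid->next = *pp;
        *pp = mid;
    }
}

/* The logical document is just an in-order walk of the pieces. */
void pt_print(const PieceTable *pt)
{
    for (const Piece *p = pt->head; p; p = p->next)
        fwrite((p->src == ORIG ? pt->orig : pt->add) + p->off,
               1, p->len, stdout);
}
```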
My own editor is array-of-lines in Ruby, and in now about 8 years of using it daily - with the actual editor interacting with the buffer storage via IPC to a server holding all the buffers - it's just not been a problem.
It does become a problem if you insist on trying to open files of hundreds of MB of text, but my thinking is that I simply don't care to treat that as a text editing problem for my main editor, because files that size are usually something I only ever care to view, or am better off manipulating with code.
If you want to be able to open and manipulate huge files, you're right, and then an editor using this kind of simple method isn't for you. That's fine.
As it stands now, my editor holds every file I've ever opened and not explicitly closed in the last 8 years in memory constantly (currently, 5420 buffers; the buffer storage is persisted to disk every minute or so, so if I reboot and open the same file, any unsaved changes are still there unless I explicitly reload), and it's not even breaking the top 50 or so of memory use on my machine usually (those are all browser tabs...)
I'm not suggesting people shouldn't use "fancier" data structures when warranted. It's great some editors can handle huge files. Just that very naive approaches will work fine for a whole lot of use cases.
E.g. the 5420 open buffers in my editor currently are there because even the naive approach of never garbage collecting open buffers just hasn't become an issue yet - my available RAM has increased far faster than the size of the buffer storage so adding a mechanism for culling them just hasn't become a priority.
Oh, by "more complex" operations I was referring to multiple cursors and multi-line regex searches. I've noticed some performance problems in my own editor, but it's mostly because "lines" become fragmented: if you allocate each line with its own allocation, they might be far away from each other in memory. That's especially true when programming, where lines are relatively short.
Regex searches and syntax highlighting might introduce some hitches due to all of the seeking.
Kakoune has been my main editor for the past year (give or take) and uses an array of lines [0]. Ironically, multi-cursor and regex are some of the main features that made it attractive to me.
I just tested it out on the 100MB enwik8 file I have lying around and it does slow down significantly (it took 4-5 seconds to load the file, and there's a 1 second delay on changing a line). But that is not really the type of file you would be opening with your main editor.
[0]: https://github.com/mawww/kakoune/blob/2d8c0b8bf0d7d18218d4c9...
I'd love to see the code of that editor. Is it publicly available somewhere?
There's a wildly out-of-date repo here[1] that I badly need to push updates to, with the caveat that odds are there are lots of missing pieces that'll make you struggle to get it working on your system. I wouldn't recommend it - I dumped it on Github mostly because why not, rather than for people to actually use.
Difficulties will include e.g. helper scripts executed to do things like stripping buffers, a dependency on rofi when you try to open files, and a number of other things that work great on my machine and not so well elsewhere.
I have about 2-3 years' worth of updates and cleanups I should get around to pushing to Github, which do include some attempts to make it slightly easier for other people to run.
The two things I think are nice and worth picking up on are the use of DRb to get client-server, which means the editor is "multi window" simply by virtue of spawning new separate instances of itself. It's then multi-pane/frame by relying on me running a tiling wm, so splitting the buffer horizontally and vertically is "just" a matter of a tiny helper script ensuring the window opens below/to the right of the current window respectively.
But some other things, like the syntax highlighting (using Rouge), are in need of a number of bugfixes and cleanups; I keep meaning to modify the server to keep metadata about the lines and pull the syntax highlighting out so it runs in a separate process, talking directly to the server, for example.
[1] https://github.com/vidarh/re
> The core data structure (array of lines) just isn't that well suited to more complex operations.
Modern CPUs can read and write memory at dozens of gigabytes per second.
Even when CPUs were 3 orders of magnitude slower, text editors using a single array were widely used. Unless you introduce some accidentally-quadratic or worse algorithm in your operations, I don't think complex datastructures are necessary in this application.
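If you want to check the claim on your own machine, a quick-and-dirty sketch is to time the worst case for a single-array editor, moving the whole buffer by one byte on an insert at the start:

```c
/* Time a one-byte memmove of a 100 MB buffer - the worst case for a
 * single flat array on each insert at position 0. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(void)
{
    size_t n = 100 * 1000 * 1000;
    char *buf = malloc(n);
    memset(buf, 'x', n);

    struct timespec a, b;
    clock_gettime(CLOCK_MONOTONIC, &a);
    memmove(buf + 1, buf, n - 1);    /* insert one char at the front */
    clock_gettime(CLOCK_MONOTONIC, &b);

    double ms = (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
    printf("moved %zu MB in %.1f ms\n", n / 1000000, ms);
    free(buf);
    return 0;
}
```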
The actual latency budget to be completely non-noticeable is less than a single frame, so you are in fact limited to well under 1 GB of memory moved per keystroke (at, say, 50 GB/s, a 10 ms budget only covers about 500 MB). And each character may hold additional metadata like syntax highlight state, so 1 GB of movable memory doesn't translate to 1 GB of text either. You are still correct that a line-based array is enough for most cases today, but I don't think it's generally true.
Movement of GBs of data being noticeable should be considered a feature, imho.
And if those GB's represent text, with user trying to edit that as a single file, well then... PEBKAC.
> The core data structure (array of lines) just isn't that well suited to more complex operations.
Just how big (and how many lines) does your file have to be before it is a problem? And what are the complex operations that make it a problem?
(Not being argumentative - I'd really like to know!)
On my own text editor (to which I lost the sources way back in 2004) I used an array of bytes, had syntax highlighting (using single-byte start-stop codes), and used a moving "window" into the array for rendering. I never saw a latency problem back then on a Pentium Pro, even with files as large as 20MB.
I am skeptical of the piece table as used in VS Code being that much faster; right now on my 2011 desktop, a VS Code with no extra plugins has visible latency when scrolling by holding down the up/down arrow keys and a really high keyboard repeat setting. Same computer, same keyboard repeat and same file using Vim in a standard xterm/uxterm has visibly better scrolling; takes half as much time to get to the end of the file (about 10k lines).
From what I have experienced, the complex data structures used here are more about maintaining responsiveness when overall system load is high, and that may come at the cost of slightly slower performance overall. Say you used the variable "x" a thousand times in your 10k lines of code and you want to do a find-and-replace to give it a more descriptive name like "my_overused_variable": think about all of the memory copying that happens if all 10k lines are in a single array. If those 10k lines are in 10k arrays, each twice the size of its line, you reduce that a fair amount. It might be slower than simpler methods when the system load is low, but it will stay responsive longer.
I think vim uses a gap structure, not a single array, but I don't remember.
I am not a programmer; my experience could very well be due to failings elsewhere in my code, and my reasoning could be hopelessly flawed, so hopefully someone will correct me if I am wrong. It has also been a while since I dug into this. The project that got me to dig into it is one of the things that finally got me to make an account on HN, and one of my first submissions was "Data Structures for Text Sequences."
https://www.cs.unm.edu/~crowley/papers/sds.pdf
VS Code used 40-60 bytes per line, so a file with 15 million single character lines balloons from 30 MB to 600+ MB. kilo uses 48 bytes per line on my 64-bit machine (though you can make it 40 if you move the last int with the other 3 ints instead of wasting space on padding for memory alignment), so it would have the same issue.
https://github.com/antirez/kilo/blob/323d93b29bd89a2cb446de9...
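That 48 comes from the erow struct in kilo.c: on a typical 64-bit ABI it's 3 ints (12 bytes) + 4 bytes padding + 3 pointers (24) + 1 int (4) + 4 bytes tail padding = 48, and moving hl_oc up next to the other ints gets you the 40:

```c
/* Per-row structure from kilo.c (comments paraphrased). */
typedef struct erow {
    int idx;            /* Row index in the file, zero-based. */
    int size;           /* Size of the row, excluding the null term. */
    int rsize;          /* Size of the rendered row. */
    char *chars;        /* Raw row content. */
    char *render;       /* Row content "rendered" for screen (for TABs). */
    unsigned char *hl;  /* Syntax highlight type per rendered character. */
    int hl_oc;          /* Row had an open comment at end of last pass. */
} erow;                 /* 48 bytes as laid out; reorder hl_oc for 40. */
```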
> a file with 15 million single character lines
I have never seen a file like this in my life, let alone opened one. I'm sure they exist and people will want to open them in text editors instead of processing with sed/awk/Python, but now we're well into the 5-sigma of edge cases.
How timely, I just finished going through a tutorial that builds a text editor like kilo from scratch: https://viewsourcecode.org/snaptoken/kilo/index.html
Would highly recommend the tutorial as it is really well done.
I remember that tutorial fondly.
I played around with kilo when it was released, and eventually made a multi-buffer version with support for scripting via embedded Lua. Of course it was just a fun hack, not a serious thing - I continue to do all my real editing with Emacs - but it did mean I got to choose the best project name:
https://github.com/skx/kilua
Here’s a second recommendation for that tutorial. It’s the first coding tutorial I’ve finished because it’s really good and I enjoyed building the foundational software program that my craft relies on. I don’t use that editor but it was fun to create it.
This is one of my favorite moderate-level projects for playing with different programming languages.
The original in C: https://git.timshomepage.net/tutorials/kilo
Go: https://git.timshomepage.net/timw4mail/gilo
Rust: https://git.timshomepage.net/timw4mail/rs-kilo
And the more rusty tutorial version (Hecto): https://git.timshomepage.net/tutorials/hecto
PHP: https://git.timshomepage.net/timw4mail/php-kilo
...and Typescript: https://git.timshomepage.net/timw4mail/scroll
Author of hecto here, thank you for mentioning it! I wrote the first version around 5 years ago and I'm happy that people still use it. (I've updated it in the meantime.)
Reading through this code is a veritable rite of passage. You learn how C works, how text editors work, how VT codes work, how syntax highlighting works, how find works, and how little code it really takes to make anything when you strip away almost all conveniences, edge cases, and error handling.
I made a similar editor using Lazarus... since it has syntax highlighting components... I guess that's cheating. The more I think about it though, I wonder if Freepascal could produce a nice GUI for Neovim.
I did try to build one in Qt in C++ years ago, stopped at trying to figure out how to add Syntax Highlighting since I'm not really that much into C++. Pivoted it to work like Notepad so I was still happy with how it wound up.
https://github.com/Giancarlos/qNotePad
It also inspired this similar Rust project: https://github.com/ilai-deutel/kibi#comparison-with-kilo
Although it does cheat a bit in an effort to better handle Unicode:
> unicode-width is used to determine the displayed width of Unicode characters. Unfortunately, there is no way around it: the unicode character width table is 230 lines long.
Personally, this is the reason I don't really buy the extreme size reduction; such projects generally have to sacrifice essential features that demand a certain, irreducible amount of code.
A lot of those features are only "essential" for a subset of possible users.
My own editor exists because I realised it was possible to write an editor smaller than my Emacs configuration. While my editor lacks all kinds of features that are "essential" for lots of other people, it doesn't lack any features essential for me.
So in terms of producing a perfect all-round editor that will work for everyone, sure, editors like Kilo will always be flawed.
Their value is in providing a learning experience, something that works for the subset who don't need those features, or a basis for people to customise something just right for their needs in a compact way. E.g. my own editor has quirks that are custom-tailored to my workflow, and even to my environment.
You are right, but then there is not much reason to make it public because it can't be very useful for general users. I have lots of code that was written only for myself and I don't intend to publish at all.
There's plenty of reason to make it public as basis for others to make it their own, or to learn from.
I have lots of code I've published not because it's useful to a lot of people as-is, but because it might be helpful. And a lot of my projects are based on code written by others that was not "very useful for general users".
E.g. my editor started out based on Femto [1], a very minimalist example of how small an editor can be. It saved some time versus starting from scratch, even though there's now practically nothing left of the original.
Similarly, my terminal relies on a Ruby rewrite of a minimalist truetype renderer that in itself would be of little value for most people, who should just use FreeType. But it was highly valuable to me - allowing me to get a working pure-Ruby TrueType renderer in a day.
Not "very useful for general users" isn't a very useful metric for whether something is worthwhile.
(While the current state of my editor isn't open, yet, largely for lack of time, various elements of it are, in the form of Ruby gems where I've extracted various functionality.)
[1] There are at least 3 editors named Femto, presumably inspired by being smaller than Nano, the way Nano followed Pico, but this is the one I started with: https://github.com/agorf/femto
> It also inspired this similar Rust project
And these projects:
https://github.com/antirez/kilo/forks
Ah darn. Closing in on retirement age (will never happen - coding is too much fun, for profit or charity), I've resisted building an editor, but I want to. Need to. I've hacked so much on vim, emacs, eclipse, vs code, and it's all crap (the newer, the worse: all these useless gimmicks you won't use past grade school, aaarrr, while lacking power user features). Can I do better? This seems a good start.
Funny. These days when I see a headline like that, I assume it's some type of web component.
Why are all the commenters so eager to get out of terminals?
One interesting thing is that even some of those 1000 lines could have been eliminated.
It duplicates the C library's cfmakeraw() function, for instance.
https://man.freebsd.org/cgi/man.cgi?query=cfmakeraw&sektion=...
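For comparison, the raw-mode setup via that libc helper, instead of setting each termios flag by hand the way kilo does:

```c
#include <termios.h>
#include <unistd.h>

int enable_raw_mode(void)
{
    struct termios t;
    if (tcgetattr(STDIN_FILENO, &t) == -1)
        return -1;
    cfmakeraw(&t);          /* clears ICANON, ECHO, ISIG, IXON, OPOST, ... */
    return tcsetattr(STDIN_FILENO, TCSAFLUSH, &t);
}
```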
This seems like a great alternative to Nano; though Nano is really good and just works.
ed is the standard text editor.
So a text editor is about grid manipulation?
On first look, the name sounds heavy, but the product actually turns out to be very light.
Go figure.
;)
Last serious work on this was in 2020. Lacks newsworthiness imho.
This post might be a response to yesterday's announcement of Microsoft's new Edit editor[1], which has some of the same features as kilo.
1. https://news.ycombinator.com/item?id=44031529
Probably not, since Kilo was first posted here 9 years ago https://news.ycombinator.com/item?id=12073926
Check my YouTube channel for an experiment where I added UTF-8 support to Kilo, showcasing what was, back then, an early experiment in using LLMs to ship working code. I need to push the changes after some cleanup, but it's not enough for a Kilo 2, as it makes things more complicated without showing worthwhile programming techniques. So it's sitting on my laptop.