Operating system user interface toolkits (Article)

Description :: Ugh.

Why does the VCL let me set, for every scrollable area, whether or not the scrollbars should scroll smoothly? (That is, should it redraw while the user scrolls, or wait until the user is done scrolling and redraw only once?) Why can I set the font on every button, change the background color of various elements, or otherwise futz with things? Why do some applications go and invent neat features (such as a better color-picker or the ability to drag tabs to re-order them) and yet those innovations aren't available to all the other applications I run? Why does Microsoft provide us with a "places" sidebar for the standard File Dialog, and give TweakUI the ability to modify the default list of locations, but then re-invent it for new versions of Office and ignore our custom settings? (They did it again with cute balloon tips, and then with the Office "ribbon", which, by the way, you can't get rid of if it's not your thing.) More generally, why are the widgets I'm given and use oriented towards "how" rather than "what"?

So here's what I want.

I want a toolkit that focuses on the "what" of writing applications, and lets the presentation and flow be handled externally. I don't care how a user picks a color, so long as the user does so. I don't care how a user decides to run a command, so long as the user does so. I don't care how a user picks a date (whether it's typed, picked from a calendar, taken from a list of dates recently seen in the user's emails, or randomly picked by the operating system) so long as it's picked. I don't care how the user picks one item out of many, whether it's through radio buttons, drop-downs, a single-select listview, a dial, a popup menu, or some other system I have yet to see. I just want the user to pick one item, and only one, from the list provided. That's all.
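To make the "pick one item" idea concrete, here's a minimal sketch. Every name in it (`ChooseOne`, `ask`, `scripted`, `console`) is made up for illustration; no existing toolkit is being described. The application only states what it needs, and a swappable "presenter" decides how the user actually picks.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ChooseOne:
    """The 'what': the user must pick exactly one of `options`."""
    prompt: str
    options: Sequence[str]


# The 'how': anything that takes the request and returns the chosen index.
# Radio buttons, a drop-down, a dial or a voice prompt could all fit here.
Presenter = Callable[[ChooseOne], int]


def console(request: ChooseOne) -> int:
    """One possible presenter: a numbered list at a command prompt."""
    for number, option in enumerate(request.options, start=1):
        print(f"{number}. {option}")
    return int(input(f"{request.prompt}: ")) - 1


def scripted(index: int) -> Presenter:
    """Another presenter: a canned answer, handy for tests or automation."""
    return lambda request: index


def ask(request: ChooseOne, presenter: Presenter) -> str:
    """The application calls this and never learns which presenter ran."""
    index = presenter(request)
    if not 0 <= index < len(request.options):
        raise ValueError("presenter must return exactly one of the options")
    return request.options[index]
```

The application would call something like `ask(ChooseOne("Pick a color", colors), presenter)`, with the presenter supplied by the user's environment rather than by the programmer.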

I also don't care whether my user is using a GUI or not. The flow of my program isn't affected by that. If I'm writing a wizard, I know I have a multi-step process in which each step must meet certain criteria. Moving forward depends on previous data; moving back must necessarily recompute the rest of the path. Where are windows in this? Nowhere. Where's the "back button"? Nowhere. There's a concept of a user asking to "go back", but nothing about how. The user, for all I care, could have a psychic link with the computer where she simply thinks "go back, damn you" and it does. Maybe there's a special key on the keyboard, or a hologram in the air. Maybe clapping your hands makes it go back. Maybe you simply type, at a command prompt, "back" and hit 'enter'.
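A wizard described that way might look something like the sketch below. The `Step` and `Wizard` names are hypothetical, and discarding answers on "back" is just one assumed way to force the rest of the path to be recomputed. Note that nothing in it mentions windows or buttons.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    is_valid: Callable[[dict], bool]            # criteria this step must meet
    next_step: Callable[[dict], Optional[str]]  # path depends on earlier data


class Wizard:
    def __init__(self, steps, first):
        self.steps = {step.name: step for step in steps}
        self.path = [first]   # steps visited so far; the last one is current
        self.answers = {}

    @property
    def current(self):
        return self.path[-1]

    def answer(self, value):
        """Record an answer for the current step and move forward."""
        step = self.steps[self.current]
        candidate = {**self.answers, step.name: value}
        if not step.is_valid(candidate):
            raise ValueError(f"invalid answer for step {step.name!r}")
        self.answers = candidate
        following = step.next_step(candidate)
        if following is not None:
            self.path.append(following)
        return following      # None means the wizard is finished

    def back(self):
        """The user asked to go back; how they asked is not our concern."""
        if len(self.path) > 1:
            left = self.path.pop()
            # Drop the answers from here on, so the rest of the path is
            # recomputed once the step is answered again.
            self.answers.pop(left, None)
            self.answers.pop(self.current, None)
        return self.current
```

Whether `back()` gets called from a button, a keystroke, or a typed "back" is entirely up to the layer in front of it.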

What would the toolkit include? A concept of 'task', at least. A concept of 'action' or 'command' as well. A concept of datatype-specific areas of data input, data output, and combined data input/output (edit fields). We commonly mix these, but I've found it's dangerous. The system should not predefine these datatypes. It should not have a list of components, such as "checkbox", "listview", "drop-down", "text edit". Applications should request the entry of values by type only, and let the toolkit find the user-preferred mode of entry. I like to enter colors by hex code, you like to use decimal bright/color/saturation, and my friends all prefer different types of color pickers: to each his own! When picking one item out of several, I prefer something that lets me do a single click, because it's faster. When there are many items, I know screen space is limited. Why not let the system pick, based on the number of items and available space, which to use? There would be concepts of grouping controls together, yes. "These controls" modify "those controls" (changing the "zoom" level will update the map). There would be a concept of which actions require which input, as well: if you're using the command-line, you would want to be prompted for any information you didn't immediately provide. If you're using a GUI, you might want it to inform you that you didn't fill out a specific field. You would want a good concept of validity. (Ideally, every control would let you input only valid data, rather than "letting it pass" until later. Problems arise, though, when you're dealing with multiple controls whose validity depends on each other. This is "tuple validity" rather than "attribute validity", and we can relegate that kind of validity-checking to actions, such as "Save" or "Commit".)
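Here's a rough sketch of the "actions require input" and validity ideas, assuming a hypothetical `Action` type and `perform` function that I'm inventing for the example. Inputs are declared by datatype only, anything missing is requested through whatever `ask` the environment supplies, and the tuple-level check runs when the action does.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass
class Action:
    name: str
    requires: dict[str, type]              # input name -> datatype, no widgets
    validate: Callable[[dict], list[str]]  # tuple validity: list of problems
    run: Callable[[dict], object]


def missing_inputs(action: Action, provided: dict) -> list[str]:
    """What a command line would prompt for, or a GUI would flag as empty."""
    return [name for name in action.requires if name not in provided]


def perform(action: Action, provided: dict, ask: Callable[[str, type], object]):
    values = dict(provided)
    for name in missing_inputs(action, values):
        values[name] = ask(name, action.requires[name])
    # Attribute validity: each value on its own has the right datatype.
    for name, datatype in action.requires.items():
        if not isinstance(values[name], datatype):
            raise TypeError(f"{name} must be a {datatype.__name__}")
    # Tuple validity: values that depend on each other, checked at the action.
    problems = action.validate(values)
    if problems:
        raise ValueError("; ".join(problems))
    return action.run(values)


# Hypothetical example action: neither date is invalid alone, only together.
book_trip = Action(
    name="Book trip",
    requires={"start": date, "end": date},
    validate=lambda v: [] if v["start"] <= v["end"] else ["end is before start"],
    run=lambda v: (v["end"] - v["start"]).days,
)
```

A command-line front end would pass an `ask` that prompts at the terminal; a GUI front end might instead use `missing_inputs` to highlight the fields you skipped.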

You've got a table of data? Good. Let that be that. Some users want a listview, others want a graph, others want a tree, others want a dependency map, others want ... uh ... oh, sure, why not: progress bars in the listview! You shouldn't, as a programmer, have to decide this, and users should not, as users, have to deal with what you've chosen. The data's there, it should be usable in whatever way the user pleases. If the "table-like-data" is editable, you should only have to provide validity-checking procedures, and otherwise let users roam free on the data. You're safe, you don't allow anything invalid. (See 'transition' vs. 'state' validity checking in one of my other articles.) Who cares how the user goes about editing the data? Do they do it directly in the listview? In an area next to it? In a window that pops up? Psychically?
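As a small illustration of "the data's there, let the user choose the view", here's a sketch in which the application hands over plain rows and a registry of views does the rest. The view names, the `render` function and the "task"/"percent" columns are all assumptions made up for the example.

```python
from typing import Callable

Row = dict[str, object]


def list_view(rows: list[Row]) -> list[str]:
    """Plain listing: one line per row, every column shown."""
    return [", ".join(f"{key}={value}" for key, value in row.items())
            for row in rows]


def bar_view(rows: list[Row], label: str = "task", value: str = "percent",
             width: int = 10) -> list[str]:
    """Same rows drawn as progress bars (assumes a 0-100 numeric column)."""
    lines = []
    for row in rows:
        filled = round(width * row[value] / 100)
        lines.append(f"{row[label]} [{'#' * filled}{'.' * (width - filled)}]")
    return lines


# New views get registered here once and work for every table.
VIEWS: dict[str, Callable[[list[Row]], list[str]]] = {
    "list": list_view,
    "bars": bar_view,
}


def render(rows: list[Row], preference: str) -> list[str]:
    """The user's preference picks the view; the data never changes."""
    view = VIEWS.get(preference, list_view)
    return view(rows)
```

The programmer supplies the rows (and, for editable data, the validity checks); which entry of `VIEWS` gets used is the user's business.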

Oh, and the thing that started this: tabs. You've got too much stuff to fit on one screen, you think. So you add tabsheets. Why not a list of pages? Why not a tree of pages? Why not have all the pages, one after the other, in a very tall scroll-box? Why not also have them be collapsible, so the user can save space as desired? Why not have an icon list? Why not just have a drop-down of available pages? Why have multi-line tabs rather than tabs with that ">" button ("tear" in the name, I think)? Just let your users use whatever "area-picking" thing they want, assuming that's even relevant. (Is each page a task? Are these just groups of controls that make sense together? Is there some other way to describe why things are on certain tabsheets rather than others?)

Speaking of tabs: you've noticed (surely) that applications have started adding tabs into their interfaces recently, right? Tabbed instant-messaging (as in GAIM), tabbed browsing (Firefox), tabbed email, tabbed... whatever. But every application has to do this individually. Adding it actually requires effort on the part of developers of each piece of software. Why does the operating system (or windowing system, or windowing toolkit, or user interface toolkit) not simply know that several tasks are "similar" and give you tabs to navigate between them if that happens to be your preference? Rather than have Firefox developers "add" tabbed browsing, have your windowing system add tabs (if you like) around all browser windows (assuming each is a task)? (Yes, it would most likely be slightly less pretty, as, most likely, the tabs would be on the outside of the window. But that's not the point.) There are legacy MDI (multiple-document-interface) applications that were built with the standard "Window" menu in their main menu, from which you could get to any of the currently open documents; is this not effectively the same as tabbed workspaces? Can the GUI layer not recognize this and switch between modes automatically? If you like tabs, why shouldn't you get a tabbed workspace, since the application is already an MDI? And if you don't like IE's new tabbed browsing, why not be able to switch back to an MDI mode, with multiple sub-windows that can be tiled horizontally, tiled vertically, or cascaded?

And why in the world does your software require you to read through help pages to figure out how to change an option or perform some action? In the command-line world, we have tab completion -- you may not know the exact command name, but at least you can start typing and it'll give you a list of options. I've heard of GUI apps for credit unions where you can use drag'n'drop to perform transactions, but the app also includes, essentially, a command-line. Why do we not at least have a bar at the bottom of Microsoft Word where we can type "tab", hit enter, and a tabbed interface comes up with the various interfaces you might mean? One of the tabs would actually include the options for indenting the section of text you currently have highlighted, which you couldn't find because with the new Office, you have to go into Paragraph [->], notice a Tabs... button at the bottom, then bring up that dialog, etc. If you don't remember how to get to "merge cells" in Excel, shouldn't you be able to type "merge" and have it at least bring up a set of possible commands you might want to execute? If you want to do the same thing over and over again, a command-line is really pretty great. Some apps build in support for VBA or Python or Lua as a scripting language, but then hide it, or force you to develop functions first, then execute the functions. Some users really want to just do one thing, and keep hitting enter, and have it do it over and over again. Shortcut keys are nice, but you can't have them for every command; forcing a user to go through a set of menus and submenus just to click, again and again, on "Gaussian blur" until they're happy with the result just isn't nice.
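The command bar I'm imagining could be as simple as the sketch below. `CommandBar` and its methods are hypothetical names, and the ranking rule (names that start with what you typed come first) is just one assumed choice. Typing a full command name runs it, typing part of a name lists candidates, and hitting enter on an empty line repeats the last command.

```python
class CommandBar:
    def __init__(self, commands):
        # commands: lowercase command name -> zero-argument callable
        self.commands = commands
        self.last = None

    def suggest(self, query):
        """Commands whose name contains the query, best matches first."""
        q = query.strip().lower()
        if not q:
            return []
        hits = [name for name in self.commands if q in name]
        # Names that start with the query sort ahead of the rest.
        return sorted(hits, key=lambda name: (not name.startswith(q), name))

    def submit(self, text):
        """Run an exact command, repeat the last one, or offer suggestions."""
        name = text.strip().lower() or self.last
        if name in self.commands:
            self.last = name
            return ("ran", self.commands[name]())
        return ("suggestions", self.suggest(name or ""))
```

With a "gaussian blur" command registered, typing it once and then pressing enter repeatedly would keep applying the blur, with no menus involved.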

I suppose you can see a push for this, somewhat, in the "middle/business tier" concept. This isn't strictly a "three-tier" idea; you just need to separate the application GUI from the logic it deals with. The GIMP, for example, is written such that you can call methods in its libraries, run commands from a command prompt, or use an application (with GUI) to modify your files, and the same code is used in all cases. The code specifies only "what can be done", both in terms of actions and requesting information. Server-side Java applications are typically aiming for this; classes will provide functions to get/set properties, and other code is responsible for representing that to the user. This isn't quite what I want, in that without a separate, customized layer to do the interaction, the libraries don't "do" anything. We don't have a generic layer that knows how to talk to code libraries and offer you the proper options for 'things you can do now', ask for parameters, validate them, and pass them to the functions defined in the library. But it's a very, very similar concept.
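For a taste of what such a generic layer might do, here's a small sketch using Python's standard `inspect` module to read a library function's signature, ask for whatever parameters are missing, convert them to the declared types, and make the call. The helper names (`describe`, `call_with_prompts`) and the `gaussian_blur` function are invented for the example, and it assumes the library annotates its parameters with simple types like `int` or `float`.

```python
import inspect


def describe(func):
    """What the function needs: parameter name -> declared type."""
    signature = inspect.signature(func)
    return {name: param.annotation
            for name, param in signature.parameters.items()}


def call_with_prompts(func, provided, ask):
    """Fill in missing parameters by asking, validate by type, then call."""
    kwargs = {}
    for name, param in inspect.signature(func).parameters.items():
        if name in provided:
            raw = provided[name]
        elif param.default is not inspect.Parameter.empty:
            continue          # optional and not given: keep the default
        else:
            raw = ask(name, param.annotation)
        if param.annotation is inspect.Parameter.empty:
            kwargs[name] = raw
        else:
            # Conversion doubles as validation: int("abc") raises ValueError.
            kwargs[name] = param.annotation(raw)
    return func(**kwargs)


# A stand-in for a function exposed by some library (hypothetical).
def gaussian_blur(radius: float, passes: int = 1) -> str:
    return f"blur r={radius} x{passes}"
```

The same two helpers would work unchanged for any other annotated function, which is the point: the layer is written once, not once per application.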

Ideally, the toolkit would be such that an application would not need to be rewritten when porting from one environment to another -- whether from DOS to Windows or KDE to floating holograms, or Braille tactile interfaces, or any other. Only the representation (rendering) layer would need to change, but when such a layer is written, it should apply to all applications, not just one. (That's the main difference with just writing libraries that various applications can use: the library doesn't change when porting, but the application does, and the change doesn't apply to other libraries.)

Thanks to a Slashdot article, it's obvious to me now that this all relates to making interfaces useful to blind, deaf, or otherwise physically impaired persons. By having a plugin-oriented layer between 'what' the application needs and provides (input and output) and 'how' the user will handle that, we can replace visual interfaces with audible ones, we can provide tactile interfaces where once there was only text, and we can rid ourselves of the hack that is screen-scraping.

This is all about presentation vs. interface -- now if only CSS had delivered on the promise of separating the two. (In case you hadn't noticed: CSS still very often depends on the original HTML author naming tags with easily-identified and very precise names and/or classes. Without that, a third party can't very well, or at all, create CSS to transform a page as desired. The information layer is still required to provide clues to the presentation layer in non-orthogonal ways.)

Heck, one of my skunkworks projects at work is to convert all of our listviews, across the entire application (500+ screens at the moment, some with many listviews) to be presentation-agnostic. The idea is that the data should be available to the user, but the presentation of that data (currently a listview, sortable, with columns that you can turn on/off and some rows you can turn on/off based on a specific validity field) should be up to the user. We should be able to add new views, and let the user pick. We're not talking "make a chart out of this" but rather "see this as a chart", directly in the space provided on the screen, or as a pop-up window, I suppose. Regardless, we should separate the data from the presentation of the data. Even if this works out, and even if we get lots of different presentation modules working, that feature will still be bound to our application. I won't suddenly have that across all applications on my OS. And that annoys me to no end.

[Tired. Will/did write more later. Still more to come, maybe.]
Someone else going on about the same sorts of problems, with similar proposed solutions.
