Programming languages
Description :: New languages seem to pop up all the time -- why?
I hear it all the time. Cobol is better for reports, C for system stuff, Java for portability, C++ for games, Python for in-program scripting, Perl for command-line administration utilities, PHP for web-page coding, ASP for something similar but different, Ada for things that need to not crash, and so forth.

I've programmed in several of those languages, though I mostly code in C++ and occasionally (when maintaining this website) do a bit of PHP. I haven't touched Cobol since the last time I had to use it for a college class, even though I write printable reports at work. I've coded in assembly, and had fun doing it (despite the fact that it was x86 assembly, which I hear is about the worst one to learn with), but I don't use it for anything, not even once a year.

Look, another new language!
While poking around a few nights ago, I came across a description of XUL, an XML-based language for building forms and doing other scripting work, apparently meant to be embedded in things like webpages. It supported a sort of "structure" definition for building your own datatypes, and it had function calls, built-in datatypes, and all the usual stuff that comes with yet-another-programming-language.

So I wrote myself a note, in knee-jerk fashion, telling myself that my current ill feeling wasn't from dinner, but from new languages popping up all the time, with baggage. Baggage? What baggage?

Each new language arrives with its own datatypes and its own built-in functions. That's a problem. Ideally, I'd like my programming languages to differ only in style or ease, not in functionality. I should be able to code in any language, re-using functions from other languages, passing values regardless of datatype, trusting that things will happen as intended. After all, to me, a language isn't about the datatypes or the functions made available by default, it's about flow structure: the if/then/else type of stuff. The rest is orthogonal to the language: who cares if you're adding 8-bit or 32-bit integers; it should just work, right?

Programming languages are quite a bit like human languages: they blur syntax and semantics. When we say "English", we don't just mean the grammar of the language, the availability of certain constructs involving adverbs, punctuation, etc. We also include a certain amount of vocabulary (most people don't know most of the available vocabulary, yet still seem to get by speaking the language) and meaning in expressions. It's like a toolbox that comes with a set of screws of various sizes, some spare car parts, and an extra rubber band. You're not sure why you have the extra stuff, but it's a bundled package. You can't speak English without using English words. You can't even just interject words and phrases from other languages and expect them to fall into place cleanly. Grammar varies across languages, so your sub-expression may not make sense in context. The words you choose may be similar across languages, but vary slightly in connotation.

I'm an idealist. As an idealist, I look at programming languages I've used and think "in the end, it's all about running sequences of bits through a processor, manipulating other bits, to accomplish some task." It seems like they're all fairly equal. It also seems like I should get to pick-and-choose what I want, based on my situation, without suddenly being locked into a language.

Common ground
IDL and CORBA are examples of work to make this possible. They're not perfect, but they define datatypes and functions outside of any given language (thereby defining yet another language, though not really a full programming language). More than that, they abstract away location: you can make a "function call" that results in sending data across a network to be massaged and sent back. The other computer might be running code written in Cobol, while your code is written in C. You're not supposed to notice. Problems? Not all languages support CORBA-style type definitions. Moreover, you're required to use datatypes that are common to all languages (the common denominator), so you can only define your own datatypes in terms of a pre-defined list that was decided on the basis of already-existing languages (that is, the ones that support and are supported by CORBA). Another problem? Well, not all programming languages allow the same things.

Languages and what they understand
For example: C++ lets you define a generic function that can accept many different datatypes as parameters and still attempt to accomplish the same thing, relatively speaking. A generic sum() function could take integers, and add them. It could also take text, and concatenate (put end-to-end) the given bits of text. Not all languages respect this kind of definition. Many require that you define a separate function for each possible set of datatypes to be used.
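As a rough sketch of the generic sum() idea, here is one way it might look as a C++ function template. The vector-based signature and the name example_generic_sum are my own assumptions for illustration, not anything standard.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: adds up every element of a vector using whatever
// operator+ means for the element type T.
template <typename T>
T sum(const std::vector<T>& values) {
    T total{};                 // 0 for numbers, "" for std::string
    for (const T& v : values) {
        total = total + v;     // addition for ints, concatenation for strings
    }
    return total;
}

// Small self-check: the same function body serves two datatypes.
void example_generic_sum() {
    assert(sum(std::vector<int>{1, 2, 3}) == 6);
    assert(sum(std::vector<std::string>{"ab", "cd", "ef"}) == "abcdef");
}
```

A language without this kind of generic definition would need one sum function for integers and another for text, which is exactly the mismatch described above.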

Another example: some languages know what a pointer is, others don't. That's a problem if you define a datatype that relies on pointers, like linked-lists. One language can poke around at the innards, while another can't (and might not even be able to make use of the provided "interface" functions, such as a function that inserts a new value in the list, given a pointer to the position at which to insert the new value).
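To make the linked-list case concrete, here is a minimal sketch in C++. The Node type and the insert_after and destroy functions are hypothetical names chosen for this example. Notice that even the "interface" function needs a pointer as its first argument.

```cpp
#include <cassert>

// Hypothetical singly linked list of integers, for illustration only.
struct Node {
    int value;
    Node* next;
};

// "Interface" function: insert new_value right after the node at `position`.
// A caller in a language with no notion of pointers cannot supply `position`.
Node* insert_after(Node* position, int new_value) {
    Node* fresh = new Node{new_value, position->next};
    position->next = fresh;
    return fresh;
}

// Frees every node, starting from head.
void destroy(Node* head) {
    while (head != nullptr) {
        Node* next = head->next;
        delete head;
        head = next;
    }
}

// Small self-check: build 1 -> 2 -> 3 by inserting after the head twice.
void example_linked_list() {
    Node* head = new Node{1, nullptr};
    Node* third = insert_after(head, 3);  // list: 1 -> 3
    insert_after(head, 2);                // list: 1 -> 2 -> 3
    assert(head->next->value == 2);
    assert(head->next->next == third);
    destroy(head);
}
```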

C++ lets you define a datatype in terms of several others: the new datatype now acts as all of them simultaneously. Java doesn't quite allow this: the datatype can act like one other datatype, and it can support the same functions as the rest, but you have to rewrite how those work each time.

Some languages, like both of the above, support methods -- asking an object to perform an action, as opposed to functions or procedures that are independent machines with input and output, but no memory of what they've been doing. If you allow datatypes like "person" to print out the name and address of the person, then languages lacking this knowledge will need access to a function that takes a "person" as input, and generates output as text. (Note: methods are, at a lower level, just functions that take an implied parameter that is the value of the thing on which they're called. It works, but it's a bit weird.)

And what of exceptions? One language may allow a function to just skip returning a result, and "throw" an exception (a bit like throwing a tantrum) instead. The code that called the function has to know what to do when that happens -- but not all languages allow for this, and therefore wouldn't know how to cope when no result is given, but an error appears instead. Where's that number they were waiting for? Terribly confusing.
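Here is a small C++ sketch of the last two points: a method next to the equivalent free function with its implied parameter spelled out, and a throwing function wrapped so that a caller with no notion of exceptions still gets a plain answer. The Person type and the names describe_person, parse_age and try_parse_age are all made up for this example.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical "person" datatype with a method.
struct Person {
    std::string name;
    std::string address;

    // Method: the object it is called on is an implied parameter (`this`).
    std::string describe() const { return name + ", " + address; }
};

// The same operation as a free function, with the implied parameter made
// explicit. This is the form a language without methods could call.
std::string describe_person(const Person& p) {
    return p.name + ", " + p.address;
}

// A function that may throw instead of returning a result.
int parse_age(const std::string& text) {
    int age = std::stoi(text);  // throws std::invalid_argument on non-numeric text
    if (age < 0) throw std::out_of_range("age cannot be negative");
    return age;
}

// Wrapper for callers that cannot handle exceptions: it returns normally for
// any standard exception, reporting success through the return value and the
// parsed number through `out`.
bool try_parse_age(const std::string& text, int& out) {
    try {
        out = parse_age(text);
        return true;
    } catch (const std::exception&) {
        return false;
    }
}

// Small self-check of both ideas.
void example_methods_and_exceptions() {
    Person p{"Ada", "12 Example Street"};
    assert(p.describe() == describe_person(p));

    int age = 0;
    assert(try_parse_age("42", age) && age == 42);
    assert(!try_parse_age("not a number", age));
}
```

The wrapper trades the exception for a success flag, which is roughly what any bridge between an exception-throwing language and one without exceptions would have to do.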

I also dream that you could use any language you like in any given situation. I haven't quite touched on this, so I'll explain now.

Context-sensitive languages
One language I deal with often is SQL (at least, a dialect of SQL). My database server lets me mess around with its variables through this language. Great. Why not use, say, Lisp code inside my server? Some database servers might allow this (such as PostgreSQL) while others might not. I've got some modelling software (trueSpace) that comes with the ability to use Python code to script objects, but you can't exactly just go out and use another language "just because you want to." In some cases, a language is designed specifically for a given environment: it is assumed that if you're writing code in that language, certain facilities will be available to you, certain variables will be around. Most of the time, it seems like this happened because control was desired: making sure that no programming language, allowed in, would be like a bull in a china shop, knocking everything over.

Knowing what to do with those tiny little bits
I'm an idealist, but I have to deal with the fact that it's not going to be safe to allow any language to interact with any environment, passing off requests to functions written in other languages, using datatypes designed in yet another. At some point, the values are going to be passed to, say, some C code that can start messing with individual bits, bypassing all sorts of security. Or perhaps I'll call a function written in a language that doesn't precisely care about datatypes, and just watch my data get eaten. It's all just bits -- and bits aren't safe.

Another problem? You normally need something to "start with" -- most languages give you basic building blocks that don't depend on anything else to work. From those, you can start to write new, better things. From the availability of text variables, you can build a structure that defines a street address; from integers, you can do the memory-manipulation required to create linked-lists or matrices. Asking languages not to come with pre-packaged stuff is just asking for them to be useless, like English with no vocabulary. (Orwell, 1984: That for which we have no word, cannot be expressed.) Yet defining default datatypes leads to immediate incompatibility with other languages: my integer isn't like yours, etc.
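A tiny C++ sketch of "my integer isn't like yours": the same two numbers added as 8-bit and as 32-bit unsigned integers give different answers. The function names add_u8 and add_u32 are invented for this example.

```cpp
#include <cassert>
#include <cstdint>

// Adds two values as 8-bit unsigned integers; the result wraps modulo 256.
std::uint8_t add_u8(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a + b);
}

// Adds two values as 32-bit unsigned integers; small inputs do not wrap.
std::uint32_t add_u32(std::uint32_t a, std::uint32_t b) {
    return a + b;
}

// Small self-check: 200 + 100 depends on whose integer you are using.
void example_integer_widths() {
    assert(add_u8(200, 100) == 44);    // 300 modulo 256
    assert(add_u32(200, 100) == 300);
}
```

Two languages that silently disagree on the width of a default integer would disagree on results like this one, even though both are "just adding".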

Why do I care? Well, I want a database server that is, basically, just a server which can run functions on values, storing things in variables, etc. without consideration for the language used to describe what I want, nor any consideration for what datatypes and functions are available, or what language was used to create them. Inherently, however, I have a problem: I have to decide what the limits are on function calls, for example. Can they accept any number of parameters? Does the order of parameters matter, or do I specify a name with each parameter? Can I find out ahead of time if parameters of certain types or values will work, or do I have to wait for it to fail?

I dream of a day when I can pick up any software that allows scripting (ah, did I mention that there's a problem here involving compiled-only languages and interpreted-only languages, and the few that can be either?) and tell it where to find definitions for my favorite language, and start hacking. Are datatypes available in this new context that my language has never seen before? I should still be able to use them, at least through functions defined for them.

I also realize "it's not gonna happen" -- and I'm annoyed.

One ring to rule them all?
I'm not sure I can even write my own language to talk to all the others. Their most basic definition of what makes up a datatype, a function call, and so forth might make it impossible. Most languages represent tradeoffs of various sorts, because making everything available just doesn't work. A simple language seems best, but how would it make use of other languages? A complex one might do that, but then roles are reversed: I might use a feature and expect another language to "know what I mean" when it has no such concept (say, the use of a pointer to a function, meaning "call this function when you're done", the so-called "callback" function).
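For anyone who hasn't met one, here is a minimal C++ sketch of a callback passed as a pointer to a function. The names DoneCallback, add_then_notify and store_result are hypothetical, and the extra "context" pointer is a common C-style convention that I'm assuming here.

```cpp
#include <cassert>

// Type of a callback: a pointer to a function that receives the result and
// an opaque context pointer supplied by the caller.
using DoneCallback = void (*)(int result, void* context);

// Hypothetical worker: computes a sum, then calls `on_done` when finished.
void add_then_notify(int a, int b, DoneCallback on_done, void* context) {
    int result = a + b;
    if (on_done != nullptr) {
        on_done(result, context);
    }
}

// A callback that stores the result in the int that `context` points to.
void store_result(int result, void* context) {
    *static_cast<int*>(context) = result;
}

// Small self-check: the worker reports back through the callback.
void example_callback() {
    int received = 0;
    add_then_notify(2, 3, store_result, &received);
    assert(received == 5);
}
```

A language with no way to name a function as a value has nothing to pass for on_done, so the whole arrangement falls apart at the boundary.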

Besides, in a given context, who cares about having several languages available? You only need to provide one that can do the job, right? Sure, it's proprietary, sure it comes with its own set of bugs and limitations, sure it means learning yet another language ... but at least you know it works in-context! But what language to use? I don't know what will be needed ... I only know what the context is. Damn.

[This article is just part of my ramblings, involved in my actually and naively trying to write my own fully-functional database server. Why? Because variety is good? And because I believe in making things truly orthogonal (that is, making things modular and interchangeable, completely independent problems that can be solved separately without affecting other parts of a system), if they can be. I'm an idealist, remember?]