The file type dilemma (Reference)

Description :: Where should a file's type be stored?

I found this article insightful, and interesting mostly because it ties different fields of study together: namespaces, domain theory... it's all there. The author just doesn't happen to be (as far as I can tell) a database nut like me.

So: on the classic Mac OS, a file's type was stored separately from its name, alongside other metadata such as the file's creator and various important dates in the file's life. Then a decision, prompted by the need to interoperate with other file systems that had no native, logical place to store a file's type other than as an addendum to its name, resulted in Mac OS X storing the type in two places: once in the metadata and again in the file's name (as an extension). This was generally seen as a bad move.

The author also mentions MIME types and their transmission in HTTP headers, ahead of the data. This is another case where a file's type isn't determined solely by its name; in fact, it can be determined entirely without the name. The site graph generated on this website, for example, is a PNG image, but your browser sees it as the output of a PHP script whose address has no PNG file extension. I think a natural extension of this concept is to treat the file's type indicator(s) as part of the file's data itself. Maybe not.
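
To make the header-versus-name distinction concrete, here is a minimal Python sketch (assuming Python 3.9+). The helper names `name_hint` and `declared_type` are hypothetical, and so is the example URL; the point is only that the type a client acts on comes from the Content-Type header, while the name suggests something else entirely.

```python
import os
from email.message import Message
from urllib.parse import urlparse


def name_hint(url: str) -> str:
    """Return the extension in the URL's path: what the *name* suggests."""
    path = urlparse(url).path
    return os.path.splitext(path)[1].lstrip(".").lower()


def declared_type(headers: dict[str, str]) -> str:
    """Return the MIME type the server declared in its Content-Type header.

    email.message.Message looks headers up case-insensitively, drops
    parameters such as "; charset=utf-8", and falls back to "text/plain"
    when the header is missing or malformed.
    """
    msg = Message()
    for name, value in headers.items():
        msg[name] = value
    return msg.get_content_type()


# Hypothetical response for a PHP-generated image.
url = "http://www.example.com/sitegraph.php"
headers = {"Content-Type": "image/png"}
print(name_hint(url))           # php
print(declared_type(headers))   # image/png
```

The name says "php"; the header says "image/png"; a browser renders a picture.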

For the sake of argument, also remember that many file types use special headers in the data to indicate (redundantly) the type of the file: all JPEG files start with the same header bytes, for example. If you open a file named "something.avi" in Windows Media Player and the player detects from the data that the file is actually an MPEG file, it can still recover from the naming mismatch, though it'll warn you first. Here, the file's name just helps you get the right application going, but the application double-checks the data for the exact type, just in case. The type that matters is in the file. Cool, eh?

[Before I forget: in Pascal (and surely other languages too), the stored indicator of a value's type is called the "tag", particularly in variant records and unions, where the type of the value may change during the lifetime of the variable and you need to know which type it currently holds. Tag. Good word to remember.]

[Also: in databases, particularly Firebird, you can declare "domains", which are nominative (identified by name) and constrained (by a check constraint) but not structurally distinct. You can say that DEGREE_F is a new type of floating-point number with its own constraint (nothing below absolute zero), and that DEGREE_C is another floating-point type with a different name and constraint, but you're unfortunately still not prevented from comparing values of the two types directly, even though doing so makes no sense. The system still treats the values structurally when operating on them: the constraints on DEGREE_F and DEGREE_C don't affect the ability to perform number-plus-number operations. Nominative vs. structural vs. constrained: another distinction to keep in mind.]
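
The same gap shows up outside databases. Here is a rough Python analogy (not Firebird syntax; `Domain`, `DegreeF` and `DegreeC` are hypothetical names): each class has its own name and its own check constraint, yet comparison and arithmetic fall through to plain `float` behaviour.

```python
class Domain(float):
    """A named float type with a check constraint, loosely like a SQL domain."""
    minimum: float = float("-inf")

    def __new__(cls, value: float) -> "Domain":
        value = float(value)
        # The check constraint: reject values below the domain's minimum.
        if value < cls.minimum:
            raise ValueError(f"{cls.__name__} cannot be below {cls.minimum}")
        return super().__new__(cls, value)


class DegreeF(Domain):
    minimum = -459.67  # absolute zero in degrees Fahrenheit


class DegreeC(Domain):
    minimum = -273.15  # absolute zero in degrees Celsius


boiling_f = DegreeF(212.0)
freezing_c = DegreeC(0.0)
print(boiling_f > freezing_c)           # True: compared as plain numbers
print(DegreeF(32.0) == DegreeC(32.0))   # True, though 32 F and 32 C differ
print(type(boiling_f + freezing_c))     # <class 'float'>: the name is lost

try:
    DegreeC(-300.0)
except ValueError as err:
    print(err)  # the constraint, at least, is enforced
```

Making the names actually matter would mean overriding `__eq__`, `__lt__`, `__add__` and friends to reject mixed domains, which is roughly the check the database declines to do.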

Link :: http://arstechnica.com/reviews/os/metadata.ars/3