Cascading File Types -- A different kind of Microformat
secretGeek .:dot Nuts about dot Net:.

Cascading File Types!

A different kind of Microformat

Say you've just created an application, and it uses a new type of file. This new type of file will be identified by its very own file extension, associated with the new app.

For example, you might use a ".snapper" file extension, even though the file itself is just "xml".

The ".snapper" filename is not very helpful to a user, as it hides the fact that this is an xml file. The only way for a person to work out that this is an xml file would be to look inside.

Conversely, a ".xml" file extension would be unhelpful to the operating system, as it hides the fact that this is a snapper file. Again, the only way to work out that this is a snapper file would be to look inside and find the schema that this document matches (if any; knowing me, probably none... sorry).

And this is a common scenario, particularly with variations of xml files.

So I'm suggesting a new microformat, and this microformat has nothing to do with the current microformat buzz on the internet. This is to do with multiple file extensions, set theory, cascading inheritance, and all sorts of tricky stuff. Yet it's very simple.

You can pick it up in under a minute.


Instead of just one file extension, why not give a file a whole bunch of file extensions, starting with the least specific and ending with the most specific!

An over-the-top example would be:

MyTimesheetSettings.txt.sgml.xml.snapper

What does this file name mean?

  • Everything before the first dot is the name itself.
  • After the first dot we have the most general type of the file: it's a text file.
  • Then we have a more specific rule: it's an sgml file.
  • Then a more specific fact again: this particular sgml file is xml.
  • Then a more specific fact again: this particular xml file is of type 'snapper'.
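Reading a name this way is mechanical. Here's a minimal Python sketch of the rule above, assuming the simple "first dot ends the name" convention. The function name is hypothetical, and types are lowercased on the assumption that extensions are case-insensitive (as they are on Windows):

```python
def parse_cascading_name(filename):
    """Split 'Name.general.specific' into the name and its type chain.

    Everything before the first dot is the name; each later part is a
    type, listed least specific first (lowercased, assuming extensions
    are case-insensitive).
    """
    name, *types = filename.split(".")
    return name, [t.lower() for t in types]


name, chain = parse_cascading_name("MyTimesheetSettings.txt.sgml.xml.snapper")
print(name)   # MyTimesheetSettings
print(chain)  # ['txt', 'sgml', 'xml', 'snapper']
```

The most specific type is simply the last element of the chain.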

Now this could be useful if, for example, the only verb defined for .snapper files is 'open', but the 'edit' verb is defined for .xml files.

Or maybe your system doesn't know how to edit xml files, but does know how to edit text files. Then, when you right-click the file in Windows Explorer, you'd have the choice not only to open it with TimeSnapper, for example, but also to edit it with a text editor.
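One way a shell could pick a handler: walk the chain from the most specific type back to the most general, and take the first type that defines the requested verb. A sketch, where the `VERBS` table and the program names in it are made up for illustration:

```python
# Hypothetical verb table: file type -> {verb: program}.
VERBS = {
    "txt": {"open": "Notepad", "edit": "Notepad"},
    "xml": {"edit": "XmlEditor"},
    "snapper": {"open": "TimeSnapper"},
}


def find_handler(chain, verb, registry=VERBS):
    """Return the program for `verb`, preferring the most specific type.

    `chain` lists types least specific first, e.g. ['txt', 'sgml', 'xml', 'snapper'].
    """
    for file_type in reversed(chain):
        program = registry.get(file_type, {}).get(verb)
        if program is not None:
            return program
    return None  # no type in the chain supports this verb


chain = ["txt", "sgml", "xml", "snapper"]
print(find_handler(chain, "open"))  # TimeSnapper
print(find_handler(chain, "edit"))  # XmlEditor
```

If no xml editor were registered, the same lookup for `edit` would fall through to the `txt` entry and return Notepad, which is the fallback described above.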

Today we often layer a specific format inside a general open format. And general open formats are built upon more general, more open formats. (We could be fancy and call it some kind of Aristotelian hierarchical classification system... but it's been too long since I read Sophie's World, so I'm not gonna keep pretending I remember that stuff.)

Anyway, I came up with this idea for a different reason altogether.

Bloody Polyglotics Again!

What if a file combined two languages, intermingled in the one document? For example, what if a file could be opened both as a valid sql file and as a seXml file? Or as a C# file and a seXml file?

There's a technical name for a program that can be compiled by two different compilers, and after a lot of googling I tracked it down... polyglot!

A more general case: what about files that contain multiple discrete syntaxes in a single document? A common example: a valid html file might also be a valid xml file. You want to view it as html, but you want to edit it as xml.

(Okay we have the xhtml extension for that... but if we invent new extensions for every combination of two or more existing extensions, we'll be looking at a lot of extensions within the next ten thousand years.)

Or how about a file that combines javascript, css, and html? Perhaps you'd like to edit the css component in one application, the javascript component in another and the html component in a third. Maybe these multiple file extensions could allow for such behaviour.

(In this last case, the applications would need to be clever enough to recognise the data they're interested in, and to skip the data they're not interested in. But it's kind of possible.)

(What I'd like to see is a code generator that spits out all types of files (it might create ".cs" files, ".config" files, ".sql" files and everything else). By adding another name earlier in the list (".wscg.cs", ".wscg.config", ".wscg.sql"), it can still reserve the right to edit these file types... even though it knows nothing about them. Provided it knows how comments work in the target format, it can embed its own iXml or seXml tags amongst these comments... possibly providing enough information to regenerate the files and identify user-edited portions...)
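A rough sketch of how such a generator might claim a file and embed its marker. The `wscg` owner name comes from the example above; the comment table and the `<wscg>...</wscg>` tag shape are hypothetical stand-ins, not actual iXml or seXml syntax:

```python
# Hypothetical comment syntax per target extension: (opener, closer).
COMMENT_SYNTAX = {
    "cs": ("// ", ""),
    "sql": ("-- ", ""),
    "config": ("<!-- ", " -->"),
}


def generator_marker(filename, payload, owner="wscg"):
    """Wrap `payload` in a comment valid for the file's most specific type.

    Returns None if `owner` does not appear in the extension chain, or if
    the target format's comment syntax is unknown.
    """
    _, *chain = filename.split(".")
    if owner not in chain:
        return None
    target = chain[-1].lower()
    if target not in COMMENT_SYNTAX:
        return None
    opener, closer = COMMENT_SYNTAX[target]
    return f"{opener}<{owner}>{payload}</{owner}>{closer}"


print(generator_marker("Customer.wscg.sql", "template=crud"))
# -- <wscg>template=crud</wscg>
print(generator_marker("App.wscg.config", "template=settings"))
# <!-- <wscg>template=settings</wscg> -->
```

A real generator would also need to escape payloads that contain the comment terminator (for instance `-->` inside an XML comment).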

Well, that's my 'microformat' idea of the day. It's only micro-useful, so don't micro-flame me.

A follow on thought from this was covered in yesterday's iXml post.

[Update: renamed as 'Cascading File Types' based on comment from Jonno. Cheers Jonno!]





'Dan F' on Fri, 07 Jul 2006 01:44:46 GMT, sez:

Dude! I like it a lot. Hope someone at MS reads this.



'Jonno' on Fri, 07 Jul 2006 03:09:41 GMT, sez:

very good idea.

But to get more traction I think you need a more descriptive name for the idea than calling it a 'microformat'.

Maybe something like 'cascading file types'



'lb' on Fri, 07 Jul 2006 03:16:21 GMT, sez:

good call jonno -- i'm gonna run with that.



'Ian Horwill' on Fri, 07 Jul 2006 05:32:41 GMT, sez:

I'm waiting for the day when MS finally produces an O/S that doesn't require meta file information, i.e. file type(s), to be embedded in the damned file name! Maybe in Vista+1 the excitement of allowing extensions to be longer than 3 chars will have died down enough for someone to think about it.



'lb' on Fri, 07 Jul 2006 06:11:22 GMT, sez:

yeh ian, there has long been this kind of embarrassment about the concept of file extensions.

and hence they've never been pushed or touted.

but the fact is that we're stuck with them now. We can't undo history. Hence i say we should embrace them! take them further!

turn your thinking around: if we're stuck with them, we'd better make the most of them. so then... what can we do with them? harness the power of the meta-file-information!!

they're just about the only piece of meta data i've ever seen that is almost always (a) present and (b) correct... think about it....


thanks for the feedback man.



'Stef Robb' on Fri, 07 Jul 2006 07:25:02 GMT, sez:

I quite often have to email Windows Registry patches, .reg files. Most email content-filtering systems will strip these out for obvious reasons, so I usually just append .txt on to the end of the filename giving a .reg.txt extension.

I've thought for some time that in the Ideal World a double click would open such a file in the user's primary text editor and a right-click would provide a list (cascaded if > 1 entry) of alternative editors/handlers based on the hierarchy of extensions - in this case offering regedit as an alternative.
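Stef's double-click/right-click split can be sketched the same way. This assumes, as in the `.reg.txt` example, that the last extension supplies the default and every earlier one contributes an alternative; the `REGISTRY` contents are hypothetical:

```python
# Hypothetical associations: file type -> {verb: program}.
REGISTRY = {
    "reg": {"open": "regedit"},
    "txt": {"open": "notepad"},
}


def context_menu(chain, verb="open", registry=REGISTRY):
    """Return (default program, alternatives) for `verb`.

    The last extension gives the default; earlier extensions add
    alternatives, nearest first, with duplicates dropped.
    """
    programs = []
    for file_type in reversed(chain):
        program = registry.get(file_type, {}).get(verb)
        if program is not None and program not in programs:
            programs.append(program)
    if not programs:
        return None, []
    return programs[0], programs[1:]


default, alternatives = context_menu(["reg", "txt"])
print(default)       # notepad
print(alternatives)  # ['regedit']
```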

I'm sure this could already be implemented now by some keen developer, no need to wait on Microsoft.



'Chad' on Fri, 07 Jul 2006 11:38:40 GMT, sez:

This convention is actually already used in the *ahem* video encoding enthusiast communities. Typically you will see a .divx.avi or .xvid.avi extension on many of the encoded video files.



'Keff' on Fri, 07 Jul 2006 16:24:02 GMT, sez:

Nice, but sooo far from the real world :).
"Everything before the first dot is the name itself" - sorry, it isn't. Just check your mp3 collection. Or anything written in Java. How about classes.console.something.something.different.txt.xml.snapper - please write me a parser to distinguish the extension :)



'Jeremy Brayton' on Fri, 07 Jul 2006 16:26:34 GMT, sez:

One very huge problem I always run into when I think about this stuff: 260 character filename limit in Windows XP. By naming it ALL of those extra names, you quickly butt up against that, especially if you store stuff on your Desktop for instance (C:\Documents and Settings\User.Domain\Desktop\ is longer than C:\Program Files\ on any given day of the week).

If anything I would shrink the "pseudo-extension" to the necessary elements. It's an Xml file type, but a snapper file format. Who on earth needs to remember that Xml is a part of Sgml which is really just text? My mother isn't going to remember, so why should I?

I touched on the idea using a different approach a while ago here: http://geekswithblogs.net/jbrayton/articles/RobOSFileTypes.aspx. Of course not having the 260 path name limit makes practically all of the problems with either approach go away.



'Matt Titchener' on Fri, 07 Jul 2006 18:50:26 GMT, sez:

(linked here from Larkware - recognised the name)

Now see, there's an idea floating around about file tagging and meta-data that almost perfectly complements this Cascading File type approach: hierarchic file tagging.

I won't explain the details of my hierarchic file-tagging idea (as I'm sure it's not unique) as it's kinda beside the point, but suffice to say that it would support a Cascading File Types approach by its very nature. By having tens of tags associated with a file, some of which are meta-tags themselves (filetype:, author:, etc.) you could have the following:

...random list of tags then...
filetype:text
filetype:sgml
filetype:xml
filetype:snapper

The file NAME could stay the same: 'snapper config file' for example, but the information contained therein would result in the behaviour described in your entry. Contextual changes: go to edit, we get UltraEdit, go to view, we get an XML viewer, go to change the config we bring up the TimeSnapper configuration dialogue; with added power of standard tags on top.

As I'm sure you're aware it's technically possible to place this style of working into Windows at this present moment - with some serious dev time - but it's all still pretty high-level. I'd like to see this tagging/CFT approach baked right into an OS - imagine the speed of search and convenience of contextual execution!

Anyways, perhaps I'm getting carried away. I'd love someone to try and pick apart these ideas, just to see if it's all a pipe dream... but for now after reading Leon's post, I'm pretty excited about the possibilities.

Thanks Leon.



'Keff' on Fri, 07 Jul 2006 19:24:29 GMT, sez:

Matt: Nice idea, but one trip through usb flashdisk/internet file service/email/anything that doesn't transfer metadata correctly, and you're f**ed.
I agree that world would be much better place if we didn't have to maintain backwards compatibility :).



'lb' on Sat, 08 Jul 2006 01:59:25 GMT, sez:

>Who on earth needs to remember that Xml
>is a part of Sgml

totally agree... i only put sgml in the file hierarchy as a kind of in-joke between me and myself.



'Jonno' on Sat, 08 Jul 2006 02:02:11 GMT, sez:

Leon,

I put together some thoughts on how to go about bootstrapping this idea at http://blog.jamtronix.com/2006/07/cascading_filetype_demultiplex.html



'lb' on Sat, 08 Jul 2006 02:04:17 GMT, sez:

>classes.console.something.something.
>different.txt.xml.snapper - please
>write me a parser to distinguish the
>extension :)

no need to be certain which part is the file name. the OS can check whether there are verbs for each part after "classes" -- if there are no verbs then it's okay...

this could be turned into an advantage, where namespaces can be linked to verbs -- for editing/linking to libraries... something powerful, not sure what.

in any case the "unknown extensions" would never be hidden, even if you have that awful default setting left on (i.e. "hide extensions for known file types"). so it wouldn't break in the java example.
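Keff's parser gets easier if you don't insist on knowing exactly where the name ends. A sketch of the approach described above: peel registered types off the right-hand end and treat everything left over as the name. `KNOWN_TYPES` is a hypothetical stand-in for whatever the OS actually has registered:

```python
# Hypothetical set of types the OS has verbs registered for.
KNOWN_TYPES = {"txt", "sgml", "xml", "snapper", "mp3"}


def split_known_types(filename, known=KNOWN_TYPES):
    """Peel registered types off the right; whatever remains is the name.

    At least one segment is always kept as the name.
    """
    parts = filename.split(".")
    chain = []
    while len(parts) > 1 and parts[-1].lower() in known:
        chain.insert(0, parts.pop().lower())
    return ".".join(parts), chain


print(split_known_types("classes.console.something.something.different.txt.xml.snapper"))
# ('classes.console.something.something.different', ['txt', 'xml', 'snapper'])
```

The catch: a name segment that happens to match a registered type, sitting just before the real extensions, would be swallowed as a type.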




'Matt Titchener' on Sun, 09 Jul 2006 07:19:06 GMT, sez:

You know, I'm not so sure about this namespace idea. If we assume most computer users don't even KNOW what a namespace is, then we have to assume that this would be somewhat baffling to them, at least if it were used as the file naming convention.
Seeing 'different.txt.xml.snapper' might make SOME sense to you and me, but it's not exactly 'human readable'. I think the way this should be going is to place the namespace/linked-verbs idea 'behind' the name. Thus retaining the power but improving the usability - sounds like a win-win to me.

Keff: See, with hierarchic file tags, I'm not sure we'd need to worry about backwards compatibility. From what I describe, retaining the meta-data in an open-standard binary file-header, and having all OTHER OS's just ignore the bits they don't like, would do the trick. This is already possible. This would mean you could throw files around willy-nilly, but as soon as you hit a HFS (hierarchic file system) BAM order again! :)



'Matt Titchener' on Sun, 09 Jul 2006 07:23:02 GMT, sez:

p.s. As a modification, to my first post, placing in:

filetype:text.sgml.xml.snapper

makes more sense, and retains Leon's idea more accurately.



'Roddy' on Wed, 12 Jul 2006 07:37:49 GMT, sez:

I love the concept of inheritance of filetypes, but I *hate* the idea of that being explicitly provided in every filename. It would be like having to provide the full inheritance path of an object every time you wanted to use one, like saying

void foo(label : TObject.TWinControl.TCustomLabel.TLabel)

instead of plain...

void foo(label : TLabel)

when you register a filetype with windows shell, you should be able to somehow say "inherits_from(basetype)",

snapper.inherits_from(xml)

and then you'd get all the inherited verb functionality (editors, etc).
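Roddy's `inherits_from` idea, sketched as a tiny registry. The class and method names are hypothetical and this is not a real shell API; it only shows how verb lookup could walk a registered base-type chain instead of a chain spelled out in the filename:

```python
class FileTypeRegistry:
    """Hypothetical shell registry in which a type may name one base type."""

    def __init__(self):
        self._base = {}
        self._verbs = {}

    def register(self, file_type, verbs=None, inherits_from=None):
        self._base[file_type] = inherits_from
        self._verbs[file_type] = dict(verbs or {})

    def lineage(self, file_type):
        """The type followed by its ancestors, most specific first."""
        chain = []
        while file_type is not None and file_type not in chain:  # guard against cycles
            chain.append(file_type)
            file_type = self._base.get(file_type)
        return chain

    def handler(self, file_type, verb):
        """First program found for `verb`, walking up the inheritance chain."""
        for t in self.lineage(file_type):
            program = self._verbs.get(t, {}).get(verb)
            if program is not None:
                return program
        return None


registry = FileTypeRegistry()
registry.register("txt", {"edit": "Notepad"})
registry.register("xml", {"edit": "XmlEditor"}, inherits_from="txt")
registry.register("snapper", {"open": "TimeSnapper"}, inherits_from="xml")

print(registry.lineage("snapper"))          # ['snapper', 'xml', 'txt']
print(registry.handler("snapper", "edit"))  # XmlEditor
```

The file on disk can stay a plain `Settings.snapper`; the hierarchy lives in the registration.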



'Wim Coenen' on Wed, 12 Jul 2006 08:25:15 GMT, sez:

This is just an ugly hack which jams one particular type of metadata (i.e. the filetype) inside another piece of metadata (i.e. the filename).

There is a more general problem here that needs fixing. What is needed is a filesystem with more flexible support for file metadata.

The Reiser4 filesystem (http://www.namesys.com/v4/v4.html) seems to be on to something here. Reiser4 collapses the concepts of 'files', 'directories' and 'attributes' into a single semantic primitive. Each file in a Reiser4 filesystem can be accessed as a directory. This means you can elegantly save arbitrary file metadata as tiny "subfiles".




'Justin Crowell' on Wed, 12 Jul 2006 15:17:09 GMT, sez:

I like this idea but I think it would be better if the order were reversed - starting with the most specific extension and ending with the least.

For example:
MyFile.snapper.xml.txt

This way the most general extension would be recognized by default (text) and the default action would be the most general (text editor) for current apps and smarter apps could recognize the subtypes if they want.

Sure a better solution would be a new filesystem but we have to wait on MS for that...



'gerrard' on Wed, 12 Jul 2006 15:55:00 GMT, sez:

I'm partial to what Microsoft Office (esp. InfoPath) does for xml files. there is a processing instruction inside of the file itself to identify the program which should be used with the file. If you don't have the particular office app installed, it's just xml to you.



'lb' on Fri, 14 Jul 2006 01:30:34 GMT, sez:

to gerrard:

> "processing instruction inside of the file itself to identify the program which should be used with the file"

i've never liked that as it means that you don't know what you're eating until you've taken a bite.

i.e. windows has to look inside every (xml) file to work out what type it is.

also, while xml is built for this kind of thing, other file formats are not. a c# document for example, or a html file, could be polyglotic, but to determine this by parsing the file is way too process-intensive.
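To make that cost concrete: with a processing instruction, the shell has to open and read each file before it can classify it. A sketch that looks for Office's `mso-application` instruction near the top of a file. The 4096-byte window is an arbitrary assumption, and the function names are hypothetical:

```python
import re

# Matches e.g. <?mso-application progid="InfoPath.Document"?>
MSO_PI = re.compile(rb'<\?mso-application\s+progid="([^"]+)"\s*\?>')


def progid_from_head(head):
    """Return the progid named in an mso-application PI, or None."""
    match = MSO_PI.search(head)
    return match.group(1).decode("utf-8", errors="replace") if match else None


def sniff_progid(path, limit=4096):
    """Open the file and inspect its first `limit` bytes.

    Unlike an extension lookup, this costs a file open and a read for
    every file the shell wants to classify.
    """
    with open(path, "rb") as f:
        return progid_from_head(f.read(limit))
```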

to: Justin Crowell

>I like this idea but I think it would be
>better if the order were reversed -
>starting with the most specific extension
>and ending with the least.

nah this wouldn't work, i don't think. for example, clicking on a 'Fred.resume.xml.text' file would mean that the file opens in notepad by default. No user on earth should be editing xml in notepad unless they have to. And your default user doesn't want to edit the file at all; they just want to use it (double-click on it and launch, in this case, the 'resume viewing' application, for example). So 'Fred.text.xml.resume' is a better name, and in explorer the user would see the icon that's associated with the 'resume' extension.
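This objection rests on how shells choose a default program today: they key off the final extension only. A sketch of that rule, with a hypothetical association table, shows why the order of the chain matters:

```python
# Hypothetical associations, keyed by extension as today's shells do.
DEFAULT_PROGRAM = {"text": "Notepad", "xml": "XmlEditor", "resume": "ResumeViewer"}


def default_program(filename, table=DEFAULT_PROGRAM):
    """Pick the default program from the final extension only."""
    if "." not in filename:
        return None
    last_extension = filename.rsplit(".", 1)[1].lower()
    return table.get(last_extension)


print(default_program("Fred.resume.xml.text"))  # Notepad
print(default_program("Fred.text.xml.resume"))  # ResumeViewer
```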

cheers
lb



Comments disabled due to spam. Feel free to follow up via email: leonbambrick at gmail dot com.


home .: about .: sign up .: sitemap .: secretGeek RSS .: © Leon Bambrick 2006 .: privacy
