'Reuse' Is Not Usable

March 27, 2007

blog, color, editor, google, html, ideas rat, it industry, javascript, security, sql, sysadmin, tools, UX, working blue

First, here's my attempt at a greenspunism:

Bambrick's 8th Rule of Code Reuse

It's far easier and much less trouble to find and use a bug-ridden, poorly implemented snippet of code written by a 13 year old blogger on the other side of the world, than it is to find and use the equivalent piece of code, written by your team leader on the other side of a cubicle partition.

And I think that 'the copy and paste school of code reuse' is flourishing, and will always flourish, even though it gives very suboptimal results.

Let's look at some reasons why it flourishes, and some reasons why it's not so hot in the long run (feel free to suggest your own).

Copy and Paste from the Internet Is Good Because...

Code stored on blogs, forums, and the WWW in general is very easy to find
You can inspect the code before you use it
Comments at blogs give some small level of feedback that might improve quality
Google Pagerank (for example) sort of means that you're more likely to find code that might be higher quality.
Code that is easy to read is more likely to be copied (and pasted) -- a positive side effect of this is easier maintenance
The programmer's ego may drive him or her to only publish code that he/she believes is of sufficient quality
Other... (any suggestions?)

But Copy and Paste from the Internet Is BAD Because...

If the author improves the code, you're not likely to get those benefits
If you find a bug or improve the code, you're not likely to pass that improvement back to the author
An ability to 'Inspect the code before you use it' can mean very little if the whole reason you're copying the code is because you don't understand how to implement it yourself
Google Pagerank (for example) doesn't directly address the quality of the code, or its fitness for your purpose.
Code shown in blogs (for example) is often 'demo code' and may purposely gloss over important concerns: e.g. no error handling, no care taken to avoid sql injection, no care for encoding etc. (All the sort of things that make an author/blogger say 'Of course this is just for demo purposes and you'd never use this in production')
Security Concerns (I'll just leave it at that; the paranoiacs can make up their own details).
Other... (any suggestions?)

So I started thinking along the wrong lines... I started thinking "What about these internal repositories of code we already have... why don't we try and find some way to make them more shareable... make them as easy to find and use as the code written by a 13 year old hacker in Kazakhstan?"

Well, my thinking went down these lines....

Let's bolt tools onto our source code repository that generate, and index, beautifully formatted html pages of that code. Employees can browse these pages, at their leisure on the company intranet, without any of the hassle of having to 'get latest' or load an editor or IDE, or any of that other junk.... These little websites would be searchable... oops!

And I stopped right there because my head started thinking in two completely different paths at once.

"intranet search" is broken

On the one hand, "intranet search" is broken. I say it's broken because ranking algorithms, like PageRank, don't work on small datasets such as intranets. So the ranking of the results you find is always junk.

And more often than not, intranet search is broken long before you get to the ranking of the results. Intranet search solutions are usually slow, or incomplete, or over-engineered, or they just plain fail.

So even if there was a tool bolted onto your source code repository that generated a set of html pages etc.... well, 'Bambrick's 8th Rule of Code Reuse' would still hold true. It'd still be quicker to google for the answer than to use your dinky little company intranet.

But on the other hand, maybe intranet search of code doesn't need to be as broken as normal intranet search. Intranet search is fundamentally broken, because the data set is too small and sparsely cross-linked to provide meaningful ranking. And without IComparable, there is no sorting.

But look at the amazing amount of information you can get from a static analysis tool like NDepend. That thing is seriously powerful. Just say one piece of code calls a method in another piece of code. We can say that it has voted for that method. If we analyse an entire repository of solutions, we can start to establish rankings for the code that is in use, according to where the votes are directed. And this ranking data could be used to help determine the relevant order for search results against that code base.
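Here's a rough sketch of that voting idea, just to make it concrete. It assumes you've already extracted (caller, callee) pairs from somewhere (a static analysis tool, say); the function name `rank_by_votes`, the damping value and the sample method names are all made up for illustration, and this is a plain PageRank-style iteration, not anything NDepend itself does.

```python
def rank_by_votes(calls, damping=0.85, iterations=50):
    """Score each method by the 'votes' (calls) it receives.

    calls: iterable of (caller, callee) name pairs (hypothetical input format).
    Returns a dict mapping each name to a score; scores sum to roughly 1.
    """
    nodes = set()
    out_edges = {}
    for caller, callee in calls:
        nodes.update((caller, callee))
        if caller != callee:  # a method calling itself is not a vote
            out_edges.setdefault(caller, set()).add(callee)

    n = len(nodes)
    if n == 0:
        return {}

    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Methods that call nothing spread their score evenly over everyone.
        dangling = sum(scores[node] for node in nodes if node not in out_edges)
        new_scores = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        # Each caller splits its score between the methods it calls.
        for caller, callees in out_edges.items():
            share = damping * scores[caller] / len(callees)
            for callee in callees:
                new_scores[callee] += share
        scores = new_scores
    return scores
```

Search results against the code base could then be sorted by these scores, so a helper that half the repository calls turns up ahead of one nobody uses.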

And once you have content that can be ranked accurately, it becomes possible to have fast and effective searches... google style.

But anyway.... let's continue to assume that intranet search is broken... so what's the other approach?

Re-Invent 'Copy and Paste' of Internet Code

What would be nice would be if there was a way for you to receive updates to the code you've grabbed from the internet.

But most of the implementations that spring to mind are no good for this purpose. For example -- I could sign up to receive an email any time the code I've 'borrowed' is changed. This is ugly because (a) I wouldn't sign up, (b) even if I did sign up, the email would be indistinguishable from spam, (c) even if I paid close attention to the update emails, I'd have a hard time tracking down where in my own code I've pasted the updated code, and (d) why would a 13 year old blogger on the other side of the world want to set up an email spamming server, just to share snippets you can copy and paste anyway?

Another approach (that wouldn't work) is for our 13 year old blogger to have an internet-facing source code repository which shares the relevant snippet with you. Rather than copy and paste the kid's code, you'd add a reference to his library, and/or retrieve a copy of his code that way. Updates would be pushed out to you whenever you 'get latest'. But again, why is the kid gonna build this kind of infrastructure?

And by now you've guessed what the best solution is. Our old friend RSS. But RSS for code sharing.

You copy and paste a chunk of code straight off the internet, and with it you paste some kind of unique ID (embedded in a comment) that pinpoints the source of that snippet. Your IDE is smart enough to recognise this guid: You are automatically subscribed to the RSS feed for that piece of code, and optionally you can subscribe to the comments for that snippet as well.
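To show what 'recognising' such an ID might involve, here's a minimal sketch. The marker format (`snippet-feed: <url> id=<guid>` inside a `#` or `//` comment) is purely my invention for illustration, as is the function name; no IDE actually looks for this.

```python
import re

# Hypothetical marker: a comment holding the feed URL and a GUID, e.g.
#   # snippet-feed: http://example.com/slugify.xml id=123e4567-e89b-12d3-a456-426614174000
MARKER = re.compile(
    r"(?:#|//)\s*snippet-feed:\s*(?P<feed>\S+)\s+id=(?P<guid>[0-9a-fA-F-]{36})"
)


def find_snippet_markers(source_text):
    """Return a list of (line_number, feed_url, guid) for each marker found."""
    found = []
    for line_number, line in enumerate(source_text.splitlines(), start=1):
        match = MARKER.search(line)
        if match:
            found.append((line_number, match.group("feed"), match.group("guid")))
    return found
```

Each tuple tells the tool which feed to subscribe to and where in your file the pasted code lives, which is exactly the bit the email approach couldn't give you.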

The 13 year old hacker need produce nothing but a piece of XML.
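And that piece of XML could be tiny. Below is a sketch that builds an ordinary RSS 2.0 feed with one item per revision of the snippet, the code itself riding along in the item's description. The function name and the shape of the `revisions` dicts (`guid`, `summary`, `published`, `code`) are assumptions of mine, not any agreed format.

```python
import xml.etree.ElementTree as ET
from email.utils import format_datetime


def build_snippet_feed(title, link, revisions):
    """Build an RSS 2.0 document, newest revision first.

    revisions: list of dicts with hypothetical keys
    'guid', 'summary', 'published' (a datetime) and 'code'.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Revisions of one code snippet"

    for revision in revisions:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = revision["summary"]
        ET.SubElement(item, "guid", isPermaLink="false").text = revision["guid"]
        # RSS expects RFC 822 style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(revision["published"])
        # ElementTree escapes '<' and '&' in the code for us.
        ET.SubElement(item, "description").text = revision["code"]

    return ET.tostring(rss, encoding="unicode")
```

In practice the kid would probably write the XML by hand or let his blog engine spit it out; the point is only that nothing fancier than a static file is needed.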

One day your IDE/Code aggregator detects a new entry in the relevant RSS file: this indicates an update to the code you copied. You're given the opportunity to compare the difference between the code you have, and the new code (using the same diff/merge tools you're already familiar with). You can choose to cancel or allow the change. And more: you can view the comments around this change (or contribute to them, or unsubscribe from them).
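The consuming side might look something like this sketch. It assumes the feed is plain RSS 2.0 with the newest revision first, the code in each item's description and a revision ID in its guid; those conventions, and the function names, are mine for the sake of the example.

```python
import difflib
import xml.etree.ElementTree as ET


def latest_revision(feed_xml):
    """Return (guid, code) for the first item in the feed, or None if empty."""
    item = ET.fromstring(feed_xml).find("./channel/item")
    if item is None:
        return None
    return item.findtext("guid"), item.findtext("description") or ""


def propose_update(local_code, known_guid, feed_xml):
    """Return a unified diff if the feed holds a newer revision, else None."""
    latest = latest_revision(feed_xml)
    if latest is None or latest[0] == known_guid:
        return None  # nothing new since the revision we pasted
    new_guid, new_code = latest
    diff = difflib.unified_diff(
        local_code.splitlines(),
        new_code.splitlines(),
        fromfile="local",
        tofile=new_guid,
        lineterm="",
    )
    return "\n".join(diff)
```

The diff is only a proposal: you still get to look it over and cancel or allow the change, and diffing against your local copy means any edits you made yourself show up too.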

But this form of RSS wouldn't be limited to code snippets. You could use it for entire open source libraries. (Ah, a new version of Prototype.js... let me see what's changed...) You could use it for closed-source DLLs. (Ah, a new MAPI.dll, let me see how the interface has changed...) The same tagging could be used on code assets at every level of the code hierarchy.

RSS for Open Code Sharing seems like a nice little thought experiment. Inside my head, just now, it's working perfectly.

What do you reckon? Feel free to insult and belittle this idea -- I've just had it and I'm not attached to it yet ;-)

(Incidentally, I'm dismissing any of the 'silo' style solutions available from those all-too-proprietary IDE add-ins that let you share snippets. Unless the solution is 'Open', and will remain open, there's no incentive for anyone to use it.)


secretGeek.net