Found a Commodore 64 on the side of the road

So I found a Commodore 64 on the side of the road this morning.

In Brisbane we have something called a 'roadside collection' which is a week where you get all the junk out of your garage and pile it up on the street outside your house. The city council come along and collect it, but not before hordes of scavengers pick through it and take just about everything, no matter how worthless.

I was walking to the bus this morning, just as one of my neighbours was putting out their stuff for roadside collection. Included in their dusty heap of crap was an original Commodore 64! I picked it up and looked at it for a minute, then put it down and kept walking.

I looked back in time to see a blue utility truck pull up and two young guys start dismantling the heap and taking things, including the C-64.

For a moment I was disappointed that I hadn't grabbed it and stuffed it in my bag. It might be cool to be able to pull out a Commodore 64 whenever I needed an extra core for a particularly difficult computation.

But then I figure, no, I'm happy not to be the guy with the cubicle full of old computers. I'm sure there are ample computer museums that will help the C-64 live on into posterity, so that old fogies can reminisce about the first program they ever wrote: 10 Goto 10. ;-)

 

Number 1 Sign Your Software Project Is Doomed


I noticed these three similar headings at DZone (and elsewhere)

  1. Top Ten Signs Your Software Project is Doomed
  2. 7 Signs Your Project Will Never Make it to Production
  3. Warning Signs Your Web Application Project May Fail

And I thought I should write down the:

Number 1 Sign Your Software Project Is Doomed:

  1. You immediately clicked on and read all three of those articles because you had a strong gut feeling you should.

Sometimes, whatever you think you are -- that's what you are.

 

'Reuse' Is Not Usable

First, here's my attempt at a greenspunism:

Bambrick's 8th Rule of Code Reuse

It's far easier, and much less trouble, to find and use a bug-ridden, poorly implemented snippet of code written by a 13-year-old blogger on the other side of the world than it is to find and use the equivalent piece of code written by your team leader on the other side of a cubicle partition.

And I think that 'the copy and paste school of code reuse' is flourishing, and will always flourish, even though it gives very suboptimal results.

Let's look at some reasons why it flourishes, and some reasons why it's not so hot in the long run (feel free to suggest your own).

Copy and Paste from the Internet Is Good Because...

  • Code stored on blogs, forums, and the WWW in general is very easy to find
  • You can inspect the code before you use it
  • Comments at blogs give some small level of feedback that might improve quality
  • Google Pagerank (for example) sort of means that you're more likely to find code that might be higher quality.
  • Code that is easy to read is more likely to be copied (and pasted) -- a positive side effect of this is easier maintenance
  • The programmer's ego may drive him or her to only publish code that he/she believes is of sufficient quality
  • Other... (any suggestions?)

But Copy and Paste from the Internet Is BAD Because...

  • If the author improves the code, you're not likely to get those benefits
  • If you find a bug or improve the code, you're not likely to pass that improvement back to the author
  • An ability to 'Inspect the code before you use it' can mean very little if the whole reason you're copying the code is because you don't understand how to implement it yourself
  • Google Pagerank (for example) doesn't directly address the quality of the code, or its fitness for your purpose.
  • Code shown in blogs (for example) is often 'demo code' and may purposely gloss over important concerns: e.g. no error handling, no care taken to avoid sql injection, no care for encoding etc. (All the sort of things that make an author/blogger say 'Of course this is just for demo purposes and you'd never use this in production')
  • Security Concerns (I'll just leave it at that; the paranoiacs can make up their own details).
  • Other... (any suggestions?)

So I started thinking along the wrong lines... I started thinking "What about these internal repositories of code we already have... why don't we try and find some way to make them more shareable... make them as easy to find and use as the code written by a 13-year-old hacker in Kazakhstan."

Well, my thinking went down these lines....

Let's bolt tools onto our source code repository that generate, and index, beautifully formatted HTML pages of that code. Employees can browse these pages at their leisure on the company intranet, without any of the hassle of having to 'get latest' or load an editor or IDE, or any of that other junk.... These little websites would be searchable... oops!

And I stopped right there because my head started thinking in two completely different paths at once.

"intranet search" is broken

On the one hand, "intranet search" is broken. I say it's broken because ranking algorithms, like PageRank, don't work on small datasets such as intranets. So the ranking of the results you find is always junk.

And more often than not, intranet search is broken long before you get to the ranking of the results. Intranet search solutions are usually slow, or incomplete, or over-engineered, or they just plain fail.

So even if there was a tool bolted onto your source code repository that generated a set of html pages etc.... well, 'Bambrick's 8th Rule of Code Reuse' would still hold true. It'd still be quicker to google for the answer than to use your dinky little company intranet.

But on the other hand, maybe intranet search of code doesn't need to be as broken as normal intranet search. Intranet search is fundamentally broken, because the data set is too small and sparsely cross-linked to provide meaningful ranking. And without IComparable, there is no sorting.

But look at the amazing amount of information you can get from a static analysis tool like NDepend. That thing is seriously powerful. Just say one piece of code calls a method in another piece of code. We can say that it has voted for that method. If we analyse an entire repository of solutions, we can start to establish rankings for the code that is in use, according to where the votes are directed. And this ranking data could be used to help determine the relevant order for search results against that code base.
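Here's a toy sketch of that voting idea in Python. The call graph is entirely made up, and a real tool like NDepend would extract it from your assemblies, but the ranking step itself is just counting votes:

```python
from collections import Counter

# Hypothetical call graph: (caller, callee) pairs extracted by a static
# analysis tool. Each call site is one "vote" for the method it calls.
call_sites = [
    ("OrderService.Submit", "Validator.Check"),
    ("OrderService.Cancel", "Validator.Check"),
    ("ReportJob.Run",       "Validator.Check"),
    ("OrderService.Submit", "Mailer.Send"),
    ("ReportJob.Run",       "Mailer.Send"),
    ("Scratch.Test",        "Legacy.DoStuff"),
]

# Tally the votes each callee received.
votes = Counter(callee for _, callee in call_sites)

# Rank methods by how often the rest of the code base calls them;
# this ordering could drive the relevance of code-search results.
ranking = [method for method, _ in votes.most_common()]
print(ranking)  # Validator.Check ranks first: it received the most votes
```

A search over the repository could then boost results whose methods sit near the top of this ranking, much like PageRank boosts heavily linked pages.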

And once you have content that can be ranked accurately, it becomes possible to have fast and effective searches... google style.

But anyway.... let's continue to assume that intranet search is broken... so what's the other approach?

Re-Invent 'Copy and Paste' of Internet Code

It would be nice if there were a way for you to receive updates to the code you've grabbed from the internet.

But most of the implementations that spring to mind are no good for this purpose. For example -- I could sign up to receive an email any time the code I've 'borrowed' is changed. This is ugly because (a) I wouldn't sign up, (b) even if I did sign up, the email would be indistinguishable from spam, (c) even if I paid close attention to the update emails, I'd have a hard time tracking down where in my own code I've pasted the updated code, and (d) why would a 13-year-old blogger on the other side of the world want to set up an email spamming server, just to share snippets you can copy and paste anyway?

Another approach (that wouldn't work) is for our 13-year-old blogger to have an internet-facing source code repository which shares the relevant snippet with you. Rather than copy and paste the kid's code, you'd add a reference to his library, and/or retrieve a copy of his code that way. Updates would be pushed out to you whenever you 'get latest'. But again, why is the kid gonna build this kind of infrastructure?

And by now you've guessed what the best solution is. Our old friend RSS. But RSS for code sharing.

You copy and paste a chunk of code straight off the internet, and with it you paste some kind of unique ID (embedded in a comment) that pinpoints the source of that snippet. Your IDE is smart enough to recognise this GUID: you are automatically subscribed to the RSS feed for that piece of code, and optionally you can subscribe to the comments for that snippet as well.

The 13 year old hacker need produce nothing but a piece of XML.

One day your IDE/Code aggregator detects a new entry in the relevant RSS file: this indicates an update to the code you copied. You're given the opportunity to compare the difference between the code you have, and the new code (using the same diff/merge tools you're already familiar with). You can choose to cancel or allow the change. And more: you can view the comments around this change (or contribute to them, or unsubscribe from them).
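For what it's worth, the 'unique ID embedded in a comment' part is the easy bit. Here's a rough Python sketch of the scanning step; the comment convention and the feed URL scheme are invented purely for illustration:

```python
import re

# Invented convention: a pasted snippet carries a comment like
#   // snippet: <guid> feed: <rss-url>
SNIPPET_TAG = re.compile(
    r"snippet:\s*(?P<guid>[0-9a-f-]{36})\s+feed:\s*(?P<feed>\S+)"
)

def find_subscriptions(source_text):
    """Return (guid, feed_url) pairs for every tagged snippet in a file.
    An IDE could run this over the project and subscribe to each feed."""
    return [(m.group("guid"), m.group("feed"))
            for m in SNIPPET_TAG.finditer(source_text)]

example = """
// snippet: 123e4567-e89b-12d3-a456-426614174000 feed: http://example.com/s/123e.rss
int add(int a, int b) { return a + b; }
"""
print(find_subscriptions(example))
```

The GUID both identifies the snippet and survives copy-and-paste, so the aggregator can later diff your local copy against whatever the feed publishes.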

But this form of RSS wouldn't be limited to code snippets. You could use it for entire open source libraries. (Ah, a new version of Prototype.js... let me see what's changed...) You could use it for closed-source DLLs. (Ah, a new MAPI.dll, let me see how the interface has changed...) The same tagging could be used on code assets at every level of the code hierarchy.

RSS for Open Code Sharing seems like a nice little thought experiment. Inside my head, just now, it's working perfectly.

What do you reckon? Feel free to insult and belittle this idea -- I've just had it and I'm not attached to it yet ;-)

(Incidentally, I'm dismissing any of the 'silo' style solutions available from those all-too-proprietary IDE add-ins that let you share snippets. Unless the solution is 'Open', and will remain open, there's no incentive for anyone to use it.)

 

Damn Lambda

The increasingly 'functional' nature of C# still confuses me at times.

An
anonymous function with
anonymous parameters set an
anonymous object to an
anonymous type with an
anonymous value.

Cancel or Allow?

~~~~anon.
 

Powershell on Rails -- MonadRail!

Okay -- I've been going on about PowerShell a lot lately. And you want to hear about other things. I can accept that. Even the people on the bus seem to pull faces when I start spontaneously talking about PowerShell.

(An old lady on the bus this morning, for example, just didn't get it. --A universal parser! I said, but did she even smile? Not a smirk.)

Well, the reason for the PowerShell obsession is that I'm currently reading "PowerShell in Action" by Bruce Payette, which is a cracker! It goes deep into the language, as deep as can be.

And I've been thinking about using PowerShell for writing websites. Rails-style, no less.

Conveniently, one of the examples Bruce provides is a webserver implemented using PowerShell (download the source code from the Manning website and see 'Invoke-Webserver.ps1' from chapter 11). (Or for a similar example, see Vivek Sharma, or this very dangerous example from soapy frog.)

Two things got me thinking about a Rails-style PowerShell server: one, the Hanselminutes episode on MonoRail, and two, Mike Schinkel mentioning the concept of PowerShell web programming in his hunt for a new language.

Mike S:

"[Jeffrey Snover said] that PowerShell can do web, and will be able to do it more easily in the future. [Mike feels that] webified PowerShell should be a url-based object-selector-and-invoker like Django or Ruby on Rails."

Which is a nice idea. Seemingly, the major limitation with using PowerShell as a webserver at the moment is our old friend threading. A PowerShell webserver is just a polling loop that handles all requests synchronously. Hence, every client has to wait in line to have their request processed. Not good. Ideally it would instead receive request-events and handle each of them on a new thread. This (multi-threading) is promised as a future feature.
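To make the contrast concrete, here's a tiny Python stand-in (the requests and timings are invented): handling each request on its own thread means one slow request no longer blocks the fast ones behind it.

```python
import threading
import time

results = []
results_lock = threading.Lock()

def handle(request):
    # Pretend some requests are slow -- in a synchronous polling loop,
    # every later client would have to wait behind this one.
    time.sleep(0.2 if request == "slow" else 0.01)
    with results_lock:
        results.append(request)

def serve_threaded(requests):
    # One thread per request, instead of processing them one at a time.
    threads = [threading.Thread(target=handle, args=(r,)) for r in requests]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

serve_threaded(["slow", "fast-1", "fast-2"])
print(results)  # the fast requests finish first; "slow" arrives last
```

In the synchronous version, "fast-1" and "fast-2" would each pay the full 0.2 seconds waiting for "slow" to complete; threaded, they finish almost immediately.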

Anyway -- looking at Bruce's example of a webserver -- it would be pretty easy to see how it could be given rails-like behaviour.

Note that I do *NOT* have the time to be thinking about this... so i just wanted to sketch the basic idea here and see if anyone is interested in taking over.

Basically Bruce's script waits for GET requests and then looks for simple mathematical expressions, such as "2+2" -- which it evaluates and sends back to the browser. This is the relevant bit of code:

        $received = @($received -match "GET")[0]
        if ($received)
        {
            $expression = $received -replace "GET */" -replace
                'HTTP.*$' -replace '%20',' '
            if ($expression -match '[0-9.]+ *[-+*/%] *[0-9.]+')
            {

Well, instead of expecting a mathematical expression like that, we could look for a path that has:

Zero or more 'areas', followed by exactly one 'controller name', followed by exactly one 'action', optionally followed by one or more 'parameters'.

In honour of John Backus (r.i.p.), i'd love to express this in EBNF... but alas, the brain forgets.

Anyway, I'll just write it in DWIM: {/area{/area}}/controller/action{?paramname{=value}{+paramname{=value}}}

So once we've tokenized the path into areas, controller, action, params...

...we check that the relevant 'controller' exists in the specified 'area'.

Where a 'controller' is just a cmdlet, or maybe a script, with specific parameters (action, params).

And an 'area' is just a sub-folder. Nothing more, nothing less.

We pass the action and the params to the relevant controller, and then.... either the controller takes over from there, or maybe it just returns some html for us to return to the client.
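To sketch the tokenizing step in something runnable, here's a rough Python version of that DWIM pattern (the example path and the exact parameter syntax are my own invention, following the grammar above):

```python
def tokenize_route(path):
    """Split a path into (areas, controller, action, params):
    zero or more areas, then exactly one controller, then exactly one
    action, then optional ?name=value pairs joined with '+'."""
    path, _, query = path.partition("?")
    segments = [s for s in path.split("/") if s]
    if len(segments) < 2:
        raise ValueError("need at least /controller/action")
    # Everything before the last two segments is an area (sub-folder).
    *areas, controller, action = segments
    params = {}
    if query:
        for pair in query.split("+"):
            name, _, value = pair.partition("=")
            params[name] = value
    return areas, controller, action, params

print(tokenize_route("/admin/users/list?page=2+sort=name"))
# (['admin'], 'users', 'list', {'page': '2', 'sort': 'name'})
```

From there, the dispatcher just checks that a cmdlet (or script) named for the controller exists in the area's sub-folder, and invokes it with the action and params.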

Go for it. You have one hour.

Oh, and the name I've got for this one is 'MonadRail', as a play on 'monorail' and 'monad' (the original code name for PowerShell).

 

Remove Duplicate Rows From A Text File Using Powershell

If your text file is already sorted... then removing duplicates is very easy.

PS:\> gc $filename | get-unique > $newfileName

(But remember, the Get-Unique command only works on sorted data!)

If the file's content is not sorted, and the final order of the lines is unimportant, then it's also easy....

Sort it -- and then use Get-Unique

gc $filename | sort | get-unique > $newfileName

(You now end up with a file that is sorted, and where every line is unique)

However... the case that one bumps into is always the tricky case...

If the file data is not 'sorted', but the order *is* important... then it's a little trickier. I've got an approach... let's turn it into a solution.

Remove Duplicate Rows From A Text File Using Powershell... unsorted file, where order is important

I'm going to add each line to a hash table.

But before adding it -- I'll check if the line is already in the hash table.

If it's not there yet -- then I'll send that line into the new file. Here's an example:

PS H:\> $hash = @{}      # define a new empty hash table
PS H:\> gc c:\rawlist.txt | 
>> %{if($hash.$_ -eq $null) { $_ }; $hash.$_ = 1} > 
>> c:\newlist.txt

I test it out... given this input:

Apple
Dog
Dog
Carrot
Banana
Fun
Dog
Apple
Egg
Carrot
Egg

I get this output...

Apple
Dog
Carrot
Banana
Fun
Egg

Okay... I thought that was going to be really hard. Huh.

One more thing to do... see if I can comment this a little better...

PS H:\> $hash = @{}                 # Define an empty hashtable
PS H:\>  gc c:\rawlist.txt |        # Send the content of the file into the pipeline...
>>  % {                             # For each object in the pipeline...
>>                                      # note '%' is an alias of 'foreach-object'          
>>     if ($hash.$_ -eq $null) {    # if that line is not a key in our hashtable...
>>                                      # note -eq means 'equals'
>>                                      # note $_ means 'the data we got from the pipe'
>>                                      # note $null means NULL
>>         $_                       # ... send that line further along the pipe
>>     };
>>     $hash.$_ = 1                 # Add that line to the hash (so we won't send it again)
>>                                      # note that the value isn't important here,
>>                                      # only the key. ;-)
>>  } > c:\newlist.txt              # finally... redirect the pipe into a new file.
>>
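As an aside (not PowerShell, but the same trick): in Python, the order-preserving de-dup uses a 'seen' set in place of the hashtable. A sketch, using the test data from above:

```python
def unique_lines(lines):
    """Keep the first occurrence of each line, preserving order.
    The 'seen' set plays the same role as the hashtable keys above."""
    seen = set()
    out = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            out.append(line)
    return out

raw = ["Apple", "Dog", "Dog", "Carrot", "Banana", "Fun",
       "Dog", "Apple", "Egg", "Carrot", "Egg"]
print(unique_lines(raw))
# ['Apple', 'Dog', 'Carrot', 'Banana', 'Fun', 'Egg']
```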

By the way, my tools NimbleText and NimbleSET make removing duplicates from a list (or a file) even easier.

 

Is there a general solution to string templating?

String Templating seems to be a problem that gets solved over and over again. But is there a general problem underneath at all? And if so, can a general solution be designed, implemented everywhere and used with confidence? Read on for rampant speculation.

In various systems I've worked on, we let users set 'templates' for things such as emails or SMS.

Typically, we define a few 'special strings' that end users can embed in these 'templates', a bit like merge codes in a Word document.

For example:

Dear %UserName%

Thank you for your complaint letter, dated %ComplaintDate%.

We take all complaints seriously and will address this issue in the next release of our product, version %Nextver%, due out on %NextVerDate%.

[etc.]

Okay -- that's the simplest case. I've seen the same sort of thing in many many places. Reporting services, as one example.
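(For the record, that simplest %Token% scheme takes only a few lines to implement. Here's a rough Python sketch, with the token names taken from the example letter; real implementations still have to face escaping, formatting and so on, as discussed below.)

```python
import re

def expand(template, values):
    """Replace %Name% tokens with their values.
    Unknown tokens are left in place rather than erased."""
    return re.sub(
        r"%(\w+)%",
        lambda m: str(values.get(m.group(1), m.group(0))),
        template,
    )

letter = "Dear %UserName%, thank you for your complaint letter, dated %ComplaintDate%."
print(expand(letter, {"UserName": "Fred", "ComplaintDate": "2007-03-20"}))
# Dear Fred, thank you for your complaint letter, dated 2007-03-20.
```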

Next step up from that, you have something like CodeSmith where more complex substitutions can be employed:

#region <%= SourceTable.Name %> Class

..and you can define properties to be used, like so:

<%@ Property Name="SourceTable" Type="SchemaExplorer.TableSchema" Category="Context" Description="A Table." %>

And you can even define scripts (which I won't go into here...)

And here's an even simpler example of a powerful templating solution:

String fullName = String.Format("{0} {1}", firstName, lastName);

And in the last few days I've been looking at the in-built string expansion in Powershell. For example:

PS H:\> $name = 'fred'
PS H:\> "Hello $name, can you count to $(2+2) ?"
Hello fred, can you count to 4 ?
PS H:\>

(Not to mention Double-Quoted Here-Strings! Woah Baby!)

Now these are all different examples of what I'm calling 'templating solutions'.

The thing is though, that everywhere I look there seem to be more and more custom solutions to this problem. Some of them are slightly standardised, most of them are very home-baked.

Each of them requires a parser, a mini language of 'special strings'... some input data that gets injected into it... and then either a little or a whole lot more complexity on top of that.

So each time a parser is written for this, it has to ignore or re-solve the character-escaping problems. And then users want to specify numeric formatting. And string functions. And simple arithmetic. And date arithmetic. And more. And anywhere from a little to a lot of familiar code has to be written or rewritten (more code is bad, right?).

All of which makes me wonder: "is there a general templating problem?"

We can say that there's a general 'data storage and retrieval' problem -- and for that we have relational databases and the famous domain-specific language, SQL.

And we can say that there's a general 'transmission of structured data' problem -- and for this we have another widely implemented mini-language, XML.

And we can say that there's a general 'match and/or replace all sorts of patterns in text' problem -- and for this we have another widely implemented (and mis-implemented) domain-specific language: regular expressions.

But what about string-templating? Is it a definable problem? Can it be specified once, implemented on any platform, and re-used with confidence?

Eh?

Incidentally, I may as well list some other string-templating related solutions I've stumbled onto in the time I've been thinking about this...

  • XSLT (but because the templates are never human-writable, I ignore this as a general solution in itself. Could underpin a good solution.)
  • StringTemplate -- implemented in Java, Python and C#. Might be promising...
  • A Powershell Templating Engine -- not sure about this one.
  • asp (classic) and php are basically string writing engines... with specific support for HTTP & HTML
  • Haml
  • WSCG, the world's simplest code generator, uses arrays of data and special strings like '$1'. But I like it.
 

Remove empty lines from a file using Powershell.

I needed to remove the blank lines from a file.

Normally when I need to do this, I use a regular expression in TextPad. (replace "\r\n\r\n" with "\r\n"... and iterate) -- but TextPad wasn't available on this machine, and I couldn't connect to the internet to grab it.

So I fired up PowerShell and messed around with the syntax until it worked.

gc c:\FileWithEmptyLines.txt | where {$_ -ne ""} > c:\FileWithNoEmptyLines.txt

I don't know if that was the prettiest way to do it -- but I got the result I needed ;-)
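(And if you're ever on a machine without PowerShell too, the same filter is easy in Python -- a rough equivalent, assuming you've already split the file into lines and stripped the newlines:)

```python
def remove_empty_lines(lines):
    """Drop empty lines, keeping everything else in order.
    Mirrors the PowerShell filter above, which compares each line to ""."""
    return [line for line in lines if line != ""]

print(remove_empty_lines(["a", "", "b", "", "", "c"]))
# ['a', 'b', 'c']
```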

The nicest thing was that I didn't need to look anything up -- I just tried variations until I got the result I wanted. I didn't remember the syntax of the 'where' statement or the 'not equal to' operator -- I guessed and got them right within one or two guesses. Nice language design, Bruce!

(I didn't even remember the command 'gc' -- but since I wanted the PowerShell equivalent of the 'type' command, I entered 'alias type' and found that 'Get-Content' is the PowerShell equivalent of 'type', which I guessed was also known as just 'gc'.)