aaron swartz: the early works
I can't stop thinking about, wondering about, caring about, reading about the tragic life of Aaron Swartz. There's a lot I want to write. I think I could fill a book just trying to process what it means, what is an appropriate response, what's it all about. But I'm not going to attempt that.
I've been reading Aaron's blog, on and off, for over ten years. Ten years is a long time. And by my own estimate, those particular ten years were the longest in history.
Long ago I printed out his 'HOWTO: Be more productive' for multiple re-reads, and I have returned to it many times since.
I wanted to go back, right back, and try to work out the earliest stuff of his that I read. And I wanted to watch the progression of his ideas as they emerged.
On his blog, 'raw thought', there's a link to 'Older Posts', which takes you to 'the archive' (grouped by theme).
From there, a link to 'Full Archives' takes you to the reverse-chronological archives.
These stretch back to May 2005 (the oldest entry on that page is about a server crash, after which he had to restart his blogging). Under the so-called 'Full Archives' section there's no link to anything before May 2005.
Now I'm certain he was blogging long before that -- I'm certain I was reading his blog long before that.
Is the stuff before that server crash lost? I hoped not, so I set about locating it.
I clearly remember his PowerPoint remix (from 2003!), which was published in a book of Joel Spolsky's, and I soon tracked that down.
A look at the URL suggests a numbered blogging system (from Dave Winer's Radio UserLand), and from there it's easy to find all of his earlier blog entries.
After a bit of binary searching, I found what looks like Aaron's first 'Hello, world' post, with an article ID of 81.
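The 'bit of binary searching' can be sketched as a small function. This is my reconstruction, not necessarily how it was actually done: it assumes that every ID below the first post fails to resolve and every probe at or above it succeeds. Gaps in the numbering break that assumption (a probe that lands on a gap looks like "too early"), so any answer is worth double-checking by looking at its neighbours. `Find-FirstExistingId` and its parameters are hypothetical names; a real `-Exists` predicate would try fetching the URL built with the same `{0:000000}` format the download script uses.

```powershell
# Hypothetical helper: binary search for the smallest ID whose article exists.
# Assumes the predicate is monotone over [$Low, $High]: false for every ID
# before the first article, true for every ID from the first article onwards.
function Find-FirstExistingId {
    param(
        [int] $Low,             # an ID known (or assumed) NOT to exist
        [int] $High,            # an ID known to exist
        [scriptblock] $Exists   # returns $true if article N exists
    )
    # Invariant: & $Exists $Low is false and & $Exists $High is true.
    while ($High - $Low -gt 1) {
        # Floor first: a plain [int] cast would round 0.5 to even.
        $mid = [int][math]::Floor(($Low + $High) / 2)
        if (& $Exists $mid) { $High = $mid } else { $Low = $mid }
    }
    return $High
}

# Stand-in predicate for illustration only; no network access here.
$first = Find-FirstExistingId -Low 0 -High 1691 -Exists { param($n) $n -ge 81 }
Write-Host "first article id: $first"   # prints: first article id: 81
```

Each probe halves the remaining range, so narrowing 0 to 1691 down to one ID takes about 11 requests rather than 80-odd.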
So I wrote a PowerShell script to download everything (I hardly think aaronsw would object!!) and found that the articles run from number 81 up to 1691, with a few gaps.
Here's the script.
# Downloads aaron's early stuff
# i've done this the hard way because i didn't have time to do it the easy way.
$client = new-object System.Net.WebClient
$nums = 81..1691 #detected up to 1691 (April 26, 2005)
$nums | % {
    # article URLs use a zero-padded, six-digit id, e.g. .../weblog/000081
    $url = [string]::Format("http://www.aaronsw.com/weblog/{0:000000}", $_)
    $path = join-path $(get-location) ([string]::Format("aaronsw_{0:000000}.html", $_))
    Write-Host "downloading " $url " to " $path
    # a missing id (a gap) makes this call throw; the error is reported and the loop carries on
    $client.DownloadFile($url, $path)
    #sleep for 4 seconds before grabbing the next one, to give the server time to exhale.
    Start-Sleep -s 4
}
Then I wrote a script to walk through those files and create an archive page in the same style as Aaron's other archive pages.
It's not pretty code, but it got the job done...
dir .\aaronsw_*.html | % {
    #extract the filenumber out of the name... i should've made this easier.
    $num = $_.Name.Split("_")[1].Split(".")[0]
    #calculate the target url for this file
    $url = [string]::Format("http://www.aaronsw.com/weblog/{0}", $num)
    #load the file
    $article = gc $_.Name
    #grab the title
    $titleRegex = [regex]'h1>(.*)</h1>'
    $title = $titleRegex.Match($article).Groups[1].Value
    #grab the time
    $timeRegex = [regex]'<p class="posted">posted ([^(]+) \('
    $time = $timeRegex.Match($article).Groups[1].Value
    #output the url, title and time, as html
    $item = [string]::Format('<p><a href="{0}">{1}</a> ({2})</p>', $url, $title, $time)
    $item >> archivePreCrash.html
}
So the result is this fairly complete list of pre-crash articles:
Now, this takes us up to April 2005, and the post-crash articles start in May 2005, so probably everything's accounted for except maybe a month's worth of blogging. There are some missing articles within that period, and some lost material: I can see that he restored posts from the Wayback Machine where possible, but sometimes there was nothing to grab.
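To list exactly which IDs are missing from the download folder, something like the following would do. It's a sketch under my own assumptions: file names follow the `aaronsw_NNNNNN.html` pattern the download script produces, and the default range is the 81 to 1691 span found earlier. `Get-MissingArticleIds` is a hypothetical helper, not part of either script above.

```powershell
# Hypothetical helper: lists article IDs in [$First, $Last] with no downloaded file.
# Assumes names look like aaronsw_000081.html (the download script's pattern).
function Get-MissingArticleIds {
    param(
        [string[]] $FileNames,
        [int] $First = 81,
        [int] $Last = 1691
    )
    # Record every numeric ID that was actually downloaded.
    $present = @{}
    foreach ($name in $FileNames) {
        if ($name -match '^aaronsw_(\d+)\.html$') {
            $id = [int]$Matches[1]   # "000081" -> 81
            $present[$id] = $true
        }
    }
    # Any ID in the range that wasn't recorded is a gap.
    $First..$Last | Where-Object { -not $present.ContainsKey($_) }
}

# Example with a few hand-written names; prints 82 and 84, one per line.
Get-MissingArticleIds -FileNames 'aaronsw_000081.html', 'aaronsw_000083.html' -First 81 -Last 84
```

Against the real download folder, the names could come from `Get-ChildItem .\aaronsw_*.html | Where-Object Length -gt 0 | ForEach-Object Name`. The length filter is a precaution in case a failed request leaves an empty file behind; I haven't confirmed whether `WebClient.DownloadFile` does that.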
There are a lot of gems in there (and of course a bit of drivel: this starts when he was 15). I was going to pull out a few quotes, but I'd rather let you do that for yourself. He was a thoughtful guy. It'd be great if he was still around.
My book "Choose Your First Product" is available now.
It gives you 4 easy steps to find and validate a humble product idea.