Pure Danger Tech



Ruby script to generate weekly Twitter links

30 Mar 2009

I thought it would be fun to write a Ruby script to generate the HTML table I used in the previous post, and it turned out to be a nice little exercise. I admit up front that I am still getting the feel for Ruby, so it’s entirely likely that my Ruby skillz are non-idiomatic or downright bogus. If so, please let me know so I can learn from you gurus…

First, I wanted to hit the Twitter Search API to grab all of my tweets from the last week that contain links. You can pretty quickly figure out the query by using the Twitter advanced search screen and looking at the search string it produces; something like “from:puredanger filter:links since:2009-03-23” should do the trick.
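
The "since" date does not have to be typed by hand. Here is a minimal sketch of building that query string for an arbitrary user and number of days; the helper name links_query and its today parameter are made up for illustration and are not part of any gem.

```ruby
require 'date'

# Builds a search query for a user's link-bearing tweets.
# `today` is injectable so the result is reproducible in tests.
def links_query(user, days_back, today = Date.today)
  since = today - days_back                    # Date minus Integer is an earlier Date
  "from:#{user} filter:links since:#{since}"   # Date#to_s gives YYYY-MM-DD
end

puts links_query('puredanger', 7, Date.new(2009, 3, 30))
# prints: from:puredanger filter:links since:2009-03-23
```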

I decided to write the script in Ruby as I wanted something relatively short and I assumed that there were existing Ruby libs out there. Sure enough, I quickly found the twitter-search gem and it worked flawlessly. You just need to install the gem as described on its main page.

To actually run a search, you need some code like:

require 'rubygems'
require 'twitter_search'

agent = "recent-links"
@client = TwitterSearch::Client.new agent
@tweets = @client.query "from:puredanger filter:links since:2009-03-23"
@tweets.each do |tweet|
  # work with properties of the tweet like:
  #  tweet.id
  #  tweet.created_at
  #  tweet.text
  #  tweet.from_user
  #  tweet.from_user_id 
  #  tweet.to_user
  #  tweet.to_user_id
  #  ...and some others you can look up if you want
end

Regexen

As we all know, knowing regex gets you all the chicks. In this case I went with something super quick and dirty: /(http:\S+)/, which grabs everything from “http:” up to the next whitespace (so it will miss https: links). Feel free to make it as complicated and perfect as you like. I decided that I wanted to remove the url from the tweet and make the whole tweet into a link itself in the output.

That yielded this ugly chunk (could be shorter but was easier to debug this way):

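text = tweet.text
link_regex = /(http:\S+)/
text_no_links = text.gsub(link_regex, '')  # tweet text with every url stripped out
links = text.scan(link_regex)[0]           # captures of the first match, or nil if none
link = links[0]                            # the first url itself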
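
The chunk above raises a NoMethodError when the regex finds nothing, because scan(...)[0] is then nil. A slightly more defensive variant might look like the sketch below. The name split_tweet, the constant LINK_REGEX, and the decision to also accept https: urls are my own assumptions for illustration, not what the finished script does.

```ruby
# Matches http: and https: urls up to the next whitespace.
LINK_REGEX = %r{https?://\S+}

# Returns [text_without_links, first_link]. first_link is nil when the
# tweet contains no url, so the caller can skip it instead of crashing.
def split_tweet(text)
  first_link = text[LINK_REGEX]   # String#[] with a Regexp: first match or nil
  # Remove every url, then tidy the spaces the removal leaves behind.
  text_no_links = text.gsub(LINK_REGEX, '').squeeze(' ').strip
  [text_no_links, first_link]
end

text, link = split_tweet('Nice post http://example.com/abc on Ruby')
puts text   # the tweet without its url
puts link   # the url that was pulled out
```

Returning nil for "no link" keeps the decision about what to do with such tweets in the calling code.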

Unraveling short urls

I noticed then that I had a bunch of crappy shortened urls and thought it would be nice to resolve them into the more understandable (in blog context with hovers) long urls. There is a very nice service that can help with this called LongURLPlease.com. I’ve been using their Firefox plugin for a while and find it very helpful (not just in Twitter Web, but also in forums, web mail, etc).

They also happen to have an API (cha-ching). So it was simple enough to write my own Ruby interface to their API which will resolve a short url into a long one (and if it’s not a short url, give you back the original). Here is longurl.rb:

require 'rubygems'
require 'net/http'
require 'json'
require 'cgi'

module LongURLPlease

  class LongUrlClient
    LONG_URL_API_URL = 'http://www.longurlplease.com/api/v1.1'
    LONG_URL_TIMEOUT = 5

    def headers
      { "Content-Type" => 'application/json' }
    end

    # the api can handle multiple url queries and responses, but this does just 1
    def query(search)
      url = URI.parse(LONG_URL_API_URL)
      url.query = "q=#{CGI.escape(search)}"

      http = Net::HTTP.new(url.host, url.port)
      http.read_timeout = LONG_URL_TIMEOUT

      # Net::HTTP#get expects the path plus query string, which is request_uri
      json = http.get(url.request_uri, headers).body
      urls = JSON.parse(json)
      long_url = urls.values[0]
      # fall back to the original url when the service has no long form for it
      long_url.nil? ? search : long_url
    end
  end
end

I hereby release this (and all code in this blog entry) as public domain – do anything you want with it. Copy, re-release, re-distribute, make it into a gem, publish it as your own, etc. It comes with no warranty of any kind, you accept all risk, and no attribution is required. Enjoy.

You can use this API by doing something like:

require 'longurl'

@longurl_client = LongURLPlease::LongUrlClient.new
short_link = # something
long_link = @longurl_client.query short_link
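
To see the fallback behaviour without hitting the network, here is a small sketch that runs the same logic against a canned response. The response shape (a JSON object mapping each queried url to its long form, or to null when it cannot be expanded) is an assumption inferred from the client code above, and resolve_from_response is a hypothetical helper, not part of longurl.rb.

```ruby
require 'json'

# Picks the long url for `short_url` out of a JSON response body, falling
# back to the original when the entry is null or missing.
def resolve_from_response(json_body, short_url)
  mapping = JSON.parse(json_body)
  mapping[short_url] || short_url
end

# Assumed response shape: { "<queried url>": "<long url>" or null }
sample = '{"http://example.com/s1": "http://example.com/a/long/path", ' \
         '"http://example.com/s2": null}'

puts resolve_from_response(sample, 'http://example.com/s1')
puts resolve_from_response(sample, 'http://example.com/s2')
```

Looking the url up by key rather than taking the first value would also extend naturally to querying several urls in one request.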

Paging

The Twitter search API won’t necessarily hand you every result in a single response; it exposes a page parameter for walking through multiple pages of results. To use it with the twitter-search gem, you pass a hash of url parameters instead of a plain query string:

params = { 'q' => "from:puredanger filter:links since:2009-03-23",
           'page' => page.to_s }
@tweets = @client.query params
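
The loop around that call boils down to "keep asking for the next page until one comes back empty or you have enough". Here is a sketch of that pattern on its own, with a hash of fake pages standing in for the search client. The helper collect_pages is hypothetical, and it assumes the block returns a plain array for each page.

```ruby
# Collects results page by page until a page comes back empty or
# `max_items` have been gathered. The block receives a 1-based page
# number and must return an array of results for that page.
def collect_pages(max_items)
  items = []
  page = 1
  while items.size < max_items
    batch = yield(page)
    break if batch.empty?   # ran out of results
    items.concat(batch)
    page += 1
  end
  items.first(max_items)    # the last page may have overshot the limit
end

# Stand-in for the search client: three pages of fake results, then nothing.
fake_pages = { 1 => %w[a b], 2 => %w[c d], 3 => %w[e] }
results = collect_pages(4) { |page| fake_pages.fetch(page, []) }
puts results.inspect
```

With the real client, the block body would be the params hash and query call shown above.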

Clean up

And finally, I did a round of clean up to pull out hard-coded twitter IDs, make the # of days history and # of max links into command line arguments, etc. I also added some simple code at the end to dump the results oldest to newest (you get them newest first) in an HTML table, easy for me to copy/paste into my blog.

The finished script is this:

require 'rubygems'
require 'twitter_search'
require 'longurl'
require 'date'  # for Date.today below

# input parameters
user = 'NO_USER_SET'  # required
max_days = 7          # optional
max_links = 100       # optional

if ARGV.empty? 
  puts 'ruby mylinks.rb <user> [<max days> [<max links>]]'
  exit 1
else
  user = ARGV[0]
  max_days = ARGV[1].to_i if ARGV.size > 1
  max_links = ARGV[2].to_i if ARGV.size > 2
end

agent = "recent-links-#{user}"
@client = TwitterSearch::Client.new agent

@longurl_client = LongURLPlease::LongUrlClient.new
page = 1
@rows = []

while @rows.size < max_links
  params = { 'q' => "from:#{user} filter:links since:#{Date.today - max_days}",
             'page' => page.to_s }
  @tweets = @client.query params
  break if @tweets.size == 0  # no more pages of results

  @tweets.each do |tweet|
    text = tweet.text
    link_regex = /(http:\S+)/
    text_no_links = text.gsub(link_regex, '')
    links = text.scan(link_regex)[0]
    next if links.nil?  # regex found no http: link, so skip this tweet
    link = links[0]
    long_link = @longurl_client.query link
    @rows << "<tr><td>#{tweet.created_at[5,6]}</td><td><a href=\"#{long_link}\">#{text_no_links}</a></td></tr>"
  end

  page += 1
end

# print table
puts 
puts '<table border="1">'
puts '<tr><th>Posted</th><th>Tweet</th></tr>'
@rows.first(max_links).reverse_each { |row| puts row }
puts '</table>'

And I ran it with: ruby mylinks.rb puredanger 7
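
One thing the script does not do is escape the tweet text or the url before dropping them into the HTML, so a tweet containing & or < can produce a broken table. A possible fix, sketched with the standard library's CGI.escapeHTML; the helper name table_row is made up for illustration.

```ruby
require 'cgi'

# Builds one table row, HTML-escaping every interpolated value so that
# characters like &, < and " cannot break the markup.
def table_row(date, href, text)
  "<tr><td>#{CGI.escapeHTML(date)}</td>" \
    "<td><a href=\"#{CGI.escapeHTML(href)}\">#{CGI.escapeHTML(text)}</a></td></tr>"
end

puts table_row('30 Mar', 'http://example.com/?a=1&b=2', 'Tom & Jerry <3')
```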

Hope you enjoyed this diversion as much as I did. Feel free to hack it into whatever you need, and please post comments about improvements if you have some. It seems like it would be pretty trivial to do a Javascript version of this that could run completely in the browser, but I am far worse at Javascript than I am at Ruby. :)