Got CHUNK?: March 2008

Friday, 7 March 2008

Date ranges from a list of dates

You need date ranges before you can use acts_as_line. Often you will just have a list of booked dates.

I've pluginized my date range method, which takes a collection of dates and finds the ranges: date_ranger


DateRanger
==========

Identify start/end dates in a set.

Example
=======

require 'date_ranger'
require 'date'

dates = %w(2008-04-26 2008-04-27 2008-04-28 2008-04-29 2008-04-30 2008-05-01 2008-05-02 2008-05-24 2008-05-25 2008-05-26 2008-05-27 2008-05-28 2008-05-29 2008-05-30 2008-07-12 2008-07-13 2008-07-14 2008-07-15 2008-07-16 2008-07-17 2008-07-18 2008-07-26 2008-07-27 2008-07-28 2008-07-29 2008-07-30 2008-07-31 2008-08-01 2008-08-09 2008-08-10 2008-08-11 2008-08-12 2008-08-13 2008-08-14 2008-08-15 2008-08-16 2008-08-17 2008-04-21 2008-04-21 2008-08-18 2008-08-19 2008-04-23 2008-08-20 2008-04-22 2008-08-21 2008-08-22 2008-12-27 2008-12-28 2008-12-29 2008-12-30 2008-12-31 2008-03-29 2008-03-30 2008-03-31 2008-04-01 2008-04-02 2008-04-03 2008-04-04 2008-04-05 2008-04-06 2008-04-07 2008-04-08 2008-04-09 2008-04-10 2008-04-11 2008-04-12 2008-04-13 2008-04-14 2008-04-15 2008-04-16 2008-04-20 2008-04-17 2008-04-18 2008-05-24 2008-05-25 2008-05-26 2008-05-27 2008-05-28 2008-05-29 2008-05-30 2008-05-31 2008-06-01 2008-06-02 2008-06-03 2008-06-04 2008-06-05 2008-06-06 2008-07-05 2008-07-06 2008-07-07 2008-07-08 2008-07-09 2008-07-10 2008-07-11 2008-07-12 2008-07-13 2008-07-14 2008-07-15 2008-07-16 2008-07-17 2008-07-18 2008-07-19)

bookings = DateRanger.new(dates)
p bookings.ranges

>> [{"end"=>"2008-04-18", "start"=>"2008-03-29"}, {"end"=>"2008-04-23", "start"=>"2008-04-20"}, {"end"=>"2008-05-02", "start"=>"2008-04-26"}, {"end"=>"2008-06-06", "start"=>"2008-05-24"}, {"end"=>"2008-07-19", "start"=>"2008-07-05"}, {"end"=>"2008-08-01", "start"=>"2008-07-26"}, {"end"=>"2008-08-22", "start"=>"2008-08-09"}, {"end"=>"2008-12-31", "start"=>"2008-12-27"}]

csv = bookings.to_csv('|')
bookings.csv.each {|row|
p row
}

gives:

"2008-03-29|2008-04-18"
"2008-04-20|2008-04-23"
"2008-04-26|2008-05-02"
"2008-05-24|2008-06-06"
"2008-07-05|2008-07-19"
"2008-07-26|2008-08-01"
"2008-08-09|2008-08-22"
"2008-12-27|2008-12-31"

UPDATE:

gem install dateranger

The API has changed slightly in the gem release:

DateRanger::Range.new(dates_array)

The plugin is the better choice at this stage, as it is easier to hack.
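
For anyone curious how the grouping can work, here is a minimal sketch of one way to collapse a list of dates into ranges. It is not the plugin's code: date_ranges is a hypothetical helper, and it assumes ISO-formatted date strings and treats any gap of more than one day as the start of a new range.

```ruby
require 'date'

# Hypothetical helper (not the DateRanger API): collapse a list of ISO date
# strings into runs of consecutive days.
def date_ranges(dates)
  # Parse, drop duplicates and sort so neighbouring days sit next to each other.
  sorted = dates.map { |d| Date.parse(d) }.uniq.sort
  ranges = []
  sorted.each do |day|
    if ranges.any? && ranges.last['end'] + 1 == day
      # The day follows directly on the current range, so extend it.
      ranges.last['end'] = day
    else
      # A gap (or the very first date) opens a new range.
      ranges << { 'start' => day, 'end' => day }
    end
  end
  # Convert back to strings to mirror the shape of the output shown above.
  ranges.map { |r| { 'start' => r['start'].to_s, 'end' => r['end'].to_s } }
end

p date_ranges(%w(2008-04-21 2008-04-21 2008-04-23 2008-04-22 2008-04-20 2008-05-01))
```

Duplicates and out-of-order dates, as in the example list above, are dealt with by the uniq and sort calls before any grouping happens.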

Crisscrossing date range queries in Rails

If you are building an application that takes reservations or searches for availability, you're likely to want to query against booked date ranges to find if the candidate date range is available, or if it conflicts with an existing booking.

There are various ways to find overlapping date ranges in SQL: the OVERLAPS operator, or a custom query, as described in detail by depesz: http://www.depesz.com/index.php/2007/11/21/find-overlapping-time-ranges/
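
Whatever the SQL ends up looking like, the underlying test is small. Here is a minimal sketch in plain Ruby, assuming inclusive start and end dates (overlaps? is a hypothetical helper, not part of any library mentioned here):

```ruby
require 'date'

# Hypothetical helper: true when two inclusive date ranges share at least one day.
# Two ranges overlap when each one starts on or before the day the other ends.
def overlaps?(start_a, end_a, start_b, end_b)
  start_a <= end_b && start_b <= end_a
end

booked_start = Date.new(2008, 3, 1)
booked_end   = Date.new(2008, 3, 11)

# A candidate stay of 7-13 March collides with the 1-11 March booking.
puts overlaps?(booked_start, booked_end, Date.new(2008, 3, 7), Date.new(2008, 3, 13))
# A candidate stay of 21-23 March does not.
puts overlaps?(booked_start, booked_end, Date.new(2008, 3, 21), Date.new(2008, 3, 23))
```

Note that Postgres' OVERLAPS treats each period as half-open, so back-to-back bookings that share an endpoint do not overlap there, whereas they do under this inclusive sketch.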

In the past I might have written a method like this in my model to find available units (more often than not combined with other search criteria):

 @rooms = Room.find_by_sql ["
     SELECT * FROM rooms WHERE rooms.id NOT IN
     (SELECT DISTINCT ON (rooms.id) rooms.id FROM bookings
     INNER JOIN rooms ON rooms.id=bookings.rooms_id
     WHERE (start_date, end_date) OVERLAPS (DATE '#{start_date[0]}',
     INTERVAL '#{interval} days'));"]

Enter acts_as_line, a simple plugin that unifies a date range into a single geometry and looks for other ranges (now lines) that spatially intersect with it.

I am putting this together for querying larger datasets with several million date ranges, where OVERLAPS may feel the squeeze. The utility of using this method for small datasets is questionable, but it is trivial to extend a Postgres database with PostGIS, and I think I'll find the plugin more convenient than writing OVERLAPS queries. It certainly shouldn't be slower. I've yet to benchmark it against OVERLAPS on large, indexed tables, but I expect it to be quicker.

In its current form, the plugin assumes the model has existing date fields, start_date and end_date, of type date or datetime, and a geometry column named 'geom'. There is a migration generator to add the dates geometry to an existing table (provided the Postgres database is spatially enabled). I will be adding options to specify alternate field names via the model soon, and will probably rename the geometry field to date_geom to minimise the chances of conflict with existing schemas under the defaults (although I don't anticipate anyone storing other, geographic, geometries on the same table as the one storing date ranges).
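
To make the date-range-as-geometry idea concrete, here is a minimal sketch of one possible encoding: each date becomes its Unix timestamp (midnight UTC) on the y axis, so a booking is a vertical line segment. This is an assumption for illustration, not necessarily how the plugin builds its geom value, and date_line_wkt is a hypothetical helper.

```ruby
require 'date'

# Hypothetical helper: encode a date range as a vertical line in WKT
# (well-known text), using epoch seconds at midnight UTC as the y coordinates.
# The x coordinate is fixed at 0 so that all ranges lie on the same axis.
def date_line_wkt(start_date, end_date)
  y1 = Time.utc(start_date.year, start_date.month, start_date.day).to_i
  y2 = Time.utc(end_date.year, end_date.month, end_date.day).to_i
  "LINESTRING(0 #{y1}, 0 #{y2})"
end

puts date_line_wkt(Date.new(2008, 3, 7), Date.new(2008, 3, 13))
```

Two such lines intersect exactly when the date ranges share a moment in time, which is the kind of question a PostGIS spatial predicate such as ST_Intersects can answer against an indexed geometry column.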

Adding acts_as_line to a model,


class AgencyBooking < ActiveRecord::Base

  acts_as_line

end

makes the following methods available:


object = AgencyBooking.find(:first)

results = AgencyBooking.touching(object) # same as AgencyBooking.intersects(object, true)
results = AgencyBooking.asunder(object)  # same as AgencyBooking.intersects(object, false)

results = AgencyBooking.touching(object,{:id => 123})
results = AgencyBooking.touching(object,{:id => '>123'})
results = AgencyBooking.touching(object,{:id => '<>123'}) # etc.
results = AgencyBooking.touching(object,{:id => 123, :title => 'bar'})

Some usage examples



create a new record:

>> booking = AgencyBooking.new(:start_date => Date.today, :end_date => Date.today+6.days)
=> #<AgencyBooking id: nil, agency_id: nil, agency_unit_id: nil, start_date: "2008-03-07", end_date: "2008-03-13", created_at: nil, updated_at: nil, geom: nil>
>> booking.save
=> true
>> booking
=> #<AgencyBooking id: 12746, agency_id: nil, agency_unit_id: nil, start_date: "2008-03-07", end_date: "2008-03-13", created_at: "2008-03-07 10:05:45", updated_at: "2008-03-07 10:05:45", geom: "0102000000020000000000000000000000000000803DF4D1410...">



find overlapping records:

>> AgencyBooking.touching(booking)
=> [#<AgencyBooking id: 12740, agency_id: nil, agency_unit_id: nil, start_date: "2008-03-01 00:00:00", end_date: "2008-03-11 00:00:00", created_at: "2008-03-06 16:00:00", updated_at: "2008-03-06 16:00:00", geom: "01020000000200000000000000000000000000004043F2D1410...">, #<AgencyBooking id: 12746, agency_id: nil, agency_unit_id: nil, start_date: "2008-03-07 00:00:00", end_date: "2008-03-13 00:00:00", created_at: "2008-03-07 10:05:45", updated_at: "2008-03-07 10:05:45", geom: "0102000000020000000000000000000000000000803DF4D1410...">]

find records that do not overlap:

>> AgencyBooking.asunder(booking)
=> [#<AgencyBooking id: 12741, agency_id: nil, agency_unit_id: nil, start_date: "2008-03-01 00:00:00", end_date: "2008-03-02 00:00:00", created_at: "2008-03-06 16:00:02", updated_at: "2008-03-06 16:00:02", geom: "01020000000200000000000000000000000000004043F2D1410...">, #<AgencyBooking id: 12742, agency_id: nil, agency_unit_id: nil, start_date: "2008-03-01 00:00:00", end_date: "2008-03-01 00:00:00", created_at: "2008-03-06 16:00:04", updated_at: "2008-03-06 16:00:04", geom: "01020000000200000000000000000000000000004043F2D1410...">, #<AgencyBooking id: 12743, agency_id: nil, agency_unit_id: nil, start_date: "2008-03-21 00:00:00", end_date: "2008-03-23 00:00:00", created_at: "2008-03-06 16:01:41", updated_at: "2008-03-06 16:01:41", geom: "01020000000200000000000000000000000000003CD7F8D1410...">, #<AgencyBooking id: 12744, agency_id: nil, agency_unit_id: nil, start_date: "2008-03-21 00:00:00", end_date: "2008-03-21 00:00:00", created_at: "2008-03-06 16:02:28", updated_at: "2008-03-06 16:02:28", geom: "01020000000200000000000000000000000000003CD7F8D1410...">, #<AgencyBooking id: 12745, agency_id: nil, agency_unit_id: nil, start_date: "2008-03-01 00:00:00", end_date: "2008-03-06 00:00:00", created_at: "2008-03-06 18:00:47", updated_at: "2008-03-06 18:00:47", geom: "01020000000200000000000000000000000000004043F2D1410...">]

You can grab the initial release from github:

acts_as_line

I still have much to add and refine.

Sunday, 2 March 2008

DRYing up CMS with Google

As a follow-on to Picasync, I've started work on a library to interface with Google Documents, so that Google can be used to manage a website's text and articles in addition to its image galleries.

Google Documents provides users with a familiar, intuitive word processing UI, revision history and drafts, and document folders... at the expense of lots of hpricot cannon-fodder.

The process:

Find the document on Google, and :fetch its content
Strip script and form tags and tag attributes (including inline styles)
Convert to Textile (c/o James Stewart's html2textile Ruby script)
Escape left-over HTML
Convert back to [spartan? hopefully] HTML from Textile via RedCloth (with some house-keeping regex that really needs expanding upon)

i.e.

doc = Gdocsync::Document.find_by_title("Foo", :fetch).clothe

or, skip RedCloth (but still escape tatterdemalion tags after Textile conversion):

doc = Gdocsync::Document.find_by_title("Foo", :fetch).textile

With safe_html, the user gets pretty much what they see on Google (however that document might be constructed), but you also get stuck with inline styles and tag soup: only script tags (which Google Docs doesn't appear to allow anyway) and form tags are stripped:

doc = Gdocsync::Document.find_by_title("Foo", :fetch).safe_html

You also have recourse to the raw HTML body with no modifications beyond Google's own processing (the 'raw' document is stored in the object and used as the starting point for the previous methods):

doc = Gdocsync::Document.find_by_title("Foo", :fetch).raw
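
The tag-stripping step of the process could look roughly like the following. This is a regex-only sketch in plain Ruby rather than the library's hpricot-based code; strip_markup is a hypothetical helper, and dropping the contents of script elements while keeping the contents of form elements is an assumption.

```ruby
# Hypothetical helper: a crude, regex-based take on the "strip" step.
# A real HTML parser copes far better with malformed or unusual markup.
def strip_markup(html)
  html = html.gsub(%r{<script\b.*?</script>}mi, '') # drop script elements and their contents
  html = html.gsub(%r{</?form\b[^>]*>}i, '')        # drop form tags but keep what they wrap
  html.gsub(/<(\w+)\s[^>]*?(\/?)>/, '<\1\2>')        # drop all attributes, inline styles included
end

html = %q{<p style="color:red">Hi<script>alert(1)</script></p><form action="/x"><b class="y">there</b></form>}
puts strip_markup(html)
```

Attribute values that themselves contain a > character would trip up the last substitution, which is one reason to prefer a parser for anything beyond a sketch.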

This can all be easily tied together with database tables via a rake task. The following snippet loops through Properties, looks up the document on Google and updates the database record. This example matches the document title against the object's title field, which is not very smart -- in the next few days I will be making use of Google document directories and tightening things up.

namespace :google do

 task :docs => :environment do
   require '../gdocsync/lib/gdocsync'

   Property.find(:all).each do |property|
     doc = Gdocsync::Document.find_by_title(property.title, :fetch)
     Property.update(property.id, :description => doc.clothe)
   end

 end

end

$ rake google:docs

Gdocsync git repo (early days).

Saturday, 1 March 2008

Ruby Interface for Google Picasa API

A nascent Ruby module for interfacing with Picasa and mirroring user albums locally (initially conceived so I could programmatically delete the multitude of galleries I'd spawned while hacking around trying to get stuff to work).


Picasync::Album.find(:all).each {|album|
 album.delete!
}

%w(one two three).each do |title|
 album = Picasync::Album.new(title)
 album.create!
end

Easy. There is no ability to upload images via the API yet, although the module's purpose is to farm out uploading and CMS tasks to Picasa's UI anyhow, so that Picasa can essentially power a site's galleries (but without hotlinking or hitting their feed on every page load).

Albums are synced locally via Picasync::Sync::All.new & Picasync::Sync::CSV.new, which fetch files to a single directory, hashing the file names and generating a couple of CSV files for image sizes, captions and parent albums. Still to add: table migrations and automatic CSV imports.

It's also simple to fetch files arbitrarily, be it a whole album, or a particular image in a set (although things aren't properly tied together with CSV generation yet):


Picasync::Image.mirror(:album, album.id)
Picasync::Image.mirror(image.id, album.id)

Uses Google's ClientLogin authentication scheme.

It's a work in progress and definitely not a drop-in solution for end users, but you can grab (and contribute to) the code on Github.

See the Readme for the methods I've got around to adding.


albums = Picasync::Album.find(:all, :images)
albums.each do |album|
  puts album.title
  album.images.each {|image|
    puts image.medium
    puts image.caption
  }
end
