Sunday, 2 March 2008

DRYing up CMS with Google

As a follow-on to Picasync, I've started work on a library to interface with Google Documents, so that Google can be used to manage a website's text and articles in addition to its image galleries.

Google Documents provides users with a familiar, intuitive word processing UI, revision history and drafts, and document folders... at the expense of lots of hpricot cannon-fodder.

The process:
  • Find the document on Google, and :fetch its content
  • Strip script and form tags and tag attributes (including inline styles)
  • Convert to textile (c/o James Stewart's html2textile Ruby script)
  • Escape left-over HTML
  • Convert back to [spartan? hopefully] HTML from textile via RedCloth (with some house-keeping regex that really needs expanding upon)
i.e.
doc = Gdocsync::Document.find_by_title("Foo", :fetch).clothe
Or, skip RedCloth (but still escape tatterdemalion tags after textile conversion):
doc = Gdocsync::Document.find_by_title("Foo", :fetch).textile
With safe_html, the user gets pretty much what they see on Google (however that document might be constructed), but you also get stuck with inline styles and tag soup -- only script tags (which Google Docs doesn't appear to allow anyway) and form tags are stripped:
doc = Gdocsync::Document.find_by_title("Foo", :fetch).safe_html
You also have recourse to the raw HTML body with no modifications beyond Google's own processing (the 'raw' document is stored in the object and used as the starting point for the previous methods):
doc = Gdocsync::Document.find_by_title("Foo", :fetch).raw
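The stripping and escaping steps from the process above can be roughed out in plain Ruby. This is only a sketch under assumptions: the helper names are hypothetical and not part of Gdocsync, it uses regular expressions where the library leans on Hpricot, it removes script and form elements together with their contents, and it leaves out the textile round trip (which needs html2textile and RedCloth).

```ruby
# Hypothetical helpers for illustration only; not Gdocsync's own code.

# Remove <script> and <form> elements, contents included (an assumption).
def strip_script_and_form(html)
  html.gsub(%r{<(script|form)\b[^>]*>.*?</\1\s*>}im, '')
end

# Drop every attribute (inline styles included) from opening tags,
# keeping the trailing slash of self-closing tags such as <br />.
def strip_attributes(html)
  html.gsub(/<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*?(\s*\/)?>/) do
    "<#{$1}#{$2 ? ' /' : ''}>"
  end
end

HTML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Escape any markup that survived the textile conversion.
def escape_leftover_html(text)
  text.gsub(/[&<>"]/, HTML_ESCAPES)
end

raw = '<p style="color: red">Hi</p><script>alert(1)</script>'
cleaned = strip_attributes(strip_script_and_form(raw))
puts cleaned  # prints <p>Hi</p>
```

A regex pass like this will trip over attribute values that contain a ">" character, which is one reason a real parser is the safer choice for anything beyond a quick check.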
This can all be easily tied together with database tables via a rake task. The following snippet loops through Properties, looks up each document on Google and updates the database record. This example matches the document title against the object's title field, which is not very smart -- in the next few days I will be making use of Google document directories and tightening things up.
namespace :google do
  task :docs => :environment do
    require '../gdocsync/lib/gdocsync'

    # For each Property, fetch the Google document with the same title
    # and store its cleaned-up HTML as the description.
    Property.find(:all).each do |property|
      doc = Gdocsync::Document.find_by_title(property.title, :fetch)
      Property.update(property.id, :description => doc.clothe)
    end
  end
end

$ rake google:docs
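
Matching on titles alone means some records may have no document to go with them. The sketch below shows one way the loop could skip those and report them, using stand-ins so it runs on its own: FakeProperty, FAKE_DOCS and sync_descriptions are all hypothetical names, the hash plays the role of Google Docs, and returning nil for a missing title is an assumption rather than what find_by_title is known to do.

```ruby
# Stand-in for the ActiveRecord model (hypothetical).
FakeProperty = Struct.new(:id, :title, :description)

# Stand-in for Google Docs: title => already-converted HTML (hypothetical).
FAKE_DOCS = {
  'Foo' => '<p>Foo description</p>'
}.freeze

# Update each record whose title matches a document.
# Returns the titles that had no matching document.
def sync_descriptions(properties, docs)
  missing = []
  properties.each do |property|
    html = docs[property.title]
    if html.nil?
      missing << property.title # no match: leave the record untouched
      next
    end
    property.description = html
  end
  missing
end

properties = [
  FakeProperty.new(1, 'Foo', nil),
  FakeProperty.new(2, 'Bar', 'old text')
]
missing = sync_descriptions(properties, FAKE_DOCS)
puts "No document found for: #{missing.join(', ')}" unless missing.empty?
```

Leaving unmatched records alone keeps an existing description from being blanked out by a renamed or deleted document.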


Gdocsync git repo (early days).
