Got CHUNK?: DRYing up CMS with Google

As a follow-on to Picasync, I've started work on a library to interface with Google Documents, so that Google can be used to manage a website's text and articles as well as its image galleries.

Google Documents gives users a familiar, intuitive word-processing UI, revision history, drafts and document folders... at the expense of a lot of Hpricot cannon fodder (messy generated HTML that has to be cleaned up before it can go on a site).

The process:

1. Find the document on Google and :fetch its content.
2. Strip script and form tags, plus tag attributes (including inline styles).
3. Convert to Textile (courtesy of James Stewart's html2textile Ruby script).
4. Escape any left-over HTML.
5. Convert back from Textile to (hopefully spartan) HTML via RedCloth, with some housekeeping regexes that really need expanding upon.
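Gdocsync does the cleaning in step 2 with Hpricot. As a rough, dependency-free illustration of what that step does, here is a regex-based stand-in; `strip_markup` and `STRIPPED_ELEMENTS` are hypothetical names, and regexes are no substitute for a real parser on genuinely broken markup (an attribute value containing `>` would trip it up, for instance):

```ruby
# Hypothetical, regex-based stand-in for the Hpricot cleaning step.
# Regexes are not an HTML parser: this only copes with reasonably tidy markup.
STRIPPED_ELEMENTS = %w[script form].freeze

def strip_markup(html)
  cleaned = html.dup
  # Remove <script> and <form> elements together with everything inside them.
  STRIPPED_ELEMENTS.each do |name|
    cleaned.gsub!(%r{<#{name}\b[^>]*>.*?</#{name}\s*>}mi, "")
  end
  # Reduce each remaining opening tag to its bare name, dropping attributes
  # such as style="..." and class="..."; closing tags are left untouched.
  cleaned.gsub(%r{<([a-z][a-z0-9]*)\b[^>]*?(/?)>}i) { "<#{$1}#{$2}>" }
end

html = '<p style="color:red" class="c1">Hello <b id="x">world</b></p>' \
       '<script>alert(1)</script><br />'
puts strip_markup(html) # prints <p>Hello <b>world</b></p><br/>
```

A parser-based approach like Hpricot's is the safer choice for real Google output; the sketch only shows the intended before and after.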

In code:

doc = Gdocsync::Document.find_by_title("Foo", :fetch).clothe
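The housekeeping regexes run after RedCloth aren't shown here. As a sketch of the kind of tidying meant, assuming the RedCloth output is a plain HTML string, rules like the following could be applied in order; the rules themselves and the `tidy_html` / `HOUSEKEEPING_RULES` names are hypothetical, not Gdocsync's actual code:

```ruby
# Hypothetical housekeeping rules run over RedCloth's HTML output.
# Illustrative examples only; not the rules Gdocsync actually applies.
HOUSEKEEPING_RULES = [
  [/&nbsp;/, " "],        # non-breaking spaces carried over from Google Docs
  [%r{<p>\s*</p>}, ""],   # paragraphs left empty after cleaning
  [/[ \t]{2,}/, " "],     # runs of spaces or tabs
  [/\n{3,}/, "\n\n"]      # more than one consecutive blank line
].freeze

# Apply each [pattern, replacement] pair in turn.
def tidy_html(html)
  HOUSEKEEPING_RULES.reduce(html) do |text, (pattern, replacement)|
    text.gsub(pattern, replacement)
  end
end

puts tidy_html("<p>Hello&nbsp;&nbsp;world</p>\n<p> </p>\n\n\n<p>Bye</p>")
```

Keeping the rules in an ordered list makes it easy to add more as new kinds of Google debris turn up; order matters, since the `&nbsp;` rule has to run before the one that collapses runs of spaces.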

Or skip RedCloth and stop at Textile (left-over tags are still escaped after the Textile conversion):

doc = Gdocsync::Document.find_by_title("Foo", :fetch).textile
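A minimal sketch of that escaping step, assuming "escape" means entity-encoding any tag that html2textile left behind so it shows up as literal text; `escape_leftover_tags` and `HTML_ESCAPES` are hypothetical names:

```ruby
# Escape any HTML tags that survived the HTML-to-Textile conversion so that
# they come out as literal text rather than live markup. Only tag-shaped runs
# are touched, so Textile syntax such as "link":url keeps its quotes.
HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

def escape_leftover_tags(textile)
  textile.gsub(/<[^>]*>/) { |tag| tag.gsub(/[&<>"]/, HTML_ESCAPES) }
end

textile = "h1. Title\n\nSome <span class=\"c2\">stray</span> markup and a \"link\":http://example.com"
puts escape_leftover_tags(textile)
```

Escaping only inside tags, rather than running the whole text through an HTML escaper, avoids breaking Textile constructs that rely on double quotes.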

With safe_html, the user gets pretty much what they see on Google (however that document might be constructed), but you also get stuck with inline styles and tag soup, since only script tags (which Google Docs doesn't appear to allow anyway) and form tags are stripped:

doc = Gdocsync::Document.find_by_title("Foo", :fetch).safe_html
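For contrast with the full clean-up, here is a hypothetical sketch of what safe_html promises: only script and form elements go, and everything else, inline styles included, is passed through untouched. `safe_html_sketch` is an illustrative name, and like any regex approach it assumes each element has a matching closing tag:

```ruby
# Hypothetical sketch of what safe_html promises: drop <script> and <form>
# elements (with their contents) and leave everything else alone, inline
# styles and tag soup included. \1 makes the closing tag match the opener.
def safe_html_sketch(raw)
  raw.gsub(%r{<(script|form)\b[^>]*>.*?</\1\s*>}mi, "")
end

raw = '<p style="font-weight:bold">Hi</p><form action="/x"><input name="q"></form>' \
      '<script>alert("x")</script>'
puts safe_html_sketch(raw) # prints <p style="font-weight:bold">Hi</p>
```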

You also have recourse to the raw HTML body, with no modifications beyond Google's own processing (the raw document is stored in the object and used as the starting point for all of the previous methods):

doc = Gdocsync::Document.find_by_title("Foo", :fetch).raw

This can all be tied together with database tables via a rake task. The following snippet loops through Properties, looks up each one's document on Google and updates the database record. This example matches the document title against the object's title field, which is not very smart; in the next few days I will be making use of Google Docs folders and tightening things up.

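namespace :google do

  desc "Update each Property's description from the Google document with the same title"
  task :docs => :environment do
    # Path to a local Gdocsync checkout, relative to the directory rake is run from
    require '../gdocsync/lib/gdocsync'

    Property.find(:all).each do |property|
      # Exact title match; :fetch downloads the document body
      doc = Gdocsync::Document.find_by_title(property.title, :fetch)
      # Skip records with no matching document (in case find_by_title returns nil)
      next if doc.nil?
      Property.update(property.id, :description => doc.clothe)
    end

  end

end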

$ rake google:docs
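To make the weakness of title matching concrete, here is a hypothetical sketch of the task's matching logic in plain Ruby, with a `Record` struct standing in for the Property model and an array of hashes standing in for the documents found on Google. Rather than updating blindly, it reports records with no matching title and titles shared by more than one document:

```ruby
# Hypothetical sketch of the rake task's matching logic. Record stands in for
# the Property model; each document is a hash with :title and :html keys.
Record = Struct.new(:id, :title, :description)

def sync_by_title(records, documents)
  by_title = documents.group_by { |doc| doc[:title] }
  report = { updated: [], missing: [], ambiguous: [] }
  records.each do |record|
    matches = by_title.fetch(record.title, [])
    if matches.empty?
      report[:missing] << record.title    # no document with this exact title
    elsif matches.size > 1
      report[:ambiguous] << record.title  # several documents share the title
    else
      record.description = matches.first[:html]
      report[:updated] << record.title
    end
  end
  report
end

records = [Record.new(1, "Foo"), Record.new(2, "Bar"), Record.new(3, "Baz")]
documents = [
  { title: "Foo", html: "<p>foo</p>" },
  { title: "Bar", html: "<p>first</p>" },
  { title: "Bar", html: "<p>second</p>" }
]
p sync_by_title(records, documents)
```

Matching on a stable identifier such as a folder or document ID would make both the missing and ambiguous cases much rarer.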

Gdocsync git repo (early days).

Sunday, 2 March 2008