Google Documents provides users with a familiar, intuitive word-processing UI, revision history and drafts, and document folders... at the expense of lots of Hpricot cannon fodder.
- Find the document on Google and :fetch its content
- Strip script and form tags, plus all tag attributes (including inline styles)
- Convert to Textile (c/o James Stewart's html2textile Ruby script)
- Escape left-over HTML
- Convert back to [spartan? hopefully] HTML from Textile via RedCloth (with some housekeeping regex that really needs expanding upon)
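The strip and escape steps above can be roughed out with nothing but the standard library. This is only a sketch under loud assumptions: the three helper names are hypothetical and are not Gdocsync's API, regular expressions stand in for the Hpricot parsing, and the Textile conversion and RedCloth steps are left out entirely.

```ruby
require "cgi"

# Hypothetical helper, not part of Gdocsync: drop <script> and <form>
# elements together with everything inside them.
def strip_script_and_form(html)
  html.gsub(%r{<(script|form)\b[^>]*>.*?</\1\s*>}mi, "")
end

# Hypothetical helper: remove every attribute (inline styles included)
# from opening tags, leaving the bare tag name. Closing tags are untouched.
def strip_attributes(html)
  html.gsub(/<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*?(\/?)>/) { "<#{$1}#{$2}>" }
end

# Hypothetical helper: escape whatever markup survives the Textile
# conversion so stray tags show up as text instead of being rendered.
def escape_leftovers(text)
  CGI.escapeHTML(text)
end

html = %(<p style="color:red">Hi <b class="x">there</b></p><script>alert(1)</script>)
puts strip_attributes(strip_script_and_form(html))
```

Regexes like these fall over on attribute values that contain a `>` character, which is a good argument for doing the real work with an HTML parser.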
The full pipeline runs in a single call:
doc = Gdocsync::Document.find_by_title("Foo", :fetch).clothe
Or skip RedCloth (but still escape tatterdemalion tags after the Textile conversion):
doc = Gdocsync::Document.find_by_title("Foo", :fetch).textile
With safe_html, the user gets pretty much what they see on Google (however that document might be constructed), but you also get stuck with inline styles and tag soup -- only script tags (which Google Docs doesn't appear to allow anyway) and form tags are stripped:
doc = Gdocsync::Document.find_by_title("Foo", :fetch).safe_html
You also have recourse to the raw HTML body, with no modifications beyond Google's own processing (the 'raw' document is stored in the object and used as the starting point for the previous methods):
doc = Gdocsync::Document.find_by_title("Foo", :fetch).raw
This can all be easily tied together with database tables via a rake task. The following snippet loops through Properties, looks up each document on Google and updates the database record. This example matches the document title with the object's title field, which is not very smart -- in the next few days I will be making use of Google document directories and tightening things up.
$ rake google:docs
namespace :google do
  task :docs => :environment do
    Property.find(:all).each do |property|
      # Look the document up by title and store the converted HTML
      doc = Gdocsync::Document.find_by_title(property.title, :fetch)
      Property.update(property.id, :description => doc.clothe)
    end
  end
end
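Since matching on title is fragile, one thing worth guarding against is a record with no matching document. The sketch below mimics the loop with in-memory stand-ins. Everything in it is an assumption made for illustration: `sync_descriptions` is a hypothetical name, records are plain hashes rather than ActiveRecord objects, and the lookup is assumed to return nil on a miss (how `find_by_title` actually behaves when nothing matches is not stated here).

```ruby
# Hypothetical stand-in for the rake task's loop. `records` is an array of
# hashes with :title and :description; `lookup` is any callable that returns
# converted HTML for a title, or nil when no document matches (an assumption).
def sync_descriptions(records, lookup)
  skipped = []
  records.each do |record|
    html = lookup.call(record[:title])
    if html.nil?
      skipped << record[:title]      # leave the existing description alone
    else
      record[:description] = html
    end
  end
  skipped                            # titles that had no matching document
end

docs = { "Foo" => "<p>Foo description</p>" }
records = [
  { :title => "Foo", :description => nil },
  { :title => "Bar", :description => "old text" }
]
skipped = sync_descriptions(records, lambda { |title| docs[title] })
puts "No document found for: #{skipped.join(', ')}" unless skipped.empty?
```

Returning the skipped titles keeps the loop from silently overwriting a description with nothing, and gives the task something to report at the end of a run.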
Gdocsync git repo (early days).