Google Sitemap Generator
Google sitemaps are nice for telling Google what is where. Clients often want one for SEO, or you may have a site with new content all the time and want to keep Google up to date.
Whatever your reason for being interested in these little XML files, the following code lets you generate a sitemap for a dynamic site in Ruby.
Firstly, the class:
require 'net/http'
require 'uri'
require 'builder' # provides Builder::XmlMarkup; usually already loaded in a Rails app

# A class specific to the application which generates a google sitemap from
# the contents of the database.
# Author: Alastair Brunton
class GoogleSitemapGenerator
  def initialize(base_url, sources)
    @base_url = base_url
    @sources = sources
  end

  # The main generator method, which collects paths from each of the
  # sources in turn.
  # Sources are: pages, events, properties
  def generate
    path_ar = Array.new
    @sources.each do |source|
      # Look up the class by name and call its get_paths class method.
      # Object.const_get avoids running arbitrary strings through eval.
      path_ar += Object.const_get(source).get_paths
    end
    xml = generate_xml(path_ar)
    save_file(xml)
    update_google
  end

  # This creates the xml document.
  def generate_xml(path_ar)
    xml_str = ""
    xml = Builder::XmlMarkup.new(:target => xml_str)
    xml.instruct!
    # Namespace of the sitemaps.org 0.9 protocol, which superseded Google's 0.84 schema.
    xml.urlset(:xmlns => 'http://www.sitemaps.org/schemas/sitemap/0.9') do
      path_ar.each do |path|
        xml.url do
          xml.loc(@base_url + path[:url])
          xml.lastmod(path[:last_mod])
          xml.changefreq('weekly')
        end
      end
    end
    xml_str
  end

  # Saves the xml file to disk. This could also be used to ping the webmaster tools
  def save_file(xml)
    # Rails.root replaces the RAILS_ROOT constant of older Rails versions.
    File.open(File.join(Rails.root.to_s, 'public', 'sitemap.xml'), "w") do |f|
      f.write(xml)
    end
  end

  # Notify google of the new sitemap
  def update_google
    sitemap_uri = @base_url + '/sitemap.xml'
    # URI.escape was removed in Ruby 3.0; encode the URL as a query value instead.
    escaped_sitemap_uri = URI.encode_www_form_component(sitemap_uri)
    Net::HTTP.get('www.google.com',
                  '/webmasters/sitemaps/ping?sitemap=' +
                  escaped_sitemap_uri)
  end
end
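As an illustration of the ping step, here is a minimal sketch of how the ping path can be assembled with only Ruby's standard library. The helper name `sitemap_ping_path` and the example.com URL are hypothetical and not part of the class. It uses `URI.encode_www_form_component` rather than `URI.escape`, since the latter is no longer available in current Ruby and did not encode characters such as `:` and `/`.

```ruby
require 'uri'

# Hypothetical helper (not part of the original class): builds the request
# path for the ping from the site's base URL.
def sitemap_ping_path(base_url)
  sitemap_uri = base_url + '/sitemap.xml'
  # Encode the whole URL so ':' and '/' cannot be misread as part of the
  # ping request's own path or query string.
  '/webmasters/sitemaps/ping?sitemap=' + URI.encode_www_form_component(sitemap_uri)
end

puts sitemap_ping_path('http://www.example.com')
```

Because the sitemap URL travels as a single query value, encoding it in full keeps it from being confused with the rest of the request.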
You will notice that an array of strings is passed when calling the generator. These are the names of classes which implement a get_paths class method. An example get_paths class method is as follows:
# for the google sitemap
def self.get_paths
  path_ar = Array.new
  Property.live_properties.each do |property|
    path_ar << {:url => "/property/#{property.to_param}", :last_mod => property.updated_at.strftime('%Y-%m-%d')}
  end
  path_ar
end
Basically, get_paths needs to return an array of hashes, each containing a :url and a :last_mod.
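To make that contract concrete, here is a small plain-Ruby sketch. It assumes two made-up source classes, `StaticPage` and `NewsItem`, which are hypothetical and not from the application described here, and a hypothetical helper `collect_paths`. Each class exposes a `get_paths` class method returning hashes with `:url` and `:last_mod`, and the paths are gathered from class names given as strings, much as the generator does.

```ruby
require 'date'

# Hypothetical stand-ins for ActiveRecord models; the data is hard-coded.
class StaticPage
  def self.get_paths
    [{ :url => '/about', :last_mod => Date.new(2008, 1, 15).strftime('%Y-%m-%d') }]
  end
end

class NewsItem
  def self.get_paths
    [{ :url => '/news/1', :last_mod => Date.new(2008, 2, 1).strftime('%Y-%m-%d') },
     { :url => '/news/2', :last_mod => Date.new(2008, 2, 3).strftime('%Y-%m-%d') }]
  end
end

# Look each class up by name and merge the results into one array.
def collect_paths(sources)
  sources.flat_map { |source| Object.const_get(source).get_paths }
end

paths = collect_paths(['StaticPage', 'NewsItem'])
paths.each { |path| puts "#{path[:url]} (#{path[:last_mod]})" }
```

Any class can act as a source this way, as long as its `get_paths` returns hashes of that shape.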
This little beastie is best called from a cron job on the production server. An example rake task to do this is as follows:
namespace :google_sitemap do
  desc "Generate a google sitemap from the site."
  task(:generate => :environment) do
    sources = ['Page', 'Event', 'Property']
    sitemap = GoogleSitemapGenerator.new('http://www.your_url.com', sources)
    sitemap.generate
  end
end
Remember to pass RAILS_ENV when you call it from cron. This generator does rely on Rails, but you could convert it to rely only on Ruby by modifying the rake task and changing the Rails root reference in the save_file method. It can probably be made to work with Merb, but I am unsure of how Merb and rake work together. I will hopefully get my hands dirty with Merb sometime soon.
cd /var/www/apps/site/current && /usr/bin/rake RAILS_ENV=production google_sitemap:generate
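For the plain-Ruby conversion mentioned above, one possible approach is to pass the output directory in rather than reading it from Rails. The sketch below is an assumption about how that could look, not the original code: `write_sitemap` and its `output_dir` parameter are hypothetical names, and the XML string is only a placeholder.

```ruby
require 'tmpdir'

# Hypothetical replacement for save_file: the caller supplies the directory,
# so nothing here depends on Rails.
def write_sitemap(xml, output_dir)
  path = File.join(output_dir, 'sitemap.xml')
  File.open(path, 'w') { |f| f.write(xml) }
  path
end

# Usage: write a placeholder document into a temporary directory.
Dir.mktmpdir do |dir|
  path = write_sitemap('<urlset></urlset>', dir)
  puts File.read(path)
end
```

A rake task without the :environment dependency could then call this with whatever public directory the site uses.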