Developer: Ruby on Rails

Easy Integration: From XML to the Datastore Without the Mess


by Matt Kern

Achieve simple persistence of XML data to Oracle using ActiveRecord and XML::Mapping.

Published June 2007

XML has become the world's de facto data exchange format, and Ruby on Rails is a full participant in that ecosystem. Using a combination of the XML::Mapping Ruby gem and the ActiveRecord component of Rails (without all the other heavy components), you can parse an XML document, map it to an object, manipulate the object, and persist it to an Oracle database backend with remarkably little code. As an added bonus, you have the full power and flexibility of the legendary ActiveRecord from the Rails stack at your service.

There are a couple of options in the Ruby world for marshalling and unmarshalling data from XML to objects and back. ROXML is the simpler of the two, allowing very easy mapping of values to and from your Ruby objects. However, ROXML's simplicity is also its biggest weakness: its API isn't nearly as rich as that of XML::Mapping. For example, ROXML offers no way to define a default value for a given element or attribute, while XML::Mapping has an API for doing so. Both libraries come as gems and are therefore very easy to install. Both also depend on REXML for XML parsing, although XML::Mapping has its own XPath implementation.

I've chosen to use the XML::Mapping gem for this article because of its more complete API, although in practice either library can be used to achieve the same results (with slightly different implementations, obviously, but the same principles). Both libraries require that you "include" them, since they're implemented as Ruby modules. (Ruby modules let you "mix in" the module's methods to the class that includes them. This is how functionality common to many classes is implemented, as epitomized by the Enumerable module.)
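As a quick refresher on mix-ins, here is a minimal sketch in core Ruby only: a class that defines #each and includes Enumerable picks up map, select, include?, sort and the rest without writing them itself. The Playlist class and its data are purely illustrative and have nothing to do with either XML library.

```ruby
# Illustrative class: defining #each and mixing in Enumerable is enough
# to inherit dozens of collection methods.
class Playlist
  include Enumerable

  def initialize(*titles)
    @titles = titles
  end

  # Enumerable builds map, select, include?, sort, etc. on top of #each.
  def each(&block)
    @titles.each(&block)
    self
  end
end

playlist = Playlist.new("Offcell", "B", "Grey Machine")
p playlist.map(&:upcase)               # => ["OFFCELL", "B", "GREY MACHINE"]
p playlist.select { |t| t.length > 1 } # => ["Offcell", "Grey Machine"]
p playlist.include?("B")               # => true
```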

In both cases, the libraries do not play nice with ActiveRecord. ActiveRecord makes copious use of hooks like method_missing, part of what the community affectionately (and not-so-affectionately at times) calls "Railsy magic". This means that the implementation is not as simple as including the XML::Mapping module directly into your ActiveRecord classes. You'll see what that means soon, but first get the required components installed.
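To make the kind of collision concrete, here is a deliberately simplified, hypothetical illustration; it is not a reproduction of either library's internals. HashBackedModel stands in for an ActiveRecord-style model that serves attributes from a hash through method_missing, and AccessorMixin stands in for a mapping mixin that defines ordinary accessors. Once the mixin defines a real `name` method, method_missing is never consulted, so the value stored in the hash is invisible.

```ruby
# Hypothetical stand-in for an ActiveRecord-style model: values live in an
# @attributes hash and are reached through method_missing.
class HashBackedModel
  def initialize(attrs = {})
    @attributes = {}
    attrs.each { |key, value| @attributes[key.to_s] = value }
  end

  def method_missing(name, *args)
    key = name.to_s
    if key.end_with?("=")
      @attributes[key.chomp("=")] = args.first
    elsif @attributes.key?(key)
      @attributes[key]
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    @attributes.key?(name.to_s.chomp("=")) || super
  end

  # Expose the raw hash so the clash is visible in the example.
  def raw_attributes
    @attributes
  end
end

# Hypothetical stand-in for a mapping mixin that defines plain
# instance-variable accessors for each mapped field.
module AccessorMixin
  def self.included(base)
    base.send(:attr_accessor, :name)
  end
end

class ClashingPerson < HashBackedModel
  include AccessorMixin
end

person = ClashingPerson.new(name: "Ruby Jones")
p person.name                   # => nil (the mixin's reader wins)
p person.raw_attributes["name"] # => "Ruby Jones"
```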

Installation


First install the XML::Mapping gem, which is packaged as xml-mapping. Like any gem, installation is simple. Just issue:

$ sudo gem install xml-mapping
Installing a gem system-wide usually requires root permission, hence the sudo (unless you're already root). Now check that it was properly installed:
$ gem list xml-mapping --local

*** LOCAL GEMS ***

xml-mapping (0.8.1)
    An easy to use, extensible library for mapping Ruby objects to XML
    and back. Includes an XPath interpreter.

If you're not developing a Rails application and you want to use the rich functionality of the XML::Mapping gem and ActiveRecord together, you can still do so. ActiveRecord can be used outside the Rails framework, and it can be a real timesaver for developing scripts to automate tedious data dumps and other applications. To use ActiveRecord outside Rails you'll need to install the ActiveRecord gem using the same command as any other gem installation:

      
$ sudo gem install activerecord --include-dependencies     

The install command above installs both ActiveRecord and its dependency, ActiveSupport. Since you now have ActiveSupport, have a look at its API documentation: it provides some excellent extensions to the Ruby core that make things like date calculations simpler and more readable. Again, check that it was installed:

      
$ gem list activerecord --local

*** LOCAL GEMS ***

activerecord (1.15.2)
    Implements the ActiveRecord pattern for ORM.        
If you are developing a Rails application you should already have Rails installed. If you have Rails installed then you've got ActiveRecord and you can disregard the installation of the activerecord gem above. If you've been following this series then you should be set. If you're new to all this, take a look at Ruby on Rails on Oracle: A Simple Tutorial.

Co-dependent Libraries

I mentioned earlier that both XML mapping libraries rely on REXML, but there's no need to install REXML, as it's part of the Ruby core. REXML has a bit of a reputation for being slow, but given how quickly a test application can be written using REXML and its descendants, it's an excellent use of time to see how well it performs for your needs.

REXML has always been the de facto standard for XML parsing in Ruby, but the libxml project seems to have picked up momentum of late. According to the informal benchmarks on the libxml project site, libxml is substantially faster than REXML, particularly for XPath. Hopefully, at some point ROXML, XML::Mapping, or both will work with libxml as well. Remember, though, that the XML::Mapping gem includes its own XPath implementation, which adds some nifty features like precompiled XPath queries and the ability to write values through XPath expressions. The reason I bring up the slowness of REXML's XPath implementation is this: although the XML::Mapping gem comes with its own XPath implementation, REXML's is still used at times. If you run your code under Ruby's profiler, you'll see calls to REXML's XPath parser, at least when using XML::Mapping's load_from_xml.
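If you want to gauge REXML's XPath cost on documents shaped like yours, a rough sketch with the standard Benchmark module follows. It compares selecting attributes through REXML::XPath against walking the child nodes directly. The synthetic 500-track document is an assumption chosen for illustration, and the timings will vary from machine to machine.

```ruby
require "benchmark"
require "rexml/document"

# Synthetic document with 500 <track> elements (illustrative data only).
xml = "<tracks>" + (1..500).map { |i| %(<track title="Track #{i}"/>) }.join + "</tracks>"
doc = REXML::Document.new(xml)

xpath_titles = nil
walk_titles = nil

# Select the title attributes through REXML's XPath engine.
xpath_time = Benchmark.realtime do
  xpath_titles = REXML::XPath.match(doc, "/tracks/track/@title").map(&:value)
end

# Walk the child nodes directly, bypassing XPath entirely.
walk_time = Benchmark.realtime do
  walk_titles = doc.root.children
                   .select { |node| node.is_a?(REXML::Element) && node.name == "track" }
                   .map { |track| track.attributes["title"] }
end

puts format("XPath: %.4fs, direct walk: %.4fs", xpath_time, walk_time)
```

Both approaches should return the same titles; only the relative timings are of interest.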

The Not-So-Gory Details

With all that out of the way, let's see some code. Much of this example code can be run outside of Rails as long as you've set up ActiveRecord to have a valid connection to your Oracle database. To get ActiveRecord working outside of Rails, simply require it in your script or non-Rails application and then establish a connection. Here's an example:
     
require 'rubygems'
require 'active_record'  # older RubyGems releases used: require_gem 'activerecord'

ActiveRecord::Base.establish_connection( :adapter => "oci", :host => "localhost/XE", 
                                         :username => "discographr", 
                                         :password => "password" )

class Person < ActiveRecord::Base
end
As I mentioned earlier, ActiveRecord and XML::Mapping can make writing scripts a snap. Imagine a script that needs to dump data from a database to XML for export and you can start to appreciate why. You'll appreciate it even more when you're finished with this article!

If you're not sure how to get a Rails application talking to Oracle, read Obie Fernandez's "Connecting to Oracle from Ruby on Rails".

Here is the sample XML document, describing a person, that the examples in this section work with:
<?xml version="1.0" encoding="UTF-8"?>
<person>
  <name>Ruby Jones</name>
  <age>45</age>
  <address name="home">
    <street_address>1200 Main Street</street_address>
    <city>Anytown</city>
    <state>SD</state>
    <zip_code>12345</zip_code>
  </address>
  <address name="work">
    <street_address>9898 Center Street</street_address>
    <city>Anotherville</city>
    <state>SD</state>
    <zip_code>11223</zip_code>
  </address>
</person>
Next, create the schema that ActiveRecord will use for the Person and Address models:

     
CREATE TABLE people (
    id NUMBER(38) PRIMARY KEY,
    name VARCHAR2(100) NOT NULL,
    age NUMBER(3)
);

CREATE TABLE addresses (
    id NUMBER(38) PRIMARY KEY,
    street_address VARCHAR2(100) NOT NULL,
    city VARCHAR2(100) NOT NULL,
    state VARCHAR2(2) NOT NULL,
    zip_code NUMBER(5) NOT NULL,
    name VARCHAR2(100) NOT NULL,
    person_id NUMBER(38)
);

-- The Oracle adapter draws primary key values from sequences named <table>_seq
CREATE SEQUENCE people_seq;
CREATE SEQUENCE addresses_seq;
With the schema in place the next step is to create the ActiveRecord classes:
     
class Person < ActiveRecord::Base
  has_many :addresses
end

class Address < ActiveRecord::Base
  # addresses holds the person_id foreign key, so Address belongs to Person
  belongs_to :person
end
Remember that an ActiveRecord subclass looks for a table named after the lowercase, pluralized form of its class name (so Person maps to "people") and infers its attributes from that table's columns. So, assuming you have a valid connection to the database, you can call the following even though you've never explicitly defined :name or :age in the model class:

user = Person.new( :name => "Chuck Palahniuk",
                   :age => 45 )
user.save!
Now, recall that earlier I said that ActiveRecord and the XML mapping libraries do not "play nice" together. It turns out that they just can't share! Ideally, you could just include the XML::Mapping library into the ActiveRecord subclass; they'd behave well together and you could load up the attributes for the model directly from an XML document. Or you could load up an object from the database and write it out to XML. But it's not quite that simple. (Although with some reworking the libraries could be made to behave together!) They step on each other's toes when it comes to attribute accessors. So instead of adding the include for XML::Mapping to the ActiveRecord subclass, just create wrapper classes for the XML representation of the models:

require 'xml/mapping'  # older RubyGems releases used: require_gem 'xml-mapping'

class AddressXml
  include XML::Mapping

  text_node :street_address, "street_address"
  text_node :city, "city"
  text_node :state, "state"
  numerical_node :zip_code, "zip_code"
  text_node :name, "@name"
end


class PersonXml
  include XML::Mapping

  text_node :name, "name"
  numerical_node :age, "age"
  # one AddressXml per <address> element, keyed by its name attribute
  hash_node :addresses, "address", "@name", :class => AddressXml
end
The order of these class definitions is important. You define AddressXml before PersonXml because PersonXml references AddressXml in the hash_node call. If you hadn't defined AddressXml first, the hash_node call would have complained that AddressXml is undefined. You could instead have reversed the order of the definitions and used a forward declaration:
require 'xml/mapping'  # older RubyGems releases used: require_gem 'xml-mapping'

# forward declaration
class AddressXml; end

class PersonXml
  include XML::Mapping

  text_node :name, "name"
  numerical_node :age, "age"
  hash_node :addresses, "address", "@name", :class => AddressXml
end


class AddressXml
  include XML::Mapping

  text_node :street_address, "street_address"
  text_node :city, "city"
  text_node :state, "state"
  numerical_node :zip_code, "zip_code"
  text_node :name, "@name"
end
That will also keep the PersonXml.hash_node call from complaining.
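The forward declaration works because reopening a class in Ruby adds to the same class object rather than creating a new one, so the reference captured at definition time later points at the fully defined class. Here is a minimal pure-Ruby sketch; the child_class macro and the *Sketch class names are hypothetical stand-ins for hash_node's :class option and the wrapper classes.

```ruby
# Hypothetical class-level macro that, like hash_node's :class option,
# stores a reference to another class at class-definition time.
module ChildClassMacro
  def child_class(klass = nil)
    @child_class = klass if klass
    @child_class
  end
end

# Without a prior definition, naming the constant fails immediately.
error = begin
  UndeclaredXml
rescue NameError => e
  e
end

# Forward declaration: creates an empty AddressSketch class now.
class AddressSketch; end

class PersonSketch
  extend ChildClassMacro
  child_class AddressSketch # resolves, because the constant now exists
end

# Reopening the class later fills in the body of the very same object.
class AddressSketch
  def city
    "Anytown"
  end
end

p error.class                                    # => NameError
p PersonSketch.child_class.equal?(AddressSketch) # => true
p PersonSketch.child_class.new.city              # => "Anytown"
```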

So far, you've simply included the XML::Mapping module into your wrapper classes and then used the mixed-in methods to map XML nodes to your objects. text_node maps an XPath path, like "name", to an object attribute, such as :name. hash_node maps a hash of objects, in this case AddressXml objects, to an attribute like :addresses. For hash_node, the third argument is the XPath path to the value used as each hash key; in the example above, "@name" selects the name attribute of each address element. You'll see that hash key in action in just a bit.
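To see what those paths select, here is a rough equivalent written directly against REXML. It is only an approximation of what text_node and hash_node produce, not XML::Mapping's implementation, and for brevity the hash holds city strings rather than AddressXml objects.

```ruby
require "rexml/document"

xml = <<~XML
  <person>
    <name>Ruby Jones</name>
    <address name="home"><city>Anytown</city></address>
    <address name="work"><city>Anotherville</city></address>
  </person>
XML

person = REXML::Document.new(xml).root

# Roughly what text_node :name, "name" reads: the text of the <name> child.
name = person.elements["name"].text

# Roughly what hash_node :addresses, "address", "@name" builds: one entry
# per <address> child, keyed by the value of its name attribute.
cities = {}
person.each_element("address") do |address|
  cities[address.attributes["name"]] = address.elements["city"].text
end

p name           # => "Ruby Jones"
p cities["home"] # => "Anytown"
```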

XML::Mapping mixes in a number of node types, including text_node, numerical_node, hash_node and array_node. Even better, you can create your own node types. That's a bit beyond the scope of this article, but have a look at the module's README for details; it's a very powerful feature (one that ROXML can't touch).

Now, ignore the ActiveRecord piece of this implementation for a moment and look at what your XML wrapper classes give you. You can now load the XML document by calling:

     
pxml = PersonXml.load_from_file('path_to_xml_file')
Of course, you'd need to make sure PersonXml is defined before this call, requiring the file that contains it if needed.

Once you've loaded the document into a PersonXml object, you have access to all the data bound to it. The easiest way to see this is through a few example calls:

pxml.name
=> "Ruby Jones"

pxml.addresses['home'].city
=> "Anytown"

pxml.addresses
=> {"home"=>#<AddressXml:0x57f738 @street_address="1200 Main Street", 
@state="SD", @zip_code=12345, @city="Anytown", @name="home">, 
"work"=>#<AddressXml:0x50b860 @street_address="9898 Center Street", 
@state="SD", @zip_code=11223, @city="Anotherville", @name="work">}
Now that you have the XML data mapped to the wrapper objects, you can work on transferring that data to your ActiveRecord objects. (See why it would be so nice if you could skip this extra step? It's still a minimal quantity of work for what you're getting, though.)
a_person = Person.new( :name => pxml.name,
                       :age => pxml.age )

a_person.addresses << Address.new( :street_address => pxml.addresses['home'].street_address,
                                   :city => pxml.addresses['home'].city,
                                   :state => pxml.addresses['home'].state,
                                   :zip_code => pxml.addresses['home'].zip_code,
                                   :name => pxml.addresses['home'].name )

a_person.save!
With that, you've opened and parsed the XML file, mapped the data to the XML wrapper objects to make the data easier to manipulate, then mapped the XML wrapper object attributes to the ActiveRecord models, and finally, saved the model to the database!

Want to accomplish the same thing with even less code?

     
[:name, :age].each { |attr| a_person.send("#{attr}=", pxml.send(attr)) }
Based on that example I'll leave it to you to figure out how to add an address that way.
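The same dynamic-dispatch trick generalizes to a small helper that copies a list of attributes between any two objects. The sketch below uses Struct stand-ins (hypothetical) for AddressXml and the Address model so that it runs without a database; a real ActiveRecord model would get its setters from its table columns instead.

```ruby
# Hypothetical stand-ins for the XML wrapper and the ActiveRecord model.
AddressXmlStub   = Struct.new(:street_address, :city, :state, :zip_code, :name)
AddressModelStub = Struct.new(:street_address, :city, :state, :zip_code, :name)

ADDRESS_ATTRS = [:street_address, :city, :state, :zip_code, :name]

# Copy each listed attribute from source to target with dynamic dispatch,
# the same send-based idea as the one-liner above.
def copy_attributes(source, target, attrs)
  attrs.each { |attr| target.send("#{attr}=", source.send(attr)) }
  target
end

home_xml = AddressXmlStub.new("1200 Main Street", "Anytown", "SD", 12345, "home")
home = copy_attributes(home_xml, AddressModelStub.new, ADDRESS_ATTRS)
p home.city     # => "Anytown"
p home.zip_code # => 12345
```

With the real classes, something like a_person.addresses << copy_attributes(pxml.addresses['home'], Address.new, ADDRESS_ATTRS) should follow the same pattern, though that line is untested here.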

Example Implementation

Now that we've covered how to use the XML::Mapping gem with ActiveRecord, let's see how to implement it in a real-world application: the Discographr application introduced earlier in this series (if you're not familiar with it, please go back and review the app). Here you'll integrate an XML document provided by Last.fm's excellent Web services. Last.fm is a social networking application (and so much more) for music lovers. It allows users to automatically submit the names of the tracks they play and archive their personal music listening history. It also allows users to tag virtually anything, from artists to tracks.

Discographr will be using an XML document generated by Last.fm's RESTful Web service to add albums and their details to its database. This sort of functionality could be used to automatically pull in all the data for an album, including song data, if a user was to enter the name of the album and the artist in a Web form. (It should be mentioned here that Last.fm's Web services are for non-commercial use only unless otherwise granted by Last.fm.)

Let's start with an XML document generated by a RESTful call to http://ws.audioscrobbler.com/1.0/album/Pinback/Offcell/info.xml. The XML generated by this request is:
<?xml version="1.0" encoding="UTF-8"?>
<album artist="Pinback" title="Offcell">
    <reach>6325</reach>
    <url>http://www.last.fm/music/Pinback/Offcell</url>
    <releasedate>    10 Jun 2003, 00:00</releasedate>
    <coverart>
        <small>http://images.amazon.com/images/P/B00009EIS9.01._SCMZZZZZZZ_.jpg</small>
        <medium>http://images.amazon.com/images/P/B00009EIS9.01._SCMZZZZZZZ_.jpg</medium>
        <large>http://images.amazon.com/images/P/B00009EIS9.01._SCMZZZZZZZ_.jpg</large>
    </coverart>
    <mbid>28cc2841-46e4-40ab-b371-989a749a8368</mbid>
    <tracks>
        <track title="Microtonic Wave">
            <reach>5226</reach>
            <url>http://www.last.fm/music/Pinback/_/Microtonic+Wave</url>
        </track>
        <track title="Victorious D">
            <reach>3521</reach>
            <url>http://www.last.fm/music/Pinback/_/Victorious+D</url>
        </track>
        <track title="Offcell">
            <reach>4894</reach>
            <url>http://www.last.fm/music/Pinback/_/Offcell</url>
        </track>
        <track title="B">
            <reach>5375</reach>
            <url>http://www.last.fm/music/Pinback/_/B</url>
        </track>
        <track title="Grey Machine">
            <reach>3495</reach>
            <url>http://www.last.fm/music/Pinback/_/Grey+Machine</url>
        </track>
    </tracks>
</album>
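To see which values the wrapper classes' XPath paths will pick out of this response, here is a quick check with REXML against a trimmed copy of it (two tracks kept for brevity). It only illustrates what the paths select; it is not how XML::Mapping evaluates them.

```ruby
require "rexml/document"

# Trimmed copy of the Last.fm response above.
xml = <<~XML
  <album artist="Pinback" title="Offcell">
    <releasedate>    10 Jun 2003, 00:00</releasedate>
    <tracks>
      <track title="Microtonic Wave"><reach>5226</reach></track>
      <track title="Victorious D"><reach>3521</reach></track>
    </tracks>
  </album>
XML

album = REXML::Document.new(xml).root

artist = album.attributes["artist"] # what "@artist" selects
title  = album.attributes["title"]  # what "@title" selects
# What the "tracks" + "track" path pair selects: each <track> inside <tracks>.
track_titles = album.get_elements("tracks/track").map { |t| t.attributes["title"] }

p artist       # => "Pinback"
p track_titles # => ["Microtonic Wave", "Victorious D"]
```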
You need to do some initial setup before writing any code. All the XML wrapper classes depend on the XML::Mapping module, so add a require statement for it to the end of RAILS_ROOT/config/environment.rb:

require 'xml/mapping'  # older RubyGems releases used: require_gem 'xml-mapping'
Before you can do much with the XML from Last.fm, you need to create the XML wrapper classes for this data pull. To do that, you'll create a module under RAILS_ROOT/lib. Rails automatically loads class and module definitions as long as the file and directory names are the underscored, lowercase form of the class or module name (LastFmAlbumPull becomes last_fm_album_pull). (If for some reason you decide not to follow that convention, you can always explicitly require the files in environment.rb.) So, under lib, create a folder called last_fm_album_pull. Then create a file in it called album_xml.rb and place the following code there:
    
module LastFmAlbumPull

  class AlbumXml
    include XML::Mapping

    text_node :release_name, "@title"
    text_node :releasedate, "releasedate"
    text_node :artist, "@artist"
    # each <track> inside <tracks> becomes a SongXml object
    array_node :tracks, "tracks", "track", :class => LastFmAlbumPull::SongXml

  end

end
Create another file in the same directory called song_xml.rb and add this code to it:
module LastFmAlbumPull

  class SongXml
    include XML::Mapping

    text_node :title, "@title"

  end

end
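Both file names follow the naming convention described above, so a reference to LastFmAlbumPull::AlbumXml leads Rails to last_fm_album_pull/album_xml.rb under lib. The function below is a simplified, hypothetical sketch of that CamelCase-to-path conversion; ActiveSupport's String#underscore is the real implementation and handles more cases.

```ruby
# Simplified sketch of the CamelCase-to-file-name rule (not ActiveSupport's code).
def underscore_sketch(camel_cased)
  camel_cased.gsub("::", "/")
             .gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2') # split acronyms: "XMLParser" -> "XML_Parser"
             .gsub(/([a-z\d])([A-Z])/, '\1_\2')     # split words: "AlbumXml" -> "Album_Xml"
             .downcase
end

p underscore_sketch("LastFmAlbumPull::AlbumXml") # => "last_fm_album_pull/album_xml"
p underscore_sketch("LastFmAlbumPull::SongXml")  # => "last_fm_album_pull/song_xml"
```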
You use a module to namespace the XML wrappers so that if you end up adding other data pulls or pushes, you can reuse the same class names under a different module name. It also makes the code more portable: because all the classes are kept tidy in one module, you can easily take it out of the lib directory and use it to map the same XML data in another application. You'd probably also wrap the code that queries the Web service in this module, but for now just do it by hand.
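A minimal illustration of that namespacing: two modules can each define an AlbumXml class without clashing, because each constant lives inside its own module. OtherServicePull is a hypothetical second data source, and the method bodies are placeholders.

```ruby
module LastFmAlbumPull
  class AlbumXml
    def source
      "Last.fm"
    end
  end
end

# Hypothetical second data pull reusing the same class name.
module OtherServicePull
  class AlbumXml
    def source
      "another service"
    end
  end
end

p LastFmAlbumPull::AlbumXml.new.source  # => "Last.fm"
p OtherServicePull::AlbumXml.new.source # => "another service"
```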

Now you can map the XML wrapper attributes to the appropriate ActiveRecord models. Fire up the console and try out our data pull:

     
$ script/console
Loading development environment.        
First load the wrapper objects with the XML file data you got from Last.fm:
>> lfm_album = LastFmAlbumPull::AlbumXml.load_from_file(File.join(RAILS_ROOT, "album.xml"))
=> #<LastFmAlbumPull::AlbumXml:0x31d35a8 @artist="Pinback", @releasedate="10 Jun 2003, 00:00", 
     @tracks=[#<LastFmAlbumPull::SongXml:0x31d0614 @title="Microtonic Wave">, 
#<LastFmAlbumPull::SongXml:0x31cf9bc @title="Victorious D">, 
#<LastFmAlbumPull::SongXml:0x31cebe8 @title="Offcell">, 
#<LastFmAlbumPull::SongXml:0x31ce300 @title="B">, 
#<LastFmAlbumPull::SongXml:0x31cde28 @title="Grey Machine">], 
@release_name="Offcell">
                            
After a successful load, create an ActiveRecord album from the data by mapping the wrapper object attributes to the ActiveRecord model attributes. Notice the manipulation of the releasedate value, which is stripped of its leading whitespace and reduced to a year. You could have done that in the wrapper class, but it seems more appropriate to let the class receiving the data decide how to format it; you might later move the LastFmAlbumPull module to another application that needs the full date, not just the year.
>> album = Album.create( :release_name => lfm_album.release_name, 
                        :year => lfm_album.releasedate.strip.to_date.year, 
                        :artist => Artist.find_or_create_by_name(lfm_album.artist) )
=> #<Album:0x3214af8 @artist=#<Artist:0x3214990 @attributes={"created_on"=>Tue Feb 27 13:47:24 PST 2007, 
"name"=>"Pinback", "updated_on"=>Tue Feb 27 13:47:24 PST 2007, "id"=>10202}>,
@new_record=false, @new_record_before_save=true, 
@errors=#<ActiveRecord::Errors:0x3210c28 @base=#<Album:0x3214af8 ...>, 
@errors={}>, @attributes={"created_on"=>Tue Feb 27 14:30:18 PST 2007, 
"artist_id"=>10202, "updated_on"=>Tue Feb 27 14:30:18 PST 2007, 
"id"=>10064, "year"=>2003, "release_name"=>"Offcell"}>
                            
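String#to_date comes from ActiveSupport. To check the conversion outside Rails, here is roughly the same step using only Ruby's standard Date class; Date.parse is swapped in for illustration and is not necessarily how to_date is implemented in your ActiveSupport version.

```ruby
require "date"

# The raw <releasedate> text from the Last.fm response, leading spaces included.
raw_releasedate = "    10 Jun 2003, 00:00"

# Strip the padding, parse, and keep only what the Album model stores.
release_date = Date.parse(raw_releasedate.strip)
year = release_date.year

p year # => 2003
```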
Next, iterate through the tracks (the SongXml objects created by the wrapper classes) and populate the album's songs, again mapping the wrapper class attributes to the ActiveRecord model attributes. Set Song#length to zero for now, since the Web service doesn't provide that piece of information.
>> lfm_album.tracks.each_with_index do |track, index|
?> album.songs << Song.new( :title => track.title, 
                            :track_number => index + 1, :length => 0 )
>> end
=> [#<LastFmAlbumPull::SongXml:0x31cfc14 @title="Microtonic Wave">, 
#<LastFmAlbumPull::SongXml:0x31cef80 @title="Victorious D">, 
#<LastFmAlbumPull::SongXml:0x31ce648 @title="Offcell">, 
#<LastFmAlbumPull::SongXml:0x31cdf7c @title="B">, 
#<LastFmAlbumPull::SongXml:0x31cdc20 @title="Grey Machine">]
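The track numbering in that loop comes from each_with_index, offset by one because indexes start at zero. The sketch below reproduces just the attribute mapping with a hypothetical Struct stand-in for SongXml, building the hashes that would be passed to Song.new without touching the database.

```ruby
# Hypothetical stand-in for LastFmAlbumPull::SongXml.
SongXmlStub = Struct.new(:title)

tracks = ["Microtonic Wave", "Victorious D", "Offcell", "B", "Grey Machine"]
           .map { |title| SongXmlStub.new(title) }

# Track numbers are 1-based; length is 0 because the Web service doesn't supply it.
song_attributes = tracks.each_with_index.map do |track, index|
  { :title => track.title, :track_number => index + 1, :length => 0 }
end

p song_attributes.first[:track_number] # => 1
p song_attributes.last[:title]         # => "Grey Machine"
```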
                            
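Finally, save the album to the database! (Album.create already inserted the album row, and because the album was already saved, each << above saved its song too; this final save persists any remaining changes.)
>> album.save
=> true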
                            
You've taken the XML and persisted it to Oracle with fewer than 20 lines of code, and that includes the additional overhead of the XML wrapper classes! On top of that, you have the full power of ActiveRecord at your disposal (the model definitions are 12 lines of code!). Granted, there's a lot missing here; in a real-world application you would add validation to the models. Even so, with validations added, the code required for such rich functionality stays minimal.

Using this technique it's possible to create very rich functionality with a bare minimum of effort. This technique can be used to create RESTful Web service interfaces, parse RSS feeds, process external order data, or anything else you can think of inside or outside of Rails.


Matt Kern has been searching for and developing ways to make life easier through technologies like Rails for years—mostly an attempt at finding ways to spend ever more time roaming the mountains of Central Oregon with his family. He is the founder of Artisan Technologies Inc. and co-founder of Atlanta PHP.