|
Open Source
New Life for Old Microsoft Access Data
By Marcel Gagné
Learn how to easily migrate Access data to the Oracle Database on Linux using open-source tools
One of the sad facts about our relentless journey forward in the world of computers and software is that we sometimes find ourselves abandoning old data. We move from one word processing office package to another, only to discover that our old documents can no longer be opened.
The same is true of databases. A small shop may find that storing information on a PC database is sufficient, but what happens when it's time to move? We may gripe about leaving our valuable information behind, but we often do leave it. However, what may have seemed unimportant enough to abandon at the time can sometimes demand our attention.
I recently had just such a problem. A community organization I worked with had a PC-based application with a rather large membership database in Microsoft Access format. A custom program from a third party had been written to work with the database, but in time the license had expired and there was no longer a way to get an update of the software. As luck would have it, the information had suddenly become important again. We had the old database backed up and stored on a Linux system. In our new open source world, plenty of powerful databases, such as the Oracle Database on Linux, were available to us. One option would have been to purchase a copy of Microsoft Access and create an application, but that would have involved locking ourselves into the same proprietary mold that had gotten us into this mess in the first place. Furthermore, the organization wanted to be able to access this data remotely from a secure server.
Going Microsoft again wasn't a desirable option. Linux would be the foundation for this new database server, and it would run an industry-standard SQL database. So how would we go about extracting the data trapped in the old MDB files without resorting to setting up a Windows server with a new Access license and another lock-in to a single-architecture solution?
Looking around for solutions to this dilemma led me to Brian Bruns' Web site and his mdbtools package, a collection of command-line tools plus a GUI whose sole purpose is to provide a way to extract information from Access databases.
Getting and Installing mdbtools
The first thing you need to do is pick up a copy of the mdbtools package. Head over to http://mdbtools.sourceforge.net/, and you'll find source as well as RPM packages that should work for several RPM-based distributions. Building from source is pretty simple and consists of the extract-and-build five-step.
tar -xzvf mdbtools-0.5.tar.gz
cd mdbtools-0.5
./configure
make
su -c "make install"
It is also possible to get your hands on the very latest and greatest by downloading and building from CVS. This isn't as complicated as it might sound. Start by connecting and logging into the CVS server anonymously.
cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/mdbtools login
Your next step is to check out the mdbtools package:
cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/mdbtools co mdbtools
Note that each of the previous commands is a single, unbroken line. After entering the above command, you see a listing going across your screen of the various files in the distribution as the download occurs. When the download is done, switch to the download directory (called mdbtools) and do the following.
./autogen.sh --enable-sql --with-unixodbc=/usr/local
make
su -c "make install"
Doing Some Exploring with mdbtools
The mdbtools package comes with a series of command-line tools as well as a GUI for manipulating and extracting data from mdb files. I'll cover the GUI shortly, but first I'd like to introduce you to some of the command-line tools. The most basic of these is mdb-ver, which you use to identify the type of mdb file you are dealing with.
$ mdb-ver members.mdb
JET3
The result in this case is "JET3," which tells me that my community members file is based on an Access 97 application. A return of "JET4" would have indicated an Access 2000 or Access XP database. Slightly-more-interesting information starts to emerge with the mdb-tables command.
$ mdb-tables members.mdb
ActionTaken CaseMembers ContactFollowUp ContactReason ContactSource
ContactStatus Donations Issues Location MemberInfo PersonContacted Status ToDo
By default, all of the database's tables will be dumped on a single line. To list them individually, one per line, use the -1 flag. Some tables won't be displayed, however; these are the system tables, the ones starting with Msys. If you want to see these listed as well, use the -S flag.
The output above gives you a breakdown of the various tables that make up your Access database. What's interesting at this point is that we can actually do SQL queries, albeit a limited subset of SQL, on the database, even though the database is no longer running on a Microsoft system. That is the purpose of the mdb-sql command. To enter the interactive SQL interpreter, type mdb-sql, followed by the database name. You'll find yourself at a 1 => prompt, where you can enter a single or multiline query. When you are finished entering and you want to run the query, type go and press Enter.
$ mdb-sql members.mdb
1 => list tables
2 => go
In the above example, I essentially re-created the mdb-tables command. The difference is that the output is formatted in a nice, boxed table with a header and dashed lines all around. You can override this behavior when calling mdb-sql, by passing -H to suppress the header and -p to get rid of the box.
$ mdb-sql -H -p members.mdb
You can exit the SQL interpreter at any time by typing quit and pressing Enter.
To create a query, you can either use the default interactive mode or send a query on one line. In the following example, I use another command that lets me dig a little deeper into the format of the database's tables, the describe table query.
$ echo "describe table Location" | mdb-sql members.mdb
+------------------------------+--------------------+----------+
|Column Name |Type |Size |
+------------------------------+--------------------+----------+
|ID |Long Integer |4 |
|Address1 |Text |255 |
|Address2 |Text |255 |
|Address3 |Text |255 |
|Phone |Text |50 |
|Fax |Text |50 |
|DefaultInd |Boolean |0 |
+------------------------------+--------------------+----------+
To facilitate scripting, queries can also be imported from an external file with the -I flag and output to a text file with the -o flag and specification of a filename.
Using these simple SQL queries, we can learn a lot about the layout of the database and its associated tables. By stripping headers and fancy print, we can export to text files and manipulate the results by using shell scripts. In fact, if all you are looking for is some simple reports from your old data, this may be all you need. If, however, you'd like to migrate the database to a server-based SQL database such as Oracle, mdbtools can help you there as well. (What you do with the data once it gets there, of course, is an entirely different article.)
Getting the Data Out
There are at least two steps to getting the data out. The first is to create a new database on your server and its associated tables. The second is to load the data. Using the above examples, you can get all the details about the various tables that make up the original Access database. Armed with that information, you can then write the necessary scripts to, one by one, re-create those tables specific to your database. Luckily, you won't have to sweat it too much, because the mdbtools package provides a tool for that as well. It's called mdb-schema. Given that my community members table contains some sensitive information, let me demonstrate by using the training database provided with Microsoft Access, Northwind.mdb.
$ mdb-schema Northwind.mdb
The result is a text file of the schema sent to standard out, in DDL (Data Definition Language) format. If you want to capture the outputvery usefulmake sure you redirect the output to a file. By default, the DDL output is in Access format, as shown in the sample below.
DROP TABLE Customers;
CREATE TABLE Customers
(
CustomerID Text (10),
CompanyName Text (80),
ContactName Text (60),
ContactTitle Text (60),
Address Text (120),
City Text (30),
Region Text (30),
PostalCode Text (20),
Country Text (30),
Phone Text (48),
Fax Text (48)
);
The above output shows only one table (for brevity), and that probably makes it a good time to tell you that you can specify a single table with the -T flag, which I will demonstrate in a moment. The database format of the DDL output is referred to as the back end. At this time, five different back ends are supported. They are "access," for Microsoft Access; "sybase," for Sybase; "oracle," for Oracle; and "postgres," for PostgreSQL (and, in the CVS code, "mysql," for MySQL). So, to create the Oracle-format DDL for the SavedQueries table, I use this command:
$ mdb-schema -T Customers Northwind.mdb oracle
Here is the result of this schema export:
DROP TABLE Customers;
CREATE TABLE Customers
(
CustomerID VARCHAR2 (10),
CompanyName VARCHAR2 (80),
ContactName VARCHAR2 (60),
ContactTitle VARCHAR2 (60),
Address VARCHAR2 (120),
City VARCHAR2 (30),
Region VARCHAR2 (30),
PostalCode VARCHAR2 (20),
Country VARCHAR2 (30),
Phone VARCHAR2 (48),
Fax VARCHAR2 (48)
);
So now we know the details of our database. We have its format, tables, and schema. All we need is data, and this is where the mdb-export command comes into play. In its simplest form, it looks like this:
mdb-export database.mdb TableName
For exporting data, the default is to include headers, use commas for delimiters, and to put double quotation marks around each of the fields. Headers can be suppressed with the -H modifier. A -Q suppresses the automatic use of quotation marks around all fields. Finally, you can also change the delimiter to something other than the comma default. For historical reasons, I tend to prefer a tilde.
mdb-export -H -Q -d "~" Northwind.mdb Customers
The Goods on the GUI
Aside from the command-line tools provided as part of the package, there's a nice, GTK-based graphical interface, the MDB File Viewer. The program that calls the GUI is called gmdb2. You can start the program by specifying a database name on the command line or simply start the program and open the file graphically by clicking on File and then on Open and selecting the filename from another dialog box.
The GUI has six tabs: Tables, Querys (sic), Forms, Reports, Macros, and f{x} Modules. Have a look at Figure 1, and you'll see the MDB File Viewer open to the default Tables tab.
The mdbtools GUI, gmdb2
On the right side of the viewer are three buttons. Clicking on any of the tables highlights those buttons. The Definition button pops up a window showing that table's schema, in a nice tabular report format, as shown in Figure 2.
Clicking on the Definition button displays a table's schema.
The Data button is next on the list. Clicking on it provides a tabular listing of the records in the selected table, as in Figure 3.
A simple dump of a table's data
Given that we can see the information in the table, you might be asking yourself whether it's possible at this point to do some queries on that information. Click on Tools, and select SQL Window. From here, you can enter simple SQL queries, just as with the mdb-sql command-line utility.
The original idea, of course, was to get the data from Access into another database format. To export the database schema, click on Tools and select Export Schema. From the resulting dialog box (Figure 4), start by specifying a filename for the output. The drop-down list labeled Table lets you select a dump of the entire database schema or a single table. The Schema Dialect, meanwhile, provides a choice of the various supported back-end formats. Depending on your particular destination database format, you may also want to include Relationships in the output schema. Because the result gives you all the necessary statements for re-creating your tables, you can also choose to include the DROP TABLE commands just before CREATE TABLE.
Exporting an Oracle format schema
Finally, we have Export, probably the most important of those three buttons, given that we want to import this data elsewhere. When you select a table and click on Export, you will have a few choices to make (see Figure 5). The resulting dialog box begins by asking for a filename; you can select an existing file or enter a new one. You must then select the line terminator formatline feeds only for Linux and UNIX, carriage returns for the Macintosh, and carriage return and line feed for Windows and DOS.
Exporting table data in CSV format
In the next field, you have a choice of separator. Several defaults are provided in the drop-down list, but you can override these by typing in your own character (a tilde, for instance). The next two selections decide how and when quotation marks are used around fields and what the quotation mark character is. Finally, a check box lets you decide whether or not to include headers in your export.
With the CSV data in hand, you are now free to import your data into the newly created database.
Those Other Tabs and the Future
No doubt you've noticed that I've skipped over some tantalizing features of the gmdb2 GUI. That's because although the graphical interface does give you additional insight into what components existed in the original database, this information is just information, for the time being.
The tool provides some hints about what the future holds for the mdb-tools package. For instance, if you click on the Forms tab, you will see a list of the VB forms that have been created for the database. Similarly, the Reports tab provides a glimpse into the predefined reports.
The mdbtools package continues to be in development; a mailing list enables developers and other interested parties to discuss the package and push its development further (you'll find a link for joining the mailing list on the mdbtools main page). The next milestone, 0.6, will be the first release to incorporate write capability. With its mdb-import facility, it will soon be possible to take a CVS file and load it into an MDB database. Further on, plans include full insert, update, and delete functions and a more robust and capable ODBC driver (the current incarnation provides only basic features). These last two enhancements will make it possible for users on Linux systems to populate or edit Microsoft Access databases, thus providing some real possibilities for interoperability between Linux and Windows. Among the more ambitious planned enhancements is the ability to convert Access forms into an open source format, making an easy Access-to-Linux transition possible. I'm looking forward to seeing where this goes.
In the meantime, the mdb-tools package provides a capable gateway for those who are making a move to open source software alternatives but aren't ready to throw out their data and start over again.
Marcel Gagné (mggagne@salmar.com) lives in Mississauga, Ont. Canada. He is the author of Moving to Linux: Kiss the Blue Screen of Death Goodbye! and Linux System Administration: A User's Guide (both from Addison-Wesley). In real life, he is president of Salmar Consulting Inc., a systems integration and network consulting firm.
|