Import any XML to DBF
Probably you often need to convert an XML file or a folder with XML files into one or several DBF files. The problem is that the XML structure, its hierarchy and nesting levels differ from file to file. In this article, we'll take a detailed look on how to import an XML file of ANY structure to a flat DBF table.
A new feature called Import Custom XML was implemented in DBF Commander Professional version 2.7 (build 40). This command is available via the command line interface. The syntax is as follows:
As you can see, there are four parameters:
- -icx – means Import Custom XML.
- <source_file_mask> – the full name of the file (including path), or the name of the folder containing the XML files to be imported. You can also load only certain files by using a mask, e.g. "D:\xml\some_prefix_*.xml". In this case, only the files whose extension is XML and whose name starts with "some_prefix" will be converted to DBF. If you want to convert all files in the folder, use the following mask: "D:\xml\*.xml".
- <result_file_name_or_folder> – the destination folder, e.g. "D:\dbf\". In this case, all existing DBF files will be rewritten. If you want to import the source XML files into one DBF file, use the exact file name, e.g. "D:\dbf\result.dbf". But in this case, the resulting file should be created before the conversion process starts, otherwise an error will occur. If a single DBF file is used for the resulting data, it will not be rewritten, and each new record will be inserted into the same file.
- <map_file_name> – the full name of the so-called "map-file", which is a simple DBF file containing the definition of the source and resulting file fields. Below we'll consider the process of creating a map-file.
- <-attr> - use this parameter if you'd like to import tags with certain attributes.
DBF fields definition, or How to create a map-file
It is always better to learn something by using an example. In our situation, we need some well-known XML. Let's take an RSS news feed XML file (available here):
As you can see, there are four <item> elements nested in the <channel> element, which is itself nested in the top-level <rss> element. There are also elements of the same nesting level as the <item> elements: <generator>, <title>, <description>, etc.
Suppose that each <item> element should be represented as one row in the resulting DBF file. So our file will contain four records.
To create a map-file, first let's create the resulting DBF file. We need it because then we'll be able to create a map-file in a few clicks. Launch DBF Commander Professional, click File -> New, and create a DBF file. In the case of the RSS XML file, the structure is as follows:
Click OK. The DBF file will be created, and now it can be used as a resulting file for our import from the XML process. You can use any available field types: Character, Integer, Numeric, Date, etc.
To create a map-file based on the resulting file, click the Export Structure tool button (see the picture above). When the Save dialog opens, choose the path and filename, and then save the structure as a DBF file.
Matching XML tags to DBF fields
We have to do one more thing yet: define the XML tags that should be inserted in the fields of the resulting DBF file.
Open the DBF file you've just saved. Click File -> Structure and add a new field named "xmltagname" of the Character type. You can add more fields in the map-file if you want to. For example, let's add a field named "comment". Click OK to save the changes.
Now fill every row of the new "xmltagname" field with the XML nodes path (separated by "->") that should be inserted into the corresponding field in the DBF file:
It means that the value of the XML node <title> nested in the <channel> node, which is nested in the top-level <rss>, will be inserted into the "title" field; the XML tag "rss -> channel -> description" will be inserted into the "desc" field, and so on.
In case of using '-attr' parameter, paths of the map-file must contain corresponding attributes as well, e.g.: 'rss version="2.0" -> channel -> title'.
Pay attention to the "item_desc" field. The XML tag "rss -> channel -> item -> description" contains the CDATA section with HTML markup. So its tag path should be finished with the "#cdata-section" string.
The last row that you can see in the picture above is the so-called "row-tag". This tag path allows DBF Commander to know which XML tag should be used as a new row in the resulting DBF file. In our case, it is "rss -> channel -> item".
Well, that's all. Now we can run the import from XML to DBF itself. Execute the following command from the command line, or from a BAT file:
As a result, we'll get the new file "D:\dbf\rss.dbf" that contains the imported XML data:
You can download all the files used in this conversion from here.