Source data that are not already in a structured format, such as XML or JavaScript Object Notation (JSON), should ideally be converted to one in order to facilitate automated conversion. If the source data are in a less structured format, such as spreadsheets or plain text files, many tools are available to convert these files to structured data formats. For libraries converting bibliographic records into linked data, MarcEdit (http://marcedit.reeset.net/) offers, among its many features, the ability to convert MARC bibliographic data to XML. Though there are ways of converting less-structured data directly to RDF, using a structured format increases the number of tools and programming languages that can be used for the conversion and limits the number of errors likely to be generated during the conversion process. For the ONLD project, reports were created in the Access database used to manage the data cleanup process so that the data could be output with XML formatting. This proved to be an effective option: the project team could easily manipulate the data in the Access database while still retaining a structured XML file for conversion.
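To illustrate why a structured intermediate file simplifies conversion, the following minimal sketch reads a small XML export and writes one RDF statement per record in N-Triples syntax, using only the Python standard library. It is not the ONLD project's conversion code. The element names (`organization`, `name`), the `id` attribute, the `http://example.org/organization/` namespace, and the choice of `rdfs:label` as the predicate are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample export; element and attribute names are illustrative only.
SAMPLE_XML = """<organizations>
  <organization id="1">
    <name>Example Library Consortium</name>
  </organization>
  <organization id="2">
    <name>Example "Digital" Archive</name>
  </organization>
</organizations>"""

BASE_URI = "http://example.org/organization/"  # placeholder namespace (assumption)
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text):
    """Escape the characters that N-Triples requires to be escaped in literals."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def xml_to_ntriples(xml_text):
    """Return one N-Triples line per <organization> that has an id and a name."""
    root = ET.fromstring(xml_text)
    lines = []
    for org in root.iter("organization"):
        org_id = org.get("id")
        name = (org.findtext("name") or "").strip()
        if not org_id or not name:
            continue  # skip incomplete records rather than emit malformed triples
        # Assumes ids are already safe to use inside an IRI (e.g. plain integers).
        lines.append('<%s%s> <%s> "%s" .'
                     % (BASE_URI, org_id, RDFS_LABEL, escape_literal(name)))
    return lines


triples = xml_to_ntriples(SAMPLE_XML)
print("\n".join(triples))
```

Because the parser handles the record structure, the conversion logic reduces to mapping each element to a subject, predicate, and object. A dedicated RDF library would normally handle serialization and escaping in a production workflow.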
Though project teams should plan their major data cleanup efforts before attempting to transform the source data to RDF, it is almost inevitable that data cleanup will continue throughout the life cycle of the project, with new anomalies being discovered as the data set is viewed in different formats. When reviewing the authorized organizations in an alphabetical index created for the NCSU ONLD website, the project team discovered more duplicate entries than they had found while working with the data in E-Matrix or the XML file, which led to the consolidation of several headings after the initial conversion.
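One way to surface candidate duplicates earlier is to group headings by a normalized comparison key. The sketch below is a hypothetical illustration, not the method the project team used. The normalization rules (removing accents, case, punctuation, and extra whitespace) and the sample headings are assumptions, and any groups it reports would still need human review before headings are consolidated.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_heading(heading):
    """Reduce a heading to a comparison key without accents, case, or punctuation."""
    decomposed = unicodedata.normalize("NFKD", heading)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    lowered = without_accents.casefold()
    # Replace punctuation with spaces, then collapse runs of whitespace.
    words = re.sub(r"[^\w\s]", " ", lowered).split()
    return " ".join(words)


def find_duplicate_candidates(headings):
    """Group headings that share a normalized key; return groups with 2+ variants."""
    groups = defaultdict(list)
    for heading in headings:
        groups[normalize_heading(heading)].append(heading)
    return [variants for variants in groups.values() if len(variants) > 1]


# Invented sample headings for illustration only.
headings = [
    "Example Publishing Co.",
    "Example Publishing Co",
    "example  publishing co.",
    "Another Example Society",
]
candidates = find_duplicate_candidates(headings)
for variants in candidates:
    print(variants)
```

A key-based grouping like this only catches variants that differ in surface form. Headings that differ in wording, such as an abbreviated versus a full organization name, would still require the kind of manual review described above.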