Phase-1 Database Migration
Part-2: Migrate Data from Oracle to PostgreSQL
In Part-1, I covered migrating a Grails application from Oracle to PostgreSQL. In this Part-2, I cover data migration to conclude this two-part series. This is not a comprehensive post by any means; I only cover what I experienced in this process.
After migrating the Grails application, my next step was to migrate the data. I knew this is where things would become a bit tricky. I always feel that software development is made interesting by its many competing technologies and products, but messy by the fact that they share few similarities and have many differences. Competing technologies emerge with little or no standardization and deviate widely from one another. Databases are no exception.
During application development, the debate over burying application logic in stored procedures versus keeping it in the application's business/service layer eventually comes down to the question "What if we change the database?". The answer, "We won't change our database", often wins the debate and drives the roots of business logic deeper and deeper into the database. Lately, PostgreSQL has been making many rethink their investment in expensive databases. Change is inevitable. The time to "change your database" has come - for some, at least!
My first effort
My first effort was to turn to Google and find any tool(s) that could help migrate data from Oracle to PostgreSQL. After checking several commercial tools, an open-source tool (Ora2Pg), and trying out one of them (Navicat), the decision was easy: I decided not to use any tool. Most of the tools migrate both schema and data, often resulting in several datatype mismatches between the Grails application's domain model and the database schema, especially with fields like boolean, which has a close-matching native boolean type in PostgreSQL. It's not easy for tools to infer these details from database schema/metadata alone.
The Approach
I took the well-known approach in software engineering: ETL, a 3-step process of migrating data:
- Extraction - I leveraged the SQL Developer tool to extract data into delimited files.
- Transformation - I wrote Groovy scripts to transform the data, touching only the data types that needed special care.
- Load - I wrote Groovy scripts to load the transformed data files into PostgreSQL using its COPY command, which is very well suited for this task.
Data Migration Step-by-Step
Step-0: Create PostgreSQL database and generate schema
Create the PostgreSQL database and generate the schema by leveraging Grails and the dbm-plugin. I covered this in Part-1.
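For quick reference, the commands involved look roughly like the following - a minimal sketch assuming the standard createdb tool and the Grails database-migration plugin used in Part-1; the database name and changelog file name here are placeholders:

createdb myApp                                       # create the empty PostgreSQL database
grails dbm-generate-gorm-changelog changelog.groovy  # generate a changelog from the GORM domain model
grails dbm-update                                    # apply the changelog to create the schema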
Step-1: Extract data from Oracle
There are many possible ways to extract/export data from Oracle. I used the SQL Developer tool to extract data into multiple delimited files, with each file containing the exported data of a specific table.
Certain data types can be transformed during the export itself by setting preferences. The default date format is one such setting that differs from database to database. With this ETL approach, dates would normally need to go through a transformation because of these default-format differences, but that transformation can easily be avoided by folding it into the extraction/export step. SQL Developer allows data preferences to be set for a session; with this feature, we can change the format of Date/Timestamp fields to PostgreSQL's default date format.
Set Data Preferences
Open SQL Developer and connect to the database. To check the current format preferences, go to Preferences > Database > NLS from the Oracle SQL Developer main menu.
Execute the following SQL to alter date format for that particular session:
alter SESSION set NLS_DATE_FORMAT = 'MM/DD/YYYY HH24:MI:SS';
alter SESSION set NLS_TIMESTAMP_FORMAT = 'MM/DD/YYYY HH24:MI:SSXFF';
Export Data
Go to Tools > Database Export and follow these steps:
Select Connection
Uncheck Export DDL
Format: delimited
Delimiter: % (or your preferred character; I chose % because most other special characters appear somewhere in the actual data)
Left Enclosure: "
Right Enclosure: "
Save as: Separate Files
File: enter the directory name (where the exported data is to be saved)
Simply go through the remaining steps to finish. You will have the data exported into multiple files in the specified directory, with each file named <TABLE_NAME>_DATA_TABLE.dsv. For instance, if you have a table MY_TABLE, its exported delimited data file would be MY_TABLE_DATA_TABLE.dsv.
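Before transforming anything, it can be worth a quick sanity check that every table made it out. Here is a small, optional Groovy sketch (the ./export directory and the file suffix are assumptions matching the configuration shown later) that lists each exported file with its row count:

// export-sanity-check.groovy (optional) - list exported files and their data row counts
def exportDir = new File('./export')   // directory chosen in the export wizard
def extension = '_DATA_TABLE.dsv'      // suffix SQL Developer appends per table

exportDir.listFiles()
    .findAll { it.name.endsWith(extension) }
    .sort { it.name }
    .each { file ->
        // note: rows containing embedded newlines inside quotes will skew this simple count
        int rows = file.readLines().size() - 1 // first line is the header
        println "${file.name - extension}: ${rows} data rows"
    }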
Step-2: Transform Data
This step requires finding all data types that need transformation. With the Date type already taken care of during extraction, I was only left to deal with Boolean/boolean data types, transforming 1/0 into true/false. That requires identifying all such columns in the entire schema. There are two ways to find those columns: 1) the database metadata, or 2) the Grails domain model. I chose the Grails domain model and wrote a simple script to find the list of tables and columns that are mapped to Boolean/boolean properties of the domain model. I had covered this in my previous post: The power of Grails Metadata is there when you need it. Once I had those details, I had everything I needed to transform the data.
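The actual boolean-columns-finder.groovy is covered in that earlier post; the following is only a minimal sketch of the idea, meant to be run from the Grails console. It assumes the default naming strategy (camelCase properties and class names mapped to snake_case columns and tables) and the grailsApplication bean being available:

// boolean-columns-finder.groovy (sketch) - writes a map of table name -> boolean column names as JSON
import groovy.json.JsonOutput

// assumes default naming strategy: camelCase -> snake_case
def toSnakeCase = { String name -> name.replaceAll(/([A-Z])/, '_$1').toLowerCase() }

Map tablesBooleanColumns = [:]
grailsApplication.domainClasses.each { domainClass ->
    def booleanColumns = domainClass.persistentProperties
        .findAll { it.type in [Boolean, Boolean.TYPE] }
        .collect { toSnakeCase(it.name) }
    if (booleanColumns) {
        tablesBooleanColumns[toSnakeCase(domainClass.propertyName)] = booleanColumns
    }
}
new File('myAppTablesBooleanColumns.json').text = JsonOutput.toJson(tablesBooleanColumns)
println "Found boolean columns in ${tablesBooleanColumns.size()} tables"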
I wrote a Groovy script for transforming the data. Following is the script:
import groovy.json.JsonSlurper
/**
* oracle-to-postgres-data-migrator.groovy
* This groovy script migrates/transforms delimited data-files exported from Oracle to data-files that are PostgreSQL
* compliant. It takes care of special data migration needed for data-column types like boolean, and nullable FK columns
* that get outputted as "" for nulls when left enclosure(") and right enclosure (") characters are used during data
* export.
*
* Certain dependencies, like the base directory the data-files are located in, the data-file extension etc., are externalized
* into the postgres-data-migration-config.groovy file to make this script a little more flexible.
*
* Prerequisites
* 1. Run script: boolean-columns-finder.groovy from the application's home dir.
* 2. Export all data from Oracle database into delimited files from a tool like: SqlDeveloper.
* 3. Adjust any configurations needed in postgres-data-migration-config.groovy file.
* Especially config properties like: delimiter and baseDir
*
* Dependencies (input)
* 1. Requires myAppTablesBooleanColumns.json to exist in the application home directory. This is the file that is
* result of executing boolean-columns-finder.groovy that contains a map of table names and corresponding list of
* boolean columns.
* 2. A set of delimited data files exported using a tool like SqlDeveloper.
*
* Assumptions for Data Export (based on the tool used SqlDeveloper to export oracle data).
* . Each table's data is exported into its own data file with a filename like: <TABLE_NAME>_DATA_TABLE.dsv
* . To simplify parsing, used % as the delimiter because it's a character that does not appear in the application's data.
* . Each data file contains header as the first line which is nothing but column names in capital letters.
* . Used left enclosure(") and right enclosure (") characters during data export which encloses column-data like
* long text that contains embedded new lines, and nullable FK's that are null.
* . Also, the default date format differs between Oracle and PostgreSQL. The data migration needed for date columns has been
* eliminated by leveraging SqlDeveloper to export data in a format that PostgreSQL can import.
* This can be achieved by running the following SQLs for the session in SqlDeveloper before data export.
* alter SESSION set NLS_DATE_FORMAT = 'MM/DD/YYYY HH24:MI:SS';
* alter SESSION set NLS_TIMESTAMP_FORMAT = 'MM/DD/YYYY HH24:MI:SSXFF';
*
* Result (output)
* 1. Migrated data files. Each data file (<TABLE_NAME>_DATA_TABLE.dsv) will have a corresponding migrated data file
* (<TABLE_NAME>_DATA_TABLE.dsv.migrated)
* 2. Only data that needs to be migrated (for boolean data) goes through data migration.
* The input json file: myAppTablesBooleanColumns.json contains all tables that need data migration. For every
* table that needs data migration, the corresponding input data-file name is derived by the above mentioned
* filename assumptions.
* e.g. table: my_table, data-file: MY_TABLE_DATA_TABLE.dsv, migrated data-file: MY_TABLE_DATA_TABLE.dsv.migrated
* 3. The rest of the tables' data, which does not need data migration (no boolean columns), is a simple passthrough
* with just the header migration. The header migration simply converts column names to lowercase.
*
* @see postgres-data-migration-config.groovy
* @see boolean-columns-finder.groovy
*
* @author Giri Pottepalem
*/
def config = new ConfigSlurper().parse(new File('postgres-data-migration-config.groovy').toURI().toURL())
def dataDelimiter = config.data.delimiter
def datafileBaseDir = config.data.file.baseDir
def datafileExtension = config.data.file.extension
def migratedDatafileExtension = config.data.migrated.file.extension
def tablesBooleanColumnsFile = config.tables.boolean.columns.file
if (!new File(tablesBooleanColumnsFile).exists()) {
    println "Error: Missing required file ${tablesBooleanColumnsFile} for data migration."
    System.exit(1)
}
Map tablesBooleanColumnsMap = new JsonSlurper().parseText(
    new FileReader(tablesBooleanColumnsFile).text
)
def dataFilesNeedDataMigration = tablesBooleanColumnsMap.keySet().collect {
    "${it.toUpperCase()}${datafileExtension}".toString() //plain String so the list subtraction below matches
}
def allDataFiles = new File(datafileBaseDir).listFiles().name.findAll {
    it.endsWith(datafileExtension)
}
List dataFilesJustNeedHeaderMigration = allDataFiles - dataFilesNeedDataMigration
println "Migrating just header for all data files that don't need data migration..."
dataFilesJustNeedHeaderMigration.each { dataFilename ->
    String absoluteDataFilename = "${datafileBaseDir}/$dataFilename"
    print "Reading data file: $absoluteDataFilename..."
    File dataFile = new File(absoluteDataFilename)
    File migratedDataFile = new File("${absoluteDataFilename}${migratedDatafileExtension}")
    migratedDataFile.withWriter { fileWriter ->
        dataFile.eachLine { line, lineNumber ->
            if (lineNumber == 1) {
                fileWriter.writeLine(line.toLowerCase()) //migrate header
            } else {
                fileWriter.writeLine(line) //take data as is
            }
        }
    }
    println "Migrated"
}
println "Migrating data..."
tablesBooleanColumnsMap.each { String tableName, List booleanColumns ->
    String dataFilename = "${datafileBaseDir}/${tableName.toUpperCase()}${datafileExtension}"
    print "Reading data file: $dataFilename..."
    File dataFile = new File(dataFilename)
    File migratedDataFile = new File("${dataFilename}${migratedDatafileExtension}")
    String header
    def columnNames
    migratedDataFile.withWriter { fileWriter ->
        dataFile.eachLine { String line, int lineNumber ->
            if (lineNumber == 1) {
                header = line.toLowerCase() //migrate header
                columnNames = header.split(dataDelimiter)
                fileWriter.writeLine(header)
            } else {
                //migrate data
                String[] columnData = line.split(dataDelimiter, -1)
                def booleanColumnIndexes = booleanColumns.collect { columnName ->
                    columnNames.findIndexOf { it.equalsIgnoreCase(columnName) }
                }
                booleanColumnIndexes.each { booleanColumnIndex ->
                    if (booleanColumnIndex >= 0 && booleanColumnIndex < columnData.length) { //skip columns missing from this file's header
                        switch (columnData[booleanColumnIndex]) {
                            case null: columnData[booleanColumnIndex] = ''; break
                            case '1': columnData[booleanColumnIndex] = 't'; break
                            case '0': columnData[booleanColumnIndex] = 'f'; break
                        }
                    }
                }
                //update data that ended up as "" for blanks/nulls in the data export
                columnData.eachWithIndex { data, i ->
                    if (data == '""') {
                        columnData[i] = ''
                    }
                }
                fileWriter.writeLine(columnData.join(dataDelimiter))
            }
        }
    }
    println "Migrated"
}
println "Done"
Following is a sample of the externalized configuration properties, as a Groovy script, to make the above script a bit more flexible:
/**
* postgres-data-migration-config.groovy
* Configuration properties as groovy script
* @author Giri Pottepalem
*/
data {
    delimiter = '%'
    file {
        baseDir = './export' //dir in which delimited data files exist
        extension = '_DATA_TABLE.dsv'
    }
    migrated {
        file {
            extension = '.migrated'
        }
        //need to be loaded in specific order due to foreign key constraints
        files = [
            'MY_TABLE1_DATA_TABLE.dsv.migrated',
            'MY_TABLE2_DATA_TABLE.dsv.migrated',
            ...
        ]
    }
}
appHome = '/gateway' //myApp application home dir
//file that contains list of tables and boolean columns for each table. Result of running boolean-columns-finder.groovy
tables.boolean.columns.file = "${appHome}/myAppTablesBooleanColumns.json"
postgres {
    dataSource {
        url = 'jdbc:postgresql://localhost:5432/myApp'
        user = 'postgres' //superuser for running COPY
        password = 'password'
        driverClassName = 'org.postgresql.Driver'
    }
    //Adjust sequences with values taken from oracle db by: select sequence_name, last_number from user_sequences;
    sequences = [
        'hibernate_sequence': 150461,
        'table1_sequence': 4041
    ]
}
Step-3: Load Data
Once the data is migrated, work out the data dependency order and load the data in an order that satisfies the foreign-key constraints. I covered this in my previous post: Groovy Script to load CSV data files into PostgreSQL database onto Amazon RDS.
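That post has the complete script; what follows is only a minimal sketch of the idea, assuming the configuration file shown above and the PostgreSQL JDBC driver's CopyManager API. Deriving the table name from the migrated file name and the exact COPY options are assumptions of this sketch, not necessarily what the original script does:

// load-data.groovy (sketch) - loads migrated files via PostgreSQL COPY, then fixes up sequences
@GrabConfig(systemClassLoader = true)
@Grab('org.postgresql:postgresql:42.2.5')

import groovy.sql.Sql
import org.postgresql.PGConnection

def config = new ConfigSlurper().parse(new File('postgres-data-migration-config.groovy').toURI().toURL())
def ds = config.postgres.dataSource
def sql = Sql.newInstance(ds.url, ds.user, ds.password, ds.driverClassName)
def copyManager = sql.connection.unwrap(PGConnection).copyAPI

// load files in the configured order so foreign-key constraints are satisfied
config.data.migrated.files.each { String fileName ->
    String tableName = (fileName - '_DATA_TABLE.dsv.migrated').toLowerCase() // assumed naming convention
    new File("${config.data.file.baseDir}/${fileName}").withReader { reader ->
        long rows = copyManager.copyIn(
            "COPY ${tableName} FROM STDIN WITH (FORMAT csv, DELIMITER '%', HEADER true, NULL '')",
            reader
        )
        println "Loaded ${rows} rows into ${tableName}"
    }
}

// bump sequences to the last values carried over from Oracle
config.postgres.sequences.each { name, value ->
    sql.execute("SELECT setval('${name}', ${value})".toString()) //toString() so the GString isn't treated as a parameterized statement
}
sql.close()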