Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
cancel
Showing results for 
Search instead for 
Did you mean: 
JasonHinsperger
Advisor
Advisor
0 Kudos

In this post, originally written by Glenn Paulley and posted to sybase.com in November of 2009, Glenn talks about importing data from Microsoft Excel into a SQL Anywhere database.


One way to import data from a Microsoft Excel spreadsheet into a SQL Anywhere database is via the DBISQL INPUT statement. Here's an example:


INPUT USING 'dsn=myExcelFile;DSN=myExcelFile'
FROM "myData" INTO "T"
CREATE TABLE ON

Note that this is a DBISQL statement, rather than a SQL statement that can be executed by the server. The components of this statement are as follows:

  • "myData" refers to a named matrix of rows and columns in the Excel spreadsheet, which will be used as input to the INPUT statement. In Excel Office 2007, one creates a named matrix of cells by executing the following steps:
    1. With the mouse, or using SHIFT-arrow, highlight the set of rows and columns desired within the worksheet to select them.
    2. Once highlighted, right-click on the selected rows.
    3. Scroll down to the menu item "Name a Range...." and press Enter or left-click.
    4. Type in your chosen name for this matrix of rows. We choose "myData" to correspond to the INPUT statement above.
    5. Save the modified spreadsheet.
  • DSN=. One needs to create an ODBC DSN in order for DBISQL to connect to the Microsoft Excel ODBC driver and read the rows and columns corresponding to "myData". To create the DSN:
    1. Start the Microsoft ODBC Administrator from your SQL Anywhere programs folder. Switch tabs to "System DSN", and then click "Add". Using System DSN's is important, because in some scenarios User DSN's will not be found.
    2. Select an Excel ODBC driver. On my laptop I'm still running 32-bit Windows XP - so the two Excel ODBC drivers available are:
      1. The generic .xls driver ("Microsoft Excel Driver (*.xls)"), version 4.00.6305.00, dated 4/14/2008; and
      2. The Office 2007 driver ("Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)"), version 12.00.6211.1000, dated 8/24/2007.
    3. After selecting one of the above drivers, click "Finish".
    4. You are then shown a dialog with input boxes for the various driver parameters. Enter a Data Source Name to name the DSN (we choose "myExcelFile"), and use the "Browse" button to inform the ODBC Administrator where the spreadsheet is in your filesystem. Then click "OK".
  • The syntax FROM "myData" INTO "T" tells DBISQL to create a table "T" from the data range named "myData".

Et voila! You have imported the selected rows and columns into a SQL Anywhere table named "T".

If Only It Were That Simple

The above set of steps does work. Really. It differs from the SQL Anywhere 11.0.1 documentation in that I've included the Excel 2007 steps for naming a matrix of rows and columns; the 11.0.1 documentation specifies the steps when using the Office 2003 version of Excel.

Aside: A missing piece of the SQL Anywhere 11.0.1 documentation is that the Microsoft ODBC driver assumes that the first row of a named set of rows and columns contains the "column names" for the data, and the ODBC driver will return those cell values in response to metadata calls for that result set. Hence, if naming a set of rows and columns to be loaded, ensure that the first row contains the names of the columns you desire for the table "T" - with the DBISQL INPUT statement, DBISQL will use the names of the columns returned by the metadata calls to the underlying ODBC driver.

However, there are two significant (and related) problems with importing Excel data using this method, and they both are due to the behaviour of the Microsoft Excel ODBC drivers (both of the drivers I've documented above exhibit the same behaviour). The two problems are:

  • The Microsoft ODBC driver for Excel seemingly chooses the data types for the various columns in the named area arbitrarily; and
  • Data exceptions can result because the choice of data types may not match all cell contents. Depending on precisely how the result set is FETCHed through the Excel ODBC driver, the application may (1) receive an error, or (2) values that would result in a data exception may be returned as NULL, or (3) the result set may be truncated without any notification.

The Excel driver's behaviour in choosing data types for each column of cells is partially explained in this 2003 Microsoft Knowledge Base article. There seems to be very little one can do to coerce the driver to choose a more generic type (such as string) when the data in a spreadsheet is dirty. It is these choices of data types that lead to the second problem.

The DBISQL INPUT statement causes DBISQL to open a cursor over the Excel data source using wide fetches - and upon the first data exception, the Microsoft driver returns end-of-file and the result set is effectively truncated without error. I experimented with two other JDBC-ODBC bridges - the freely-available Sun Microsystems bridge and the commercial Easysoft JDBC-ODBC bridge(available for a free trial) - and both drivers exhibited slightly different behaviour, returning the complete result set but with invalid data values substituted with NULLs, again with no indication of an error. My colleague Karim Khamis was kind enough to provide an example Java application to demonstrate the behaviour:


import java.io.*;
import java.sql.*;
import java.util.*;
class T
{
    public static void main (String args[]) throws IOException
    {
  Connection    con = null;
  System.out.println ( "Starting ... " );
  con = connect();
  if( con == null ) {
     return; // exception should already have been reported
  }
  System.out.println ( "Connected ... " );
  try {
     try {
  con.setAutoCommit(false);
     } catch( SQLException dummy ) {
     }
   
     ResultSet rs = con.getMetaData().getColumns( null, null, "myExcelFile", null );
     int colnum = 1;
     while( rs.next() ) {
         System.out.println( "Column " + colnum + " is named " + rs.getString(4)
                    + " with type " + rs.getString(6) + " with size/prec " + rs.getString(7)
                    + " with scale " + rs.getString(9) );
  ++colnum;
     }
     rs.close();
     System.out.println( "\n\n" );
   
     Statement stmt = con.createStatement();
     stmt.setFetchSize(1); // set to > 1 to enable wide fetches if supported
     rs = stmt.executeQuery( "select * from myData" );
     int colcount = rs.getMetaData().getColumnCount();
     int rownum = 1;
     while( rs.next() ) {
         System.out.print( "ROW " + rownum + ": " );
  ++rownum;
  for( int i = 1; i < colcount; ++i ) {
     System.out.print( rs.getObject(i) + " === " );
  }
  System.out.println( rs.getObject(colcount) );
     }
     rs.close();
     con.close();
     System.out.println( "Disconnected" );
  } catch (SQLException sqe) {
     printExceptions(sqe);
  }
    }
 
    private static Connection connect()
    {
  String    driver, url;
  Connection  connection;
  // System.out.println( "Using Sun JDBC-ODBC bridge..." );
  // driver="sun.jdbc.odbc.JdbcOdbcDriver";
  // url="jdbc:odbc:myData";
  System.out.println( "Using Easysoft JDBC-ODBC bridge..." );
  driver="easysoft.sql.jobDriver";
  url="jdbc:easysoft://localhost:8831/myData:trace=on";
  try {
     Class.forName( driver );
     connection = DriverManager.getConnection( url, "dba", "sql" );
  }
  catch( Exception e ) {
     System.err.println( "Error! Could not connect" );
     System.err.println( e.getMessage() );
     printExceptions( (SQLException)e );
     connection = null;
  }
  return connection;
    }
    static private void printExceptions(SQLException sqe)
    {
        while (sqe != null)
        {
     System.out.println("Unexpected exception : " +
  "SqlState: " + sqe.getSQLState()  +
  " " + sqe.toString() +
  ", ErrorCode: " + sqe.getErrorCode());
     System.out.println( "======================================" );
            sqe = sqe.getNextException();
        }
    }
}

The safe play here, unfortunately, is to export the data from Excel into something more amenable for loading, such as a CSV file. One can then use the various LOAD TABLE options to load the data, explicitly controlling the data types used for each column of T.