How-To

Bulk Excel Loader


Userlevel 6
Badge
  • Community Manager
  • 22 replies

As of versions 2.4.26 and 3.0.5 the Excel connector only supports a connection to a single Excel file. The Excel connector will eventually support connections to multiple Excel files with a single connection. In the meantime, we suggest using the following solutions.

 

Method 1

The solution consists of automating the creation of multiple connections, one connection to each Excel file. The Excel files are in the following folder structure.

/mnt/hgfs/vm_host_tmp_folder/excel-sample-files/├── Book1.xlsx├── Book2.xlsx├── Book3.xlsx├── Book4.xlsx└── subfolder    ├── cities.xlsx    └── customers.xlsx

 

The first step is to create a file connection to the folder where the files are located.

01-create-datasource.png

 

The connection can also be created through code:

/* Create connection */call SYSADMIN.createConnection(name => 'excel_folder', jbossCliTemplateName => 'ufile', connectionOrResourceAdapterProperties => 'ParentDirectory=/mnt/hgfs/vm_host_tmp_folder/excel-sample-files/,decompressCompressedFiles=false', encryptedProperties => '');;/* Create data source */call SYSADMIN.createDatasource(name => 'excel_folder', translator => 'ufile', modelProperties => 'importer.useFullSchemaName=false', translatorProperties => '', encryptedModelProperties => '', encryptedTranslatorProperties => '');;

 

Once the connection has been created, you can query the folder to list the files.

call "excel_folder.listFiles"("pathAndPattern" => '*.xlsx');;

path

name

lastModifiedTime

lastAccessTime

creationTime

isRegularFile

isDirectory

isSymbolicLink

isOther

size

fileKey

/mnt/hgfs/vm_host_tmp_folder/excel-sample-files

Book1.xlsx

2022-07-18 13:31

2022-07-18 13:31

2022-07-18 13:31

TRUE

FALSE

FALSE

FALSE

75380

-1183557981

/mnt/hgfs/vm_host_tmp_folder/excel-sample-files

Book2.xlsx

2022-07-18 13:29

2022-07-18 13:29

2022-07-18 13:29

TRUE

FALSE

FALSE

FALSE

21332

-1157001664

/mnt/hgfs/vm_host_tmp_folder/excel-sample-files

Book3.xlsx

2022-07-18 13:23

2022-07-18 13:30

2022-07-18 13:23

TRUE

FALSE

FALSE

FALSE

117472

-1128078879

/mnt/hgfs/vm_host_tmp_folder/excel-sample-files

Book4.xlsx

2022-07-18 13:30

2022-07-18 13:30

2022-07-18 13:30

TRUE

FALSE

FALSE

FALSE

9045

-1097344866

 

The file connector does not recursively enter subfolders. To list the contents of the subfolder you can use the following syntax:

call "excel_folder.listFiles"("pathAndPattern" => '\subfolder\*.xlsx');;

path

name

lastModifiedTime

lastAccessTime

creationTime

isRegularFile

isDirectory

isSymbolicLink

isOther

size

fileKey

/mnt/hgfs/vm_host_tmp_folder/excel-sample-files/subfolder cities.xlsx 19:08.0 27:11.0 19:08.0 TRUE FALSE FALSE FALSE 12747 <OBJECT>
/mnt/hgfs/vm_host_tmp_folder/excel-sample-files/subfolder customers.xlsx 21:41.0 31:55.0 21:41.0 TRUE FALSE FALSE FALSE 10639 <OBJECT>

 

The next step is to create a stored procedure to loop over the list of files and create the CONNECTION and DATASOURCE.

create procedure views.createExcelDatasources(       connection_name_prefix string,       path_and_pattern string) asbegin       declare string connection_name;       -- Get the list of Excel files located in the folder. Then loop of the results using a cursor.       loop on (select x.path, x.name from (call "excel_folder.listFiles"("pathAndPattern" => path_and_pattern)) x) as cur       begin              -- Create the new connection name.              connection_name = connection_name_prefix || replace(cur.name, '.', '_');                         -- Using the path and file name, create the CONNECTION and DATASOURCE              call SYSADMIN.createConnection(                     name => connection_name,                     jbossCliTemplateName => 'excel',                     connectionOrResourceAdapterProperties => 'ParentDirectory=' || cur.path,                     encryptedProperties => ''                     ) without return;              call SYSADMIN.createDatasource(                     name => connection_name,                     translator => 'excel',                     modelProperties => 'importer.useFullSchemaName=false,importer.headerRowNumber=1,importer.ExcelFileName="' || cur.name || '"',                     translatorProperties => '',                     encryptedModelProperties => '',                     encryptedTranslatorProperties => ''                     ) without return;       endend;;

 

Once the stored procedure has been created we pass in a prefix and execute the stored procedure using the following call:

call views.createExcelDatasources("connection_name_prefix" => 'excel_file__', "path_and_pattern" => '*.xlsx');;

 

Once the stored procedure completes, refresh the list of data sources.

02-refresh-list.png

 

The new data sources will become available after the refresh, and you can query the contents of the various Excel files.

03-new-datasources.png

 

The files in the subfolder can be added by calling the stored procedure with the following parameters:

call views.createExcelDatasources("connection_name_prefix" => 'excel_file__', "path_and_pattern" => '\subfolder\*.xlsx');;

mceclip0.png

 

Method 2

Method 2 does not persist multiple connections, instead it creates a single connection to a file and imports the data. Then deletes the connection, and repeats the process for each Excel file.

 

This method does require that each Excel file have the same structure and schema, else the insert will fail. It is possible to modify this code to insert each Excel file into it's own table, but that is not covered in this article.

 

 

create procedure views.bulkImportExcelFiles(    connection_name string,    path_and_pattern string,    destination_schema string,    destination_table string) asbegin    declare string destination_schema_table = '"' || destination_schema || '.' || destination_table || '"';    declare string sql_template_insert_into = 'insert into "<<dest_schema>>"."<<dest_table>>" select * from "<<src_schema>>"."<<src_table>>"';    declare string sql_template_select_into = 'select * into "<<dest_schema>>"."<<dest_table>>" from "<<src_schema>>"."<<src_table>>"';    declare string sql_stmt;    -- Get the list of Excel files located in the folder. Then loop of the results using a cursor.    loop on (select x.path, x.name from (call "excel_folder.listFiles"("pathAndPattern" => path_and_pattern)) x) as curExcelFiles    begin        -- If the connection name exists, delete it.        if (exists(SELECT * FROM "SYSADMIN.Connections" as c where connection_name in ("c.fullName", "c.name")))        begin            call "SYSADMIN.removeDataSource"("name" => connection_name) without return;            call "SYSADMIN.removeConnection"("name" => connection_name) without return;        end             -- Using the path and file name, create the CONNECTION and DATASOURCE        call SYSADMIN.createConnection(            name => connection_name,            jbossCliTemplateName => 'excel',            connectionOrResourceAdapterProperties => 'ParentDirectory=' || curExcelFiles.path,            encryptedProperties => ''            ) without return;        call SYSADMIN.createDatasource(            name => connection_name,            translator => 'excel',            modelProperties => 'importer.useFullSchemaName=false,importer.headerRowNumber=1,importer.ExcelFileName="' || curExcelFiles.name || '"',            translatorProperties => '',            encryptedModelProperties => '',            encryptedTranslatorProperties => ''            ) without return;                 -- Create the SQL code to import the Excel file contents into the table        --   Note: An Excel file may contain multiple sheets. Iterate over each one and import the data.        loop on (SELECT "Name" FROM "SYS.Tables" where SchemaName = connection_name) as curExcelSheets        begin            if ((select * from (call "UTILS.tableExists"("tableName" => destination_schema_table)) x) = True)            begin                -- If the target table **does exists**, simply insert the data into the table.                sql_stmt = replace(replace(replace(replace(sql_template_insert_into,                    '<<dest_schema>>', destination_schema),                    '<<dest_table>>', destination_table),                    '<<src_schema>>', connection_name),                    '<<src_table>>', curExcelSheets.Name);                               end            else            begin                -- If the target table does **NOT** exists, use a select into to create the table                sql_stmt = replace(replace(replace(replace(sql_template_select_into,                    '<<dest_schema>>', destination_schema),                    '<<dest_table>>', destination_table),                    '<<src_schema>>', connection_name),                    '<<src_table>>', curExcelSheets.Name);                               end            execute (sql_stmt) without return;        end    endend;;

 


0 replies

Be the first to reply!

Reply