Purpose
The Generic Database Reader handler supports reading tabular data from a wide range of databases and file formats because it utilizes various abstract driver layers such as the ODBC, OLE DB, and the Windows native SQL Server driver. In theory, any type of database and many types of data files are accessible as long as the related ODBC or OLE DB driver is installed. Note this handler only utilizes 64-bits drivers only.
Apart from specifying the driver-type, a “database connection string” also needs to be provided to instruct the handler on how to connect to the target database (or data file) instance. The connection string varies depending on the database (or data file), and the driver being used. There are many connection-string examples that can be found online. The target table also needs to be specified as one of the parameters.
By default, the handler reads all the columns and all the rows from the targetted table. The data range can be limited by specifying the columns in the parameters and providing a ‘selection-filter‘ to limit the result to the only qualified rows.
Internally, when reading from a database table, the handler constructs an SQL select-statement, that is “SELECT ({column-1, column-2, .., column-N}) from {table-name} WHERE ({selection-filter});”, where “column-1 to N” and the ‘selection-filter’ are gathered from the provided parameters, and the hander executes this statement via the established connection.
For reading from a table which has dynamically changing data, the handler supports repetitive reading by specifying a repeat-interval. (Note, without specifying a repeating interval, the reader handler will read once then it will stop.) With repeating specified, the handler provides two ways of controlling the reading so only the “newly added” data is read at every repeat –
The first way of control depends on the target table has an incremental integer ‘identity’ column (e.g. the primary key column). Once this column is specified in the parameters, the reader remembers the last-read value of this column at each repeat, so next time it only reads those rows with greater value in this column.
The second way of control depends on the target table has a writable column for a “last-processed” flag which must be a ‘date-time’ type and have an initial value NULL. If such a column is designed in the table, and specified in the parameters, at each read interval, the handler will first mark all new rows with the current timestamp (where new records initially have the column value as NULL), then retrieve these columns. After the reading, because the ‘flag column’ of these rows is marked with a timestamp and is no longer NULL, they won’t be read in the next read interval.
You may utilize either (or both) of the two ways for controlling repetitive reading. Note, such control can effectively read any data being added to the table, but it cannot indicate if any data is deleted from the target table.
Tip: for reading data from multiple database tables, consider using a database view to combine columns from these tables, and include an incremental ID field in the view for the reader to track newly added records.
Parameters
The parameter database-driver-type defines which underlying driver is used for accessing the database or the data file. Type “odbc” indicates the targeted data table is access via an installed ODBC data source; type “oledb” indicates the data table is accessible as an OLE DB “data provider” via an installed OLE DB driver; type “mssql” indicates the target table is accessible via a native SQL Server driver installed in the Windows OS.
Note for ODBC and OLE DB drivers, the 64-bits version is required, as this handler is targeted for 64-bits Windows platforms such as Windows 7/10 and Windows Server 2012 and above. Also, if your database requires some non-standard SQL syntax, such as square brackets [] in Microsoft SQL or back quotes “ for MySQL, you can supply a suffix to the driver parameter to indicate the type (“brand”) of the database that is under the ODBC or OLE DB layer, e.g. “oledb-mssql” for accessing MS Access via OLE DB, or “odbc-mysql” for accessing a MySQL database via ODBC.
The connection-string and table-name are required for locating and connecting to the database and data file, and for specifying the target table where the data will be written into.
The incremental-id-column and the read-complete-flagging-column parameters are used for reading only newly added data from the target table since the last read. If ‘incremental-id-column’ is specified, the maximum value of this column is saved in the LAST_READ_ROW_ID file in the [F.] folder of the node, so next read will use this stored value as the reference and only read data with new increased values. (Tip, deleting this file will reset the reading back from the beginning.)
The ‘read-complete-flagging-column’ parameter, if specified, is used for the handler to mark the records as ‘having been read’ by setting the column with the current timestamp. It is assumed this column is exclusively used by the handler for the reading control purpose. (Tip, setting the column value back to NULL will cause the reader handler to re-read the data once again .)
The columns’ data type of the target table is default to ‘varchar’. The ‘column-spec‘ parameter is for specifying columns with other data types, eg. ‘integer’, ‘decimal’, or ‘date-time’. When it’s a ‘date-time’ format, an additional ‘formatter’ pattern is also required for the handler to correctly parse the input value. For varchar data-type, the ‘column-spec’ parameter is also used for capping the maximum length of the input varchar data – the handler will automatically truncate the input data if the limit is exceeded.
Name | Remark |
database-driver-type | Valid values are “mssql/odbc/oledb/oledb-mssql/odbc-mssql/odbc-mysql”. |
connection-string | a database connection string specifying how to connect to the target database. |
table-name | target table name. |
column-names | OPTIONAL. specifying the expected column names in the tabular data and the target table. If not supplied, the columns will be retrieved from the target table in the database automatically. |
selection-filter | This parameter will be included as part of the constructed SELECT statement, so it must conform to SQL’s WHERE-clause standard. eg “OrderID in (12,13,15)” |
repeat-interval-sec | specify how frequent the repeating read is. |
incremental-id-column | Specifies a column for incrementable reading-control. It is expected the column has a unique incremental integer value for each new row. The integer primary key column of a table is normally suitable for this purpose. |
read-complete-flagging-column | Specifies the ‘flagging’ column on the target table for incremental reading-control. This column can be any name but is expected to be a ‘date-time’ column and writable by the reader. Newly added records are expected to have an initial value NULL in this column. |
column-spec | can have multiple instances, for specifying the data type for columns that are not varchar, or for restricting the max data length of a varchar column. |
Note, to prevent the “SQL injection” attack, certain SQL keywords are forbidden to be used as part of the “table-name” and the “column-names” parameters.
Config Example
The following example illustrates using the handler reading from a MySQL database table called “DischargeRegistry”. The repeat-interval-sec parameter specifies the reading repeats every 5 seconds
The effectively retrieved records will be determined by the following SQL query (MySQL syntax)-
SELECT `ID`,`Patient ID`,`Last Name`, ... ,`Discharge DT`
FROM DischargeRegistry
WHERE (Language='English')
AND ID > {value in the LAST_READ_ROW_ID file}
AND `Record Process DT` IS NULL
Note the example illustrates the use of ‘date-time’ fields formatting: eg. both the Dob and the Discharge DT are defined as ‘date-time’ data type as in the ‘column-spec’ parameters, with different output formatting patterns.
<Handler>GenericDatabaseReader</Handler>
<Parameters>
<Parameter>
<!-- specifies the underlying data-access method eg. OleDb, ODBC, or native MS-SQL -->
<Name>database-driver-type</Name>
<Value>odbc-mysql</Value> <!-- mssql/odbc/oledb/odbc-mssql/odbc-mysql -->
</Parameter>
<Parameter>
<!-- database connection details -->
<Name>connection-string</Name>
<Value>Provider=System.Data.Odbc;DSN=MySQL-DB-test;Uid=User1;Pwd=password1;</Value>
</Parameter>
<Parameter>
<Name>table-name</Name>
<Value>DischargeRegistry</Value>
</Parameter>
<Parameter>
<Name>column-names</Name>
<Value>ID,Patient ID,Last Name,First Name,DOB,Sex,Language,Ward ID,Attending Doctor ID,Attending Doctor Name,Discharge DT</Value>
</Parameter>
<!-- this will be added to the SQL's where-clause .. -->
<Parameter>
<Name>selection-filter</Name>
<Value>Language='English'</Value>
</Parameter>
<Parameter>
<Name>incremental-id-column</Name>
<Value>ID</Value>
</Parameter>
<Parameter>
<Name>read-complete-flagging-column</Name>
<Value>Record Process DT</Value>
</Parameter>
<Parameter>
<Name>repeat-interval-sec</Name>
<Value>5</Value>
</Parameter>
<Parameter>
<Name>column-spec</Name>
<Value>DOB;date-time;yyyyMMdd</Value>
</Parameter>
<Parameter>
<Name>column-spec</Name>
<Value>Discharge DT;date-time;yyyyMMdd HH:mm:ss</Value>
</Parameter>
</Parameters>
See Also
The Tabular SQL Database Writer is a high-performance, transactional database writer specifically for MS SQL Server.
The Generic Database Writer handler does the writing to a database or a data file, via ODBC and OLE DB.