data_database
See also:
data_excel,
data_csv
If you use large amounts of data, you probably have them stored in databases on your computer or on an external server. Stagraph allows you to import and use this data via the ODBC connection, found behind the data_database function.
Open Database Connectivity (ODBC) is a standard application programming interface (API) for accessing database management systems (DBMS). An application written using ODBC can be ported to other platforms, both on the client and server side, with few changes to the data access code.
ODBC accomplishes DBMS independence by using an ODBC driver as a translation layer between the application and the DBMS. The application uses ODBC functions through an ODBC driver manager with which it is linked, and the driver passes the query to the DBMS. Any ODBC-compliant application can access any DBMS for which a driver is installed. Drivers exist for all major DBMSs, many other data sources like address-book-systems and Microsoft Excel, and even for text or comma-separated values (CSV) files.
The following video shows capabilities of the data_database function.
To import data from databases, the odbcDriverConnect and sqlQuery function (from RODBC package) are used. Defining connection and retrieving data from databases is relatively simple. You need to set two parameters - connection string and sql query.
Function Properties
con, Connection String |
character string that defines the connection to database (e.g. ODBC driver, server ip address, user name or password). See your ODBC documentation for the format. |
Sql, SQL Code |
any valid SQL statement that returns data from connected database. |
You must have the 32-bit drivers when using 32-bit R and the 64-bit drivers when using 64-bit R. Otherwise there will be a cryptic message about a driver not being found.
File Database
How to load data from the database is shown on the MS Access example (file database). To import the data, click the Database button in the ribbon toolbar tab.
Clicking this button will display a dialog for the connection string and the SQL statement definition. In this dialog you define three parameters - Dataset Name, Connection String and SQL Code. At first define the Dataset Name - name of the dataset within the Stagraph project. In the following example, we used the file_DB name.
In the next step setup the Connection String. This is an ODBC connection string that allows you to connect Stagraph to the database. The connection string is entered in a text-box that supports code autocompletion and syntax highlighting. In addition to this enhancement, you can also use the features located on the right side of the text-box - File Name, Samples and Drivers.
Samples offer a menu with examples of connection strings for the 22 most used database systems. After clicking on selected item, the example is inserted into the text-box. In the following example, we added a connection string for the MS Access database.
The Driver is in the text-box always highlighted by the red color. It may happen that you have installed another driver on your computer. List of ODBC Drivers installed on your computer you find under the Drivers item. They are divided into two groups - 32-bit and 64-bit drivers. If you select the driver name in the connection string, you can overwrite it by clicking on selected item in the Drivers menu.
For drivers, be sure to use 32-bit drives with the 32-bit R runtime and 64-bit drivers with the 64-bit R runtime. Stagraph defaultly runs 32-bit R runtime on 32-bit computers and vice versa 64-bit R runtime on 64-bit computers. To use 32-bit drivers on a 64-bit computer you need to run 32-bit R runtime with Stagraph. You can set this in Ribbon Main Menu - Properties - R Connection. Change the R connection from:
"…\Program Files\R4Stagraph\bin\x64\Rterm.exe"
to:
"…\Program Files\R4Stagraph\bin\i386\Rterm.exe".
You can use 32-bit drivers after you restart the Stagraph.
If you connect to a file database, you must define the file path in connection string. In our example, this is the value after the Dbq argument. If you select the sample path in the connection string ("C:/mydatabase.mdb") and press the File Name, the Open File Dialog will appear to help you find the file you want to use. This procedure also correctly format the file path (slash mark).
The R Environment specifics is that the slash mark ("/") is used as the directories separator, instead of the backslash ("\"). If you paste the file name manually, you must also change these separators. If you define it using the Open button, separators are set automatically in correct form.
After entering the connection string, choosing installed driver and setting path to the file, you can type the SQL Code (in the last text-box) to obtain the required data. Like in the previous text-box, syntax highlighting and code autocompletion for SQL is supported. Click the Apply button to load the required data into Stagraph. Now you can work with data retrieved from selected database.
Appart from Apply and Cancel buttons, you also find two more buttons in the dialog. Using the first one - Save, you can store defined connection string and sql script as a file on the computer disk for later use. By the second button - Open, you can import your definition of the connection string into the dialog. This feature is useful when you work with selected databases frequently. But beware of your user name and password. The program export the connection in a simple readable file and anyone can get your credentials if have access to this file. Therefore, delete or replace them before saving.
Another option how to repeatedly use a defined database connection across multiple projects is to use the Partial Import feature (Ribbon Main Menu Button - Project - Partial Import). Using this feature you can copy dataset definition from selected project file to the current one.
Server Database
Importing data from server-type databases is similarly easy. Basically, the only difference is that you do not define the database location as path in computer directory structure but as using server IP Address. All other settings remain equal.
In the following example, we create a dataset named server_DB. We will create this dataset by connecting to the PostgreSQL database. From this database, through the SQL statement, we get all the data from the analyz_vody_chem table.
Clicking the Apply button your data will be loaded and created dataset will be listed in the Project Panel. All variables are displayed in the bottom part of the panel.
If you double-click on the dataset in Project Panel, it will appear in the Properties Panel. In this panel, you can later change the database connection properties. If you click on the data_database item, two adjustable parameters - con and SQL are displayed.
Parameter con contains the connection string that you use for connection to the database. The SQL parameter contains the SQL script that loads required data from the database. You can edit these two properties.
Clicking the right mouse button on the SQL text-box will reveal the option to view an external SQL script editor that includes features such as syntax highlighting and code autocompletion. After pressing the Apply button, the edited script is inserted back into the text-box.
Using the data_database function, you can get data from different types of databases using the ODBC connection. To use this connection, you need to have an ODBC driver installed on your computer. You can find the list of installed drivers in the input dialog. Following connection and data retrieval (via SQL script) is a relatively fast and simple task.