Data Import

Home   « User Interface   Data Wrangling »

Basic introduction of data import features is displayed in the following video-tutorial. This tutorial shows the data import workflow in Stagraph. Imported data can be used immediately in your data visualizations or you can edit (preprocess) them.

Data import is obviously the most important feature of every program that is focused on work with data. Stagraph is a visual interface that is built on the top of the R Runtime. This means that if you can import your data to R Runtime, you can also work with them in the Stagraph. Overall, there are only two limitations.

First is the dataset size limit and the second limit is the data structure. You can work with datasets that can be loaded into your computer’s memory. In real practice (if you don’t work with Big Data), for 90 % of cases you will not reach the maximum dataset size limit. This limit can be solved in several ways (e.g. larger RAM, 64-bit R Runtime, in-database preprocessing, sampling, partial import, ...).

In the term of data structure, you can load input data to R Runtime in various forms. Stagraph (in 2.0 version) allows you to work with data that has the character of tables (rows and columns). Your data is automatically stored in the data.frame object. data.frame objects Stagraph recognizes and allows you to use them in a visual interface.

As mentioned, your data can be imported from practically any source, such as files, databases or web services. In the Stagraph, you can import your data using a built-in visual interface or through custom R scripts. Individual data import functions are described in the following subchapters.

Create Dataset

For the data import are used functions in the ribbon toolbar tab Home (Input Data group). Overall, there are 5 buttons prepared for the data import. Each of them can be used in a different case (data source type).

Using the Create function, you can prepare your dataset directly inside the program in built-in spreadsheet. Under the File button, you find options for importing data from three file types - Excel files, CSV files and DBF files. With the Database button, you can import data from various databases via an ODBC driver. Under the Rterm button, you find functions for importing data from the background-running R Terminal. You can load a sample dataset or maps (spatial data from maps package). Finally, using the Custom button, you can import any type of data (that are not directly supported by the visual interface) through the custom R scripts - e.g. JSON files, XML files, or data from external web services. You can find a number of examples on the Internet.

List of imported dataset you can see in the Project Panel. If you click on the selected dataset, a list of dataset variables (columns) is displayed in at the bottom panel part. In addition, in the document area you can display the Data Preview Document. In this window, you can display datasets preview. This window displays maximally the first 200 records (rows).

Datasets In Project Panel

As described above, imported datasets are listed in the Project Panel. If you click on the selected one, a list of variables is displayed at the bottom panel part. This list using the icon and font color to indicate the variable data type. Additional help information is displayed in tooltips if you focus your mouse cursor on the selected dataset or variable. For a dataset, the tooltip displays the amount of memory it uses, the number of variables (cols) and records (rows). In the case of the variables the tooltip displays the variable data type.

If you click on the selected dataset, three icons appears at the end of the line. Using the first one, you can execute the dataset in the background-running R Terminal. With the second icon you can do the same process, but the result will be displayed in the Data Preview Document. Finally, using the last icon, you can remove the dataset from the project (and also from the R runtime).

Additionally, if you click by right mouse button on the dataset, a contextual menu is displayed in which you find other useful functions.

Using the Fix function, you can edit selected dataset directly (manually) in the Data Editor. However, these edits are not saved within the project, so if you open the current project from the disk again, all your edits will not be retained. This editor serves only for temporary, instant and minor data edits.

If you click on the second context menu function - Data, an interactive table with your data will be displayed. This table is rendered in the browser. Its use is appropriate if you are working with smaller dataset (up to 1,000 rows). With larger dataset, its generating and interactive features will be slow.

This table contains several features, such as showing / hiding columns, adjusting the number of visible rows, table printing or data filtering / sorting by selected columns. Created table you can save and use in an online environment (copy / paste its source code).

If you want to explore larger dataset, use the third context menu function - Statistical Summary. This function also creates an interactive table. The difference is that the table does not contains primary data, but its statistical summation. Your variables are displayed in rows and their statistical values in columns, such as the number of records, median, mad, min, max values, range and others.

The third interactive data browsing feature is represented with an interactive Pivot Table. In displayed environment you can create different kinds of interactive tables, heatmaps and graphs. Its options are much clearer from the previous video-tutorial.

Using the following Delete menu item, you can remove selected dataset from your project. The deleted dataset is also removed from the background-running R Runtime. The last three functions in the contextual menu serves to copy the dataset in various forms.

With the Copy Dataset function you can create an identical copy of your dataset. This feature is useful if you have created a dataset that loads data from a file on disk.

Imagine the following situation. You have loaded data from CSV file. On this dataset are applied data wrangling functions. In the next step of your work you want to create the same dataset (identical data structure and the same edits), but with the second CSV file. In this case, it is faster and more efficient to create a copy of current dataset and only change the file name in the Properties Panel. All functions that you apply on the first dataset will be also used for the new one.

The function New Dataset Based on This serves for a different purpose. Imagine the import of very large dataset from database or web service. After the import you apply several data wrangling functions. Each time you click on the Execute button, all the data import and processing functions are executed. This can significantly slow down your work. For this reason, you can divide the data import and data processing into two dataset objects. In the first dataset, you define only the data import from a source (database or web service). Then you click on the New Dataset Based on This item in the contextual menu. A new dataset will be created as simple link to the first dataset. Following data wrangling functions you will apply to the second dataset. If you divide your project in this way, your work will be faster and more efficient - the raw data and processed data will be loaded in computer memory simultaneously.

The last function - Copy Script is used to generate a script in the R language that defines your dataset. This script includes the data import and all subsequent processing steps. Created scripts you can use outside the Stagraph in an environments that allows you to execute R scripts (such as R Console, R Studio, Visual Studio and many other applications that have R integration). Described function is available only in the Professional Edition of the program.

Finally, the last two functions you can find in the list of dataset variables. These functions (hidden in the contextual menu) are here from the “historical” reasons and in the following versions will be redesigned and better integrated with the Stagraph interface.

If you right-click by mouse on the selected variable, you will see a context menu containing two functions - Summary and Structure. With these features you can display summary information about variable and information about its structure. These informations are displayed in the output-textbox of the built-in R Console.

The first example shows the output of the Summary function applied on the "country_etc" variable.

The second example shows the output from the Structure function for the "country_etc" variable.

We described practically all the features of Project Panel (for the list of datasets). Additional functionality for working with datasets can be found in the Properties Panel. If you double-click on the selected dataset in Project Panel, its definition will appear in the Properties Panel, where you can continue to work.

Datasets In Properties Panel

In addition to Project Panel, you can continue to work with datasets in the Properties Panel. If you double-click on selected dataset in the Project Panel, its details will be displayed in the Properties Panel. This panel is divided into three sections. At the top is the Header containing the dataset name along with the three buttons. Below the header is a List of Functions that are applied to the dataset. The first item in list is always a data import function. Subsequently, this list can contain functions for data cleaning and data wrangling. If you select function from the list, its options (arguments) are displayed.

The following figure shows an example of a dataset that was created by a reference (data_terminal) to a sample dataset (diamonds) in the R Terminal. Finally, a preview of the created dataset is displayed in the Data Preview Document.

If you want to see other dataset (or an object from the Visuals group) in the Properties Panel, right-click on the dataset icon in the Panel Header. From displayed contextual menu you can choose the object that you want to display (and edit).

In addition to this functionality, you find three buttons in the Panel Header. Using the first one, you can execute the dataset in the background-running R Runtime. Using the second button, you can do the same process but the result will be displayed in the Data Preview Document.

Finally, using the last button - Save, you can export the created and pre-processed dataset locally to a disk in the form of CSV File (tab-separated CSV file). This feature is useful if you are using the Stagraph for data import, data cleaning and statistical processing.

As has been said, in the bottom part of the Properties Panel is possible to modify the individual functions parameters. In the following example, we can replace the sample dataset. Using the data_terminal function you can attach a dataset that you created manually through the built-in R Console, or you can refer to a sample dataset. In the function properties, you can often use features hidden in contextual menus. In the following illustration, the contextual menu offers a selection of three commonly used sample datasets (iris, diamond and mtcars). When you click on the context menu item, the selected dataset is attached.

Functions that are hidden in the contextual menu you can use also in the list of applied functions. For this list you can use two functions Create Copy and Copy Script. Using the Create Copy function, you can create an identical copy of selected function that will be applied to the dataset. With the Copy Script function you can copy the selected function in the form of R script.

In the Properties Panel you can apply data wrangling functions to your dataset. These functions you can add from the ribbon toolbar tab named Data. With these functions you can clean, edit, join, filter, reshape and combine your datasets.

Data wrangling functions are also accessible directly from the Properties Panel, if you right-click on empty space in the list of applied functions.

These functions are in detail described in the next chapter - Data Wrangling. This chapter explains an extensive options of the data pre-processing in Stagraph.

All data import and data wrangling functions are fully functional in the Free version. The Stagraph can be considered a full-featured free tool for data cleaning and data wrangling. You are not limited in data sources and dataset size.