Geometry Layers




Introduction

This chapter specifies the geometry layers that you can add to a chart. Stagraph 1.1 offers 41 geometry layers. The following text describes how to add them to a data visualization, which aesthetic properties you can define, and how to set up positional and other specific properties.

Each geometry layer is a relatively simple object with several adjustable properties (arguments). By combining geometry layers, you can create impressive data visualizations relatively quickly and easily.

In addition to geometry layers, the program also lets you add statistical layers (see the chapter Statistical Layers). To add a new layer to a plot, open the ribbon toolbar tab named Layers.

The geometry layers are divided logically among the four buttons in the Plot Geometries group. The second way to add objects to your chart is to choose them from the contextual menu in the Plot panel.

Unlike the ribbon toolbar, the contextual menu lists all items in alphabetical order. Which method you prefer (context menu or ribbon toolbar) is up to you. I prefer the context menu because it is faster.

As mentioned, the ribbon toolbar divides geometries into four logical groups. The first group contains the geometries under the 1D button. For basic rendering, these objects require only one position aesthetic (x). From this menu you can add geometries for a density plot, dotplot, frequency plot, histogram and barplot.

The second group (under the 2D button) contains geometries based on two position aesthetics (x and y). The menu divides them by the character of their position scales (discrete or continuous). When you click a geometry, it is added to the plot currently displayed in the Plot panel. With these geometries you can create scatter plots and add labels, quantile regression lines, boxplots, violins, bars, lines, areas, error bars or map layers.

The third group (3D) consists of three geometries that must be defined by three aesthetics (x, y, z or x, y and color). With these geometries you can create a contour plot, raster plot or tile plot.

The last group of geometries is found under the Primitives button. It contains simple helper geometries with which you can enrich your data visualization, such as AB lines, curves, horizontal and vertical lines, polygons, paths, line segments or the ribbon geometry.

Geometries that you add to the visualization are listed in the top part of the Plot panel. If you select one of them with the mouse, its properties are displayed in the bottom part of the panel.

The properties of geometry objects are divided into three sections. In the first section (Data / Mapping) you can define the dataset used for the geometry and its individual customizable aesthetics. In the second section (Stat / Position) you can set up a statistical transformation of the dataset and edit position properties. Finally, the last section (Properties) lists other settings that modify how the geometry is displayed.

If you add several geometries to a plot, they are drawn in the order in which they were inserted. To change their rendering order, use the buttons that appear when you select a geometry. You can also remove a geometry with the Remove button. To deactivate an object in the plot without removing it, use the check-box in front of the object icon.

The program (in version 2.0) includes more than 80 types of geometries that you can use for your data visualizations.

Geometry aesthetics

If you select one of the inserted geometry layers in the Plot panel, the bottom part of the panel shows the properties that you can set up. In the first section, named Data / Mapping, you can choose the dataset to use and then define several aesthetic properties.

The list of available aesthetic properties differs slightly for each geometry layer. More complex geometries have more aesthetic properties. Individual aesthetic properties offer identical options across geometries; they are described in a separate chapter, where you can find all the information needed to use them fully. Examples of their setup can also be found in the chapters describing individual geometries.

Position Property

The second section (Stat / Position) contains the position property, which is described in this chapter. You can use the position property to fine-tune the positioning of objects and achieve effects such as dodging, jittering or stacking.

Overall, Stagraph offers seven position options. Each geometry layer has one position set by default. For example, geom_point has this property set to identity, which means that all points are displayed exactly at their defined positions. In contrast, geom_bar has its position set to stack by default, so bars are stacked on top of each other.
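The default positions mentioned above can be checked from the integrated R Console. The following is a minimal sketch, assuming that the geometry names in this chapter correspond to the ggplot2 functions of the same name and that the ggplot2 package is available; the variable names are illustrative only.

```r
library(ggplot2)

# Each layer object records the position adjustment it will use.
point_layer <- geom_point()
bar_layer <- geom_bar()

# TRUE when the default matches what the text describes.
point_is_identity <- inherits(point_layer$position, "PositionIdentity")
bar_is_stack <- inherits(bar_layer$position, "PositionStack")

print(c(point = point_is_identity, bar = bar_is_stack))
```

The same check can be repeated for any other geometry if you are unsure which position it uses by default.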

If you change this property using the Position combo-box, the helper settings for the selected option are displayed. Only the identity position has no additional properties.

The identity position is described first. This position is the default for many geometries, such as geom_point or geom_line. The values are rendered exactly as defined by the positional aesthetic properties (x, y).

The second type of position is jitter. As an example, take the previous chart, where the position property is set to identity and many points are not visible because they overlap. If we change the position to jitter, the geometric objects are moved slightly at random so that they no longer overlap. For this position option you can set two properties (width, height) that define the maximum shift in the range from 0 to 1. A value of 1 means the jitter values will occupy 100% of the implied bins. If you enter 0, the geometry will not move in the given direction. An example is shown in the following figure.
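In R Console terms, the jitter position could look roughly like the sketch below. It assumes ggplot2 and its bundled mpg dataset, which are not part of the original example; the width of 0.3 and the seed are arbitrary choices. Note that in ggplot2 the jitter is applied in both the positive and the negative direction, so the total spread is twice the given value.

```r
library(ggplot2)

# Identity position: many (cyl, hwy) pairs repeat, so points overlap.
p_identity <- ggplot(mpg, aes(x = cyl, y = hwy)) +
  geom_point()

# Jitter horizontally only; height = 0 leaves the y values untouched.
# The seed makes the random shifts reproducible.
p_jitter <- ggplot(mpg, aes(x = cyl, y = hwy)) +
  geom_point(position = position_jitter(width = 0.3, height = 0, seed = 1))

# Inspect the positions that will actually be drawn.
d <- layer_data(p_jitter)
shift <- d$x - round(d$x)  # cyl is an integer, so this recovers the jitter
print(range(shift))
```

Printing p_identity and p_jitter one after the other shows the difference between the two settings.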

Two further position options, stack and fill, place objects on top of each other. The fill option also stacks the objects, but standardizes each stack to a constant height. Examples of these settings are displayed in the following two figures. In addition, if you use these options, you can set up two properties: vjust and reverse.

The vjust property is useful for geoms that have a position (geom_point or geom_line) rather than a dimension (geom_bar, geom_area). You can set it to a numerical value from 0 to 1. A value of 0 aligns objects with the bottom, 0.5 with the middle, and 1 (the default setting) with the top. The reverse property reverses the default stacking order.
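The stack and fill positions, together with vjust, can be sketched in the R Console as follows. This is an illustration under assumptions: it uses ggplot2 directly, a small made-up data frame, and geom_col (bars with given heights), none of which come from the original figures.

```r
library(ggplot2)

# Hypothetical data: two categories, each split into two parts.
df <- data.frame(
  category = c("a", "a", "b", "b"),
  part = c("p", "q", "p", "q"),
  value = c(1, 3, 2, 2)
)

# Stack: bars sit on top of each other; labels are centred in each
# segment by vjust = 0.5.
p_stack <- ggplot(df, aes(category, value, fill = part)) +
  geom_col(position = position_stack()) +
  geom_text(aes(label = value), position = position_stack(vjust = 0.5))

# Fill: the same stacking, rescaled so every stack has height 1.
p_fill <- ggplot(df, aes(category, value, fill = part)) +
  geom_col(position = position_fill())

bars <- layer_data(p_stack, 1)
filled <- layer_data(p_fill, 1)
labels <- layer_data(p_stack, 2)

stack_tops <- tapply(bars$ymax, as.integer(bars$x), max)
fill_tops <- tapply(filled$ymax, as.integer(filled$x), max)
label_y_a <- sort(labels$y[as.integer(labels$x) == 1])
```

With vjust left at its default of 1, the labels would sit at the top edge of each segment instead of its middle.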

You can see the complete documentation for each type of position if you display the integrated R Console and enter a command that begins with a question mark, followed by “position_” and the name of the position option. For example:

?position_stack

Another position that you can select is dodge. This position preserves the vertical position of geometry objects while adjusting their horizontal position. In this way you can see side by side several geometries that would otherwise overlap (e.g. boxplots). This position is often used when the X-axis shows categorical values. An example of this position setting is shown in the following figure.

For each category on the X-axis, two boxplots are shown, split by the color aesthetic property, which is mapped to the dataset variable cut. For this position, you can define the helper property width as a numerical value in the range from 0 to 1. This parameter defines the width of the area within which your geometries are dodged. At 1, geometries occupy the entire width of the available area. If you set 0.5, geometries take up only half of the width and are visually separated (for each X category).
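A small sketch of the dodge position, written for the R Console under the assumption that ggplot2 is available. The data frame is made up for illustration and uses bars rather than the boxplots of the figure, so that the resulting positions are easy to read.

```r
library(ggplot2)

# Hypothetical data: two categories, each with two parts.
df <- data.frame(
  category = c("a", "a", "b", "b"),
  part = c("p", "q", "p", "q"),
  value = c(1, 3, 2, 2)
)

# Dodge keeps the heights and moves the bars sideways within each category.
p_dodge <- ggplot(df, aes(category, value, fill = part)) +
  geom_col(position = position_dodge(width = 0.9))

d <- layer_data(p_dodge)
centres <- as.numeric(d$x)        # horizontal centres after dodging
offsets <- centres - round(centres)  # shift relative to the category position
print(data.frame(centre = centres, height = d$y))
```

Lowering the width argument moves the bars of one category closer to its centre; raising it spreads them further apart.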

The next position option, jitterdodge, combines the jitter and dodge positions into one. Imagine points plotted on a categorical X-axis, where the points are divided according to another categorical variable (e.g. the color aesthetic). If you want to separate them in space (dodge) while ensuring that they do not overlap (jitter), this is exactly the position type to use. An example is given in the following figure.

For this position option, you can adjust three helper properties. The properties jitter width and jitter height define the degree of jitter in the x and y directions. The range of these values is the same as for the jitter position. The dodge width property sets the amount of dodging in the x direction (default 0.75). This position option is frequently used to display boxplots (position_dodge) together with all measurements in the form of points (position_jitterdodge).
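The combination of boxplots and jittered, dodged points described above might be written as follows in the R Console. This is a sketch, assuming ggplot2 and the ToothGrowth dataset that ships with R; the dataset, the jitter width of 0.2 and the seed are illustrative choices, not taken from the original figure.

```r
library(ggplot2)

# Boxplots are dodged by supp; the raw measurements are dodged by the same
# amount and additionally jittered in the x direction only.
p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len, fill = supp)) +
  geom_boxplot(position = position_dodge(width = 0.75), outlier.shape = NA) +
  geom_point(
    position = position_jitterdodge(
      jitter.width = 0.2,
      jitter.height = 0,
      dodge.width = 0.75,
      seed = 1
    ),
    shape = 21
  )

# Positions of the points after the adjustment.
pts <- layer_data(p, 2)
```

Using the same dodge width in both layers keeps each cloud of points aligned with its boxplot.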

Finally, the last type of position is nudge. This type is used when you want to combine geom_text or geom_label with the geom_point geometry. We frequently need to move text labels slightly to resolve overlaps between geometries. In this case, we can choose the nudge position and use the helper properties x and y to set the exact shift of the selected geometry layer in one or both directions. The following figure shows an example of this position option. On the X-axis, we used an integer variable with the point geometry. These points are shifted slightly to the right of their real integer positions.
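A minimal sketch of the nudge position for the R Console, assuming ggplot2. The data frame and the shift of 0.15 are made up for illustration; here the labels are nudged while the points stay in place.

```r
library(ggplot2)

# Hypothetical data with integer x positions.
df <- data.frame(x = 1:3, y = c(2, 4, 3), name = c("a", "b", "c"))

# Points stay at their real positions; labels move 0.15 to the right.
p <- ggplot(df, aes(x, y)) +
  geom_point() +
  geom_text(aes(label = name), position = position_nudge(x = 0.15, y = 0))

lab <- layer_data(p, 2)
print(lab[, c("x", "y", "label")])
```

A nonzero y value shifts the labels vertically in the same way.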

The best way to understand these position options (with all their helper properties) is to use them in practice. I recommend taking several geometries, such as geom_point, geom_bar, geom_text or geom_boxplot, and testing their behavior on real examples. Remember that every geometry layer has one of these position options set by default.

In the following section, we will introduce statistical transformations of data before the final visualization.

Statistical Transformation

Another very important property for data visualization is the selected statistical transformation. This property is located in the Plot panel, in the Stat / Position section. In essence, it determines how the input data are transformed into the form needed for proper visualization.

Every geometry layer has a predefined statistical transformation. The default setting for several geometries is identity. If your geometry uses this option, the data are displayed as they are stored in the dataset, without any statistical preprocessing. An example is geom_point: the final data visualization uses variables from the dataset directly as position aesthetics.

Other geometries have different predefined statistical transformations. For example, geom_bar uses the count option. This means that your data will be displayed as the count (Y-axis) of unique (or binned) X values. Overall, the program offers 32 statistical transformations.

A very important feature of the program is that you can change the predefined statistical transformation. In this way, you can use any geometry and combine geometries into the shape you want, or you can create an entirely new (atypical) data visualization. If a statistical transformation other than identity is set, new variables are created, based on the aesthetic properties used.

For example, if the statistical transformation is set to count, the program calculates a new variable named count, which contains the number of values for each unique X value. These automatically generated variables are called computed values, and you can work with them by mapping them to aesthetic properties. More about their practical use can be found in the chapter Aesthetic Properties – Intro.
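The computed variable count can be inspected and mapped from the R Console. The sketch below assumes ggplot2 (version 3.3.0 or later for after_stat()) and its bundled mpg dataset; how Stagraph exposes the same mapping in its panels is described in the chapter mentioned above.

```r
library(ggplot2)

# geom_bar uses the count transformation by default.
p_count <- ggplot(mpg, aes(x = class)) +
  geom_bar()

# The computed variable appears as a column of the layer data.
counts <- layer_data(p_count)[, c("x", "count")]
print(counts)

# Map an expression of the computed variable to y: share instead of count.
p_share <- ggplot(mpg, aes(x = class, y = after_stat(count / sum(count)))) +
  geom_bar()
shares <- layer_data(p_share)$y
```

The second plot has the same bars as the first one, only rescaled so that their heights sum to 1.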

The following three pictures show the geom_bar geometry with different statistical transformation settings. By default, geom_bar has the statistical transformation set to count. In this case, the program transforms the data into two variables: the unique x values (or bins) and the number of their occurrences in the dataset.

If you change the statistical transformation to density, the program calculates the kernel density estimate for the defined positional aesthetic.

If you set the statistical transformation to bin, the program calculates the number of observations in each bin and displays the counts as bars. An example is shown in the following figure. This statistical transformation is useful if you have continuous data on the X-axis. For discrete data, the count transformation is more appropriate.
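The density and bin settings of geom_bar can be tried in the R Console roughly as follows. This is a sketch under assumptions: it uses ggplot2 and R's built-in iris dataset (column Sepal.Length), and the choice of 10 bins is arbitrary.

```r
library(ggplot2)

# Density: the bar heights follow a kernel density estimate of x.
p_density <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_bar(stat = "density")

# Bin: the continuous x range is cut into bins and observations are counted.
p_bin <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_bar(stat = "bin", bins = 10)

dens <- layer_data(p_density)
binned <- layer_data(p_bin)
print(binned[, c("x", "count")])
```

Every observation falls into exactly one bin, so the counts of the binned layer add up to the number of rows in the dataset.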

In the second example we will show how you can use one statistical transformation in different geometry layers. Take geom_histogram as an example. This geometry is basically a geom_bar layer with bin as the default statistical transformation. This transformation splits the continuous X scale into several segments and counts all observations in each segment, which gives a classical histogram. An example is displayed in the following figure.

What if you want to see the calculated results using different geometries? An example is shown in the following figure. In this case, histogram values are displayed using three geometries: geom_line, geom_step and geom_point. The only thing you must do is map the x position aesthetic to the Sepal_Length variable in each geometry layer and set the statistical transformation to bin. The program then calculates the base values for the histogram and displays them simultaneously using multiple geometries.
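The three-geometry histogram described above could be reproduced in the R Console along these lines. The sketch assumes ggplot2 and R's built-in iris dataset, where the variable is called Sepal.Length rather than Sepal_Length; the number of bins is an arbitrary choice.

```r
library(ggplot2)

# One x mapping, three geometries, all using the bin transformation.
p <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_line(stat = "bin", bins = 10) +
  geom_step(stat = "bin", bins = 10) +
  geom_point(stat = "bin", bins = 10)

# Each layer computes the same histogram values.
line_counts <- layer_data(p, 1)$count
step_counts <- layer_data(p, 2)$count
point_counts <- layer_data(p, 3)$count
print(point_counts)
```

Because the bin settings are identical in all three layers, the line, the steps and the points all pass through the same histogram values.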

In this way, you can combine statistical transformations and geometries. The result can be very unusual and impressive visualizations of your data. The following text briefly introduces all statistical transformations that can be used with geometries.

Other Properties

The first and second sections of layer properties (in the Plot panel) are for selecting the dataset, mapping aesthetic properties, choosing the statistical transformation and adjusting the positioning properties. The last section (named Properties) includes two types of settings. The last three items are the same for all geometries. All other properties in this section are specific to the selected geometry. For example, geom_point has no other specific settings, so its Properties section contains only the three basic items. Conversely, geom_smooth has three additional properties in this section that are specific to this geometry.

The first property that occurs in all geometries is a check-box named remove missing. If this check-box is set to FALSE, missing values are removed with a warning (in the Log panel). If TRUE, missing values are silently removed.

The second property, show in legend, defines whether the given layer is included in the legend. For example, in the previous figure the fill color scale is not displayed in the legend, because we set the property to FALSE.

Finally, there is the Inherit AES check-box. If FALSE, the layer overrides the default aesthetics rather than combining with them. This is most useful for helper functions that define both data and aesthetics and shouldn’t inherit behaviour from the default plot specification, e.g. borders.
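The three common properties correspond, as far as the names suggest, to the ggplot2 arguments na.rm, show.legend and inherit.aes. The following sketch for the R Console rests on that assumption; the data frames and column names are made up for illustration.

```r
library(ggplot2)

# Hypothetical data with missing values, plus a separate annotation dataset.
df <- data.frame(
  x = c(1, 2, NA, 4),
  y = c(2, NA, 3, 5),
  g = c("a", "b", "a", "b")
)
note <- data.frame(px = 1, py = 5, text = "note")

# Missing values are dropped silently and the layer is left out of the legend.
point_layer <- geom_point(na.rm = TRUE, show.legend = FALSE)

# This layer brings its own data and mapping, so it must not inherit the
# plot-level colour = g mapping (the note data has no column g).
note_layer <- geom_text(
  data = note,
  aes(x = px, y = py, label = text),
  inherit.aes = FALSE
)

p <- ggplot(df, aes(x, y, colour = g)) + point_layer + note_layer
note_data <- layer_data(p, 2)
```

With inherit.aes left at TRUE, building the plot would fail for the text layer, because the inherited colour mapping refers to a variable that its dataset does not contain.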

The other properties shown in this section let you set specific properties of the selected geometry layer. For example, in the previous figure we set the width property to a user-defined value. The geom_smooth layer, displayed in the following figure, has several specific properties: the adjustable properties method, formula and confidence interval. More about these specific properties follows in the next chapters, along with real examples of use.
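As a preview, the geom_smooth properties can be sketched in the R Console as follows. The example assumes ggplot2 and its mpg dataset, and it assumes that the confidence interval property maps to the ggplot2 arguments se and level; the linear model is just one possible method.

```r
library(ggplot2)

# Scatter plot with a linear fit and its 95% confidence band.
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE, level = 0.95)

# The smooth layer computes fitted values (y) and band limits (ymin, ymax).
fit <- layer_data(p, 2)
print(head(fit[, c("x", "y", "ymin", "ymax")]))
```

Setting se = FALSE draws the fitted line without the band.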