---
author: "Simon Garnier"
title: "2 - Building a track table"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{2 - Building a track table}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

`trackdf` can handle multiple types of tracking data (in particular those 
generated by GPS units and video-tracking software) and multiple data frame 
classes (`base::data.frame`, `tibble::tibble`, and `data.table::data.table`). 
This is a design choice meant to accommodate the data processing pipelines of as 
many users as possible. It lets you use your favorite data manipulation paradigm 
(base `R`, `dplyr`/`tidyverse`, or `data.table`) while still standardizing the 
data format across studies and applications. 

A consequence of that versatility, however, is that building a "track table" 
(the name we give to the structure that will hold your tracking data) requires a 
little bit of extra work from you (but just a little bit). This vignette covers 
building a track table from raw data generated by two common sources: automated 
video-tracking software and GPS collars. 

## 2.1 - Anatomy of a track table

At its core, a track table is just a wrapper around a data frame structure, as 
defined by one of the three main data frame classes in `R`: `base::data.frame`, 
`tibble::tibble`, and `data.table::data.table`. Which data frame class is used 
underneath a track table is entirely up to you and depends on your preference 
for one framework or another. `trackdf` will remember that choice and do its 
best to maintain it throughout your data analysis pipeline.

A track table is a specialized version of a data frame structure aimed 
specifically at storing tracking data, that is, the positions over time of one 
or more individuals. To do that, `trackdf` imposes a few constraints on the 
construction of a track table compared to a traditional data frame. First, a 
track table must have at least the following 4 named columns:

+ `id`: which contains the identity of the individual being tracked as character 
  strings;
+ `t`: which contains the time of each observation as date-time `POSIXct` 
  objects;
+ `x` and `y`: which contain the positions of the observations, as numeric 
  values, along each of the axes of a Euclidean space (e.g., GPS coordinates or 
  the pixel coordinates output by video-tracking software);
+ `z`: an optional column similar to `x` and `y` that can be used in the case of
  3-dimensional trajectories. 

You can then add as many other columns as you want to store other data relevant 
to your work, but these 4 columns (plus the optional `z` column) are required in 
a track table object.
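
To make these requirements concrete, here is a minimal base `R` sketch of a 
plain data frame laid out the way a track table expects, together with a small 
hypothetical helper, `has_track_columns`, that checks for the required columns 
and their types. The values are toy numbers made up for illustration, the helper 
is not part of `trackdf`, and the resulting object is still an ordinary data 
frame, not a track table.

```r
# Toy data, made up for illustration: two individuals observed twice each
df <- data.frame(
  id = c("A", "A", "B", "B"),
  t = as.POSIXct(c("2019-03-24 12:55:23", "2019-03-24 12:55:24",
                   "2019-03-24 12:55:23", "2019-03-24 12:55:24"),
                 tz = "UTC"),
  x = c(10.5, 11.2, 40.1, 39.8),
  y = c(20.0, 20.4, 5.3, 6.1),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of trackdf): TRUE if the 4 required columns
# exist with the types described above; the optional `z` column is not checked
has_track_columns <- function(d) {
  required <- c("id", "t", "x", "y")
  all(required %in% names(d)) &&
    is.character(d$id) &&
    inherits(d$t, "POSIXct") &&
    is.numeric(d$x) && is.numeric(d$y)
}

has_track_columns(df)                       # all required columns present
has_track_columns(df[, c("id", "x", "y")])  # the time column `t` is missing
```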

In addition to these columns, a track table contains two additional attributes 
that are necessary for certain functions of the package: 

+ `proj`: which contains information about the coordinate reference system in 
  which the coordinates are projected. This is mostly useful for geographic data 
  as captured by GPS units, for instance. `trackdf` can use that information to 
  automatically reproject the data into other coordinate reference systems, for 
  instance for working with GIS data. For video-tracking data and other tracking 
  systems that do not output geographic data, this can be set to `NA`. 
+ `type`: which contains information about the class of data frame stored in the 
  track table object. This is mostly required for maintaining the data frame 
  class when the track table object is manipulated using `dplyr`'s functions. 
  It's mostly irrelevant from a user's point of view. 
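
Attributes are ordinary `R` metadata attached to an object, separate from its 
columns. The sketch below illustrates the general mechanism on a plain data 
frame using base `R`'s `attr()` and `attributes()`. It only mimics the two 
attributes described above and makes no claim about how `trackdf` sets them 
internally; on an actual track table, `attributes(tt)` should let you inspect 
whatever metadata it carries.

```r
# A plain data frame standing in for a track table (toy values)
df <- data.frame(id = "A", x = 1, y = 2, stringsAsFactors = FALSE)

# Attach metadata mimicking the two attributes described above; this only
# illustrates the mechanism, not trackdf's internal implementation
attr(df, "proj") <- NA            # no geographic projection (e.g., video data)
attr(df, "type") <- "data.frame"  # class of the underlying data frame

# Attributes live alongside the columns without adding new ones
names(df)
attr(df, "type")
attributes(df)[c("proj", "type")]
```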
  
Sounds complicated? Don't worry, `trackdf` provides a function to build track
tables with just a little bit of input from you. See the rest of the vignette 
below. 

## 2.2 - Building a track table from video-tracking data

Most video-tracking software generates outputs with information about the 
identity of each tracked individual, its position in some form of Euclidean 
space (using pixel coordinates or coordinates relative to the dimensions of the 
experimental setup), and the time of each observation (e.g., the frame number in 
a video). These outputs can also contain other information relevant to your 
work, and we will see here how to import it into a track table as well.

First, let's load some data that was generated using the 
[`trackR`](https://github.com/swarm-lab/trackR) video-tracking software: 

```{r message=FALSE, warning=FALSE}
raw <- read.csv(system.file("extdata/video/01.csv", package = "trackdf"))
print(raw, max = 10 * ncol(raw))
```

This data frame contains 8 columns. The positions are stored in the `x` and `y` 
columns as pixel coordinates. Time is stored in the `frame` column as the frame 
number in the video the data was collected from. The identity of each tracked 
individual is stored in `track_fixed` (the `track` column contains the 
identities before manual inspection and correction; `id` can be ignored for the 
purpose of this tutorial). 

From this raw data, you can create a track table using the `track` function as 
follows: 

```{r}
library(trackdf)

tt <- track(x = raw$x, y = raw$y, t = raw$frame, id = raw$track_fixed)
print(tt, max = 10 * ncol(tt))
```

`track` outputs a few warnings, all related to the time component that we 
provided. Indeed, we gave it frame numbers that `track` doesn't know how to 
convert to date-time `POSIXct` objects; it therefore defaulted to using the 
current time as the start of the experiment, UTC as the time zone, and 1 second 
as the time between two consecutive observations. We can, however, help `track` 
by providing the missing information through the `origin` (start of the 
experiment), `tz` (time zone), and `period` (time between two successive 
observations) parameters of the function: 

```{r}
tt <- track(x = raw$x, y = raw$y, t = raw$frame, id = raw$track_fixed, 
            origin = "2019-03-24 12:55:23", 
            period = "0.04S", # 1/25 of a second
            tz = "America/New_York")
print(tt, max = 10 * ncol(tt))
```
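
For intuition about what these three parameters do, the base `R` sketch below 
converts a few frame numbers into date-times by hand. It assumes that the first 
frame is recorded exactly at `origin` and that frames are `period` seconds 
apart; this is a simplified, hypothetical model of the conversion, and `track` 
may handle the first frame or the period string differently.

```r
# Hypothetical frame numbers from a 25 frames-per-second video
frames <- c(1, 2, 3, 26)

origin <- as.POSIXct("2019-03-24 12:55:23", tz = "America/New_York")
period <- 0.04  # seconds between two consecutive frames (1/25 s)

# Assumption: the first frame corresponds to `origin`
times <- origin + (frames - min(frames)) * period

# Elapsed time since the origin, in seconds
elapsed <- as.numeric(difftime(times, origin, units = "secs"))
elapsed
```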

If you would like to include in the track table some of the additional data 
contained in your raw data, simply pass them to `track` as extra named 
arguments, just as you would add extra columns when creating a data frame. For 
instance, let's include the `ignore` column from the raw data set: 

```{r}
tt <- track(x = raw$x, y = raw$y, t = raw$frame, id = raw$track_fixed, 
            ignore = raw$ignore,
            origin = "2019-03-24 12:55:23", 
            period = "0.04S", # 1/25 of a second
            tz = "America/New_York")
print(tt, max = 10 * ncol(tt))
```

Finally, `track` defaults to using `base::data.frame` as its data frame class for
storing the data. If you prefer to work with `tibble::tibble` or 
`data.table::data.table`, you can specify this in the `track` function as 
follows.

For `tibble::tibble`: 

```{r}
tt <- track(x = raw$x, y = raw$y, t = raw$frame, id = raw$track_fixed, 
            ignore = raw$ignore,
            origin = "2019-03-24 12:55:23", 
            period = "0.04S", # 1/25 of a second
            tz = "America/New_York",
            table = "tbl")
print(tt)
```

For `data.table::data.table`: 

```{r}
tt <- track(x = raw$x, y = raw$y, t = raw$frame, id = raw$track_fixed, 
            ignore = raw$ignore,
            origin = "2019-03-24 12:55:23", 
            period = "0.04S", # 1/25 of a second
            tz = "America/New_York",
            table = "dt")
print(tt)
```

---

## 2.3 - Building a track table from GPS data

Building a track table from geographic data follows similar principles, except 
that `track` also expects to receive information about the coordinate reference 
system the data is using. You can pass that information to `track` using the 
`proj` parameter of the function. But first, let's load some data that was 
generated by a GPS collar worn by a goat in Namibia: 

```{r message=FALSE, warning=FALSE}
raw <- read.csv(system.file("extdata/gps/02.csv", package = "trackdf"))
print(raw, max = 10 * ncol(raw))
```

`track` uses `sf::st_crs` to interpret information about coordinate reference 
systems. Therefore, any format accepted by `sf::st_crs` can be used to specify 
the coordinate reference system in `track`. For data generated using GPS units, 
the character string "+proj=longlat" is often all that's needed. 
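
As an illustration of that flexibility, the sketch below, which assumes the 
`sf` package is installed, builds the same geographic (longitude/latitude, 
WGS84) coordinate reference system from an EPSG code given as a number, from an 
EPSG code given as a string, and from a PROJ string. Since `track` relies on 
`sf::st_crs`, any of these inputs could in principle be passed to its `proj` 
parameter; which one you use is mostly a matter of preference.

```r
library(sf)

# Three ways of describing longitude/latitude coordinates on the WGS84 datum
crs_numeric <- st_crs(4326)
crs_string  <- st_crs("EPSG:4326")
crs_proj4   <- st_crs("+proj=longlat +datum=WGS84")

# All three describe a geographic (unprojected) coordinate system
sapply(list(crs_numeric, crs_string, crs_proj4), st_is_longlat)
```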

We can then create our GPS-based track table as follows:

```{r paged.print=FALSE}
tt <- track(x = raw$lon, y = raw$lat, t = paste(raw$date, raw$time), id = 1, 
            proj = "+proj=longlat", tz = "Africa/Windhoek")
print(tt, max = 10 * ncol(tt))
```

Note that because our raw data already contains the dates and times of the 
observations, we can simply combine them with `paste` and pass the result to 
`track`, which will interpret it automatically. 
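
The combination step itself is plain base `R`. The sketch below reproduces it on 
two made-up date and time strings (their ISO 8601 format is an assumption; the 
columns of the GPS file may be formatted differently) and converts the result 
explicitly with `as.POSIXct`, which is one way to check that your timestamps 
parse as expected before handing them to `track`.

```r
# Made-up date and time strings, for illustration only
date_str <- c("2015-09-10", "2015-09-10")
time_str <- c("07:00:00", "07:00:01")

# Combine them into "YYYY-MM-DD HH:MM:SS" strings (paste uses a space by default)
stamp <- paste(date_str, time_str)

# Parse them in the local time zone of the study site
timestamps <- as.POSIXct(stamp, tz = "Africa/Windhoek")

# Time elapsed between the two observations, in seconds
as.numeric(difftime(timestamps[2], timestamps[1], units = "secs"))
```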

Everything else works similarly to what was shown in the previous section about
video-tracking data. The tutorial about manipulating data stored in a track 
table is provided in a separate vignette.