In statistics, data is the starting point for any analysis. It is therefore essential to know how to master operations such as importing and exporting data, changing types, identifying individuals with missing values, concatenating factor levels, etc. These different notions are therefore presented in this sheet, which aims to be both concise and satisfactory in practice. We focus on the dplyr package, an up-to-date tool to manipulate data.

1. Reading data from files

Functions read.table and read.csv allows to import data from .txt or .csv files.

path <- file.path("../DATA", "piscines.csv") #first: directory, second: file
piscines <- read.csv(path)
class(piscines)
summary(piscines)

The are many important options in read.csv:

  • sep: the field separation character
  • dec: the character used for decimal points
  • header: a logical value indicating whether the file contains the names of the variables as its first line
  • row.names: a vector of row names (to identify indivuals if needed)
  • na.strings: a character vector of strings which are to be interpreted as NA values.

Exercise 1

  1. Import the dataset in file mydata.csv.

  2. Import correctly the dataset in file mydata.csv (use sep, dec and row.names)

  3. Import the dataset mydata2.csv

  4. This dataset contains missing data (collected with a dot). We need to use na.strings to take into account of this situation. Import correctly this dataset.

  5. Call the levels of sex: woman and man (use levels).

readr package makes it easy to read many types of rectangular data, including csv, tsv and fixed width files. It uses read_csv and read_table instead of read.csv and read.table. We can also use this package with the menu Import Dataset of Rstudio.

Exercice 2 (Combine tables)

We consider the following tables (in tibble format which needs dplyr or tidyverse package):

library(tidyverse)
df1 <- tibble(name=c("Mary","Peter","John","July"),age=c(18,25,21,43))
df2 <- tibble(name=c("Zac","Julian"),age=c(23,48))
df3 <- tibble(size=c(154,178,182,134,142),name1=c("Peter","Mary","July","John","stef"))
df1
df2
df3

We propose to combine these tables with some of the “join functions” of the tidyverse package (left_join, full_join for instance). Look at the cheat sheet Data transformation with dplyr (help+cheatsheets+…)

  1. Join df1 with df2 with bind_rows and compute the mean of the variable age. We call df the new data table.

  2. Join df with df3 with full_join

  3. Do the same with inner_join

  4. Explain the differences between full_join and inner_join

Exercise 3

Read the piscine.csv file with readr (use read_csv or click on Import Dataset)

2. dplyr package

dplyr is a powerful R-package to transform and summarize tabular data with rows and columns. We can find information here or in this cheatsheet.

We can manipulate data with classical matrix operations. For instance, we can obtain Longitude and Latitude for swimming pools with Longitude more than 153 with

piscines[piscines$Longitude>153,c("Longitude","Latitude")]

dplyr offers a more friendly syntax

library(tidyverse) #library(dplyr)
piscines %>% select(Longitude,Latitude) %>% filter(Longitude>153)

Code is more efficient and easier to read.

dplyr contains a grammar with the following verbs:

  • select(): select columns (variables)
  • filter(): filter rows (individuals)
  • arrange(): re-order or arrange rows
  • mutate(): create new columns (new variables)
  • summarise(): summarise values (compute statistical summaries)
  • group_by(): allows for group operations in the “split-apply-combine” concept

2.1 select() verb

It allows to select variables (columns):

select(df, VAR1, VAR2, ...)

For instance

coord <- select(piscines, Latitude, Longitude)
head(piscines, n=2)
head(coord, n=2)

We can use helper functions (begins_with, end_with, contains, matches) for more complex selections based on the name of the variables

coord <- select(piscines, ends_with("tude"))
head(coord, n=2)

2.2 mutate() verb

It allows to create new variables in the dataset:

mutate(df, NEW.VAR = expression(VAR1, VAR2, ...))

For instance

df <- mutate(piscines, phrase=paste("Swimming pool", Name, "is located at the address", Address))
select(df,phrase)

We can create many variables

mutate(piscines,
       phrase = paste("Swimming pool", Name, "is located at the address", Address),
       unused = Longitude + Latitude
)

2.3 filter() verb

It allows to select individuals (rows):

filter(df, TEST)

For instance

p1 <- filter(piscines, Longitude>153.02)
select(p1,Longitude)

or (we select swimming pool without “Pool” in the name)

df <- filter(piscines, !grepl("Pool", Name))
select(df,Name)

or (we select swimming pool with longitude more than 153.02 and latitude less than -27.488)

p2 <- filter(piscines, Longitude>153.02 | Latitude < -27.488)
p2 <- select(p2, Longitude, Latitude)
p2

We use slice to select individuals by index:

piscines %>% slice(5:8)

2.4 arrange() verb

It allows to sort a dataset according to a variable:

arrange(df, VAR) #increasing sort

or

arrange(df, desc(VAR)) #decreasing sort

For instance

arrange(piscines, Longitude)

or

arrange(piscines, desc(Longitude))

2.5 summarise() verb

More complex… It allows to define new datasets from the original dataset. New dataset often includes statistical summaries for the original dataset

  1. mean() ;
  2. median() ;
  3. IQR() ;
  4. var().

For instance

summarise(piscines,
          mean_long = mean(Longitude),
          med_lat = median(Latitude),
          min_lat = min(Latitude),
          sum_long = sum(Longitude)
)

You can also look at summarise_all, summarise_at (help(summarise_all)). dplyr also proposes the following functions (very useful in statistics):

  1. n(): number of lines (individuals of the dataset)
  2. n_distinct(): number of distinct elements of a vector
  3. fisrt() and last(): first and last element of a vector

For instance, we obtain the number of swimming pool in the dataset ant the longitude of the last swimming pool in the dataset with

summarise(piscines,n())
summarise(piscines,last(Longitude))

2.6 Operations on the verbs

We can of course concatenate verbs. For instance, we obtain the names of the swimming pool with the higher Longitude with:

p1 <- arrange(piscines,desc(Longitude)) #decreasing sort
summarise(p1,first(Name)) #extract the first

The pipe operator %>% makes the code more readable:

piscines %>% arrange(desc(Longitude)) %>% summarise(first(Name))

2.7 Group data with ‘Group_by’

group_by allows to apply operation for group of data. For instance, we want to compute mean longitudes for swimming pools of High and Low latitude (it does not make sense!). We first add a variable lat_disc which allows to discern high and low latitudes.

lat_mean <- summarise(piscines,mean(Latitude))
pisc1 <- mutate(piscines,lat_dis=factor(Latitude>as.numeric(lat_mean)))
levels(pisc1$lat_dis) <- c("Low","High")

We can now compute with group_by the mean longitudes for the 2 groups

group_by(pisc1,lat_dis) %>% summarise(mean_long=mean(Longitude))

Exercise 3

We consider the iris dataset

data(iris)

Answer to the following questions with dplyr

  1. Select the variables Petal.Width and Species

  2. Select individuals from Versicolor and Virginica groups (you can use symbol | for condition or)

  3. Calculate the number of setosa with summarise

  4. Calculate the mean of Petal Width for the versicolor specie

  5. Add in the dataset the variable Sum_Petal which contains the sum of Petal.Width and Sepal.Width

  6. Calculate the mean and the variance of Sepal.Length for each Species (use group_by)

Exercise 4

We consider the hflights dataset which contains informations about flights departing from Houston airports IAH (George Bush Intercontinental) and HOU (Houston Hobby):

library(hflights)
hflights <- tbl_df(hflights)

Variable Unique Carrier provides a code which identify the carrier:

lut1 <- c("AA" = "American", "AS" = "Alaska", "B6" = "JetBlue", "CO" = "Continental",
         "DL" = "Delta", "OO" = "SkyWest", "UA" = "United", "US" = "US_Airways", 
         "WN" = "Southwest", "EV" = "Atlantic_Southeast", "F9" = "Frontier", 
         "FL" = "AirTran", "MQ" = "American_Eagle", "XE" = "ExpressJet", "YV" = "Mesa")

We can also specify the variable CancellationCode as follows:

lut2 <- c("A" = "carrier", "B" = "weather", "C" = "FFA", "D" = "security", "E" = "not cancelled")

We change the table hflights to better explain variables Unique Carrier et CancellationCode.

hflights1 <- hflights
hflights1$UniqueCarrier <- lut1[hflights1$UniqueCarrier]
hflights1$CancellationCode[hflights1$CancellationCode==""] <- "Z"
hflights1$CancellationCode <- lut2[hflights1$CancellationCode]

From now on, we work with the table hflights1.

  1. Use several different ways to select variables from Origin to Cancelled

  2. Select variables DepTime, ArrTime, ActualElapsedTime, AirTime, ArrDelay and DepDelay. Use helper function contains().

  3. Add a variable ActualGroundTime which corresponds to ActualElapsedTime minus AirTime

  4. Add a variable AverageSpeed (=Distance/AirTime) and order the dataset according to this variable

  5. Select flights to JFK

  6. Calculate the number of fligths to JFK

  7. Create a summary of hflights1 which contains:
    • n : the total number of flights;
    • n_dest: the total number of destinations;
    • n_carrier : the total numbers of carriers.
  8. Create a summary of hflights1 from the carrier American which contains
    • the total number of flights ;
    • the total number of cancelled flights ;
    • the mean of the variable ArrDelay (be carefull with the NA).
  9. Calculate for each carrier
    • the total number of flights
    • the mean of the variable AirTime
  10. Order the carriers according to the mean of departure delays

Exercise 5

We consider datasets about the 4 tennis major tournaments in 2013. Dataset as well as description of the variables are available at https://archive.ics.uci.edu/ml/datasets/Tennis+Major+Tournament+Match+Statistics.

We first focus on the men tournaments of Roland Garros.

  1. Import the dataset.

  2. Display the name of Federer’s opponents.

  3. Display the names of the semi-finalists.

  4. How many points were played on average per match? It will be necessary to add in the table a variable corresponding to the number of points of each match (verb mutate).

  5. Calculate the average number of aces per match.

  6. Calculate the average number of aces per match for each round of the tournament.

  7. Calculate the total number of double faults in the tournament.

  8. Import the dataset which contains results of Wimbledon (always for men).

  9. Merge the 2 datasets. Add a variable Tournaments which indicates the Tournament (RG or WIMB). You can use bind_rows with the option .id.

  10. Display the match or Federer for each tournament.

  11. Make of comparison of the average number of aces per match for each round of the tournament between Roland Garros and Wimbledon.

LS0tCnRpdGxlOiAiTWFuaXB1bGF0aW5nIGRhdGEiCm91dHB1dDogCiAgICBodG1sX25vdGVib29rOgogICAgICB0b2M6IHllcwogICAgICB0b2NfZmxvYXQ6IHllcwotLS0KCgoKCkluIHN0YXRpc3RpY3MsIGRhdGEgaXMgdGhlIHN0YXJ0aW5nIHBvaW50IGZvciBhbnkgYW5hbHlzaXMuIEl0IGlzIHRoZXJlZm9yZSBlc3NlbnRpYWwgdG8ga25vdyBob3cgdG8gbWFzdGVyIG9wZXJhdGlvbnMgc3VjaCBhcyBpbXBvcnRpbmcgYW5kIGV4cG9ydGluZyBkYXRhLCBjaGFuZ2luZyB0eXBlcywgaWRlbnRpZnlpbmcgaW5kaXZpZHVhbHMgd2l0aCBtaXNzaW5nIHZhbHVlcywgY29uY2F0ZW5hdGluZyBmYWN0b3IgbGV2ZWxzLCBldGMuIFRoZXNlIGRpZmZlcmVudCBub3Rpb25zIGFyZSB0aGVyZWZvcmUgcHJlc2VudGVkIGluIHRoaXMgc2hlZXQsIHdoaWNoIGFpbXMgdG8gYmUgYm90aCBjb25jaXNlIGFuZCBzYXRpc2ZhY3RvcnkgaW4gcHJhY3RpY2UuIFdlIGZvY3VzIG9uIHRoZSAqKmRwbHlyKiogcGFja2FnZSwgYW4gdXAtdG8tZGF0ZSB0b29sIHRvIG1hbmlwdWxhdGUgZGF0YS4gCgoKIyAxLiBSZWFkaW5nIGRhdGEgZnJvbSBmaWxlcwoKRnVuY3Rpb25zICoqcmVhZC50YWJsZSoqIGFuZCAqKnJlYWQuY3N2KiogYWxsb3dzIHRvIGltcG9ydCBkYXRhIGZyb20gKi50eHQqIG9yICouY3N2KiBmaWxlcy4KCgoKCmBgYHtyfQpwYXRoIDwtIGZpbGUucGF0aCgiLi4vREFUQSIsICJwaXNjaW5lcy5jc3YiKSAjZmlyc3Q6IGRpcmVjdG9yeSwgc2Vjb25kOiBmaWxlCnBpc2NpbmVzIDwtIHJlYWQuY3N2KHBhdGgpCmNsYXNzKHBpc2NpbmVzKQpzdW1tYXJ5KHBpc2NpbmVzKQpgYGAKVGhlIGFyZSBtYW55IGltcG9ydGFudCBvcHRpb25zIGluICoqcmVhZC5jc3YqKjoKCiogKipzZXAqKjogdGhlIGZpZWxkIHNlcGFyYXRpb24gY2hhcmFjdGVyCiogKipkZWMqKjogdGhlIGNoYXJhY3RlciB1c2VkIGZvciBkZWNpbWFsIHBvaW50cwoqICoqaGVhZGVyKio6IGEgbG9naWNhbCB2YWx1ZSBpbmRpY2F0aW5nIHdoZXRoZXIgdGhlIGZpbGUgY29udGFpbnMgdGhlIG5hbWVzIG9mIHRoZSB2YXJpYWJsZXMgYXMgaXRzIGZpcnN0IGxpbmUKKiAqKnJvdy5uYW1lcyoqOiBhIHZlY3RvciBvZiByb3cgbmFtZXMgKHRvIGlkZW50aWZ5IGluZGl2dWFscyBpZiBuZWVkZWQpCiogKipuYS5zdHJpbmdzKio6IGEgY2hhcmFjdGVyIHZlY3RvciBvZiBzdHJpbmdzIHdoaWNoIGFyZSB0byBiZSBpbnRlcnByZXRlZCBhcyBOQSB2YWx1ZXMuCiogLi4uCgojIyMgRXhlcmNpc2UgMQoKMS4gSW1wb3J0IHRoZSBkYXRhc2V0IGluIGZpbGUgKm15ZGF0YS5jc3YqLgoKCjIuIEltcG9ydCAqKmNvcnJlY3RseSoqIHRoZSBkYXRhc2V0IGluIGZpbGUgKm15ZGF0YS5jc3YqICh1c2UgKnNlcCosICpkZWMqIGFuZCAqcm93Lm5hbWVzKikKCjMuIEltcG9ydCB0aGUgZGF0YXNldCAqbXlkYXRhMi5jc3YqCgo0LiBUaGlzIGRhdGFzZXQgY29udGFpbnMgbWlzc2luZyBkYXRhIChjb2xsZWN0ZWQgd2l0aCBhIGRvdCkuIFdlIG5lZWQgdG8gdXNlICoqbmEuc3RyaW5ncyoqIHRvIHRha2UgaW50byBhY2NvdW50IG9mIHRoaXMgc2l0dWF0aW9uLiBJbXBvcnQgY29ycmVjdGx5IHRoaXMgZGF0YXNldC4KCjUuIENhbGwgdGhlIGxldmVscyBvZiAqc2V4KjogKip3b21hbioqIGFuZCAqKm1hbioqICh1c2UgKipsZXZlbHMqKikuCgoKCioqcmVhZHIqKiBwYWNrYWdlIG1ha2VzIGl0IGVhc3kgdG8gcmVhZCBtYW55IHR5cGVzIG9mIHJlY3Rhbmd1bGFyIGRhdGEsIGluY2x1ZGluZyBjc3YsIHRzdiBhbmQgZml4ZWQgd2lkdGggZmlsZXMuIEl0IHVzZXMgKnJlYWRfY3N2KiBhbmQgKnJlYWRfdGFibGUqIGluc3RlYWQgb2YgKnJlYWQuY3N2KiBhbmQgKnJlYWQudGFibGUqLiBXZSBjYW4gYWxzbyB1c2UgdGhpcyBwYWNrYWdlIHdpdGggdGhlIG1lbnUgKipJbXBvcnQgRGF0YXNldCoqIG9mIGBSc3R1ZGlvYC4KCgojIyMgRXhlcmNpY2UgMiAoQ29tYmluZSB0YWJsZXMpCgpXZSBjb25zaWRlciB0aGUgZm9sbG93aW5nIHRhYmxlcyAoaW4gdGliYmxlIGZvcm1hdCB3aGljaCBuZWVkcyAqKmRwbHlyKiogb3IgKip0aWR5dmVyc2UqKiBwYWNrYWdlKToKCmBgYHtyIG1lc3NhZ2U9RkFMU0UsIHdhcm5pbmc9RkFMU0V9CmxpYnJhcnkodGlkeXZlcnNlKQpkZjEgPC0gdGliYmxlKG5hbWU9YygiTWFyeSIsIlBldGVyIiwiSm9obiIsIkp1bHkiKSxhZ2U9YygxOCwyNSwyMSw0MykpCmRmMiA8LSB0aWJibGUobmFtZT1jKCJaYWMiLCJKdWxpYW4iKSxhZ2U9YygyMyw0OCkpCmRmMyA8LSB0aWJibGUoc2l6ZT1jKDE1NCwxNzgsMTgyLDEzNCwxNDIpLG5hbWUxPWMoIlBldGVyIiwiTWFyeSIsIkp1bHkiLCJKb2huIiwic3RlZiIpKQpkZjEKZGYyCmRmMwpgYGAKCldlIHByb3Bvc2UgdG8gY29tYmluZSB0aGVzZSB0YWJsZXMgd2l0aCBzb21lIG9mIHRoZSAiam9pbiBmdW5jdGlvbnMiIG9mIHRoZSAqKnRpZHl2ZXJzZSoqIHBhY2thZ2UgKCpsZWZ0X2pvaW4qLCAqZnVsbF9qb2luKiBmb3IgaW5zdGFuY2UpLiBMb29rIGF0IHRoZSBjaGVhdCBzaGVldCAqKkRhdGEgdHJhbnNmb3JtYXRpb24gd2l0aCBkcGx5cioqIChoZWxwK2NoZWF0c2hlZXRzKy4uLikKCjEuIEpvaW4gKipkZjEqKiB3aXRoICoqZGYyKiogd2l0aCAqYmluZF9yb3dzKiBhbmQgY29tcHV0ZSB0aGUgbWVhbiBvZiB0aGUgdmFyaWFibGUgKiphZ2UqKi4gV2UgY2FsbCAqKmRmKiogdGhlIG5ldyBkYXRhIHRhYmxlLgoKCjIuIEpvaW4gKipkZioqIHdpdGggKipkZjMqKiB3aXRoICoqZnVsbF9qb2luKioKCjMuIERvIHRoZSBzYW1lIHdpdGggKippbm5lcl9qb2luKioKCjQuIEV4cGxhaW4gdGhlIGRpZmZlcmVuY2VzIGJldHdlZW4gKipmdWxsX2pvaW4qKiBhbmQgKippbm5lcl9qb2luKioKCgojIyMgRXhlcmNpc2UgMwoKUmVhZCB0aGUgKnBpc2NpbmUuY3N2KiBmaWxlIHdpdGggKipyZWFkcioqICh1c2UgKipyZWFkX2NzdioqIG9yIGNsaWNrIG9uICoqSW1wb3J0IERhdGFzZXQqKikKCgojIDIuIGRwbHlyIHBhY2thZ2UKCmRwbHlyIGlzIGEgcG93ZXJmdWwgUi1wYWNrYWdlIHRvIHRyYW5zZm9ybSBhbmQgc3VtbWFyaXplIHRhYnVsYXIgZGF0YSB3aXRoIHJvd3MgYW5kIGNvbHVtbnMuIFdlIGNhbiBmaW5kIGluZm9ybWF0aW9uIFtoZXJlXShodHRwczovL3NwYXJrLnJzdHVkaW8uY29tL2RwbHlyLmh0bWwpIG9yIGluIHRoaXMgW2NoZWF0c2hlZXRdKGh0dHBzOi8vd3d3LnJzdHVkaW8uY29tL3dwLWNvbnRlbnQvdXBsb2Fkcy8yMDE1LzAyL2RhdGEtd3JhbmdsaW5nLWNoZWF0c2hlZXQucGRmKS4KCldlIGNhbiBtYW5pcHVsYXRlIGRhdGEgd2l0aCBjbGFzc2ljYWwgbWF0cml4IG9wZXJhdGlvbnMuIEZvciBpbnN0YW5jZSwgd2UgY2FuIG9idGFpbiAqKkxvbmdpdHVkZSoqIGFuZCAqKkxhdGl0dWRlKiogZm9yIHN3aW1taW5nIHBvb2xzIHdpdGggKkxvbmdpdHVkZSogbW9yZSB0aGFuIDE1MyB3aXRoCgpgYGB7cn0KcGlzY2luZXNbcGlzY2luZXMkTG9uZ2l0dWRlPjE1MyxjKCJMb25naXR1ZGUiLCJMYXRpdHVkZSIpXQpgYGAKCioqZHBseXIqKiBvZmZlcnMgYSBtb3JlIGZyaWVuZGx5IHN5bnRheAoKYGBge3IgbWVzc2FnZT1GQUxTRSwgd2FybmluZz1GQUxTRX0KbGlicmFyeSh0aWR5dmVyc2UpICNsaWJyYXJ5KGRwbHlyKQpwaXNjaW5lcyAlPiUgc2VsZWN0KExvbmdpdHVkZSxMYXRpdHVkZSkgJT4lIGZpbHRlcihMb25naXR1ZGU+MTUzKQpgYGAKCkNvZGUgaXMgbW9yZSBlZmZpY2llbnQgYW5kIGVhc2llciB0byByZWFkLgoKKipkcGx5cioqIGNvbnRhaW5zIGEgZ3JhbW1hciB3aXRoIHRoZSBmb2xsb3dpbmcgdmVyYnM6CgoqICoqc2VsZWN0KCkqKjoJc2VsZWN0IGNvbHVtbnMgKHZhcmlhYmxlcykKKiAqKmZpbHRlcigpKio6CWZpbHRlciByb3dzIChpbmRpdmlkdWFscykKKiAqKmFycmFuZ2UoKSoqOglyZS1vcmRlciBvciBhcnJhbmdlIHJvd3MKKiAqKm11dGF0ZSgpKio6CWNyZWF0ZSBuZXcgY29sdW1ucyAobmV3IHZhcmlhYmxlcykKKiAqKnN1bW1hcmlzZSgpKio6CXN1bW1hcmlzZSB2YWx1ZXMgKGNvbXB1dGUgc3RhdGlzdGljYWwgc3VtbWFyaWVzKQoqICoqZ3JvdXBfYnkoKSoqOglhbGxvd3MgZm9yIGdyb3VwIG9wZXJhdGlvbnMgaW4gdGhlIOKAnHNwbGl0LWFwcGx5LWNvbWJpbmXigJ0gY29uY2VwdAoKIyMgMi4xIGBzZWxlY3QoKWAgdmVyYgoKSXQgYWxsb3dzIHRvIHNlbGVjdCB2YXJpYWJsZXMgKGNvbHVtbnMpOgpgYGB7ciwgZXZhbD1GQUxTRSwgaW5jbHVkZT1UUlVFfQpzZWxlY3QoZGYsIFZBUjEsIFZBUjIsIC4uLikKYGBgCgpGb3IgaW5zdGFuY2UKYGBge3J9CmNvb3JkIDwtIHNlbGVjdChwaXNjaW5lcywgTGF0aXR1ZGUsIExvbmdpdHVkZSkKaGVhZChwaXNjaW5lcywgbj0yKQpoZWFkKGNvb3JkLCBuPTIpCmBgYAoKV2UgY2FuIHVzZSAqKmhlbHBlciBmdW5jdGlvbnMqKiAoKipiZWdpbnNfd2l0aCoqLCAqKmVuZF93aXRoKiosICoqY29udGFpbnMqKiwgKiptYXRjaGVzKiopIGZvciBtb3JlIGNvbXBsZXggc2VsZWN0aW9ucyBiYXNlZCBvbiB0aGUgbmFtZSBvZiB0aGUgdmFyaWFibGVzCgpgYGB7cn0KY29vcmQgPC0gc2VsZWN0KHBpc2NpbmVzLCBlbmRzX3dpdGgoInR1ZGUiKSkKaGVhZChjb29yZCwgbj0yKQpgYGAKCiMjIDIuMiBgbXV0YXRlKClgIHZlcmIKCkl0IGFsbG93cyB0byBjcmVhdGUgbmV3IHZhcmlhYmxlcyBpbiB0aGUgZGF0YXNldDoKCmBgYHtyLCBldmFsPUZBTFNFLCBpbmNsdWRlPVRSVUV9Cm11dGF0ZShkZiwgTkVXLlZBUiA9IGV4cHJlc3Npb24oVkFSMSwgVkFSMiwgLi4uKSkKYGBgCkZvciBpbnN0YW5jZQpgYGB7cn0KZGYgPC0gbXV0YXRlKHBpc2NpbmVzLCBwaHJhc2U9cGFzdGUoIlN3aW1taW5nIHBvb2wiLCBOYW1lLCAiaXMgbG9jYXRlZCBhdCB0aGUgYWRkcmVzcyIsIEFkZHJlc3MpKQpzZWxlY3QoZGYscGhyYXNlKQpgYGAKV2UgY2FuIGNyZWF0ZSBtYW55IHZhcmlhYmxlcwpgYGB7cn0KbXV0YXRlKHBpc2NpbmVzLAogICAgICAgcGhyYXNlID0gcGFzdGUoIlN3aW1taW5nIHBvb2wiLCBOYW1lLCAiaXMgbG9jYXRlZCBhdCB0aGUgYWRkcmVzcyIsIEFkZHJlc3MpLAogICAgICAgdW51c2VkID0gTG9uZ2l0dWRlICsgTGF0aXR1ZGUKKQpgYGAKCgojIyAyLjMgYGZpbHRlcigpYCB2ZXJiCgpJdCBhbGxvd3MgdG8gc2VsZWN0IGluZGl2aWR1YWxzIChyb3dzKTogCmBgYHtyLCBldmFsPUZBTFNFLCBpbmNsdWRlPVRSVUV9CmZpbHRlcihkZiwgVEVTVCkKYGBgCgpGb3IgaW5zdGFuY2UKYGBge3J9CnAxIDwtIGZpbHRlcihwaXNjaW5lcywgTG9uZ2l0dWRlPjE1My4wMikKc2VsZWN0KHAxLExvbmdpdHVkZSkKYGBgCm9yICh3ZSBzZWxlY3Qgc3dpbW1pbmcgcG9vbCB3aXRob3V0ICJQb29sIiBpbiB0aGUgbmFtZSkKYGBge3J9CmRmIDwtIGZpbHRlcihwaXNjaW5lcywgIWdyZXBsKCJQb29sIiwgTmFtZSkpCnNlbGVjdChkZixOYW1lKQpgYGAKb3IgKHdlIHNlbGVjdCBzd2ltbWluZyBwb29sIHdpdGggbG9uZ2l0dWRlIG1vcmUgdGhhbiAxNTMuMDIgYW5kIGxhdGl0dWRlIGxlc3MgdGhhbiAtMjcuNDg4KQoKYGBge3J9CnAyIDwtIGZpbHRlcihwaXNjaW5lcywgTG9uZ2l0dWRlPjE1My4wMiB8IExhdGl0dWRlIDwgLTI3LjQ4OCkKcDIgPC0gc2VsZWN0KHAyLCBMb25naXR1ZGUsIExhdGl0dWRlKQpwMgpgYGAKCldlIHVzZSAqKnNsaWNlKiogdG8gc2VsZWN0IGluZGl2aWR1YWxzIGJ5IGluZGV4OgpgYGB7cn0KcGlzY2luZXMgJT4lIHNsaWNlKDU6OCkKYGBgCgoKIyMgMi40IGBhcnJhbmdlKClgIHZlcmIKCkl0IGFsbG93cyB0byBzb3J0IGEgZGF0YXNldCBhY2NvcmRpbmcgdG8gYSB2YXJpYWJsZToKYGBge3IsIGV2YWw9RkFMU0UsIGluY2x1ZGU9VFJVRX0KYXJyYW5nZShkZiwgVkFSKSAjaW5jcmVhc2luZyBzb3J0CmBgYApvcgpgYGB7ciwgZXZhbD1GQUxTRSwgaW5jbHVkZT1UUlVFfQphcnJhbmdlKGRmLCBkZXNjKFZBUikpICNkZWNyZWFzaW5nIHNvcnQKYGBgCkZvciBpbnN0YW5jZQpgYGB7cn0KYXJyYW5nZShwaXNjaW5lcywgTG9uZ2l0dWRlKQpgYGAKb3IKYGBge3J9CmFycmFuZ2UocGlzY2luZXMsIGRlc2MoTG9uZ2l0dWRlKSkKYGBgCgojIyAgMi41IGBzdW1tYXJpc2UoKWAgdmVyYgoKTW9yZSBjb21wbGV4Li4uIEl0IGFsbG93cyB0byBkZWZpbmUgbmV3IGRhdGFzZXRzIGZyb20gdGhlIG9yaWdpbmFsIGRhdGFzZXQuIE5ldyBkYXRhc2V0IG9mdGVuIGluY2x1ZGVzIHN0YXRpc3RpY2FsIHN1bW1hcmllcyBmb3IgdGhlIG9yaWdpbmFsIGRhdGFzZXQKCjEuIGBtZWFuKClgIDsKMS4gYG1lZGlhbigpYCA7CjEuIGBJUVIoKWAgOwoxLiBgdmFyKClgLgoKRm9yIGluc3RhbmNlCmBgYHtyfQpzdW1tYXJpc2UocGlzY2luZXMsCiAgICAgICAgICBtZWFuX2xvbmcgPSBtZWFuKExvbmdpdHVkZSksCiAgICAgICAgICBtZWRfbGF0ID0gbWVkaWFuKExhdGl0dWRlKSwKICAgICAgICAgIG1pbl9sYXQgPSBtaW4oTGF0aXR1ZGUpLAogICAgICAgICAgc3VtX2xvbmcgPSBzdW0oTG9uZ2l0dWRlKQopCmBgYAoKWW91IGNhbiBhbHNvIGxvb2sgYXQgKipzdW1tYXJpc2VfYWxsKiosICoqc3VtbWFyaXNlX2F0KiogKGBoZWxwKHN1bW1hcmlzZV9hbGwpYCkuICoqZHBseXIqKiBhbHNvIHByb3Bvc2VzIHRoZSBmb2xsb3dpbmcgZnVuY3Rpb25zICh2ZXJ5IHVzZWZ1bCBpbiBzdGF0aXN0aWNzKToKCjEuIGBuKClgOiBudW1iZXIgb2YgbGluZXMgKGluZGl2aWR1YWxzIG9mIHRoZSBkYXRhc2V0KQoxLiBgbl9kaXN0aW5jdCgpYDogbnVtYmVyIG9mIGRpc3RpbmN0IGVsZW1lbnRzIG9mIGEgdmVjdG9yCjEuIGBmaXNydCgpYCBhbmQgYGxhc3QoKWA6IGZpcnN0IGFuZCBsYXN0IGVsZW1lbnQgb2YgYSB2ZWN0b3IKCkZvciBpbnN0YW5jZSwgd2Ugb2J0YWluIHRoZSBudW1iZXIgb2Ygc3dpbW1pbmcgcG9vbCBpbiB0aGUgZGF0YXNldCBhbnQgdGhlIGxvbmdpdHVkZSBvZiB0aGUgbGFzdCBzd2ltbWluZyBwb29sIGluIHRoZSBkYXRhc2V0IHdpdGgKYGBge3J9CnN1bW1hcmlzZShwaXNjaW5lcyxuKCkpCnN1bW1hcmlzZShwaXNjaW5lcyxsYXN0KExvbmdpdHVkZSkpCmBgYAoKCgojIyAyLjYgT3BlcmF0aW9ucyBvbiB0aGUgdmVyYnMKCldlIGNhbiBvZiBjb3Vyc2UgY29uY2F0ZW5hdGUgdmVyYnMuIEZvciBpbnN0YW5jZSwgd2Ugb2J0YWluIHRoZSBuYW1lcyBvZiB0aGUgc3dpbW1pbmcgcG9vbCB3aXRoIHRoZSBoaWdoZXIgTG9uZ2l0dWRlIHdpdGg6CmBgYHtyfQpwMSA8LSBhcnJhbmdlKHBpc2NpbmVzLGRlc2MoTG9uZ2l0dWRlKSkgI2RlY3JlYXNpbmcgc29ydApzdW1tYXJpc2UocDEsZmlyc3QoTmFtZSkpICNleHRyYWN0IHRoZSBmaXJzdApgYGAKClRoZSAqKnBpcGUqKiBvcGVyYXRvciAqKiU+JSoqIG1ha2VzIHRoZSBjb2RlIG1vcmUgcmVhZGFibGU6CmBgYHtyfQpwaXNjaW5lcyAlPiUgYXJyYW5nZShkZXNjKExvbmdpdHVkZSkpICU+JSBzdW1tYXJpc2UoZmlyc3QoTmFtZSkpCmBgYAoKIyMgMi43IEdyb3VwIGRhdGEgd2l0aCAnR3JvdXBfYnknIAoKKipncm91cF9ieSoqIGFsbG93cyB0byBhcHBseSBvcGVyYXRpb24gZm9yIGdyb3VwIG9mIGRhdGEuIEZvciBpbnN0YW5jZSwgd2Ugd2FudCB0byBjb21wdXRlIG1lYW4gbG9uZ2l0dWRlcyBmb3Igc3dpbW1pbmcgcG9vbHMgb2YgSGlnaCBhbmQgTG93IGxhdGl0dWRlIChpdCBkb2VzIG5vdCBtYWtlIHNlbnNlISkuIFdlIGZpcnN0IGFkZCBhIHZhcmlhYmxlICoqbGF0X2Rpc2MqKiB3aGljaCBhbGxvd3MgdG8gZGlzY2VybiBoaWdoIGFuZCBsb3cgbGF0aXR1ZGVzLgoKYGBge3J9CmxhdF9tZWFuIDwtIHN1bW1hcmlzZShwaXNjaW5lcyxtZWFuKExhdGl0dWRlKSkKcGlzYzEgPC0gbXV0YXRlKHBpc2NpbmVzLGxhdF9kaXM9ZmFjdG9yKExhdGl0dWRlPmFzLm51bWVyaWMobGF0X21lYW4pKSkKbGV2ZWxzKHBpc2MxJGxhdF9kaXMpIDwtIGMoIkxvdyIsIkhpZ2giKQpgYGAKCldlIGNhbiBub3cgY29tcHV0ZSB3aXRoICoqZ3JvdXBfYnkqKiB0aGUgbWVhbiBsb25naXR1ZGVzIGZvciB0aGUgMiBncm91cHMKCmBgYHtyfQpncm91cF9ieShwaXNjMSxsYXRfZGlzKSAlPiUgc3VtbWFyaXNlKG1lYW5fbG9uZz1tZWFuKExvbmdpdHVkZSkpCmBgYAoKCiMjIyBFeGVyY2lzZSAzCgpXZSBjb25zaWRlciB0aGUgaXJpcyBkYXRhc2V0CmBgYHtyfQpkYXRhKGlyaXMpCmBgYAoKQW5zd2VyIHRvIHRoZSBmb2xsb3dpbmcgcXVlc3Rpb25zIHdpdGggKipkcGx5cioqCgoxLiBTZWxlY3QgdGhlIHZhcmlhYmxlcyAqKlBldGFsLldpZHRoKiogYW5kICoqU3BlY2llcyoqCgoyLiBTZWxlY3QgaW5kaXZpZHVhbHMgZnJvbSAqKlZlcnNpY29sb3IqKiBhbmQgKipWaXJnaW5pY2EqKiBncm91cHMgKHlvdSBjYW4gdXNlIHN5bWJvbCB8IGZvciBjb25kaXRpb24gKipvcioqKQoKMy4gQ2FsY3VsYXRlIHRoZSBudW1iZXIgb2YgKipzZXRvc2EqKiB3aXRoICoqc3VtbWFyaXNlKioKCjQuIENhbGN1bGF0ZSB0aGUgKiptZWFuKiogb2YgKlBldGFsIFdpZHRoKiBmb3IgdGhlIHZlcnNpY29sb3Igc3BlY2llCgo1LiBBZGQgaW4gdGhlIGRhdGFzZXQgdGhlIHZhcmlhYmxlICoqU3VtX1BldGFsKiogd2hpY2ggY29udGFpbnMgdGhlIHN1bSBvZiAqKlBldGFsLldpZHRoKiogYW5kICoqU2VwYWwuV2lkdGgqKgoKNi4gQ2FsY3VsYXRlIHRoZSBtZWFuIGFuZCB0aGUgdmFyaWFuY2Ugb2YgKipTZXBhbC5MZW5ndGgqKiBmb3IgZWFjaCAqKlNwZWNpZXMqKiAodXNlICoqZ3JvdXBfYnkqKikKCiMjIyBFeGVyY2lzZSA0CgpXZSBjb25zaWRlciB0aGUgKipoZmxpZ2h0cyoqIGRhdGFzZXQgd2hpY2ggY29udGFpbnMgaW5mb3JtYXRpb25zIGFib3V0IGZsaWdodHMgZGVwYXJ0aW5nIGZyb20gSG91c3RvbiBhaXJwb3J0cyBJQUggKEdlb3JnZSBCdXNoIEludGVyY29udGluZW50YWwpIGFuZCBIT1UgKEhvdXN0b24gSG9iYnkpOgoKYGBge3J9CmxpYnJhcnkoaGZsaWdodHMpCmhmbGlnaHRzIDwtIHRibF9kZihoZmxpZ2h0cykKYGBgCgpWYXJpYWJsZSBgVW5pcXVlIENhcnJpZXJgIHByb3ZpZGVzIGEgY29kZSB3aGljaCBpZGVudGlmeSB0aGUgY2FycmllcjoKYGBge3J9Cmx1dDEgPC0gYygiQUEiID0gIkFtZXJpY2FuIiwgIkFTIiA9ICJBbGFza2EiLCAiQjYiID0gIkpldEJsdWUiLCAiQ08iID0gIkNvbnRpbmVudGFsIiwKICAgICAgICAgIkRMIiA9ICJEZWx0YSIsICJPTyIgPSAiU2t5V2VzdCIsICJVQSIgPSAiVW5pdGVkIiwgIlVTIiA9ICJVU19BaXJ3YXlzIiwgCiAgICAgICAgICJXTiIgPSAiU291dGh3ZXN0IiwgIkVWIiA9ICJBdGxhbnRpY19Tb3V0aGVhc3QiLCAiRjkiID0gIkZyb250aWVyIiwgCiAgICAgICAgICJGTCIgPSAiQWlyVHJhbiIsICJNUSIgPSAiQW1lcmljYW5fRWFnbGUiLCAiWEUiID0gIkV4cHJlc3NKZXQiLCAiWVYiID0gIk1lc2EiKQpgYGAKV2UgY2FuIGFsc28gc3BlY2lmeSB0aGUgdmFyaWFibGUgYENhbmNlbGxhdGlvbkNvZGVgIGFzIGZvbGxvd3M6CmBgYHtyfQpsdXQyIDwtIGMoIkEiID0gImNhcnJpZXIiLCAiQiIgPSAid2VhdGhlciIsICJDIiA9ICJGRkEiLCAiRCIgPSAic2VjdXJpdHkiLCAiRSIgPSAibm90IGNhbmNlbGxlZCIpCmBgYAoKV2UgY2hhbmdlIHRoZSB0YWJsZSBgaGZsaWdodHNgIHRvIGJldHRlciBleHBsYWluIHZhcmlhYmxlcyBgVW5pcXVlIENhcnJpZXJgIGV0IGBDYW5jZWxsYXRpb25Db2RlYC4KCmBgYHtyfQpoZmxpZ2h0czEgPC0gaGZsaWdodHMKaGZsaWdodHMxJFVuaXF1ZUNhcnJpZXIgPC0gbHV0MVtoZmxpZ2h0czEkVW5pcXVlQ2Fycmllcl0KaGZsaWdodHMxJENhbmNlbGxhdGlvbkNvZGVbaGZsaWdodHMxJENhbmNlbGxhdGlvbkNvZGU9PSIiXSA8LSAiWiIKaGZsaWdodHMxJENhbmNlbGxhdGlvbkNvZGUgPC0gbHV0MltoZmxpZ2h0czEkQ2FuY2VsbGF0aW9uQ29kZV0KYGBgCgpGcm9tIG5vdyBvbiwgd2Ugd29yayB3aXRoIHRoZSB0YWJsZSAqKmhmbGlnaHRzMSoqLgoKCjEuIFVzZSBzZXZlcmFsIGRpZmZlcmVudCB3YXlzIHRvIHNlbGVjdCB2YXJpYWJsZXMgZnJvbSAqKk9yaWdpbioqIHRvICoqQ2FuY2VsbGVkKioKCjIuIFNlbGVjdCB2YXJpYWJsZXMgYERlcFRpbWVgLCBgQXJyVGltZWAsIGBBY3R1YWxFbGFwc2VkVGltZWAsIGBBaXJUaW1lYCwgYEFyckRlbGF5YCBhbmQgYERlcERlbGF5YC4gVXNlICpoZWxwZXIgZnVuY3Rpb24qICoqY29udGFpbnMoKSoqLgoKMy4gQWRkIGEgdmFyaWFibGUgKipBY3R1YWxHcm91bmRUaW1lKiogd2hpY2ggY29ycmVzcG9uZHMgdG8gKkFjdHVhbEVsYXBzZWRUaW1lKiBtaW51cyAqQWlyVGltZSoKCjQuIEFkZCBhIHZhcmlhYmxlICoqQXZlcmFnZVNwZWVkKiogKGA9RGlzdGFuY2UvQWlyVGltZWApIGFuZCBvcmRlciB0aGUgZGF0YXNldCBhY2NvcmRpbmcgdG8gdGhpcyB2YXJpYWJsZQoKNS4gU2VsZWN0IGZsaWdodHMgdG8gSkZLCgo2LiBDYWxjdWxhdGUgdGhlIG51bWJlciBvZiBmbGlndGhzIHRvIEpGSwoKNy4gQ3JlYXRlIGEgc3VtbWFyeSBvZiBgaGZsaWdodHMxYCB3aGljaCBjb250YWluczoKICAgICsgYG5gIDogdGhlIHRvdGFsIG51bWJlciBvZiBmbGlnaHRzOwogICAgKyBgbl9kZXN0YDogdGhlIHRvdGFsIG51bWJlciBvZiBkZXN0aW5hdGlvbnM7CiAgICArIGBuX2NhcnJpZXJgIDogdGhlIHRvdGFsIG51bWJlcnMgb2YgY2FycmllcnMuCiAgICAKOC4gQ3JlYXRlIGEgc3VtbWFyeSBvZiBgaGZsaWdodHMxYCBmcm9tIHRoZSBjYXJyaWVyICoqQW1lcmljYW4qKiB3aGljaCBjb250YWlucwogICAgKyB0aGUgdG90YWwgbnVtYmVyIG9mIGZsaWdodHMgOwogICAgKyB0aGUgdG90YWwgbnVtYmVyIG9mIGNhbmNlbGxlZCBmbGlnaHRzIDsKICAgICsgdGhlIG1lYW4gb2YgdGhlIHZhcmlhYmxlIGBBcnJEZWxheWAgKGJlIGNhcmVmdWxsIHdpdGggdGhlIGBOQWApLgogICAgCjkuIENhbGN1bGF0ZSBmb3IgZWFjaCBjYXJyaWVyIAogICAgKyB0aGUgdG90YWwgbnVtYmVyIG9mIGZsaWdodHMKICAgICsgdGhlIG1lYW4gb2YgdGhlIHZhcmlhYmxlICoqQWlyVGltZSoqCiAgICAKMTAuIE9yZGVyIHRoZSBjYXJyaWVycyBhY2NvcmRpbmcgdG8gdGhlIG1lYW4gb2YgZGVwYXJ0dXJlIGRlbGF5cyAKCgojIyMgRXhlcmNpc2UgNQoKV2UgY29uc2lkZXIgZGF0YXNldHMgYWJvdXQgdGhlIDQgdGVubmlzIG1ham9yIHRvdXJuYW1lbnRzIGluIDIwMTMuIERhdGFzZXQgYXMgd2VsbCBhcyBkZXNjcmlwdGlvbiBvZiB0aGUgdmFyaWFibGVzIGFyZSBhdmFpbGFibGUgYXQgW2h0dHBzOi8vYXJjaGl2ZS5pY3MudWNpLmVkdS9tbC9kYXRhc2V0cy9UZW5uaXMrTWFqb3IrVG91cm5hbWVudCtNYXRjaCtTdGF0aXN0aWNzXShodHRwczovL2FyY2hpdmUuaWNzLnVjaS5lZHUvbWwvZGF0YXNldHMvVGVubmlzK01ham9yK1RvdXJuYW1lbnQrTWF0Y2grU3RhdGlzdGljcykuCgpXZSBmaXJzdCBmb2N1cyBvbiB0aGUgbWVuIHRvdXJuYW1lbnRzIG9mIFJvbGFuZCBHYXJyb3MuCgoxLiBJbXBvcnQgdGhlIGRhdGFzZXQuCgoyLiBEaXNwbGF5IHRoZSBuYW1lIG9mIEZlZGVyZXIncyBvcHBvbmVudHMuCgozLiBEaXNwbGF5IHRoZSBuYW1lcyBvZiB0aGUgc2VtaS1maW5hbGlzdHMuCgo0LiBIb3cgbWFueSBwb2ludHMgd2VyZSBwbGF5ZWQgb24gYXZlcmFnZSBwZXIgbWF0Y2g/IEl0IHdpbGwgYmUgbmVjZXNzYXJ5IHRvIGFkZCBpbiB0aGUgdGFibGUgYSB2YXJpYWJsZSBjb3JyZXNwb25kaW5nIHRvIHRoZSBudW1iZXIgb2YgcG9pbnRzIG9mIGVhY2ggbWF0Y2ggKHZlcmIgYG11dGF0ZWApLgoKNS4gQ2FsY3VsYXRlIHRoZSBhdmVyYWdlIG51bWJlciBvZiBhY2VzIHBlciBtYXRjaC4KCjYuIENhbGN1bGF0ZSB0aGUgYXZlcmFnZSBudW1iZXIgb2YgYWNlcyBwZXIgbWF0Y2ggZm9yIGVhY2ggcm91bmQgb2YgdGhlIHRvdXJuYW1lbnQuCgo3LiBDYWxjdWxhdGUgdGhlIHRvdGFsIG51bWJlciBvZiBkb3VibGUgZmF1bHRzIGluIHRoZSB0b3VybmFtZW50LgoKOC4gSW1wb3J0IHRoZSBkYXRhc2V0IHdoaWNoIGNvbnRhaW5zIHJlc3VsdHMgb2YgV2ltYmxlZG9uIChhbHdheXMgZm9yIG1lbikuCgo5LiBNZXJnZSB0aGUgMiBkYXRhc2V0cy4gQWRkIGEgdmFyaWFibGUgKipUb3VybmFtZW50cyoqIHdoaWNoIGluZGljYXRlcyB0aGUgVG91cm5hbWVudCAoUkcgb3IgV0lNQikuIFlvdSBjYW4gdXNlICoqYmluZF9yb3dzKiogd2l0aCB0aGUgb3B0aW9uICoqLmlkKiouCgoxMC4gRGlzcGxheSB0aGUgbWF0Y2ggb3IgRmVkZXJlciBmb3IgZWFjaCB0b3VybmFtZW50LgoKMTEuIE1ha2Ugb2YgY29tcGFyaXNvbiBvZiB0aGUgYXZlcmFnZSBudW1iZXIgb2YgYWNlcyBwZXIgbWF0Y2ggZm9yIGVhY2ggcm91bmQgb2YgdGhlIHRvdXJuYW1lbnQgYmV0d2VlbiBSb2xhbmQgR2Fycm9zIGFuZCBXaW1ibGVkb24uCgoK