Chetan Prajapati
Chetan Prajapati

Founder & Statistician at ISCON

Reshaping though frequently required in data analysis, so often it remains confusing even if you are frequent user ofreshapefunction. Here I have provided an simple example to elaborate more on each argument ofreshape.

Table of Contents

Reshape data from wide to long

Your data in wide form if the multiple observations of item, place or person (i.e. units) has been recorded in single row (but in multiple column). This multiple observations may be of repeated measure type (observation are made repeatedly at different time point) or multiple characteristics of some unit (eg. height, length and width of square).

Let’s take an example of wide data of repeated measure type. Person A is visiting clinic every month for their blood pressure check, and nurse note down the value for each month in single row which belong to this specific person A. See example below,

# wide data
df <- data.frame(matrix(data = NA, nrow = 3, ncol = 5, dimnames = list(NULL, paste0(c("id","name","jan","feb","march")))))
df[1,] <- c(1,"A",123,120,125)
df[2,] <- c(2,"B",140,150,155)
df[3,] <- c(3,"C",96,86,97)

Our “wide” data look like

knitr::kable(df, caption = "wide data")
Table 1:wide data
idnamejanfebmarch
1A123120125
2B140150155
3C968697

We want a data in which each new observation in new row but within same column i.e. long data. To achieve that we need to use standardreshapecommand inR. Thereshapehas following argument:

  • idvar: unique identifier for person,place or object on which observations(measurements) are made at different time points or repeatedly. ExampleCase ID

  • varying: if observation for specific individuals are made at different time points, in which columns values are recorded i.e time-varying columns. ExampleJan,Feb,March

  • timevar: what will be the name of column once the time-varying columns above has been staked in rows. ExampleMonth

  • times: what will be the values (of time) once the time-varying columns above has been staked in rows. ExampleJan,Feb,March

  • v.names: what will be the values (of observations) once the time-varying columns above has been staked in rows. ExampleBP

  • direction: data needs to converted from wide tolongformat.

df_long <- reshape(df,
idvar = "id",  
        #[unique identifier for person,place or object on which observations(measurments) are made at different time points or repeatedly]

varying = c("jan","feb","march"),  
        # [if observation for specific individulas are made at different time points, in which columns values are recorded i.e time-varying columns ]

timevar = "month", 
        # [what will be the name of column once the timevarying columns above has been staked in rows]

times = c("jan","feb","march"),
        # [what will be the values (of time) once the timevarying columns above has been staked in rows]

v.names = "BP", 
        # [what will be the values (of observations) once the timevarying columns above has been staked in rows]

direction = "long")   
        # [we want to convert wide df into long one])

Our “long” data look like

df_long <- arrange(df_long, id)

kable(df_long,format = "pandoc", caption = "long data")
Table 2:long data
idnamemonthBP
1Ajan123
1Afeb120
1Amarch125
2Bjan140
2Bfeb150
2Bmarch155
3Cjan96
3Cfeb86
3Cmarch97

Sometime, not only one type of measurement (BP) but also other types (such as heart rate-HR) are measured and recorded row wise. For example,

# wide data
df <- data.frame(matrix(data = NA, nrow = 3, ncol = 8, dimnames = list(NULL, paste0(c("id","name","BP_jan","BP_feb","BP_march","HR_jan","HR_feb","HR_march")))))
df[1,] <- c(1,"A",123,120,125,72,70,71)
df[2,] <- c(2,"B",140,150,155,85,82,86)
df[3,] <- c(3,"C",96,86,97,65,52,59)
kable(df,format = "pandoc", caption = "wide data- multiple category")
Table 3:wide data- multiple category
idnameBP_janBP_febBP_marchHR_janHR_febHR_march
1A123120125727071
2B140150155858286
3C968697655259

This data can be converted into “long” by usinglistfor group of time-varying columns forvarying

df_long <- reshape(df,
idvar = "id",  
varying = list(c("BP_jan","BP_feb","BP_march"),c("HR_jan","HR_feb","HR_march") ),  
timevar = "month", 
times = c("jan","feb","march"),
v.names = c("BP","HR"), 
direction = "long")   
df_long <- arrange(df_long, id)
kable(df_long,format = "pandoc", caption = "long data")
Table 4:long data
idnamemonthBPHR
1Ajan12372
1Afeb12070
1Amarch12571
2Bjan14085
2Bfeb15082
2Bmarch15586
3Cjan9665
3Cfeb8652
3Cmarch9759

Reshape data from long to wide

To make data “wide” from long, thereshapefunction will need only two main arguments

  • idvar: unique identifier of unit on which measurement are made

  • timevar: which column representthe timingof the observations ( so thatreshapefunction associate it with the value for given time for each ID )

If you do not specify above two arguments, function will drop an error-

Error in [.data.frame (data, , idvar) : undefined columns selected

If you read above error carefully, it already specifying which arguments were missing. Here in above case missing argument wasidvar.

You can optionally provide,

  • v.names: which column representvaluesof the observations in long data (so thatreshapefunction can transform these values into rows for each ID)

  • sep: column names in wide format are going to be created using value oftimesvarandintegers. Specify how both will be seperated in column names.

Here is the example

df_wide <- reshape(df_long,
       idvar = "id",
       # unique identifier
       timevar = "month",
       # the column represent the timing of the observations
       v.names = c("BP","HR"),
       # the columns represent the value of the observation (BP,HR)
       direction = "wide",
       sep = "_"
        )

Here is the wide data

Table 5:wide data
idnameBP_janHR_janBP_febHR_febBP_marchHR_march
11A123721207012571
42B140851508215586
73C966586529759