Learning tidyr
## For convenience, add a `car` column to the dataset
```{r echo=FALSE,warning=FALSE,error=FALSE}
library(tidyr)
library(dplyr)
library(tibble)
head(mtcars)
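# Store the row names (the car models) in a regular column
mtcars$car <- rownames(mtcars)
# Move the new car column (column 12) to the front
mtcars <- mtcars[, c(12, 1:11)]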
```
## gather: wide data to long data, similar to the melt function in the reshape2 package
```{r}
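# Gather every column except car into attribute/value pairs
mtcarsNew <- mtcars %>% gather(attribute, value, -car)
# A nice feature of tidyr is that you can gather only some columns and leave the rest unchanged.
# Here, gather all columns from mpg through gear while keeping carb and car as they are.
mtcarsNew <- mtcars %>% gather(attribute, value, mpg:gear)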
```
## spread: long data to wide data, similar to the cast functions (dcast/acast) in the reshape2 package
```{r}
mtcarsSpread <- mtcarsNew %>% spread(attribute, value)
```
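In tidyr 1.0.0 and later, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`. The following is a minimal sketch of the same round trip, assuming tidyr >= 1.0.0 is installed. The names `cars`, `long`, and `wide` are introduced here for illustration only, and the block reads the untouched `datasets::mtcars` so it runs on its own.

```r
library(tidyr)

# Rebuild the input from the pristine dataset (illustrative name: cars).
cars <- data.frame(
  car = rownames(datasets::mtcars),
  datasets::mtcars,
  row.names = NULL
)

# Wide to long: the counterpart of gather(attribute, value, mpg:gear).
long <- pivot_longer(
  cars,
  cols = mpg:gear,
  names_to = "attribute",
  values_to = "value"
)

# Long to wide: the counterpart of spread(attribute, value).
wide <- pivot_wider(long, names_from = attribute, values_from = value)

dim(long)
dim(wide)
```

As with `gather()`, the columns left out of `cols` (here `car` and `carb`) are carried along unchanged and identify each row when the data is widened again.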
## unite: combine multiple columns into one
```{r}
set.seed(1)
date <- as.Date('2016-01-01') + 0:14
hour <- sample(1:24, 15)
min <- sample(1:60, 15)
second <- sample(1:60, 15)
event <- sample(letters, 15)
data <- data.frame(date, hour, min, second, event)
dataNew <- data %>%
  unite(datehour, date, hour, sep = ' ') %>%
  unite(datetime, datehour, min, second, sep = ':')
```
## separate: split one column into multiple columns
```{r}
data1 <- dataNew %>%
  separate(datetime, c('date', 'time'), sep = ' ') %>%
  separate(time, c('hour', 'min', 'second'), sep = ':')
```
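Note that `separate()` returns character columns by default. The sketch below uses a made-up one-row data frame named `stamp` (not part of the original example) to show how the `convert = TRUE` argument turns the split pieces back into integers.

```r
library(tidyr)

# Hypothetical input in the same "date hour:min:second" layout as dataNew.
stamp <- data.frame(
  datetime = "2016-01-01 7:23:41",
  stringsAsFactors = FALSE
)

parts <- stamp %>%
  # First split: date and time stay as character strings
  separate(datetime, c("date", "time"), sep = " ") %>%
  # Second split: convert = TRUE re-types the new columns where possible
  separate(time, c("hour", "min", "second"), sep = ":", convert = TRUE)

str(parts)
```

Without `convert = TRUE`, `hour`, `min`, and `second` would remain character and would need an explicit `as.integer()` before any arithmetic.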
## separate_rows() splits fields that hold several delimited values into multiple rows
```{r}
df <- tibble(x = 1:2, y = c("a,b", "d,e,f"))
df %>%
  separate_rows(y, sep = ",")
```
By contrast, separate() splits the same df into multiple columns:
```{r}
df %>%
  separate(y, c("y1", "y2", "y3"), sep = ",", fill = "right")
```
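## spread() gained a sep argument
When sep is set, the new column names take the form `<key name><sep><key value>`. This is very useful when reshaping data whose key column is numeric.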
```{r}
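df <- tibble(
  x = c(1, 2, 1),
  key = c(1, 1, 2),
  val = c("a", "b", "c")
)
# Default: the new columns are named after the key values, 1 and 2
df %>% spread(key, val)
# With sep: the new columns are named key_1 and key_2
df %>% spread(key, val, sep = "_")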
```
## unnest() gained a .sep argument
This is very useful when several list-columns of a data frame contain the same variable names.
```{r}
df <- tibble(
  x = 1:2,
  y1 = list(
    tibble(y = 1),
    tibble(y = 2)
  ),
  y2 = list(
    tibble(y = "a"),
    tibble(y = "b")
  )
)
# Note: tidyr 1.0.0 and later expect the columns to unnest to be named,
# and replace .sep with names_sep.
df %>% unnest()
df %>% unnest(.sep = "_")
```
### unnest() gained an .id argument that records the names defined in the list
```{r}
df <- tibble(
  x = 1:2,
  y = list(
    a = 1:3,
    b = 3:1
  )
)
df %>% unnest()
# Note: .id is deprecated in tidyr 1.0.0 and later.
df %>% unnest(.id = "id")
```
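In tidyr 1.0.0 and later, `unnest()` expects the columns to unnest to be named, `names_sep` takes over the role of `.sep`, and `.id` is deprecated. The following is a hedged sketch of the equivalents, assuming tidyr >= 1.0.0. The names `nested`, `named`, `flat`, and `flat_id` are illustrative and do not appear in the examples above.

```r
library(tidyr)
library(tibble)

nested <- tibble(
  x = 1:2,
  y1 = list(tibble(y = 1), tibble(y = 2)),
  y2 = list(tibble(y = "a"), tibble(y = "b"))
)

# names_sep prefixes each inner column with its outer column name,
# giving y1_y and y2_y instead of two clashing y columns.
flat <- unnest(nested, c(y1, y2), names_sep = "_")

named <- tibble(x = 1:2, y = list(a = 1:3, b = 3:1))

# Replacement for .id: copy the list names into a column, then unnest.
named$id <- names(named$y)
flat_id <- unnest(named, y)

names(flat)
flat_id
```

Copying the names into a column before unnesting keeps them attached to every row that came from the same list element.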