Data exploration
---
title: "Untitled"
author: "xuefliang"
date: "6/16/2019"
output: html_document
editor_options:
  chunk_output_type: console
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
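library(inspectdf)
library(tidyverse)  # attaches readr, dplyr and ggplot2
data(starwars)      # example data set shipped with dplyr
# To install the development version of inspectdf from GitHub:
# remotes::install_github("alastairrushworth/inspectdf")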
```
```{r}
# Download the example data set
df <- read_csv("https://raw.githubusercontent.com/lgellis/STEM/master/DATA-ART-1/Data/FinalData.csv", col_names = TRUE)
allGrades <- df

# Split the data into two groups: Grade > 5 and Grade < 6
oldGrades <- allGrades %>%
  filter(Grade > 5)
youngGrades <- allGrades %>%
  filter(Grade < 6)

# Distribution of Grade within each group
ggplot(oldGrades, aes(x = Grade)) + geom_histogram()
ggplot(youngGrades, aes(x = Grade)) + geom_histogram()
```
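## inspect_types(): checking variable types
```{r}
# One data frame: summary of column types
inspect_types(allGrades) %>%
  show_plot()
# Two data frames: compare their column types
inspect_types(youngGrades, oldGrades) %>%
  show_plot()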
```
## inspect_mem(): size information, including the data frame's columns, rows, total size, and the size of each variable
```{r}
inspect_mem(allGrades) %>%
  show_plot()
inspect_mem(youngGrades, oldGrades) %>%
  show_plot()
```
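As a rough illustration of what a per-column size summary involves, similar figures can be approximated in base R with `object.size()`. The data frame `toy` and the names `col_bytes` and `col_pcnt` are hypothetical, and this is only a sketch: it is not necessarily how `inspect_mem()` computes its numbers internally.

```{r}
# Hypothetical toy data frame (not the grades data used above)
toy <- data.frame(
  id    = 1:1000,
  score = runif(1000),
  label = sample(c("a", "b", "c"), 1000, replace = TRUE),
  stringsAsFactors = FALSE
)

# Size of each column in bytes
col_bytes <- vapply(toy, function(x) as.numeric(object.size(x)), numeric(1))

# Share of the summed column sizes taken up by each column
col_pcnt <- 100 * col_bytes / sum(col_bytes)

data.frame(col_name = names(toy), bytes = col_bytes,
           pcnt = round(col_pcnt, 1), row.names = NULL)
```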
## inspect_na(): missing values
```{r}
inspect_na(allGrades) %>%
  show_plot()
inspect_na(youngGrades, oldGrades) %>%
  show_plot()
```
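For readers who want to see what a missing-value summary boils down to, the count and percentage of `NA` values per column can be computed by hand in base R. The data frame `toy_na` is a hypothetical example, and the layout of `na_summary` is only meant to resemble this kind of summary, not to reproduce the output of `inspect_na()` exactly.

```{r}
# Hypothetical toy data frame with some missing values
toy_na <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", "y", NA, "z"),
  c = 1:4,
  stringsAsFactors = FALSE
)

# Number and percentage of missing values in each column
na_cnt  <- colSums(is.na(toy_na))
na_pcnt <- 100 * colMeans(is.na(toy_na))

na_summary <- data.frame(col_name = names(toy_na), cnt = na_cnt,
                         pcnt = na_pcnt, row.names = NULL)

# Columns with the most missing values first
na_summary[order(-na_summary$cnt), ]
```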
## inspect_num(): numeric variables
```{r}
inspect_num(allGrades) %>%
  show_plot()
inspect_num(youngGrades, oldGrades) %>%
  show_plot()
```
## inspect_imb(): feature imbalance in categorical variables
```{r}
inspect_imb(allGrades) %>%
  show_plot()
inspect_imb(youngGrades, oldGrades) %>%
  show_plot()
```
## inspect_cat(): categorical variables
```{r}
inspect_cat(allGrades) %>%
  show_plot()
inspect_cat(youngGrades, oldGrades) %>%
  show_plot()
```
## inspect_cor(): correlation coefficients between numeric columns
```{r}
inspect_cor(allGrades) %>% show_plot()
inspect_cor(youngGrades, oldGrades) %>% show_plot()
```
## Variable types: tabular output
```{r}
starwars %>%
  inspect_types()
```
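## Categorical variables: tabular summary
- `cnt`: the number of unique categories in the column.
- `common`: the most frequently occurring category.
- `common_pcnt`: the percentage of entries that belong to the most frequent category.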
```{r}
star_cat <-
  starwars %>%
  inspect_cat()
star_cat
# Category breakdown of the eye_color variable
star_cat$levels$eye_color
```
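To make the three summary columns concrete, they can be worked out by hand for a single character vector. The vector `eye` is a hypothetical toy column, and this sketch does not attempt to reproduce how inspectdf handles missing values or ties.

```{r}
# Hypothetical toy column (not taken from starwars)
eye <- c("blue", "brown", "blue", "yellow", "blue", "brown")

# Frequency of each category
tab <- table(eye)

cnt         <- length(tab)                             # number of unique categories
common      <- names(tab)[which.max(tab)]              # most frequent category
common_pcnt <- 100 * as.numeric(max(tab)) / length(eye)  # its share, in percent

list(cnt = cnt, common = common, common_pcnt = common_pcnt)
```

Here `cnt` is 3, `common` is `"blue"`, and `common_pcnt` is 50.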
## 分类变量-可视化
```{r}
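star_cat %>% show_plot()
# Collapse sparse categories: with high_cardinality = 1, categories that occur
# only once are combined into a single group. Setting high_cardinality = 2 or
# higher collapses more of the "long tail" of sparse categories. This matters
# more and more for visualisation as data sets grow.
star_cat %>% show_plot(high_cardinality = 1)
# Alternative colour palettes (1 is the colour-blind-friendly palette)
star_cat %>% show_plot(col_palette = 1)
star_cat %>% show_plot(col_palette = 2)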
```
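The idea behind collapsing sparse categories can be sketched in base R: any level whose count is at or below a threshold is relabelled into one shared group. The vector `species`, the label `"(rare)"`, and the "at or below" rule are assumptions made for illustration; the exact rule applied by `show_plot()` may differ.

```{r}
# Hypothetical toy column with a long tail of rare levels
species <- c("Human", "Human", "Droid", "Droid", "Wookiee", "Ewok", "Hutt")

threshold <- 1  # plays the role of high_cardinality in this sketch

# Levels that occur `threshold` times or fewer
counts <- table(species)
rare   <- names(counts)[counts <= threshold]

# Relabel rare levels into a single group
lumped <- ifelse(species %in% rare, "(rare)", species)

table(lumped)
```

With a threshold of 1, the three single-occurrence levels end up in one group, leaving three distinct labels to plot instead of five.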