Data Exploration

---
title: "Untitled"
author: "xuefliang"
date: "6/16/2019"
output: html_document
editor_options:
  chunk_output_type: console
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(inspectdf)
library(tidyverse)
library(readr)
data(starwars)
# To install inspectdf from GitHub:
# devtools::install_github("alastairrushworth/inspectdf")
```

```{r}
df <- read_csv(
  "https://raw.githubusercontent.com/lgellis/STEM/master/DATA-ART-1/Data/FinalData.csv",
  col_names = TRUE
)
allGrades <- df

# Split the data into older (Grade > 5) and younger (Grade < 6) students
oldGrades <- allGrades %>% filter(Grade > 5)
youngGrades <- allGrades %>% filter(Grade < 6)

ggplot(oldGrades, aes(x = Grade)) + geom_histogram()
ggplot(youngGrades, aes(x = Grade)) + geom_histogram()
```

## inspect_types(): check variable types

```{r}
inspect_types(allGrades) %>% show_plot()
inspect_types(youngGrades, oldGrades) %>% show_plot()
```

## inspect_mem()

Size information, including the data frame's columns, rows, total size, and the size of each variable.

```{r}
inspect_mem(allGrades) %>% show_plot()
inspect_mem(youngGrades, oldGrades) %>% show_plot()
```

## inspect_na(): missing values

```{r}
inspect_na(allGrades) %>% show_plot()
inspect_na(youngGrades, oldGrades) %>% show_plot()
```

## inspect_num(): numeric variables

```{r}
inspect_num(allGrades) %>% show_plot()
inspect_num(youngGrades, oldGrades) %>% show_plot()
```

## inspect_imb(): feature imbalance in categorical variables

```{r}
inspect_imb(allGrades) %>% show_plot()
inspect_imb(youngGrades, oldGrades) %>% show_plot()
```

## inspect_cat(): categorical variables

```{r}
inspect_cat(allGrades) %>% show_plot()
inspect_cat(youngGrades, oldGrades) %>% show_plot()
```

## inspect_cor(): correlation coefficients of numeric columns

```{r}
inspect_cor(allGrades) %>% show_plot()
inspect_cor(youngGrades, oldGrades) %>% show_plot()
```

## Variable types: table view

```{r}
starwars %>% inspect_types()
```

## Categorical variables: table summary

- `cnt`: the number of unique categories in the column.
- `common`: the most frequently occurring category.
- `common_pcnt`: the percentage of rows taken up by the most frequent category.

```{r}
star_cat <- starwars %>% inspect_cat()
star_cat

# Category breakdown of the eye_color variable
star_cat$levels$eye_color
```

## Categorical variables: visualization

```{r}
star_cat %>% show_plot()

# Handling sparse categories: high_cardinality groups rarely occurring
# categories together. Setting high_cardinality = 2 or higher folds more of
# the "long tail" of sparse categories into that group. This becomes
# increasingly important for visualization as datasets grow.
star_cat %>% show_plot(high_cardinality = 1)

# Colorblind-friendly palette
star_cat %>% show_plot(col_palette = 1)

# Another built-in palette
star_cat %>% show_plot(col_palette = 2)
```
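
To make the `cnt`, `common`, and `common_pcnt` columns from the table summary more concrete, here is a minimal base-R sketch that computes the same three quantities by hand. The `eye_color` vector is a hypothetical toy column, not the starwars data, and it deliberately contains no missing values. This only illustrates what the columns mean; it is not how inspectdf computes them internally.

```r
# Hypothetical toy column (an assumption for illustration, no NA values)
eye_color <- c("brown", "blue", "brown", "yellow", "brown", "blue")

# Frequency of each category, sorted from most to least common
freq <- sort(table(eye_color), decreasing = TRUE)

cat_summary <- data.frame(
  col_name    = "eye_color",
  cnt         = length(freq),                        # number of unique categories
  common      = names(freq)[1],                      # most frequent category
  common_pcnt = 100 * freq[[1]] / length(eye_color)  # its share of rows, in percent
)

cat_summary
```

On real data, comparing a hand-computed row like this against the corresponding row of `inspect_cat()` output is a quick way to check your reading of the summary, keeping in mind that missing values may be treated differently.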
