
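Clean Messy and Missing Data in Tables

This example shows how to find, clean, and delete table rows with missing data.

Load Sample Data

Load sample data from a comma-separated text file, messy.csv. The file contains many different missing data indicators: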

  • Empty character vector ('')

  • Period (.)

  • NA

  • -99

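To specify the character vectors to treat as empty values, use the 'TreatAsEmpty' name-value pair argument with the readtable function.

T = readtable('messy.csv','TreatAsEmpty',{'.','NA'})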
T = 21×5 table (variables A, B, C, D, E; row listing omitted)

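T is a table with 21 rows and five variables. 'TreatAsEmpty' applies only to numeric columns in the file and cannot handle numeric literals, such as '-99'.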
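As a minimal sketch of this behavior, the following code writes a small hypothetical file (not messy.csv) to a temporary location and reads it back with 'TreatAsEmpty'. The file contents and the names fname and S are assumptions made for illustration. If the option behaves as described, the '.' and 'NA' entries in the numeric column B should import as NaN, while -99 should remain an ordinary number.

```matlab
% Write a small hypothetical CSV file to a temporary location.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'A,B\nx1,1\nx2,.\nx3,NA\nx4,-99\n');
fclose(fid);

% Treat '.' and 'NA' as empty values in numeric columns.
S = readtable(fname, 'TreatAsEmpty', {'.', 'NA'});
delete(fname);

% Logical mask of the entries in B that were imported as NaN.
nanMask = isnan(S.B);

% The numeric literal -99 is not covered by 'TreatAsEmpty'.
stillCoded = (S.B == -99);
```

Handling the -99 code requires a separate step after import, which is why this example later uses standardizeMissing.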

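Summarize Table

View the data type, description, units, and other descriptive statistics for each variable by creating a table summary with the summary function.

summary(T)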
Variables:

    A: 21x1 cell array of character vectors

    B: 21x1 double

        Values:

            Min            -99
            Median          14
            Max            563
            NumMissing       3

    C: 21x1 cell array of character vectors

    D: 21x1 double

        Values:

            Min            -99
            Median           7
            Max            563
            NumMissing       2

    E: 21x1 double

        Values:

            Min            -99
            Median          14
            Max            563

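When you import data from a file, readtable by default reads any variable that contains nonnumeric elements as a cell array of character vectors.

Find Rows with Missing Values

Display the subset of rows from the table, T, that have at least one missing value.

TF = ismissing(T,{'' '.' 'NA' NaN -99});
T(any(TF,2),:)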
ans = 5×5 table
      A        B       C       D      E
    ______    ___    _____    ___    ___

    'egh3'    NaN    'no'       7      7
    'abk6'    563    ''       563    563
    'wba3'    NaN    'yes'    NaN     14
    'poj2'    -99    'yes'    -99    -99
    'gry5'    NaN    'yes'    NaN     21

readtable replaced '.' and 'NA' with NaN in the numeric variables B, D, and E.
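The same row-selection pattern can be tried on a small in-memory table. This is only a sketch: the table S and its values are invented for illustration and do not come from messy.csv.

```matlab
% Hypothetical table with three kinds of missing-value indicators.
A = {'a1'; ''; 'a3'; 'a4'};
B = [1; 2; NaN; -99];
S = table(A, B);

% Mark entries that match any listed indicator.
% TF has one logical column per table variable.
TF = ismissing(S, {'' NaN -99});

% Keep only the rows where at least one variable is flagged.
rowsWithMissing = S(any(TF, 2), :);
```

Because any(TF,2) reduces across columns, a row is selected as soon as one of its variables matches an indicator.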

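Replace Missing Value Indicators

Clean the data so that the missing values indicated by the code -99 use the standard MATLAB® numeric missing value indicator, NaN.

T = standardizeMissing(T,-99)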
T = 21×5 table (variables A, B, C, D, E; row listing omitted)

standardizeMissing replaces three instances of -99 with NaN.
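A minimal sketch of the same replacement on an invented two-variable table follows. The variable names and values are assumptions chosen so that the effect is easy to check by eye.

```matlab
% Hypothetical numeric table that uses -99 as a missing-value code.
B = [1; -99; 3];
D = [-99; 5; 6];
S = table(B, D);

% Replace every -99 with the standard numeric indicator, NaN.
S = standardizeMissing(S, -99);

% Count the standard missing values now present in the table.
numMissing = nnz(ismissing(S));
```

After this step, functions that look for standard missing values, such as ismissing with no indicator list, see the former -99 entries.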

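Create a new table, T2, and replace missing values with values from previous rows of the table. fillmissing provides a number of ways to fill in missing values.

T2 = fillmissing(T,'previous')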
T2 = 21×5 table (variables A, B, C, D, E; row listing omitted)
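To see what the 'previous' method does row by row, here is a small sketch on an invented table. The names S and F and the values are hypothetical. Note the first entry of B: it has no earlier value to copy, so it stays missing.

```matlab
% Hypothetical table with missing entries in both variables.
S = table({'x'; ''; 'z'}, [NaN; 10; NaN], 'VariableNames', {'A', 'B'});

% Fill each missing entry with the nearest non-missing value above it.
F = fillmissing(S, 'previous');

% A leading missing value has nothing above it and remains missing.
leadingStillMissing = isnan(F.B(1));
```

Filling from the previous row keeps the table the same height, which is why T2 still has 21 rows.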

Remove Rows with Missing Values

Create a new table, T3, that contains only the rows from T without missing values.

T3 = rmmissing(T)
T3 = 16×5 table (variables A, B, C, D, E; row listing omitted)

T3 contains 16 rows and five variables.
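The row removal can be sketched on a small invented table. The table S and the choice to restrict the check to variable B in the second call are assumptions for illustration.

```matlab
% Hypothetical table: row 2 is missing A, row 3 is missing B.
S = table({'x'; ''; 'z'}, [1; 2; NaN], 'VariableNames', {'A', 'B'});

% Remove every row that has a missing value in any variable.
R = rmmissing(S);

% Alternatively, consider only variable B when deciding which rows to drop.
RB = rmmissing(S, 'DataVariables', 'B');
```

Unlike fillmissing, rmmissing shortens the table, so check the remaining row count before continuing an analysis.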

Organize Data

Sort the rows of T2 in descending order by C, and then in ascending order by A, and store the result in T3.

T3 = sortrows(T2,{'C','A'},{'descend','ascend'})
T3 = 21×5 table
      A        B        C        D       E
    ______    ____    _____    ____    ____

    'abk6'     563    'yes'     563     563
    'afe1'       3    'yes'       3       3
    'arg1'       5    'yes'       5       5
    'gry5'      23    'yes'      23      21
    'jre3'    34.6    'yes'    34.6    34.6
    'oii4'       5    'yes'       5       5
    'oks9'      23    'yes'      23      23
    'poj2'      22    'yes'      22      22
    'wba3'      23    'yes'      23      14
    'wen9'     234    'yes'     234     234
    'wnk3'     245    'yes'     245     245
    'wth4'       3    'yes'       3       3
    'adw3'      22    'no'       22      22
    'atn2'      23    'no'       23      23
    'bas8'      23    'no'       23      23
      ⋮

In C, the rows are grouped first by 'yes', followed by 'no'. Then in A, the rows are listed alphabetically.
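The two-key sort can be checked on a small invented table. The names and values below are hypothetical.

```matlab
% Hypothetical table with a yes/no variable and a name variable.
A = {'b'; 'a'; 'c'; 'd'};
C = {'no'; 'yes'; 'yes'; 'no'};
S = table(A, C);

% Primary key: C in descending order ('yes' before 'no').
% Secondary key: A in ascending order within each group of C.
S = sortrows(S, {'C', 'A'}, {'descend', 'ascend'});
```

The direction list is matched to the variable list by position, so the first direction applies to C and the second to A.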

Reorder the table so that A and C are next to each other.

T3 = T3(:,{'A','C','B','D','E'})
T3 = 21×5 table (variables in the order A, C, B, D, E; row listing omitted)
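Reordering by name works on any table. A minimal sketch with an invented three-variable table:

```matlab
% Hypothetical table with variables in the order A, B, C.
S = table([1; 2], {'p'; 'q'}, [3; 4], 'VariableNames', {'A', 'B', 'C'});

% Index all rows and list the variables in the desired order.
S = S(:, {'A', 'C', 'B'});

% The variable order changes; the data in each variable does not.
newOrder = S.Properties.VariableNames;
```

Indexing by name rather than by column number keeps the statement valid if variables are later added or removed elsewhere in the table.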

See Also

fillmissing | ismissing | readtable | rmmissing | sortrows | standardizeMissing | summary
