{"id":2759,"date":"2016-04-05T13:51:13","date_gmt":"2016-04-05T11:51:13","guid":{"rendered":"http:\/\/blogs.sun.ac.za\/cib\/?p=2759"},"modified":"2020-07-29T14:26:36","modified_gmt":"2020-07-29T12:26:36","slug":"cib-researchers-develop-new-software-package","status":"publish","type":"post","link":"https:\/\/blogs.sun.ac.za\/cib\/cib-researchers-develop-new-software-package\/","title":{"rendered":"C\u00b7I\u00b7B researchers develop new software package for improving data quality"},"content":{"rendered":"<p>Three C\u00b7I\u00b7B researchers, Mark Robertson, Cang Hui and Vernon Visser, have developed a new R package for assessing and improving the quality of datasets made up of species occurrence records.<\/p>\n<p>Museum and herbarium collections provide records of where species have occurred, and these records are often used to map biodiversity patterns. Such collection datasets are freely available and are becoming increasingly accessible through portals such as the <a href=\"https:\/\/www.gbif.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">Global Biodiversity Information Facility<\/a>. Unfortunately, these datasets often contain errors and suffer from a range of data quality issues. Despite the large number of people who use these datasets, only a few software tools are dedicated to detecting and correcting errors in them.<\/p>\n<p>The package, called <a href=\"https:\/\/cran.r-project.org\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong><em>biogeo<\/em><\/strong><\/a>, includes error-detection features that flag, for example, mismatches between the recorded country and the country in which the record is plotted, records of terrestrial species that fall in the sea, and outliers. A key feature of the package is its ability to identify likely alternative positions for points that represent obvious errors in the dataset. It also provides functions for exploring records in geographical and environmental space in order to identify possible errors. 
Functions are also available for converting coordinates stored in various text formats into degrees, minutes and seconds, and then into decimal degrees.<\/p>\n<p>The package was developed for the R environment, so some experience with R is useful but not essential. It comes with a tutorial aimed at first-time users, which provides examples of how to use the package's functions to detect and correct errors in collection datasets.<\/p>\n<p>The package is available from the Comprehensive R Archive Network (CRAN): <a href=\"https:\/\/cran.r-project.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/cran.r-project.org\/<\/a><\/p>\n<p>A paper describing common data quality issues and highlighting the features of the package was published in the journal <em>Ecography<\/em>.<\/p>\n<figure id=\"attachment_2760\" aria-describedby=\"caption-attachment-2760\" style=\"width: 506px\" class=\"wp-caption aligncenter\"><img fetchpriority=\"high\" decoding=\"async\" class=\"size-large wp-image-2760\" src=\"http:\/\/blogs.sun.ac.za\/cib\/files\/2020\/07\/11.-fig2-for-Marks-nugget-940x940.jpeg\" alt=\"The package can be used to identify likely alternative positions for points that represent obvious errors in a dataset. 
\" width=\"506\" height=\"506\" srcset=\"https:\/\/blogs.sun.ac.za\/cib\/files\/2020\/07\/11.-fig2-for-Marks-nugget-940x940.jpeg 940w, https:\/\/blogs.sun.ac.za\/cib\/files\/2020\/07\/11.-fig2-for-Marks-nugget-580x580.jpeg 580w, https:\/\/blogs.sun.ac.za\/cib\/files\/2020\/07\/11.-fig2-for-Marks-nugget-150x150.jpeg 150w, https:\/\/blogs.sun.ac.za\/cib\/files\/2020\/07\/11.-fig2-for-Marks-nugget-768x768.jpeg 768w, https:\/\/blogs.sun.ac.za\/cib\/files\/2020\/07\/11.-fig2-for-Marks-nugget-1536x1536.jpeg 1536w, https:\/\/blogs.sun.ac.za\/cib\/files\/2020\/07\/11.-fig2-for-Marks-nugget-2048x2048.jpeg 2048w, https:\/\/blogs.sun.ac.za\/cib\/files\/2020\/07\/11.-fig2-for-Marks-nugget-50x50.jpeg 50w, https:\/\/blogs.sun.ac.za\/cib\/files\/2020\/07\/11.-fig2-for-Marks-nugget-80x80.jpeg 80w, https:\/\/blogs.sun.ac.za\/cib\/files\/2020\/07\/11.-fig2-for-Marks-nugget-45x45.jpeg 45w\" sizes=\"(max-width: 506px) 100vw, 506px\" \/><figcaption id=\"caption-attachment-2760\" class=\"wp-caption-text\">The package can be used to identify likely alternative positions for points that represent obvious errors in a dataset.<\/figcaption><\/figure>\n<h4><strong>To read the paper<\/strong><\/h4>\n<p><a href=\"http:\/\/onlinelibrary.wiley.com\/doi\/10.1111\/ecog.02118\/epdf\" target=\"_blank\" rel=\"noopener noreferrer\">Robertson, M. P., Visser, V. and Hui, C.\u00a0 2016.\u00a0 Biogeo:\u00a0 an R package for assessing and improving data quality of occurrence record datasets. 
\u2013 Ecography 39: DOI: 10.1111\/ecog.02118.<\/a><\/p>\n<p>For more information, contact Mark Robertson at <a href=\"mailto:mrobertson@zoology.up.ac.za\">mrobertson@zoology.up.ac.za<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Three C\u00b7I\u00b7B researchers, Mark Robertson, Cang Hui and Vernon Visser, have developed a new R package for assessing and improving the quality of datasets made up of species occurrence records.<\/p>\n","protected":false},"author":237,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"ocean_post_layout":"","ocean_both_sidebars_style":"","ocean_both_sidebars_content_width":0,"ocean_both_sidebars_sidebars_width":0,"ocean_sidebar":"","ocean_second_sidebar":"","ocean_disable_margins":"enable","ocean_add_body_class":"","ocean_shortcode_before_top_bar":"","ocean_shortcode_after_top_bar":"","ocean_shortcode_before_header":"","ocean_shortcode_after_header":"","ocean_has_shortcode":"","ocean_shortcode_after_title":"","ocean_shortcode_before_footer_widgets":"","ocean_shortcode_after_footer_widgets":"","ocean_shortcode_before_footer_bottom":"","ocean_shortcode_after_footer_bottom":"","ocean_display_top_bar":"default","ocean_display_header":"default","ocean_header_style":"","ocean_center_header_left_menu":"","ocean_custom_header_template":"","ocean_custom_logo":0,"ocean_custom_retina_logo":0,"ocean_custom_logo_max_width":0,"ocean_custom_logo_tablet_max_width":0,"ocean_custom_logo_mobile_max_width":0,"ocean_custom_logo_max_height":0,"ocean_custom_logo_tablet_max_height":0,"ocean_custom_logo_mobile_max_height":0,"ocean_header_custom_menu":"","ocean_menu_typo_font_family":"","ocean_menu_typo_font_subset":"","ocean_menu_typo_font_size":0,"ocean_menu_typo_font_size_tablet":0,"ocean_menu_typo_font_size_mobile":0,"ocean_menu_typo_font_size_unit":"px","ocean_menu_typo_font_weight":"","ocean_menu_typo_font_weight_tablet":"","ocean_menu_typo_font
_weight_mobile":"","ocean_menu_typo_transform":"","ocean_menu_typo_transform_tablet":"","ocean_menu_typo_transform_mobile":"","ocean_menu_typo_line_height":0,"ocean_menu_typo_line_height_tablet":0,"ocean_menu_typo_line_height_mobile":0,"ocean_menu_typo_line_height_unit":"","ocean_menu_typo_spacing":0,"ocean_menu_typo_spacing_tablet":0,"ocean_menu_typo_spacing_mobile":0,"ocean_menu_typo_spacing_unit":"","ocean_menu_link_color":"","ocean_menu_link_color_hover":"","ocean_menu_link_color_active":"","ocean_menu_link_background":"","ocean_menu_link_hover_background":"","ocean_menu_link_active_background":"","ocean_menu_social_links_bg":"","ocean_menu_social_hover_links_bg":"","ocean_menu_social_links_color":"","ocean_menu_social_hover_links_color":"","ocean_disable_title":"default","ocean_disable_heading":"default","ocean_post_title":"","ocean_post_subheading":"","ocean_post_title_style":"","ocean_post_title_background_color":"","ocean_post_title_background":0,"ocean_post_title_bg_image_position":"","ocean_post_title_bg_image_attachment":"","ocean_post_title_bg_image_repeat":"","ocean_post_title_bg_image_size":"","ocean_post_title_height":0,"ocean_post_title_bg_overlay":0.5,"ocean_post_title_bg_overlay_color":"","ocean_disable_breadcrumbs":"default","ocean_breadcrumbs_color":"","ocean_breadcrumbs_separator_color":"","ocean_breadcrumbs_links_color":"","ocean_breadcrumbs_links_hover_color":"","ocean_display_footer_widgets":"default","ocean_display_footer_bottom":"default","ocean_custom_footer_template":"","ocean_post_oembed":"","ocean_post_self_hosted_media":"","ocean_post_video_embed":"","ocean_link_format":"","ocean_link_format_target":"self","ocean_quote_format":"","ocean_quote_format_link":"post","ocean_gallery_link_images":"on","ocean_gallery_id":[],"footnotes":""},"categories":[51026,3256],"tags":[8802,73082,73083,71768,73084],"class_list":["post-2759","post","type-post","status-publish","format-standard","hentry","category-2016-news","category-news","tag-biodiversity
","tag-biogeo","tag-occurrence-records","tag-r-package","tag-science-datasets","entry"],"acf":[],"_links":{"self":[{"href":"https:\/\/blogs.sun.ac.za\/cib\/wp-json\/wp\/v2\/posts\/2759","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.sun.ac.za\/cib\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.sun.ac.za\/cib\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.sun.ac.za\/cib\/wp-json\/wp\/v2\/users\/237"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.sun.ac.za\/cib\/wp-json\/wp\/v2\/comments?post=2759"}],"version-history":[{"count":1,"href":"https:\/\/blogs.sun.ac.za\/cib\/wp-json\/wp\/v2\/posts\/2759\/revisions"}],"predecessor-version":[{"id":2761,"href":"https:\/\/blogs.sun.ac.za\/cib\/wp-json\/wp\/v2\/posts\/2759\/revisions\/2761"}],"wp:attachment":[{"href":"https:\/\/blogs.sun.ac.za\/cib\/wp-json\/wp\/v2\/media?parent=2759"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.sun.ac.za\/cib\/wp-json\/wp\/v2\/categories?post=2759"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.sun.ac.za\/cib\/wp-json\/wp\/v2\/tags?post=2759"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}