Part of the data workflow is preparing the data for analysis. Some of this preparation involves data cleaning, where errors in the data are identified and corrected and formatting is made consistent. This step must be taken with the same care and attention to reproducibility as the analysis.
OpenRefine (formerly Google Refine) is a powerful, free, open-source tool for working with messy data: cleaning it and transforming it from one format into another. Many people comment that this tool saves them months of work they would otherwise spend making these edits by hand!
Turn messy data into tidy data! Much of your time will be spent in this ‘data wrangling’ stage. It’s not the most fun, but it’s necessary, and good data organization is the foundation of data analysis. Learn the rules for a ‘tidy dataset’, with examples and tools, so that you can clean and prepare your data.
This workshop will build on principles from the June “Tidy Data is Good Data” seminar and explore how to turn messy data into tidy data using simple functions in R. Note: You do not have to have...
Scholars can communicate their research in various ways. While peer-reviewed journal publications remain the primary outlet for sharing the key results of research projects, there are growing norms and expectations that the underlying data from projects should also be published.
In this session, we will look at strategies and tools for increasing and measuring the impact of your research, and help you get started with an ORCID profile.
Scholars communicate their research in various ways. While peer-reviewed journal publications remain the primary outlet for sharing the key results of research projects, there are growing expectations that the underlying data from projects should also be published. Data management practices help researchers take care of their data throughout the entire research process from the planning phase to the end of a project when data might be published within a repository.
This webinar will look at strategies to effectively publish data and provide strategies for “...
"It is often said that 80% of data analysis is spent on the process of cleaning and preparing the data." - Hadley Wickham, Chief Scientist at RStudio
Analysis-ready datasets have been responsibly collected and reviewed so that analysis of the data yields clear, consistent, and error-free results to the greatest extent possible. When working on a research project, take steps to ensure that your data is safe, authentic, and usable.
This webinar will review data management practices to ensure your datasets are ready for analysis. We will review...
Scholars increasingly work on collaborative research projects. Collaborative projects often bring together partners across disciplines, institutions, and sectors. These projects present opportunities for innovation but also raise challenges for the development of efficient and effective workflows and the management of data.
This workshop will examine considerations for collaborative research and present some strategies for developing and documenting workflows as well as methods for storing and sharing data. We will also look at some tools available that can...
Research Data Management is essential for responsible research and should be introduced when starting a new project or joining a new lab. Managing data across a project and/or a team allows for accurate communication about that project.
This session will review the important steps for onboarding new employees/trainees to a lab or to new projects. The key takeaway from this session will be how to incorporate these steps within your individual project or lab environment.
Publishing research data within a trusted repository helps you comply with funder and journal data sharing policies, supports the discovery of and access to data, and can result in more visibility and higher impact for research projects. These shared datasets can be cited and referenced by yourself and by other researchers.
This seminar will provide an overview of sharing data in a repository and how to structure a data citation.
Dropbox is a file hosting service that offers cloud storage, file synchronization, version control, online editing, and more. Entire labs can promote collaboration via Dropbox, which provides a platform for accessing shared data without taking up valuable space on your computers.
This seminar will explore how you can use Dropbox effectively to manage your research files and your entire research project.
Note: we will focus on Dropbox for Business, which is available for the Harvard research community.
To ensure that you understand your own data and to enable others to find, use and properly cite your data, it helps to create README files with ‘documentation’ or ‘metadata’ about the datasets you create.
This session will explore the critical role documentation plays in data management and how you can ensure good documentation throughout your research.
Define common types of documentation
Understand why documenting your research is important...