This tutorial covers the following:
Import data in OpenRefine.
Data types supported by OpenRefine.
Exploring your data with the OpenRefine feature.
Sorting and viewing your data.
As OpenRefine is already installed and running, let’s learn how to import, what data types are supported by OpenRefine, and how to explore your data and sort and view it by following the instructions outlined below.
Data Exploration 101: Tips for Importing and Maximizing Insights from Your Data
- Import your data: Open the OpenRefine application and select “Create Project” to create a new project. You can import your data into the project in various formats, including CSV, TSV, XLS, and JSON. Select the appropriate import option and follow the prompts to import your data.
- Data types supported: OpenRefine supports several data types; some of the most frequently used are:
- Text: Any alphanumeric characters, punctuation, and spaces.
- Numbers: Integers and decimals, and scientific notation.
- Booleans: Logical values (true or false)
- Dates: Formats, such as YYYY-MM-DD and MM/DD/YYYY.
- Arrays: Collections of values that can be stored in a single cell.
- Objects Data structures that can contain multiple fields or attributes, each with its value.
- Explore your data with Openrefine’s powerful facet feature. Using facets, you can identify patterns and inconsistencies in your data, making it easier to clean and transform for data visualizations. In this exercise, you will use facets to explore a sample dataset of public educational organizations in Connecticut and gain insights into the different types of schools in the state.”
- Download the sample spreadsheet about public educational organizations in Connecticut. Save the file to your device.
- Open OpenRefine in your browser and Import the data by clicking “Create Project” and selecting the downloaded spreadsheet.
- Select the “Organization Type” column by clicking on the column header.
- Click on the “Facet” button at the top of the selected column.
- In the drop-down menu, select “Text Facet”. This will create a new pane on the left-hand side of the screen. However, there are several options for facets, which can be found below:
- Text facet: Show all the unique values in the column and how many times each value appears. You can use this facet to identify misspellings, duplicates, and inconsistent values.
- Numeric facet: Explore the numerical values in a column, including the minimum and maximum values and the mean and distribution of values.
- Timeline facet: Explore the temporal values in a column, including the earliest and latest dates, the distribution of dates, and the frequency of events over time.
- Custom facet: Create a facet based on a custom expression or function. It is helpful if you need to perform a more complex data analysis.
- The text facet will show you all the unique values in the column and how many times each value appears. Use this information to answer the question about how many school types exist in Connecticut.
- Use the “Undo/Redo” buttons at the top of the screen to return to previous steps or redo a facet or filter.
- Once you have explored the dataset using facets, you can export the data in a new format or save your OpenRefine project for future use.
- Using OpenRefine and facets can help you quickly gain insights into your data, identify errors or inconsistencies, and filter the data to focus on specific subsets. You can become more proficient at using OpenRefine to analyze and explore datasets with practice.
Short Instructional Video
Video 1: “How to import and use facets to count school organization types in Connecticut,” Self-hosted video, 00:15 seconds, posted by Myri Ayala.