OpenRefine for UNLV Libraries: Setup

Getting ready

To follow this lesson you need to download and install OpenRefine (formerly Google Refine), download a data file, and have a web browser available.

Note: OpenRefine is a Java program that runs on your own machine, not in the cloud. Its interface is displayed in your web browser, but no internet connection is needed.

Downloading OpenRefine

You can download OpenRefine from http://openrefine.org/download.html. This lesson has been tested with OpenRefine versions 2.5 through 2.7.

OpenRefine 2.7 is recommended.

There are versions for Windows, Mac OS X and Linux.

Installing OpenRefine

When you download OpenRefine for Windows or Linux, you are downloading a zip file. To install OpenRefine, unzip the downloaded file wherever you want the program to live. This can be a personal directory or an applications or software directory; OpenRefine should run from wherever you put the unzipped folder. The location must be a “local” drive, as problems have been reported when trying to run OpenRefine from a network drive.

If you are downloading OpenRefine for Mac, you are downloading a ‘dmg’ (disk image) file. Open it, then drag the OpenRefine application to an appropriate folder on your computer.

OpenRefine is a Java application, so you need a ‘Java Runtime Environment’ (JRE) installed on your computer to run it. If you don’t already have one installed, you can download and install one from http://java.com by going to the site and clicking “Free Java Download”.
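
If you are not sure whether a Java runtime is already installed, one way to check is to look for a `java` command on your system path. The short Python sketch below is only an illustration and is not part of OpenRefine; the function names `find_java` and `java_version_text` are made up for this example, and it assumes Python 3.7 or later is available. A JRE installed somewhere that is not on your path will not be detected this way.

```python
import shutil
import subprocess


def find_java():
    """Return the full path of the `java` executable on PATH, or None."""
    return shutil.which("java")


def java_version_text(java_path):
    """Run `java -version` and return whatever it prints.

    The version banner may be written to stderr rather than stdout,
    so both streams are captured and joined.
    """
    result = subprocess.run(
        [java_path, "-version"],
        capture_output=True,
        text=True,
        check=False,  # do not raise if the command exits with an error
    )
    return (result.stdout + result.stderr).strip()


if __name__ == "__main__":
    java = find_java()
    if java is None:
        print("No 'java' command found on PATH; install a JRE before running OpenRefine.")
    else:
        print(f"Found Java at {java}")
        print(java_version_text(java))
```

Typing `java -version` at a command prompt gives the same information without Python.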

Running OpenRefine

The interface to OpenRefine is accessed via a web browser. When you run OpenRefine, it should normally open a window in your default web browser pointing at the address http://127.0.0.1:3333. If this doesn’t happen automatically, open a web browser and type in this address yourself.
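
If the browser shows an error at that address, it can help to confirm that something is actually listening there. The following Python sketch is an illustration only: the function name `is_listening` is made up for this example, and the host and port are simply the address given above. It only checks that a program accepts connections on the port; it cannot tell whether that program is OpenRefine.

```python
import socket


def is_listening(host="127.0.0.1", port=3333, timeout=2.0):
    """Return True if something accepts TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or otherwise unreachable.
        return False


if __name__ == "__main__":
    if is_listening():
        print("Something is listening on 127.0.0.1:3333; open http://127.0.0.1:3333 in your browser.")
    else:
        print("Nothing is listening on 127.0.0.1:3333; start OpenRefine and try again.")
```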

Working with Large Datasets in OpenRefine

Cleaning up larger datasets can cause OpenRefine to freeze or run very slowly. This is usually because OpenRefine uses your computer’s RAM to process changes to the dataset, and anything that needs more than about 3 GB can exhaust the available memory, leaving the program running slowly or not at all.

There is a workaround for this issue: allocate more memory to OpenRefine, which you can do by following the FAQ tutorial on the OpenRefine GitHub. It involves downloading and installing a Java extension as well as making a small adjustment to an OpenRefine text file.
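
To give a rough idea of what the text-file adjustment can look like, here is a sketch. It assumes that the settings file contains a line with the standard Java maximum-heap option `-Xmx` (for example `-Xmx1024M`). Which file that is, and whether it uses this form, depends on your operating system and OpenRefine version, so treat the FAQ as the authority. The function name `set_max_heap` and the sample text are hypothetical, and the function works on text only, so it does not touch any file on disk.

```python
import re


def set_max_heap(ini_text, size):
    """Return ini_text with its -Xmx line set to `size` (e.g. "4096M").

    If the text has no -Xmx line, one is appended at the end.
    """
    if not re.fullmatch(r"\d+[KkMmGg]", size):
        raise ValueError(f"unexpected heap size: {size!r}")
    new_line = f"-Xmx{size}"
    lines = ini_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if line.strip().startswith("-Xmx"):
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical settings text, not copied from a real OpenRefine file.
    sample = "# maximum memory heap size\n-Xmx1024M\n"
    print(set_max_heap(sample, "4096M"))
```

Keep the value you choose below the amount of physical RAM in your machine, since the operating system and your browser need memory too.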

Getting Help

If you encounter problems installing or running OpenRefine, a good source of support is the OpenRefine mailing list and forum.

If you are installing OpenRefine on Windows, you may want to check the thread on Installing OpenRefine on Windows 7.

There are also general and specialist tutorials about using OpenRefine available on the web, including: