The prototypical way to import data into Gombe-MI is in bulk, via a plain text file whose columns are delimited by the tab character. Such files are easily produced by almost any spreadsheet program; it is expected that most data imported into Gombe-MI will be typed into a spreadsheet and then exported to tab-delimited text for upload.
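As a sketch of the expected format, the following shows tab-delimited text being produced programmatically with Python's standard csv module, just as a spreadsheet's "save as tab-delimited" export would. The column names and row values are invented for illustration and do not reflect any actual Gombe-MI table.

```python
import csv
import io

# Hypothetical tabular data; column names are invented for illustration.
rows = [
    ["animal_id", "obs_date", "observer"],
    ["FIFI", "2001-07-14", "JM"],
    ["FRODO", "2001-07-15", "JM"],
]

# Write the rows as tab-delimited text, one row per line.
buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
writer.writerows(rows)

print(buf.getvalue())
```

The same output could be produced by hand in a text editor; the tab character between columns and one row per line is all the format requires.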
Most data are uploaded into Gombe-MI via the Upload program, most often directly into views. The phpPgAdmin program's import function can also be used, although it aborts as soon as it encounters an error. Data may also be entered row by row directly into the database, via the phpPgAdmin web interface or by entering SQL into phpPgAdmin or any other PostgreSQL front-end.
Typically, when uploading data into PostgreSQL, encountering an error causes the entire upload to abort. This can be a problem when uploading large amounts of data, since it forces a repeated cycle of upload, fix, and re-upload. Gombe-MI contains some bespoke programs that address this issue by attempting to continue uploading after an error is encountered, in order to return to the user as many errors as possible from a single upload.
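The idea can be illustrated with a minimal sketch, not the actual Gombe-MI upload code: each row is validated independently, errors are collected rather than raised, and the full list is reported after the whole file has been processed. The validation rules and column layout here are invented assumptions.

```python
def validate_row(line_no, fields):
    """Check one row; return a list of error messages (empty if the row is ok).
    The rules below are hypothetical, for illustration only."""
    errors = []
    if len(fields) != 3:
        errors.append(f"line {line_no}: expected 3 columns, got {len(fields)}")
    elif not fields[0]:
        errors.append(f"line {line_no}: animal id is empty")
    return errors

def check_upload(lines):
    """Validate every line, continuing past errors so all problems in the
    file are reported from a single pass."""
    all_errors = []
    for n, line in enumerate(lines, start=1):
        all_errors.extend(validate_row(n, line.split("\t")))
    return all_errors

errors = check_upload([
    "FIFI\t2001-07-14\tJM",   # ok
    "\t2001-07-15\tJM",       # missing animal id
    "FRODO\t2001-07-15",      # missing column
])
```

A conventional abort-on-first-error upload would have reported only the problem on line 2; collecting errors lets the user fix both problems before re-uploading.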
The nature of the data validation built into Gombe-MI has an impact on the efficient use of these sorts of upload programs. Data validation occurs in two stages. First, as much validation as possible takes place as each row is inserted into each table. This phase of the data validation process is called here, for lack of a better term, “before commit” data validation. The second phase of data validation occurs once all the data has been inserted into the database. In this phase additional validation occurs to check for consistency throughout the data as a whole. This second phase of data validation is called here the “on commit” data validation phase.
The two phases of data validation can be distinguished by the error messages they produce: “before commit” error messages do not contain the phrase “on commit” in their text, while “on commit” error messages do.
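The distinction amounts to a simple text test, sketched below. The example messages are invented; only the classification rule, the presence or absence of the phrase “on commit”, comes from the text above.

```python
def validation_phase(message):
    """Classify an error message by validation phase, using the phrase
    "on commit" as described in the documentation (case-insensitive)."""
    return "on commit" if "on commit" in message.lower() else "before commit"

# Hypothetical error messages, for illustration only.
msgs = [
    'Invalid date "2001-02-30"',
    'On commit: duplicate observation for FIFI on 2001-07-14',
]
phases = [validation_phase(m) for m in msgs]
```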
The upload programs can recover and continue to upload additional rows if a data validation error is found during the “before commit” phase of data validation. In this case it can be beneficial to upload large files in order to produce large numbers of error messages and minimize the number of upload-fix-re-upload cycles.
However, once the data is clean enough that the upload program ceases to produce “before commit” error messages and the “on commit” validation phase is entered, only a single error message at a time can be returned; encountering an error during the “on commit” validation phase always immediately aborts the upload. When the data is clean enough that only “on commit” errors are returned, it can be beneficial to upload small files instead, to reduce the time it takes to produce an error and so speed up the upload-fix-re-upload cycle.
Gombe-MI contains a minimal number of bespoke programs. A few of these are utilities: a program to log out, a program to automate the steps involved in creating a new database user, and so forth. Most of the data entered into Gombe-MI is collected in tabular, row-and-column format suitable for entry into a relational database. As mentioned previously, such data can be imported directly.
Most errors in computer data entry can be caught by the wwwdiff program, which compares files. Typos are detected by entering the data twice, preferably by two different people, and comparing the results. Errors made in the field are more likely to be detected by manual checks of the data, or by the data validation built into Gombe-MI.
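The double-entry check can be sketched with Python's standard difflib module; this is an illustration of the technique, not the wwwdiff program itself, and the data and the mismatch are invented.

```python
import difflib

# Two independently typed copies of the same (hypothetical) data.
entry_a = ["FIFI\t2001-07-14\tJM", "FRODO\t2001-07-15\tJM"]
entry_b = ["FIFI\t2001-07-14\tJM", "FRODO\t2001-07-16\tJM"]  # typo in the date

# Any line present in only one copy indicates a typo in one of the two
# entries; a human then checks the original data sheet to see which.
diff = list(difflib.unified_diff(entry_a, entry_b, lineterm=""))
for line in diff:
    print(line)
```

Lines prefixed with "-" or "+" appear in only one of the two copies; identical files produce no diff output at all, which is the desired result.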