Naming nightmares

Olly Rudd, NHNZ Images Librarian

In an ideal world, every time you search a database you want all the relevant records presented to you: not just those that the system can be bothered finding, and not a heap of others that are totally irrelevant. Achieving this state of precision is almost impossible, and it depends on many factors, some of which are under your control as the user and some of which are not.

Take, for instance, the vagaries of language. With wildlife, the problems associated with vernacular names are well known. For example, moon-daisy and ox-eye daisy are one and the same thing (pretty flowering weeds in wheat fields). Purple gallinule and pukeko are the same bird, a marsh-loving rail: the first is the North American name and the second is the New Zealand name. Talk about purple gallinule in NZ or pukeko in the USA and you're likely to be given blank looks. This nomenclatural problem is not limited to wildlife; one person's shovel is another person's spade.

Spelling variants are, of course, another well-known problem. Most modern databases deal with common variants such as color and colour by using fuzzy matching. But if you are interrogating an older database, particularly a text-only one that could well be relying on twenty-year-old technology, you as the searcher need to be aware of variants yourself. I have even come across databases where you still need to perform two searches to cover the singular and plural forms of your term.
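To make the spelling and singular/plural problem concrete, here is a minimal Python sketch of the kind of query expansion a searcher otherwise has to do by hand on an older exact-match database. The variant table, the crude "add or strip an s" plural rule and the function names are all hypothetical illustrations; this is not how NHNZ Images or any particular product handles variants, and real systems use proper stemming and fuzzy matching.

```python
import re

# Hypothetical British -> American spelling pairs (illustration only).
SPELLING_VARIANTS = {
    "colour": "color",
    "grey": "gray",
    "programme": "program",
}


def spelling_forms(word: str) -> set[str]:
    """Return the word plus its British/American counterpart, if listed."""
    forms = {word}
    for british, american in SPELLING_VARIANTS.items():
        if word == british:
            forms.add(american)
        elif word == american:
            forms.add(british)
    return forms


def number_forms(word: str) -> set[str]:
    """Crude singular/plural pairing: strip or add a trailing 's'.

    Deliberately naive ('grass' becomes 'gras'); real systems use stemmers.
    """
    if word.endswith("s") and len(word) > 3:
        return {word, word[:-1]}
    return {word, word + "s"}


def expand_term(term: str) -> set[str]:
    """Every form an exact-match database would need a separate search for."""
    expanded: set[str] = set()
    for form in number_forms(term.lower()):
        for variant in spelling_forms(form):
            expanded |= number_forms(variant)
    return expanded


def matches(record: str, term: str) -> bool:
    """True if any expanded form of the term appears as a word in the record."""
    words = set(re.findall(r"[a-z]+", record.lower()))
    return bool(words & expand_term(term))


print(sorted(expand_term("colour")))
# ['color', 'colors', 'colour', 'colours']
```

On an old exact-match system, each form in the expanded set would be a separate search, or a chain of ORs, which is exactly the work that fuzzy matching saves you.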

In searching the NHNZ Images database, you need to be aware that we use British spelling. You'll probably also find our own peculiar slang terms, which, however much we endeavour to exclude them from the database, still manage to creep in. Not that I can think of any specific examples as I write.

The factors that govern the usefulness of a search and that are under the user's control depend on the user's experience in querying databases. Most people are familiar with the Boolean operators AND, OR and NOT, but with wildlife, searching for terms that sit next to each other is more likely to be useful. The default operator in most databases is AND, but in NHNZ Images it is ADJ (adjacent). So a search for blue penguin in most databases will retrieve every record that contains both blue and penguin: you get blue cars, blue butterflies and so on from a programme called Penguins. Search for blue penguin in NHNZ Images, however, and the first 200 hits are for little blue penguins. You can achieve the same effect in most databases by putting speech marks round the term, thus “blue penguin”, which tells the search engine you want to retrieve that exact phrase.
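The difference between a default AND and a default ADJ can be sketched in a few lines of Python. The records, the programme titles and the function names below are invented for illustration; the sketch assumes simple word-by-word matching and does not describe how NHNZ Images is actually implemented.

```python
import re

# Invented example records: a shot description plus a programme title.
RECORDS = [
    "Little blue penguin coming ashore at dusk | programme: Coastal Birds",
    "Blue car on coast road | programme: Penguin Summer",
    "Blue butterfly on flax | programme: Penguin Summer",
]


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def and_search(records: list[str], query: str) -> list[str]:
    """AND: every query word must appear somewhere in the record."""
    terms = tokenize(query)
    return [r for r in records if all(t in tokenize(r) for t in terms)]


def adj_search(records: list[str], query: str) -> list[str]:
    """ADJ: the query words must appear next to each other, in order."""
    terms = tokenize(query)
    if not terms:
        return []
    n = len(terms)
    hits = []
    for record in records:
        words = tokenize(record)
        if any(words[i:i + n] == terms for i in range(len(words) - n + 1)):
            hits.append(record)
    return hits


def search(records: list[str], query: str, default_operator: str = "AND") -> list[str]:
    """Treat a query wrapped in straight double quotes as a phrase (ADJ);
    otherwise apply the default operator."""
    q = query.strip()
    if len(q) >= 2 and q[0] == q[-1] == '"':
        return adj_search(records, q[1:-1])
    if default_operator == "ADJ":
        return adj_search(records, q)
    return and_search(records, q)


print(len(search(RECORDS, "blue penguin")))                          # 3
print(len(search(RECORDS, "blue penguin", default_operator="ADJ")))  # 1
print(len(search(RECORDS, '"blue penguin"')))                        # 1
```

With AND, the blue car and the blue butterfly match because "penguin" turns up in the programme title; with an ADJ default, or with the phrase in speech marks, only the little blue penguin record comes back.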

Each database has its own foibles. If you want really speedy, accurate and pertinent answers, however, there is a quick way to find out what footage we have on penguins: phone or email us.