Mastodon

Have you ever noticed advertisements in the job classifieds for ‘database architects’ or ‘database administrators’? It makes database design sound pretty intimidating, doesn’t it? If they’re so difficult to use that they require architects and administrators, what hope do we mere mortals have?

In fact, databases range from being ultra-simple to use to ultra-complex. In the world of personal computers, there are two main types of database programs. At the simple end of the scale are flat-file databases, also called single-file or list managers. These programs are as easy to learn as a word processor or spreadsheets. The initial concepts may take a little more time to absorb than word processing, but they’re well within anyone’s grasp. If you’re using a database at home, in a class at school or in a small club or organisation, chances are the simple flat-file database will fill your needs.

Relational databases

Things can get more complex when you use the other type of personal computer database program, called a relational database. With a relational database program you can create a range of databases, from flat-file structures to demanding multi-file systems. If you’re using a database in your small business, a large organisation, or an ambitious school project, you’re likely to need at least some of the features of a more complex relational database.

Whichever type of database program you use, the most crucial step in using it is to design your database structure carefully. The way you structure your data will affect every other action. It will determine how easy it is to enter information into the database; how well the database will trap inconsistencies and exclude duplicate records; and how flexibly you will be able to get information out of the database.

A simple example

Let’s take an ultra simple example: the phonebook. Say you’ve been given the job of placing your school or organisation’s phone directory on computer.

It should be easy: all you need is the name, the address and the phone number of each person. Your initial table design thus consists of three fields: name, address and phone number. Right?

Let’s check.

Testing your design

The way to see if your database design works is to test it with some sample data, so feed the following records into your hypothetical table:

J. T. Apples, 100 Megalong Dr, Haberfield, 4992122
B. York, 2/53 Alice St, Leichhardt, 5050011
M. R. Sullivan, 9 Jay St, Leichhardt, 4893892
B. J. Anderson, 71 Wally Rd, Glebe, 2298310

Now tell the database program to sort the information:

B. J. Anderson, 71 Wally Rd, Glebe, 2298310
B. York, 2/53 Alice St, Leichhardt, 5050011
J. T. Apples, 100 Megalong Dr, Haberfield, 4992122
M. R. Sullivan, 9 Jay St, Leichhardt, 4893892

Revising your design

Immediately, you can see this is not what you want. You want the table sorted alphabetically by last name, not initials.

How can you fix this? Well, you could do some complex manipulation using a database feature called ‘string functions’ – if your program supports such a feature. Or, you could come up with a better table design in the first place: last name, initials, address and phone number. Feed your test data into this revised structure then tell the database program to sort your table using the last name followed by the initials. This time, you’ll get the correct alphabetical listing:

Anderson, B. J., 71 Wally Rd, Glebe, 2298310
Apples, J. T., 100 Megalong Dr, Haberfield, 4992122
Sullivan, M. R., 9 Jay St, Leichhardt, 4893892
York, B., 2/53 Alice St, Leichhardt, 5050011

Keep on refining

Don’t stop there. The table can be even more effective if you break the structure down further. For instance, if you’d like to have an easy way to list only those people who live in Leichhardt, this design won’t help you. But with a little more work, you can break your database structure down further into last name, initials, street address, suburb, and phone number.

With this structure, you’ll be able to sort your database alphabetically by last name or by suburb, and you’ll be able to pluck out all those people who live in a particular suburb.

A visual example

Here’s another example of this process in action, starting with a badly designed table structure:

A flawed table design

Figure 1: A simple – and flawed – students table

This fragment of a students table suffers from various defects. The combined name field makes it hard for us to locate students by using their last names, and also difficult to sort the table alphabetically by last name. Even worse, how do we uniquely identify each student? Currently, this table uses a combination of the Name and Date of Birth fields to identify each student. What if we happen to have two Jim Smiths, both born on 5th September 1982? It’s unlikely, yes, but not impossible.

The composite address field is also a problem: for instance, how do we pre-sort mail by postal code so we can do bulk mailouts at discount rates?

By breaking the fields down we end up with double the number of fields but a much simpler and more effective structure. The new Student ID field lets us uniquely identify any student, even when they share names and birthdates. We’ve split the Names field into first and last names so we can get an alphabetical listing. We’ve done the same to the address field, allowing us to sort by postal code or suburb.

Improved table design

Figure 2: An improved students table

More is less

Notice how creating an efficient table structure consists of breaking down your fields into simpler and simpler components? You end up with a table with many more fields than you might originally have thought necessary, but each of those fields houses much more basic information.

There’s a technical term for this process: normalisation. If you wanted to become a database architect you’d have to become conversant with normalisation and functional dependencies and normal forms. If you’re happy to remain a lesser mortal, you can safely ignore these terms, provided you keep in mind that your task is to create a database structure that provides an efficient store for your information and that makes it flexible and easy to extract useful information.

In some ways, creating a database that’s effective and simple to use is almost an anti-intuitive process. For example, our initial structure for the phone directory:

name
address
phone number

seems like it’s a simpler design than our end result:

first name
last name
street address
suburb
phone number

Creating useful fields

What you need to remember is that while the structure might look more complex, the contents of each field have been reduced to the simplest useful components. I say “useful” because we could, of course, break each field down further. For instance, we could break the street address field into two fields, street number and street name. But what would be the point? There’s no need to extract or sort information in the database simply by street number or name, and so it’s not a useful basis for breaking up the field. On the other hand, if we wanted to deal with multi-line addresses which are common for businesses, such as:

Suite 5
122 Jones Street

then it makes sense to break the address field down into two simpler fields, address line 1 and address line 2. You’re not likely to want to sort information based on only one of these fields, nor are you likely to use either of these fields in isolation. What is likely is that you’ll want to have an easy way to print address line 1 and address line 2 as separate lines when addressing envelopes. So this field division becomes useful when getting information out of your database.

Permissible design infractions

I should add that creating these address sub-fields is not, technically, a good solution. Notice how with this table design, many records will have nothing in the address line 2 field, as most people’s address consists of one line, rather than two. So you’ll be wasting a lot of space in your database. Additionally, what about addresses that require more than two lines, such as:

Suite 5
Level III, Building A20
122 Jones Street

Our new table structure can’t cope with this. Should we add another address line to ensure we cater to the infrequent address that needs three lines? Should we add a fourth line just in case…?

You can see the problems you can create by not getting the design right. A technically rigorous solution is to remove the address lines from our phonebook table altogether, and stick them in an address table, that we then link to the phonebook table by a common field.

However, there’s no need to fuss too much about such details. Many databases get by with minor infractions of database design rules, and you shouldn’t feel hampered by such rules provided your table structures:

  • provide flexible and simple output (whether you’re asking an online query of the database or printing reports)
  • eliminate redundant or duplicated information
  • exclude inconsistencies

Computer-less design

One thing I hope you’ve noticed is that we’ve done all our design without the aid of a computer. This is as it should be: it lets you focus on the significance of the task without the distractions of trying to learn a database program at the same time.

You can design and test your database structure without going near a computer. The only thing you really need to know is the type of database program you’ll use: if it’s a flat-file database, such as Microsoft Works, you’ll be limited to single-table database design. If it’s a relational program, such as FileMaker Pro, Microsoft Access or Alpha Five, you can design single- or multi-table databases.

In the next article in this series, we’ll move on to relational design. You’ve seen how breaking down your fields into simpler components in a single table can help make it easier to get useful information out of your database. When we look at relational design, you’ll discover how extending this process lets you address the other two goals of database design: eliminating redundant information and excluding inconsistencies.