As of February 2012, most databases consist of server software and a set of files that all reside on a single computer. More advanced approaches use several computers and many files, sometimes at different locations. Parallel and distributed methods improve access speed for very large databases, access for geographically dispersed organizations and reliability for applications that depend on uptime.
A distributed database houses data on two or more server computers at separate locations. For example, a head office in Chicago has one database and a warehouse in Kansas City has another. The two share a link over the Internet so that the Chicago database receives shipment records from Kansas City every night. A typical parallel database, by contrast, resides in one location with one set of files, though several computers share the workload.
A parallel database’s main advantage is speed. Like a standard database running on one computer, a parallel database receives Structured Query Language, or SQL, requests from users. The server breaks each request down into a series of steps, then executes them. A standard database server performs all the steps by itself, whereas a parallel database assigns steps to different computers. When each computer finishes its task, the database assembles the information and sends the results back to the user. Because each computer handles only part of the work, together they finish a SQL request in less time than a single machine would need. As an organization’s database requirements grow, administrators add computers to the parallel database to meet the increased workload.
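The split-and-assemble pattern described above can be sketched in a few lines of Python. This is only an illustration, not how any particular database product works: threads stand in for separate computers, and the names `PARTITIONS`, `partial_sum` and `parallel_total` are hypothetical, as is the sample data.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands for the order amounts
# stored in the slice of the table that one worker computer scans.
PARTITIONS = [
    [120, 80, 45],
    [200, 15],
    [60, 60, 60, 10],
]

def partial_sum(rows):
    """Step run on one worker: total the amounts in its own partition."""
    return sum(rows)

def parallel_total(partitions):
    """Coordinator: give each worker one partition, then assemble the results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(partial_sum, partitions))
    # Combine the partial answers into the single result the user asked for.
    return sum(partials)

total = parallel_total(PARTITIONS)
print(total)
```

Each worker touches only its own partition, which is the property that lets a real parallel database add machines as the workload grows.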
Distributed databases improve access, as each local office has its own database. Most SQL transactions take place at the office level without the delays incurred by long-distance data networks. Each local database has information in common with the others, but may also hold data unique to its location. Periodically, the local databases synchronize over a long-distance network to stay current with each other. By contrast, a parallel database doesn't improve access for remote locations.
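As a rough sketch of periodic synchronization, the Python example below copies new shipment records from one local database to another, in the spirit of the Kansas City to Chicago transfer mentioned earlier. It is a simplification under stated assumptions: the copy runs in one direction only, record ids only ever increase, and the `shipments` table, its columns, and the functions `make_db` and `sync_shipments` are invented for illustration. Two in-memory SQLite databases stand in for the two sites.

```python
import sqlite3

def make_db():
    """Create an in-memory database with a hypothetical shipments table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE shipments (id INTEGER PRIMARY KEY, item TEXT, shipped_on TEXT)"
    )
    return db

def sync_shipments(source, target):
    """Copy rows the target has not seen yet; return how many were copied."""
    # Assumption: ids only grow, so the highest id in the target marks
    # the point where the previous synchronization stopped.
    last_id = target.execute(
        "SELECT COALESCE(MAX(id), 0) FROM shipments"
    ).fetchone()[0]
    rows = source.execute(
        "SELECT id, item, shipped_on FROM shipments WHERE id > ?", (last_id,)
    ).fetchall()
    target.executemany("INSERT INTO shipments VALUES (?, ?, ?)", rows)
    target.commit()
    return len(rows)

# Usage: the warehouse records shipments locally during the day...
kansas_city = make_db()
chicago = make_db()
kansas_city.executemany(
    "INSERT INTO shipments VALUES (?, ?, ?)",
    [(1, "pallets", "2012-02-01"), (2, "crates", "2012-02-01")],
)
kansas_city.commit()

# ...and the nightly job forwards only the new rows to head office.
copied = sync_shipments(kansas_city, chicago)
```

Between runs, each office reads and writes its own database, so everyday queries never wait on the long-distance link.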
Distributed and parallel databases address reliability in different ways. A database running on a single computer depends entirely on that machine’s reliability: The moment the computer crashes, the database becomes unavailable. Most parallel databases have a feature that monitors the state of each computer; if one fails, the server program removes it from service but the database remains available, though at reduced performance. For a distributed database, the remaining local servers continue to process information if one develops a problem; the outage affects only the users local to the failed computer.
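The failure handling described for parallel databases can be illustrated with a small Python sketch. It assumes a coordinator that tracks which computers are in service and spreads work round-robin across them; the `Cluster` class, its methods and the node names are hypothetical, and real products detect failures and rebalance work in more sophisticated ways.

```python
class Cluster:
    """Toy coordinator that tracks which worker computers are in service."""

    def __init__(self, names):
        self.healthy = {name: True for name in names}

    def mark_failed(self, name):
        """Remove a crashed computer from service."""
        self.healthy[name] = False

    def available_nodes(self):
        return [name for name, ok in self.healthy.items() if ok]

    def assign(self, steps):
        """Spread steps round-robin across the computers still in service."""
        nodes = self.available_nodes()
        if not nodes:
            raise RuntimeError("no computers available")
        plan = {name: [] for name in nodes}
        for i, step in enumerate(steps):
            plan[nodes[i % len(nodes)]].append(step)
        return plan

cluster = Cluster(["node-a", "node-b", "node-c"])
steps = ["scan-1", "scan-2", "scan-3", "scan-4", "scan-5", "scan-6"]

before = cluster.assign(steps)   # three computers, two steps each
cluster.mark_failed("node-b")
after = cluster.assign(steps)    # two computers, three steps each
```

The database stays available after the failure, but each surviving computer carries more steps, which is why performance drops.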