What is Database?
A database is a collection of files that store data. The files and the data within them are related in some manner: they are from the same domain, the same function, the same firm, or some other category. The files in the database are created according to the needs of the function or department and are maintained with data that is relevant to the applications the department runs.
Table of Contents
- 1 What is Database?
- 2 Database Example in Organization
- 3 Basic Architecture of Database
- 4 Need for Database System
Database Example in Organization
An example of a database in an organization is an ‘Employee’ database. This will correspond to the human resources function of the organization. The ‘Employee’ database may contain files related to employee details, employment history, benefits details, leave history, family details, medical compensation details, and so on.
The files are related to the employee concept, although they contain different data, depending on the applications that will need the data. Computations regarding medical reimbursements, for instance, will read data from the files related to the employee’s status, benefits, and medical history.
Consider another example of a ‘Product’ database. This may contain files related to the types of products, the details about product features, the prices and history of prices of products, the regional and seasonal sales figures of products, and the details of product designs. Such files could be used by the manufacturing department to determine production schedules, by the marketing department to determine sales campaigns or by the finance department to determine overhead allocation.
Fields, Records, and Files
A file in a database consists of particular spaces, called structures, in which data is maintained. The basic structure of a file is called a field. A field is a defined space, of specific dimensions, in which data is placed. Data can be written to, read from, and deleted from fields. When defining or designing a field, the contents of the field have to be specified exactly.
For instance, if the field has to hold the date of birth of an employee, it has to be specified how the data will be stored (in dd-mm-yyyy format or mm-dd-yy format), and what kind of characters (numbers, in this case) will be permitted. Several other dimensions specify a field and are held in a metadata file. (A metadata file contains details about how data is stored in files and provides information on how the data can be used and managed.)
A collection of fields is called a record. Each record is like a row in a spreadsheet; it consists of a pre-defined number of fields that together have meaning in the context of the application. In most cases, the sizes of the fields are fixed; this ensures that the total size of a record is also fixed.
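To make the idea of fixed-size fields concrete, the sketch below packs one employee record into bytes with Python's standard `struct` module. The field names and widths (a 4-byte ID, a 30-byte name, a 10-byte date of birth) are assumptions chosen for illustration, not a layout that any particular database system uses.

```python
import struct

# Hypothetical layout: 4-byte unsigned ID, 30-byte name, 10-byte date (dd-mm-yyyy).
# "<" means little-endian with no padding, so the record size is the sum of the field sizes.
RECORD_FORMAT = "<I30s10s"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # 4 + 30 + 10 bytes


def pack_employee(emp_id, name, date_of_birth):
    """Encode one record; struct pads short byte strings with null bytes."""
    return struct.pack(
        RECORD_FORMAT, emp_id, name.encode("ascii"), date_of_birth.encode("ascii")
    )


def unpack_employee(raw):
    """Decode one record and strip the null padding from the text fields."""
    emp_id, name, dob = struct.unpack(RECORD_FORMAT, raw)
    return (
        emp_id,
        name.rstrip(b"\x00").decode("ascii"),
        dob.rstrip(b"\x00").decode("ascii"),
    )


record = pack_employee(101, "Asha Rao", "14-03-1990")
```

Because every record has the same size, record number n in a file of such records would start at byte offset n * RECORD_SIZE, which is one reason fixed field sizes are convenient.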
A collection of records forms a table, and each table is contained in a file. A table may contain a few records or a very large number of records; modern database systems allow tables to hold billions of records. Furthermore, very large tables may be split and stored on different servers. A database consists of many such files.
In relational databases, the tables are related to each other. These relations allow data to be linked according to some logic and then extracted from the tables.
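As a minimal sketch of how related tables are linked, the example below uses Python's built-in `sqlite3` module with an in-memory database. The table and column names (`employee`, `leave_history`, `emp_id`, `days`) are hypothetical, loosely based on the 'Employee' example earlier in the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL
    );
    CREATE TABLE leave_history (
        leave_id INTEGER PRIMARY KEY,
        emp_id   INTEGER NOT NULL REFERENCES employee(emp_id),
        days     INTEGER NOT NULL
    );
    INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO leave_history VALUES (10, 1, 3), (11, 1, 2), (12, 2, 5);
""")

# The shared emp_id column is the relation: it links each leave record
# to the employee it belongs to, so the two tables can be queried together.
rows = conn.execute("""
    SELECT e.name, SUM(l.days)
    FROM employee AS e
    JOIN leave_history AS l ON l.emp_id = e.emp_id
    GROUP BY e.emp_id
    ORDER BY e.emp_id
""").fetchall()
```

Here `rows` holds one (name, total leave days) pair per employee, extracted from two tables through the common `emp_id` field.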
Basic Architecture of Database
Databases may be organized and used in many different ways. The most basic use is as a personal database. Individual users create databases for their personal use in organizations or at home. A personal database may be on a personal computer at the office, on a mobile phone, or on a tablet computer. The data in these databases is entered and updated by the user and is used principally by that user. For instance, a contacts database on a mobile phone is a personal database.
All the data is entered and used by the mobile phone user. The user does not design the database (such databases are often provided as off-the-shelf applications), but the user alone uses and maintains it. Personal databases are highly tuned to the needs of the user and are not meant to be shared. Because they reside on personal devices, they also cannot be shared, which is a limitation of these systems.
Workgroup databases or function databases are designed to be shared among employees in an organization who belong either to a group or to a functional department. Such a database is maintained on a central computer, along with applications relevant to the group or department. Users access and update the data on the central database from their local computers.
Enterprise or organizational databases are accessed by all members of the organization. These are typically organized in the client–server mode. A central database server provides database capabilities to different applications that reside on other computers. These client applications interact with the database server to draw on data services, whereas the database server is managed independently. An advantage of these database servers is that they can be made highly secure, with strong access restrictions, and can also be backed up carefully to recover from crashes.
While designing client–server databases, a prime issue to be addressed is where the processing should take place. If data from, say, three tables has to be processed on the client, then these tables have to be moved across the network to the client, which must have enough computing capacity to do the processing.
If, on the other hand, the computing is done on the server, then the clients have to send processing requests to the server and await the results, which puts a heavy load on the server. Clients such as mobile phones or personal computers often do not have the processing capacity to deal with large amounts of data, so the processing is usually left to the server.
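The trade-off between client-side and server-side processing can be sketched as follows. This is only an analogy: an in-memory `sqlite3` database stands in for the database server (SQLite is an embedded library, not a client–server system), and the `sales` table and both function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100), ("south", 250), ("north", 50)],
)


def total_on_client(connection, region):
    """Client-side processing: every row crosses the 'network', then the client filters and sums."""
    rows = connection.execute("SELECT region, amount FROM sales").fetchall()
    return sum(amount for row_region, amount in rows if row_region == region)


def total_on_server(connection, region):
    """Server-side processing: the database filters and sums, and only one value comes back."""
    row = connection.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ?", (region,)
    ).fetchone()
    return row[0]
```

Both functions return the same total; they differ in how much data is transferred and in which machine does the work.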
The architecture often used in enterprises is referred to as three-tier architecture. Here the clients interact with application servers, which in turn call upon database servers for their data needs. The processing load for applications and data is thus spread across two sets of servers, enabling greater efficiency.
Databases may be centralized or decentralized within organizations. Centralized databases are designed on the client–server model, with a two-tier or three-tier architecture. Decentralized or distributed databases have tables distributed across many servers on a network. The servers may be geographically distributed, but to the applications they appear as a single entity. In one type of distributed database, the entire database is replicated across many servers. This is called a homogeneous database.
Users who are close to a particular server access data from that server, whereas other users access data from the servers that are physically closer to them. When data is changed on any one server, the change is also applied on the others.
Distributed databases can also be federated in nature. This means that the databases across the network are not the same; they are heterogeneous. In such an architecture, when application servers draw on the databases, special algorithms pull together the required data from diverse servers and present a consolidated picture. This architecture is useful where the data entry and management of servers are particular to a region.
For example, multinational banks use federated databases as their databases in different countries operate on different currencies and exchange criteria and rely on local data. For applications requiring global data, the applications use special logic for analyzing disparate data.
A special class of software, known as middleware, is used to connect disparate databases. As databases can have different data structures for the same kind of data, middleware allows the databases to read data from and write data to each other. For example, the data field for ‘student name’ may have space for 30 characters in one database and 40 characters in another.
The fact that both fields refer to the same concept is captured by the middleware, which enables the translation from one to the other. The middleware is also used by the application layer to read and use data from many databases. In modern web-centric applications, middleware plays a major role in allowing the use of distributed databases by application servers.
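The 'student name' translation described above might look like the following sketch. The schema dictionaries, the field names (`student_name`, `stud_name`), and the `translate_name` function are all hypothetical; real middleware products handle many more cases than a single fixed-width text field.

```python
# Hypothetical definitions of the same concept in two different databases.
COLLEGE_DB = {"field": "student_name", "width": 30}
LIBRARY_DB = {"field": "stud_name", "width": 40}


def translate_name(record, source, target):
    """Map a fixed-width name field from the source schema to the target schema."""
    value = record[source["field"]].rstrip()  # drop the source's space padding
    if len(value) > target["width"]:
        # Refuse to truncate silently; a real system would need a policy for this case.
        raise ValueError(f"{value!r} does not fit in {target['width']} characters")
    # Re-pad to the width the target database expects.
    return {target["field"]: value.ljust(target["width"])}


library_record = {"stud_name": "Meera Nair".ljust(40)}
college_record = translate_name(library_record, LIBRARY_DB, COLLEGE_DB)
```

The design choice to raise an error when a name does not fit is an assumption of this sketch; truncating or logging the record are other possible policies.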
Need for Database System
Different aspects of the need for database systems are discussed in the sections given below.
- Data Independence
- Reduced Data Redundancy
- Data Consistency
- Data Access
- Data Administration
- Managing Concurrency
- Managing Security
- Recovery From Crashes
- Application Development
Data Independence
Databases allow data about an activity or a domain to be maintained independently. This independence means that the data is stored in separate files in a structured manner, and the creation and updating of the data are done independently of its uses. For instance, in a college, a database of students is updated when a student joins or leaves the college, changes address, changes phone number, and so on.
This is independent of how the data is used by programs for course registration or the library. Furthermore, the programs and applications that use the data are not aware of where and how the data is maintained; they only need to know how to make simple calls to access the data.
Reduced Data Redundancy
One goal of databases is to reduce data redundancy. Data redundancy refers to the duplication of data in different tables. If data on students is maintained in two or three different databases in the college then for one change, say in a student’s mobile phone number, all the databases have to be changed. Reduced data redundancy ensures that minimal storage is used for the data. With the rapid increase in data over time, conserving space is an important management challenge.
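The phone-number example can be sketched with Python's built-in `sqlite3` module. In this hedged illustration the student's details are stored once, and the registration and library tables refer to them by ID, so a single update is seen everywhere. All table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT,
        phone      TEXT
    );
    -- These tables store only the student's ID, not a second copy of the phone number.
    CREATE TABLE course_registration (
        student_id INTEGER REFERENCES student(student_id),
        course     TEXT
    );
    CREATE TABLE library_loan (
        student_id INTEGER REFERENCES student(student_id),
        book       TEXT
    );
    INSERT INTO student VALUES (1, 'Meera', '555-0100');
    INSERT INTO course_registration VALUES (1, 'Databases');
    INSERT INTO library_loan VALUES (1, 'SQL Basics');
""")

# One change, made in one place.
conn.execute("UPDATE student SET phone = ? WHERE student_id = ?", ("555-0199", 1))

registration_phone = conn.execute("""
    SELECT s.phone FROM course_registration AS c
    JOIN student AS s ON s.student_id = c.student_id
""").fetchone()[0]

library_phone = conn.execute("""
    SELECT s.phone FROM library_loan AS l
    JOIN student AS s ON s.student_id = l.student_id
""").fetchone()[0]
```

Both the registration view and the library view now report the new number, without any second copy that could have been left out of date.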
Data Consistency
Data users must have access to consistent data, that is, data that is the same regardless of the application through which the user accesses it. Consistency implies that the integrity of the data is maintained (the data has not been changed or corrupted in a manner unknown to the system); the data is valid, which means that the data is the correct one to use for the moment; and the data is accurate, which means that the data being used is the one that was designed to be used. Consistency requires careful management of data updating, deletion, copying, and security.
Data Access
Data stored in databases must be accessed efficiently. Very large databases, such as those maintained by eBay, have to be managed so that a user's search returns results within a few seconds, even though the system has to search through billions of records. Furthermore, the response from the database has to be presented to the user in a manner that is easy to read and understand, which requires further processing.
Data Administration
Data administration entails deciding who can create, read, update, or delete data. Many organizations have strict controls over who can create or delete data fields or tables. This is determined by the needs of the organization and the roles defined for database administrators and users.
Read access is usually provided to those who need only to see and use the data, not to modify it in any way. Update access is also carefully restricted to those who have the rights and privileges to do so. Modern database systems enable sophisticated ways in which these four functions (create, read, update, and delete) can be enabled or disabled for users and administrators.
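A minimal sketch of how the four functions might be enabled or disabled per role is shown below. The role names and the `is_allowed` helper are hypothetical; production database systems provide their own permission mechanisms, which this plain-Python example does not attempt to reproduce.

```python
# Hypothetical roles mapped to the actions each one may perform.
ROLE_PERMISSIONS = {
    "viewer": frozenset({"read"}),
    "editor": frozenset({"read", "update"}),
    "admin": frozenset({"create", "read", "update", "delete"}),
}


def is_allowed(role, action):
    """Return True only if the role has been granted the action; unknown roles get nothing."""
    return action in ROLE_PERMISSIONS.get(role, frozenset())
```

For example, `is_allowed("viewer", "update")` is false, while `is_allowed("editor", "update")` is true. Denying everything to unknown roles is a deliberate default in this sketch.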
Managing Concurrency
A serious challenge for modern databases, especially those used for e-commerce applications, is that of managing concurrency. Data is often maintained on many servers, and distributed across a wide geography. Concurrency entails ensuring that changes or updates to a particular element in a table are reflected across all the distributed servers where users access the data. This is an element of managing consistency, particularly for distributed databases.
Managing Security
A substantial threat to modern databases comes from crackers and unauthorized users. Database systems have to provide a layer of security, over and above the security systems in place at the organization, which ensures protection across transactions and all administration tasks. This also means that internal tampering and mischief with data are carefully monitored and controlled.
Recovery From Crashes
Databases are crucial to the internal working of an organization; they are both a resource and an asset. With the high levels of transactions happening within the information systems (IS) of organizations, the data must be secured against failure. Modern database systems provide a sophisticated system of backup, mirroring, and recovery that allows rapid recovery from crashed servers.
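As a small-scale illustration of backup and recovery, the sketch below copies a database with the `Connection.backup` method of Python's built-in `sqlite3` module (available since Python 3.7) and then reads from the copy after the original is closed. The `orders` table and the variable names `primary` and `standby` are hypothetical, and closing a connection is only a stand-in for a real server crash.

```python
import sqlite3

# The "primary" database that applications write to.
primary = sqlite3.connect(":memory:")
primary.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, item TEXT)")
primary.execute("INSERT INTO orders VALUES (1, 'keyboard')")
primary.commit()  # make sure the data is committed before it is copied

# Copy the whole primary database into a second, independent database.
standby = sqlite3.connect(":memory:")
primary.backup(standby)

# Simulate losing the primary; the standby copy still holds the data.
primary.close()
recovered = standby.execute("SELECT order_id, item FROM orders").fetchall()
```

Enterprise systems typically combine scheduled backups with mirroring and transaction logs; this sketch shows only the basic idea of keeping a recoverable copy.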
Application Development
Databases enable applications to be developed using the data management facilities they provide. E-commerce sites, for example, create a web presence that includes search, display, selection, sale, and payment for products, all of which rely on databases that provide the relevant data and store the data generated by transactions.
Applications may be local to a function or department or shared across many departments, and they may share data from the databases. Database systems provide special languages, such as SQL, with which their data can be manipulated and which application developers can use.