What is Data Management? Challenge


What is Data Management?

Data management refers to the process of collecting, storing, organizing, maintaining, and utilizing data in an organization or business. It involves the development and implementation of policies, procedures, and strategies to ensure the accuracy, completeness, security, and availability of data throughout its lifecycle.

Introduction to Data Management

Data has become a critically important resource for organizations in recent years. It has to be managed carefully to ensure access, reliability, and security. Databases are built from simple structures: fields, records, and tables. When assembled within an architecture, these simple structures provide an immensely useful yet manageable resource for organizations. There are many types of database designs; the most popular is the relational database, in which data is held in tables that are related to each other.

Software that implements such databases is called a database management system (DBMS), and it offers many features to create, update, query, and manage data easily. Relational systems rely on SQL, a computer language that allows easy definition and manipulation of tables and relations.
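
The idea of defining and manipulating related tables with SQL can be illustrated with a minimal sketch. It uses SQLite through Python's standard sqlite3 module, chosen here only because it needs no setup; the department and employee tables, their columns, and the sample rows are assumptions made up for this illustration.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces relations only when asked

# Definition: two tables (hypothetical names) related through department_id.
conn.executescript("""
CREATE TABLE department (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE employee (
    id            INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    salary        REAL NOT NULL,
    department_id INTEGER NOT NULL REFERENCES department(id)
);
""")

# Manipulation: each row is a record, each column a field.
conn.executemany(
    "INSERT INTO department (id, name) VALUES (?, ?)",
    [(1, "Finance"), (2, "Sales")],
)
conn.executemany(
    "INSERT INTO employee (id, name, salary, department_id) VALUES (?, ?, ?, ?)",
    [(1, "Asha", 52000.0, 1), (2, "Ravi", 48000.0, 2), (3, "Meera", 61000.0, 1)],
)
conn.execute("UPDATE employee SET salary = salary + 1000 WHERE name = ?", ("Ravi",))

# Query: follow the relation between the tables with a join.
rows = conn.execute("""
    SELECT d.name, COUNT(*), SUM(e.salary)
    FROM employee AS e
    JOIN department AS d ON d.id = e.department_id
    GROUP BY d.name
    ORDER BY d.name
""").fetchall()

for department_name, headcount, total_salary in rows:
    print(department_name, headcount, total_salary)
```

The same statements would look much the same on other relational systems, although data types and the way foreign keys are enforced differ from product to product.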

Modern database requirements are modelled with entity-relationship (ER) diagrams, which provide a high-level user view of a database structure. This view is then translated into the database's actual design, called the schema. When organizations accumulate large masses of data, the focus shifts from simply using the data for transactions to using it to support decision-making.
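
One way to picture the step from an ER view to a schema is the sketch below. It is an illustration under stated assumptions, not a standard tool: the Entity and OneToMany classes and the to_schema function are hypothetical helpers written for this example, and they handle only one-to-many relationships.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Entity:
    """A box in the ER diagram (hypothetical helper for this sketch)."""
    name: str
    attributes: dict[str, str]  # attribute name -> SQL type
    key: str                    # attribute that identifies each record


@dataclass
class OneToMany:
    """A line in the ER diagram: one parent record has many child records."""
    parent: Entity
    child: Entity


def to_schema(entities, relationships):
    """Translate entities and one-to-many relationships into CREATE TABLE statements."""
    parents_of = {e.name: [] for e in entities}
    for rel in relationships:
        parents_of[rel.child.name].append(rel.parent)

    statements = []
    for e in entities:
        columns = [
            f"{col} {sql_type}" + (" PRIMARY KEY" if col == e.key else "")
            for col, sql_type in e.attributes.items()
        ]
        # Each relationship becomes a foreign-key column on the "many" side.
        for parent in parents_of[e.name]:
            columns.append(
                f"{parent.name}_{parent.key} {parent.attributes[parent.key]} "
                f"REFERENCES {parent.name}({parent.key})"
            )
        statements.append(f"CREATE TABLE {e.name} ({', '.join(columns)})")
    return statements


department = Entity("department", {"id": "INTEGER", "name": "TEXT"}, key="id")
employee = Entity("employee", {"id": "INTEGER", "name": "TEXT"}, key="id")
schema = to_schema([department, employee], [OneToMany(department, employee)])

# Check that the generated schema is valid SQL by creating it in SQLite.
conn = sqlite3.connect(":memory:")
for statement in schema:
    conn.execute(statement)
employee_columns = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]
print(employee_columns)
```

Real design tools also cover many-to-many relationships, which are usually mapped to a separate linking table; that case is left out to keep the sketch short.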

Such data is copied into separate repositories called data warehouses, which are then used for analysis. Big Data is data that has massive volume, velocity, and variety, and it is often used by organizations to obtain key insights.

Need for Data Management

History of Data Use

In the early years of computing when programs were written on large mainframe computers, the practice was to include the data required for computation within the program. For example, if a program computed the income tax deductions for personnel in an organization, the data regarding the personnel and their income was maintained within the program itself.

If changes were required, say, when a new employee joined the organization, the entire program for income tax calculations would have to be modified, not just the data alone. Changes to data were difficult as the entire program had to be changed, and further, the data was not available to other programs.

With advances in programming languages, this situation changed and data was maintained in separate files that different programs could use. This improved the ability to share data, but it introduced problems with data updating and integrity. If one program changed the data, other programs had to be informed of this development and their logic had to be altered accordingly.
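
The contrast between the two styles can be sketched in a few lines of modern Python. This is only an illustration of the idea, not a reconstruction of mainframe-era code: the flat 10% rate, the personnel names, and the file layout are all assumptions made up for the example.

```python
import csv
import io

TAX_PERCENT = 10  # hypothetical flat rate, for illustration only


def tax_embedded():
    """Early style: the personnel data lives inside the program itself."""
    personnel = [("Asha", 52000), ("Ravi", 48000)]
    return {name: income * TAX_PERCENT // 100 for name, income in personnel}


def tax_from_file(personnel_file):
    """Later style: the data is read from a separate file that other programs can share."""
    reader = csv.DictReader(personnel_file)
    return {row["name"]: int(row["income"]) * TAX_PERCENT // 100 for row in reader}


# A new employee (Meera) is added by editing the data file only.
# io.StringIO stands in for a real file on disk to keep the sketch self-contained.
personnel_file = io.StringIO("name,income\nAsha,52000\nRavi,48000\nMeera,61000\n")
deductions = tax_from_file(personnel_file)
print(deductions)
```

In the embedded version, adding an employee means editing and redeploying the program. In the file version the program is untouched, but every program that reads the file depends on its column layout, which is the kind of update and integrity problem described above.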

A start in organizing data came with the idea of the relational data storage model, put forward by British scientist E.F. Codd in 1970. Codd, then working with IBM in the USA, showed how data could be stored in structured form in files that were linked to each other and could be used by many programs with simple rules of modification. This idea was taken up by commercial database software, like Oracle, and became the standard for data storage and use.

Challenge of Data Management

Consider the following facts:

  • Researchers estimate that the total amount of data stored in the world in 2007 was about 295 exabytes, or 295 billion gigabytes. This estimate is based on an assessment of analog and digital storage technologies from 1986 to 2007.

    The report (by M. Hilbert and P. Lopez, published in Science Express in February 2011) states that paper-based data storage, which was about 33% in 1986, had shrunk to only about 0.07% in 2007, because most data is now stored in digital form, mainly on computer hard disks or on optical storage devices.

  • The consulting firm IDC estimated in 2008 that data grows each year in two forms:

    • Structured: Here data is created and maintained in databases and follows a certain data model. The growth in structured data is about 22% annually (compounded).

    • Unstructured: Here data is kept informally, without following a defined data model. The growth in unstructured data is about 62% annually.

  • As of 2010, the large online auction firm eBay had a data warehouse of more than 6 petabytes (6 million gigabytes) and added about 150 billion rows per day to its database tables.

The above examples highlight the incredible amounts of data that are being created and stored around the world. Managing this data so that it can be used effectively presents a strong challenge to database systems: the systems not only have to store the data but also have to make it available almost instantly whenever needed, allow users to search through it efficiently, and ensure that it stays safe and uncorrupted.
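
To get a feel for the growth rates and units quoted above, the arithmetic can be worked through in a short sketch. The 22% and 62% rates and the storage figures come from the text; the function names and the five-year horizon are assumptions chosen for illustration, and the decimal definitions of petabyte and exabyte are used.

```python
import math

GB_PER_PB = 10**6  # 1 petabyte = 1 million gigabytes (decimal units)
GB_PER_EB = 10**9  # 1 exabyte  = 1 billion gigabytes (decimal units)


def project(initial, annual_rate, years):
    """Size after compounding growth at annual_rate for the given number of years."""
    return initial * (1 + annual_rate) ** years


def doubling_time(annual_rate):
    """Years needed for the data volume to double at a compounded annual rate."""
    return math.log(2) / math.log(1 + annual_rate)


structured_doubling = doubling_time(0.22)    # about 3.5 years
unstructured_doubling = doubling_time(0.62)  # about 1.4 years

# Growth of 100 units of each kind of data over a hypothetical five-year horizon.
structured_after_5 = project(100, 0.22, 5)    # roughly 2.7 times the start
unstructured_after_5 = project(100, 0.62, 5)  # roughly 11.2 times the start

print(f"Structured data doubles in about {structured_doubling:.1f} years")
print(f"Unstructured data doubles in about {unstructured_doubling:.1f} years")
print(f"295 exabytes = {295 * GB_PER_EB:,} gigabytes")
print(f"6 petabytes  = {6 * GB_PER_PB:,} gigabytes")
```

At these rates, unstructured data doubles in well under two years, which is part of why storage and search have to be planned for growth rather than for current volumes.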
