What is data?
Data is a collection of facts, such as numbers, words, measurements, observations, or descriptions of things. Data can also be defined as facts and statistics collected together for reference or analysis. In computing, data are the quantities, characters, or symbols on which a computer performs operations; they may be stored and transmitted as electrical signals and recorded on magnetic, optical, or mechanical recording media. There are two types of data: qualitative and quantitative. Qualitative data is descriptive information (it describes something), whereas quantitative data is numerical information (numbers).
Why is data so important?
- Data is regarded as an organization's most important asset, more valuable than a computer, a cabinet, a chair, or a table. A broken table can be replaced with a new one, but data carries the identity of the organization and cannot simply be replaced. Losing data or exposing it can jeopardize the organization's reputation; for example, a school's reputation suffers if its students' records are exposed to the public.
- When analyzed, data can provide very useful information to an organization's decision-makers. High-quality data leads to better decision-making.
Dimensions of data quality
Data quality is commonly assessed along several dimensions, including:
- Accuracy or correctness
- Completeness or comprehensiveness
- Consistency, coherence, or clarity
- Timeliness or latency
- Validity or reasonableness
- Accessibility or availability
- Credibility, reliability, or reputation
- Relevance, pertinence, or usefulness
- Uniqueness or non-duplication
The term “accuracy” refers to the degree to which information correctly reflects the event or object it describes. For example, if a customer’s age is 32, but the system says she’s 34, that information is inaccurate. To improve accuracy, ask yourself whether the information represents the reality of the situation and whether any of it is incorrect and needs to be fixed.
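One way to check accuracy is to compare what the system holds against a source you trust. The sketch below assumes such a verified reference exists; the customer IDs, ages, and the `find_inaccurate` helper are hypothetical and simply mirror the 32-versus-34 example above.

```python
# Hypothetical data: values held by the system vs. a verified reference source.
system_ages = {"C001": 34, "C002": 41, "C003": 27}
reference_ages = {"C001": 32, "C002": 41, "C003": 27}

def find_inaccurate(recorded, reference):
    """Return the IDs whose recorded value differs from the reference value."""
    return sorted(
        key for key, value in recorded.items()
        if key in reference and value != reference[key]
    )

print(find_inaccurate(system_ages, reference_ages))  # ['C001']
```

Records missing from the reference are skipped rather than flagged, since their accuracy cannot be judged this way.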
Data is considered “complete” when it fulfils expectations of comprehensiveness. Let’s say that you ask the customer to supply his or her name. You might make a customer’s middle name optional, but as long as you have the first and last name, the data is complete. To improve this data quality dimension, assess whether all of the required information is available and whether any elements are missing.
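A completeness check can be expressed as a list of required fields, with optional fields simply left out of it. This is a minimal sketch; the field names and the `missing_fields` helper are hypothetical, and a blank or whitespace-only value is treated as missing.

```python
REQUIRED_FIELDS = ("first_name", "last_name")  # middle_name is deliberately optional

def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent, None, or blank in a record."""
    return [f for f in required if not str(record.get(f) or "").strip()]

customer_a = {"first_name": "Maria", "middle_name": "", "last_name": "Lopez"}
customer_b = {"first_name": "John", "last_name": "  "}

print(missing_fields(customer_a))  # []
print(missing_fields(customer_b))  # ['last_name']
```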
At many companies, the same information may be stored in more than one place. If those copies match, the information is considered “consistent.” For example, if your human resources information system says an employee no longer works there, yet your payroll system shows he is still receiving a check, that’s inconsistent. To resolve inconsistency, review your data sets to see whether they match in every instance. Are there any instances in which the information conflicts with itself?
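The HR-versus-payroll example can be sketched as a comparison of the same fact held in two systems. The snapshots and the `find_conflicts` helper below are hypothetical; a real check would also need to match records across systems that use different keys or formats.

```python
# Hypothetical snapshots of the same fact ("is this employee active?") in two systems.
hr_active = {"E100": True, "E101": False, "E102": True}
payroll_active = {"E100": True, "E101": True, "E102": True}

def find_conflicts(source_a, source_b):
    """Return the IDs present in both sources whose values disagree."""
    shared = source_a.keys() & source_b.keys()
    return sorted(k for k in shared if source_a[k] != source_b[k])

print(find_conflicts(hr_active, payroll_active))  # ['E101']
```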
Is your information available right when it’s needed? That data quality dimension is called “timeliness.” Let’s say that you need financial information every quarter; if the data is ready when it’s supposed to be, it’s timely. Timeliness is defined by user expectation: if your information isn’t ready when you need it, it doesn’t fulfil that dimension.
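Because timeliness is defined by when the user needs the data, the simplest check compares the time the data became available with that deadline. The dates and the `is_timely` helper below are hypothetical stand-ins for a quarterly reporting deadline.

```python
from datetime import datetime

def is_timely(ready_at, needed_by):
    """Data is timely if it was ready no later than the moment it was needed."""
    return ready_at <= needed_by

needed_by = datetime(2024, 4, 5, 9, 0)   # hypothetical quarterly deadline
on_time = datetime(2024, 4, 4, 17, 30)
late = datetime(2024, 4, 8, 10, 0)

print(is_timely(on_time, needed_by))  # True
print(is_timely(late, needed_by))     # False
```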
Validity is a data quality dimension that refers to whether information conforms to a specific format and follows business rules. A popular example is birthdays: many systems ask you to enter your birthday in a specific format, and if you don’t, it’s invalid. To meet this data quality dimension, you must check whether all of your information follows the required format and business rules.
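The birthday example combines a format rule with a business rule. This sketch assumes the required format is `YYYY-MM-DD` and that a birthday may not lie in the future; both rules and the `is_valid_birthday` helper are illustrative assumptions, not a standard.

```python
from datetime import date, datetime

def is_valid_birthday(text, today=None):
    """Valid if text parses as YYYY-MM-DD (assumed format) and is not in the future (assumed rule)."""
    today = today or date.today()
    try:
        birthday = datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:  # wrong format, or a date that does not exist
        return False
    return birthday <= today

ref = date(2024, 1, 1)
print(is_valid_birthday("1990-07-15", ref))  # True
print(is_valid_birthday("15/07/1990", ref))  # False (wrong format)
print(is_valid_birthday("1990-02-30", ref))  # False (no such date)
print(is_valid_birthday("2030-01-01", ref))  # False (in the future)
```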
“Unique” information means that there’s only one instance of it in a database. Data duplication is a frequent problem: “Daniel A. Robertson” and “Dan A. Robertson” may well be the same person. Meeting this data quality dimension involves reviewing your information to ensure that none of it is duplicated.
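Exact comparison would not catch “Daniel A. Robertson” and “Dan A. Robertson”, so the sketch below swaps in fuzzy string matching using the standard library's `difflib.SequenceMatcher`. The `likely_duplicates` helper and the 0.85 threshold are assumptions chosen for illustration; flagged pairs are candidates for human review, since similar names can belong to different people.

```python
from difflib import SequenceMatcher
from itertools import combinations

def likely_duplicates(names, threshold=0.85):
    """Return (name_a, name_b, score) for pairs whose similarity meets the threshold."""
    def normalize(name):
        # Lowercase and collapse repeated whitespace before comparing.
        return " ".join(name.lower().split())

    pairs = []
    for a, b in combinations(names, 2):
        score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs

customers = ["Daniel A. Robertson", "Dan A. Robertson", "Maria Lopez"]
print(likely_duplicates(customers))  # [('Daniel A. Robertson', 'Dan A. Robertson', 0.91)]
```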
How to measure data quality
- Accuracy - How well does a piece of information reflect reality?
- Completeness - Does it fulfil your expectations of what’s comprehensive?
- Consistency - Does information stored in one place match relevant data stored elsewhere?
- Timeliness - Is your information available when you need it?
- Validity - Does the information follow the required format and business rules, or is it in an unusable format?
- Uniqueness - Is this the only instance in which this information appears in the database?
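Each question above can be turned into a number by measuring the share of records that pass a check. This is a minimal sketch assuming a small hypothetical customer table; the field names, the strict `YYYY-MM-DD` pattern, and the distinct-over-total definition of uniqueness are illustrative choices, not the only way to score these dimensions.

```python
import re

def pass_rate(records, check):
    """Fraction of records (0.0 to 1.0) for which check(record) is True."""
    if not records:
        raise ValueError("cannot score an empty data set")
    return sum(1 for r in records if check(r)) / len(records)

def uniqueness(records, key):
    """Distinct key values divided by total records (1.0 means no duplicates)."""
    if not records:
        raise ValueError("cannot score an empty data set")
    return len({r[key] for r in records}) / len(records)

# Hypothetical records: one missing email, two badly formatted birthdays, one duplicate.
records = [
    {"id": 1, "email": "ana@example.com", "birthday": "1990-7-15"},
    {"id": 2, "email": "", "birthday": "15/07/1990"},
    {"id": 3, "email": "ben@example.com", "birthday": "1985-03-02"},
    {"id": 3, "email": "ben@example.com", "birthday": "1985-03-02"},
]

scores = {
    "completeness": pass_rate(records, lambda r: bool(r["email"].strip())),
    "validity": pass_rate(
        records, lambda r: re.fullmatch(r"\d{4}-\d{2}-\d{2}", r["birthday"]) is not None
    ),
    "uniqueness": uniqueness(records, "id"),
}
print(scores)  # {'completeness': 0.75, 'validity': 0.5, 'uniqueness': 0.75}
```

Tracking these ratios over time shows whether data quality is improving, which is often more useful than any single score.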