Data masking
Data masking is a data anonymization technique and a cybersecurity practice. It involves obscuring specific parts of data, such as hiding every character in a name except for the initial, or replacing most digits in a credit card number with filler characters. At the same time, the data remains meaningful, maintaining the original format.
IT teams and database administrators are typically responsible for implementing data masking techniques. They configure databases and data management systems to mask sensitive data, ensuring it remains obscured to all users, including the data owner, to enhance security and privacy.
A common scenario is showing a masked ID or credit card number in the UI to its owner for confirmation or selection purposes, which they can do despite only seeing the last few digits. In fact, PCI DSS requires the primary account number (PAN) to be masked when displayed, so that only personnel with a legitimate business need can see more than the BIN and the last four digits. Masking addresses this requirement directly.
Data masking is also valuable for maintaining data utility in test environments where real PII isn’t required.
Types of data masking
There are two main types of data masking:
- Static data masking is applied to data at rest, permanently replacing the original data. This is typically used when creating copies of production databases for use in lower (non-production) environments, such as testing or development environments.
- Dynamic data masking is applied in real time as data is accessed. This method helps obscure sensitive information in the UI or when displaying query results without altering data in the underlying database.
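The difference between the two types can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the in-memory `production` list stands in for a real database, and the helper names `mask_ssn`, `static_mask`, and `read_masked` are hypothetical, not part of any masking product.

```python
import copy

def mask_ssn(ssn: str) -> str:
    """Keep the last four characters of an SSN and hide the rest."""
    return "XXX-XX-" + ssn[-4:]

# Hypothetical stand-in for a production table.
production = [{"name": "Johnson", "ssn": "232-76-4321"}]

def static_mask(rows):
    """Static masking: build a separate copy whose values are permanently replaced.

    The copy is what would be shipped to a test or development environment.
    """
    masked_rows = copy.deepcopy(rows)
    for row in masked_rows:
        row["ssn"] = mask_ssn(row["ssn"])
    return masked_rows

def read_masked(row):
    """Dynamic masking: return a masked view at read time; stored data is untouched."""
    view = dict(row)
    view["ssn"] = mask_ssn(view["ssn"])
    return view

test_copy = static_mask(production)
print(test_copy[0]["ssn"])                 # masked value stored in the copy
print(read_masked(production[0])["ssn"])   # masked value produced on access
print(production[0]["ssn"])                # original value still in "production"
```

In the static case the masked copy no longer contains the original value at all, while in the dynamic case the original stays in place and only the returned view is masked.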
Data masking can also be deterministic or non-deterministic. Deterministic data masking consistently transforms the same input value into the same masked value, while non-deterministic data masking can produce a different masked value each time the same input value is masked.
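As a rough sketch of this distinction, the Python example below replaces each digit of a value with a substitute digit while leaving separators in place. The deterministic variant derives the substitutes from a keyed hash (HMAC), which is one possible approach chosen here for illustration, and the non-deterministic variant draws them at random. `SECRET_KEY` and both function names are assumptions made for this example.

```python
import hashlib
import hmac
import secrets

# Hypothetical key for the example; a real deployment would manage keys securely.
SECRET_KEY = b"demo-key-not-for-production"

def deterministic_mask(value: str) -> str:
    """Replace each digit with one derived from an HMAC of the whole value.

    The same input and key always give the same output. Taking a byte
    modulo 10 is slightly biased, which is acceptable for a sketch.
    """
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).digest()
    out = []
    i = 0
    for ch in value:
        if ch.isdigit():
            out.append(str(digest[i % len(digest)] % 10))
            i += 1
        else:
            out.append(ch)  # keep separators so the format is preserved
    return "".join(out)

def non_deterministic_mask(value: str) -> str:
    """Replace each digit with a freshly drawn random digit on every call."""
    return "".join(
        str(secrets.randbelow(10)) if ch.isdigit() else ch for ch in value
    )

ssn = "232-76-4321"
print(deterministic_mask(ssn) == deterministic_mask(ssn))  # True: repeatable
print(non_deterministic_mask(ssn))  # varies from call to call
```

Deterministic masking is useful when the same value must match across tables or datasets after masking, since equal inputs still map to equal outputs.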
Examples of data masking
Masking allows you to preserve the format of the original data, and you’ll likely want to implement masking differently for different data types.
For example, when masking a social security number (SSN) or a credit card number, you might mask all characters except for the last four.
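A minimal Python sketch of this "all but the last four" rule might look as follows. The function name `mask_except_last` and its parameters are hypothetical; the sketch assumes that only digits should be masked and that separators such as hyphens and spaces are kept so the original format survives.

```python
def mask_except_last(value: str, keep: int = 4, mask_char: str = "X") -> str:
    """Mask every digit except the last `keep` digits, leaving separators intact.

    If the value has `keep` digits or fewer, nothing is masked.
    """
    total_digits = sum(ch.isdigit() for ch in value)
    seen = 0
    out = []
    for ch in value:
        if ch.isdigit():
            seen += 1
            # Digits in the final `keep` positions stay visible.
            out.append(ch if seen > total_digits - keep else mask_char)
        else:
            out.append(ch)
    return "".join(out)

print(mask_except_last("232-76-4321"))                         # XXX-XX-4321
print(mask_except_last("5186 5087 5957 5100", mask_char="*"))  # **** **** **** 5100
```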
For data types that don’t have a fixed length, you may want to apply masking such that the length of the mask remains consistent. For example, although the last names “Johnson” and “Johansson” differ in length, both could use the mask “J****” with a fixed length of 5 characters.
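One way to sketch a fixed-length mask in Python is shown below. The helper `mask_name` is hypothetical, and the choice to return only filler characters for an empty input is an assumption made for this example.

```python
def mask_name(name: str, length: int = 5, mask_char: str = "*") -> str:
    """Keep the first character and pad with filler to a fixed total length.

    The output length does not depend on the input length, so the mask
    does not reveal how long the original name was.
    """
    if not name:
        return mask_char * length  # assumption: empty input becomes all filler
    return name[0] + mask_char * (length - 1)

print(mask_name("Johnson"))    # J****
print(mask_name("Johansson"))  # J****
```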
The same principle applies to email masks. You may want to keep the @ character and the domain part of an email while masking the local part except for the first character, using a fixed length.
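An email mask along these lines could be sketched in Python as follows. The function `mask_email` and its `filler_len` parameter are hypothetical, and the default of five filler characters is an assumption; the sketch also treats everything after the last @ as the domain.

```python
def mask_email(email: str, filler_len: int = 5, mask_char: str = "*") -> str:
    """Keep the first character of the local part, the @, and the domain.

    The rest of the local part is replaced by a fixed number of filler
    characters, regardless of how long it originally was.
    """
    local, sep, domain = email.rpartition("@")
    if not sep or not local:
        raise ValueError("expected an address of the form local@domain")
    return local[0] + mask_char * filler_len + "@" + domain

print(mask_email("james_spencer@gmail.com"))  # j*****@gmail.com
```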
Data type | Unmasked | Masked |
---|---|---|
SSN | 232-76-4321 | XXX-XX-4321 |
Credit card | 5186 5087 5957 5100 | **** **** **** 5100 |
Last name | Johnson | J**** |
Email | james_spencer@gmail.com | j*****@gmail.com |