
I would like to obfuscate (scramble) sensitive data from a SQL Server database, in a way that provides:

  • irreversibility (the plaintext can't be derived from the obfuscated data),
  • the same length: the obfuscated data needs to be exactly as long as the data before obfuscation,
  • no need for variation: the obfuscated value does not need to differ between repeated obfuscations of the same input value. To be honest, I would rather get the same value for the same input, since that can be useful (e.g. for matching data in different tables, which is probably handy in test cases).

Example:

Abc -> zyx (length: 3)
StackOverflow -> a65vr4doqjd2k (length: 13)

I usually avoid "home-made" algorithms, so are you aware of any built-in Microsoft solution that provides this kind of obfuscation?

I hope I expressed my problem clearly, otherwise let me know and I'll try to add as much info as needed.

marc_s
asked Jul 4, 2016 at 12:45

1 Answer


No, I am not aware of any built-in function that does exactly this. However, you can still accomplish it without doing anything too complicated.

You could use the built-in CRYPT_GEN_RANDOM function (introduced in SQL Server 2008), which generates the requested number of random bytes. The output is binary, and when it is converted to a hex string each byte is represented as two hexadecimal characters (hence the / 2 + 1 part below).

DECLARE @InputString NVARCHAR(4000) = 'hello';
SELECT SUBSTRING(CONVERT(VARCHAR(8000),
                         CRYPT_GEN_RANDOM((LEN(@InputString) / 2) + 1),
                         2),
                 1,
                 LEN(@InputString)) AS [Obfuscated];

SET @InputString = 'test';
SELECT SUBSTRING(CONVERT(VARCHAR(8000),
                         CRYPT_GEN_RANDOM((LEN(@InputString) / 2) + 1),
                         2),
                 1,
                 LEN(@InputString)) AS [Obfuscated];

Returns something along the lines of:

8C108
9A7A

The only real downside here is that this needs to be done inline as CRYPT_GEN_RANDOM cannot be used in a User-Defined Function (UDF: Scalar or Table-Valued). However, it can still be applied in a set-based approach using a CTE as shown here (just set @MaxLength to the max length of the column being obfuscated):

DECLARE @MaxLength INT = 10;
;WITH cte AS
(
    SELECT CONVERT(VARCHAR(8000),
                   CRYPT_GEN_RANDOM((@MaxLength / 2) + 1),
                   2) AS [Random]
)
SELECT tmp.[String],
       cte.[Random],
       SUBSTRING(cte.[Random], 1, LEN(tmp.[String])) AS [Obfuscated]
FROM (VALUES (N'test'), (N'Hello')) tmp(String)
CROSS JOIN cte;

Returns something along the lines of:

String  Random        Obfuscated
------  ------------  ----------
test    F99B3888F993  F99B
Hello   D3250E74F0A3  D3250

As you can see, CRYPT_GEN_RANDOM returns a different value for each row.

Also, I'm not sure whether this is acceptable, but because the output is hex, the only alphabetic characters returned are A - F.
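Since the random approach has to stay inline, here is a minimal sketch of how the same expression might be applied directly in an UPDATE. The temp table #Customers, its columns, and the sample values are hypothetical and exist only for illustration; the expression itself is the one shown above, with the column substituted for the variable.

```sql
-- Hypothetical temp table and data, for illustration only.
CREATE TABLE #Customers
(
    CustomerID INT IDENTITY(1, 1) PRIMARY KEY,
    LastName   NVARCHAR(50) NULL
);

INSERT INTO #Customers (LastName)
VALUES (N'Smith'), (N'Nguyen'), (N'O''Brien'), (NULL);

-- Overwrite each value with random hex characters of the same length.
-- NULLs are skipped so that they stay NULL.
UPDATE c
SET    c.LastName = SUBSTRING(CONVERT(VARCHAR(8000),
                                      CRYPT_GEN_RANDOM((LEN(c.LastName) / 2) + 1),
                                      2),
                              1,
                              LEN(c.LastName))
FROM   #Customers c
WHERE  c.LastName IS NOT NULL;

-- Check that the lengths still match the original values (5, 6, 7).
SELECT CustomerID, LastName, LEN(LastName) AS [Length]
FROM   #Customers;

DROP TABLE #Customers;
```

One thing to keep in mind: LEN ignores trailing spaces, so a value that ends in spaces would come out shorter than what is physically stored.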


OR, if you want the obfuscation to be repeatable for the same input value (or at least don't mind it being repeatable) and would prefer to have this code in a function so that it is easier to apply to multiple columns, you can use the HASHBYTES function which, like CRYPT_GEN_RANDOM, returns binary bytes that can be rendered as hex. Unlike CRYPT_GEN_RANDOM, the output length is fixed (64 hex characters in this case, since SHA2_256 produces 32 bytes), so I used REPLICATE to repeat the hashed value when the input string is longer than 64 characters. Also unlike CRYPT_GEN_RANDOM, HASHBYTES can be used in a User-Defined Function (UDF) :-).

CREATE FUNCTION dbo.Obfuscate(@InputString NVARCHAR(4000))
RETURNS TABLE
WITH SCHEMABINDING
AS RETURN
    SELECT SUBSTRING(REPLICATE(CONVERT(VARCHAR(8000),
                                       HASHBYTES('SHA2_256', @InputString),
                                       2),
                               (LEN(@InputString) / 64) + 1),
                     1,
                     LEN(@InputString)) AS [Obfuscated];
GO

And that can be used as follows:

SELECT tmp.[String],
       LEN(tmp.[String]) AS [InputLength],
       ob.[Obfuscated],
       LEN(ob.[Obfuscated]) AS [OutputLength]
FROM (VALUES (N'test'), (N'Hello'), (REPLICATE(N'A', 63)),
             (REPLICATE(N'B', 64)), (REPLICATE(N'C', 65)),
             (REPLICATE(N'D', 4000))) tmp(String)
CROSS APPLY dbo.Obfuscate(tmp.[String]) ob;

Returns something along the lines of:

String                     InputLength  Obfuscated                  OutputLength
-------------------------  -----------  --------------------------  ------------
test                       4            FE52                        4
Hello                      5            A07E4                       5
AAAAAAAAAAAAAAAAAAAAAA...  63           4B589C85DE74E76487730F3...  63
BBBBBBBBBBBBBBBBBBBBBB...  64           79813FB6480F354F1C6017A...  64
CCCCCCCCCCCCCCCCCCCCCC...  65           FB4B38FBA41ECC24B5B0F68...  65
DDDDDDDDDDDDDDDDDDDDDD...  4000         5D01CC6508C164E652B5C77...  4000
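Because the hash-based version is repeatable, equal values in different tables stay equal after obfuscation, which is what the question asks for when matching test data. The following is a rough sketch of that idea; it assumes dbo.Obfuscate has been created as shown above, and the temp tables #Users and #Orders with their sample values are hypothetical.

```sql
-- Hypothetical temp tables that share Email values; assumes dbo.Obfuscate exists.
CREATE TABLE #Users  (UserID  INT, Email NVARCHAR(100));
CREATE TABLE #Orders (OrderID INT, Email NVARCHAR(100));

INSERT INTO #Users (UserID, Email)
VALUES (1, N'alice@example.com'), (2, N'bob@example.com');

INSERT INTO #Orders (OrderID, Email)
VALUES (10, N'alice@example.com'), (11, N'alice@example.com'), (12, N'bob@example.com');

-- Obfuscate the same logical column in both tables.
UPDATE u
SET    u.Email = ob.[Obfuscated]
FROM   #Users u
CROSS APPLY dbo.Obfuscate(u.Email) ob;

UPDATE o
SET    o.Email = ob.[Obfuscated]
FROM   #Orders o
CROSS APPLY dbo.Obfuscate(o.Email) ob;

-- The join on the obfuscated column should still pair every order with its user.
SELECT u.UserID, o.OrderID, u.Email AS [ObfuscatedEmail]
FROM   #Users u
INNER JOIN #Orders o
        ON o.Email = u.Email
ORDER BY o.OrderID;

DROP TABLE #Users;
DROP TABLE #Orders;
```

Note that HASHBYTES works on the raw bytes of its input, so values that differ only in case (or that are hashed as VARCHAR in one place and NVARCHAR in another) produce different output, even if the column collation would treat them as equal.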

PLEASE NOTE: If you need alpha characters beyond A - F and/or need to have distinct obfuscated values for distinct input values (i.e. reduce chances of collisions), then either method above can be adapted easily enough to do that.
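As one possible adaptation for a wider character set, the hash bytes could be rendered as Base64 instead of hex. This is only a sketch under my own assumptions, not part of the original approach: it uses the common XML xs:base64Binary conversion idiom, and the variable names are made up for the example. The 32 bytes of SHA2_256 encode to 44 Base64 characters, the last of which is '=' padding, so only the first 43 are kept.

```sql
DECLARE @InputString NVARCHAR(4000) = N'StackOverflow';
DECLARE @Hash        VARBINARY(32)  = HASHBYTES('SHA2_256', @InputString);
DECLARE @Base64      VARCHAR(43);

-- Base64-encode the hash via the XML conversion idiom,
-- then drop the trailing '=' padding character by keeping 43 characters.
SELECT @Base64 = LEFT(CAST(N'' AS XML).value(
                          'xs:base64Binary(xs:hexBinary(sql:variable("@Hash")))',
                          'VARCHAR(44)'),
                      43);

-- Repeat the encoded value for long inputs, then trim to the input length.
SELECT LEFT(REPLICATE(@Base64, (LEN(@InputString) / 43) + 1),
            LEN(@InputString)) AS [Obfuscated];
```

The result can contain upper- and lower-case letters, digits, '+' and '/'. If the obfuscated values are later compared under a case-insensitive collation, values that differ only in letter case would be treated as equal, so a case-sensitive or binary collation is probably the safer choice for such columns.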

answered Jul 4, 2016 at 15:07
