Courtesy of SQL Server Pro. You can find the original article here.

My dad, Gabriel Ben-Gan, passed away recently. He loved numbers, logic and puzzles, and used to solve problems in his own unique way. This article is about a puzzle that incorporates the above ingredients. Dad, this one’s for you, and is in your memory.

I’d like to thank David Shipley, who originally sent me this beautiful puzzle. The problem is pretty simple, but the solution requires some creativity.

As part of the solution to the puzzle I use an auxiliary table called dbo.Nums, which contains a sequence of integers starting with 1 in a column called n. You can find this auxiliary table in the sample database TSQLV3. You can download the source code to create this sample database here.

The Puzzle
The puzzle is about coping with typographical errors (typos) when a keyword is entered for search purposes.

You get as inputs three strings:

  1. A keyword that the user entered for search (@keyword)
  2. A source word (@src)
  3. A synonym for the source word (@synonym)

Here’s an example for possible input values:

DECLARE 
  @keyword AS NVARCHAR(1000) = N'wwAMxxAMyyAMzz',
  @src     AS NVARCHAR(1000) = N'AM',
  @synonym AS NVARCHAR(1000) = N'AN';

Your task is to generate all possible permutations of the input @keyword where each occurrence of @src is either replaced with @synonym or not. For example, for the above inputs the desired output is:

wwAMxxAMyyAMzz
wwANxxAMyyAMzz
wwAMxxANyyAMzz
wwANxxANyyAMzz
wwAMxxAMyyANzz
wwANxxAMyyANzz
wwAMxxANyyANzz
wwANxxANyyANzz

To work…

Step 1: Generate permutations
The solution to this puzzle can be broken into five main steps.

The first step is to return a sequence of integers in the range 0 through num_permutations – 1, where num_permutations represents the number of permutations of @keyword that need to be created. The num_permutations value can be computed as 2^num_ocurrences, where num_occurrences is the number of occurrences of @src in @keyword. With our sample input values, @src is AM and @keyword is wwAMxxAMyyAMzz, so num_ocurrences = 3, and therefore num_permutations = 8.

If you’re wondering, “What’s the logic behind computing the number of permutations?” think of the template for the permutations as having placeholders for all occurrences of @src in @keyword: ww?xx?yy?zz. Each placeholder is like a bit in an integer bitmap. When the bit is not set, you replace the placeholder with @src; when it’s set you replace it with @synonym. With three bits, the possible integers fall into the range 0 through 7 (2^3 – 1). The bitmaps of the integers in this range represent the permutations of the aforementioned template that you need to create.

The following expression computes num_ocurrences:

(LEN(@keyword) - LEN(REPLACE(@keyword, @src, N''))) / LEN(@src)

The following query generates the integers whose bitmaps represent the permutations that you need to create:

SELECT n-1 AS permutation
FROM TSQLV3.dbo.Nums
WHERE n <=
  POWER(2,
    (LEN(@keyword) - LEN(REPLACE(@keyword, @src, N''))) / LEN(@src));

Table 1 shows the output of this query with the bitmap representation of the integers.

Table 1: Permutations

permutation binperm
----------- -------
0           000
1           001
2           010
3           011
4           100
5           101
6           110
7           111

Step 2: Compute injection positions

Call the result of the first step P. The second step creates a row for every combination of permutation in P and number n in the range 1 through num_ocurrences. Each occurrence of a @src element in @keyword may be preceded by some non-@src element and may be followed by some non-@src element. So, for every n in the range 1 through num_ocurrences we can think of the positions of the @src elements among the @src and non-@src elements as being 2 * n. For example, in our input @keyword we have the following elements:

1 - ww
2 - AM
3 - xx
4 - AM
5 - yy
6 - AM
7 - zz

As you can see, the three occurrences of @src are in element positions 2, 4 and 6. If a non-@src element doesn’t exist before or after a @src element, you can think of it as existing but being an empty string.

As mentioned, the second step of our solution generates a row for every permutation in P and occurrence n of @src in @keyword. You compute the element position of the replacement element (call it pos) as 2 * n. Then depending on the state of the nth bit in permutation, the replacement element is either @src (bit isn’t set) or @synonym (bit is set). The code in Listing 2 implements the second step of the solution.

Listing 1: Step 2 – Compute injection positions

WITH P AS
(
  SELECT n-1 AS permutation
  FROM TSQLV3.dbo.Nums
  WHERE n <=
    POWER(2,
     (LEN(@keyword)
       - LEN(REPLACE(@keyword, @src, N'')))
     / LEN(@src))
)
SELECT *
FROM P
  CROSS APPLY ( SELECT N.n, N.n * 2 AS pos, -- * 2 to account for a preceeding element
                  CASE permutation & POWER(2, N.n - 1)
                      WHEN 0 THEN @src
                      ELSE @synonym
                  END AS element
                FROM TSQLV3.dbo.Nums AS N
                WHERE N.n <=
                  (LEN(@keyword)
                    - LEN(REPLACE(@keyword, @src, N'')))
                  / LEN(@src) ) AS A
ORDER BY permutation, pos, element;

The code in Listing 1 defines a CTE called P based on the query that implements the first step in the solution. The outer query against P uses the CROSS APPLY operator to create for every permutation and occurrence of @src in @keyword the replacement elements. The APPLY operator applies a query against Nums (call it N). As mentioned, for occurrence N.n, the target element position (pos) is N.n * 2. To know whether to use @src or @sysnonym as the target element, you need to check whether the nth bit is set. You do so with the following expression:

permutation & POWER(2, N.n - 1)

You then use a CASE expression to return either @src or @synonym based on the result of this expression.
The output of the code in Listing 2 with our sample input values is shown in Table 2 in abbreviate form.

Table 2: Injection positions

permutation binperm n   pos         element
----------- ------- --- ----------- -------
0           000     1   2           AM
0           000     2   4           AM
0           000     3   6           AM
1           001     1   2           AN
1           001     2   4           AM
1           001     3   6           AM
2           010     1   2           AM
2           010     2   4           AN
2           010     3   6           AM
...

Again, I added a column called binperm that shows the bitmap representation of the permutation value.

Step 3: Split original elements
The third step creates the set of elements that precede or follow the @src elements in @keyword. It also computes corresponding element positions, leaving room for the replacement elements to be injected. To achieve this you use classic string-splitting logic based on an auxiliary table of numbers (you can find coverage of such string splitting solutions here and here). In this case, you split the string stored in @keyword, using @src as the separator. The following code implements the third step in the solution:

SELECT (ROW_NUMBER() OVER(ORDER BY n) - 1) * 2 + 1 AS pos,
  SUBSTRING(@keyword, n,
    CHARINDEX(@src, @keyword + @src, n) - n) AS element
FROM TSQLV3.dbo.Nums
WHERE n <= LEN(@keyword) + 1
  AND SUBSTRING(@src + @keyword, n, LEN(@src)) = @src;

This code, applied to our sample input values, generates the output shown in Table 3.

Table 3: Split string

pos                  element
-------------------- --------
1                    ww
3                    xx
5                    yy
7                    zz

Step 4: Unifies original elements and elements to inject

The fourth step simply combines the result of step 2 (replacement elements to inject) with the result of step 3 (original elements). The code in Listing 4 implements this step.

Listing 2: Step 4 – Combine steps 2 and 3

WITH P AS -- Permutations
(
  SELECT n-1 AS permutation
  FROM TSQLV3.dbo.Nums
  WHERE n <=
    POWER(2,
      (LEN(@keyword)
        - LEN(REPLACE(@keyword, @src, N'')))
      / LEN(@src))
),
E AS -- Orignal split elements minus @src occurrences
(
  SELECT (ROW_NUMBER() OVER(ORDER BY n) - 1) * 2 + 1 AS pos,
    SUBSTRING(@keyword, n,
      CHARINDEX(@src, @keyword + @src, n) - n) AS element
  FROM TSQLV3.dbo.Nums
  WHERE n <= LEN(@keyword) + 1
    AND SUBSTRING(@src + @keyword, n, LEN(@src)) = @src
)
SELECT *
FROM P
  CROSS APPLY ( -- Elements to inject
                SELECT N.n * 2 AS pos,
                  CASE permutation & POWER(2, N.n - 1)
                      WHEN 0 THEN @src
                      ELSE @synonym
                  END AS element
                FROM TSQLV3.dbo.Nums AS N
                WHERE N.n <=
                  (LEN(@keyword)
                    - LEN(REPLACE(@keyword, @src, N'')))
                  / LEN(@src) 
                
                UNION ALL
                
                -- Original elements
                SELECT pos, element FROM E) AS A
ORDER BY permutation, pos, element;

Like before, the CTE P represents the bitmaps for the permutations of @keyword that need to be created. The CTE E represents the original split elements (the query implementing step 3). The outer query uses a CROSS APPLY operator that applies logic to generate the elements to inject unified with the original elements from E. The output of the code in Listing 2, applied to our sample input values is shown in Table 4.

 Table 4: Unified original elements and elements to inject

permutation pos                  element
----------- -------------------- -------
0           1                    ww
0           2                    AM
0           3                    xx
0           4                    AM
0           5                    yy
0           6                    AM
0           7                    zz
1           1                    ww
1           2                    AN
1           3                    xx
1           4                    AM
1           5                    yy
1           6                    AM
1           7                    zz
2           1                    ww
2           2                    AM
2           3                    xx
2           4                    AN
2           5                    yy
2           6                    AM
2           7                    zz
...

Step 5: Concatenate elements

The fifth and final step concatenates the elements of each permutation to one string based on pos ordering. I use the classic FOR XML PATH technique to concatenate the elements. Listing 3 has the complete solution for the puzzle.

Listing 3: Complete solution

WITH P AS -- Permutations
(
  SELECT n-1 AS permutation
  FROM TSQLV3.dbo.Nums
  WHERE n <=
    POWER(2,
      (LEN(@keyword)
        - LEN(REPLACE(@keyword, @src, N'')))
      / LEN(@src))
),
E AS -- Orignal split elements minus @src occurrences
(
  SELECT (ROW_NUMBER() OVER(ORDER BY n) - 1) * 2 + 1 AS pos,
    SUBSTRING(@keyword, n,
      CHARINDEX(@src, @keyword + @src, n) - n) AS element
  FROM TSQLV3.dbo.Nums
  WHERE n <= LEN(@keyword) + 1
    AND SUBSTRING(@src + @keyword, n, LEN(@src)) = @src
)
SELECT result
FROM P
  CROSS APPLY ( SELECT element AS [text()]
                FROM (
                  SELECT N.n * 2 AS pos,
                    CASE permutation & POWER(2, N.n - 1)
                        WHEN 0 THEN @src
                        ELSE @synonym
                    END AS element
                  FROM TSQLV3.dbo.Nums AS N
                  WHERE N.n <=
                    (LEN(@keyword)
                      - LEN(REPLACE(@keyword, @src, N'')))
                    / LEN(@src) 
                
                  UNION ALL
                
                  SELECT pos, element FROM E
                     ) AS D
                ORDER BY pos
                FOR XML PATH('')
              ) AS A(result);

Table 5 shows the output of the solution for the sample input values.

Table 5: Desired result

wwAMxxAMyyAMzz
wwANxxAMyyAMzz
wwAMxxANyyAMzz
wwANxxANyyAMzz
wwAMxxAMyyANzz
wwANxxAMyyANzz
wwAMxxANyyANzz
wwANxxANyyANzz

Conclusion
I find the string replacement puzzle that I presented in this article to be a beautiful puzzle. It requires you to apply your knowledge of some classic T-SQL techniques such as the ones used for string splitting and string concatenation, and also to incorporate some new ideas. It involves numbers, logic, bitwise manipulation and love. I can only hope to face many more puzzles like this in the future.

Latest posts by Itzik Ben-Gan (see all)