Splitting Strings is a trivial task. Using SQL unfortunately it is not so easy to do. Recently I had to split long concatenated strings from one field in an Oracle database.

Luckily I was not the only one facing this challenge and found helpful information on the internet.

Below an example that did the trick for me.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
 -- Splitting test strings --
WITH qry AS (
Select 'a,b,c' As tf From Dual
Union Select 'd,e,f' As tf From Dual 
Union Select 'g,h' As tf From Dual 
Union Select 'i' As tf From Dual
) 
SELECT regexp_substr(tf, '[^\,]+', 1, rn) Transposed_Field 
FROM qry 
cross join 
(
Select rownum rn from ( 
select max(regexp_count(tf, ',')) + 1 mx from qry) connect by level <= mx 
) 
Where regexp_substr(tf, '[^\,]+', 1, rn) is not null
Order By Transposed_Field;

The resultset of above query will look like the following picture.

  

The top part "With qry" builds a resultset for testing pusposes, selecting static, hard coded strings from standard testing table DUAL.

1
2
3
4
5
6
WITH qry AS (
Select 'a,b,c' As tf From Dual
Union Select 'd,e,f' As tf From Dual 
Union Select 'g,h' As tf From Dual 
Union Select 'i' As tf From Dual
) 

In order to easier understand the bottom part we have to split the whole query into smaller parts, extracting and explaining single sub selects. Let's first have a look at the max(regexp_count).

If you execute below query you will receive the maximum count of the defined seperator in the concatenated field you want to split. In this example the maximum count is three.

1
2
3
4
5
6
7
8
-- Explaining the "max(rexp_count())" clause --
WITH qry AS (
  Select 'a,b,c' As tf From Dual
  Union Select 'd,e,f' As tf From Dual 
  Union Select 'g,h' As tf From Dual 
  Union Select 'i' As tf From Dual
)
select max(regexp_count(tf, ',')) + 1 mx from qry;

If you were to change one of the strings to contain more than three commas in one of the string, you would receive an equivalent max count of separators.

Now that we know how often each line from the source query "qry" will have to be transposed (copied vertically) at maximum, we can build a select holding the row numbers that will serve in the final join as multiplier and index for the regular expression.

1
2
3
4
5
6
7
8
9
10
-- Explaning the "rownum" and "connect by level" clauses -- 
WITH qry AS (
	Select 'a,b,c' As tf From Dual
	Union Select 'd,e,f' As tf From Dual 
	Union Select 'g,h' As tf From Dual 
	Union Select 'i' As tf From Dual
)
Select rownum rn from (
	select max(regexp_count(tf, ',')) + 1 mx from qry
) connect by level <= mx; 

The resultset of above query will hold an index, which is incremented with every row of the resultset. Below you can see a screenshot of the resultset for above query. The rownum clause will give you a number representing an index for each row in a resultset.

 

Now for the "connect by level" clause, which is a bit hard to explain. Let's try the following as an example.

1
2
3
4
5
6
WITH qry AS (
  Select 'a' As tf From Dual
  Union Select 'b' As tf From Dual
)
Select rownum, tf, level, sys_connect_by_path(tf, '->') as ph from qry 
connect by level <= 2;

Since we are selcting only two records, we would expect to see only two records in the resultset. However, the resultset will return 6 rows.

Why is that? The "connect by" clause is used to resolve hierarchical data structures like trees in Oracle. You can connect child and parent items stored in the same table or more than one table by providing a child key and parant key relationship. "Connect by" joins items stored in the same table by providing the key connection as a constraint.

Another constraint you can provide in the connect by clause is the level, in order to define until which level in a tree structure the query should run and continue to combine records.

If the key connection is not provided and the only constraint is "level <= N" the only limit is the amount of levels provided in "N". In this case Oracle returns each possible permutation within each level and starts connecting each new row of the result to each row that has been pulled before.

"sys_connect_by_path(tf, '->')" shows you how each row in the source set ('a' and 'b')  is connected to each other row and the level in combination with the source value from "qry" shows you how many combinations are created.

To summarize the "connect by level" clause, it will return all possible combinations / permutations resulting from your source resultset, in our example "qry".

Below you can find helpful links, explaining hierarchical queries in general (connect by) and some forum entries explaining what is happening in detail for special use case "connect by level".

http://docs.oracle.com/cd/B19306_01/server.102/b14200/queries003.htm

http://stackoverflow.com/questions/13540661/why-does-connect-by-level-on-a-table-return-extra-rows

Well, hope it helps. Enjoy.