I was recently fixing a bug in my Groovy code that used the String class's split() method to split a string of comma-separated values and then processed each element of the result with .each{}. What bugged me most was that, while I was testing all the possible cases, the test for a blank String ('') failed. I had to switch to tokenize() to get my blank String ('') test case to pass.
Here is what I noticed and found:
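Both split() and tokenize() take a String argument that describes the delimiter, but they interpret it differently: split() treats it as a regular expression, whereas tokenize() treats each character in it as a separate delimiter. The other subtle difference, as per the API documentation, is the return type: split() returns an array of Strings (String[]), whereas tokenize() returns a List of Strings.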
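To make the delimiter handling concrete, here is a small sketch I put together. The input strings are made up for illustration. As far as I can tell, split() hands its argument to Java's regex-based String.split(), while tokenize() is built on StringTokenizer and so treats every character of the argument as its own delimiter and skips empty tokens.

```groovy
// Hypothetical input: a comma with no space after it, plus a plain space
def input = 'a,b c'

// split(', ') looks for the regex ", " (comma followed by a space); it never matches here
def viaSplit = input.split(', ')
assert viaSplit instanceof String[]
assert viaSplit.length == 1
assert viaSplit[0] == 'a,b c'

// tokenize(', ') breaks on ',' OR ' ', because each character is a delimiter
def viaTokenize = input.tokenize(', ')
assert viaTokenize instanceof List
assert viaTokenize == ['a', 'b', 'c']

// Regex metacharacters matter for split(): '.' matches any character
assert 'a.b'.split('.').length == 0
assert 'a.b'.split(/\./).toList() == ['a', 'b']
assert 'a.b'.tokenize('.') == ['a', 'b']

// Consecutive delimiters: split() keeps the empty entry, tokenize() drops it
assert 'a,,b'.split(',').toList() == ['a', '', 'b']
assert 'a,,b'.tokenize(',') == ['a', 'b']
```

So the two calls only agree when the delimiter is a literal with no regex metacharacters and the input has no empty fields.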
The difference that bit me, though, shows up with a blank string (''):
split() returns an array of size 1 (unexpected, and a source of bugs when looping over the result or relying on its size), whereas tokenize() returns a List of size 0 (as expected). The single element in the array returned by split() is nothing but the blank string itself.
Example code:
// non-blank String with comma-separated values
def languagesInMyCareerStr = 'C, C++, Java, Groovy'
def splitLanguages = languagesInMyCareerStr.split(', ')
def tokenizedLanguages = languagesInMyCareerStr.tokenize(', ')
// both produce the same four elements
assert ['C', 'C++', 'Java', 'Groovy'] == tokenizedLanguages
assert ['C', 'C++', 'Java', 'Groovy'] == splitLanguages
// but the container types differ: List vs. String[]
assert tokenizedLanguages.class == ArrayList
assert splitLanguages.class == (String[]).class
assert splitLanguages.size() == tokenizedLanguages.size()
assert tokenizedLanguages.size() == 4
assert splitLanguages.size() == 4
// blank String
def languagesBeforeMyCareer = ''
splitLanguages = languagesBeforeMyCareer.split(',')
tokenizedLanguages = languagesBeforeMyCareer.tokenize(',')
assert splitLanguages.size() != tokenizedLanguages.size()
assert tokenizedLanguages.size() == 0
assert splitLanguages.size() == 1
assert splitLanguages[0] == '' // the blank string itself
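Here is roughly how the original bug looked when the result was processed with .each{}. This is a minimal sketch rather than my real code: the variable names and the safeSplit helper are hypothetical and only illustrate the idea.

```groovy
// Hypothetical blank input, as in the failing test case
def csv = ''

// With split(), the closure runs once, on the blank string itself
def visitedBySplit = []
csv.split(',').each { visitedBySplit << it }
assert visitedBySplit == ['']

// With tokenize(), the closure never runs
def visitedByTokenize = []
csv.tokenize(',').each { visitedByTokenize << it }
assert visitedByTokenize.isEmpty()

// If split() has to stay (for example, for its regex support), one option is
// to guard against blank or null input first. safeSplit is an illustrative helper.
def safeSplit = { String s -> s ? s.split(',').toList() : [] }
assert safeSplit('') == []
assert safeSplit('C,Java') == ['C', 'Java']
```

The guard relies on Groovy truth, where both null and the empty string evaluate to false.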