Splitting strings is a common task in Java, whether you're processing CSV files, parsing user input, or breaking down URLs. Java provides two primary tools for this: String.split() and StringTokenizer.
This tutorial explores both, highlighting their syntax, use cases, differences, performance, and best practices.
🔍 What Is String Splitting?
String splitting means breaking a single string into multiple substrings based on a delimiter (e.g., comma, space, pipe).
Example: "apple,banana,grape" → ["apple", "banana", "grape"]
🛠 Using the split() Method
Syntax
String[] parts = str.split(String regex);
Example
String fruits = "apple,banana,grape";
String[] result = fruits.split(",");
System.out.println(Arrays.toString(result)); // [apple, banana, grape]
With Limit
String[] result = fruits.split(",", 2);
// Output: [apple, banana,grape]
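A positive limit is handy when only the first delimiter should count. The following is a minimal sketch, assuming a hypothetical "key=value" input format (the class name, method name, and sample URL are illustrative, not part of any library):

```java
import java.util.Arrays;

class KeyValueSplit {
    // Split on the first '=' only, so the value may itself contain '='.
    // With limit 2 the regex is applied at most once.
    static String[] parsePair(String line) {
        return line.split("=", 2);
    }

    public static void main(String[] args) {
        String[] pair = parsePair("url=https://example.com/?q=java");
        System.out.println(Arrays.toString(pair)); // [url, https://example.com/?q=java]

        // No '=' at all: the array holds the whole input as its only element.
        System.out.println(parsePair("novalue").length); // 1
    }
}
```

Callers should check the array length before reading index 1, since input without the delimiter yields a single element.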
🔎 Using StringTokenizer (Legacy)
Syntax
StringTokenizer tokenizer = new StringTokenizer(str, delimiter);
Example
String input = "one,two,three";
StringTokenizer tokenizer = new StringTokenizer(input, ",");
while (tokenizer.hasMoreTokens()) {
System.out.println(tokenizer.nextToken());
}
📊 split() vs StringTokenizer Comparison
Feature | split() | StringTokenizer |
---|---|---|
Introduced In | Java 1.4 | Java 1.0 |
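Returns | Array of strings | Tokens one at a time (Enumeration-style) |
Based On | Regex | Set of delimiter characters |
Null-safe | ❌ No (NPE on a null string or regex) | ❌ No (NPE on a null string) |
Thread-safe | ✅ Yes (stateless call on an immutable String) | ❌ No (each instance holds a cursor) |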
Flexibility | ✅ High (regex) | 🚫 Limited |
Preferred In Modern Code | ✅ Yes | 🚫 Legacy API |
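Two behavioral differences are easy to miss when switching between the two APIs: StringTokenizer silently skips empty fields, and it treats its delimiter argument as a set of characters rather than one sequence. The sketch below illustrates both; the helper name tokenize is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerVsSplit {
    // Collect every token from a StringTokenizer into a list for comparison.
    static List<String> tokenize(String input, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, delimiters);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Empty field between the two commas: split() keeps it, StringTokenizer drops it.
        System.out.println(Arrays.asList("a,,b".split(","))); // [a, , b]
        System.out.println(tokenize("a,,b", ","));            // [a, b]

        // ",;" means "comma OR semicolon" to StringTokenizer,
        // but the literal two-character sequence ",;" to split().
        System.out.println(tokenize("x,y;z", ",;"));            // [x, y, z]
        System.out.println(Arrays.asList("x,y;z".split(",;"))); // [x,y;z]
    }
}
```

If positional fields matter (as in CSV columns), the skipped empty tokens are usually the deciding reason to use split().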
⚙️ Real-World Use Cases
- CSV Parsing: split(",") (simple CSV without quoted fields)
- Log File Analysis: split("\\s+")
- Tokenizing User Input: StringTokenizer (legacy)
- URL Breakdown: split("/")
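As a rough illustration of the log and URL cases, the sketch below uses made-up sample data and hypothetical helper names (logFields, pathSegments). It assumes a log line shaped as "date level message".

```java
import java.util.Arrays;

class UseCaseSketch {
    // Split a log line into date, level, and the rest of the message.
    // trim() avoids an empty first element; limit 3 keeps the message intact.
    static String[] logFields(String line) {
        return line.trim().split("\\s+", 3);
    }

    // Split a URL path into segments.
    // Dropping one leading "/" first avoids an empty first element.
    static String[] pathSegments(String path) {
        String trimmed = path.startsWith("/") ? path.substring(1) : path;
        return trimmed.split("/");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
                logFields("  2024-01-15   ERROR  Disk almost full ")));
        // [2024-01-15, ERROR, Disk almost full]

        System.out.println(Arrays.toString(pathSegments("/api/v1/users")));
        // [api, v1, users]
    }
}
```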
⚠️ Edge Cases & Pitfalls
1. Splitting by special regex characters
String path = "C:\\Users\\John"; // each backslash is doubled in a Java string literal
String[] parts = path.split("\\\\"); // the regex \\ matches one literal backslash → [C:, Users, John]
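When the delimiter is meant literally, Pattern.quote() sidesteps manual escaping altogether. A minimal sketch, where splitLiteral is a hypothetical helper name:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LiteralDelimiterSplit {
    // Treat the delimiter as plain text, whatever regex metacharacters it contains.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLiteral("C:\\Users\\John", "\\")));
        // [C:, Users, John]

        System.out.println(Arrays.toString(splitLiteral("192.168.0.1", ".")));
        // [192, 168, 0, 1]

        // Unescaped, "." matches every character, so nothing but empty strings
        // remains and the result array is empty.
        System.out.println("192.168.0.1".split(".").length); // 0
    }
}
```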
2. Trailing empty strings
String input = "a,b,c,,";
String[] tokens = input.split(",");
System.out.println(tokens.length); // 3 (all trailing empty strings are removed)
Use split(",", -1) to retain all tokens (5 for this input).
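Only trailing empty strings are dropped by default; leading and interior ones survive. The sketch below counts fields for a few inputs to make that visible (countFields is a hypothetical helper; a limit of 0 behaves the same as the one-argument split()):

```java
class EmptyTokenDemo {
    // Number of elements split() produces for a comma-delimited input.
    static int countFields(String input, int limit) {
        return input.split(",", limit).length;
    }

    public static void main(String[] args) {
        // Fields are: "", "a", "", "b", "", ""
        System.out.println(countFields(",a,,b,,", 0));  // 4 (two trailing empties dropped)
        System.out.println(countFields(",a,,b,,", -1)); // 6 (everything kept)

        // Two inputs that often surprise:
        System.out.println(countFields("", 0));  // 1 (array holding one empty string)
        System.out.println(countFields(",", 0)); // 0 (both fields are trailing empties)
    }
}
```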
🔄 Refactoring Example
❌ Naive Manual Splitting
String line = "a,b,c";
int comma1 = line.indexOf(",");
int comma2 = line.indexOf(",", comma1 + 1);
String part1 = line.substring(0, comma1);
String part2 = line.substring(comma1 + 1, comma2);
✅ Better with split()
String[] parts = line.split(",");
📌 What's New in Java Strings?
Java 11
- Added isBlank(), lines(), strip(), repeat(), and related methods. The split() signatures themselves did not change.
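Java 13+
- Text Blocks (multi-line strings): preview in Java 13, standard since Java 15
Java 21
- String Templates (preview feature, later withdrawn in Java 23)
- Added String.splitWithDelimiters(), which keeps the matched delimiters in the result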
✅ Best Practices
- Prefer split() over StringTokenizer in modern code
- Escape regex metacharacters properly: in Java source, write "\\." for a period and "\\\\" for a backslash, or wrap the delimiter in Pattern.quote()
- Use the limit argument if you want to control the number of splits
- Use split(",", -1) to preserve trailing empty tokens
- Avoid using StringTokenizer unless maintaining legacy systems
🔚 Conclusion and Key Takeaways
- Use split() for modern, flexible, regex-based string splitting.
- StringTokenizer is a legacy class; avoid it unless necessary.
- Always consider edge cases like empty strings, special characters, or multiple delimiters.
- Proper string splitting leads to cleaner, more maintainable code.
❓ FAQ
1. What's the main difference between split() and StringTokenizer?
split() uses regex and returns an array. StringTokenizer is legacy and returns tokens one by one.
2. When should I use StringTokenizer?
Only when maintaining legacy code that already relies on it. Both APIs ship with the JDK, so neither adds a dependency.
3. How to retain empty tokens?
Use split(delimiter, -1).
4. Can split() handle regex?
Yes. Its delimiter argument is always interpreted as a regular expression.
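5. Is StringTokenizer thread-safe?
No. Each instance tracks its current position, so it must not be shared across threads without synchronization. split() is safe to call concurrently because String is immutable and each call returns a new array.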
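6. How to split by whitespace?
Use split("\\s+"). Call trim() or strip() first if the string may start with whitespace; otherwise the first element is an empty string.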
7. How to split a string into characters?
Use split("") to get one-character strings (Java 8 and later produce no leading empty string), or call toCharArray() to get a char array.
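8. What if the delimiter is a special regex character?
Escape it with a backslash, written as \\ in Java source (for example, split("\\|")), or pass the delimiter through Pattern.quote().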
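9. Is split() faster than StringTokenizer?
Not necessarily. In OpenJDK, split() takes a fast path for single-character literal delimiters that skips the regex engine, so the two are often comparable there; with a true regex, split() also pays for compiling the pattern on each call. Measure if performance matters. split() remains the more flexible choice.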
10. Can I use multiple delimiters?
Yes. Use a regex character class, such as split("[,;|]").
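To make the last few answers concrete, here is a small sketch covering multiple delimiters and splitting into characters. The class and method names are illustrative, and the split("") result assumes Java 8 or later.

```java
import java.util.Arrays;

class FaqSketch {
    // Split on any one of comma, semicolon, or pipe.
    // Inside a character class, '|' is a literal and needs no escaping.
    static String[] splitOnAny(String input) {
        return input.split("[,;|]");
    }

    // Split into one-character strings.
    static String[] toCharacters(String input) {
        return input.split("");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnAny("a,b;c|d"))); // [a, b, c, d]
        System.out.println(Arrays.toString(toCharacters("abc")));   // [a, b, c]

        // Alternative when primitive chars are enough:
        char[] chars = "abc".toCharArray();
        System.out.println(chars.length); // 3
    }
}
```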