Splitting strings is a common task in Java, whether you're processing CSV files, parsing user input, or breaking down URLs. Java provides two primary tools for this: String.split() and StringTokenizer.
This tutorial explores both, highlighting their syntax, use cases, differences, performance, and best practices.
What Is String Splitting?
String splitting means breaking a single string into multiple substrings based on a delimiter (e.g., comma, space, pipe).
Example: "apple,banana,grape" → ["apple", "banana", "grape"]
Using the split() Method
Syntax
String[] parts = str.split(regex); // regex: a String containing a regular expression
Example
String fruits = "apple,banana,grape";
String[] result = fruits.split(",");
System.out.println(Arrays.toString(result)); // [apple, banana, grape]
With Limit
String[] result = fruits.split(",", 2);
// Output: [apple, banana,grape]
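A positive limit is handy when only the first delimiter should count. The sketch below is one possible illustration, not part of the original tutorial: the class name SplitLimitDemo, the helper splitKeyValue, and the sample input are all hypothetical.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Hypothetical helper: splits "key=value" at the first '=' only,
    // so any '=' characters inside the value are preserved.
    static String[] splitKeyValue(String line) {
        return line.split("=", 2);
    }

    public static void main(String[] args) {
        String[] pair = splitKeyValue("url=https://example.com/?q=java");
        System.out.println(pair[0]); // url
        System.out.println(pair[1]); // https://example.com/?q=java

        // Same idea as the fruits example: at most 2 elements,
        // the last one holding the unsplit remainder.
        System.out.println(Arrays.toString("apple,banana,grape".split(",", 2)));
        // [apple, banana,grape]
    }
}
```

If the input contains no delimiter at all, the returned array has a single element, so code that reads pair[1] should check the length first.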
Using StringTokenizer (Legacy)
Syntax
StringTokenizer tokenizer = new StringTokenizer(str, delimiter);
Example
String input = "one,two,three";
StringTokenizer tokenizer = new StringTokenizer(input, ",");
while (tokenizer.hasMoreTokens()) {
    System.out.println(tokenizer.nextToken());
}
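Two StringTokenizer behaviors are easy to miss: every character in the delimiter string acts as a separate delimiter, and empty tokens between consecutive delimiters are skipped. The following is a small sketch of both, with a hypothetical class name (TokenizerDemo) and helper (tokens) chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDemo {
    // Hypothetical helper: collects all tokens into a list.
    // Each character in `delims` is treated as a delimiter.
    static List<String> tokens(String text, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delims);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // ',' and ';' both separate tokens; the empty field in ",," is skipped
        System.out.println(tokens("one,,two;three", ",;"));
        // [one, two, three]

        // split() with an equivalent character class keeps the empty field
        System.out.println(Arrays.toString("one,,two;three".split("[,;]")));
        // [one, , two, three]
    }
}
```

This difference matters for data such as CSV rows, where an empty field usually carries meaning and should not be dropped.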
split() vs StringTokenizer Comparison
| Feature | split() | StringTokenizer |
|---|---|---|
| Introduced In | Java 1.4 | Java 1.0 |
| Returns | Array of strings | Tokens one at a time (Enumeration-style) |
| Based On | Regex | Delimiter string |
| Null-safe | No (NPE on a null string or regex) | No (NPE on a null string) |
| Thread-safe | Yes (no shared mutable state) | No (each instance tracks its position) |
| Flexibility | High (regex) | Limited |
| Preferred In Modern Code | Yes | No (legacy API) |
Real-World Use Cases
- CSV Parsing: split(",")
- Log File Analysis: split("\\s+")
- Tokenizing User Input: StringTokenizer (legacy)
- URL Breakdown: split("/")
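The use cases above can be sketched roughly as follows. The class name UseCaseDemo, the helper names, and the sample log line and path are made up for illustration; real log and URL formats may need more careful handling.

```java
import java.util.Arrays;

class UseCaseDemo {
    // Hypothetical helper: splits a log line on runs of whitespace.
    static String[] logFields(String line) {
        // trim() first so leading whitespace does not produce an empty first element
        return line.trim().split("\\s+");
    }

    // Hypothetical helper: breaks a URL path into its segments.
    static String[] urlSegments(String path) {
        return path.split("/");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(logFields("  2024-01-01   ERROR  Disk full")));
        // [2024-01-01, ERROR, Disk, full]

        System.out.println(Arrays.toString(urlSegments("/api/v1/users")));
        // [, api, v1, users]  (the leading "/" yields an empty first element)
    }
}
```

Note the empty first element in the path example: a delimiter at the very start of the string produces a leading empty string, which callers usually need to skip.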
Edge Cases & Pitfalls
1. Splitting by special regex characters
String path = "C:\\Users\\John"; // each backslash must be escaped in a Java string literal
String[] parts = path.split("\\\\"); // the regex \\ matches one literal backslash
2. Trailing empty strings
String input = "a,b,c,,";
String[] tokens = input.split(",");
System.out.println(tokens.length); // 3 (all trailing empty strings are removed)
Use split(",", -1) to retain all tokens, including the trailing empty ones (length 5 for this input).
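Both pitfalls can be handled in one place. The sketch below assumes you want the delimiter treated as literal text and every empty field kept; it uses the standard Pattern.quote() method, while the class name EdgeCaseDemo and the helper splitLiteral are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class EdgeCaseDemo {
    // Hypothetical helper: treats `delimiter` as literal text (not a regex)
    // and keeps every empty field, including trailing ones (limit -1).
    static String[] splitLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        // Literal backslash and literal dot, no manual regex escaping needed
        System.out.println(Arrays.toString(splitLiteral("C:\\Users\\John", "\\")));
        // [C:, Users, John]
        System.out.println(Arrays.toString(splitLiteral("a.b.c", ".")));
        // [a, b, c]

        // Trailing empty strings: dropped by default, kept with limit -1
        System.out.println("a,b,c,,".split(",").length);         // 3
        System.out.println(splitLiteral("a,b,c,,", ",").length); // 5

        // An unescaped "." is a regex that matches every character,
        // so every piece is empty and the result is an empty array
        System.out.println("a.b.c".split(".").length);           // 0
    }
}
```

Pattern.quote() is a convenient alternative to hand-escaping when the delimiter comes from configuration or user input and may contain regex metacharacters.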
Refactoring Example
Naive Manual Splitting
String line = "a,b,c";
int comma1 = line.indexOf(",");
int comma2 = line.indexOf(",", comma1 + 1);
String part1 = line.substring(0, comma1);
String part2 = line.substring(comma1 + 1, comma2);
String part3 = line.substring(comma2 + 1);
Better with split()
String[] parts = line.split(",");
What's New in Java Strings?
Java 11
- Added isBlank(), lines(), strip(), etc.
- split() kept the same signatures; its Unicode handling follows the Unicode version supported by the JDK
Java 13+
- Text Blocks (multi-line strings)
Java 21
- String Templates (preview feature)
Best Practices
- Prefer split() over StringTokenizer in modern code
- Escape regex characters properly (in Java source, "\\." for a period and "\\\\" for a backslash)
- Use the limit argument if you want to control the number of splits
- Use split(",", -1) to preserve empty tokens
- Avoid using StringTokenizer unless maintaining legacy systems
Conclusion and Key Takeaways
- Use split() for modern, flexible, regex-based string splitting.
- StringTokenizer is a legacy class; avoid it unless necessary.
- Always consider edge cases like empty strings, special characters, or multiple delimiters.
- Proper string splitting leads to cleaner, more maintainable code.
FAQ
1. What's the main difference between split() and StringTokenizer?
split() uses regex and returns an array. StringTokenizer is legacy and returns tokens one by one.
2. When should I use StringTokenizer?
Only when maintaining legacy code that already relies on it. Both classes ship with the JDK, so neither adds a dependency.
3. How to retain empty tokens?
Use split(delimiter, -1).
4. Can split() handle regex?
Yes. It's fully regex-based.
5. Is StringTokenizer thread-safe?
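No. Each StringTokenizer instance tracks its current position and is not synchronized, so it should not be shared between threads. split() is safe to call concurrently because String is immutable and each call returns a new array.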
6. How to split by whitespace?
Use split("\\s+"). The backslash must be doubled in a Java string literal.
7. How to split a string into characters?
Use split("") or convert to char array.
8. What if delimiter is a special regex character?
Escape it with a backslash (written as \\ inside a Java string literal), or wrap the delimiter in Pattern.quote().
9. Is split() faster than StringTokenizer?
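Not necessarily. split() has a fast path for simple single-character delimiters, but other patterns go through the regex engine, and StringTokenizer can be as fast or faster for plain delimiter characters. Prefer split() for its flexibility, and benchmark if speed matters.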
10. Can I use multiple delimiters?
Yes. Use regex like split("[,;|]").
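As a quick illustration of questions 7 and 10, here is a minimal sketch. The class name FaqDemo and its helpers are hypothetical, and the split("") result shown assumes Java 8 or later, where no leading empty string is produced.

```java
import java.util.Arrays;

class FaqDemo {
    // Hypothetical helper: a comma, semicolon, or pipe each act as a delimiter.
    static String[] splitOnAny(String text) {
        return text.split("[,;|]");
    }

    // Hypothetical helper: one single-character string per character (Java 8+).
    static String[] characters(String text) {
        return text.split("");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnAny("a,b;c|d")));
        // [a, b, c, d]
        System.out.println(Arrays.toString(characters("abc")));
        // [a, b, c]
        System.out.println(Arrays.toString("abc".toCharArray()));
        // [a, b, c]
    }
}
```

Inside a character class such as [,;|], the pipe is a literal character, so it does not need escaping there.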