Splitting Strings in Java: Mastering split() and StringTokenizer


Splitting strings is a common task in Java — whether you're processing CSV files, parsing user input, or breaking down URLs. Java provides two primary tools for this: String.split() and StringTokenizer.

This tutorial explores both, highlighting their syntax, use cases, differences, performance, and best practices.


🔍 What Is String Splitting?

String splitting means breaking a single string into multiple substrings based on a delimiter (e.g., comma, space, pipe).

Example:
"apple,banana,grape" → ["apple", "banana", "grape"]


🛠 Using split() Method

Syntax

String[] parts = str.split(regex); // regex: a String containing a regular expression

Example

// requires: import java.util.Arrays;
String fruits = "apple,banana,grape";
String[] result = fruits.split(",");
System.out.println(Arrays.toString(result)); // [apple, banana, grape]

With Limit

String[] result = fruits.split(",", 2);
// Output: [apple, banana,grape]
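
A positive limit is handy when only the first delimiter should count. The sketch below splits a key/value line at the first "=" so that any later "=" stays inside the value. The class name KeyValueSplit, the helper splitKeyValue, and the sample line are hypothetical, chosen only for illustration.

```java
public class KeyValueSplit {
    // Splits at the first '=' only; limit 2 means at most two pieces,
    // so the second piece keeps the rest of the line unsplit.
    static String[] splitKeyValue(String line) {
        return line.split("=", 2);
    }

    public static void main(String[] args) {
        String[] kv = splitKeyValue("url=https://example.com/?q=java");
        System.out.println(kv[0]); // url
        System.out.println(kv[1]); // https://example.com/?q=java
    }
}
```

Without the limit, the same line would break into three pieces and the value would have to be stitched back together.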

🔎 Using StringTokenizer (Legacy)

Syntax

StringTokenizer tokenizer = new StringTokenizer(str, delimiter);

Example

// requires: import java.util.StringTokenizer;
String input = "one,two,three";
StringTokenizer tokenizer = new StringTokenizer(input, ",");
while (tokenizer.hasMoreTokens()) {
    System.out.println(tokenizer.nextToken());
}
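
Two behaviors of StringTokenizer tend to surprise people coming from split(): it never returns empty tokens, and every character in the delimiter string counts as its own delimiter. A minimal sketch of both follows; the class name TokenizerVsSplit and the helper tokenize are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerVsSplit {
    // Collects every token from a StringTokenizer into a list for easy comparison.
    static List<String> tokenize(String input, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, delimiters);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Adjacent delimiters produce no empty token with StringTokenizer...
        System.out.println(tokenize("a,,b", ","));            // [a, b]
        // ...while split() keeps the empty string between them.
        System.out.println(Arrays.asList("a,,b".split(","))); // [a, , b]
        // Each character of the delimiter string is a separate delimiter.
        System.out.println(tokenize("a,b;c", ",;"));          // [a, b, c]
    }
}
```

If empty fields carry meaning (as in most CSV-like data), this difference alone is a reason to reach for split().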

📊 split() vs StringTokenizer Comparison

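Feature | split() | StringTokenizer
Introduced in | Java 1.4 | Java 1.0
Returns | Array of strings | Tokens one at a time (hasMoreTokens() / nextToken())
Based on | Regex | Set of delimiter characters
Null handling | ❌ Throws NPE on a null string or null regex | ❌ Throws NPE on a null string
Thread safety | ✅ Stateless call on an immutable String | ❌ Stateful instance; do not share across threads
Flexibility | ✅ High (regex) | 🚫 Limited
Preferred in modern code | ✅ Yes | 🚫 Legacy API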

⚙️ Real-World Use Cases

  • CSV Parsing: split(",") (simple files only; it does not understand quoted fields that contain commas)
  • Log File Analysis: split("\\s+")
  • Tokenizing User Input: StringTokenizer (legacy)
  • URL Breakdown: split("/")
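
Two of these use cases can be sketched briefly. The class name UseCaseSketches, the helpers logFields and pathSegments, and the sample inputs are all hypothetical; real log formats and URLs usually need more care than this.

```java
import java.util.Arrays;

public class UseCaseSketches {
    // Splits a log line on runs of whitespace; trim() avoids a leading empty field.
    static String[] logFields(String line) {
        return line.trim().split("\\s+");
    }

    // Splits a URL path on '/'; a leading '/' yields an empty first element.
    static String[] pathSegments(String path) {
        return path.split("/");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(logFields("  INFO   server  started ")));
        // [INFO, server, started]
        System.out.println(Arrays.toString(pathSegments("/api/v1/users")));
        // [, api, v1, users]
    }
}
```

Note the empty first element in the path example: split() drops trailing empty strings by default, but not leading ones.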

⚠️ Edge Cases & Pitfalls

1. Splitting by special regex characters

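String path = "C:\\Users\\John"; // a backslash must be doubled inside a Java string literal
String[] parts = path.split("\\\\"); // four backslashes in source = regex \\ = one literal backslash
// parts: [C:, Users, John]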
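
When the delimiter is plain text rather than a pattern, Pattern.quote() from java.util.regex avoids hand-escaping altogether. The sketch below assumes a hypothetical helper named splitLiteral in a hypothetical class LiteralSplit.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class LiteralSplit {
    // Splits on the delimiter as literal text, whatever regex metacharacters it contains.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        // An unescaped "." matches every character, so only empty strings remain
        // and split() drops them all.
        System.out.println("a.b.c".split(".").length);                   // 0
        System.out.println(Arrays.toString(splitLiteral("a.b.c", "."))); // [a, b, c]
        System.out.println(Arrays.toString(splitLiteral("x|y|z", "|"))); // [x, y, z]
    }
}
```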

2. Trailing empty strings

String input = "a,b,c,,";
String[] tokens = input.split(",");
System.out.println(tokens.length); // 3 (both trailing empty strings are removed)

Use split(",", -1) to retain all tokens; for this input it returns 5 elements.
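
The rules for empty strings are easy to misremember, so here is a small sketch that prints the cases side by side. The class name EmptyTokenDemo and the helper count are hypothetical.

```java
import java.util.Arrays;

public class EmptyTokenDemo {
    // Returns how many pieces split() produces for a comma delimiter and the given limit.
    static int count(String input, int limit) {
        return input.split(",", limit).length;
    }

    public static void main(String[] args) {
        // An empty string between two delimiters is always kept.
        System.out.println(Arrays.toString("a,,c".split(","))); // [a, , c]
        // limit 0 (the default): trailing empty strings are dropped.
        System.out.println(count("a,b,c,,", 0));  // 3
        // Negative limit: trailing empty strings are kept.
        System.out.println(count("a,b,c,,", -1)); // 5
        // An empty input yields one element, the empty string itself.
        System.out.println(count("", 0));         // 1
    }
}
```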


🔄 Refactoring Example

❌ Naive Manual Splitting

String line = "a,b,c";
int comma1 = line.indexOf(",");
int comma2 = line.indexOf(",", comma1 + 1);
String part1 = line.substring(0, comma1);
String part2 = line.substring(comma1 + 1, comma2);
String part3 = line.substring(comma2 + 1); // breaks as soon as the number of fields changes

✅ Better with split()

String[] parts = line.split(",");

📌 What's New in Java Strings?

Java 11

  • Added isBlank(), lines(), strip(), repeat(), etc.
  • split() itself was not changed in this release

Java 13+

  • Text Blocks (multi-line strings; preview in Java 13 and 14, standard since Java 15)

Java 21

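  • String Templates (preview feature, later withdrawn in JDK 23)
  • Added splitWithDelimiters(regex, limit), which returns the matched delimiters along with the pieces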

✅ Best Practices

  • Prefer split() over StringTokenizer in modern code
  • Escape regex characters properly ("\\." for a period, "\\\\" for a backslash), or wrap a literal delimiter in Pattern.quote()
  • Use limit argument if you want to control splits
  • Use split(",", -1) to preserve empty tokens
  • Avoid using StringTokenizer unless maintaining legacy systems

🔚 Conclusion and Key Takeaways

  • Use split() for modern, flexible, regex-based string splitting.
  • StringTokenizer is a legacy class — avoid unless necessary.
  • Always consider edge cases like empty strings, special characters, or multiple delimiters.
  • Proper string splitting leads to cleaner, more maintainable code.

❓ FAQ

1. What’s the main difference between split() and StringTokenizer?

split() uses regex and returns an array. StringTokenizer is legacy and returns tokens one by one.

2. When should I use StringTokenizer?

Only when maintaining legacy code that already depends on it. The JDK documentation itself discourages its use in new code.

3. How to retain empty tokens?

Use split(delimiter, -1).

4. Can split() handle regex?

Yes. It's fully regex-based.

5. Is StringTokenizer thread-safe?

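No. A StringTokenizer tracks its current position as mutable state, so a single instance should not be shared across threads without synchronization. split() keeps no state between calls, and the String it operates on is immutable.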

6. How to split by whitespace?

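Use split("\\s+"). Call trim() first if the input may start with whitespace; otherwise the first element is an empty string.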

7. How to split a string into characters?

Use split("") to get a String[] of one-character strings, or toCharArray() to get a char[].

8. What if delimiter is a special regex character?

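Escape it with a backslash, written "\\" in Java source (for example "\\|" for a pipe), or wrap the delimiter in Pattern.quote().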

9. Is split() faster than StringTokenizer?

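It depends. StringTokenizer does no regex work and is often at least as fast for simple delimiters; split() wins on flexibility. If splitting sits on a hot path, precompile the regex with Pattern.compile() and measure.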
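
When the same non-trivial regex is applied to many lines, one common approach is to compile it once and reuse the Pattern instead of passing the regex string to split() each time. A minimal sketch, assuming a hypothetical class PrecompiledSplit with a hypothetical helper fields; whether it is measurably faster depends on the workload, so benchmark before relying on it.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PrecompiledSplit {
    // Compiled once and reused; Pattern instances are immutable and safe to share.
    // Matches a comma or semicolon with optional surrounding whitespace.
    private static final Pattern DELIMS = Pattern.compile("\\s*[,;]\\s*");

    static String[] fields(String line) {
        return DELIMS.split(line);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fields("a , b;c ;  d"))); // [a, b, c, d]
    }
}
```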

10. Can I use multiple delimiters?

Yes. Use regex like split("[,;|]").