
CloudStorageFileSystem: java.net.URISyntaxException: Illegal character in hostname #1218

Closed
@kshakir

Description


Environment details

  1. Specify the API at the beginning of the title: CloudStorageFileSystem
  2. OS type and version: macOS 13.4.1 (a)
  3. Java version: OpenJDK 64-Bit Server VM Temurin-11.0.18+10 (build 11.0.18+10, mixed mode)
  4. version(s): 0.126.20-SNAPSHOT

Steps to reproduce

  1. Create a bucket in GCS with a name containing an underscore
  2. Call Paths.get(URI.create("<path_in_your_bucket>")) with the GCS path in your bucket

Code example

Paths.get(URI.create("gs://bucket_with_authority/path"))

Stack trace

java.lang.IllegalArgumentException: Expected scheme-specific part at index 3: gs:

	at com.google.cloud.storage.contrib.nio.CloudStorageUtil.stripPathFromUri(CloudStorageUtil.java:65)
	at com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProvider.getPath(CloudStorageFileSystemProvider.java:282)
	at com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProvider.getPath(CloudStorageFileSystemProvider.java:97)
	at java.base/java.nio.file.Path.of(Path.java:208)
	at java.base/java.nio.file.Paths.get(Paths.java:97)
	at com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProviderTest.testBucketWithAuthority(CloudStorageFileSystemProviderTest.java:836)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at com.google.cloud.testing.junit4.MultipleAttemptsRule$1.evaluate(MultipleAttemptsRule.java:94)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
	at com.intellij.rt.junit.IdeaTestRunner$Repeater$1.execute(IdeaTestRunner.java:38)
	at com.intellij.rt.execution.junit.TestsRepeater.repeat(TestsRepeater.java:11)
	at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:35)
	at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:232)
	at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:55)

External references such as API reference guides

Any additional information below

According to RFC 2396, a URI may contain an “Authority Component” (section 3.2). That authority may be “Server-based” (section 3.2.2) or “Registry-based” (section 3.2.1).

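Not all Google Cloud Storage bucket names are server-based authorities. The Google bucket naming documentation mentions “You can use a bucket name in a DNS record as part of a CNAME or A redirect.” It does not say “You must”. A name such as bucket_with_authority is not a valid hostname under RFC 2396, whose hostname grammar allows only letters, digits, hyphens, and dots.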

Since bucket names are not necessarily hostnames, RFC 2396 still allows the URI to be constructed using the “Registry-based” form. URI.create("gs://bucket_with_authority/path") produces a valid java.net.URI; its authority is simply registry-based rather than “Server-based” according to the RFC.
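The distinction is visible directly in java.net.URI. The sketch below parses the URI from this report and checks whether its authority is server-based. The class RegistryAuthorityDemo and the helper isServerBased are illustrative names only, not part of any library; the calls themselves (getHost(), getAuthority(), parseServerAuthority()) are standard java.net.URI methods.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative demo (hypothetical class name): how java.net.URI treats an
// authority that is not a valid RFC 2396 hostname.
class RegistryAuthorityDemo {

    // Returns true if the URI's authority can be parsed as a server-based authority.
    static boolean isServerBased(URI uri) {
        try {
            // parseServerAuthority() throws if the authority is only registry-based.
            uri.parseServerAuthority();
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // URI.create() accepts the underscore by falling back to a registry-based authority.
        URI uri = URI.create("gs://bucket_with_authority/path");
        System.out.println("authority = " + uri.getAuthority());   // bucket_with_authority
        System.out.println("host      = " + uri.getHost());        // null: not server-based
        System.out.println("path      = " + uri.getPath());        // /path
        System.out.println("server-based? " + isServerBased(uri)); // false

        // For comparison, a bucket name without an underscore is also a valid hostname.
        URI dns = URI.create("gs://bucket-with-authority/path");
        System.out.println("host      = " + dns.getHost());        // bucket-with-authority
        System.out.println("server-based? " + isServerBased(dns)); // true
    }
}
```

Both URIs are accepted by URI.create(); only the hostname-style one exposes a non-null getHost().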

The stack trace appears to be caused by CloudStorageFileSystem's internal use of the java.net.URI API. In a number of places, CloudStorageFileSystem constructs URIs from a hostname, or reads the hostname from a URI, which only works for “Server-based” authorities and not for “Registry-based” ones. For a registry-based authority, URI.getHost() returns null.
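Both error messages can be reproduced with the standard library alone. The sketch below is an assumption about the kind of call involved, not CloudStorageFileSystem's actual code: stripPathUsingHost is a hypothetical helper that rebuilds the URI through the host-based constructor URI(scheme, host, path, fragment). Because getHost() is null, the rebuilt string is just "gs:", which fails with the same message as the stack trace above; passing the bucket name as the host instead produces the "Illegal character in hostname" error from the issue title.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical reconstruction of a host-based "strip the path" step.
// This is NOT the library's code; it only mirrors the failure mode.
class HostBasedFailureDemo {

    // Rebuilds "scheme://host" using the hostname-only constructor.
    static URI stripPathUsingHost(URI uri) throws URISyntaxException {
        return new URI(uri.getScheme(), uri.getHost(), null, null);
    }

    public static void main(String[] args) {
        URI uri = URI.create("gs://bucket_with_authority/path");

        // getHost() is null, so the rebuilt string is only "gs:", which is not a valid URI.
        try {
            stripPathUsingHost(uri);
        } catch (URISyntaxException e) {
            System.out.println(e.getMessage()); // Expected scheme-specific part at index 3: gs:
        }

        // Supplying the bucket name as a host is rejected, because the
        // host-based constructors require a server-based authority.
        try {
            new URI("gs", "bucket_with_authority", null, null);
        } catch (URISyntaxException e) {
            System.out.println(e.getReason()); // Illegal character in hostname
        }
    }
}
```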

The fix is to internally use the java.net.URI APIs that support both kinds of RFC 2396 authority (section 3.2.1 registry-based and section 3.2.2 server-based), for example getAuthority(), instead of the APIs that only support section 3.2.2 "Server-based" authorities, such as getHost().
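A minimal sketch of the authority-based alternative, assuming the goal is to drop the path and keep "scheme://bucket". The class and method names are hypothetical, not the library's API. It uses getAuthority() together with the five-argument URI(scheme, authority, path, query, fragment) constructor, which, unlike the host-based constructors, does not insist on a server-based authority. It also assumes the authority contains no percent-escaped characters, since getAuthority() returns the decoded form.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch of an authority-based "strip the path" helper.
class AuthorityBasedFixDemo {

    // Returns "scheme://authority" with the path removed. getAuthority() and the
    // authority-based constructor accept both registry-based and server-based authorities.
    static URI stripPathUsingAuthority(URI uri) throws URISyntaxException {
        return new URI(uri.getScheme(), uri.getAuthority(), null, null, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        // Registry-based authority: the bucket name contains an underscore.
        URI underscore = URI.create("gs://bucket_with_authority/path");
        System.out.println(stripPathUsingAuthority(underscore)); // gs://bucket_with_authority

        // Server-based authority: the bucket name is also a valid hostname.
        URI dotted = URI.create("gs://bucket.example.com/path");
        System.out.println(stripPathUsingAuthority(dotted));     // gs://bucket.example.com
    }
}
```

Because getAuthority() returns the bucket name for both authority forms, a single code path handles bucket names that are valid hostnames and those that are not.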

Thanks!

Metadata


Assignees

No one assigned

    Labels

    api: storage (Issues related to the googleapis/java-storage-nio API.)
    priority: p2 (Moderately-important priority. Fix may not be included in next release.)
    type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
