When I run the jar on GCE, I get the following error:
java -jar mySimple.jar --project=myProjcet
Aug 13, 2015 1:22:26 A
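When exporting a Runnable JAR file using Eclipse, there are three ways to package your project:
1. Extract required libraries into generated JAR
2. Package required libraries into generated JAR
3. Copy required libraries into a sub-folder next to the generated JAR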
All three options are run the same way, e.g.:
java -jar myrunnable.jar --myCommandLineOption1=...
Currently, only option 1 is compatible with how the Dataflow SDK for Java detects the resources to stage, because that detection depends on the resources being file URIs obtained from a URLClassLoader.
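To make the file-URI requirement concrete, here is a minimal sketch of classpath-based staging detection. It assumes detection works roughly as described above (ask a URLClassLoader for its URLs and convert each one into a local File); the names StagingDetectionSketch and detectFilesToStage are hypothetical, and this is not the SDK's actual source. The sketch builds its own URLClassLoader, so it also runs on Java 9+, where the application class loader is no longer a URLClassLoader.

```java
import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class StagingDetectionSketch {
  // Hypothetical helper: every URL reported by the loader must convert to a
  // local File, otherwise detection fails (illustration only, not SDK code).
  static List<String> detectFilesToStage(ClassLoader loader) {
    if (!(loader instanceof URLClassLoader)) {
      throw new IllegalArgumentException("Not a URLClassLoader: " + loader);
    }
    List<String> files = new ArrayList<>();
    for (URL url : ((URLClassLoader) loader).getURLs()) {
      try {
        // new File(URI) only accepts absolute, hierarchical file: URIs.
        files.add(new File(url.toURI()).getAbsolutePath());
      } catch (URISyntaxException | IllegalArgumentException e) {
        throw new IllegalArgumentException("Unable to convert url (" + url + ") to file.", e);
      }
    }
    return files;
  }

  public static void main(String[] args) throws Exception {
    URL fileUrl = new File("target/classes").toURI().toURL();
    URL nestedJarUrl = new URL("jar:file:/tmp/app.jar!/lib/dep.jar");
    try (URLClassLoader ok = new URLClassLoader(new URL[] {fileUrl}, null)) {
      System.out.println(detectFilesToStage(ok)); // a list with one absolute path
    }
    try (URLClassLoader bad = new URLClassLoader(new URL[] {nestedJarUrl}, null)) {
      detectFilesToStage(bad);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // the jar: URL is rejected
    }
  }
}
```

A plain file: URL converts cleanly, while a URL pointing inside another jar does not, which is the same distinction that separates option 1 from options 2 and 3.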
For an explanation of how each kind of runnable JAR is built, and more specific details of why options 2 and 3 are problematic, read further below.
An alternative to using runnable JARs is to execute your project with mvn exec.
Option 1 (Extract required libraries into generated JAR): this creates a single jar into which the class files and resources from each dependency jar are copied. The result is a manifest whose entire classpath is composed of file-based URIs:
Manifest-Version: 1.0
Main-Class: com.google.cloud.dataflow.starter.StarterPipeline
Class-Path: .
Option 2 (Package required libraries into generated JAR): this creates a jar file with the dependency jars embedded inside it. It uses a custom main entry point (org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader) that reads the custom manifest entries (Rsrc-Class-Path and Rsrc-Main-Class) and creates a classloader whose URIs are not file-based. Since the Dataflow SDK for Java currently only knows how to handle file-based resources and cannot interpret the rsrc:... URIs, you get the exception you're seeing. The generated manifest looks like this:
Manifest-Version: 1.0
Rsrc-Class-Path: ./ httpclient-4.3.6.jar ...
Class-Path: .
Rsrc-Main-Class: simple.SimpleV1
Main-Class: org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader
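The option 2 failure can be reproduced in isolation. The sketch below uses rsrc:./ as an assumed example of the URLs the jar-in-jar loader hands out, and a stub URLStreamHandler in place of Eclipse's real handler, because the JDK has no built-in rsrc: protocol. RsrcUrlSketch, STUB_RSRC_HANDLER and isFileBased are hypothetical names.

```java
import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

public class RsrcUrlSketch {
  // Stand-in for Eclipse's rsrc: handler (assumption): it lets us construct
  // rsrc: URLs but cannot actually open them.
  static final URLStreamHandler STUB_RSRC_HANDLER = new URLStreamHandler() {
    @Override
    protected URLConnection openConnection(URL u) {
      throw new UnsupportedOperationException("stub handler: " + u);
    }
  };

  // True if the URL converts to a local File, which file-based staging needs.
  static boolean isFileBased(URL url) {
    try {
      new File(url.toURI());
      return true;
    } catch (URISyntaxException | IllegalArgumentException e) {
      return false; // opaque or non-file URI, e.g. rsrc:./
    }
  }

  public static void main(String[] args) throws Exception {
    URL rsrcRoot = new URL(null, "rsrc:./", STUB_RSRC_HANDLER);
    URL fileJar = new File("mySimple.jar").toURI().toURL();
    System.out.println(rsrcRoot + " file-based? " + isFileBased(rsrcRoot)); // false
    System.out.println(fileJar + " file-based? " + isFileBased(fileJar));   // true
  }
}
```

rsrc:./ parses as an opaque URI, so new File(URI) throws "URI is not hierarchical", whereas the same jar addressed through a file: URL converts without trouble.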
Option 3 (Copy required libraries into a sub-folder next to the generated JAR): this creates a jar file containing your project's resources, plus a folder alongside the runnable jar that holds all of your project's dependency jars. This produces a longer, but standard, manifest that lists every project dependency in its Class-Path entry:
Manifest-Version: 1.0
Main-Class: com.google.cloud.dataflow.starter.StarterPipeline
Class-Path: . runnable_lib/google-cloud-dataflow-java-sdk-all-manual_build.jar ...
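Comparing the three manifests suggests a quick way to tell which option produced a given jar. The sketch below is a heuristic that only checks the attributes shown in these examples (Rsrc-Main-Class, Main-Class and Class-Path); RunnableJarKind and its categories are hypothetical, and this is not an official Eclipse API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class RunnableJarKind {
  enum Kind { EXTRACTED, PACKAGED_JAR_IN_JAR, COPIED_TO_SUBFOLDER }

  static final String JAR_IN_JAR_LOADER = "org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader";

  // Heuristic based only on the example manifests above.
  static Kind classify(Manifest manifest) {
    Attributes main = manifest.getMainAttributes();
    if (main.getValue("Rsrc-Main-Class") != null
        || JAR_IN_JAR_LOADER.equals(main.getValue(Attributes.Name.MAIN_CLASS))) {
      return Kind.PACKAGED_JAR_IN_JAR; // option 2: rsrc: URIs, not stageable
    }
    String classPath = main.getValue(Attributes.Name.CLASS_PATH);
    if (classPath != null && !classPath.trim().equals(".")) {
      return Kind.COPIED_TO_SUBFOLDER; // option 3: deps only listed in Class-Path
    }
    return Kind.EXTRACTED; // option 1: everything in one file-based jar
  }

  // Parses manifest text; each line, including the last, must end with '\n'.
  static Manifest parse(String text) throws IOException {
    return new Manifest(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
  }

  public static void main(String[] args) throws IOException {
    Manifest m = parse("Manifest-Version: 1.0\n"
        + "Main-Class: com.google.cloud.dataflow.starter.StarterPipeline\n"
        + "Class-Path: .\n");
    System.out.println(classify(m)); // EXTRACTED
  }
}
```

For a jar on disk, the Manifest can come from java.util.jar.JarFile#getManifest() instead of parse().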
The problem with option 3 is that the jars listed in the manifest's Class-Path entry are not returned as part of the URLClassLoader's URLs, so these classes are not discoverable. Furthermore, those jars are only meant to be loaded on behalf of classes from the referencing jar, which can lead to a hierarchy of jar loading. More details are available here: http://docs.oracle.com/javase/7/docs/technotes/tools/findingclasses.html
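The Class-Path behaviour can be checked directly. The sketch writes a manifest-only jar whose Class-Path names a dependency (myrunnable.jar and runnable_lib/dep.jar are placeholder names), then asks a URLClassLoader for its URLs. getURLs() returns only the URLs passed to the constructor plus any added later via addURL, so the dependency named in Class-Path never appears, even though the loader can still follow it when loading classes. ClassPathVisibilitySketch and its methods are hypothetical.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class ClassPathVisibilitySketch {
  // Writes a jar containing only a manifest with the given Class-Path value,
  // mimicking the "copy libraries into a sub-folder" layout.
  static File writeJarWithClassPath(File dir, String classPath) throws IOException {
    Manifest manifest = new Manifest();
    Attributes main = manifest.getMainAttributes();
    main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
    main.put(Attributes.Name.CLASS_PATH, classPath);
    File jar = new File(dir, "myrunnable.jar");
    try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar), manifest)) {
      out.finish(); // no entries besides the manifest
    }
    return jar;
  }

  // Returns the URLs a URLClassLoader reports for a loader built from one jar.
  static URL[] reportedUrls(File jar) throws IOException {
    try (URLClassLoader loader = new URLClassLoader(new URL[] {jar.toURI().toURL()}, null)) {
      return loader.getURLs();
    }
  }

  public static void main(String[] args) throws IOException {
    File dir = Files.createTempDirectory("cp-demo").toFile();
    File jar = writeJarWithClassPath(dir, ". runnable_lib/dep.jar");
    URL[] urls = reportedUrls(jar);
    // Only myrunnable.jar is reported; runnable_lib/dep.jar is not.
    System.out.println(urls.length + " " + urls[0]);
  }
}
```

Any staging logic that relies on getURLs() therefore sees only the runnable jar itself and misses every dependency in runnable_lib.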