1

I am seeing a behavior in Spark ( 2.2.0 ) I do not understand, but guessing it's related to Lambda and Anonymous classes, when trying to extract out a lambda function:

This works:

public class EventsFilter
{
 public Dataset< String > filter( Dataset< String > events )
 {
 return events.filter( ( FilterFunction< String > ) x -> x.length() > 3 );
 }
}

Yet this does not:

public class EventsFilter
{
 public Dataset< String > filter( Dataset< String > events )
 {
 FilterFunction< String > filter = new FilterFunction< String >(){
 @Override public boolean call( String value ) throws Exception
 {
 return value.length() > 3;
 }
 };
 return events.filter( filter );
 }
}
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298) ...
...
Caused by: java.io.NotSerializableException: ...EventsFilter
 ..Serialization stack:
- object not serializable (class: ...EventsFilter, 
value:...EventsFilter@e521067)
 - field (class: .EventsFilter1,ドル name: this0,ドル type: class ..EventsFilter)
. - object (class ...EventsFilter1,ドル ..EventsFilter1ドル@5c70d7f0)
. - element of array (index: 1)
 - array (class [Ljava.lang.Object;, size 4)
 - field (class: 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun8,ドル name: references1,ドル type: class [Ljava.lang.Object;)
 - object (class org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun8,ドル <function2>)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)

I am testing against:

@Test
public void test()
{
 EventsFilter filter = new EventsFilter();
 Dataset<String> input = SparkSession.builder().appName( "test" ).master( "local" ).getOrCreate()
 .createDataset( Arrays.asList( "123" , "123" , "3211" ) ,
 Encoders.kryo( String.class ) );
 Dataset<String> res = filter.filter( input );
 assertThat( res.count() , is( 1l ) );
}

Even weirder, when put in a static main, both seem to work...

How is defining the function explicitly inside a method causing that sneaky 'this' reference serialization?

asked Sep 17, 2017 at 17:05
2
  • 1
    Can we see the entire serialization stack? Commented Sep 17, 2017 at 19:33
  • added entire serialization stack Commented Sep 17, 2017 at 20:06

2 Answers 2

3

Java's inner classes holds reference to outer class. Your outer class is not serializable, so exception is thrown.

Lambdas does not hold reference if that reference is not used, so there's no problem with non-serializable outer class. More here

answered Sep 17, 2017 at 17:26
Sign up to request clarification or add additional context in comments.

3 Comments

Which inner class / outer class are you talking about? What's the difference between the lambda and the function variable 'filter' and why is it not a free variable?
Outer class i just your main class EventsFilter, inner class = anonymous class. Lamdbas are also using inner classe, but in different way :)I will add more detailed description later, in few hours
1

I was under the false impression that Lambdas are implemented under the hood as inner classes. This is no longer the case (very helpful talk). Also, as T. Gawęda answered, inner classes do in fact hold reference to the outer class, even if it is not needed (here). This difference explains the behavior.

answered Sep 18, 2017 at 7:50

Comments

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.