Python vs. Java: Duck Typing, Parsing on Whitespace and Other Cool Differences

Python has a lot to offer Java developers, and the languages are interesting both in their similarities and their differences. In a prior blog post, I discussed the differences between Python and Java at a higher level. This time I'm diving slightly deeper and exploring some of the finer technical differences.

The biggest similarities are their "(almost) everything is an object" design and their reputation for excellent cross-platform support, along with features such as immutable strings and deep, well-established standard libraries.

There are big differences, too. At the community level, Java has always had a single large corporate sponsor, while Python's support is more distributed. Although both are well within the Algol-like family of languages, Python's use of syntactically significant whitespace sets it a little further apart from the mainstream than Java, which is comfortably familiar in its C-like use of braces and semicolons.

Both languages are compiled down to bytecode that runs on a virtual machine, although Python generally does this automatically at runtime, while Java uses a separate program (javac) to do it ahead of time. The virtual machines largely isolate the languages from the vagaries of the underlying hardware. Many Java virtual machines (JVMs) can also do just-in-time compilation of parts of the bytecode down to the native instruction set of whatever platform they are running on, which can produce significant speed-ups.
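You can watch Python's automatic compilation step with nothing but the standard library. The minimal sketch below assumes CPython, the reference implementation: it compiles the source of a hypothetical `add_tax` function with the built-in `compile()`, then uses the `dis` module to list the bytecode instructions the virtual machine will run. Instruction names differ between Python versions, so the printed listing is illustrative only.

```python
import dis

# Hypothetical source text; CPython compiles code like this to bytecode
# automatically whenever a module is imported or executed.
SOURCE = """
def add_tax(price, rate):
    return price * (1 + rate)
"""

# compile() turns source text into a module-level code object.
module_code = compile(SOURCE, "<example>", "exec")

# Executing the module code defines add_tax in a fresh namespace.
namespace = {}
exec(module_code, namespace)
add_tax = namespace["add_tax"]

# The function's own raw bytecode lives on its __code__ attribute.
bytecode = add_tax.__code__.co_code

# dis.get_instructions() decodes the raw bytes into readable instructions.
for instruction in dis.get_instructions(add_tax):
    print(instruction.opname, instruction.argrepr)
```

For imported modules, CPython normally caches this bytecode in `.pyc` files under `__pycache__`, which is why the compile step rarely needs any attention.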

Parsing on whitespace puts some people off Python. As someone who worked in SGML-based text processing, I came to Python quite reluctantly because I believed "whitespace is not actually evil... it is just misunderstood", and could not see how a language that depended on it was a good idea. Once I got used to parsing on whitespace, though, it seemed the most natural thing in the world.
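Here is what parsing on whitespace looks like in practice. In this sketch (the `classify` function is a made-up example), indentation is the only thing that tells Python which statements belong to the loop, the `if`, and the `else`; there are no braces or semicolons.

```python
def classify(numbers):
    """Split numbers into evens and odds; indentation alone marks each block."""
    evens, odds = [], []
    for n in numbers:
        if n % 2 == 0:
            evens.append(n)   # inside the if-block
        else:
            odds.append(n)    # inside the else-block
    # Dedented back to the function body: runs once, after the loop finishes.
    return evens, odds

print(classify([1, 2, 3, 4, 5]))  # ([2, 4], [1, 3, 5])
```

Indenting the `return` one level further, into the `for` loop, would change the meaning: the function would return after looking at only the first number. In Java that structure would be carried by the braces, and the indentation would be purely cosmetic.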

A Key Difference: Duck Typing

The biggest difference between the two languages is that Java is statically typed and Python is dynamically typed.

Python is strongly but dynamically typed. This means names in code are bound to strongly typed objects at runtime. The only condition on the object a name refers to is that it supports the operations the program actually performs on it. For example, I might have two types, Person and Car, that both support the operation "run", while Car also supports "refuel". So long as my program only calls "run" on these objects, it doesn't matter whether they are Person or Car. This is called "duck typing", after the expression "if it walks like a duck and quacks like a duck, it's a duck".
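The Person and Car example can be sketched as follows. The method bodies and the `start_all` helper are hypothetical; the point is that `start_all` never names a type, and works for anything that has a `run()` method.

```python
class Person:
    def run(self):
        return "Person is running"

class Car:
    def run(self):
        return "Car engine is running"

    def refuel(self):
        return "Car refueled"

def start_all(things):
    # No type declarations: each object only needs a run() method.
    return [thing.run() for thing in things]

print(start_all([Person(), Car()]))
# ['Person is running', 'Car engine is running']

# Calling refuel() on a Person is not rejected up front; it fails with an
# AttributeError only when the call is actually executed.
try:
    Person().refuel()
except AttributeError as err:
    print("AttributeError:", err)
```

The last lines show the flip side of duck typing: the mistake only surfaces when that particular line runs.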

This makes Python very easy to write and not too bad to read, but difficult to analyze. Static type inference in Python is a known hard problem. The lack of type information in function signatures, combined with support for operator overloading and the loading of modules at runtime, means that the most common type inference algorithms have nothing to work with until the point in the program's execution when the types are known anyway. The Hindley-Milner algorithm commonly used in functional languages like Haskell and ML depends on being able to know, for example, that certain operations are restricted to particular types. Programmers in these languages also often write explicit type signatures that "seed" the algorithm with the types of a function's arguments.
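To see why inference has so little to work with, consider a function whose single `+` could mean integer addition, string concatenation, list concatenation, fraction arithmetic, or an operation defined by the programmer. `combine` and `Vector2` are hypothetical names for illustration; `fractions.Fraction` is from the standard library.

```python
from fractions import Fraction

def combine(a, b):
    # Nothing here says what a and b are; the meaning of + is chosen at
    # runtime by the types of the objects actually passed in.
    return a + b

class Vector2:
    """Hypothetical user-defined type that overloads + via __add__."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        return Vector2(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        return isinstance(other, Vector2) and (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return f"Vector2({self.x}, {self.y})"

print(combine(2, 3))                            # 5
print(combine("duck", "typing"))                # ducktyping
print(combine([1], [2, 3]))                     # [1, 2, 3]
print(combine(Fraction(1, 3), Fraction(1, 6)))  # 1/2
print(combine(Vector2(1, 2), Vector2(3, 4)))    # Vector2(4, 6)
```

Plain Hindley-Milner inference expects `+` to have one known type (or a fixed set of types), so it has no single answer for `combine`. In Python every one of these calls is legitimate, and any class can add a new meaning later simply by defining `__add__`.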

In Python, names have no strong binding to their type, and thanks to duck typing, a function argument can be any object whose interface supports the operations the function requires. There is no reasonable way to determine the type of such an argument ahead of time. That flexibility can be very powerful and convenient, and is a lot like how we use objects in the real world: I don't generally care whether I have a rock or a hammer, since both have "hit()" interfaces that produce similar consequences when called.

Classes in object-oriented languages are meant to model concepts, but concepts are purely mental constructs that are essentially attitudes toward the concrete stuff of reality. Duck typing reflects this fact nicely. An object doesn't have to "be" a particular type; it just has to be usable wherever a thing of that type would be usable. This can lead to surprises, but it reflects the categorical fluidity of human thought more accurately than the rigid hierarchy imposed by more restrictive type systems does.

A Downside of Not Having Type Information

The downside is that not having type information means it can be hard to tell what is going on at any given place in the code, particularly when names are ambiguous, which they frequently are. To take an extreme case:


Does that do some kind of fine-tuning on the object, or translate it into a well-known Eastern European language? Or something else?

If "b" has been passed down through ten layers of a call stack, it's going to take a long time to answer that question, and if the upper parts of that call stack branch out and are called from many locations, any one of which could be passing in an object of a different type, you could be at it all day.
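A minimal sketch of this kind of ambiguity, using two hypothetical classes, `Report` and `Disk`, that both define a `format()` method with very different consequences. The `finish` function is also made up for illustration.

```python
class Report:
    """Hypothetical document class: format() tidies up the text."""
    def __init__(self, text):
        self.text = text

    def format(self):
        return self.text.strip().title()

class Disk:
    """Hypothetical storage class: format() wipes all the data."""
    def __init__(self):
        self.files = ["notes.txt", "photo.jpg"]

    def format(self):
        self.files.clear()
        return "disk erased"

def finish(item):
    # Reading this line alone, there is no way to tell which format() runs;
    # the answer depends on what every caller up the stack passed in.
    return item.format()

print(finish(Report("  quarterly results ")))  # Quarterly Results
print(finish(Disk()))                           # disk erased
```

In Java, `finish` would have to declare its parameter type, so a reader would see at once whether it expects a report or a disk.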

Java, in contrast, is statically typed. Names in Java are bound to types at compile time via explicit type declarations. This means many type errors that would cause a runtime error in Python, and often a program crash, are caught at compile time in Java. You can also tell at a glance what type of object a name is associated with in Java, which makes analysis by humans as well as compilers much easier.
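Here is a small, hypothetical example of the kind of type error Python only reports at runtime. The `average_length` function contains a deliberate bug; Python accepts the definition without complaint, and the `TypeError` appears only when the faulty line is executed.

```python
def average_length(words):
    # Bug: this sums the words themselves instead of their lengths, so sum()
    # ends up adding a str to its int starting value of 0.
    return sum(words) / len(words)

# Nothing checks types when the function is defined; the error surfaces
# only when the faulty line actually runs.
try:
    average_length(["duck", "typing"])
except TypeError as err:
    print("Caught only at runtime:", err)
```

The equivalent Java mistake, adding a `String` to an `int` accumulator, would be rejected by `javac` before the program ever ran.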

The cost of this is that developers have to care about types. Java's automatic type conversions are extremely limited, and the compiler insists that every object passed as a method argument be of a type convertible to the declared parameter type, either by inheritance or by automatic type promotion. Java doesn't permit the kind of implicit conversion through single-argument constructors that C++ does.

This means that Java depends critically on well-designed types, while Python requires very little type design. This is what makes Python a great prototyping language, and also what makes it a good teaching language and a good language for people who aren't software professionals. Professionals care about types, and actually enjoy threading the maze of arcane type rules imposed by statically typed languages to create clean, powerful systems that are provably type-safe. Everyone else just wants to get their job done.

Python Lets You Get the Job Done

Python lets that happen: it is a language that gets out of the way and lets you get the job done. It sometimes feels like it's a helpful assistant handing you tools. Need a tool to "download stuff" from the Web? Got that. Parse the results? Sure. Run a singular value decomposition on a sparse matrix? No problem. Need antigravity? We're working on it…
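As a rough illustration of the first two items, the sketch below uses only the standard library: `urllib.request.urlopen` to download a page and `html.parser.HTMLParser` to pull out its links. The helper names `LinkCollector`, `extract_links`, and `fetch_links` are hypothetical, and `fetch_links` assumes the page is UTF-8. The demonstration runs the parser on an inline string so it works offline; a sparse SVD would typically come from a third-party package such as SciPy and is not shown.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html_text):
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links

def fetch_links(url):
    # "Download stuff": fetch the page and decode it (UTF-8 assumed here).
    with urlopen(url) as response:
        html_text = response.read().decode("utf-8", errors="replace")
    # "Parse the results": hand the text to the parser.
    return extract_links(html_text)

# Offline demonstration on an inline page, so no network access is needed.
sample = '<p><a href="https://www.python.org/">Python</a> and <a href="/docs">docs</a></p>'
print(extract_links(sample))  # ['https://www.python.org/', '/docs']
```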

Used with appropriate discipline and testing, Python can comfortably scale to large applications and high-powered Web services. We know this because people have done it. It's also a great "glue" language for bringing together everything from Fortran-era numerical libraries to statistical algorithms in R.

Finally, Java has primitive types such as int and double built into the language alongside its object types. Python has no equivalent, since every value is an object, but there is a superset of Python called Cython that allows users to declare types where that information lets the compiler emit faster code. Careful use of this extended language can result in considerable performance improvement.
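Cython needs its own compiler, so it is not shown here, but the Python side of the contrast is easy to see. In this sketch an ordinary integer turns out to be a full object with methods and per-object overhead, which is one reason typed Cython code, like code using Java's primitive `int`, can run faster.

```python
import sys

# In Java, int is a primitive with no methods; in Python the value 42 is an
# ordinary object of class int.
n = 42
print(type(n))                # <class 'int'>
print(isinstance(n, object))  # True
print(n.bit_length())         # 6

# Each int carries object overhead; on typical CPython builds this is well
# over the 4 bytes a Java int occupies. The exact figure varies by build.
print(sys.getsizeof(n))
```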

Really finally, there is a version of Python called Jython that compiles to JVM bytecode, allowing very simple integration of Python with Java and giving Python programmers access to Java's deep and powerful libraries.

Worth A Look for Java Developers

Java was a big step forward in simplicity compared to C++, and many people rightly fell in love with it for that reason. Python is an even bigger step in the same direction, toward a simpler, more human-friendly tool for expressing our ideas in a form that machines can turn into reality.

Given all that, Java developers should give Python a look. It's a great scripting language for automating boring and repetitive tasks, it's a great embedded language for Java applications, and it's a great alternative to Java in many situations. What's not to love?