pytype probably should not be your last line of defense

taming a python

Python is a dynamically-typed programming language with cursory support for static typing.

What that means specifically is that, at compile-time:

  • You can annotate any variable name with a type.
  • The types can be parameterized: List[str] — “list of strings” — and List[int] — “list of integers” — are different types.

At runtime:

  • Python ignores all type annotations.
  • Types cannot be parameterized: type([“a”, “b”, “c”]) == type([1, 2, 3]) == list

Python includes a convenient function isinstance(x, y) which returns True if x is a member of type y.

pytype is a tool from Google that analyzes your program based on its type annotations and determines if those annotations are broken by your program. For instance, pytype will complain if you pass [1, 2, 3] where a List[str] was expected.

There are a few practical problems with pytype which I found when trying to use it at work:

  • pytype seems to have trouble with variance — it complained earlier when I wrote code that expected a List[Dict[str, Union[int, str]]] and received a List[Dict[str, int]] instead. Frankly, I don’t know how to deal with this kind of thing in Python, but an error doesn’t seem right.
  • pytype doesn’t seem to understand the magic tricks used to implement SQLAlchemy, the database library we use

Unfortunately, several competing tools I used had worse problems. In addition to that, Python appears to have completely changed the typechecking-relevant APIs in every major version between 3.5 and 3.9, so many typechecking libraries — including dynamic ones, such as typeguard — didn’t work at all on the relatively modern version of Python we used at work.

At work, we have historically added checks at the public interface boundary which use isinstance(x, y) to fail if a value has the wrong type. This has the advantage that, if code is used in a way we didn’t expect, we always get an error.

In general, I think code should resist misuse when possible. Using code the wrong way should basically always cause a crash. pytype doesn’t create that guarantee, so while I recommend using pytype, I also recommend using type assertions of the kind we use when you’re at runtime.

Some of these checks have a big performance cost. When we expect a list of integers, checking isinstance(x, int) for every integer in a big list is pretty expensive. Our experience is that most of the time, when we’ve written an assertion like this, though, we’ve managed to trigger it.

We have also benefited from checking that values are in their expected ranges (for instance, an age shouldn’t be greater than 100, even though Python integers are unbounded) and that input objects aren’t implausibly large. (a list of 1,000 search keywords would be far too many) Most typechecking libraries don’t help with this and for this sort of thing, you should really be validating your input anyways.

Most Python implementations are very slow, so chances are, if you’re using Python, you’ve already accepted a high implicit performance cost. You probably have the infrastructure to scale to a greater number of servers if needed.

That being said — if you’re fearless, disregard this advice! I could be wrong.

Leave a Comment

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s