Don't return named tuples in new APIs

72 points by todsacerdoti 10 days ago | 68 comments

Another problem brought about by their design being backwards-compatible with tuples is that you get wonky equality rules where two namedtuples of different types and with differently-named attributes can compare as equal:

    >>> from collections import namedtuple
    >>> Foo = namedtuple("Foo", ["bar"])
    >>> Baz = namedtuple("Baz", ["qux"])
    >>> Foo(bar="hello") == Baz(qux="hello")
    True

This also happens with the "new-style" namedtuples (typing.NamedTuple).
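A minimal sketch of the same check with the class-based syntax. The `Foo` and `Baz` names mirror the snippet above; the `str` annotations are an assumption made for illustration:

```python
from typing import NamedTuple

class Foo(NamedTuple):
    bar: str

class Baz(NamedTuple):
    qux: str

# Both classes are tuple subclasses, so == falls back to plain tuple comparison.
same = Foo(bar="hello") == Baz(qux="hello")
print(same)                             # True
print(Foo(bar="hello") == ("hello",))   # True: also equal to a bare tuple
```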

I like the convenience of namedtuples but I agree with the author: there are enough footguns to prefer other approaches.

mont_tag 6 days ago | root | parent |

This seems like an invented problem that never comes up in practice. It is no more interesting than numpy arrays evaluating as equal even when they are conceptually not comparable:

    >>> import numpy as np
    >>> temperatures_fahrenheit = np.array([10, 20, 30])
    >>> temperatures_celsius = np.array([10, 20, 30])
    >>> temperatures_fahrenheit == temperatures_celsius
    array([ True,  True,  True])

dathery 6 days ago | root | parent | next |

I've usually seen it come up when people try to hash the objects to use as dictionary keys or in sets, and then encounter very hard-to-troubleshoot issues later on. Obviously it's a bit weird to hash a bunch of objects of different types, but it's just one example of the footguns that namedtuples have and why I prefer other approaches.
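A rough illustration of how that can bite, assuming two hypothetical ID types (`UserId` and `OrderId` are made up for this sketch) that end up as keys in the same dict:

```python
from collections import namedtuple

# Hypothetical types: different meaning, same underlying tuple shape.
UserId = namedtuple("UserId", ["value"])
OrderId = namedtuple("OrderId", ["value"])

cache = {}
cache[UserId(42)] = "alice"
# Same hash and compares equal to UserId(42), so this overwrites the entry above.
cache[OrderId(42)] = "order #42"

print(len(cache))          # 1
print(cache[UserId(42)])   # order #42
```

Nothing raises here, which is probably why this kind of bug is hard to trace back to its cause.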

quotemstr 6 days ago | root | parent | prev |

The numpy equality thing is actually an enormous footgun, especially for people new to numeric Python. Equality for numpy and its derivatives should have had the traditional Python meaning (yielding a bool), and the current operation (yielding a mask) should have been put under a named method.

6 days ago | root | parent | next |

[deleted]

beng-nl 5 days ago | root | parent | prev |

Not that I find your argument invalid, but you’re not arguing against GP.

At the risk of belaboring the point or being redundant, GP is making the point that the Celsius vs Fahrenheit meaning of the arrays makes them semantically different, and therefore the equality could be taken to be misleading when this is taken into account. GP thinks this is nonsense and draws a parallel with named tuples.

xg15 6 days ago | prev | next |

Counterpoint: Named tuples are immutable, while dataclasses are mutable by default.

You can use frozen=True to "simulate" immutability, but that just overwrites the setter with a dummy implementation, something you (or your very clever coworker) can circumvent by using object.__setattr__().

So you neither get the performance benefits nor the invariants of actual immutability.

hansvm 6 days ago | root | parent | next |

Counter-counterpoint:

- Everything in Python is mutable, including the definitions of constants like `3` and `True`. It's much like "unsafe" in Rust; you can do stupid things, but when you see somebody reaching for `__setattr__` or `ctypes` then you know to take out your magnifying glass on the PR, find a better solution, ban them from the repo, or start searching for a new job.

- Performance-wise, named tuples are sometimes better because more work happens in C for every line of Python, not because of any magic immutability benefits. It's similar to how you should prefer comprehensions to loops (most of the time) if you're stuck with Python but performance still matters a little bit. Yes, maybe still use named tuples for performance reasons, but don't give the credit to immutability.

> something you (or your very clever coworker) can circumvent by using object.__setattr__()

This fits pretty well with a lot of other stuff in Python (e.g. there’s no real private members in classes). There’s a bunch of escape hatches that you should avoid (but that can still be useful sometimes), and those usually are pretty obvious (e.g. if you see code using object.__setattr__, something is definitely not right).

Can’t tell whether this is good design or not, but personally I like it.

jfktrey 6 days ago | root | parent | next |

Counterpoint: I've used `object.__setattr__` pretty often when setting values in the `__post_init__` of frozen dataclasses
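For anyone who hasn't seen the pattern, a small sketch of what that looks like. `Circle` and its fields are hypothetical names chosen for this example:

```python
import math
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Circle:
    radius: float
    area: float = field(init=False)  # derived value, not an __init__ argument

    def __post_init__(self):
        # A plain `self.area = ...` would raise FrozenInstanceError here,
        # so the assignment goes through object.__setattr__ instead.
        object.__setattr__(self, "area", math.pi * self.radius ** 2)

c = Circle(2.0)
print(c.area)
```

After construction the instance behaves like any other frozen dataclass: normal attribute assignment is still rejected.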

xg15 6 days ago | root | parent |

I'd argue that's about the only correct usage of this stuff.

eyegor 6 days ago | root | parent | prev |

Is there a difference between global setattr(object, v) and object.__setattr__(v)? I've seen setattr() in the wild all over but I've never encountered the dunder one.

notpushkin 6 days ago | root | parent |

Note that `object` here is not a placeholder variable but actually refers to the global object type (basically a superclass of pretty much every other type in Python). It allows you to bypass the classes’ __setattr__ and set the value regardless (the setattr() function can’t do that):

  In [1]: from dataclasses import dataclass

  In [2]: @dataclass(frozen=True)
     ...: class Foo:
     ...:     a: int
     ...:

  In [3]: foo = Foo(5)

  In [4]: foo.a = 10
  FrozenInstanceError: cannot assign to field 'a'

  In [5]: setattr(foo, "a", 10)
  FrozenInstanceError: cannot assign to field 'a'

  In [6]: object.__setattr__(foo, "a", 10)

  In [7]: foo.a
  Out[7]: 10

quotemstr 6 days ago | root | parent | prev |

It's Python. You can override practically any behavior. Hell, use ctypes and mutate immutable tuples! Doing so is well-defined in the C API!

What bugs me more about frozen dataclasses is how post-init methods have to use the setattr hack.

webprofusion 6 days ago | prev | next |

Oh you mean Python library APIs. I totally thought this was going to be a generic article about APIs delivered over http, the first thing I'd think of when someone says API.

tomrod 6 days ago | root | parent | next |

Yeah, having spent the last few years in REST world I sort of thought the same thing.

8n4vidtmkvmk 5 days ago | root | parent |

My manager called the class I was about to define an API about 6 years ago and I couldn't refute it. It is a form of API. My definition was suddenly expanded.

fingerlocks 5 days ago | root | parent |

Depending on the context, it’s probably more accurate to call that an ABI

8n4vidtmkvmk 4 days ago | root | parent |

ABI is the binary interface, no? This was within the same compilation unit.

lmm 6 days ago | root | parent | prev |

[dead]

heavyset_go 6 days ago | prev | next |

Author could have used NamedTuple instead of dataclass or TypedDict:

    from typing import NamedTuple

    class Point(NamedTuple):
        x: int
        y: int
        z: int

I don't see "don't use namedtuples in APIs" as a useful rule of thumb, to be honest. Ordered and iterable return-types make sense for a lot of APIs. Use them where it makes sense.

rtpg 6 days ago | root | parent | next |

I feel like "consider dataclasses as a useful default" is decent advice.

You get the stuff you get from `NamedTuple`, but you can also easily add helper methods to the class as needed. And there are other dataclass goodies (though some things I find to be a bit anti-feature-y).

heavyset_go 6 days ago | root | parent | next |

> I feel like "consider dataclasses as a useful default" is decent advice.

I agree.

> You get the stuff you get from `NamedTuple`, but you can also easily add helper methods to the class as needed. And there are other dataclass goodies (though some things I find to be a bit anti-feature-y).

I've seen examples where dataclasses were used when order matters, however, hence why I'm not comfortable with a general rule against namedtuples. Sometimes order and iterability matter, and dogmatically reaching for a different data type that doesn't preserve that information might be the wrong choice.

david2ndaccount 6 days ago | root | parent | prev |

You can have methods with NamedTuple
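A short sketch with the class-based syntax. The `manhattan` method is a made-up helper for illustration:

```python
from typing import NamedTuple

class Point(NamedTuple):
    x: int
    y: int

    def manhattan(self) -> int:
        # Ordinary method; the instance is still a tuple underneath.
        return abs(self.x) + abs(self.y)

p = Point(3, -4)
print(p.manhattan())  # 7
x, y = p              # still unpacks like a tuple
```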

rtpg 6 days ago | root | parent |

Oh, with the class declaration version? I never considered that, but feels obvious now.

The point of the article is this: do not return objects that are also sequences when a struct would suffice.

That said, the provided alternatives are not immutable, and immutability matters much more than whether you leak o[0] access. dataclass(frozen=True) is the only acceptable alternative.

notpushkin 6 days ago | root | parent | prev |

Author argues that named tuples are bad (it’s literally the article title) so I think you miss the point?

heavyset_go 6 days ago | root | parent | next |

The author claims that the reason people reach for namedtuples is brevity, but I'd argue to the contrary and include the modern syntax for defining a namedtuple. The syntax is nearly identical to TypedDict and dataclass.

There are other reasons to reach for namedtuples, for example when order or iterability matter. I don't see a general rule of not using namedtuples in APIs to be that useful. Use the right tool when the need calls for it.

mont_tag 6 days ago | root | parent |

Right. It takes equal effort to define a named tuple, typed dict or dataclass.

Mostly people just reach for the tool that does what they want, named tuples for tuply stuff, typed dicts for mapping applications, and dataclasses when you actually want a class.
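To make the comparison concrete, here is the same two-field record written all three ways. The class names are hypothetical and only there to tell the variants apart:

```python
from dataclasses import dataclass
from typing import NamedTuple, TypedDict

class PointTuple(NamedTuple):
    x: int
    y: int

class PointDict(TypedDict):
    x: int
    y: int

@dataclass
class PointClass:
    x: int
    y: int

t = PointTuple(1, 2)      # a tuple: t[0], t.x, and unpacking all work
d = PointDict(x=1, y=2)   # a plain dict at runtime: d["x"]
c = PointClass(1, 2)      # a regular class: attribute access only
```

The definitions are nearly identical; what differs is the interface each one commits you to.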

PittleyDunkin 6 days ago | root | parent | prev |

The author also argues for readability over semantics, so I'm not sure they got the point to begin with.

the__alchemist 6 days ago | prev | next |

I think the best option for this, which is one listed in the article, is the dataclass. It's like a struct in C or Rust. It's ideal for structured data, which is, I believe, what a named tuple is intended for.

o11c 6 days ago | root | parent |

The annoyance of dataclasses, of course, is that they interact very awkwardly with immutability, which much of the Python ecosystem mandates (due to lacking value semantics).

But yes, they're still the least-bad choice.

the__alchemist 6 days ago | root | parent | next |

Valid. One of my biggest (perhaps my #1) complaints about Python is its sloppy mutability and pass-by-value/reference rules.

Spivak 6 days ago | root | parent |

Is there a situation where Python ever passes by value? Like you can sort of pretend for primitive types but I can't think of a case where it's actually value.

o11c 6 days ago | root | parent |

Non-CPython implementations may pass by value as an optimization for certain immutable builtin types. This is visible to code using `is`.

(It's surprisingly difficult to implement a rigorous way to detect this vs compile-time constant evaluation though; note that identical objects of certain types are pooled already when generating/loading bytecode files. I don't think any current implementation is smart enough to optimize the following though)

  $ python3 -c 'o = object(); print(id(o) is id(o))'
  False
  $ pypy3 -c 'o = object(); print(id(o) is id(o))'
  True
  $ jython -c 'o = object(); print(id(o) is id(o))'
  True

ericvsmith 5 days ago | root | parent | next |

What you're seeing here is an optimization about integers, not about pass by value. CPython only does this for small integers:

  $ python -c "print(int('3') is int('3'))"
  True
  $ python -c "print(int('300') is int('300'))"
  False

Other implementations make different choices.

cwalv 6 days ago | root | parent | prev |

Are you saying `o` is passed by value? I think this behavior is due to the return from `id()` being interned, or not. `id(o) == id(o)` will be true in all cases

o11c 6 days ago | root | parent |

I mean that the `id` function returns by value. It's not interning since that explicitly refers to something allocated, which isn't the case here.

ericvsmith 5 days ago | root | parent |

This is incorrect. The returned integer is a regular Python object, not some "unboxed" integer value.

nomel 6 days ago | root | parent | prev |

> is that they interact very awkwardly with immutability

How so?

Tuples are awkward with immutability, if you put mutable things inside them.

o11c 6 days ago | root | parent |

By default, dataclasses can't be used as keys in a `dict`. You have to either use `frozen` (in which case the generated `__init__` becomes an abomination) or use `unsafe_hash` (in which case you have no guardrails).
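A minimal sketch of the default behavior versus `frozen=True`. The `Mutable` and `Frozen` class names are placeholders:

```python
from dataclasses import dataclass

@dataclass
class Mutable:
    x: int

@dataclass(frozen=True)
class Frozen:
    x: int

# With the default eq=True and no frozen, __hash__ is set to None.
try:
    {Mutable(1): "a"}
    hashable = True
except TypeError:
    hashable = False
print(hashable)            # False

# frozen=True (with eq=True) generates a field-based __hash__.
lookup = {Frozen(1): "a"}
print(lookup[Frozen(1)])   # a
```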

In languages with value semantics, nothing about this problem even makes sense, since obviously a dict's key is taken by value if it needs to be stored.

Tuple behavior is sensible if you are familiar with discussion of reference semantics (though not as much as if you also support value semantics).

Still, at least we aren't Javascript where the question of using composite keys is answered with "screw you".

porridgeraisin 5 days ago | root | parent |

Haha

  obj[JSON.stringify(x)] = y

* Ducks *

o11c 5 days ago | root | parent |

  x = [Symbol('geese')]

math_dandy 6 days ago | prev | next |

One advantage of (Named)Tuples over dataclasses or SimpleNamespaces is that they can be used as indices into numpy arrays, which is very useful when your API is returning a point or screen coordinates or similar.

mont_tag 6 days ago | prev | next |

This article seems vacuous to me. It misses the point that tuples are fundamental to the language with c-speed native support for packing, unpacking, hashing, pickling, slicing and equality tests. Tuples appear everywhere from the output of doctest, to time tuples, the result of divmod, the output of a csv reader and the output of a sqlite3 query.

Tuples are a core concept and fundamental data aggregation tool for Python. However, this post uses a trivial `Point()` class strawman to try to shoot down the idea of using tuples at all. IMO that is fighting the language and every existing API that either accepts tuple inputs or returns tuple outputs. That is a vast ecosystem.

According to the glossary, a named tuple is "any type or class that inherits from tuple and whose indexable elements are also accessible using named attributes." Presumably, no one disputes that having names improves readability. So really this weak post argues against tuples themselves.

quotemstr 6 days ago | root | parent | next |

The beauty of Python is that it's so slow that you can relax and use what's clearest and most expensive. Finding yourself micro-optimizing things like tuple allocation time is a signal that you should be writing an extension or a numba snippet or something.

CaliforniaKarl 6 days ago | root | parent | prev |

I think the core issue is about trust.

I trust that the maintainers of the Python language & the Python Standard Library are not going to change their tuple-using APIs in a breaking way, without a clear signal (like a major-version bump).

I do not extend that same trust to other Python projects. Maybe I extend that same trust to projects that demonstrate proper use of Semantic Versioning, but not to others.

Using something other than tuples trades some performance for some stability, which is a trade I’m OK with.

Joker_vD 6 days ago | prev | next |

> This leads to writing tests for both ways of accessing your data, not just one of them. And you shouldn't skimp on this

Or you can just keep returning namedtuple instead of something else, because then you absolutely can skimp on testing whether what you return does, in fact, satisfy the namedtuple's interface.

Spivak 6 days ago | prev | next |

I think for the same reason you should avoid TypedDicts for new APIs as well. Dataclasses are the natural replacement for both.

mont_tag 6 days ago | root | parent |

Not really. A lot of tooling, JSON for example, naturally works with dictionaries. A TypedDict naturally connects with all those tools. In contrast, dataclasses are hostile to the enormous ecosystem of tools that work with dictionaries.

If you store all your data in dataclasses, you end up having to either convert back to dictionaries or rebuild all that tooling. Python's abstract syntax trees are an example. If nodes had been represented with native dictionaries, then pprint would work right out of the box. But with every node being its own class, a custom pretty printer is needed.

Dataclasses are cool but people should have a strong preference for Python's native types: list, tuple, dict, and set. Those work with just about everything. In contrast, a new dataclass is opaque and doesn't work with any existing tooling.

designed 6 days ago | root | parent |

An advantage of dataclasses over dicts is that you can add methods and properties.

Also you can easily convert a dataclass to a dict with dataclasses.asdict. Not so easy to go from dict to dataclass though
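A quick sketch of both directions. `Point` and `Line` are hypothetical classes; the flat case round-trips with `**`, while the nested case shows where it stops being easy:

```python
from dataclasses import dataclass, asdict

@dataclass
class Point:
    x: int
    y: int

@dataclass
class Line:
    start: Point
    end: Point

d = asdict(Point(1, 2))      # {'x': 1, 'y': 2}
p = Point(**d)               # fine for flat data with matching keys

nested = asdict(Line(Point(0, 0), Point(1, 2)))  # inner points become dicts too
rebuilt = Line(**nested)     # ...and they stay dicts here
print(type(rebuilt.start))   # <class 'dict'>
```

Rebuilding nested structures needs either hand-written conversion or a third-party library.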

ReflectedImage 6 days ago | root | parent |

That's what a class is for.

Spivak 5 days ago | root | parent |

Right but that's @dataclass. Being a replacement for classes in commonly used situations is one of its design goals.

ReflectedImage 6 days ago | prev | next |

namedtuple is preferable as it's the more Pythonic solution. Simpler is better.

eesmith 6 days ago | root | parent |

namedtuple brings in likely inappropriate complexity, so it is not always simpler.

Consider the object returned by os.stat. It allows getting terms by index:

  >>> import os
  >>> os.stat("/dev/null")
  os.stat_result(st_mode=8630, st_ino=336, st_dev=-2065165533, st_nlink=1,
  st_uid=0, st_gid=0, st_size=0, st_atime=1730960121, st_mtime=1730981460,
  st_ctime=1730981460)
  >>> os.stat("/dev/null")[0]
  8630
  >>> os.stat("/dev/null")[-1]
  1730981720

But that list of 10 values became locked because people do:

  (mode, ino, dev, nlink, uid, gid, size, atime, mtime, ctime) = os.stat(filename)

Which means new fields are not accessible by indexing, only attribute lookup:

  >>> os.stat("/dev/null").st_blksize
  65536

Assuming there wasn't the historical baggage which made os.stat the way it is, why would namedtuple still be the more Pythonic solution, and simpler than a frozen dataclass?

(Historically, os.stat originally returned a tuple, then migrated to os.stat_result and named attributes, both for readability and to allow new fields, but keeping indexing for backwards compatibility support.)
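A small sketch of that split between the indexable fields and the attribute-only ones. It assumes CPython, where the sequence part of `os.stat_result` is the original 10 fields; the set of extra attributes varies by platform:

```python
import os

st = os.stat(os.curdir)

# The tuple interface is frozen at the original fields.
print(len(st))
print(st[0] == st.st_mode)   # index 0 is the mode
print(st[6] == st.st_size)   # index 6 is the size

# Everything added later is reachable by attribute name only.
names = [n for n in dir(st) if n.startswith("st_")]
print(len(names) > len(st))
```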

ReflectedImage 6 days ago | root | parent |

Because namedtuple is a simpler construct than frozen dataclass therefore it is always preferable in Python.

The number of characters and lines to access the members of namedtuple is significantly less than dataclass.

dataclass would have similar issues if st_dev was now defined as a string whereas namedtuple would not.

Whilst there may be edge cases in using simpler constructs, in scripting languages we accept the edge cases as it pays off in 99.99% of cases to simply ignore them. If something goes wrong, you catch the exception: "Ask for forgiveness not for permission", if you want to look up the concept.

ericvsmith 5 days ago | root | parent | next |

I think you're saying that it takes fewer characters to define a namedtuple. If you're interested in less typing, there's also dataclasses.make_dataclass:

   >>> import dataclasses
   >>> Point = dataclasses.make_dataclass("Point", ["x", "y", "z"])
   >>> Point(1, 2, 3)
   Point(x=1, y=2, z=3)

eesmith 5 days ago | root | parent | prev |

The Pythonic "simpler" does not mean "fewer characters" otherwise APL or Perl/Raku would be more Pythonic.

Namedtuple is not strictly simpler. It implements additional features which would not be in a frozen dataclass, which makes "simpler" a personal bias.

I've been using Python continuously since 1998 and well remember the "look before you leap" vs. "ask for forgiveness not for permission" debate from the bygone comp.lang.python forum. And I learned the concept of "easier to ask for forgiveness" from the 60 Minutes interview with Grace Hopper back in the 1980s.

That has nothing to do with this issue, which is one of taking on an API burden without due consideration simply because it's less typing.

personjerry 6 days ago | prev | next |

I feel like getting mouse coordinates is a perfect time to return a named tuple, though?

Lvl999Noob 6 days ago | root | parent |

Yes. That was a positive case for NamedTuple. The negative case was what if the function needs to grow further and return more stuff, and then it's no longer clear what the return values are? For example, what if `get_mouse_coordinates()` becomes `get_peripheral_coordinates()` which, for some reason, needs to return the coordinates of all the peripherals as one flat namedtuple `NamedTuple(mouse_x: int, mouse_y: int, pointer_x: int, pointer_y: int, ...)`. I know it's a contrived example but it can happen for other kinds of functions.

doctorpangloss 6 days ago | prev | next |

Data classes can gracefully replace tuples everywhere. Set frozen, then use a mixin or just author a getitem and iter magic, and you’re done.

heavyset_go 6 days ago | root | parent |

They can't if you're using tuple unpacking.

doctorpangloss 6 days ago | root | parent |

    def __iter__(self):
      yield self.my_field
      yield self.my_other_field

Recreates tuple unpacking.

Joker_vD 6 days ago | root | parent | next |

Just mix this class in:

    class IWantToHaveNamedTupleInterfaceButWasToldTheyAreBad:
        def __iter__(self):
            return iter(self.__dict__.values())

and voila:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Point(IWantToHaveNamedTupleInterfaceButWasToldTheyAreBad):
        x: int
        y: int

    p = Point(1, 2)
    x, y = p

heavyset_go 5 days ago | root | parent | prev |

Yes, but that is insane.

awinter-py 6 days ago | prev | next |

these can be more memory-efficient than classes or dictionaries.

there was a point a while back where python added __slots__ to classes to help with this; and in practice these days the largest systems are using numpy if they're in python at all

not sure what modern versions do. but in the olden days, if you were creating lots of small objects, tuples were a low-overhead way to do it
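A rough sketch of where the savings come from, assuming current CPython. The three point types are hypothetical; the check is only whether each instance carries a per-instance `__dict__`, since exact byte counts vary by version:

```python
from collections import namedtuple

class PlainPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class SlottedPoint:
    __slots__ = ("x", "y")   # fixed attribute storage, no per-instance __dict__

    def __init__(self, x, y):
        self.x = x
        self.y = y

TuplePoint = namedtuple("TuplePoint", ["x", "y"])

print(hasattr(PlainPoint(1, 2), "__dict__"))     # True
print(hasattr(SlottedPoint(1, 2), "__dict__"))   # False
print(hasattr(TuplePoint(1, 2), "__dict__"))     # False
```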

pipeline_peak 5 days ago | prev | next |

You’d think such an easy-to-use high-level language would have:

Point(x,y,z)

solarkraft 6 days ago | prev |

> But there are three more ways to do the same data structure

Thanks, I hate it. There’s a lot I like about Python, but this is a major pain point.

NamedTuple, TypedDict, Dataclass, Record ... Remember the Zen of Python? „There should be one-- and preferably only one --obvious way to do it“ - it feels like Python has gone way overboard with ways to structure data.

In Javascript everything is an object, you can structurally type them with Typescript and I don’t feel like I’m missing much.