While reading a file’s contents in Python and trying to convert between a byte offset and a line and column number (and back), I noticed something weird. For a given byte count, the position I reached after reading that many bytes in Python was about 50 short of the position Emacs reported for the same offset. I checked and double-checked my math, but the result was always the same: Python was seeing more bytes in the file than Emacs was. I remembered reading a few years back that Windows ends lines with \r\n instead of just \n, so I looked up the “Pythonic” way of handling that difference in the files my code would have to deal with.
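To make the mismatch concrete, here is a rough sketch of the kind of offset-to-position conversion I was doing. The helper name offset_to_line_col and the sample strings are made up for illustration; this is not the code I actually had, just a minimal version of the idea, assuming 0-based offsets and 1-based lines and columns.

```python
def offset_to_line_col(data, offset):
    """Return the 1-based (line, column) of a 0-based offset into data."""
    before = data[:offset]
    # Every "\n" before the offset means we have moved down one line.
    line = before.count("\n") + 1
    # The column counts characters since the last "\n" (or the start of data).
    col = offset - (before.rfind("\n") + 1) + 1
    return line, col


# The same three lines, once with Unix endings and once with Windows endings.
unix_text = "one\ntwo\nthree\n"
dos_text = "one\r\ntwo\r\nthree\r\n"

# Each line ending costs one extra character in the \r\n version.
extra = len(dos_text) - len(unix_text)

# The same offset lands on different positions in the two versions.
unix_pos = offset_to_line_col(unix_text, 9)
dos_pos = offset_to_line_col(dos_text, 9)

print(extra, unix_pos, dos_pos)
```

With only three lines the two versions already disagree by three characters, so a drift of about 50 just means roughly 50 line endings came before the point I was looking at.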

And I stumbled across this helpful thread on Stack Overflow. As it turns out, Python (or at least the open function in Python) has built-in support for this; it’s called universal newlines. Basically, instead of the typical open call used for getting a file object:

fd = open(filename, 'r')

one instead uses the following call:

fd = open(filename, 'rU')

and Python auto-magically handles everything for you by treating \n, \r\n, and a lone \r all as \n, leaving you to worry about your business logic instead of tangling with line endings. I thought it was pretty neat that, rather than saying “Oh, we’ll just let someone write a library for that”, they decided to include it in the core (open is a built-in function, so there is nothing to import).
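Here is a small sketch of the effect, written for Python 3 rather than the Python 2 call above. The reason for the switch: in Python 3, text mode translates newlines by default (newline=None), and the 'U' flag was removed in Python 3.11, so a plain 'r' is the closest equivalent. The temporary file and its contents are invented for the demo.

```python
import os
import tempfile

# Write a small file with Windows-style line endings. Binary mode is used
# so that nothing gets translated on the way out.
raw = b"first\r\nsecond\r\nthird\r\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(raw)
    sample_path = tmp.name

try:
    # Binary mode: the bytes exactly as they are stored on disk.
    with open(sample_path, "rb") as fd:
        as_bytes = fd.read()

    # Text mode with the default newline=None: universal newlines,
    # so every \r\n comes back as a single \n.
    with open(sample_path, "r", encoding="ascii") as fd:
        as_text = fd.read()

    # newline="" still decodes to text but leaves line endings untouched.
    with open(sample_path, "r", encoding="ascii", newline="") as fd:
        untranslated = fd.read()
finally:
    os.remove(sample_path)

print(len(as_bytes), len(as_text))  # 22 19
```

The three-character gap between the two lengths is exactly the three \r characters that got folded away, which is the same kind of drift I was seeing between Python and Emacs. If you need offsets that match the bytes on disk, reading in binary mode or with newline="" is probably the safer choice.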

More information in the Python 2 documentation: http://docs.python.org/2/library/functions.html#open
