Removing a string prefix in Python

A little Python thing I like to do, but have never seen in other people’s code, is to remove a prefix like this:

s = 'some string'
if s.startswith('some '):
    s = s[len('some '):]

I like the s[len('some '):] approach because I find it both less error-prone than typing the actual number (as in s[5:]) and self-documenting. For example, consider:

from glob import glob
files = glob('datadir/experiment/*.txt')

ids = [f[len('datadir/'):] for f in files]

It is pretty clear that what I want to do is remove the datadir/ prefix.
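If the pattern shows up often, it can be wrapped in a small helper. This is only a sketch: the name strip_prefix is made up for this example, the file list is a stand-in for the glob() result, and the helper leaves the string untouched when the prefix is absent. (On Python 3.9 and later, the built-in str.removeprefix does the same job.)

```python
def strip_prefix(s, prefix):
    """Return s without prefix; return s unchanged if it does not start with prefix."""
    if s.startswith(prefix):
        return s[len(prefix):]
    return s


# Hypothetical file list standing in for the glob() result above.
files = ['datadir/experiment/a.txt', 'datadir/experiment/b.txt']
ids = [strip_prefix(f, 'datadir/') for f in files]
print(ids)  # ['experiment/a.txt', 'experiment/b.txt']
```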

It works for suffixes too:

without_ext = filename[:-len('.txt')]
combined = filename[len('datadir/experiment/'):-len('.txt')]

This is much better than [1]:

combined = filename[18:-4]
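To make the difference concrete, here is a small check of the two versions side by side. The filename is a made-up example; the prefix and suffix are the ones used above.

```python
filename = 'datadir/experiment/sample01.txt'  # hypothetical path for illustration

prefix = 'datadir/experiment/'
suffix = '.txt'

# Let Python count the characters.
by_len = filename[len(prefix):-len(suffix)]

# Count by hand (and get it wrong by one).
by_number = filename[18:-4]

print(len(prefix))  # 19
print(by_len)       # sample01
print(by_number)    # /sample01
```

One caveat with the suffix form: if the suffix is the empty string, -len(suffix) is 0 and the slice comes back empty, so it is only safe with a non-empty suffix.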

§

(One may be tempted to write filename.replace('.txt','') to get rid of a suffix, but this is wrong! It does not work with 'datadir/experiments/datafiles.txt/filename.txt', which is perfectly legal.)
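A quick sketch of the failure, using the path from the paragraph above. The endswith check is an addition of mine so that the slice is only applied when the suffix is really there.

```python
path = 'datadir/experiments/datafiles.txt/filename.txt'

# replace() removes every occurrence, including the one in the directory name.
with_replace = path.replace('.txt', '')
print(with_replace)  # datadir/experiments/datafiles/filename

# Slicing by the suffix length only touches the end of the string.
with_slice = path[:-len('.txt')] if path.endswith('.txt') else path
print(with_slice)    # datadir/experiments/datafiles.txt/filename
```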

§

This approach is slightly inefficient because the Python interpreter will actually call len on the string literal every time the line runs, instead of using a precomputed number. [2]

However, this generally shows up in code where it does not matter that much. If it did, I’d be doing it in C(++) or using some other method.
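The reason the length cannot simply be precomputed can be seen with a short experiment. This is a sketch that assumes CPython (it inspects the compiled function's code object); the function name strip_some and the namespace dictionary are made up for the illustration.

```python
def strip_some(s):
    return s[len('some '):]


# The literal is stored as a constant, but len is looked up by name at call time.
literal_is_constant = 'some ' in strip_some.__code__.co_consts
len_looked_up_by_name = 'len' in strip_some.__code__.co_names
print(literal_is_constant)    # True
print(len_looked_up_by_name)  # True

# Evaluating the same expression where len has been redefined changes the result.
namespace = {'len': lambda _: 0, 's': 'some string'}
with_fake_len = eval("s[len('some '):]", namespace)
print(with_fake_len)              # some string
print(strip_some('some string'))  # string
```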

[1] It should have been filename[19:-4], but it’s hard to see immediately. In any case, writing a number always makes me think, and code should not make you think too much.
[2] The interpreter is not allowed to simply replace the call with its result statically, because you may have redefined the function len. It could check for the common case, I suppose.

4 thoughts on “Removing a string prefix in Python”

  1. I use 'some string'.split()[1] to get 'string', and os.path.basename('datadir/experiment/*.txt') to get to the filenames. There is also os.path.splitext to get a tuple of (root, ext) from the filename.

    • For the specific use cases above, yes (although getting rid of just the first element of a path can be a couple of lines). There are more complex use cases:

      filename = 'my_prefix/my_prefix2/datadir/subdir/file.txt'
      data_id = filename[len('my_prefix/my_prefix2/'):]

      This is pretty common for me: the data is a collection of directories and subdirectories, and the path identifies the datum in question. However, the collection can be moved on the filesystem, so there is often a prefix which may change.

      You can do this with split() and join(), but I would prefer my two-liners as more explicit.
