Removing a string prefix in Python

A little Python idiom I like to use, but have never seen in other people’s code, is removing a prefix like this:

s = 'some string'
if s.startswith('some '):
    s = s[len('some '):]

I like the s[len('some '):] approach because I find it both more robust against errors (as opposed to typing the actual number, as in s[5:]) and self-documenting. For example, consider:

from glob import glob
files = glob('datadir/experiment/*.txt')

ids = [f[len('datadir/'):] for f in files]

It is pretty clear that what I want to do is remove the datadir/ prefix.
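If the pattern shows up often, it can be wrapped in a small helper. The following is only a sketch: the name strip_prefix and the sample file list are made up for illustration, and the startswith check is there so that strings without the prefix pass through unchanged.

```python
def strip_prefix(s, prefix):
    """Return s without prefix if s starts with it, otherwise s unchanged."""
    if s.startswith(prefix):
        return s[len(prefix):]
    return s


# Hypothetical file list standing in for the glob() result
files = ['datadir/experiment/a.txt', 'datadir/experiment/b.txt']
ids = [strip_prefix(f, 'datadir/') for f in files]
print(ids)  # ['experiment/a.txt', 'experiment/b.txt']
```

On Python 3.9 and later, the built-in str.removeprefix does the same thing, and str.removesuffix handles the suffix case.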

It works for suffixes too:

without_ext = filename[:-len('.txt')]
combined = filename[len('datadir/experiment/'):-len('.txt')]

This is much better than [1]:

combined = filename[18:-4]


(One may be tempted to write filename.replace('.txt','') to get rid of a suffix, but this is wrong! It does not work with 'datadir/experiments/datafiles.txt/filename.txt', which is perfectly legal.)
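To make the difference concrete, here is a small comparison on that same path. The endswith guard is my addition, so that a path without the extension is left alone.

```python
path = 'datadir/experiments/datafiles.txt/filename.txt'

# replace() removes every occurrence, including the one in the directory name
replaced = path.replace('.txt', '')

# Slicing removes only the trailing extension
sliced = path[:-len('.txt')] if path.endswith('.txt') else path

print(replaced)  # datadir/experiments/datafiles/filename
print(sliced)    # datadir/experiments/datafiles.txt/filename
```

One caveat with the slicing form: it only works for a non-empty suffix, since s[:-0] is the empty string.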


It is slightly inefficient because the Python interpreter evaluates len('some ') at run time every time the line executes, rather than substituting the constant 5. [2]

However, this is generally in code where it does not matter that much. If it did, I’d be doing it in C(++) or using some other method.
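If the repeated len call ever did show up in a profile, one option that stays in Python is to compute the length once and reuse it. This is a sketch only; the names PREFIX, PREFIX_LEN and strip_all are invented for illustration.

```python
PREFIX = 'datadir/'
PREFIX_LEN = len(PREFIX)  # computed once, at import time


def strip_all(paths):
    """Remove PREFIX from every path that starts with it."""
    return [p[PREFIX_LEN:] if p.startswith(PREFIX) else p for p in paths]


print(strip_all(['datadir/x.txt', 'other/y.txt']))  # ['x.txt', 'other/y.txt']
```

The named constant keeps most of the self-documenting quality, since the slice still refers to the prefix by name rather than by a bare number.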

[1] It should have been filename[19:-4], but it’s hard to see immediately. In any case, writing a number always makes me think, and code should not make you think too much.
[2] The interpreter is not allowed to replace the call with its result statically, because you may have redefined the function len. It could check for the common case, I suppose.
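A quick way to check the point in footnote [2] on CPython is to compile the expression and look at what the code object holds. This is an implementation detail rather than a language guarantee, so treat it as a sketch: the string literal is stored as a constant and len is looked up by name when the code runs.

```python
# Compile the expression without running it
code = compile("len('some ')", "<demo>", "eval")

print(code.co_names)   # names looked up at run time; includes 'len'
print(code.co_consts)  # stored constants; includes the string, not its length

# The length is only produced when the code is evaluated
print(eval(code))
```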

4 thoughts on “Removing a string prefix in Python”

  1. I use "some string".split()[1] to get "string", and os.path.basename('datadir/experiment/*.txt') to get to the filenames. There is also os.path.splitext to get a tuple of (root, ext) from the filename.

    1. For the specific use cases above, yes (although getting rid of just the first element of a path can be a couple of lines). There are more complex use cases:

      filename = 'my_prefix/my_prefix2/datadir/subdir/file.txt'
      data_id = filename[len('my_prefix/my_prefix2/'):]

      This is pretty common for me where the data is a collection of directories & subdirectories where that path identifies the datum in question. However, it can be moved on the filesystem so there is often a prefix which may change.

      You can do this with split() and join(), but I prefer my two-liners as more explicit.

  2. Thanks Luis, it works perfectly on my code:
    if data[r][c].startswith('='):
        data[r][c] = data[r][c][len('='):]
