I'm building an analyzer for a series of strings. I need to check how much each line is indented (either by tabs or by spaces).
Each line is just a string in a text.
len() counts a tab (\t) as a single character, which in some cases is not what I expect. So my approach is to use re.sub and then count the spaces:
indent_count = re.sub(r'^([\s]*)[\s]+.*$', r'\g<1>', line).count(' ')
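def count_indentation(line):
    # Counts leading tab characters only; spaces are not counted.
    count = 0
    try:
        while line[count] == "\t":
            count += 1
        return count
    except IndexError:
        # Reached the end of a line made up entirely of tabs.
        return count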
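A regex-based variant of the same idea, as a minimal sketch: match only the leading run of spaces and tabs, then count each kind separately. The function name leading_whitespace_counts is made up for illustration.

```python
import re

def leading_whitespace_counts(line):
    """Return (tabs, spaces) found in the leading whitespace of line."""
    # [ \t]* always matches at the start, possibly as an empty string.
    prefix = re.match(r"[ \t]*", line).group(0)
    return prefix.count("\t"), prefix.count(" ")

print(leading_whitespace_counts("\t\t  indented"))  # (2, 2)
```

Keeping the two counts apart avoids deciding up front how many columns a tab is worth.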
To count the number of spaces at the beginning of a string you could do a comparison between the left stripped (whitespace removed) string and the original:
a = "    indented string"
leading_spaces = len(a) - len(a.lstrip())
print(leading_spaces)
# >>> 4
Tab indent is context specific... it changes based on the settings of whatever program is displaying the tab characters. This approach will only tell you the total number of whitespace characters (each tab will be considered one character).
Or to demonstrate:
a = "\t\tindented string"
leading_spaces = len(a) - len(a.lstrip())
print(leading_spaces)
# >>> 2
EDIT:
If you want to do this to a whole file you might want to try:
with open("myfile.txt") as afile:
    line_lengths = [len(line) - len(line.lstrip()) for line in afile]
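One caveat with the file version: lstrip() also removes the trailing newline, so a blank line ("\n") is reported as having one character of indentation. A hedged sketch that drops the line ending first. The name indent_widths is hypothetical, and io.StringIO stands in for a real file here.

```python
import io

def indent_widths(lines):
    """Count leading whitespace characters per line, ignoring line endings."""
    widths = []
    for line in lines:
        body = line.rstrip("\r\n")  # drop only the line ending
        widths.append(len(body) - len(body.lstrip()))
    return widths

sample = io.StringIO("top\n    nested\n\n\tdeeper\n")
print(indent_widths(sample))  # [0, 4, 0, 1]
```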
I think Gizmo's basic idea is good, and it's relatively easy to extend it to handle any mixture of leading tabs and spaces by using a string object's expandtabs() method:
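def indentation(s, tabsize=4):
    # Expand tabs to spaces first so tabs and spaces are measured in columns.
    sx = s.expandtabs(tabsize)
    # A line of only whitespace counts as no indentation.
    return 0 if sx.isspace() else len(sx) - len(sx.lstrip())

print(indentation(" tindented string"))
print(indentation("\t\tindented string"))
print(indentation(" \t \tindented string"))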
The last two print statements will output the same value.
Edit: I modified it to check and return 0 if a line of all tabs and spaces is encountered.
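Since the result depends on the tab width, here is a small sketch of how the same line measures under different tabsize values. indent_width is just a renamed copy of the function above so that the block stands alone, and the sample line is made up.

```python
def indent_width(s, tabsize=4):
    """Visual indentation of s with tabs expanded to tabsize columns."""
    sx = s.expandtabs(tabsize)
    return 0 if sx.isspace() else len(sx) - len(sx.lstrip())

line = " \tindented string"  # one space followed by one tab
for size in (2, 4, 8):
    # The tab advances to the next multiple of size, absorbing the space.
    print(size, indent_width(line, size))
```

With these inputs the measured width equals the tab size each time, so pick the tabsize that matches the editor or style guide you are analyzing for.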