Counting the words a character said in a movie script

蓝咒 提交于 2019-12-11 09:09:01

问题


I already managed to uncover the spoken words with some help. Now I'm looking for to get the text spoken by a chosen person. So I can type in MIA and get every single words she is saying in the movie Like this:

name = input("Enter name:")
wordsspoken(script, name)
name1 = input("Enter another name:")
wordsspoken(script, name1)

So I'm able to count the words afterwards.

This is how the movie script looks like

An awkward beat. They pass a wooden SALOON -- where a WESTERN
 is being shot. Extras in COWBOY costumes drink coffee on the
 steps.
                     Revision                        25.


                   MIA (CONT'D)
      I love this stuff. Makes coming to work
      easier.

                   SEBASTIAN
      I know what you mean. I get breakfast
      five miles out of the way just to sit
      outside a jazz club.

                   MIA
      Oh yeah?

                   SEBASTIAN
      It was called Van Beek. The swing bands
      played there. Count Basie. Chick Webb.
             (then,)
      It's a samba-tapas place now.

                   MIA
      A what?

                   SEBASTIAN
      Samba-tapas. It's... Exactly. The joke's on
      history.

回答1:


If you want to compute your tally with only one pass over the script (which I imagine could be pretty long), you could just track which character is speaking; set things up like a little state machine:

import re
from collections import Counter, defaultdict

words_spoken = defaultdict(Counter)
currently_speaking = 'Narrator'

for line in SCRIPT.split('\n'):
    name = line.replace('(CONT\'D)', '').strip()
    if re.match('^[A-Z]+$', name):
        currently_speaking = name
    else:
        words_spoken[currently_speaking].update(line.split())

You could use a more sophisticated regex to detect when the speaker changes, but this should do the trick.

demo




回答2:


I would ask the user for all the names in the script first. Then ask which name they want the words for. I would search the text word by word till I found the name wanted and copy the following words into a variable until I hit a name that matches someone else in the script. Now people could say the name of another character, but if you assume titles for people speaking are either all caps, or on a single line, the text should be rather easy to filter.

for word in script:
    if word == speaker and word.isupper(): # you may want to check that this is on its own line as well.
        recording = True
    elif word in character_names and word.isupper():  # you may want to check that this is on its own line as well.
        recording = False

    if recording:
        spoken_text += word + " "



回答3:


I will outline how you could generate a dict which can give you the number of words spoken for all speakers and one which approximates your existing implementation.

General Use

If we define a word to be any chunk of characters in a string split along ' ' (space)...

import re

speaker = '' # current speaker
words = 0 # number of words on line
word_count = {} # dict of speakers and the number of words they speak

for line in script.split('\n'):
    if re.match('^[ ]{19}[^ ]{1,}.*', line): # name of speaker
            speaker = line.split(' (')[0][19:]
    if re.match('^[ ]{6}[^ ]{1,}.*', line): # dialogue line
            words = len(line.split())
            if speaker in word_count:
                 word_count[speaker] += words
            else:
                 word_count[speaker] = words

Generates a dict with the format {'JOHN DOE':55} if John Doe says 55 words.

Example output:

>>> word_count['MIA']

13

Your Implementation

Here is a version of the above procedure that approximates your implementation.

import re

def wordsspoken(script,name):
    word_count = 0
    for line in script.split('\n'):
        if re.match('^[ ]{19}[^ ]{1,}.*', line): # name of speaker
            speaker = line.split(' (')[0][19:]
        if re.match('^[ ]{6}[^ ]{1,}.*', line): # dialogue line
            if speaker == name:
                word_count += len(line.split())
    print(word_count)

def main():
    name = input("Enter name:")
    wordsspoken(script, name)
    name1 = input("Enter another name:")
    wordsspoken(script, name1)



回答4:


There are some good ideas above. The following should work just fine in Python 2.x and 3.x:

import codecs
from collections import defaultdict

speaker_words = defaultdict(str)

with codecs.open('script.txt', 'r', 'utf8') as f:
  speaker = ''
  for line in f.read().split('\n'):
    # skip empty lines
    if not line.split():
      continue

    # speakers have their names in all uppercase
    first_word = line.split()[0]
    if (len(first_word) > 1) and all([char.isupper() for char in first_word]):
      # remove the (CONT'D) from a speaker string
      speaker = line.split('(')[0].strip()

    # check if this is a dialogue line
    elif len(line) - len(line.lstrip()) == 6:
      speaker_words[speaker] += line.strip() + ' '

# get a Python-version-agnostic input
try:
  prompt = raw_input
except:
  prompt = input

speaker = prompt('Enter name: ').strip().upper()
print(speaker_words[speaker])

Example Output:

Enter name: sebastian
I know what you mean. I get breakfast five miles out of the way just to sit outside a jazz club. It was called Van Beek. The swing bands played there. Count Basie. Chick Webb. It's a samba-tapas place now. Samba-tapas. It's... Exactly. The joke's on history.


来源:https://stackoverflow.com/questions/49881391/counting-the-words-a-character-said-in-a-movie-script

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!