I am parsing my vcard info (copied to a txt file) to extract name:number pairs
and put them into a dictionary.
Data sample:
BEGIN:VCARD
VERSION:2.1
N:MEO;Apoio;;;
FN:Apoio MEO
TEL;CELL;PREF:1696
TEL;CELL:162 00
END:VCARD
BEGIN:VCARD
VERSION:2.1
N:estrangeiro;Apoio MEO;no;;
FN:Apoio MEO no estrangeiro
TEL;CELL;PREF:+35196169000
END:VCARD
import re
file = open('Contacts.txt', 'r')
contacts = dict()
for line in file:
name = re.findall('FN:(.*)', line)
nm = ''.join(name)
if len(nm) == 0:
continue
contacts[nm] = contacts.get(nm)
print(contacts)
With this I am getting a dictionary with names, but for numbers I am getting None: {'name': None, 'name': None}.
Can I do this with re? To extract both name and number with the same re.findall
expression?
You would be better off using an existing library instead of trying to reinvent the wheel:
pip install vobject
And then, within Python:
>>> import vobject
>>> s = """\
... BEGIN:VCARD
... VERSION:2.1
... N:MEO;Apoio;;;
... FN:Apoio MEO
... TEL;CELL;PREF:0123456789
... TEL;CELL:0123456768
... END:VCARD
... BEGIN:VCARD
... VERSION:2.1
... N:estrangeiro;Apoio MEO;no;;
... FN:Apoio MEO no estrangeiro
... TEL;CELL;PREF:+0123456789
... END:VCARD """
>>> vcard = vobject.readOne(s)
>>> vcard.prettyPrint()
VCARD
VERSION: 2.1
TEL: 0123456789
TEL: 0123456768
FN: Apoio MEO
N: Apoio MEO
And you're done!
So if you want to make a dictionary out of that, all you need to do is:
>>> {vcard.contents['fn'][0].value: [tel.value for tel in vcard.contents['tel']]}
{'Apoio MEO': ['0123456789', '0123456768']}
So you could turn all that into a function:
import vobject

def parse_vcard(path):
    with open(path, 'r') as f:
        vcard = vobject.readOne(f.read())
    return {vcard.contents['fn'][0].value: [tel.value for tel in vcard.contents['tel']]}
From there, you can improve the code to handle multiple vcards in a single vobject file, and update the dict with more phones.
N.B.: I leave it to you as an exercise to change the code above from reading one and only one vcard in a file into code that can read several vcards. Hint: read the documentation of vobject.
N.B.: I'm using your data and assuming that what you posted is dummy data. Just in case, I have modified the phone numbers.
Just for fun, let's have a look at your code. First, there's an indentation issue, but I'll assume it comes from a bad copy/paste ☺.
① import re
② file = open('Contacts.txt', 'r')
③ contacts = dict()
④ for line in file:
⑤ name = re.findall('FN:(.*)', line)
⑥ nm = ''.join(name)
⑦ if len(nm) == 0:
⑧ continue
⑨ contacts[nm] = contacts.get(nm)
⑩ print(contacts)
So first, there are two issues at line ②. You're opening a file using open(), but you're not closing it. If you call this code to open a huge number of files, you'll exhaust your system's available file descriptors because none of them is ever closed. As a good habit, you should always use the with construct instead:
with open('...', '...') as f:
    … your code here …
That closes the file descriptor for you, and better shows where you can make use of your opened file.
The second issue is that you're calling your variable file, which shadows the file type (a Python 2 built-in; it no longer exists in Python 3). Fortunately, the file type is very rarely used, but it's a bad habit to have, as you might one day not understand a bug that happens because you've shadowed a type with a variable. Just don't use it, it'll save you trouble one day.
On lines ⑤ and ⑥, you're applying re.findall to each line. You would be better off using re.match(), as you're already iterating line by line and a line will not contain FN: something more than once. That lets you drop the unnecessary ''.join(name). But instead of using a regex for such a simple thing, you'd better use str.split():
if line.startswith('FN:'):
    name = line.split(':', 1)[1].strip()
Lines ⑦ and ⑧ are not only superfluous if you use the if above, but actually wrong: they skip every line that does not contain FN:, which means you will never extract the phone numbers, only the names.
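A minimal sketch of that split-based approach, keeping both the FN and the TEL lines, could look like the following. It assumes one property per line, as in a regular .vcf file; the function name parse_contacts and the choice to group the numbers under the FN value are mine, not something from your code.

```python
def parse_contacts(lines):
    """Collect {full name: [phone numbers]} from an iterable of vCard lines."""
    contacts = {}
    name, phones = None, []
    for raw in lines:
        line = raw.strip()
        # Split "KEY;PARAMS:VALUE" at the first colon only.
        key, _, value = line.partition(':')
        # Drop parameters such as ";CELL;PREF" to get the bare property name.
        prop = key.split(';')[0].upper()
        if prop == 'BEGIN':
            name, phones = None, []  # a new card starts here
        elif prop == 'FN':
            name = value
        elif prop == 'TEL':
            phones.append(value)
        elif prop == 'END' and name is not None:
            # Cards sharing the same name are merged into one entry.
            contacts.setdefault(name, []).extend(phones)
    return contacts
```

Since a file object yields its lines when iterated, you could call it as parse_contacts(f) inside a with open('Contacts.txt') as f: block.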
Finally, line ⑨ makes absolutely no sense. Basically, what you're doing is equivalent to:
if nm in contacts.keys():
    contacts[nm] = contacts[nm]
else:
    contacts[nm] = None
All in all, your code only extracts names and never even looks at the telephone numbers. So when you say:
With this I am getting a dictionary with names but for numbers I am getting None
it makes no sense, as you're not actually trying to extract phone numbers.
Can I do this with re? To extract both name and number with the same re.findall expression?
Yes, you could, with something that looks like the following (an untested regex that may well not work), applied over the whole file or at least to each vcard, with the re.DOTALL flag so that .* can span line breaks:
FN:(?P<name>[^\n]*).*?TEL[^:]*:(?P<phone>[^\n]*)
but why bother, when you've got a lib that does it perfectly for you!
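If you still want a regex-only version, note that a pattern with a single phone group gives you at most one number per name, while a card can hold several TEL lines. A rough sketch that works around this by applying one small pattern per card is shown below. The names CARD_RE, NAME_RE, TEL_RE and contacts_from_text are made up for illustration, and the sketch assumes every card sits between BEGIN:VCARD and END:VCARD with one property per line.

```python
import re

# One match per BEGIN:VCARD ... END:VCARD block (DOTALL lets "." cross newlines).
CARD_RE = re.compile(r'BEGIN:VCARD.*?END:VCARD', re.DOTALL)
# With MULTILINE, "^" and "$" anchor at the start and end of every line.
NAME_RE = re.compile(r'^FN:(.*)$', re.MULTILINE)
TEL_RE = re.compile(r'^TEL[^:\n]*:(.*)$', re.MULTILINE)


def contacts_from_text(text):
    """Return {full name: [phone numbers]} for every card found in text."""
    contacts = {}
    for card in CARD_RE.findall(text):
        name = NAME_RE.search(card)
        if name is None:
            continue  # assumption: cards without an FN line are skipped
        phones = [tel.strip() for tel in TEL_RE.findall(card)]
        # Cards sharing the same name are merged into one entry.
        contacts.setdefault(name.group(1).strip(), []).extend(phones)
    return contacts
```

It takes the whole file content as one string, so you would call it on f.read() rather than on individual lines.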
My answer is based on zmo's answer (you need to install vobject).
To get all vobjects from a vcf-file you can do something like this:
import vobject

with open(infile) as inf:
    indata = inf.read()
vc = vobject.readComponents(indata)
vo = next(vc, None)
while vo is not None:
    vo.prettyPrint()
    vo = next(vc, None)
The documentation of vobject (on GitHub) is a bit lacking, so I looked into their code and figured out that readOne is just calling next on readComponents. So you can use readComponents to iterate over all the vcards in a file.
Source: https://stackoverflow.com/questions/35825919/vcard-parser-with-python