Parse a PDF using Python

粉色の甜心 2020-12-29 16:06

I have a PDF file. It has four columns, and none of the pages have grid lines. It contains students' marks.

I would like to run some analysis on this di

1 answer
  •  生来不讨喜
    2020-12-29 16:44

    Use PyPDF2:

    from PyPDF2 import PdfFileReader
    
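    # Open in binary mode; the file must stay open while pages are read.
    with open('CT1-All.pdf', 'rb') as f:
        reader = PdfFileReader(f)
        # Extract the text of the first page and split it into lines.
        # (PyPDF2 3.x renamed these to PdfReader, reader.pages[0] and extract_text().)
        contents = reader.getPage(0).extractText().split('\n')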
    

    When you print contents, it will look like this (I have trimmed it here):

    [u'Serial NoRoll NoNameCT1 Marks (50)111MA20026KARADI KALYANI212AR10029MUKESH K
    MAR5', u'312MI31004DEEPAK KUMAR7', u'413AE10008FADKE PRASAD DIPAK27', u'513AE10
    22RAHUL DUHAN37', u'613AE30005HIMANSHU PRABHAT26.5', u'713AE30019VISHAL KUMAR39
    , u'813AG10014HEMANT17', u'913AG10028SHRESTH KR KRISHNA37.51013AG30009HITESH ME
    RA33.5', u'1113AG30023RACHIT MADHUKAR40.5', u'1213AR10002ACHARY SUDHEER11', u'1
    13AR10004AMAN ASHISH20.5', u'1413AR10008ANKUR44', u'1513AR10010CHUKKA SHALEM RA
    U11.5', u'1613AR10012DIKKALA VIJAYA RAGHAVA20.5', u'1713AR10014HRISHABH AMRODIA
    1', u'1813AR10016JAPNEET SINGH CHAHAL19.5', u'1913AR10018K VIGNESH42.5', u'2013
    R10020KAARTIKEY DWIVEDI49.5', u'2113AR10024LAKSHMISRI KEERTI MANNEY49', u'2213A
    10026MAJJI DINESH9.5', u'2313AR10028MOUNIKA BHUKYA17.5', u'2413AR10030PARAS PRA
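
The extracted text runs the four columns together, so it still has to be split into records before any analysis. Below is a minimal sketch of one way to do that. It assumes that every roll number follows the pattern visible in the output above (two digits, two uppercase letters, five digits) and that serial numbers increase by one per row. The names `parse_rows` and `first_serial` are hypothetical helpers, not part of PyPDF2.

```python
import re

# Assumed roll-number shape, e.g. "12MI31004": 2 digits, 2 letters, 5 digits.
ROLL_RE = re.compile(r"\d{2}[A-Z]{2}\d{5}")
# Name followed by an optional trailing mark such as "7" or "40.5".
TAIL_RE = re.compile(r"^(?P<name>.*?)(?P<marks>\d+(?:\.\d+)?)?$")


def parse_rows(contents, first_serial=1):
    """Split fused 'serial + roll + name + marks' text into tuples.

    Assumes serial numbers are consecutive, starting at first_serial.
    Returns a list of (serial, roll, name, marks); marks is None if absent.
    """
    text = "".join(contents)
    rolls = list(ROLL_RE.finditer(text))
    rows = []
    for i, match in enumerate(rolls):
        serial = first_serial + i
        has_next = i + 1 < len(rolls)
        end = rolls[i + 1].start() if has_next else len(text)
        # Everything between this roll number and the next one:
        # name, marks, then the next row's serial number.
        tail = text[match.end():end]
        next_serial = str(serial + 1)
        if has_next and tail.endswith(next_serial):
            tail = tail[:-len(next_serial)]
        parts = TAIL_RE.match(tail)
        marks = parts.group("marks")
        rows.append((
            serial,
            match.group(),
            parts.group("name").strip(),
            float(marks) if marks else None,
        ))
    return rows


# Two strings taken from the output above (rows 3 and 4).
sample = ['312MI31004DEEPAK KUMAR7', '413AE10008FADKE PRASAD DIPAK27']
for row in parse_rows(sample, first_serial=3):
    print(row)
```

Anchoring on the roll number and the expected serial avoids guessing where a mark such as 37.5 ends and the next serial begins. The resulting tuples can be written to CSV or loaded into a table for analysis.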
    
