Extracting Hyperlinks From Excel (.xlsx) with Python

闹比i 2020-12-16 16:29

I have mostly been looking at the xlrd and openpyxl libraries for Excel file manipulation. However, xlrd currently does not support formatting_info=True for .xl

7 Answers
  • 2020-12-16 16:46

    A solution that has worked for me is to install unoconv on the server and call that command-line tool through the subprocess module to convert the file from .xlsx to .xls, since hyperlink_map.get() works with .xls files.
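The approach above might be sketched roughly as follows. This is only an illustration: the helper names (build_unoconv_command, convert_to_xls, read_hyperlinks) are made up here, and it assumes that unoconv is on the PATH, that it writes the converted file next to the input with the new extension, and that an xlrd version with .xls formatting support is installed.

```python
import subprocess
from pathlib import Path


def build_unoconv_command(src, fmt="xls"):
    """Return the argument list for converting `src` with unoconv."""
    return ["unoconv", "-f", fmt, str(src)]


def convert_to_xls(src):
    """Convert an .xlsx file to .xls and return the expected output path.

    Assumption: unoconv is installed and writes the result beside the
    input file, with the extension swapped to .xls.
    """
    subprocess.run(build_unoconv_command(src), check=True)
    return Path(src).with_suffix(".xls")


def read_hyperlinks(xls_path, sheet_index=0):
    """Map (row, col) to URL for one sheet of an .xls file (0-based indices)."""
    import xlrd  # imported lazily so the conversion helpers work without xlrd

    # hyperlink_map is only populated when formatting_info=True
    book = xlrd.open_workbook(str(xls_path), formatting_info=True)
    sheet = book.sheet_by_index(sheet_index)
    return {pos: link.url_or_path for pos, link in sheet.hyperlink_map.items()}
```

A call such as read_hyperlinks(convert_to_xls('yourfile.xlsx')) would then return a dictionary keyed by (row, column) pairs, provided the conversion succeeds.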

  • 2020-12-16 16:50

    This is possible with openpyxl:

    import openpyxl
    
    wb = openpyxl.load_workbook('yourfile.xlsm')
    ws = wb['Sheet1']
    # This will fail if there is no hyperlink to target
    print(ws.cell(row=2, column=1).hyperlink.target)
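To avoid the failure mentioned in the comment, and to collect every link on a sheet rather than a single cell, something like the following could work. It is a sketch only: collect_hyperlinks and load_and_collect are hypothetical helper names, and the worksheet is assumed to come from openpyxl opened in its normal (not read-only) mode.

```python
def collect_hyperlinks(ws):
    """Return {cell coordinate: hyperlink target} for cells that have a hyperlink."""
    links = {}
    for row in ws.iter_rows():
        for cell in row:
            # cell.hyperlink is None when the cell carries no hyperlink
            if cell.hyperlink is not None:
                # target may itself be None for links that point inside the workbook
                links[cell.coordinate] = cell.hyperlink.target
    return links


def load_and_collect(path, sheet_name):
    """Open a workbook with openpyxl and collect the links of one sheet."""
    import openpyxl  # imported here so collect_hyperlinks has no hard dependency

    wb = openpyxl.load_workbook(path)
    return collect_hyperlinks(wb[sheet_name])
```

For example, load_and_collect('yourfile.xlsm', 'Sheet1') would return a dictionary keyed by cell coordinates such as 'A2'.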
    
  • 2020-12-16 16:57

    In my experience, getting good .xlsx interaction requires moving to IronPython. This lets you work with the Common Language Runtime (clr) and interact directly with Excel.

    http://ironpython.net/

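    import clr
    # Load the Excel interop assembly (requires Excel installed on the machine)
    clr.AddReference("Microsoft.Office.Interop.Excel")
    import Microsoft.Office.Interop.Excel as Excel
    excel = Excel.ApplicationClass()
    
    wb = excel.Workbooks.Open('testFile.xlsx')
    ws = wb.Worksheets['Sheet1']
    
    # row and col were not defined in the original snippet; 1-based example values
    row, col = 2, 1
    # Address of the first hyperlink attached to that cell
    address = ws.Cells(row, col).Hyperlinks.Item(1).Address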
    
  • 2020-12-16 16:59

    For direct manipulation of Excel files, it's also worth looking at the excellent xlwings library.

  • 2020-12-16 17:00

    Instead of just .hyperlink, using .hyperlink.target should work. Before that, I was also getting None from using just .hyperlink on the cell object.

  • 2020-12-16 17:04

    FYI, the problem with openpyxl is an actual bug.

    And, yes, xlrd cannot read the hyperlink without formatting_info, which is currently not supported for xlsx.
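If neither library works for a given file, one possible fallback (not something proposed in this thread) is to read the hyperlinks straight from the .xlsx package with the standard library, since an .xlsx file is a zip archive of XML parts. The sketch below assumes the sheet of interest is stored as xl/worksheets/sheet1.xml, which is common for the first sheet but not guaranteed, and it only resolves external links; sheet_hyperlinks is a name made up for this example.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
DOC_REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def sheet_hyperlinks(xlsx, sheet_part="xl/worksheets/sheet1.xml"):
    """Return {cell reference: target} for external hyperlinks in one sheet part.

    `xlsx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(xlsx) as zf:
        # The relationships for xl/worksheets/X.xml live in xl/worksheets/_rels/X.xml.rels
        folder, name = posixpath.split(sheet_part)
        rels_part = posixpath.join(folder, "_rels", name + ".rels")
        if rels_part not in zf.namelist():
            return {}  # no relationships part, so no external hyperlinks
        rels = ET.fromstring(zf.read(rels_part))
        targets = {
            rel.get("Id"): rel.get("Target")
            for rel in rels.iter("{%s}Relationship" % PKG_REL_NS)
        }
        sheet = ET.fromstring(zf.read(sheet_part))
        links = {}
        for node in sheet.iter("{%s}hyperlink" % MAIN_NS):
            rel_id = node.get("{%s}id" % DOC_REL_NS)
            if rel_id in targets:
                # ref is a cell such as "A2", or a range such as "A2:B3"
                links[node.get("ref")] = targets[rel_id]
        return links
```

Calling sheet_hyperlinks('yourfile.xlsx') would return the link targets keyed by cell reference. Mapping sheet names to part names would additionally require reading xl/workbook.xml and its relationships, which this sketch leaves out.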
