Using re.search multiple times inside a for loop to extract different field values in Python

青春惊慌失措 2021-01-29 05:56

I want to retrieve all percentage values, as well as integer/float numbers with units, from an input text whenever they are present. If both are not present together, I want to

1 Answer
  • 2021-01-29 06:39

    Here is a Python demo of what we talked about in the comments:

    (Modified per request.)

    >>> import re
    >>> 
    >>> extracteddata = ['"Water 5.5 ml for injections 0.80 and 100 at 2.2 % ','Injections 100 and 0.80', 'Ropivacaine hydrochloride monohydrate for injection (corresponding to 2 mg Ropivacaine hydrochloride anhydrous) 2.12 mg Active ingredient Ph Eur ', 'Sodium chloride for injection 8.6 mg 28% Tonicity contributor Ph Eur ', 'Sodium hydroxide 2M q.s. pH-regulator Ph Eur, NF Hydrochloric acid 2M q.s. pH-regulator Ph Eur, NF ', 'Water for Injections to 1 ml 34% Solvent Ph Eur, USP The product is filled into polypropylene bags sealed with rubber stoppers and aluminium caps with flip-off seals. The primary container is enclosed in a blister. 1(1)']
    >>> 
    >>> Rx = r"(?i)(?=.*?((?:\d+(?:\.\d*)?|\.\d+)\s*(?:mg|kg|ml|q\.s\.|ui|M|g|µg)))?(?=.*?(\d+(?:\.\d+)?\s*%))?(?=.*?((?:\d+(?:\.\d*)?|\.\d+))(?![\d.])(?!\s*(?:%|mg|kg|ml|q\.s\.|ui|M|g|µg)))?.+"
    >>> 
    >>> for e in extracteddata:
    ...     match = re.search(Rx, e)
    ...     print("--------------------------------------------")
    ...     if match.group(1):
    ...         print("Unit num:  \t\t", match.group(1))
    ...     if match.group(2):
    ...         print("Percentage num:  \t", match.group(2))
    ...     if match.group(3):
    ...         print("Just a num:  \t\t", match.group(3))
    ...
    --------------------------------------------
    Unit num:                5.5 ml
    Percentage num:          2.2 %
    Just a num:              0.80
    --------------------------------------------
    Just a num:              100
    --------------------------------------------
    Unit num:                2 mg
    --------------------------------------------
    Unit num:                8.6 mg
    Percentage num:          28%
    --------------------------------------------
    Unit num:                2M
    --------------------------------------------
    Unit num:                1 ml
    Percentage num:          34%
    Just a num:              1
    

    This is the same regex in an expanded (free-spacing) layout:

     (?i)
     (?=
          .*? 
          (                             # (1 start)
               (?:
                    \d+ 
                    (?: \. \d* )?
                 |  \. \d+ 
               )
               \s* 
               (?: mg | kg | ml | q \. s \. | ui | M | g | µg )
          )                             # (1 end)
     )?
     (?=
          .*? 
          (                             # (2 start)
               \d+ 
               (?: \. \d+ )?
               \s* %
          )                             # (2 end)
     )?
     (?=
          .*? 
          (                             # (3 start)
               (?:
                    \d+ 
                    (?: \. \d* )?
                 |  \. \d+ 
               )
          )                             # (3 end)
          (?! [\d.] )
          (?!
               \s* 
               (?: % | mg | kg | ml | q \. s \. | ui | M | g | µg )
          )
     )?
     .+ 
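
The expanded layout can also be compiled directly in Python with `re.VERBOSE`, which ignores unescaped whitespace and `#` comments inside the pattern. The following is a minimal sketch under a couple of assumptions: the names `EXPANDED` and `rx` are made up for illustration, the comments are reworded, and the leading `(?i)` is replaced by the `re.IGNORECASE` flag.

```python
import re

# Assumption: same pattern as the expanded layout above, with (?i) moved
# into the flags argument. EXPANDED and rx are illustrative names.
EXPANDED = r"""
 (?=
      .*?
      (                             # group 1: number followed by a unit
           (?: \d+ (?: \. \d* )? | \. \d+ )
           \s*
           (?: mg | kg | ml | q \. s \. | ui | M | g | µg )
      )
 )?
 (?=
      .*?
      (                             # group 2: number followed by a percent sign
           \d+ (?: \. \d+ )? \s* %
      )
 )?
 (?=
      .*?
      (                             # group 3: number with no unit or percent sign
           (?: \d+ (?: \. \d* )? | \. \d+ )
      )
      (?! [\d.] )
      (?! \s* (?: % | mg | kg | ml | q \. s \. | ui | M | g | µg ) )
 )?
 .+
"""

rx = re.compile(EXPANDED, re.IGNORECASE | re.VERBOSE)

line = "Sodium chloride for injection 8.6 mg 28% Tonicity contributor Ph Eur "
print(rx.search(line).groups())  # ('8.6 mg', '28%', None)
```

Keeping the pattern in this form makes it easier to extend the unit list in one place, at the cost of having to escape any literal space or `#` the pattern might need later.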
    

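    As shown, it uses three optional lookahead assertions to find the first instance
    of each kind of value: a number with a unit, a percentage, and a standalone number.
    The three captures do not overlap, because group 3 only accepts a number that is
    not followed by % or by a unit.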
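
To see why the order of the values in the line does not matter, here is a deliberately simplified sketch of the same idea. The pattern is hypothetical (whole numbers only, a single unit) and is not the answer's regex; it only illustrates that each optional lookahead scans from the start of the string and captures without consuming anything.

```python
import re

# Simplified, hypothetical pattern: two optional lookaheads, both anchored at
# the position where the match starts, followed by .+ to consume the line.
rx = re.compile(r"(?=.*?(\d+\s*%))?(?=.*?(\d+\s*mg))?.+")

results = []
for text in ["28% then 8 mg", "8 mg then 28%", "no numbers here"]:
    m = rx.search(text)
    # A group is None when its lookahead found nothing and was skipped.
    results.append((m.group(1), m.group(2)))
    print(repr(text), "->", m.group(1), "|", m.group(2))
```

Because each lookahead is optional, the overall match still succeeds on a line with no numbers at all; the corresponding groups are simply `None`.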

    Testing each group for a non-empty value shows whether that kind of item was found in the line.
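
Note that this approach reports only the first instance of each kind per line. If every instance is wanted, one possible alternative (not the method used above) is a single alternation scanned with `re.finditer`. This is a rough sketch: `NUM`, `UNIT`, `TOKEN` and `extract_all` are hypothetical names, and the unit list is copied from the regex above.

```python
import re

# Hypothetical building blocks, reusing the number and unit shapes from above.
NUM = r"(?:\d+(?:\.\d*)?|\.\d+)"
UNIT = r"(?:mg|kg|ml|q\.s\.|ui|M|g|µg)"

# Alternatives are tried left to right at each position, so a number is
# classified as percent or unit before it can fall through to "plain".
TOKEN = re.compile(
    rf"(?P<percent>{NUM}\s*%)|(?P<unit>{NUM}\s*{UNIT})|(?P<plain>{NUM})(?![\d.])",
    re.IGNORECASE,
)

def extract_all(line):
    """Collect every match in the line, bucketed by the group that matched."""
    found = {"unit": [], "percent": [], "plain": []}
    for m in TOKEN.finditer(line):
        found[m.lastgroup].append(m.group())
    return found

print(extract_all("Water 5.5 ml for injections 0.80 and 100 at 2.2 % "))
```

As with the original pattern, the case-insensitive `M` and `g` alternatives have no word boundary, so a number followed by any word starting with "m" or "g" would be counted as a unit value; adding `\b` after the unit group is one way to tighten that if it matters for the data.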
