Remove u202a from Python string

一整个雨季 2020-12-11 10:07

I'm trying to open a file in Python, but I get an error, and at the beginning of the string there is a \u202a character. Does anyone know how to remove it?

6 answers
  • 2020-12-11 10:31

    When you initially created your .py file, your text editor introduced a non-printing character.

    Consider this line:

    carregar_uml("‪H:\\7 - Script\\teste.csv", variaveis)
    

    Let's carefully select the string, including the quotes, and copy-paste it into an interactive Python session:

    $ python
    Python 3.6.1 (default, Jul 25 2017, 12:45:09) 
    [GCC 5.4.0 20160609] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> "‪H:\\7 - Script\\teste.csv"
    '\u202aH:\\7 - Script\\teste.csv'
    >>> 
    

    As you can see, there is a character with code point U+202A immediately before the H.

    As someone else pointed out, the character at code point U+202A is LEFT-TO-RIGHT EMBEDDING. Returning to our Python session:

    >>> s = "‪H:\\7 - Script\\teste.csv"
    >>> import unicodedata
    >>> unicodedata.name(s[0])
    'LEFT-TO-RIGHT EMBEDDING'
    >>> unicodedata.name(s[1])
    'LATIN CAPITAL LETTER H'
    >>> 
    

    This further confirms that the first character in your string is not H, but the non-printing LEFT-TO-RIGHT EMBEDDING character.

    I don't know what text editor you used to create your program. Even if I knew, I'm probably not an expert in that editor. Regardless, some text editor that you used inserted, unbeknownst to you, U+202A.

    One solution is to use a text editor that won't insert that character, and/or will highlight non-printing characters. For example, in vim that line appears like so:

    carregar_uml("<202a>H:\\7 - Script\\teste.csv", variaveis)
    

    Using such an editor, simply delete the character between " and H.

    carregar_uml("H:\\7 - Script\\teste.csv", variaveis)
    

    Even though this line is visually identical to your original line, I have deleted the offending character. Using this line will avoid the OSError that you report.
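
    If your editor cannot show non-printing characters, a short script can point at them instead. This is only a minimal sketch: the helper name `find_format_chars` is made up for illustration, and it assumes that flagging every character in Unicode category `Cf` (invisible formatting characters, which includes U+202A) is what you want.

```python
import unicodedata


def find_format_chars(text):
    """Return (line, column, code point, name) for every Unicode 'Cf' character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            # Category 'Cf' covers invisible formatting characters such as U+202A.
            if unicodedata.category(ch) == "Cf":
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((line_no, col, f"U+{ord(ch):04X}", name))
    return hits


# The offending line, with the hidden character written as an escape.
sample = 'carregar_uml("\u202aH:\\\\7 - Script\\\\teste.csv", variaveis)'
print(find_format_chars(sample))
```

    For the sample line this reports one hit at line 1, column 15, right after the opening quote. To check a whole program, read the .py file with `encoding="utf-8"` and pass its contents to the function.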

  • 2020-12-11 10:35

    Alternatively, you can slice that character off the front of the string:

    file_path = r"‪C:\Test3\Accessing_mdb.txt"
    file_path = file_path[1:]
    with open(file_path, 'a') as f_obj:
        f_obj.write('some words')
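
    One caveat with slicing: `[1:]` always drops the first character, so a path that is already clean would lose its drive letter. A slightly safer sketch removes the character only when it is actually there. The helper name `drop_leading_lre` is hypothetical, chosen just for this example.

```python
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING


def drop_leading_lre(path):
    """Remove leading U+202A characters; leave any other path unchanged."""
    return path.lstrip(LRE)


print(repr(drop_leading_lre("\u202aC:\\Test3\\Accessing_mdb.txt")))
print(repr(drop_leading_lre("C:\\Test3\\Accessing_mdb.txt")))
```

    Both calls give the same clean path, so the helper can be applied without first checking whether the character is present.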
    
  • 2020-12-11 10:36

    Try strip() on the path string:

    def carregar_uml(arquivo, variaveis):
        cadastro_uml = {}
        id_uml = 0
    
        for i in open(arquivo):
            linha = i.split(",")
    
    
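    # The hidden U+202A is written as an escape here so that it is visible.
    caminho = "\u202aH:\\7 - Script\\teste.csv"

    # Strip the character from the path string (not from the function) before the call.
    carregar_uml(caminho.strip("\u202a"), variaveis)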
    
  • 2020-12-11 10:41

    The problem is that the file's directory path is not read properly. Pass it as a raw string and it should work.

    carregar_uml(r'H:\7 - Script\teste.csv', variaveis)
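
    A small check, offered as a sketch rather than a definitive diagnosis: the `r` prefix only changes how backslashes are read, so a raw string that still carries the hidden character fails in the same way. Retyping the path by hand is what actually gets rid of it. The variable names below are illustrative only.

```python
escaped = "H:\\7 - Script\\teste.csv"   # backslashes doubled
raw = r"H:\7 - Script\teste.csv"        # raw string, backslashes taken literally
pasted = "\u202a" + raw                 # what a copy-pasted path may really contain

print(escaped == raw)    # the two spellings describe the same path
print(pasted == raw)     # the pasted path differs by one invisible character
print(repr(pasted[0]))   # that character is U+202A
```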
    
  • 2020-12-11 10:45

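    You can use this sample code to remove \u202a from a file path. Start with a path that has the character in front:

    import pandas as pd

    # The hidden U+202A is written as an escape here so that it is visible.
    st = "\u202aF:\\somepath\\filename.xlsx"
    data = pd.read_excel(st)
    

    If I try this, it raises an OSError. In detail: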

    Traceback (most recent call last):
      File "F:\CodeRepo\PythonWorkSpace\demo\removepartofstring.py", line 14, in <module>
        data = pd.read_excel(st)
      File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\util\_decorators.py", line 188, in wrapper
        return func(*args, **kwargs)
      File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\util\_decorators.py", line 188, in wrapper
        return func(*args, **kwargs)
      File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel.py", line 350, in read_excel
        io = ExcelFile(io, engine=engine)
      File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel.py", line 653, in __init__
        self._reader = self._engines[engine](self._io)
      File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel.py", line 424, in __init__
        self.book = xlrd.open_workbook(filepath_or_buffer)
      File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\xlrd\__init__.py", line 111, in open_workbook
        with open(filename, "rb") as f:
    OSError: [Errno 22] Invalid argument: '\u202aF:\\somepath\\filename.xlsx'
    

    But if I do it like this:

    st = "\u202aF:\\somepath\\filename.xlsx"
    data = pd.read_excel(st.strip("\u202a"))  # replace with your own string here
    

    it works for me.
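
    Note that strip() only touches the two ends of the string. If a path might carry other invisible direction marks, or carry them in the middle, something like the following sketch could be used instead. The names `BIDI_CONTROLS` and `clean_path` are made up for this example, and the list of characters is my assumption about which ones are worth removing.

```python
# Bidirectional control characters: U+200E, U+200F, U+202A to U+202E, U+2066 to U+2069.
BIDI_CONTROLS = "\u200e\u200f\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"

# Translation table that deletes every character listed above.
_DELETE_BIDI = str.maketrans("", "", BIDI_CONTROLS)


def clean_path(path):
    """Return the path with all bidirectional control characters removed."""
    return path.translate(_DELETE_BIDI)


print(repr(clean_path("\u202a\u202aF:\\somepath\\filename.xlsx")))
```

    The cleaned string can then be passed to `pd.read_excel` or `open` as usual.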

  • 2020-12-11 10:45

    Use a lowercase letter when you write the drive name, not an uppercase one!

    For example: H: -> error; h: -> no error.
