How to remove accents and tilde in a C++ std::string

半阙折子戏 2020-12-15 21:26

I have a problem with a string in C++ which has several words in Spanish. This means that I have a lot of words with accents and tildes. I want to replace them for their not

8 answers
  • 2020-12-15 22:11

    I definitely think you should look into the root of the problem. That is, look for a solution that will allow you to support characters encoded in Unicode or for the user's locale.

    That being said, your immediate problem is that you're dealing with multi-byte strings: in an encoding such as UTF-8, an accented letter occupies more than one char. There is std::wstring, but I'm not sure I'd use it; for one thing, wide characters aren't meant to handle variable-width encodings. This rabbit hole goes deep, so I'll leave it at that.

    Now, as for the rest of your code, it is error-prone because you mix the looping logic with the translation logic. Thus, at least two kinds of bugs can occur: translation bugs and looping bugs. Use the standard library algorithms; they take care of the looping part for you.

    The following is a rough solution for replacing characters in a string.

    main.cpp:

    #include <iostream>
    #include <string>
    #include <iterator>
    #include <algorithm>
    #include "translate_characters.h"
    
    using namespace std;
    
    int main()
    {
        string text;
        // Keep whitespace: istream_iterator skips it by default.
        cin.unsetf(ios::skipws);
        // Read stdin one char at a time, translate it, and append it to text.
        transform(istream_iterator<char>(cin), istream_iterator<char>(),
                  back_inserter(text), translate_characters());
        cout << text << endl;
        return 0;
    }
    

    translate_characters.h:

    #ifndef TRANSLATE_CHARACTERS_H
    #define TRANSLATE_CHARACTERS_H
    
    #include <map>
    
    // Function object that maps a single char to its replacement.
    // No base class is needed: std::unary_function was removed in C++17.
    class translate_characters {
    public:
        translate_characters();
        char operator()(const char c);
    
    private:
        std::map<char, char> characters_map;
    };
    
    #endif // TRANSLATE_CHARACTERS_H
    

    translate_characters.cpp:

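    #include "translate_characters.h"
    #include <utility>
    
    using namespace std;
    
    translate_characters::translate_characters()
    {
        // Placeholder entry; add one pair per character to translate.
        // A char holds one byte, so multi-byte UTF-8 letters cannot be keys here.
        characters_map.insert(make_pair('e', 'a'));
    }
    
    char translate_characters::operator()(const char c)
    {
        // Return the mapped character, or c unchanged if it has no entry.
        map<char, char>::const_iterator translation_pos(characters_map.find(c));
        if( translation_pos == characters_map.end() )
            return c;
        return translation_pos->second;
    }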
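
A char-to-char map can only translate single bytes, so for text stored as UTF-8 the table has to be keyed on whole byte sequences. The following is a minimal sketch of that idea, assuming the input is well-formed UTF-8 and that only the Spanish accented letters matter. The name strip_spanish_accents and its table are illustrative and not part of the answer above.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: replaces accented Spanish letters in UTF-8 text
// with their unaccented ASCII counterparts. Other bytes pass through.
std::string strip_spanish_accents(const std::string& utf8)
{
    // Each key is the complete two-byte UTF-8 encoding of one letter.
    static const std::vector<std::pair<std::string, char>> table = {
        {"\xC3\xA1", 'a'}, {"\xC3\xA9", 'e'}, {"\xC3\xAD", 'i'},
        {"\xC3\xB3", 'o'}, {"\xC3\xBA", 'u'}, {"\xC3\xBC", 'u'},
        {"\xC3\xB1", 'n'},
        {"\xC3\x81", 'A'}, {"\xC3\x89", 'E'}, {"\xC3\x8D", 'I'},
        {"\xC3\x93", 'O'}, {"\xC3\x9A", 'U'}, {"\xC3\x9C", 'U'},
        {"\xC3\x91", 'N'},
    };

    std::string out;
    out.reserve(utf8.size());
    std::size_t i = 0;
    while (i < utf8.size()) {
        bool replaced = false;
        for (const auto& entry : table) {
            // Does the input contain this byte sequence at position i?
            if (utf8.compare(i, entry.first.size(), entry.first) == 0) {
                out += entry.second;
                i += entry.first.size();
                replaced = true;
                break;
            }
        }
        if (!replaced)
            out += utf8[i++];  // copy any other byte unchanged
    }
    return out;
}
```

Calling strip_spanish_accents on the UTF-8 encoding of "canción" returns "cancion". Input in another encoding, such as Latin-1, would need a different table.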
    
  • 2020-12-15 22:14

    First, this is a really bad idea: you’re mangling somebody’s language by removing letters. Although the extra dots in words like “naïve” seem superfluous to people who only speak English, there are literally thousands of writing systems in the world in which such distinctions are very important. Writing software to mutilate someone’s speech puts you squarely on the wrong side of the tension between using computers as means to broaden the realm of human expression vs. tools of oppression.

    What is the reason you’re trying to do this? Is something further down the line choking on the accents? Many people would love to help you solve that.

    That said, libicu can do this for you. Open the transform demo; copy and paste your Spanish text into the “Input” box; enter

    NFD; [:M:] remove; NFC
    

    as “Compound 1” and click transform.

    (With help from slide 9 of Unicode Transforms in ICU. Slides 29-30 show how to use the API.)
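
To show what the rule "NFD; [:M:] remove; NFC" does conceptually, here is a rough standard-library-only sketch. It is not ICU and does not use the ICU API. It assumes well-formed UTF-8, stands in for NFD with a tiny hand-written table covering lowercase Spanish letters only, and treats just the Combining Diacritical Marks block (U+0300 to U+036F) as marks, which is a subset of [:M:]. The names decode_utf8, encode_utf8, and remove_marks are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Decode UTF-8 into code points. Assumes well-formed input (no validation).
std::u32string decode_utf8(const std::string& s)
{
    std::u32string out;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        int extra = b < 0x80 ? 0 : b < 0xE0 ? 1 : b < 0xF0 ? 2 : 3;
        char32_t cp = extra == 0 ? b : (b & (0x3F >> extra));
        ++i;
        for (int k = 0; k < extra && i < s.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
        out += cp;
    }
    return out;
}

// Encode code points back to UTF-8.
std::string encode_utf8(const std::u32string& cps)
{
    std::string out;
    for (char32_t cp : cps) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Decompose, drop combining marks, re-encode.
std::string remove_marks(const std::string& utf8)
{
    // Stand-in for NFD: precomposed letter -> base letter + combining mark.
    static const std::map<char32_t, std::pair<char32_t, char32_t>> decomposition = {
        {0x00E1, {U'a', 0x0301}}, {0x00E9, {U'e', 0x0301}},
        {0x00ED, {U'i', 0x0301}}, {0x00F3, {U'o', 0x0301}},
        {0x00FA, {U'u', 0x0301}}, {0x00FC, {U'u', 0x0308}},
        {0x00F1, {U'n', 0x0303}},
    };

    std::u32string kept;
    // Keep a code point only if it is outside the combining marks block.
    auto keep = [&kept](char32_t cp) {
        if (cp < 0x0300 || cp > 0x036F)
            kept += cp;
    };
    for (char32_t cp : decode_utf8(utf8)) {
        auto it = decomposition.find(cp);
        if (it == decomposition.end()) {
            keep(cp);
        } else {
            keep(it->second.first);
            keep(it->second.second);
        }
    }
    // With the marks gone, nothing is left to recompose for these letters,
    // so the final NFC step is a no-op in this sketch.
    return encode_utf8(kept);
}
```

Unlike a plain lookup of precomposed letters, this also handles text that is already decomposed, such as "n" followed by U+0303. For anything beyond a demonstration, the real normalization tables in ICU are the safer choice.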
