How to remove accents and tilde in a C++ std::string


I have a problem with a string in C++ which has several words in Spanish. This means that I have a lot of words with accents and tildes. I want to replace them for their not


8 Answers

故里飘歌

2020-12-15 22:11

I definitely think you should look into the root of the problem. That is, look for a solution that will allow you to support characters encoded in Unicode or for the user's locale.

That being said, your problem is that you're dealing with multi-byte strings: in an encoding such as UTF-8, an accented letter occupies more than one char. There is std::wstring, but I'm not sure I'd use that. For one thing, wide characters aren't meant to handle variable-width encodings. This hole goes deep, so I'll leave it at that.
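
To make the encoding point concrete, here is a minimal sketch, assuming the text is UTF-8 (the question does not say which encoding is in use). The helper name `utf8_length` is hypothetical. It illustrates that `std::string::size()` counts bytes, so an accented letter takes up two `char` elements and cannot be matched by a single `char` comparison.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 string by skipping continuation
// bytes (those of the form 10xxxxxx). Assumes the input is valid UTF-8.
std::size_t utf8_length(const std::string& s)
{
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For `"ni\xC3\xB1o"` ("niño" encoded as UTF-8), `size()` reports 5 bytes while `utf8_length` reports 4 characters.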

Now, as for the rest of your code, it is error-prone because you mix the looping logic with the translation logic. Thus, at least two kinds of bugs can occur: translation bugs and looping bugs. Do use the STL algorithms; they can take care of the looping part for you.

The following is a rough solution for replacing characters in a string.

main.cpp:

#include <iostream>
#include <string>
#include <iterator>
#include <algorithm>
#include "translate_characters.h"

using namespace std;

int main()
{
    string text;
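    // Keep whitespace: by default istream_iterator<char> skips it.
    cin.unsetf(ios::skipws);
    // Read stdin one char at a time, translate it, and append it to text.
    transform(istream_iterator<char>(cin), istream_iterator<char>(),
              inserter(text, text.end()), translate_characters());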
    cout << text << endl;
    return 0;
}

translate_characters.h:

#ifndef TRANSLATE_CHARACTERS_H
#define TRANSLATE_CHARACTERS_H

#include <map>

// Function object that maps one char to its replacement.
// (std::unary_function is not needed here; it was removed in C++17.)
class translate_characters {
public:
    translate_characters();
    char operator()(const char c);

private:
    std::map<char, char> characters_map;
};

#endif // TRANSLATE_CHARACTERS_H

translate_characters.cpp:

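#include "translate_characters.h"

#include <utility>  // std::make_pair

using namespace std;

translate_characters::translate_characters()
{
    // Example entry only: maps 'e' to 'a'. Add one entry per character to
    // translate. Keys are single chars, so this suits single-byte encodings
    // (e.g. ISO-8859-1), not multi-byte UTF-8 sequences.
    characters_map.insert(make_pair('e', 'a'));
}

// Returns the mapped character, or c unchanged when no mapping exists.
char translate_characters::operator()(const char c)
{
    map<char, char>::const_iterator translation_pos(characters_map.find(c));
    if( translation_pos == characters_map.end() )
        return c;
    return translation_pos->second;
}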
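
The functor above translates one `char` at a time, which suits single-byte encodings such as ISO-8859-1. As a rough sketch for UTF-8 input (an assumption, since the question does not state the encoding), the lookup can be keyed on byte sequences instead. The function name `strip_spanish_accents` and the table contents are illustrative only, and the table covers just the common Spanish letters.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Replaces the UTF-8 encodings of common accented Spanish letters with
// their unaccented ASCII counterparts. Every key in the table is a
// two-byte sequence, so the loop probes two bytes at a time.
std::string strip_spanish_accents(const std::string& input)
{
    static const std::map<std::string, char> table = {
        {"\xC3\xA1", 'a'}, {"\xC3\xA9", 'e'}, {"\xC3\xAD", 'i'},
        {"\xC3\xB3", 'o'}, {"\xC3\xBA", 'u'}, {"\xC3\xBC", 'u'},
        {"\xC3\xB1", 'n'},
        {"\xC3\x81", 'A'}, {"\xC3\x89", 'E'}, {"\xC3\x8D", 'I'},
        {"\xC3\x93", 'O'}, {"\xC3\x9A", 'U'}, {"\xC3\x9C", 'U'},
        {"\xC3\x91", 'N'},
    };

    std::string output;
    output.reserve(input.size());
    for (std::size_t i = 0; i < input.size(); ) {
        if (i + 1 < input.size()) {
            auto pos = table.find(input.substr(i, 2));
            if (pos != table.end()) {
                output += pos->second;  // emit the ASCII replacement
                i += 2;                 // skip both bytes of the sequence
                continue;
            }
        }
        output += input[i];  // not in the table: copy the byte unchanged
        ++i;
    }
    return output;
}
```

Characters outside the table pass through untouched, so extending the sketch to another language means adding entries, not changing the loop.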


难免孤独

2020-12-15 22:14
First, this is a really bad idea: you’re mangling somebody’s language by removing letters. Although the extra dots in words like “naïve” seem superfluous to people who only speak English, there are literally thousands of writing systems in the world in which such distinctions are very important. Writing software to mutilate someone’s speech puts you squarely on the wrong side of the tension between using computers as a means to broaden the realm of human expression and using them as tools of oppression.

What is the reason you’re trying to do this? Is something further down the line choking on the accents? Many people would love to help you solve that.

That said, libicu can do this for you. Open the transform demo; copy and paste your Spanish text into the “Input” box; enter
```
NFD; [:M:] remove; NFC
```
as “Compound 1” and click transform.

(With help from slide 9 of Unicode Transforms in ICU. Slides 29-30 show how to use the API.)
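
The rule above reads as three steps: decompose each character into a base letter plus combining marks (NFD), remove the marks (`[:M:]` matches characters in the Unicode Mark category), then recompose what is left (NFC). The following is not ICU code, only a sketch of the middle step applied to text that is already decomposed, and it narrows `[:M:]` to the Combining Diacritical Marks block, U+0300 to U+036F. The function name `remove_combining_marks` is hypothetical.

```cpp
#include <algorithm>
#include <string>

// Removes code points in the Combining Diacritical Marks block
// (U+0300..U+036F) from text that is already in decomposed (NFD) form.
// This is a simplification: ICU's [:M:] covers every Mark character.
std::u32string remove_combining_marks(std::u32string text)
{
    text.erase(std::remove_if(text.begin(), text.end(),
                              [](char32_t cp) {
                                  return cp >= 0x0300 && cp <= 0x036F;
                              }),
               text.end());
    return text;
}
```

In decomposed form "niño" is the sequence n, i, n, U+0303 (combining tilde), o; dropping the mark leaves "nino". The decomposition itself needs Unicode data tables, which is what a library such as ICU provides.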