How to set file encoding format to UTF8 in C++

后端未结

关注

 4  924

A requirement for my software is that the encoding of a file which contains exported data shall be UTF8. But when I write the data to the file the encoding is always ANSI. (

相关标签:

4条回答

清歌不尽

2021-01-13 04:14
On Windows in VC++2010 it is possible (not yet implemented in GCC, as far as i know) using localization facet std::codecvt_utf8_utf16 (i.e. in C++11). The sample code from cppreference.com has all basic information you would need to read/write UTF-8 file.
```
std::wstring wFromFile = _T("                                                                    
                                                        
            
```
0 讨论(0) 发布评论: 提交评论加载中...


          	          
            
           
            
                              
                
              
              
                
                  谎友^        
                
              
                            
                2021-01-13 04:16
              
            
            
                                                                       
On Windows, files don't have encodings. Each application will assume an encoding based on its own rules. The best you can do is put a byte-order mark at the front of the file and hope it's recognized.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  日久生厌        
                
              
                            
                2021-01-13 04:18
              
            
            
                                                                       
AFAIK, fprintf() does character conversions, so there is no guarantee that passing UTF-8 encoded data to it will actually write the UTF-8 to the file.  Since you already converted the data yourself, use fwrite() instead so you are writing the UTF-8 data as-is, eg:

DWORD dwCount = MultiByteToWideChar( CP_ACP, 0, line.c_str(), line.length(), NULL, 0 );  
if (dwCount == 0) continue;

std::vector<WCHAR> utf16Text(dwCount);  
MultiByteToWideChar( CP_ACP, 0, line.c_str(), line.length(), &utf16Text[0], dwCount );  

dwCount = WideCharToMultiByte( CP_UTF8, 0, &utf16Text[0], utf16Text.size(), NULL, 0, NULL, NULL );  
if (dwCount == 0) continue;

std::vector<CHAR> utf8Text(dwCount);  
WideCharToMultiByte( CP_UTF8, 0, &utf16Text[0], utf16Text.size(), &utf8Text[0], dwCount, NULL, NULL );  

fwrite(&utf8Text[0], sizeof(CHAR), dwCount, pOutputFile);  
fprintf(pOutputFile, "\n");  

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  栀梦        
                
              
                            
                2021-01-13 04:20
              
            
            
                                                                       
The type char has no clue of any encoding, all it can do is store 8 bits. Therefore any text file is just a sequence of bytes and the user must guess the underlying encoding. A file starting with a BOM indicates UTF 8, but using a BOM is not recommended any more. The type wchar_t in contrast is in Windows always interpreted as UTF 16.

So let's say you have a file encoded in UTF 8 with just one line: "Confucius says: Smile. 孔子说：微笑！                                                                    

                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...