Who determines the ordering of characters

前端未结

关注

 6  1890

I have a query based on the below program -

char ch;
ch = \'z\';
while(ch >= \'a\')
{
    printf(\"char is  %c and the value is %d\\n\", ch, ch);
    ch =


                      
              相关标签:


      
      
        
          6条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  鱼传尺愫        
                
              
                            
                2020-12-19 03:46
              
            
            
                                                                       

  Why is the printing of whole set of
  lowercase letters not guaranteed in
  the above program.


Because it's possible to use C with an EBCDIC character encoding, in which the letters aren't consecutive.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  北海茫月        
                
              
                            
                2020-12-19 03:46
              
            
            
                                                                       
It is determined by whatever the execution character set is. 

In most cases nowadays, that is the ASCII character set, but C has no requirement that a specific character set be used.

Note that there are some guarantees about the ordering of characters in the execution character set.  For example, the digits '0' through '9' are guaranteed each to have a value one greater than the value of the previous digit.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  时光说笑        
                
              
                            
                2020-12-19 04:02
              
            
            
                                                                       
The compiler implementor chooses their underlying character set. About the only thing the standard has to say is that a certain minimal number of characters must be available and that the numeric characters are contiguous.

The required characters for a C99 execution environment are A through Z, a through z, 0 through 9 (which must be together and in order), any of !"#%&'()*+,-./:;<=>?[\]^_{|}~, space, horizontal tab, vertical tab, form-feed, alert, backspace, carriage return and new line. This remains unchanged in the current draft of C1x, the next iteration of that standard.

Everything else depends on the implementation.

For example, code like:

int isUpperAlpha(char c) {
    return (c >= 'A') && (c <= 'Z');
}


will break on the mainframe which uses EBCDIC, dividing the upper case characters into two regions.

Truly portable code will take that into account. All other code should document its dependencies.

A more portable implementation of your example would be something along the lines of:

static char chrs[] = "zyxwvutsrqponmlkjihgfedcba";
char *pCh = chrs;
while (*pCh != 0) {
    printf ("char is %c and the value is %d\n", *pCh, *pCh);
    pCh++;
}


If you want a real portable solution, you should probably use islower() since code that checks only the Latin characters won't be portable to (for example) Greek using Unicode for its underlying character set.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  渐次进展        
                
              
                            
                2020-12-19 04:02
              
            
            
                                                                       
Obviously determined by the implementation of C you're using, but more then likely for you it's determined by the American Standard Code for Information Interchange (ASCII).
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  一生所求        
                
              
                            
                2020-12-19 04:03
              
            
            
                                                                       
I answer you too late but apart from what was already said I want to add a little.

At the 5th translation phase (part of the preprocessor) each member of the source character set is converted to the corresponding character of the execution character set.  Quote from ISO 9899, 5.1.1.2p5


  
    
    Each source character set member and escape sequence in character constants and
    string literals is converted to the corresponding member of the execution character
    set;  if  there  is  no  corresponding  member, it is converted  to  an  implementation-deﬁned member other than the null (wide) character.
    7)
    
  


There is no need for the source char set to be the same as the execution char set ; as others say, if the execution char set is EBCDIC of IBM's mainframe, the logic is not the same as in the case of ASCII character set.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  轻奢々        
                
              
                            
                2020-12-19 04:04
              
            
            
                                                                       
These days, people going around calling your code non-portable are engaging in useless pedantry. Support for ASCII-incompatible encodings only remains in the C standard because of legacy EBCDIC mainframes that refuse to die. You will never encounter an ASCII-incompatible char encoding on any modern computer, now or in the future. Give it a few decades, and you'll never encounter anything but UTF-8.

To answer your question about who decides the character encoding: While it's nominally at the discression of your implementation (the C compiler, library, and OS) it was ultimately decided by the internet, both existing practice and IETF standards. Presumably modern systems are intended to communicate and interoperate with one another, and it would be a huge headache to have to convert every protocol header, html file, javascript source, username, etc. back and forth between ASCII-compatible encodings and EBCDIC or some other local mess.

In recent times, it's become clear that a universal encoding not just for machine-parsed text but also for natural-language text is also highly desirable. (Natural language text interchange is not as fundamental as machine-parsed text, but still very common and important.) Unicode provided the character set, and as the only ASCII-compatible Unicode encoding, UTF-8 is pretty much the successor to ASCII as the universal character encoding.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复