Why is the length of this string longer than the number of characters in it?

QUESTION

Monday, 17 November 2014

This code:

string a = "abc";
string b = "A

Everyone else is giving the surface answer, but there's a deeper rationale too: the number of "characters" is difficult to define and can be surprisingly expensive to compute, whereas a length property should be fast.


Why is it difficult to define? Well, there are a few options, and none is really more valid than another:



The number of code units (bytes or other fixed-size data chunks; C# and Windows typically use UTF-16, so Length returns the number of two-byte pieces) is certainly relevant, as the computer still needs to deal with the data in that form for many purposes (writing to a file, for example, cares about bytes rather than characters).

The number of Unicode codepoints is fairly easy to compute (although O(n), because you gotta scan the string for surrogate pairs) and might matter to a text editor... but it isn't actually the same thing as the number of characters printed on screen (called graphemes). For example, some accented letters can be represented in two forms: a single codepoint, or two codepoints paired together, one representing the letter and one saying "add an accent to my partner letter". Would the pair be two characters or one? You can normalize strings to help with this, but not all valid letters have a single-codepoint representation.

Even the number of graphemes isn't the same as the length of a printed string, which depends on the font among other factors. Since some characters are printed with some overlap in many fonts (kerning), the length of a string on screen is not necessarily equal to the sum of the widths of its graphemes anyway!

Some Unicode codepoints aren't even characters in the traditional sense, but rather some kind of control marker, such as a byte order mark or a right-to-left indicator. Do these count?
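The first two counts above, plus a rough stand-in for graphemes, can be compared directly in C#. This is a minimal sketch rather than anything from the original answer: the class and method names are made up for illustration, and `StringInfo` text elements only approximate what a user perceives as a character (results for complex sequences can differ between .NET versions).

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper for illustration; these names are assumptions.
public static class StringMeasures
{
    // UTF-16 code units: what string.Length reports, a constant-time lookup.
    public static int CodeUnits(string s) => s.Length;

    // Unicode code points: an O(n) scan that counts each surrogate pair once.
    public static int CodePoints(string s)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            if (char.IsHighSurrogate(s[i]) && i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
            {
                i++; // skip the low half of the pair
            }
            count++;
        }
        return count;
    }

    // Text elements: .NET's approximation of graphemes.
    public static int TextElements(string s) => new StringInfo(s).LengthInTextElements;

    public static void Main()
    {
        string composed = "\u00E9";    // "é" as a single code point
        string decomposed = "e\u0301"; // "e" followed by COMBINING ACUTE ACCENT

        Console.WriteLine(CodeUnits(composed));      // 1
        Console.WriteLine(CodeUnits(decomposed));    // 2
        Console.WriteLine(TextElements(decomposed)); // 1

        // Normalizing to form C merges the pair into the single-code-point form.
        Console.WriteLine(decomposed.Normalize(NormalizationForm.FormC) == composed); // True
    }
}
```

Both spellings of the accented letter look identical on screen, yet they differ in code units and code points, which is exactly the ambiguity described above.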



In short, the length of a string is actually a ridiculously complex question and calculating it can take a lot of CPU time as well as data tables.


Moreover, what's the point? Why do these metrics matter? Well, only you can answer that for your case, but personally, I find they are generally irrelevant. Limiting data entry, I find, is more logically done by byte limits, as that's what needs to be transferred or stored anyway. Limiting display size is better done by the display-side software - if you have 100 pixels for the message, how many characters you fit depends on the font, etc., which isn't known by the data layer software anyway. Finally, given the complexity of the Unicode standard, you're probably going to have bugs at the edge cases anyway if you try anything else.
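A byte limit of the kind suggested here might look like the following in C#. It is only a sketch: the 16-byte limit, the choice of UTF-8 as the storage encoding, and the `FitsByteLimit` name are all assumptions made for illustration.

```csharp
using System;
using System.Text;

public static class InputLimit
{
    // Assumed limit for illustration; a real one would come from the storage or wire format.
    public const int MaxBytes = 16;

    // True when the UTF-8 encoding of the input fits within MaxBytes.
    public static bool FitsByteLimit(string input) =>
        Encoding.UTF8.GetByteCount(input) <= MaxBytes;

    public static void Main()
    {
        Console.WriteLine(FitsByteLimit("hello"));                  // True: 5 bytes
        Console.WriteLine(FitsByteLimit(new string('\u00E9', 10))); // False: 10 chars, 20 bytes
    }
}
```

The check never has to decide what a "character" is; it only measures what will actually be stored or sent.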


So it is a hard question with not a lot of general purpose use. Number of code units is trivial to calculate - it is just the length of the underlying data array - and the most meaningful/useful as a general rule, with a simple definition.


That's why b has length 4, beyond the surface explanation of "because the documentation says so".
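The effect can be reproduced with a stand-in string, which is not the literal from the question: an "A", one character from outside the Basic Multilingual Plane (U+1F600, chosen arbitrarily as an assumption), and a "C". Any such character takes two UTF-16 code units, so a string that looks like three characters reports a Length of 4.

```csharp
using System;

public static class SurrogateDemo
{
    // Stand-in for b: "A", U+1F600 (an arbitrary supplementary-plane character), "C".
    public static string BuildSample() => "A" + char.ConvertFromUtf32(0x1F600) + "C";

    public static void Main()
    {
        string b = BuildSample();

        Console.WriteLine(b.Length);                   // 4
        Console.WriteLine(char.IsSurrogatePair(b, 1)); // True: b[1] and b[2] form one code point
    }
}
```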

ANSWER

Monday, 17 November 2014
 
 
 
 
 
 
 
Tags: C#, .net, string, unicode, unicode-string
 
 