utf8.h File Reference

Various UTF8 related helper functions.

#include <cstdint>
#include <string>

Functions

std::string convertUTF8ToLower (const std::string &input)
 Converts the input string into a lower case version, also taking into account non-ASCII characters that have a lower case variant.

std::string convertUTF8ToUpper (const std::string &input)
 Converts the input string into an upper case version, also taking into account non-ASCII characters that have an upper case variant.

std::string getUTF8CharAt (const std::string &input, size_t pos)
 Returns the UTF8 character found at byte position pos in the input string.

uint32_t getUnicodeForUTF8CharAt (const std::string &input, size_t pos)
 Returns the 32-bit Unicode value of the character at byte position pos in the UTF8 encoded input.

uint8_t getUTF8CharNumBytes (char firstByte)
 Returns the number of bytes making up a single UTF8 character given the first byte in the sequence.

const char * writeUTF8Char (TextStream &t, const char *s)
 Writes the UTF8 character pointed to by s to stream t and returns a pointer to the next character.

bool lastUTF8CharIsMultibyte (const std::string &input)
 Returns true iff the last character in input is a multibyte character.

bool isUTF8CharUpperCase (const std::string &input, size_t pos)
 Returns true iff the input string at byte position pos holds an upper case character.

int isUTF8NonBreakableSpace (const char *input)
 Checks if the first character pointed at by input is a non-breakable whitespace character.
 

Detailed Description

Various UTF8 related helper functions.

See https://en.wikipedia.org/wiki/UTF-8 for details on UTF8 encoding.

Definition in file utf8.h.

Function Documentation

◆ convertUTF8ToLower()

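std::string convertUTF8ToLower(const std::string &input)

Converts the input string into a lower case version, also taking into account non-ASCII characters that have a lower case variant.

Definition at line 187 of file utf8.cpp.

std::string convertUTF8ToLower(const std::string &input)
{
  // ASCII characters go through asciiToLower(), all others through the Unicode table
  return caseConvert(input,asciiToLower,convertUnicodeToLower);
}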

References asciiToLower(), caseConvert(), and convertUnicodeToLower().

Referenced by addClassMemberNameToIndex(), addFileMemberNameToIndex(), addMemberToSearchIndex(), addNamespaceMemberNameToIndex(), createJavaScriptSearchIndex(), QCString::lower(), and searchId().
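The general shape of such a conversion can be sketched as follows: ASCII bytes are mapped directly, and other characters are decoded and looked up in a table that returns a replacement UTF8 string or nullptr. This is only an illustration under stated assumptions, not the code in utf8.cpp. The names toyUnicodeToLower and toyToLower are hypothetical, the table holds just two entries, and only 2 byte sequences are looked up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical two-entry table standing in for a full Unicode mapping:
// returns the lower case UTF-8 encoding for a code point, or nullptr if
// this toy table has no lower case variant for it.
static const char *toyUnicodeToLower(std::uint32_t code)
{
  switch (code)
  {
    case 0xC9:  return "\xC3\xA9"; // U+00C9 maps to U+00E9
    case 0x3A9: return "\xCF\x89"; // U+03A9 maps to U+03C9
    default:    return nullptr;
  }
}

// Hypothetical simplified converter: handles ASCII and 2 byte sequences,
// copies everything else unchanged.
std::string toyToLower(const std::string &input)
{
  std::string out;
  std::size_t pos = 0;
  while (pos < input.length())
  {
    unsigned char lead = static_cast<unsigned char>(input[pos]);
    if (lead < 0x80u) // ASCII: convert directly
    {
      out += (lead >= 'A' && lead <= 'Z') ? static_cast<char>(lead + 32) : input[pos];
      pos++;
    }
    else if ((lead & 0xE0u) == 0xC0u && pos + 1 < input.length()) // 2 byte character
    {
      std::uint32_t code = ((lead & 0x1Fu) << 6) |
                           (static_cast<unsigned char>(input[pos + 1]) & 0x3Fu);
      const char *lower = toyUnicodeToLower(code);
      if (lower) out += lower; else out.append(input, pos, 2);
      pos += 2;
    }
    else // longer or malformed sequences are copied byte by byte
    {
      out += input[pos];
      pos++;
    }
  }
  return out;
}
```

Returning nullptr for "no variant" lets the caller copy the original bytes untouched, so the output can differ in byte length from the input only where a mapping exists.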

◆ convertUTF8ToUpper()

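std::string convertUTF8ToUpper(const std::string &input)

Converts the input string into an upper case version, also taking into account non-ASCII characters that have an upper case variant.

Definition at line 192 of file utf8.cpp.

std::string convertUTF8ToUpper(const std::string &input)
{
  // ASCII characters go through asciiToUpper(), all others through the Unicode table
  return caseConvert(input,asciiToUpper,convertUnicodeToUpper);
}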

References asciiToUpper(), caseConvert(), and convertUnicodeToUpper().

Referenced by FilterAlphaIndex::determineSortKey(), QCString::upper(), and writeAlphabeticalClassList().

◆ getUnicodeForUTF8CharAt()

uint32_t getUnicodeForUTF8CharAt(const std::string &input, size_t pos)

Returns the 32-bit Unicode value of the character at byte position pos in the UTF8 encoded input.

Definition at line 135 of file utf8.cpp.

uint32_t getUnicodeForUTF8CharAt(const std::string &input, size_t pos)
{
  // extract the (possibly multibyte) character starting at pos
  std::string charS = getUTF8CharAt(input,pos);
  int len; // output parameter of the decoder, not used here
  return convertUTF8CharToUnicode(charS.c_str(),charS.length(),len);
}

References convertUTF8CharToUnicode() and getUTF8CharAt().
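To make the decoding step concrete, here is a minimal self-contained sketch of turning a UTF8 sequence into a code point. It is not the implementation of convertUTF8CharToUnicode(), whose body is not shown on this page. The name decodeAt is hypothetical, only 1 to 4 byte sequences are handled, and continuation bytes are not validated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical decoder: returns the code point of the UTF-8 character that
// starts at byte position pos, or 0 if pos is out of range or the sequence
// is cut short by the end of the string.
std::uint32_t decodeAt(const std::string &input, std::size_t pos)
{
  if (pos >= input.length()) return 0;
  unsigned char lead = static_cast<unsigned char>(input[pos]);
  std::size_t n = 1;
  std::uint32_t code = lead;
  // the lead byte fixes the sequence length and contributes its low bits
  if      ((lead & 0xE0u) == 0xC0u) { n = 2; code = lead & 0x1Fu; }
  else if ((lead & 0xF0u) == 0xE0u) { n = 3; code = lead & 0x0Fu; }
  else if ((lead & 0xF8u) == 0xF0u) { n = 4; code = lead & 0x07u; }
  if (pos + n > input.length()) return 0;
  for (std::size_t i = 1; i < n; i++)
  {
    // each continuation byte contributes its low 6 bits
    code = (code << 6) | (static_cast<unsigned char>(input[pos + i]) & 0x3Fu);
  }
  return code;
}
```

For example, the bytes 0xE2 0x82 0xAC decode to U+20AC: the lead byte yields 0x2, and the two continuation bytes append 0x02 and 0x2C in 6 bit steps.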

◆ getUTF8CharAt()

std::string getUTF8CharAt(const std::string &input, size_t pos)

Returns the UTF8 character found at byte position pos in the input string.

The resulting string can be a multibyte sequence. An empty string is returned if pos is out of range or the sequence is truncated.

Definition at line 127 of file utf8.cpp.

std::string getUTF8CharAt(const std::string &input, size_t pos)
{
  if (input.length()<=pos) return std::string();
  int numBytes=getUTF8CharNumBytes(input[pos]);
  // reject a sequence that would run past the end of the string
  if (input.length()<pos+numBytes) return std::string();
  // substr() takes (position, count), so pass the byte count, not an end offset
  return input.substr(pos,numBytes);
}

References getUTF8CharNumBytes().

Referenced by addClassMemberNameToIndex(), addFileMemberNameToIndex(), addMemberToSearchIndex(), addNamespaceMemberNameToIndex(), createJavaScriptSearchIndex(), FilterAlphaIndex::determineSortKey(), getUnicodeForUTF8CharAt(), and writeAlphabeticalClassList().
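Because positions are byte offsets, walking a string character by character means advancing by the length of each sequence. The following sketch shows one way to do that. The names seqLength and splitUTF8 are hypothetical and not part of utf8.h, and only 1 to 4 byte sequences are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: sequence length derived from the lead byte
// (1 to 4 bytes; anything unrecognized counts as a single byte).
static std::size_t seqLength(unsigned char uc)
{
  if ((uc & 0xE0u) == 0xC0u) return 2;
  if ((uc & 0xF0u) == 0xE0u) return 3;
  if ((uc & 0xF8u) == 0xF0u) return 4;
  return 1;
}

// Splits a UTF-8 string into one std::string per character.
std::vector<std::string> splitUTF8(const std::string &input)
{
  std::vector<std::string> chars;
  std::size_t pos = 0;
  while (pos < input.length())
  {
    std::size_t n = seqLength(static_cast<unsigned char>(input[pos]));
    if (pos + n > input.length()) break; // truncated sequence at the end
    chars.push_back(input.substr(pos, n)); // substr(position, count)
    pos += n;
  }
  return chars;
}
```

Note that std::string::substr takes a start position and a count. Passing an end offset as the second argument silently returns too many bytes for any pos greater than 0.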

◆ getUTF8CharNumBytes()

uint8_t getUTF8CharNumBytes(char firstByte)

Returns the number of bytes making up a single UTF8 character given the first byte in the sequence.

Definition at line 23 of file utf8.cpp.

uint8_t getUTF8CharNumBytes(char c)
{
  uint8_t num=1;
  unsigned char uc = static_cast<unsigned char>(c);
  if (uc>=0x80u) // multibyte character
  {
    if ((uc&0xE0u)==0xC0u)
    {
      num=2; // 110x.xxxx: 2 byte character
    }
    if ((uc&0xF0u)==0xE0u)
    {
      num=3; // 1110.xxxx: 3 byte character
    }
    if ((uc&0xF8u)==0xF0u)
    {
      num=4; // 1111.0xxx: 4 byte character
    }
    if ((uc&0xFCu)==0xF8u)
    {
      num=5; // 1111.10xx: 5 byte character
    }
    if ((uc&0xFEu)==0xFCu)
    {
      num=6; // 1111.110x: 6 byte character
    }
  }
  return num; // continuation bytes (10xx.xxxx) and invalid lead bytes yield 1
}

Referenced by Markdown::detab(), escapeCharsInString(), getUTF8CharAt(), nextUTF8CharPosition(), and writeUTF8Char().
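As a rough illustration of the bit tests above, the sketch below maps a lead byte to a sequence length with early returns. leadByteLength is a hypothetical name, not part of utf8.h. The 5 and 6 byte forms come from the original UTF-8 design; RFC 3629 restricts UTF-8 to at most 4 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for getUTF8CharNumBytes(): maps a lead byte to the
// length of the sequence it introduces, using the same masks as above.
std::uint8_t leadByteLength(char c)
{
  unsigned char uc = static_cast<unsigned char>(c);
  if (uc < 0x80u)            return 1; // 0xxx.xxxx: ASCII
  if ((uc & 0xE0u) == 0xC0u) return 2; // 110x.xxxx
  if ((uc & 0xF0u) == 0xE0u) return 3; // 1110.xxxx
  if ((uc & 0xF8u) == 0xF0u) return 4; // 1111.0xxx
  if ((uc & 0xFCu) == 0xF8u) return 5; // 1111.10xx (legacy form)
  if ((uc & 0xFEu) == 0xFCu) return 6; // 1111.110x (legacy form)
  return 1;                            // continuation byte, 0xFE or 0xFF
}
```

The cast to unsigned char matters: on platforms where char is signed, bytes of 0x80 and above would otherwise compare as negative values.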

◆ isUTF8CharUpperCase()

bool isUTF8CharUpperCase(const std::string &input, size_t pos)

Returns true iff the input string at byte position pos holds an upper case character.

Definition at line 218 of file utf8.cpp.

bool isUTF8CharUpperCase(const std::string &input, size_t pos)
{
  if (input.length()<=pos) return false;
  int len;
  // turn the UTF8 character at position pos into a Unicode value
  uint32_t code = convertUTF8CharToUnicode(input.c_str()+pos,input.length()-pos,len);
  // check if the character can be converted to lower case; if so, it was an upper case character
  return convertUnicodeToLower(code)!=nullptr;
}

References convertUnicodeToLower() and convertUTF8CharToUnicode().

Referenced by DefinitionImpl::_setBriefDescription().

◆ isUTF8NonBreakableSpace()

int isUTF8NonBreakableSpace(const char *input)

Checks if the first character pointed at by input is a non-breakable whitespace character.

Returns the byte size of the character if there is a match, or 0 if not.

Definition at line 228 of file utf8.cpp.

int isUTF8NonBreakableSpace(const char *input)
{
  // U+00A0 (no-break space) is encoded in UTF8 as the two bytes 0xC2 0xA0
  return (static_cast<unsigned char>(input[0])==0xC2 &&
          static_cast<unsigned char>(input[1])==0xA0) ? 2 : 0;
}

Referenced by Markdown::detab().
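The check reads input[1] only when input[0] is 0xC2, so it stays within bounds for any NUL-terminated string. A bounds-checked variant of the same byte test on std::string could be sketched as below. replaceNbsp is a hypothetical helper for illustration and does not exist in utf8.h.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns a copy of text in which every UTF-8 encoded
// U+00A0 (bytes 0xC2 0xA0) is replaced by a plain ASCII space.
std::string replaceNbsp(const std::string &text)
{
  std::string out;
  out.reserve(text.size());
  for (std::size_t i = 0; i < text.size(); )
  {
    bool nbsp = i + 1 < text.size() &&
                static_cast<unsigned char>(text[i])     == 0xC2 &&
                static_cast<unsigned char>(text[i + 1]) == 0xA0;
    if (nbsp) { out += ' ';     i += 2; }
    else      { out += text[i]; i += 1; }
  }
  return out;
}
```

Scanning byte by byte is safe here because 0xC2 can only occur as a lead byte in valid UTF8; continuation bytes lie in the range 0x80 to 0xBF.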

◆ lastUTF8CharIsMultibyte()

bool lastUTF8CharIsMultibyte(const std::string &input)

Returns true iff the last character in input is a multibyte character.

Definition at line 212 of file utf8.cpp.

bool lastUTF8CharIsMultibyte(const std::string &input)
{
  // the last byte is part of a multibyte UTF8 char if bit 8 is set and bit 7 is not,
  // i.e. it is a continuation byte of the form 10xx.xxxx
  return !input.empty() && (static_cast<unsigned char>(input[input.length()-1])&0xC0)==0x80;
}

Referenced by DefinitionImpl::_setBriefDescription().
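The same continuation byte test can be used to locate where the last character begins, by stepping back until a byte that is not of the form 10xx.xxxx is found. The sketch below assumes well-formed input; isContinuation and lastCharStart are hypothetical names, not functions from utf8.h.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if byte b has the form 10xx.xxxx, i.e. it is a continuation byte.
static bool isContinuation(char b)
{
  return (static_cast<unsigned char>(b) & 0xC0u) == 0x80u;
}

// Hypothetical helper: byte offset at which the last character of input
// starts, found by stepping back over continuation bytes. Returns 0 for an
// empty string.
std::size_t lastCharStart(const std::string &input)
{
  std::size_t pos = input.length();
  if (pos == 0) return 0;
  pos--;
  while (pos > 0 && isContinuation(input[pos])) pos--;
  return pos;
}
```

With this offset, input.substr(lastCharStart(input)) yields the final character as a whole, which avoids cutting a multibyte sequence in half when trimming the end of a string.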

◆ writeUTF8Char()

const char *writeUTF8Char(TextStream &t, const char *s)

Writes the UTF8 character pointed to by s to stream t and returns a pointer to the next character.

Definition at line 197 of file utf8.cpp.

const char *writeUTF8Char(TextStream &t, const char *s)
{
  if (s==nullptr) return nullptr;
  uint8_t len = getUTF8CharNumBytes(*s);
  for (uint8_t i=0;i<len;i++)
  {
    if (s[i]==0) // detect premature end of string (due to invalid UTF8 char)
    {
      len=i; // shorten the character; this also ends the loop
    }
  }
  t.write(s,len);
  return s+len;
}

References getUTF8CharNumBytes() and TextStream::write().

Referenced by HtmlCodeGenerator::codify(), ManGenerator::codify(), RTFGenerator::codify(), and writeXMLCodeString().
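A typical calling pattern is to feed the returned pointer back in until the terminator is reached. The sketch below shows that loop with std::ostream standing in for TextStream, which is an assumption made to keep the example self-contained. writeOneChar and copyAndCount are hypothetical names, and only 1 to 4 byte sequences are recognized.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical variant of writeUTF8Char() that targets std::ostream.
// Writes one character and returns a pointer to the next one.
const char *writeOneChar(std::ostream &os, const char *s)
{
  if (s == nullptr || *s == '\0') return s;
  unsigned char lead = static_cast<unsigned char>(*s);
  std::size_t len = 1;
  if      ((lead & 0xE0u) == 0xC0u) len = 2;
  else if ((lead & 0xF0u) == 0xE0u) len = 3;
  else if ((lead & 0xF8u) == 0xF0u) len = 4;
  for (std::size_t i = 1; i < len; i++)
  {
    if (s[i] == '\0') { len = i; break; } // sequence cut short by the terminator
  }
  os.write(s, static_cast<std::streamsize>(len));
  return s + len;
}

// Copies a NUL-terminated UTF-8 string to os one character at a time and
// returns the number of characters written.
std::size_t copyAndCount(std::ostream &os, const char *s)
{
  std::size_t count = 0;
  while (s != nullptr && *s != '\0')
  {
    s = writeOneChar(os, s);
    count++;
  }
  return count;
}
```

Checking for the terminator inside the sequence keeps the returned pointer from moving past the end of the buffer when the input ends in a truncated multibyte character.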
