utf8_tool

Introduction

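A library for working with UTF-8 strings.

Most of gs's built-in string operations work in units of bytes, which makes multi-byte characters awkward to handle. utf8_tool lets you operate on UTF-8 strings in units of characters instead.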
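To make the byte-versus-character problem concrete, here is a small Python illustration (not gs code) using the same string as the sample at the end of this page. It compares the byte count with the character count and shows that an arbitrary byte offset can fall inside a multi-byte character.

```python
# Byte length vs. character length of the sample string from this README.
s = "Hello, 世界！"
data = s.encode("utf-8")

byte_len = len(data)  # counts bytes, like a byte-oriented length function
char_len = len(s)     # counts Unicode code points (characters)

# A byte offset can land in the middle of a multi-byte character:
# data[:8] keeps "Hello, " plus only the first of the three bytes of "世".
split_is_valid = True
try:
    data[:8].decode("utf-8")
except UnicodeDecodeError:
    split_is_valid = False

print(byte_len, char_len, split_is_valid)  # 16 10 False
```

"Hello, " takes 7 bytes and each of 世, 界 and the full-width ！ takes 3, so the string is 16 bytes long but only 10 characters.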

Component Interfaces

unicode.gs

Supports conversion from UTF-8 to Unicode codepoints.

TODO: handle overlong encodings; see https://en.wikipedia.org/wiki/UTF-8
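An overlong encoding represents a codepoint with more bytes than UTF-8 requires, for example C0 80 for U+0000 or E0 80 AF for "/". The usual check is to decode the sequence and reject it if the value is below the smallest codepoint that actually needs that many bytes. The Python sketch below shows that check; it is not the library's gs code, and `decode_one` and `MIN_FOR_LEN` are hypothetical names.

```python
# Smallest codepoint that legitimately needs 1, 2, 3 or 4 bytes in UTF-8.
MIN_FOR_LEN = {1: 0x0, 2: 0x80, 3: 0x800, 4: 0x10000}

def decode_one(buf: bytes, off: int) -> tuple[int, int]:
    """Decode one UTF-8 sequence at byte offset `off`.

    Returns (codepoint, byte_length). Raises ValueError on malformed or
    overlong input. Only the overlong check is shown; surrogate and
    > U+10FFFF checks are omitted.
    """
    b0 = buf[off]
    if b0 < 0x80:
        return b0, 1
    if 0xC0 <= b0 < 0xE0:
        n, cp = 2, b0 & 0x1F
    elif 0xE0 <= b0 < 0xF0:
        n, cp = 3, b0 & 0x0F
    elif 0xF0 <= b0 < 0xF8:
        n, cp = 4, b0 & 0x07
    else:
        raise ValueError(f"invalid lead byte 0x{b0:02x}")
    if off + n > len(buf):
        raise ValueError("truncated sequence")
    for b in buf[off + 1 : off + n]:
        if b & 0xC0 != 0x80:
            raise ValueError("expected continuation byte")
        cp = (cp << 6) | (b & 0x3F)
    # The overlong check: the value must need all n bytes.
    if cp < MIN_FOR_LEN[n]:
        raise ValueError(f"overlong {n}-byte encoding of U+{cp:04X}")
    return cp, n
```

With this check, `decode_one(b"\xc0\x80", 0)` raises `ValueError` instead of silently yielding U+0000.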

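Function prototype	Description
int peek(string str, int off)	Returns the codepoint at offset off
int skip(string str, int off)	Returns the offset of the next codepoint
array next(string str, int off)	Returns the codepoint at the current offset and advances to the next codepoint (the result is an array)
bool is_chinese(int codepoint)	Checks whether the codepoint is a Chinese character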
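The peek/skip pair reads one codepoint and steps past its bytes. Here is a Python sketch of that pattern over UTF-8 bytes. It is not the gs implementation. `next_cp` is a hypothetical stand-in for `next`, renamed to avoid Python's built-in. The [codepoint, next offset] order of its result is assumed. `is_chinese` is assumed to mean the CJK Unified Ideographs block U+4E00 to U+9FFF, and the real function may cover more ranges.

```python
def _seq_len(lead: int) -> int:
    """Number of bytes in the UTF-8 sequence that starts with `lead`."""
    if lead < 0x80:
        return 1
    if lead >= 0xF0:
        return 4
    if lead >= 0xE0:
        return 3
    return 2  # assumes a well-formed 2-byte lead byte (0xC0-0xDF)

def peek(buf: bytes, off: int) -> int:
    """Codepoint that starts at byte offset `off`."""
    n = _seq_len(buf[off])
    return ord(buf[off:off + n].decode("utf-8"))

def skip(buf: bytes, off: int) -> int:
    """Byte offset of the codepoint after the one at `off`."""
    return off + _seq_len(buf[off])

def next_cp(buf: bytes, off: int) -> list[int]:
    """Hypothetical next(): [codepoint, offset of the next codepoint]."""
    return [peek(buf, off), skip(buf, off)]

def is_chinese(cp: int) -> bool:
    # Assumption: "Chinese" means CJK Unified Ideographs, U+4E00..U+9FFF.
    return 0x4E00 <= cp <= 0x9FFF

# Walk the whole buffer one codepoint at a time.
buf = "Hello, 世界！".encode("utf-8")
cps = []
off = 0
while off < len(buf):
    cp, off = next_cp(buf, off)
    cps.append(cp)
print([hex(c) for c in cps])
```

Using `next_cp` folds the separate peek and skip calls of the gs sample into one call per iteration.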

unicode_ex.gs

Provides a set of string utility functions.

Overlong encodings, see https://en.wikipedia.org/wiki/UTF-8

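Function prototype	Description
bool string.is_chinese(string s, int off = 0, int len = -1)	Checks whether the string consists entirely of Chinese characters
bool string.contains_chinese(string s, int off = 0, int len = -1)	Checks whether the string contains any Chinese characters
bool string.contains_punctuation(string s, int off = 0, int len = -1)	Checks whether the string contains any punctuation characters
bool string.is_alpha(string s, int off = 0, int len = -1)	Checks whether the string consists entirely of English letters
bool string.contains_alpha(string s, int off = 0, int len = -1)	Checks whether the string contains any English letters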
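Here is a Python sketch of the range-checking helpers, built under stated assumptions rather than taken from unicode_ex.gs. off and len are treated as character positions, with len = -1 meaning "to the end of the string". The parameter is called `length` because `len` is a Python built-in. "Chinese" means U+4E00 to U+9FFF, "English letters" are ASCII A-Z and a-z, and punctuation is any Unicode P* general category. An empty range is treated as not matching for the is_* checks.

```python
import unicodedata

def _span(s: str, off: int = 0, length: int = -1) -> str:
    # Assumption: off/length count characters; length = -1 means "to the end".
    return s[off:] if length < 0 else s[off:off + length]

def _is_han(ch: str) -> bool:
    # Assumption: "Chinese" = CJK Unified Ideographs block, U+4E00..U+9FFF.
    return "\u4e00" <= ch <= "\u9fff"

def _is_english_letter(ch: str) -> bool:
    return ch.isascii() and ch.isalpha()

def _is_punct(ch: str) -> bool:
    # Categories Pc, Pd, Ps, Pe, Pi, Pf and Po are all punctuation.
    return unicodedata.category(ch).startswith("P")

def is_chinese(s: str, off: int = 0, length: int = -1) -> bool:
    part = _span(s, off, length)
    return bool(part) and all(_is_han(c) for c in part)  # empty -> False (assumption)

def contains_chinese(s: str, off: int = 0, length: int = -1) -> bool:
    return any(_is_han(c) for c in _span(s, off, length))

def contains_punctuation(s: str, off: int = 0, length: int = -1) -> bool:
    return any(_is_punct(c) for c in _span(s, off, length))

def is_alpha(s: str, off: int = 0, length: int = -1) -> bool:
    part = _span(s, off, length)
    return bool(part) and all(_is_english_letter(c) for c in part)

def contains_alpha(s: str, off: int = 0, length: int = -1) -> bool:
    return any(_is_english_letter(c) for c in _span(s, off, length))
```

For example, with s = "Hello, 世界！", `is_chinese(s, 7, 2)` checks only "世界" and returns True, while `is_chinese(s)` returns False.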

utf8_tool.gs

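Function prototype	Description
int strlen(string str)	Returns the number of characters in a UTF-8 string
int strnlen(string str, int maxlen)	Returns the number of characters in a UTF-8 string, stopping at maxlen (analogous to C's strnlen)
string substr(string str, int begin, int length)	Returns a substring of a UTF-8 string
string index(string str, int ind)	Returns the character at the given index of a UTF-8 string
int strcspn(string str1, string str2)	Returns how many consecutive characters at the start of str1 contain none of the characters in str2; same behavior as C's strcspn
int strpbrk(string str1, string str2)	Largely the same as strcspn: returns how many consecutive characters at the start of str1 contain none of the characters in str2
int strchr(string str, int ch)	Returns the position of the first occurrence of the given character
string inverse(string str)	Reverses the string
int strrchr(string str, int ch)	Returns the position of the last occurrence of the given character
int indexed_ch_bytecount(string str, int index)	Returns how many bytes the character at the given index occupies in UTF-8
bool is_valid(string str)	Checks whether the string is a valid UTF-8 sequence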
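Here is a rough Python sketch of how several of these character-unit functions can be built on raw UTF-8 bytes. It is not the library's code. It relies on one fact: every character begins with exactly one byte that is not a continuation byte (bit pattern 10xxxxxx). So counting and locating characters does not require full decoding. Conventions the table does not state are assumptions. These include returning -1 from strchr/strrchr when the character is absent, positions being character indices, and maxlen being counted in characters.

```python
def _is_start(b: int) -> bool:
    return b & 0xC0 != 0x80  # not a continuation byte

def _char_starts(buf: bytes) -> list[int]:
    """Byte offset at which each character starts, plus len(buf) as a sentinel."""
    return [i for i, b in enumerate(buf) if _is_start(b)] + [len(buf)]

def strlen(buf: bytes) -> int:
    return sum(1 for b in buf if _is_start(b))

def strnlen(buf: bytes, maxlen: int) -> int:
    # Assumption: like C's strnlen, the count is capped at maxlen (in characters here).
    return min(strlen(buf), maxlen)

def substr(buf: bytes, begin: int, length: int) -> bytes:
    starts = _char_starts(buf)
    last = len(starts) - 1  # index of the sentinel
    begin = min(begin, last)
    end = min(begin + length, last)
    return buf[starts[begin]:starts[end]]

def index(buf: bytes, ind: int) -> bytes:
    return substr(buf, ind, 1)

def indexed_ch_bytecount(buf: bytes, ind: int) -> int:
    starts = _char_starts(buf)
    return starts[ind + 1] - starts[ind]

def strcspn(s1: bytes, s2: bytes) -> int:
    """Leading characters of s1 not found in s2, counted in characters."""
    reject = set(s2.decode("utf-8"))
    n = 0
    for ch in s1.decode("utf-8"):
        if ch in reject:
            break
        n += 1
    return n

def strchr(buf: bytes, ch: int) -> int:
    # Assumption: character index of the first match, or -1.
    return buf.decode("utf-8").find(chr(ch))

def strrchr(buf: bytes, ch: int) -> int:
    # Assumption: character index of the last match, or -1.
    return buf.decode("utf-8").rfind(chr(ch))

def inverse(buf: bytes) -> bytes:
    # Reverse per character so multi-byte sequences stay intact.
    return buf.decode("utf-8")[::-1].encode("utf-8")

def is_valid(buf: bytes) -> bool:
    try:
        buf.decode("utf-8")  # Python's strict decoder also rejects overlong forms
        return True
    except UnicodeDecodeError:
        return False
```

Reversing the raw bytes instead of the characters would scramble every multi-byte sequence, which is why `inverse` works per character.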

Example

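public void unicode_sample()
{
    string str = "Hello, 世界！";
    // off and len are byte positions: length() counts bytes, not characters
    int off = 0, len = str.length();

    int idx = 0;
    while (off < len)
    {
        // Decode the codepoint at off and print it with its Chinese-character check
        int cp = unicode.peek(str, off);
        printf("[%2d] = %x, %O\n", idx++, cp, unicode.is_chinese(cp));
        // Advance past all bytes of the current codepoint
        off = unicode.skip(str, off);
    }
}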
