Rust_String

Strings (String)

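String and &str are two different string types; the main differences are ownership and mutability.
String
- Owns its data
- Growable and mutable (when bound with mut); its contents are allocated on the heap
&str
- A string slice, usually a reference into a String or to a string literal; it does not own the data
- Immutable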
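
To make the ownership difference concrete, here is a minimal sketch. The function names shout and add_exclamation are made up for illustration: one only borrows a &str, the other takes ownership of a String and mutates it.

```rust
// Borrows a string slice: accepts literals and &String alike, never takes ownership.
fn shout(text: &str) -> String {
    text.to_uppercase()
}

// Takes ownership of the String, mutates it in place, and hands it back.
fn add_exclamation(mut owned: String) -> String {
    owned.push('!');
    owned
}

fn main() {
    let literal: &str = "hello"; // borrowed and immutable
    let owned: String = String::from("hello"); // owned, heap-allocated

    println!("{}", shout(literal));
    println!("{}", shout(&owned)); // &String is coerced to &str
    let owned = add_exclamation(owned); // ownership moves in and comes back out
    println!("{owned}");
}
```

Taking &str as the parameter type is usually the more flexible choice when a function only needs to read the text.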

Creating a string

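Much like std::string in C++, String is implemented as a wrapper around a vector of bytes (Vec<u8>) with some extra guarantees, restrictions, and capabilities. The most important guarantee is that the bytes are always valid UTF-8.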
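
The "wrapper around a byte vector" idea can be checked directly. The sketch below (round_trip is a hypothetical helper name) converts a String to its Vec<u8> and back, and shows that bytes which are not valid UTF-8 are rejected.

```rust
// Round-trips a String through its underlying byte vector.
fn round_trip(s: String) -> String {
    let bytes: Vec<u8> = s.into_bytes(); // take the inner Vec<u8> out
    String::from_utf8(bytes).expect("bytes came from a valid String")
}

fn main() {
    let s = String::from("hi");
    // The contents are just the UTF-8 bytes of the text.
    assert_eq!(&[104, 105], s.as_bytes());
    // Arbitrary bytes are not accepted: 0xff never appears in valid UTF-8.
    assert!(String::from_utf8(vec![0xff, 0xfe]).is_err());
    println!("{}", round_trip(s));
}
```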

Ways to create one:

// 1
let mut s0 = String::new(); // mut is only needed because we intend to modify s0 later
// 2
let data = "hello";
let s1 = data.to_string();
// 3
let s2 = "hello".to_string();
// 4
let s3 = String::from("hello");

Updating a string

Use push_str() or push() to append to a String.

let mut s = String::from("hello");
s.push_str(" world");

push_str() takes a string slice, so it does not take ownership of its argument. For example:

let mut s1 = String::from("hello");
let s2 = "world";
s1.push_str(s2);
println!("{s2}"); // compiles: s2 is still valid

push() takes a single character (char) and appends it to the String.

let mut s = String::from("h");
s.push('i');
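
The two methods combine naturally when building a string piece by piece. A small sketch, with join_with as a made-up helper name (the standard library already offers join on slices; this version only illustrates push and push_str):

```rust
// Joins words with a single-character separator using push() and push_str().
fn join_with(words: &[&str], sep: char) -> String {
    let mut out = String::new();
    for (i, word) in words.iter().enumerate() {
        if i > 0 {
            out.push(sep); // append one char
        }
        out.push_str(word); // append a borrowed slice; `words` stays usable
    }
    out
}

fn main() {
    let words = ["hello", "world"];
    println!("{}", join_with(&words, ' '));
    println!("{}", words.len()); // still accessible: nothing was moved
}
```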

Concatenating with + or the format! macro

let s1 = String::from("hi");
let s2 = String::from("wow");
let s3 = s1 + &s2; // s1 has been moved here and can no longer be used; s2 is still usable

The + operator takes a &str on its right-hand side. &s2 is accepted because a &String can be coerced to a &str.
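
As a rough illustration of what + does with ownership, the sketch below wraps it in a function. full_name is a hypothetical name; the left operand is passed by value because + consumes it.

```rust
// `first` is moved into the function and consumed by `+`; `last` is only borrowed.
fn full_name(first: String, last: &str) -> String {
    first + " " + last
}

fn main() {
    let first = String::from("Ada");
    let last = String::from("Lovelace");
    let name = full_name(first, &last); // &String is coerced to &str
    // println!("{first}"); // would not compile: `first` was moved
    println!("{name} / {last}"); // `last` is still usable
}
```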

let s1 = String::from("You");
let s2 = String::from("are");
let s3 = String::from("right");

let s = format!("{s1} {s2} {s3}");
println!("{s}");

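format! is handy for combining several strings in more complex ways. It works like println!, except that it returns the result as a String instead of printing it, and it does not take ownership of any of its arguments.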
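
A short sketch of that last point, using a made-up helper named sentence: every input is only borrowed, so all of them remain usable afterwards.

```rust
// Builds a sentence from three borrowed parts; nothing is moved.
fn sentence(a: &str, b: &str, c: &str) -> String {
    format!("{a} {b} {c}")
}

fn main() {
    let s1 = String::from("You");
    let s2 = String::from("are");
    let s3 = String::from("right");

    let s = sentence(&s1, &s2, &s3);
    println!("{s}");
    println!("{s1}, {s2}, {s3}"); // all three are still valid
}
```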

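Rust strings do not support indexing by integer.
In UTF-8, a Unicode scalar value takes between one and four bytes (each Cyrillic letter in "Зд", for example, takes two), so a byte index does not necessarily correspond to a character.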
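
Since s[0] is not available, one common workaround is to go through the chars() iterator. A minimal sketch (nth_char is a hypothetical helper; note that it walks the string from the start, so it is O(n) rather than O(1)):

```rust
// Returns the n-th Unicode scalar value, or None if the string is too short.
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

fn main() {
    let s = "Зд";
    // let c = s[0]; // does not compile: str cannot be indexed by an integer
    println!("bytes = {}", s.len()); // 4
    println!("chars = {}", s.chars().count()); // 2
    // char_indices() pairs each char with its starting byte offset.
    for (i, c) in s.char_indices() {
        println!("{i}: {c}");
    }
    println!("{:?}", nth_char(s, 1));
}
```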

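String slices

You can create a slice with a byte range. The range must start and end on character boundaries, otherwise the program panics at runtime.

let s = "hello";
let s2 = &s[0..2];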
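
When the range is not known to be valid, the get() method offers a non-panicking alternative to &s[..]. A small sketch, with prefix as a made-up helper name:

```rust
// Returns the first `end` bytes as a slice, or None if `end` is out of range
// or falls in the middle of a multi-byte character.
fn prefix(s: &str, end: usize) -> Option<&str> {
    s.get(..end)
}

fn main() {
    println!("{:?}", prefix("hello", 2)); // Some("he")
    println!("{:?}", prefix("Зд", 1)); // None: byte 1 is inside 'З'
    println!("{:?}", prefix("Зд", 2)); // Some("З")
    // &"Зд"[0..1] would panic at runtime for the same reason.
}
```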

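Iterating over strings

Rust strings are UTF-8 encoded, so when iterating you need to keep the difference between characters and bytes in mind. Rust offers several ways to iterate over a string, depending on whether you want characters, bytes, or the more complex grapheme clusters.

1. Iterating over Unicode scalar values (characters)

The chars() method splits a string into Unicode scalar values (the char type in Rust) and yields them one by one.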

for c in "Зд".chars() {
    println!("{c}");
}

Output:

З
д

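Use case: when you need to process individual Unicode characters.
Note: chars() yields Unicode scalar values, not grapheme clusters. Some user-visible characters consist of several scalar values; é, for example, can be written as e followed by a combining acute accent (U+0301).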
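
The é case from the note can be reproduced with escape sequences. In this sketch (scalar_count is a hypothetical helper), the two spellings typically render the same but contain a different number of chars:

```rust
// Counts Unicode scalar values, which is what chars() iterates over.
fn scalar_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    let precomposed = "\u{e9}"; // é as a single scalar value
    let decomposed = "e\u{301}"; // e followed by a combining acute accent
    println!("{precomposed}: {}", scalar_count(precomposed)); // 1
    println!("{decomposed}: {}", scalar_count(decomposed)); // 2
    println!("equal? {}", precomposed == decomposed); // false: different scalar values
}
```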

2. Iterating over raw bytes

The bytes() method iterates over the raw bytes of a string (its UTF-8 encoded byte sequence).

for b in "Зд".bytes() {
    println!("{b}");
}

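Output:

208
151
208
180

Use case: when you need to work with the underlying byte representation of a string.
Note: in UTF-8, a single Unicode scalar value may take several bytes (З, for example, consists of the two bytes 208 and 151).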
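
To see how many bytes each character takes, char::len_utf8() can be used alongside bytes(). A minimal sketch (utf8_bytes is a made-up helper name):

```rust
// Collects the raw UTF-8 bytes of a string into a vector.
fn utf8_bytes(s: &str) -> Vec<u8> {
    s.bytes().collect()
}

fn main() {
    println!("{:?}", utf8_bytes("Зд")); // [208, 151, 208, 180]
    // Encoded length per character ranges from 1 to 4 bytes.
    for c in "aЗ€😀".chars() {
        println!("{c}: {} byte(s)", c.len_utf8());
    }
}
```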

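3. Iterating over grapheme clusters

A grapheme cluster is what a user perceives as a "character", and it may consist of several Unicode scalar values. For example, é is a single grapheme cluster, but it may be made up of two scalar values: e and a combining acute accent.

The Rust standard library has no direct support for grapheme clusters, but third-party crates such as unicode-segmentation provide it.

Using the `unicode-segmentation` crate

First, add the dependency to Cargo.toml: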

[dependencies]
unicode-segmentation = "1.10"

Then use the graphemes() method to iterate over grapheme clusters:

use unicode_segmentation::UnicodeSegmentation;

for g in "नमस्ते".graphemes(true) {
    println!("{g}");
}

Output:

न
म
स्
ते

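Use case: when you need to work with user-perceived "characters" (for example, text rendering or input handling).
Note: grapheme clusters are more complex to handle than chars or bytes, which is why a third-party crate is needed.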

4. Other ways to iterate over a string

By line: the lines() method splits a string into lines.

for line in "hello\nworld".lines() {
    println!("{line}");
}

By word: the split_whitespace() method splits a string on whitespace.

for word in "hello world".split_whitespace() {
    println!("{word}");
}
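
The two iterators compose well. As a rough sketch (word_counts is a hypothetical helper), the following counts the words on each line of a multi-line text:

```rust
// Returns the number of whitespace-separated words on each line.
fn word_counts(text: &str) -> Vec<usize> {
    text.lines()
        .map(|line| line.split_whitespace().count())
        .collect()
}

fn main() {
    let text = "hello world\nfoo  bar baz\n";
    // lines() does not yield an extra empty line for the trailing newline,
    // and split_whitespace() treats runs of spaces as one separator.
    println!("{:?}", word_counts(text)); // [2, 3]
}
```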

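Summary

Method	Yields	Use case	Notes
`chars()`	`char`	Processing individual Unicode scalar values	Not suitable for grapheme clusters
`bytes()`	`u8`	Processing the raw bytes of a string	One character may take several bytes
`graphemes()` (third-party)	Grapheme cluster (`&str`)	Processing user-perceived "characters"	Requires the `unicode-segmentation` crate
`lines()`	`&str`	Splitting a string into lines	Suited to multi-line text
`split_whitespace()`	`&str`	Splitting a string on whitespace	Suited to splitting into words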


https://pqcu77.github.io/2025/02/19/Rust-String/

Author

linqt

Published

February 19, 2025
