数据结构--字符串

字符串

一、定义

1.C语言中

C语言中以结束字符\0结尾的字符数组表示字符串

字符串通常是通过字符数组或者字符指针来表示的。定义字符串的常见方式有：
1
2
3
4
5
// 使用字符数组定义字符串
char str1[] = "Hello, World!";

// 使用字符指针定义字符串
char *str2 = "Hello, World!";

相关操作：

C语言标准库提供了一些处理字符串的函数，这些函数都在 <string.h> 头文件中定义
**strlen**：计算字符串的长度，不包括 '\0'
1
int len = strlen(str1);  // 返回字符串长度
**strcpy**：将一个字符串复制到另一个字符串中
1
2
char dest[20];
strcpy(dest, str1);  // 将 str1 复制到 dest
**strcat**：将两个字符串连接在一起
1
strcat(dest, str2);  // 将 str2 连接到 dest 之后
**strcmp**：比较两个字符串的大小（按字典顺序）
1
2
3
if (strcmp(str1, str2) == 0) {
    // str1 和 str2 相等
}

二、字符串的实现

1.String.h

#pragma once
#include <iostream>
#include <vector>

class String{
private:
    char *s;
    int _size;
public:
    // 构造字符串
    String(const char *str);
    // 拷贝构造函数
    String(const String &string);
    // 析构函数
    ~String();
    // 获取长度
    int size() const;
    // 获取字符串中的索引i处的字符，不能对其进行改变
    char operator[](int i) const;
    // 获取字符串中的索引i处的字符，可以对其进行改变
    char &operator[](int i);
    // 获取索引pos开始，长度为n的子串
    String subString(int pos, int n);
    // 拼接
    String& operator+=(const String &str);
    // 模式匹配：简单匹配
    int find(String T, int pos = 0);
    // 模式匹配：KMP匹配算法
    int find_KMP(String T, int pos=0);
    // 模式匹配：模式串的NEXT函数
    void get_next(std::vector<int> &next);

    // 输出
    friend std::ostream  &operator<<(std::ostream &ostream, const String &str);

};

2.string.cpp

#include "String.h"
#include "exception"


String::String(const char *str):s(nullptr),_size(0)
{
    if(str == nullptr)
        return;

    int n = 0;
    // 直到字符串结束符
    while(str[n] != '\0')
    {
        n++;
    }
    // +1表示也需要包含字符串结束符\0的长度
    s = new char[n+1];
    if(s == nullptr)
        throw "内存分配失败";
    for(int j = 0;j<n;j++)
    {
        s[j] = str[j];
    }
    _size = n;
}

// 拷贝构造函数
String::String(const String &string)
{
    this->_size = string.size();
    this->s = new char[this->_size + 1];

    for(int j = 0;j<_size;j++)
    {
        s[j] = string[j];
    }
}

// 析构函数
String::~String()
{
    if(this->s != nullptr)
    {
        _size = 0;
        delete s;
    }
}

int String::size() const
{
    return _size;
}

char String::operator[](int i) const
{
    if(i<0 || i>=_size)
        throw "索引越界";
    return s[i];
}

char & String::operator[](int i)
{
    if(i<0 || i>=_size)
        throw "索引越界";
    return s[i];
}

// 获取索引pos开始，长度为n的子串
String String::subString(int pos, int n)
{
    if(pos < 0 || pos >= _size)
    {
        throw "位置非法";
    }
    if(pos+n > _size)
    {
        throw "超出";
    }

    char *p = new char[n+1];
    if(p == nullptr)
        throw "内存分配失败";

    for(int j=0,i=pos; j<n; j++,i++)
    {
        p[j] = s[i];
    }
    p[n] = '\0';

    String str(p);
    delete[] p;
    return str;
}


String & String::operator+=(const String &str)
{
    int newSize = this->_size + str.size();
    char *p = new char [newSize + 1];
    if(p == nullptr)
        return *this;

    int j=0;
    while(j<_size)
    {
        p[j] = this->s[j];
        j++;
    }
    int i = 0;
    while(i<str.size())
    {
        p[j] = str[i];
        i++;
        j++;
    }

    delete this->s;
    s = p;
    _size = newSize;
}

// 输出
std::ostream& operator<<(std::ostream &ostream, const String &str)
{
    return (ostream << str.s);
}

三、匹配算法

1.朴素匹配算法

假设主串为S=goodgoogle，从起始位置pos=0，开始进行搜索子串T=google的位置

主串S第一位开始，S与T前三个字母都匹配成功，但S第四个字母是d而T的是 g。第一位匹配失败

主串S第二位开始，主串S首字母是o，要匹配的T首字母是g，匹配失败

主串S第三位开始，主串S首字母是o，要匹配的T首字母是 g，匹配失败

主串S第四位开始，主串S首字母是d，要匹配的T首字母是g，匹配失败

主串S第五位开始，S与T，6个字母全匹配，匹配成功

总结：

简单的说，就是对主串的每一个字符作为子串开头，与要匹配的字符串进行匹配。对主串做大循环，每个字符开头做T的长度的小循环，直到匹配成功或全部遍历完成为止

// 模式匹配：简单匹配
// 在S的第pos位置起查找是否有和T相同的字符串
// 查找成功返回子串起始位置，否则返回-1
int String::find(String T, int pos)
{
    int i = pos;
    int j = 0;
    while(i<this->_size && j<T.size())
    {
        if(s[i] == T[j])
        {
            // 当前字符匹配，j递增
            j++;
        }
        else{
            // 否则i回退，j返回模式串首，重新开始
            i = i-j;
            j=0;
        }
        // i需要每次都进行递增，对字符进行搜索
        i++;
    }

    // 匹配成功
    if(j >= T.size())
        return i-j;
    else
        // 失败
        return -1;
}

2.KMP模式匹配算法

在朴素匹配算法中，当遇到字符不匹配时，模式串需要向右移动一位（下一次从此次主串匹配的首位的下一位开始匹配），然后重新从头开始比较。KMP算法的关键优化点在于：当遇到不匹配时，利用已经匹配的部分信息，避免将模式串完全向右移动，减少不必要的重复匹配

在朴素匹配算法中，记录主串中索引位置i需要进行回溯，当遇到不匹配的字符时候，需要减去已经与模式串进行匹配的长度j

KMP算法就是可以让主串中的i少发生一些没必要的回溯

KMP的算法步骤：

模式串的next数组：

记录模式串字符串索引位置j，不含当前位置字符前面的字符串的前缀与缀的最大匹配长度（不含有整体）

进行字符串的匹配：

利用next数组，加速匹配的过程，当匹配过程中出现不匹配时，模式串不需要回退到起始位置，而是根据部分next数组决定模式串向右移动的位数，从而避免重复匹配

时间复杂度：

KMP算法的时间复杂度是 O(n + m)，其中 n 是文本串的长度，m 是模式串的长度。相比于暴力算法的 O(n * m) 的复杂度，KMP算法在大多数情况下有着显著的性能提升

（1）next数组

对于任意字符串，索引j=0与j=1的next数组的值均为

next[0] = -1

next[1] = 0

如下，对于模式串aabbaabsaabaaax的next数组：

（2）使用模式串的next数组加速匹配过程

若，主串为：S1 = aabaabcaabaabaabcaabaabt

模式串S2 = aabaabcaabaabt

模式串S2的next数组为[-1 0 1 0 1 2 3 0 1 2 3 4 5 6]

匹配过程：

完整过程：可以匹配成功

（3）模式串next数组求解

不用根据之前求得的next索引位置的值进行跳跃

跳跃一次

代码实现

// 模式匹配：模式串的NEXT函数
// 其中next是一个与模式串长度一致的vector数组
void String::get_next(std::vector<int> &next) 
{
    // next[0] = -1  next[1] = 0
    next[0] = -1;
    next[1] = 0;
    
    // i表示当前要求的next值的位置
    // cn表示当前要和前一个字符比对的下标
    int i = 2, cn = 0;
    while(i < next.size())
    {
        if(s[i-1] == s[cn])
        {
            next[i] = ++cn;
            i++;
        }
        else if(cn > 0)
        {
            // 进行跳跃
            cn = next[cn];
        }
        else
        {
            // 若cn == 0, i处的next值为0
            next[i] = 0;
            i++;
        }
    }
}

（4）KMP匹配主流程

// KMP匹配算法
int String::find_KMP(String T, int pos)
{
    // s1中当前比对的位置是x
    // s2中当前比对的位置是y
    int x = 0;
    int y = 0;
    // 主串长度
    int n = this->size();
    // 模式串长度
    int m = T.size();
    std::vector<int> next(m);
    // 获得模式串的next数组
    get_next(next);
    // 进行匹配
    while (x < n && y < m)
    {
        // 当主串索引字符与模式串索引字符一致
        if(this->s[x] == T[y])
        {
            // 主串与模式串索引均移动
            x++;
            y++;
        }
        else if(y==0)
        {
            // 模式串的第一个字符与主串当前字符不匹配
            // 只有主串索引移动
            x++;
        }
        else
        {
            // 跳跃
            y = next[y];
        }
    }

    // 若y==m表示模式串全部匹配成功
    // 则主串当前的索引位置 减去 模式串长度即可得到 答案
    // 若主串索引x>=n,但是y<m表示匹配失败，返回-1
    return y==m ? x-y : -1;
}

四、扩展用例

1.另一棵树的子树

leetCode 572 Easy

题目地址

（1）暴力递归

 // 方法1
 // 暴力递归
class Solution {
public:
    static bool same(TreeNode *a, TreeNode *b)
    {
        if(a==nullptr && b == nullptr)
        {
            return true;
        }

        if(a!=nullptr && b!=nullptr)
        {
            // 判断当前节点的值是否一致
            // 判断当前节点的左边节点是否一致
            // 判断当前节点的右边节点是否一致
            return a->val == b->val && same(a->left, b->left)
                && same(a->right , b->right);
        }

        return false;
    }

    bool isSubtree(TreeNode* root, TreeNode* subRoot) {
        if(root != nullptr && subRoot != nullptr)
        {
            // 根节点递归--判断当前根结点是否一致
            // 左子树递归--判断当前左子树是否一致
            // 右子树递归--判断当前右子树是否一致
            return same(root, subRoot) || isSubtree(root->left, subRoot)
                || isSubtree(root->right, subRoot); 
        }
        return subRoot == nullptr;
    }
};

（2）对二叉树进行先序序列化，在使用KMP算法进行匹配

struct TreeNode {
     int val;
     TreeNode *left;
     TreeNode *right;
     TreeNode() : val(0), left(nullptr), right(nullptr) {}
     TreeNode(int x) : val(x), left(nullptr), right(nullptr) {}
     TreeNode(int x, TreeNode *left, TreeNode *right) : val(x), left(left), right(right) {}
};

class Solution {
public:
    // 对树进行先序序列化
    static void serial(TreeNode *root, vector<string> &path)
    {
        if(root == nullptr)
        {
            path.push_back("null");
        }
        else
        {
            // 将根节点的值添加进入序列中
            path.push_back(root->val);
            // 左边递归加入至序列中
            serial(root->left,path);
            // 右边递归加入至序列中
            serial(root->right, path);
        }
    }

    // 获取获取模式串的next数组
    static vector<int> get_next(vector<string> s2)
    {
        int size = s2.size();
        vector<int> next(size);
        if(size > 2)
        {
            next[0] = -1;
            next[1] = 0;
        }
        else if(size == 2)
        {
            next[0] = -1;
            next[1] = 0;
            return next;
        }
        else
        {
            next[0] = -1;
            return next;
        }

        int i = 2, cn=0;
        while(i<size)
        {
            if(s2[i-1] == s2[cn])
            {
                next[i] = ++cn;
                i++;
            }
            else if(cn > 0)
            {
                cn = next[cn];
            }
            else
            {
                next[i] = 0;
                i++;
            }
        }

        return next;
    }

    // kmp算法
    static int kmp(vector<string> s1, vector<string> s2)
    {
        int n = s1.size();
        int m = s2.size();
        int x = 0, y=0;

        // 获取模式串的next数组
        vector<int> next = get_next(s2);
        while(x<n && y<m)
        {
            if(s1[x] == s2[y])
            {
                x++;
                y++;
            }
            else if(y==0)
            {
                x++;
            }
            else
            {
                y = next[y];
            }
        }

        return y==m ? x-y : -1;
    }

    bool isSubtree(TreeNode* root, TreeNode* subRoot) {
        if(root != nullptr && subRoot!= nullptr)
        {
            vector<string> s1;
            vector<string> s2;
            // 将主树与匹配树进行先序序列化
            serial(root, s1);
            serial(subRoot, s2);
            return kmp(s1, s2) !=-1;
        }

        // 匹配树为nullptr，则直接返回true
        return subRoot == nullptr;
    }
};