[Python] re.error: bad character range (raise source.error(msg, len(this) + 1 + len(that)))

Article link: https://blog.csdn.net/qq_41536059/article/details/111930631

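This post describes an error hit while splitting Chinese sentences in Python. The cause is an unescaped hyphen inside the character class passed to re.split: the parser reads it as a range operator, and the resulting range runs from a higher code point to a lower one. Reordering the delimiters made the error go away.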


Splitting Chinese sentences in Python raised this error:

  File "C:\Users\Admin\anaconda3\envs\NLP\lib\re.py", line 215, in split
    return _compile(pattern, flags).split(string, maxsplit)
  File "C:\Users\Admin\anaconda3\envs\NLP\lib\re.py", line 288, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Users\Admin\anaconda3\envs\NLP\lib\sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "C:\Users\Admin\anaconda3\envs\NLP\lib\sre_parse.py", line 924, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "C:\Users\Admin\anaconda3\envs\NLP\lib\sre_parse.py", line 420, in _parse_sub
    not nested and not items))
  File "C:\Users\Admin\anaconda3\envs\NLP\lib\sre_parse.py", line 574, in _parse
    raise source.error(msg, len(this) + 1 + len(that))
re.error: bad character range ）-  at position 15

The line that triggers it:

txt_split = re.split(r'[，,.。！!；;：:?？、（）- ]', txt_process.strip())

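A tip from another poster pointed me in the right direction: when splitting a string with re, the delimiter set should be listed in ascending order of code point. Strictly speaking, that is a workaround rather than a rule of the re module. Inside [...], a hyphen between two characters defines a range, and the start of a range must not have a larger code point than its end. In my pattern, '）- ' is parsed as the range from '）' (65289) to a space (32), which is exactly what the message "bad character range ）-  at position 15" reports.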

The order in my original code was:

print([ord(x) for x in '，,.。！!；;：:?？、（）- '])

[65292, 44, 46, 12290, 65281, 33, 65307, 59, 65306, 58, 63, 65311, 12289, 65288, 65289, 45, 32]
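To isolate the cause, here is a minimal sketch that reproduces the error using only the last four characters of the character class. The shortened pattern is an assumption made for illustration; it is not the original line of code.

```python
import re

# Inside [...], a hyphen between two characters is a range operator.
# '）' is U+FF09 (65289) and ' ' is U+0020 (32), so '）- ' is a range
# whose start is larger than its end, which the parser rejects.
try:
    re.compile(r'[（）- ]')
    error_message = None
except re.error as exc:
    error_message = str(exc)

# The message starts with "bad character range"
print(error_message)
```

The other characters in the class are irrelevant here; only the pair on either side of the hyphen matters.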

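After reordering the delimiters, the error went away. Note that in the new pattern ',-.' is still a range (code points 44 to 46); it just happens to be valid, and it happens to contain the hyphen itself (45).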

txt_split = re.split(r'[ !,-.:;?、。！（），：；？]', txt_process.strip())
print([ord(x) for x in ' !,-.:;?、。！（），：；？'])

[32, 33, 44, 45, 46, 58, 59, 63, 12289, 12290, 65281, 65288, 65289, 65292, 65306, 65307, 65311]
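Sorting works here, but it relies on the hyphen ending up in a valid range by coincidence. A less fragile approach, sketched below, is to make the hyphen a literal: either escape the whole delimiter string with re.escape or put the hyphen last in the class. The helper name build_splitter and the sample sentence are hypothetical, introduced only for this example.

```python
import re

# The delimiter string from the original code, in its original order.
DELIMITERS = '，,.。！!；;：:?？、（）- '

def build_splitter(delimiters):
    """Hypothetical helper: compile a character class from literal delimiters."""
    # re.escape backslash-escapes '-', so it can no longer form a range.
    return re.compile('[' + re.escape(delimiters) + ']')

splitter = build_splitter(DELIMITERS)
parts = splitter.split('你好，世界！a-b c')
print(parts)  # ['你好', '世界', 'a', 'b', 'c']

# Alternative: keep the pattern by hand, with the hyphen as the last character,
# where it is treated as a literal.
parts_manual = re.split(r'[，,.。！!；;：:?？、（） -]', '你好，世界！a-b c')
print(parts_manual)  # ['你好', '世界', 'a', 'b', 'c']
```

Either form keeps the delimiters in whatever order is easiest to read, so adding a new delimiter later cannot silently create an unintended range.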