Contributed by 任英华. Topics: python, re, regular expressions, matching, filtering, strings.
Filtering specified characters with Python re.sub and regular expressions
Example code
re.sub(pattern, repl, string, count=0, flags=0)

Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged. repl can be a string or a function; if it is a string, any backslash escapes in it are processed. That is, \n is converted to a single newline character, \r is converted to a carriage return, and so forth. Unknown escapes such as \j are left alone. Backreferences, such as \6, are replaced with the substring matched by group 6 in the pattern. For example:

    >>> re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
    ...        r'static PyObject*\npy_\1(void)\n{',
    ...        'def myfunc():')
    'static PyObject*\npy_myfunc(void)\n{'

If repl is a function, it is called for every non-overlapping occurrence of pattern. The function takes a single match object argument, and returns the replacement string. For example:

    >>> def dashrepl(matchobj):
    ...     if matchobj.group(0) == '-': return ' '
    ...     else: return '-'
    >>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
    'pro--gram files'
    >>> re.sub(r'\sAND\s', ' & ', 'Baked Beans And Spam', flags=re.IGNORECASE)
    'Baked Beans & Spam'

The pattern may be a string or an RE object.

The optional argument count is the maximum number of pattern occurrences to be replaced; count must be a non-negative integer. If omitted or zero, all occurrences will be replaced. Empty matches for the pattern are replaced only when not adjacent to a previous match, so sub('x*', '-', 'abc') returns '-a-b-c-'.

In addition to character escapes and backreferences as described above, \g<name> will use the substring matched by the group named name, as defined by the (?P<name>...) syntax. \g<number> uses the corresponding group number; \g<2> is therefore equivalent to \2, but isn’t ambiguous in a replacement such as \g<2>0. \20 would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character '0'. The backreference \g<0> substitutes in the entire substring matched by the RE.

Changed in version 2.7: Added the optional flags argument.
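The count argument and the \g<name> / \g<number> forms described above can be illustrated with a short sketch. The sample strings and the variable names are made up for this illustration and do not come from the documentation.

```python
import re

# Hypothetical sample data, chosen only for illustration.
text = "2021-03-04 and 2022-05-06"

# Named groups are referenced in the replacement with \g<name>.
date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
swapped = date.sub(r"\g<day>/\g<month>/\g<year>", text)
print(swapped)  # 04/03/2021 and 06/05/2022

# count=1 limits the substitution to the leftmost match.
first_only = date.sub(r"\g<day>/\g<month>/\g<year>", text, count=1)
print(first_only)  # 04/03/2021 and 2022-05-06

# \g<1>0 means group 1 followed by a literal "0"; \10 would mean group 10.
padded = re.sub(r"(\d)x", r"\g<1>0", "3x 7x")
print(padded)  # 30 70
```

Raw strings (the r prefix) keep the backslashes intact until re.sub processes them.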
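Simply put, pattern is the regular expression to search for, repl is what each match gets replaced with, and string is the input string in which the replacement takes place.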
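A minimal sketch of the three arguments, using a made-up input string. Passing an empty string as repl is the usual way to filter characters out.

```python
import re

# Hypothetical input string used only for illustration.
raw = "a1b22c333"

# pattern = r"\d+" (one or more digits), repl = "#", string = raw
masked = re.sub(r"\d+", "#", raw)
print(masked)  # a#b#c#

# An empty repl deletes every match, which is how characters are filtered out.
letters_only = re.sub(r"\d+", "", raw)
print(letters_only)  # abc
```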
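For example, suppose we need to process a string in three steps: first delete everything inside parentheses, including the parentheses themselves; then delete all whitespace; and finally split the result on commas: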
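    import re

    s = '物流 企业, 生产效率, 数据包络分析(DEA),Window Analysis,'

    # [(（]  matches an ASCII "(" or a full-width "（"
    # .*?    matches any characters, as few as possible (non-greedy)
    # [)）]  matches an ASCII ")" or a full-width "）"
    # |      alternation (logical OR)
    # \s     any whitespace character
    # ,$     a comma at the very end of the string
    r1 = re.compile(r'[(（].*?[)）]|\s|,$')

    # Replacing every match with '' deletes whatever the pattern matches
    s1 = re.sub(r1, '', s)

    # Split on commas
    s2 = s1.split(',')
    for x in s2:
        print(x)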
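The same steps can be wrapped in a small reusable helper. This is only a sketch: the name split_keywords is hypothetical, and it additionally assumes that full-width commas (，) may appear in the input, which the original example does not cover.

```python
import re

# Assumption: both ASCII and full-width brackets/commas may appear in the input.
_NOISE = re.compile(r"[(（].*?[)）]|\s|[,，]$")

def split_keywords(text):
    """Drop bracketed text and whitespace, then split on commas."""
    cleaned = _NOISE.sub("", text)
    # Filter out empty items left behind by doubled commas.
    return [item for item in re.split(r"[,，]", cleaned) if item]

print(split_keywords("物流 企业, 生产效率, 数据包络分析(DEA),Window Analysis,"))
# ['物流企业', '生产效率', '数据包络分析', 'WindowAnalysis']
```

Note that \s also removes the space inside "Window Analysis". If inner spaces should be kept, one option is to strip each item after splitting instead of deleting all whitespace up front.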