Python–从文本中提取标签
原文:https://www . geesforgeks . org/python-extract-hashtags-from-text/
一个哈希表是一个关键字或短语,前面有一个哈希符号(#),写在帖子或评论中以突出显示它并方便搜索。一些例子是:#like,#gfg,#自拍 我们得到了一个包含标签的字符串,我们必须将这些标签提取到一个列表中并打印出来。
示例:
Input : GeeksforGeeks is a wonderful #website for #ComputerScience
Output : website , ComputerScience
Input : This day is beautiful! #instagood #photooftheday #cute
Output : instagood, photooftheday, cute
方法 1:
- 使用 Split()方法将文本拆分成单词。
- 对于每个单词,检查第一个字符是否是哈希符号(#)。
- 如果是,则将该单词添加到没有哈希符号的哈希表列表中。
- 打印标签列表。
Python 3
# function to print all the hashtags in a text
def extract_hashtags(text):
# initializing hashtag_list variable
hashtag_list = []
# splitting the text into words
for word in text.split():
# checking the first character of every word
if word[0] == '#':
# adding the word to the hashtag_list
hashtag_list.append(word[1:])
# printing the hashtag_list
print("The hashtags in \"" + text + "\" are :")
for hashtag in hashtag_list:
print(hashtag)
if __name__=="__main__":
text1 = "GeeksforGeeks is a wonderful # website for # ComputerScience"
text2 = "This day is beautiful ! # instagood # photooftheday # cute"
extract_hashtags(text1)
extract_hashtags(text2)
输出:
The hashtags in "GeeksforGeeks is a wonderful #website for #ComputerScience" are :
website
ComputerScience
The hashtags in "This day is beautiful! #instagood #photooftheday #cute" are :
instagood
photooftheday
cute
方法二:使用正则表达式。
Python 3
# import the regex module
import re
# function to print all the hashtags in a text
def extract_hashtags(text):
# the regular expression
regex = "#(\w+)"
# extracting the hashtags
hashtag_list = re.findall(regex, text)
# printing the hashtag_list
print("The hashtags in \"" + text + "\" are :")
for hashtag in hashtag_list:
print(hashtag)
if __name__=="__main__":
text1 = "GeeksforGeeks is a wonderful # website for # ComputerScience"
text2 = "This day is beautiful ! # instagood # photooftheday # cute"
extract_hashtags(text1)
extract_hashtags(text2)
输出:
The hashtags in "GeeksforGeeks is a wonderful #website for #ComputerScience" are :
website
ComputerScience
The hashtags in "This day is beautiful! #instagood #photooftheday #cute" are :
instagood
photooftheday
cute
版权属于:月萌API www.moonapi.com,转载请注明出处