    5 synsets:
    Name code.n.01
    POS: n
    Definition: a set of rules or principles or laws (especially written ones)
    Examples: []
    Lemmas: [Lemma('code.n.01.code'), Lemma('code.n.01.codification')]
    Antonyms: []
    Hypernyms: [Synset('written_communication.n.01')]
    Instance Hpernyms: []
    Part Holonyms: []
    Part Meronyms: []
    ...
    Name code.n.03
    POS: n
    Definition: (computer science) the symbolic arrangement of data or instructions in a computer program or the set of such instructions
    Examples: []
    Lemmas: [Lemma('code.n.03.code'), Lemma('code.n.03.computer_code')]
    Antonyms: []
    Hypernyms: [Synset('coding_system.n.01')]
    Instance Hpernyms: []
    Part Holonyms: []
    Part Meronyms: []
    ...
    Name code.v.02
    POS: v
    Definition: convert ordinary language into code
    Examples: ['We should encode the message for security reasons']
    Lemmas: [Lemma('code.v.02.code'), Lemma('code.v.02.encipher'), Lemma('code.v.02.cipher'), Lemma('code.v.02.cypher'), Lemma('code.v.02.encrypt'), Lemma('code.v.02.inscribe'), Lemma('code.v.02.write_in_code')]
    Antonyms: []
    Hypernyms: [Synset('encode.v.01')]
    Instance Hpernyms: []
    Part Holonyms: []
    Part Meronyms: []
In WordNet, synsets and lemmas are organized in a tree-like hierarchy. The following code makes that structure visible:
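    from pprint import pprint

    from nltk.corpus import wordnet


    def hypernyms(synset):
        # Relation to follow when building the tree: the synset's direct hypernyms
        return synset.hypernyms()


    synsets = wordnet.synsets('soccer')
    for synset in synsets:
        print(synset.name() + " tree:")
        pprint(synset.tree(rel=hypernyms))
        print()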
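To see the shape of the nested list that `tree(rel=hypernyms)` produces without loading the WordNet corpus, here is a minimal sketch over a hand-written toy hierarchy. The `TOY_HYPERNYMS` dictionary and the `toy_tree` and `toy_hypernyms` helpers are illustrative assumptions, not WordNet data and not part of the NLTK API. The recursion only mimics the layout, in which each node is followed by one sub-list per related node.

```python
from pprint import pprint

# Hypothetical, hand-written hierarchy (NOT real WordNet data):
# each name maps to the list of its direct hypernyms.
TOY_HYPERNYMS = {
    "soccer": ["football"],
    "football": ["contact_sport", "field_game"],
    "contact_sport": ["sport"],
    "field_game": ["sport"],
    "sport": [],
}


def toy_hypernyms(name):
    """Relation function: look up the direct hypernyms of a toy node."""
    return TOY_HYPERNYMS.get(name, [])


def toy_tree(name, rel):
    """Return [name, subtree, subtree, ...] with one subtree per related node."""
    return [name] + [toy_tree(parent, rel) for parent in rel(name)]


tree = toy_tree("soccer", toy_hypernyms)
pprint(tree)
```

Because `football` has two parents in this toy data, `sport` appears twice in the result. The nested list unrolls every path to the root, even though the underlying structure is a graph rather than a strict tree.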
    # A word may have several synsets, so every synset of word1 has to be
    # compared with every synset of word2 (syn1 and syn2 hold those synsets).
    for s1 in syn1:
        for s2 in syn2:
            print("Path similarity of: ")
            print(s1, '(', s1.pos(), ')', '[', s1.definition(), ']')
            print(s2, '(', s2.pos(), ')', '[', s2.definition(), ']')
            print(" is", s1.path_similarity(s2))
            print()
    Path similarity of: 
    Synset('football.n.01') ( n ) [ any of various games played with a ball (round or oval) in which two teams try to kick or carry or propel the ball into each other's goal ]
    Synset('soccer.n.01') ( n ) [ a football game in which two teams of 11 players try to kick or head a ball into the opponents' goal ]
     is 0.5

    Path similarity of: 
    Synset('football.n.02') ( n ) [ the inflated oblong ball used in playing American football ]
    Synset('soccer.n.01') ( n ) [ a football game in which two teams of 11 players try to kick or head a ball into the opponents' goal ]
     is 0.05
    Path similarity of: 
    Synset('code.n.01') ( n ) [ a set of rules or principles or laws (especially written ones) ]
    Synset('bug.n.02') ( n ) [ a fault or defect in a computer program, system, or machine ]
     is 0.1111111111111111
    ...
    Path similarity of: 
    Synset('code.n.02') ( n ) [ a coding system used for transmitting messages requiring brevity or secrecy ]
    Synset('bug.n.02') ( n ) [ a fault or defect in a computer program, system, or machine ]
     is 0.09090909090909091
    ...
    Path similarity of: 
    Synset('code.n.03') ( n ) [ (computer science) the symbolic arrangement of data or instructions in a computer program or the set of such instructions ]
    Synset('bug.n.02') ( n ) [ a fault or defect in a computer program, system, or machine ]
     is 0.09090909090909091
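These scores are easier to read once you know where they come from. NLTK describes `path_similarity` as based on the shortest path connecting the two synsets in the hypernym/hyponym taxonomy, scored as 1 / (distance + 1). A score of 0.5 therefore corresponds to a distance of one edge, and 0.1111... to eight edges. The sketch below reproduces that arithmetic on a hand-written toy hierarchy. `TOY_HYPERNYMS`, `ancestor_distances`, and `toy_path_similarity` are hypothetical names for illustration, and the data is not taken from WordNet.

```python
from collections import deque

# Hypothetical child -> direct hypernyms map (NOT real WordNet data).
TOY_HYPERNYMS = {
    "soccer": ["football"],
    "rugby": ["football"],
    "football": ["field_game"],
    "field_game": ["game"],
    "chess": ["board_game"],
    "board_game": ["game"],
    "game": [],
    "isolated": [],  # shares no ancestor with the nodes above
}


def ancestor_distances(name, hypernyms):
    """Map `name` and each of its ancestors to the minimum number of edges from `name`."""
    distances = {name: 0}
    queue = deque([name])
    while queue:
        node = queue.popleft()
        for parent in hypernyms.get(node, []):
            if parent not in distances:  # first visit in BFS is the shortest
                distances[parent] = distances[node] + 1
                queue.append(parent)
    return distances


def toy_path_similarity(a, b, hypernyms=TOY_HYPERNYMS):
    """Return 1 / (shortest distance via a common ancestor + 1), or None if no path exists."""
    dist_a = ancestor_distances(a, hypernyms)
    dist_b = ancestor_distances(b, hypernyms)
    common = set(dist_a) & set(dist_b)
    if not common:
        return None
    shortest = min(dist_a[node] + dist_b[node] for node in common)
    return 1.0 / (shortest + 1)


print(toy_path_similarity("soccer", "football"))  # 0.5 (one edge)
print(toy_path_similarity("soccer", "chess"))     # five edges, meeting at "game"
print(toy_path_similarity("soccer", "isolated"))  # None
```

The `None` case matters in practice: NLTK documents that `path_similarity` also returns `None` when no connecting path is found, so a loop over every synset pair should be ready for it.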
    # POS tags that may mark a subject: nouns, proper nouns, and pronouns
    subject_tags = ["NN", "NNS", "NP", "NNP", "NNPS", "PRP", "PRP$"]


    def subject(sentence_tree):
        for tagged_word in sentence_tree:
            # A crude logic for this case - first word with these tags is considered subject
            if tagged_word[1] in subject_tags:
                return tagged_word[0]


    print("Subject:", subject(tree))
The output shows that the subject is I:
    Subject: I
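The heuristic only needs (word, tag) pairs, so its behaviour can be checked on hand-written input without running a tagger. The sketch below is a standalone variant for illustration. `first_subject_candidate` is a hypothetical name, and the sentences and their tags are written by hand as assumptions, not produced by NLTK. It also shows why the comment in the code calls the logic crude.

```python
# Same tag set as above, stored as a set for membership tests.
SUBJECT_TAGS = {"NN", "NNS", "NP", "NNP", "NNPS", "PRP", "PRP$"}


def first_subject_candidate(tagged_words):
    """Return the first word whose tag is in SUBJECT_TAGS, or None if there is none."""
    for word, tag in tagged_words:
        if tag in SUBJECT_TAGS:
            return word
    return None


# Hand-written (word, tag) pairs; the tags are assumptions, not tagger output.
simple = [("I", "PRP"), ("booked", "VBD"), ("a", "DT"), ("ticket", "NN")]
fronted = [("Yesterday", "NN"), ("Jane", "NNP"), ("booked", "VBD"),
           ("a", "DT"), ("ticket", "NN")]
no_noun = [("Run", "VB"), ("quickly", "RB")]

print(first_subject_candidate(simple))   # I
print(first_subject_candidate(fronted))  # Yesterday
print(first_subject_candidate(no_noun))  # None
```

In the second sentence the real subject is Jane, but the first noun-tagged word wins. A more robust approach would rely on the sentence structure rather than on word order alone.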
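This is a fairly basic text-analysis step, but it carries over to much broader applications. In a chatbot, for example, if the user tells the bot "Book my mom, Jane, a flight from London to New York on January 1st", the bot can use this kind of analysis to interpret the instruction: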