题目要求
现有一个文件dictionary.txt
,要求对数据进行处理,分别将英文单词和中文解释保存到mysql数据库,并且可通过输入关键字进行模糊匹配查询。dictionary.txt
文件格式1
2
3
4
5
6abandon v.抛弃,放弃
abandonment n.放弃
abbreviation n.缩写
abeyance n.缓办,中止
abide v.遵守
ability n.能力
题目分析
- 数据库建表,表字段至少包括
单词
和含义
。涉及知识点为SQLAlchemy
,主要是创建表。 - 读取文件并对内容进行处理,然后插入数据库。涉及知识点为
文件读写
、字符串处理
、SQLAlchemy
插入数据。 - 通过关键字进行模糊匹配。涉及知识点为
SQLAlchemy
查询。实现
数据库建表
createTable.py
1
2
3
4
5
6
7
8
9
10
11
12
13from sqlalchemy import create_engine, Column, String, Integer
from sqlalchemy.ext.declarative import declarative_base
engine = create_engine("mysql+pymysql://root:123456@localhost/test")
Base = declarative_base()
class Dictionary(Base):
__tablename__ = 'dictionary'
id = Column(Integer, primary_key=True)
wod = Column(String(50))
mean = Column(String(50))
Base.metadata.create_all(engine)
mysql数据库查询1
2
3
4
5
6
7
8
9mysql> desc dictionary;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| key | varchar(50) | YES | | NULL | |
| value | varchar(50) | YES | | NULL | |
+-------+-------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
数据处理
insertData.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23import codecs
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from fuzzyMatching.createTable import Dictionary
engine = create_engine("mysql+pymysql://root:123456@localhost/test?charset=utf8")
DBsession = sessionmaker(bind=engine)
session = DBsession()
class HandlerData(object):
def __init__(self, dataFile):
self.dataFile = dataFile
def make_data_to_str(self):
with codecs.open(self.dataFile, encoding='utf-8') as file:
for (num, value) in enumerate(file):
line = value.strip().split()
diction = Dictionary(id=num+1, wod=line[0], mean=line[1])
session.add(diction)
session.commit()
handlerData = HandlerData('dictionary.txt')
handlerData.make_data_to_str()
session.close()
mysql数据库查询结果1
2
3
4
5
6
7mysql> select count(*) from dictionary;
+----------+
| count(*) |
+----------+
| 7988 |
+----------+
1 row in set (0.04 sec)
数据模糊匹配
fuzzyMatch.py
1
2
3
4
5
6
7
8
9
10
11from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from fuzzyMatching.createTable import Dictionary
engine = create_engine("mysql+pymysql://root:123456@localhost/test?charset=utf8")
DBsession = sessionmaker(bind=engine)
session = DBsession()
word = input("Please input a word:")
zero_result = session.query(Dictionary).filter(Dictionary.wod.like("%{0}%".format(word))).all()
for i in zero_result:
print(i.id, i.wod, i.mean)
输出结果1
2
3
4
5
6
7
8Please input a word:test
]1517 contest n.,v.争夺,比赛
1518 contestant n.参赛人
5604 protest v.,n.抗议,反对
5605 protestant n.清教徒,新教徒
7159 test v.,n.测验,试验
7160 testify v.证明,证实
7161 testimony n.证据
错误提示
- UnicodeEncodeError: ‘latin-1’ codec can’t encode characters in position 18-22: ordinal not in range(256)
- 查询结果汉字乱码
处理方法
在创建engine时需要添加?charset=utf8
参数