Tsinghua Big Model Open Course: Study Notes

Author: wallace-lai
Published: 2024-04-30
Updated: 2024-06-04

Course Outline

What to Teach

  • Big picture knowledge about big models in NLP and beyond

    • Basics of NLP and neural models

    • Why and how NLP models become larger

    • New paradigms and methods in big models

    • Open problems and challenges in big model systems

  • Ability to use open-source toolkits to build practical systems based on big models

    • Using big models is not easy because of their sheer size

    • Learn to use open-source toolkits

  • Ability to solve novel open problems with the power of big models

    • How to review related work in the area

    • How to identify the key challenges for the task

    • How to figure out solutions to the challenge from big models’ point of view

Course Plan

  • Basic Knowledge of Big Models

    • L1 - NLP & Big Model Basics (GPU server Linux, Bash, Conda,…)

    • L2 - Neural Network Basics (PyTorch)

    • L3 - Transformer and PLMs (Huggingface Transformers)

  • Key Technology of Big Models

    • L4 - Prompt Tuning & Delta Tuning (OpenPrompt, OpenDelta)

    • L5 - Efficient Training & Model Compression (OpenBMB suite)

    • L6 - Big-Model-based Text Understanding and Generation

  • Interdisciplinary Application of Big Models

    • L7 - Big Models X Biomedical Science

    • L8 - Big Models X Legal Intelligence

    • L9 - Big Models X Brain and Cognitive Science

NLP Basics: Fundamentals and Applications

Scientific Impact of NLP

  • Turing Test: a test of a machine's ability to exhibit intelligent behavior indistinguishable from that of a human

  • Language is the medium of communication in the test

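In Turing's original paper, the test is framed as the "imitation game": once a machine behaves like a human, it is considered to possess human intelligence. This is the so-called duck test (if it looks like a duck and quacks like a duck, it is a duck).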

  • Natural language question-answering

  • 2011: IBM Watson's DeepQA system competed on Jeopardy! and won first place

  • A new milestone for AI after Deep Blue defeated the world chess champion in 1997

Another milestone came in 2016, when AlphaGo, developed by Google's DeepMind, defeated a human Go champion.

A Nice Review on NLP

  • Advances in Natural Language Processing [2015 Science]

Typical Tasks & Applications

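Basic NLP tasks:

(1) Part-of-speech tagging: identifying and labeling the part of speech of each word (verb, noun, adjective, etc.)

(2) Named entity recognition: recognizing real-world entities (person names, place names, organization names)

(3) Coreference resolution: working out which object a pronoun in a sentence actually refers to

(4) Dependency parsing: identifying the grammatical dependency relations between the words of a sentence (e.g., which word is the subject of which verb)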
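To make the four tasks concrete, here is a minimal Python sketch of what their outputs might look like for one short two-sentence text. The annotations are hand-written assumptions (Penn Treebank POS tags, Universal Dependencies relation names), not the output of any tagger or parser; the `AnnotatedText` class and the helper functions are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedText:
    tokens: list[str]
    pos: list[str]                        # (1) one POS tag per token
    entities: list[tuple[int, int, str]]  # (2) (start, end_exclusive, type) token spans
    coref: list[list[tuple[int, int]]]    # (3) clusters of mention spans that co-refer
    heads: list[int]                      # (4) index of each token's head, -1 for a root
    deprels: list[str]                    # (4) relation from each token to its head


# Hypothetical hand-written annotations; no NLP model is run here.
doc = AnnotatedText(
    tokens=["Turing", "was", "born", "in", "London", ".",
            "He", "then", "moved", "to", "Manchester", "."],
    pos=["NNP", "VBD", "VBN", "IN", "NNP", ".",
         "PRP", "RB", "VBD", "TO", "NNP", "."],
    entities=[(0, 1, "PERSON"), (4, 5, "LOCATION"), (10, 11, "LOCATION")],
    coref=[[(0, 1), (6, 7)]],  # "Turing" and "He" refer to the same person
    heads=[2, 2, -1, 4, 2, 2,
           8, 8, -1, 10, 8, 8],  # each sentence has its own root verb
    deprels=["nsubj:pass", "aux:pass", "root", "case", "obl", "punct",
             "nsubj", "advmod", "root", "case", "obl", "punct"],
)


def entity_texts(d):
    """Named entities as (surface text, type) pairs."""
    return [(" ".join(d.tokens[s:e]), label) for s, e, label in d.entities]


def resolve_mention(d, index):
    """Map a token to the first mention of its coreference cluster (itself if none)."""
    for cluster in d.coref:
        if any(s <= index < e for s, e in cluster):
            s, e = cluster[0]
            return " ".join(d.tokens[s:e])
    return d.tokens[index]


def dependencies(d):
    """(dependent, relation, head) triples; sentence roots point to 'ROOT'."""
    return [(tok, rel, "ROOT" if h == -1 else d.tokens[h])
            for tok, h, rel in zip(d.tokens, d.heads, d.deprels)]


print(list(zip(doc.tokens, doc.pos))[:3])  # [('Turing', 'NNP'), ('was', 'VBD'), ('born', 'VBN')]
print(entity_texts(doc))                   # [('Turing', 'PERSON'), ('London', 'LOCATION'), ('Manchester', 'LOCATION')]
print(resolve_mention(doc, 6))             # Turing
print(dependencies(doc)[6])                # ('He', 'nsubj', 'moved')
```

Keeping all four layers aligned to the same token indices is one common design choice: it lets later steps combine them, e.g. resolving "He" to "Turing" before reading off the subject of "moved".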

Structural Knowledge

NLP is closely related to humans' structured knowledge: much of the knowledge that describes the real world is hidden in text.

Knowledge Graph

A knowledge graph organizes the world's knowledge about real-world entities and the relations between them into one large network.
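As a rough illustration of that idea, the sketch below stores a tiny knowledge graph as (head, relation, tail) triples in Python and supports wildcard queries and multi-hop traversal. The `KnowledgeGraph` class and the relation names `proposed`, `developed_by`, and `part_of` are hypothetical; real knowledge graphs use much richer schemas and dedicated storage.

```python
class KnowledgeGraph:
    """A minimal knowledge graph stored as (head, relation, tail) triples."""

    def __init__(self):
        self.triples = set()
        self.out_edges = {}  # head entity -> list of (relation, tail)

    def add(self, head, relation, tail):
        # Ignore duplicate triples so each edge is stored once.
        if (head, relation, tail) not in self.triples:
            self.triples.add((head, relation, tail))
            self.out_edges.setdefault(head, []).append((relation, tail))

    def query(self, head=None, relation=None, tail=None):
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (head is None or t[0] == head)
            and (relation is None or t[1] == relation)
            and (tail is None or t[2] == tail)
        )

    def neighbors(self, entity):
        """Outgoing (relation, tail) edges of an entity."""
        return list(self.out_edges.get(entity, []))

    def follow(self, entity, *relations):
        """Follow a path of relations starting from one entity."""
        frontier = [entity]
        for rel in relations:
            frontier = [t for e in frontier for r, t in self.neighbors(e) if r == rel]
        return frontier


kg = KnowledgeGraph()
kg.add("Alan Turing", "proposed", "Turing Test")
kg.add("Deep Blue", "developed_by", "IBM")
kg.add("Watson", "developed_by", "IBM")
kg.add("AlphaGo", "developed_by", "DeepMind")
kg.add("DeepMind", "part_of", "Google")

print(kg.query(relation="developed_by", tail="IBM"))
# [('Deep Blue', 'developed_by', 'IBM'), ('Watson', 'developed_by', 'IBM')]
print(kg.follow("AlphaGo", "developed_by", "part_of"))  # ['Google']
```

The two-hop `follow` call shows why the network form is useful: a fact never stated directly (AlphaGo comes from a Google team) can be reached by chaining edges.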

Machine Reading

Machine reading aims to let machines read text automatically and mine structured knowledge from its content.
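A very small taste of "reading text into structured knowledge" is a pattern-based relation extractor. The Python sketch below is a swapped-in toy technique, not how modern machine reading systems work (they typically learn extraction from data); the `PATTERNS` list, the relation names, and `extract_triples` are hypothetical.

```python
import re

# Hypothetical hand-written surface patterns, each mapped to a relation name.
PATTERNS = [
    (re.compile(r"(?P<head>[A-Z][\w ]*?) was developed by (?P<tail>[A-Z][\w ]*?)\."), "developed_by"),
    (re.compile(r"(?P<head>[A-Z][\w ]*?) was born in (?P<tail>[A-Z][\w ]*?)\."), "born_in"),
]


def extract_triples(text):
    """Scan text with every pattern and collect (head, relation, tail) triples."""
    triples = []
    for pattern, relation in PATTERNS:
        for match in pattern.finditer(text):
            triples.append((match.group("head"), relation, match.group("tail")))
    return triples


text = ("Deep Blue was developed by IBM. Alan Turing was born in London. "
        "AlphaGo was developed by DeepMind.")
print(extract_triples(text))
# [('Deep Blue', 'developed_by', 'IBM'), ('AlphaGo', 'developed_by', 'DeepMind'),
#  ('Alan Turing', 'born_in', 'London')]
```

Fixed patterns like these are brittle: they miss paraphrases such as "IBM developed Deep Blue", which is one reason learned models are generally preferred for this task.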

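NLP Basics: Word Representation and Language Models

Big Model Basics: The Journey of Big Models

Big Model Basics: The Paradigm Behind Big Models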