用Python创建API的正确方法
FlyAI文献翻译英文原文::The Right Way to Build an API with Python 标签:Python How can we set up a way to communicate from one software instance to another? It sounds simple, and — to be completely honest — it is. All we need is an API. An API (Application Programming Interface) is a simple interface that defines the types of requests (demands/questions, etc.) that can be made, how they are made, and how they are processed. In our case, we will be building an API that allows us to send a range of GET/POST/PUT/PATCH/DELETE requests (more on this later), to different endpoints, and return or modify data connected to our API. We will be using the Flask framework to create our API and Postman to test it. In short, we will cover: 我们如何设置一种从一个软件实例到另一个软件实例的通信方式?这听起来很简单,而且—说实话—确实如此。 我们需要的只是一个API。 API(应用程序编程接口)是一个简单的接口,它定义了可以提出的请求(需求/问题等)的类型,如何提出请求,以及如何处理这些请求。 在我们的案例中,我们将构建一个API,允许我们向不同的端点发送一系列GET/POST/PUT/PATCH/DELETE请求(稍后会有更多介绍),并返回或修改连接到我们API的数据。 我们将使用Flask框架来创建我们的API,并使用Postman来测试它。总之,我们将涵盖以下内容: > Setup - Our Toy Data - Initialize a Flask API - Endpoints - Running a Local Server > Writing Our API - GET - POST - 401 Unauthorized - PUT - DELETE - Users Class (summary) > That's It! Setup Our API will contain two endpoints, users and locations. The former will allow access to our registered user’s details, whereas the latter will include a list of cafe locations. The hypothetical use-case here is of a multi-million cafe bookmarking app, where users open the app and bookmark their favorite cafe’s — like Google Maps, but not useful. 设置 我们的API将包含两个端点,用户和地点。前者将允许访问我们的注册用户的详细信息,而后者将包括咖啡馆位置的列表。 这里的假设用例是一个百万级的咖啡馆书签应用,用户打开应用,将自己喜欢的咖啡馆加入书签—就像谷歌地图一样,但没有用。 Our Toy Data For the sake of simplicity, we are going to store this data in two local CSV files. In reality, you probably want to take a look at something like MongoDB or Google Firebase. Our CSV files look like this: 我们的小型数据 为了简单起见,我们将把这些数据存储在两个本地CSV文件中。在现实中,你可能想看看MongoDB或Google Firebase这样的东西。 我们的CSV文件是这样的: User’s data in users.csv. Image by Author. user.csv中的用户数据。图片由作者提供。 Location mappings in locations.csv. Image by Author. You can download users.csv here, and locations.csv here. locations.csv中的位置映射。图片由作者提供。 你可以在这里下载 users.csv,在这里下载 locations.csv。 Initialize a Flask API Now to our Python script, we need to import modules and initialize our API, like so: 初始化一个Flask API 现在到我们的Python脚本,我们需要导入模块并初始化我们的API,像这样。 from flask import Flask from flask_restful import Resource, Api, reqparse import pandas as pd import astapp = Flask(__name__) api = Api(app) Endpoints As we already touched on, our API will have two endpoints, users and locations. The result of this is — if our API was located at www.api.com, communication with the Users class would be provided at www.api.com/users and Locations at www.api.com/locations. To create an endpoint, we define a Python class (with any name you want) and connect it to our desired endpoint with api.add_resource, like this: 端点 正如我们已经提到的,我们的API将有两个端点,用户和位置。 这样做的结果是—如果我们的API位于www.api.com,那么与Users类的通信将在www.api.com/users,而Location则在www.api.com/locations。 要创建一个端点,我们定义一个Python类(用任何你想要的名字),然后用api.add_resource将它连接到我们所需的端点,就像这样。 class Users(Resource): # methods go here pass api.add_resource(Users, '/users') # '/users' is our entry point Flask needs to know that this class is an endpoint for our API, and so we pass Resource in with the class definition. Inside the class, we include our HTTP methods (GET, POST, DELETE, etc.). Finally, we link our Users class with the /users endpoint using api.add_resource. 
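Putting the pieces so far together, a minimal runnable skeleton looks something like this (a sketch; the final app.run() line for hosting a local server is explained in more detail below):

from flask import Flask
from flask_restful import Resource, Api, reqparse
import pandas as pd
import ast

app = Flask(__name__)  # create the Flask application
api = Api(app)         # wrap it with flask_restful's Api object

class Users(Resource):
    # HTTP methods (get, post, put, delete) go here
    pass

api.add_resource(Users, '/users')  # '/users' is our entry point

if __name__ == '__main__':
    app.run()  # host the API locally (typically at http://127.0.0.1:5000)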
Because we want two endpoints, we replicate the logic: Flask需要知道这个类是我们的API的一个端点,所以我们将Resource与类定义一起传递进去。 在这个类里面,我们包含了我们的 HTTP 方法(GET、POST、DELETE 等)。 最后,我们使用 api.add_resource 将 Users 类与 /users 端点连接起来。 因为我们想要两个端点,所以我们复制逻辑。 class Users(Resource): # methods go here pass class Locations(Resource): # methods go here pass api.add_resource(Users, '/users') # '/users' is our entry point for Users api.add_resource(Locations, '/locations') # and '/locations' is our entry point for Locations Running a Local Server Finally, when we write out our API, we need to test it! To do this, we need to host our API, which we can do locally by adding app.run to the end of our script like this: 运行本地服务器 最后,当我们写出我们的API时,我们需要测试它! 要做到这一点,我们需要对我们的API进行托管,我们可以通过在我们的脚本末尾添加app.run来在本地进行测试,就像这样。 if __name__ == '__main__': app.run() # run our Flask app Now, when we run our script, we should see something like this: 现在,当我们运行我们的脚本时,我们应该看到这样的东西。 Initialization of our localhost server. Image by Author. Once our server is setup, we can test our API as we build it using Postman, if you haven’t used it before it is the de-facto standard for API testing. And, don’t worry — it’s incredibly simple to use — download Postman from here. Before going ahead, you can find the full script that we will be building here. If you’re not sure where a code snippet should go, check there! 我们的localhost服务器的初始化。图片由作者提供。 一旦我们的服务器设置好了,我们就可以使用Postman来测试我们的API,如果你以前没有使用过它,它是API测试的事实标准。而且,不用担心—使用起来非常简单—从这里下载Postman。 在开始之前,你可以在这里找到我们将构建的完整脚本。如果你不确定代码片段应该放在哪里,请在这里查看! Writing API Methods Inside each of our classes, we keep our HTTP methods, GET, POST, and DELETE. To create a GET method, we use def get(self). POST and DELETE follow the same pattern. GET The GET method is the simplest. We return all data stored in users.csv wrapped inside a dictionary, like so: 编写API的方法 在我们的每一个类里面,我们都保存着我们的HTTP方法,GET、POST和DELETE。 要创建一个GET方法,我们使用def get(self)。POST和DELETE遵循相同的模式。 GET GET方法是最简单的。我们返回所有存储在user.csv中的数据,这些数据被包裹在一个字典中,就像这样: class Users(Resources): def get(self): data = pd.read_csv('users.csv') # read CSV data = data.to_dict() # convert dataframe to dictionary return {'data': data}, 200 # return data and 200 OK code We can then run the script to initialize our API, open Postman and send a GET request to our localhost address (typically http://127.0.0.1:5000)— this is our API entry point. 然后,我们可以运行脚本来初始化我们的API,打开Postman并向我们的localhost地址(通常是http://127.0.0.1:5000)发送一个GET请求--这是我们的API入口点。 How to send a GET request to our API. Image by Author. 如何向我们的API发送GET请求。图片由作者提供。 To send a GET request to our API in Postman we: Select GET from the dropdown Type the entry point of our API instance + /users (the endpoint) Hit Send Check the status code returned by our API (we should see 200 OK) View our API’s response, which is users.csv in JSON (like a dictionary) format 要在Postman中向我们的API发送GET请求,我们: 从下拉菜单中选择GET输入我们的API实例的入口点+/users(端点)点击发送检查我们的API返回的状态代码(我们应该看到200 OK)。查看我们的API响应,它是JSON格式的user.csv(像一个字典) POST The POST method allows us to add records to our data. In this case, we will take arguments for usedId, name, and city. 
These arguments are passed to our API endpoint as URL parameters, which look like this: POST POST方法允许我们向我们的数据添加记录。在本例中,我们将接受 usedId、name 和 city 的参数,这些参数作为 URL 参数传递给我们的 API 端点。 这些参数以URL参数的形式传递给我们的API端点,它看起来像这样。 http://127.0.0.1:5000/users?userId=abc123&name=The Rock&city=Los Angeles We can specify the required parameters, and then parse the values provided using reqparse — like this: 我们可以指定所需的参数,然后使用reqparse解析所提供的值—就像这样。 parser = reqparse.RequestParser() # initialize parser.add_argument('userId', required=True) # add arguments parser.add_argument('name', required=True) parser.add_argument('city', required=True) args = parser.parse_args() # parse arguments to dictionary Let’s break our parser code down: We initialize our parser with .RequestParser(). Add our arguments with .add_argument([arg_name], required) — note that required=True means that the argument is required in the request. Alternatively, we can add optional arguments with required=False. Parse our arguments and their values into a Python dictionary using .parse_args(). We can then access the values passed to each argument like we usually would with key-value pairs in a dictionary. Let’s put those together to add values to our CSV: 让我们分解一下我们的解析器代码。 我们用.RequestParser()来初始化我们的解析器。 用.add_argument([arg_name], required)来添加我们的参数—注意,required=True意味着该参数在请求中是必需的。另外,我们也可以用 required=False 添加可选的参数。 使用 .parse_args() 将我们的参数和它们的值解析成一个 Python 字典。 然后,我们可以像通常使用字典中的键值对那样访问传递给每个参数的值。 让我们把这些放在一起,把值添加到我们的 CSV 中。 class Users(Resource): def post(self): parser = reqparse.RequestParser() # initialize parser.add_argument('userId', required=True) # add args parser.add_argument('name', required=True) parser.add_argument('city', required=True) args = parser.parse_args() # parse arguments to dictionary # create new dataframe containing new values new_data = pd.DataFrame({ 'userId': args['userId'], 'name': args['name'], 'city': args['city'], 'locations': [[]] }) # read our CSV data = pd.read_csv('users.csv') # add the newly provided values data = data.append(new_data, ignore_index=True) # save back to CSV data.to_csv('users.csv', index=False) return {'data': data.to_dict()}, 200 # return data with 200 OK If it’s starting to look a little more confusing — all we’re doing is: Creating a row of new data new_data from the URL parameters args Appending it to the pre-existing data Saving the newly merged data And, returning data alongside a 200 OK status code. 如果它开始看起来有点混乱—我们正在做的是: 从URL参数args中创建一行新的数据new_data将其添加到已有的数据中保存新合并的数据然后,在 200 OK 状态码旁边返回数据 We create a new user by sending a POST request containing userId, name, and city parameters to our /user endpoint. Image by Author. 我们通过向/user端点发送一个包含userId、名称和城市参数的POST请求来创建一个新用户。图片由作者提供。 We can now send a POST request to create a new user, easy! 我们现在可以发送一个POST请求来创建一个新用户,非常简单! 401 Unauthorized Our code handles POST requests, allowing us to write new data to users.csv — but what if that user already exists? For that, we need to add a check. If the userId already exists, we return a 401 Unauthorized code to the user. 401 未经批准 我们的代码处理POST请求,允许我们向user.csv写入新的数据—但如果该用户已经存在怎么办? 为此,我们需要添加一个检查。如果userId已经存在,我们就返回一个401 Unauthorized代码给用户。 ... # read our CSV data = pd.read_csv('users.csv') if args['userId'] in list(data['userId']): return { 'message': f"'{args['userId']}' already exists." 
}, 401 else: # create new dataframe containing new values new_data = pd.DataFrame({ 'userId': args['userId'], 'name': args['name'], 'city': args['city'], 'locations': [[]] }) # add the newly provided values data = data.append(new_data, ignore_index=True) data.to_csv('users.csv', index=False) # save back to CSV return {'data': data.to_dict()}, 200 # return data with 200 OK If we try to POST again with the userId ‘abc123’, we will return the following 401 Unauthorized status code and message. Image by Author. Going back to Postman, we can test if our API is functioning by trying to add the same user twice — this time, The Rock received a 401 Unauthorized response. 如果我们尝试用用户ID’abc123’再次POST,我们将返回以下401 Unauthorized状态代码和消息。图片由作者提供。 回到Postman,我们可以通过尝试两次添加同一个用户来测试我们的API是否正常工作—这一次,The Rock收到了401 Unauthorized响应。 PUT What if we want to add a cafe to a user? We can’t use POST as this returns a 401 Unauthorized code — instead, we use PUT. Similar to POST, we need to add if-else logic in the case of the provided userId not existing. PUT 如果我们想给用户添加一个咖啡馆呢?我们不能使用POST,因为这将返回一个401 Unauthorized代码—相反,我们使用PUT。 与POST类似,我们需要在提供的userId不存在的情况下添加if-else逻辑。 class Users(Resource): def put(self): parser = reqparse.RequestParser() # initialize parser.add_argument('userId', required=True) # add args parser.add_argument('location', required=True) args = parser.parse_args() # parse arguments to dictionary # read our CSV data = pd.read_csv('users.csv') if args['userId'] in list(data['userId']): # evaluate strings of lists to lists data['locations'] = data['locations'].apply( lambda x: ast.literal_eval(x) ) # select our user user_data = data[data['userId'] == args['userId']] # update user's locations user_data['locations'] = user_data['locations'].values[0] \ .append(args['location']) # save back to CSV data.to_csv('users.csv', index=False) # return data and 200 OK return {'data': data.to_dict()}, 200 else: # otherwise the userId does not exist return { 'message': f"'{args['userId']}' user not found." }, 404 Other than a couple of small tweaks to the code, our PUT method is almost identical to POST. 除了几个小的代码调整,我们的PUT方法和POST几乎是一样的。 Here we use the PUT method to add the cafe with ID 0007 to The Rock’s bookmarked locations. Image by Author. Back in Postman, our required input parameters have changed. Now, we only need userId and a location to add to the users bookmarked locations. 这里我们使用PUT方法将ID为0007的咖啡馆添加到The Rock的书签位置。图片由作者提供。 在Postman中,我们所需的输入参数已经改变了。现在,我们只需要userId和一个位置来添加到用户的书签位置。 DELETE We can also delete records with the DELETE method. This method is pretty straightforward, we need to specify a userId to remove, and add some if-else logic in the case of a non-existent userId. So if Jill decides our app is useless and wants to leave, we would send a DELETE request containing her userId. DELETE 我们还可以通过DELETE方法来删除记录。 这个方法很直接,我们需要指定一个要删除的userId,并在userId不存在的情况下添加一些if-else逻辑。 所以,如果Jill决定我们的应用没有用,想要离开,我们就会发送一个包含她的userId的DELETE请求。 Sending a DELETE request for userId ‘b2c’ deletes Jill’s record from our user data. Image by Author. We can test this in Postman, and as expected, we return our data without Jill’s record. What if we try and delete a non-existent user? 为userId’b2c’发送DELETE请求,从我们的用户数据中删除Jill的记录。图片由作者提供。 我们可以在Postman中进行测试,正如预期的那样,我们返回的数据中没有Jill的记录。如果我们尝试删除一个不存在的用户呢? If we DELETE a userId that does not exist, we will receive a 404 Not Found status code and a message explaining that the userId does not exist. Image by Author. Again, we receive our 404 Not Found and a brief message explaining that the userId was not found. 
如果我们删除一个不存在的 userId,我们将收到一个404 Not Found 状态代码和一条解释 userId 不存在的消息。图片由作者提供。 同样,我们收到了我们的404 Not Found 和一条简短的消息,解释说找不到 userId。 Users Class That’s all of the parts that make up the Users class, accessed via our /users endpoint. You can find the full script for it here. After that, we still need to put together the Locations class. This other class should allow us to GET, POST, PATCH (update), and DELETE locations. Each location is given a unique ID — when a user bookmarks a location, that unique ID is added to their locations list with PUT /users. The code for this is not that much different from what we wrote in the Users class so that we won’t repeat ourselves. However, you can find it alongside the Users class here. 用户类别 这就是构成Users类的所有部分,通过我们的/users端点访问。你可以在这里找到它的完整脚本。 之后,我们还需要把Location类放在一起。这个其他类应该允许我们GET、POST、PATCH(更新)和DELETE位置。 每个位置都被赋予一个唯一的ID—当用户将一个位置作为书签时,这个唯一的ID会通过PUT /users添加到他们的位置列表中。 这个代码和我们在Users类中写的没有太大区别,所以我们不会重复。不过,你可以在这里和Users类一起找到它。 That’s It! It’s as simple as that. Setting up an API with Flask and Python is incredibly straightforward. We now have an easy-to-use and standardized method for communicating between different interfaces. We’ve covered all of the most common request methods — GET, POST, PUT, and DELETE — and a few HTTP status codes too — 200, 401, and 404. Finally, we’ve learned how to host our API locally and test it with Postman — allowing us to quickly diagnose issues and ensure our API is behaving as intended. All-in-all, API development is a crucial skill for developers, data scientists, and almost any other tech-inclined role you can imagine. If you have any questions or ideas for improvement, let me know on Twitter or in the comments below. I hope you enjoyed the article and thank-you for reading! 就是这样! 就这么简单。用Flask和Python设置一个API是非常直接的。 我们现在有了一个易于使用和标准化的方法来在不同的接口之间进行通信。 我们已经涵盖了所有最常见的请求方法—GET、POST、PUT和DELETE—以及一些HTTP状态码—200、401和404。 最后,我们已经学会了如何在本地托管我们的API,并使用Postman进行测试—让我们能够快速诊断问题,并确保我们的API按照预期的方式运行。 总而言之,API开发是开发人员、数据科学家以及你能想象到的几乎所有其他技术型角色的一项重要技能。 如果你有任何问题或改进的想法,请在Twitter上或在下面的评论中告诉我。 希望你喜欢这篇文章,谢谢你的阅读!

玩转StyleGAN2模型:教你生成动漫人物
FlyAI文献翻译英文原文:Generating Anime Characters with StyleGAN2 标签:深度学习 Generated StyleGAN Interpolation [Image by Author] 生成的样式GAN插值[图片由作者提供] Generative Adversarial NetworkGenerative Adversarial Network (GAN) is a generative model that is able to generate new content. The topic has become really popular in the machine learning community due to its interesting applications such as generating synthetic training data, creating arts, style-transfer, image-to-image translation, etc. 生成式对抗网络 生成式对抗网络(GAN)是一种能够生成新内容的生成式模型。由于其有趣的应用,如生成合成训练数据、创造艺术、风格转换、图像到图像的翻译等,这个话题在机器学习界真的很受欢迎。 GAN Architecture [Image by Author] GAN架构 [图片由作者提供] GAN consisted of 2 networks, the generator, and the discriminator. The generator will try to generate fake samples and fool the discriminator into believing it to be real samples. The discriminator will try to detect the generated samples from both the real and fake samples. This interesting adversarial concept was introduced by Ian Goodfellow in 2014. There are already a lot of resources available to learn GAN, hence I will not explain GAN to avoid redundancy. I recommend reading this beautiful article by Joseph Rocca for understanding GAN. GAN由2个网络组成,即生成器和鉴别器。生成器将尝试生成假样本,并愚弄鉴别器,使其相信是真实样本。鉴别器将试图从真假样本中检测出生成的样本。这个有趣的对抗性概念是由Ian Goodfellow在2014年提出的。已经有很多资源可以用来学习GAN,因此为了避免冗余,我就不解释GAN了。 我推荐大家阅读Joseph Rocca写的这篇理解GAN的美文。 Understanding Generative Adversarial Networks (GANs) 理解生成式对抗网络(GANs) StyleGAN2The StyleGAN paper, “A Style-Based Architecture for GANs”, was published by NVIDIA in 2018. The paper proposed a new generator architecture for GAN that allows them to control different levels of details of the generated samples from the coarse details (eg. head shape) to the finer details (eg. eye-color). StyleGAN also incorporates the idea from Progressive GAN, where the networks are trained on lower resolution initially (4x4), then bigger layers are gradually added after it’s stabilized. By doing this, the training time becomes a lot faster and the training is a lot more stable. 风格GAN2 英伟达在2018年发表了StyleGAN论文《A Style-Based Architecture for GANs》。该论文为GAN提出了一种新的生成器架构,允许他们控制生成样本的不同细节水平,从粗糙的细节(如头部形状)到更精细的细节(如眼睛颜色)。 StyleGAN还融合了Progressive GAN的思想,即网络最初在较低分辨率(4x4)上进行训练,稳定后再逐步增加更大的层数。这样做,训练时间变得更快,训练也更稳定。 Progressive Growing GAN [Source: Sarah Wolf] StyleGAN improves it further by adding a mapping network that encodes the input vectors into an intermediate latent space, w, which then will have separate values be used to control the different levels of details. 渐进式增长GAN[来源:萨拉-沃尔夫] StyleGAN进一步改进了它,增加了一个映射网络,将输入向量编码成一个中间的潜伏空间,w,然后将有单独的值用来控制不同层次的细节。 StyleGAN Generator Architecture [Image by Author] StyleGAN生成器架构 [图片由作者提供]。 Why add a mapping network?One of the issues of GAN is its entangled latent representations (the input vectors, z). For example, let’s say we have 2 dimensions latent code which represents the size of the face and the size of the eyes. In this case, the size of the face is highly entangled with the size of the eyes (bigger eyes would mean bigger face as well). On the other hand, we can simplify this by storing the ratio of the face and the eyes instead which would make our model be simpler as unentangled representations are easier for the model to interpret. With entangled representations, the data distribution may not necessarily follow the normal distribution where we want to sample the input vectors z from. For example, the data distribution would have a missing corner like this which represents the region where the ratio of the eyes and the face becomes unrealistic. 为什么要增加一个映射网络? 
GAN的问题之一是它的纠缠潜码表示(输入向量,z)。例如,假设我们有2个维度的潜伏码,它代表了脸的大小和眼睛的大小。在这种情况下,人脸的大小与眼睛的大小高度纠缠在一起(眼睛越大也就意味着人脸越大)。另一方面,我们可以通过存储脸部和眼睛的比例来简化这个问题,这将使我们的模型更简单,因为无纠缠的表示方式更容易让模型解释。 在纠缠表示下,数据分布不一定遵循正态分布,我们希望从那里抽取输入向量z的样本。例如,数据分布会有这样一个缺角,它代表了眼睛和脸部的比例变得不现实的区域。 [Source: Paper] [来源:文件] If we sample the z from the normal distribution, our model will try to also generate the missing region where the ratio is unrealistic and because there Is no training data that have this trait, the generator will generate the image poorly. Therefore, the mapping network aims to disentangle the latent representations and warps the latent space so it is able to be sampled from the normal distribution. 如果我们从正态分布中对z进行采样,我们的模型会试图也生成缺失的区域,其中的比例是不现实的,由于没有具有这种特征的训练数据Is,生成器会生成不良的图像。因此,映射网络的目的是拆分潜伏表征,并扭曲潜伏空间,使其能够从正态分布中采样。 [Source: Paper] [来源:文件] Additionally, Having separate input vectors, w, on each level allows the generator to control the different levels of visual features. The first few layers (4x4, 8x8) will control a higher level (coarser) of details such as the head shape, pose, and hairstyle. The last few layers (512x512, 1024x1024) will control the finer level of details such as the hair and eye color. 此外,在每个层次上有单独的输入向量w,允许生成器控制不同层次的视觉特征。前几层(4x4,8x8)将控制更高级别的细节,如头型、姿势和发型。最后几层(512x512,1024x1024)将控制更精细的细节,如头发和眼睛的颜色。 Variations in coarse level details (head shape, hairstyle, pose, glasses) [Source: Paper] 粗层次细节的变化(头型、发型、姿势、眼镜)[来源:文件] Variation in fine level details (hair color) [Source: Paper] 细部细节的变化(发色)[来源:文件] For full details on StyleGAN architecture, I recommend you to read NVIDIA’s official paper on their implementation. Here is the illustration of the full architecture from the paper itself. 关于StyleGAN架构的完整细节,我推荐大家阅读NVIDIA官方关于他们实现的论文。下面是论文本身的完整架构图示。 [Source: A Style-Based Architecture for GANs Paper] [来源:基于风格的全球行动网文件架构] Stochastic Variation StyleGAN also allows you to control the stochastic variation in different levels of details by giving noise at the respective layer. Stochastic variations are minor randomness on the image that does not change our perception or the identity of the image such as differently combed hair, different hair placement and etc. You can see the effect of variations in the animated images below. 随机变化 StyleGAN还允许你通过在各自的图层上给予噪声来控制不同层次的细节的随机变化。随机变化是图像上的小随机性,不会改变我们的感知或图像的身份,如不同的梳理的头发,不同的头发位置等。你可以在下面的动画图像中看到变化的效果。 Coarse Stochastic Variation [Source: Paper] 粗随机变化[来源:论文] Fine Stochastic Variation [Source: Paper] 精细随机变化[来源:论文] StyleGAN also made several other improvements that I will not cover in these articles such as the AdaIN normalization and other regularization. You can read the official paper, this article by Jonathan Hui, or this article by Rani Horev for further details instead. StyleGAN还做了一些其他的改进,我在这些文章中就不一一介绍了,比如AdaIN的规范化和其他常规化。你可以阅读官方论文,Jonathan Hui的这篇文章,或者Rani Horev的这篇文章来代替阅读进一步的细节。 Truncation Trick When there is an underrepresented data in the training samples, the generator may not be able to learn the sample and generate it poorly. To avoid this, StyleGAN uses a “truncation trick” by truncating the intermediate latent vector w forcing it to be close to average. The
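As a rough illustration of that idea, a minimal NumPy sketch of pulling the intermediate latent w toward the average latent might look like this. The interpolation form and the psi value are assumptions based on the description above, not code taken from the paper:

import numpy as np

def truncate(w, w_avg, psi=0.7):
    # psi = 1 keeps w unchanged, psi = 0 collapses it to the average latent
    return w_avg + psi * (w - w_avg)

w_avg = np.zeros(512)       # running average of mapped latents (illustrative shape)
w = np.random.randn(512)    # a latent produced by the mapping network
w_truncated = truncate(w, w_avg)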

数学之美:贝叶斯优化
FlyAI文献翻译英文原文::The Beauty of Bayesian Optimization, Explained in Simple Terms 标签:深度学习 Here’s a function: f(x). It’s expensive to calculate, not necessarily an analytic expression, and you don’t know its derivative. Your task: find the global minima. This is, for sure, a difficult task, one more difficult than other optimization problems within machine learning. Gradient descent, for one, has access to a function’s derivatives and takes advantage of mathematical shortcuts for faster expression evaluation. Alternatively, in some optimization scenarios the function is cheap to evaluate. If we can get hundreds of results for variants of an input x in a few seconds, a simple grid search can be employed with good results. Alternatively, an entire host of non-conventional non-gradient optimization methods can be used, like particle swarming or simulated annealing. 问题定义:给定函数f(x),该函数计算成本高、甚至可能不是解析表达式,同时假定函数导数未知。 你的任务:找到函数得全局最小值。 这无疑是一项艰巨的任务,比机器学习中的其他优化问题还要困难。一般得优化问题可以通过以下三种方式求解: 梯度下降方法依赖函数求导,通过数学方法快速估计表达式。 函数的评估成本很低得优化场景下,可以在很短时间内获得输入x的许多结果,然后使用简单的网格搜索选择较好结果。 使用粒子群或模拟退火等非梯度优化方法。 Unfortunately, the current task doesn’t have these luxuries. We are limited in our optimization by several fronts, notably: It’s expensive to calculate. Ideally we would be able to query the function enough to essentially replicate it, but our optimization method must work with a limited sampling of inputs. The derivative is unknown. There’s a reason why gradient descent and its flavors still remain the most popular methods for deep learning, and sometimes, in other machine learning algorithms. Knowing the derivative gives the optimizer a sense of direction — we don’t have this. We need to find the global minima, which is a difficult task even for a sophisticated method like gradient descent. Our model somehow will need a mechanism to avoid getting caught in local minima. The solution: Bayesian optimization, which provides an elegant framework for approaching problems that resemble the scenario described to find the global minimum in the smallest number of steps. 然而,这些方法并不适用上述定义的问题,对定义问句的优化受到以下几个方面的限制: 计算成本高。理想情况下,我们可以多次执行函数以确定其最优解,但我们的优化问题中计算过多采样是不现实的。 导数未知。 正是因为导数可知,梯度下降及类似方法广泛应用于深度学习或某些机器学习算法。导数能够直到优化方向——不幸的是,在我们问题定义中没有导数。 要找到全局最小值,即使对于梯度下降这样的方法也不是容易的事情。因此,我们的模型需要某种机制避免陷入局部最小值。 解决方案:贝叶斯优化。该方法提供了一个优雅的框架可用于来解决上述定义的问题,并且能够在尽可能少的步骤中找到全局最小值。 Let’s construct a hypothetical example of function c(x), or the cost of a model given some input x. Of course, what the function looks like will be hidden from the optimizer; this is the true shape of c(x). This is known in the lingo as the ‘objective function’. 让我们构造一个函数c(x)或者一个接收输入x的模型,如下图所示为c(x)的形状。当然,优化器并不知道该函数,称之为“目标函数”。 Bayesian optimization approaches this task through a method known as surrogate optimization. For context, a surrogate mother is a women who agrees to bear a child for another person — in that context, a surrogate function is an approximation of the objective function. The surrogate function is formed based on sampled points. 贝叶斯优化通过代理优化的方式来完成任务。一般来说,surrogate mother是指为另一个人生育孩子的代孕妇女——在本文的情况中,则是指目标函数的近似。 代理函数通过采样点模拟构造(见下图)。 Based on the surrogate function, we can identify which points are promising minima. We decide to sample more from these promising regions and update the surrogate function accordingly. 根据代理函数,我们大致可以确定哪些点是可能的最小值。然后再这些点附近做更多的采样,并随之更新代理函数。 Each iteration, we continue to look at the current surrogate function, learn more about areas of interest by sampling, and update the function. 
Note that the surrogate function will be mathematically expressed in a way that is significantly cheaper to evaluate (e.g. y=x to be a an approximation for a more costly function, y=arcsin((1-cos²x)/sin x) within a certain range). After a certain number of iterations, we’re destined to arrive at a global minima, unless the function’s shape is very bizarre (in that it has large and wild up-and-down swings) at which a better question than optimization should be asked: what’s wrong with your data? Take a moment to marvel at the beauty of this approach. It doesn’t make any assumptions about the function (except that it is optimizable in the first place), doesn’t require information about derivatives, and is able to use common-sense reasoning through the ingenious use of a continually updated approximation function. The expensive evaluation of our original objective function is not a problem at all. 每一次迭代,我们都会继续观察当前的代用函数,通过采样了解更多感兴趣的区域,并更新函数。需要注意的是,代用函数在数学上的表达方式将大大降低评估成本(例如y=x是一个成本较高的函数的近似值,y=arcsin((1-cos²x)/sin x)在一定范围内)。 经过一定的迭代次数后,我们注定要到达一个全局最小值,除非函数的形状非常诡异(就是它的上下波动很大很疯狂),这时应该问一个比优化更好的问题:你的数据有什么问题? 花点时间惊叹一下这种方法的妙处。它不对函数做任何假设(除了它首先是可优化的),不需要导数的信息,并且能够通过巧妙地使用不断更新的逼近函数来进行常识性的推理。我们原来的目标函数的昂贵评估根本不是问题。 This is a surrogate-based approach towards optimization. So what makes it Bayesian, exactly? The essence of Bayesian statistics and modelling is the updating of a prior (previous) belief in light of new information to produce an updated posterior (‘after’) belief. This is exactly what surrogate optimization in this case does, so it can be best represented through Bayesian systems, formulas, and ideas. Let’s take a closer look at the surrogate function, which are usually represented by Gaussian Processes, which can be thought of as a dice roll that returns functions fitted to given data points (e.g. sin, log) instead of numbers 1 to 6. The process returns several functions, which have probabilities attached to them. 这是一种基于代用的优化方法。那么,到底是什么让它成为贝叶斯的呢? 贝叶斯统计和建模的本质是根据新的信息更新前(前)信念,以产生一个更新的后(’后’)信念。这正是本案例中代偿优化的作用,所以可以通过贝叶斯系统、公式和思想来最好地表示。 让我们仔细看看代用函数,通常用高斯过程来表示,它可以被认为是掷骰子,返回与给定数据点(如sin、log)拟合的函数,而不是1到6的数字。这个过程会返回几个函数,这些函数都附有概率。 Left: Several Gaussian process-generated functions for four data points. Right: The functions aggregated. Source: Oscar Knagg, image free to share. This article by Oscar Knagg gives good intuition on how GPs work. 左图:四个数据点的几个高斯过程生成的函数。右图:函数汇总。来源:Oscar Knagg,图片免费分享。 Oscar Knagg的这篇文章对GP的工作原理有很好的直观认识。 There’s a good reason why Gaussian Processes, and not some other curve-fitting method, is used to model the surrogate function: it is Bayesian in nature. A GP is a probability distribution, like a distribution of end results of an event (e.g. 1/2 chance of a coin flip), but over all possible functions. For instance, we may define the current set of data points as being 40% representable by function a(x), 10% by function b(x), etc. By representing the surrogate function as a probability distribution, it can be updated with new information through inherently probabilistic Bayesian processes. Perhaps when new information is introduced, the data is only 20% representable by function a(x). These changes are governed by Bayesian formulas. This would be difficult or even impossible to do with, say, a polynomial regression fit to new data points. 
为什么用高斯过程,而不是其他的曲线拟合方法来模拟代用函数,有一个很好的理由:它是贝叶斯性质的。一个GP是一个概率分布,就像一个事件最终结果的分布(例如抛硬币的1/2机会),但在所有可能的函数上。 例如,我们可以将当前的数据点集定义为40%可由函数a(x)表示,10%可由函数b(x)表示,等等。通过将代用函数表示为一个概率分布,它可以通过固有的概率贝叶斯过程与新信息进行更新。也许当引入新的信息时,数据只有20%可以用函数a(x)表示。这些变化是由贝叶斯公式来支配的。 这将是很难甚至不可能做到的,比如说,对新数据点进行多项式回归拟合。 The surrogate function — represented as a probability distribution, the prior — is updated with an ‘acquisition function’. This function is responsible for driving the proposition of new points to test, in an exploration and exploitation trade-off: Exploitation seeks to sample where the surrogate model predicts a good objective. This is taking advantage of known promising spots. However, if we have already explored a certain region enough, continually exploiting known information will yield little gain. Exploration seeks to sample in locations where the uncertainty is high. This ensures that no major region of the space is left unexplored — the global minima may happen to lie there. An acquisition function that encourages too much exploitation and too little exploration will lead to the model to reside only a minima it finds first (usually local — ‘going only where there is light’). An acquisition function that encourages the opposite will not stay in a minima, local or global, in the first place. Yielding good results in a delicate balance. The acquisition function, which we’ll denote a(x), must consider both exploitation and exploration. Common acquisition functions include expected improvement and maximum probability of improvement, all of which measure the probability a specific input may pay off in the future, given information about the prior (the Gaussian process). 代用函数—表示为概率分布,即先验—被更新为 “获取函数”。这个函数负责在勘探和开发的权衡中提出新的测试点。 剥削力求在代用模型预测的目标好的地方采样。这就是利用已知的有希望的点。但是,如果我们已经对某一区域进行了足够的探索,那么不断地利用已知的信息就不会有什么收获。 探索力求在不确定性较高的地点进行采样。这就确保了空间的任何主要区域都不会未被探索—全局最小值可能恰好就在那里。 一个鼓励过多的开发和过少探索的获取函数将导致模型只停留在它首先发现的最小值(通常是局部的—“只去有光的地方”)。一个鼓励相反的获取函数将不会首先停留在一个最小值,本地或全球。在微妙的平衡中产生良好的结果。 acquisition 函数,我们将其表示为a(x),必须同时考虑开发和探索。常见的获取函数包括预期改进和最大改进概率,所有这些函数都是在给定先验信息(高斯过程)的情况下,衡量特定投入在未来可能得到回报的概率。 Let’s put the pieces together. Bayesian optimization can be performed as such: Initialize a Gaussian Process ‘surrogate function’ prior distribution. Choose several data points x such that the acquisition function a(x) operating on the current prior distribution is maximized. Evaluate the data points x in the objective cost function c(x) and obtain the results, y. Update the Gaussian Process prior distribution with the new data to produce a posterior (which will become the prior in the next step). Repeat steps 2–5 for several iterations. Interpret the current Gaussian Process distribution (which is very cheap to do) to find the global minima. Bayesian optimization is all about putting probabilistic ideas behind the idea of surrogate optimization. The combination of these two idea creates a powerful system with many applications, from pharmaceutical product development to autonomous vehicles. 让我们把这些东西整合起来。贝叶斯优化可以这样进行。 1.初始化一个高斯过程 “代用函数 “的先验分布。 2.选择几个数据点x,使在当前先验分布上运行的获取函数a(x)最大化。 3.评估目标成本函数c(x)中的数据点x,得到结果,y。 4.用新的数据更新高斯过程先验分布,以产生一个后验(它将成为下一步的先验)。 5.重复步骤2-5进行多次迭代。 6.解释当前的高斯过程分布(这是非常便宜的),以找到全局最小值。 贝叶斯优化就是把概率论的思想放在代入优化的思想后面。这两种思想的结合创造了一个强大的系统,从医药产品的开发到自主汽车,都有很多应用。 Most commonly in machine learning, however, Bayesian optimization is used for hyperparameter optimization. For instance, if we’re training a gradient boosting classifier, there are dozens of parameters, from the learning rate to the maximum depth to the minimum impurity split value. 
In this case, x represents the hyperparameters of the model, and c(x) represents the performance of the model, given hyperparameters x. The primary motivation for using Bayesian optimization is in scenarios where it is very expensive to evaluate the output. Firstly, an entire ensemble of trees needs to be built with the parameters, and secondly, they need to run through several predictions, which are expensive for ensembles. Arguably, neural network evaluation of the loss for a given set of parameters is faster: simply repeated matrix multiplication, which is very fast, especially on specialized hardware. This is one of the reasons gradient descent is used, which makes repeated queries to understand where it is going. 但在机器学习中,最常见的是贝叶斯优化用于超参数优化。例如,如果我们要训练一个梯度提升分类器,从学习率到最大深度再到最小杂质分割值,有几十个参数。在这种情况下,x代表模型的超参数,c(x)代表模型的性能,给定超参数x。 使用贝叶斯优化的主要动机是在评估输出非常昂贵的情况下。首先,需要用参数建立整个树的合集,其次,它们需要通过多次预测来运行,这对于合集来说是非常昂贵的。 可以说,神经网络评估给定参数集的损失更快:简单的重复矩阵乘法,速度非常快,尤其是在专用硬件上。这也是使用梯度下降的原因之一,它使反复查询了解其走向。 In summary: Surrogate optimization uses a surrogate, or approximation, function to estimate the objective function through sampling. Bayesian optimization puts surrogate optimization in a probabilistic framework by representing surrogate functions as probability distributions, which can be updated in light of new information. Acquisition functions are used to evaluate the probability that exploring a certain point in space will yield a ‘good’ return given what is currently known from the prior, balancing exploration & exploitation. Use Bayesian optimization primarily when the objective function is expensive to evaluate, commonly used in hyperparameter tuning. (There are many libraries like HyperOpt for this.) Thanks for reading! 综上所述: 代用优化利用代用函数或近似函数通过抽样来估计目标函数。 贝叶斯优化将代用优化置于概率框架中,将代用函数表示为概率分布,可以根据新的信息进行更新。 获取函数用于评估在当前已知的先验条件下,探索空间中某一点会产生 “好 “收益的概率,平衡探索与开发 主要在目标函数评估成本很高的时候使用贝叶斯优化,常用于超参数调整。(这方面有很多库,比如HyperOpt)。 感谢您的阅读!
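To make the six steps above concrete, here is a small self-contained sketch of the loop using scikit-learn's GaussianProcessRegressor as the surrogate and Expected Improvement as the acquisition function. The one-dimensional objective c(x) is only a stand-in for a genuinely expensive function:

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def c(x):
    # placeholder "expensive" objective for the sketch
    return np.sin(3 * x) + 0.1 * x ** 2

bounds = (-3.0, 3.0)
rng = np.random.default_rng(0)

# step 1: a few initial samples to form the surrogate prior
X = rng.uniform(bounds[0], bounds[1], size=(4, 1))
y = c(X).ravel()
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

def expected_improvement(x_cand, gp, y_best, xi=0.01):
    mu, sigma = gp.predict(x_cand, return_std=True)
    improvement = y_best - mu - xi              # we are minimizing c(x)
    z = improvement / np.maximum(sigma, 1e-9)
    return improvement * norm.cdf(z) + sigma * norm.pdf(z)

for _ in range(20):                             # steps 2-5: the optimization loop
    gp.fit(X, y)                                # update the posterior with all data so far
    x_cand = np.linspace(bounds[0], bounds[1], 1000).reshape(-1, 1)
    x_next = x_cand[np.argmax(expected_improvement(x_cand, gp, y.min()))]
    y_next = c(x_next)                          # the expensive evaluation
    X = np.vstack([X, x_next])
    y = np.append(y, y_next)

# step 6: read off the best point found so far
print("approximate minimum:", X[np.argmin(y), 0], y.min())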

深度详解:对象检测和图像分割的数据探索过程
FlyAI文献翻译英文原文:Data Exploration Process for Object Detection And Image Segmentation 标签:图像分割 I’ve been working with object detection and image segmentation problems for many years. An important realization I made is that people don’t put the same amount of effort and emphasis on data exploration and results analysis as they would normally in any other non-image machine learning project. Why is it so?I believe there are two major reasons for it: People don’t understand object detection and image segmentation models in depth and treat them as black boxes, in that case they don’t even know what to look at and what the assumptions are. It can be quite tedious from a technical point of view as we don’t have good image data exploration tools. In my opinion image datasets are not really an exception, understanding how to adjust the system to match our data is a critical step to success. In this article, I will share with you how I approach data exploration for image segmentation and object detection problems. Specifically: Why you should care about image and object dimensions, Why small objects can be problematic for many deep learning architectures, Why tackling class imbalances can be quite hard, Why a good visualization is worth a thousand metrics, The pitfalls of data augmentation. 多年来,我一直致力于研究目标检测和图像分割问题。我的一个重要认识是,人们在数据探索和结果分析上没有像在其他非图像机器学习项目中那样投入同样的精力和重心。 为什么会这样呢?我觉得有两个主要的原因: 人们不深入了解目标检测和图像分割模型,而是把它们当作黑盒子,在这种情况下,他们甚至不知道要看的是什么和假设是什么。从技术角度来看,这是相当乏味的,因为我们没有好的图像数据探测工具。在我看来,图像数据集并不是真正的例外,理解如何调整系统来匹配我们的数据是成功的关键一步。 在本文中,我将与您分享我是如何处理图像分割和目标检测问题的数据探索的。具体地说: 为什么要关注图像和物体的尺寸,为什么对于许多深度学习架构来说,小对象可能是一个问题,为什么解决类别失衡可能非常困难,为什么一个好的可视化效果值一千个技巧数据扩充的陷阱。 The Need for Data Exploration for Image Segmentation and Object DetectionData exploration is key to a lot of machine learning processes. That said, when it comes to object detection and image segmentation datasets there is no straightforward way to systematically do data exploration. There are multiple things that distinguish working with regular image datasets from object and segmentation ones: The label is strongly bound to the image. Suddenly you have to be careful of whatever you do to your images as it can break the image-label-mapping. Usually much more labels per image. Much more hyperparameters to tune (especially if you train on your custom datasets) This makes evaluation, results exploration and error analysis much harder. You will also find that choosing a single performance measure for your system can be quite tricky – in that case manual exploration might still be a critical step. Data Quality and Common ProblemsThe first thing you should do when working on any machine learning problem (image segmentation, object detection included) is assessing quality and understanding your data. Common data problems when training Object Detection and Image Segmentation models include: Image dimensions and aspect ratios (especially dealing with extreme values) Labels composition – imbalances, bounding box sizes, aspect ratios (for instance a lot of small objects) Data preparation not suitable for your dataset. Modelling approach not aligned with the data. Those will be especially important if you train on custom datasets that are significantly different from typical benchmark datasets such as COCO. In the next chapters, I will show you how to spot the problems I mentioned and how to address them. 
数据挖掘对于图像分割和目标检测的需要 数据探索是很多机器学习过程的关键。也就是说,当涉及到目标检测和图像分割数据集时,没有直接的方法进行系统地数据探索。 在处理常规图像数据集和分割图像数据集时,有很多东西是可以区分的: 标签被强绑定在图像上。您必须非常小心对图像所做的任何操作,因为它可能破坏图像-标签-映射。通常每个图像有更多的标签。还有更多需要调整的超参数(特别是如果您对自定义数据集进行训练)这使得评价、结果探索和错误分析变得更加困难。你还会发现,给你的系统选择单一的性能度量可能会变得非常棘手,在这种情况下,手工探索可能仍然是关键的一步。 数据质量和常见问题在处理任何机器学习问题(包括图像分割,对象检测)时,你应该做的第一件事是评估质量并理解你的数据。 图像尺寸和纵横比(特别是处理极值时)。标签组合不平衡,边框大小,长宽比(例如许多小对象)。准备的数据不适合你的数据集。建模方法与数据不一致。如果您训练的自定义数据集与经典基准数据集(例如COCO)有显着差异,那么这些将特别重要。 在下一章中,我将向您展示如何发现我提到的问题以及如何解决它们。 General Data Quality This one is simple and rather obvious, also this step would be the same for all image problems not just object detection or image segmentation. What we need to do here is: get the general feel of a dataset and inspect it visually. make sure it’s not corrupt and does not contain any obvious artifacts (for instance black only images) make sure that all the files are readable – you don’t want to find that out in the middle of your training. My tip here is to visualize as many pictures as possible. There are multiple ways of doing this. Depending on the size of the datasets some might be more suitable than the others. Plot them in a jupyter notebook using matplotlib.Use dedicated tooling like google facets to explore image data (https://pair-code.github.io/facets/) Use HTML rendering to visualize and explore in a notebook. I’m a huge fan of the last option, it works great in jupyter notebooks (even for thousands of pictures at the same time!) Try doing that with matplotlib. There is even more: you can install a hover-zoom extension that will allow you to zoom in into individual pictures to inspect them in high-resolution. 通用数据质量这一步很简单,也很明显,而且这一步对所有的图像问题都是一样的,不仅仅是目标检测或图像分割。我们要做的是: 获得数据集的总体感觉并进行检查。确保它没有损坏并且不包含任何明显的伪像(例如仅黑色图像)。确保所有的文件都是可读的,你可不想在训练过程中发现这个问题。我的建议是要尽可能多地可视化图片。 有多种方法可以做到这一点。 根据数据集的大小,某些数据可能比其他数据更合适。 使用matplotlib将它们绘制在jupyter笔记本中。使用诸如Google facets之类的专用工具来浏览图像数据(https://pair-code.github.io/facets/) 使用HTML渲染在笔记本中可视化和探索 我是最后一个选择的忠实粉丝,它在jupyter note电脑上非常有效(甚至可以同时处理数千张图片!)请尝试使用matplotlib进行操作。 还有更多功能:您可以安装一个悬停缩放扩展程序,以便将其放大到单个图片中以高分辨率对其进行检查。 Image sizes and aspect Ratios In the real world, datasets are unlikely to contain images of the same sizes and aspect ratios. Inspecting basic datasets statistics such as aspect ratios, image widths and heights will help you make important decisions: Can you and should you? do destructive resizing ? (destructive means resizing that changes the AR) For non-destructive resizing what should be your desired output resolution and amount of padding? Deep Learning models might have hyper parameters you have to tune depending on the above (for instance anchor size and ratios) or they might even have strong requirements when it comes to minimum input image size. Good resources about anchors. A special case would be if your dataset consists of images that are really big (4K+), which is not that unusual in satellite imagery or some medical modalities. For most cutting edge models in 2020, you will not be able to fit even a single 4K image per (server grade) GPU due to memory constraints. In that case, you need to figure out what realistically will be useful for your DL algorithms. Two approaches that I saw are: Training your model on image patches (randomly selected during training or extracted before training) resizing the entire dataset to avoid doing this every time you load your data. 
图像大小和长宽比在现实世界中,数据集不太可能包含相同大小和高宽比的图像。检查基本的数据集统计信息,如纵横比、图像宽度和高度,将帮助您做出重要的决定: 你能和应该吗?破坏性的大小调整?(破坏性意味着改变AR的大小)对于非破坏性的大小调整应该是你想要的输出分辨率和填充量深度学习模型可能会有一些你必须根据上面的参数来调整的超参数(例如锚的大小和比率),或者当涉及到最小输入图像大小时,它们甚至可能有很强的要求。关于锚点的资源 一种特殊情况是,如果您的数据集包含非常大的图像(4K +),这在卫星图像或某些医疗模式中并不罕见。 对于2020年的大多数尖端型号,由于内存限制,每个(服务器级)GPU甚至无法容纳单个4K图像。 在这种情况下,您需要弄清楚什么对你的深度学习算法有用。 我使用的两个方法: 在图像补丁上训练你的模型(在训练中随机选择或在训练前提取)调整整个数据集的大小,以避免每次加载数据时都这样做 In general I would expect most datasets to fall into one of 3 categories. Uniformly distributed where most of the images have the same dimensions – here the only decision you will have to make is how much to resize (if at all) This will mainly depend on objects area, size and aspect ratios) Slightly bimodal distribution but most of the images are in the aspect ratio range of (0.7 … 1.5) similar to the COCO dataset. I believe other “natural-looking” datasets would follow a similar distribution – for those type of datasets you should be fine by going with a non-destructive resize -> Pad approach. Padding will be necessary but to a degree that is manageable and will not blow the size of the dataset too much Dataset with a lot of extreme values (very wide images mixed with very narrow ones) – this case is much more tricky and there are more advanced techniques to avoid excessive padding. You might consider sampling batches of images based on the aspect ratio. Remember that this can introduce a bias to your sampling process – so make sure its acceptable or not strong enough. The mmdetection framework supports this out of the box by implementing a GroupSampler that samples based on AR’s 均匀分布在大多数图像具有相同尺寸的地方–在这里,您唯一要做的决定就是调整大小(如果有的话),这主要取决于对象的面积,大小和纵横比)。 有点双峰分布,但大多数图像的纵横比在(0.7…1.5)范围内,与COCO数据集相似。 我相信其他``看起来自然’’的数据集也会遵循类似的分布-对于那些类型的数据集,您应该使用无损调整大小->填充方法来解决问题。 填充是必需的,但在一定程度上是可以控制的,不会过多地破坏数据集的大小。 具有很多极值的数据集(非常宽的图像和非常狭窄的图像混合在一起)–这种情况更加棘手,并且有更先进的技术来避免过度填充。 您可能考虑根据宽高比对图像批次进行采样。 请记住,这可能会给您的采样过程带来偏差-因此请确保其可接受程度或不足。mmdetection框架通过实现一个GroupSampler来支持此功能,该GroupSampler基于AR的采样。 Label (objects) sizes and dimensions Here we start looking at our targets (labels). Particularly we are interested in knowing how the sizes and aspect ratios are distributed. Why is this important? Depending on your modelling approach most of the frameworks will have design limitations. As I mentioned earlier, those models are designed to perform well on benchmark datasets. If for whatever reason your data is different, training them might be impossible. Let’s have a look at a default config for Retinanet from detectron2: 标签(对象)的大小和尺寸 在这里,我们开始查看目标(标签),我们特别想知道尺寸和纵横比的分布方式。 为什么这很重要? 根据您的建模方法,大多数框架都会有设计限制。 如前所述,这些模型旨在在基准数据集上表现良好。 如果出于任何原因您的数据不同,则可能无法对其进行训练。让我们看一下detectron2中Retinanet的默认配置: ANCHOR_GENERATOR: SIZES: !!python/object/apply:eval ["[[x, x * 2**(1.0/3), x * 2**(2.0/3) ] for x in [32, 64, 128, 256, 512 ]]"] What you can see there is, that for different feature maps the anchors we generate will have a certain size range: for instance, if your dataset contains only really big objects – it might be possible to simplify the model a lot, on the other side let’s assume you have small images with small objects (for instance 10x10px) given this config it can happen you will not be able to train the model. The most important things to consider when it comes to box or mask dimensions are: Aspect ratios Size (Area) 你可以看到的是,对于不同的特征地图,我们生成的锚将有一定的大小范围: 例如,如果您的数据集只包含真正大的对象,那么就有可能大大简化模型。另一方面,让我们假设你有小的图像和小的对象(例如10x10px),在这种配置下,你可能无法训练模型。关于盒子或遮罩尺寸,要考虑的最重要的事情是: 纵横比区域大小 The tail of this distribution (fig. 3) is quite long. 
There will be instances with extreme aspect ratios. Depending on the use case and dataset it might be fine to ignore it or not, this should be further inspected. 这种分布的尾巴很长(图3)。 某些情况下会出现极高的宽高比。 根据用例和数据集,是否忽略它可能会很好,应对此进行进一步检查。 This is especially true for anchor-based models (most of object detection / image segmentation models) where there is a step of matching ground truth labels with predefined anchor boxes (aka. Prior boxes). Remember that you control how those prior boxes are generated with hyperparameters like the number of boxes, their aspect ratio, and size. Not surprisingly you need to make sure those settings are aligned with your dataset distributions and expectations. 对于基于锚点的模型(大多数目标检测/图像分割模型)尤其如此,其中有一个步骤将真实标签与预定义的锚点框(也称为先验框)匹配。 请记住,您可以使用框的数量、它们的高宽比和大小等超参数来控制这些框的生成方式。毫不奇怪,您需要确保这些设置与您的数据集分布和期望保持一致。 An important thing to keep in mind is that labels will be transformed together with the image. So if you are making an image smaller during a preprocessing step the absolute size of the ROI’s will also shrink. If you feel that object size might be an issue in your problem and you don’t want to enlarge the images too much (for instance to keep desired performance or memory footprint) you can try to solve it with a Crop -> Resize approach. Keep in mind that this can be quite tricky (you need to handle cases what happens if you cut through a bounding box or segmentation mask) Big objects on the other hand are usually not problematic from a modelling perspective (although you still have to make sure that will be matched with anchors). The problem with them is more indirect, essentially the more big objects a class has the more likely it is that it will be underrepresented in the dataset. Most of the time the average area of objects in a given class will be inversely proportional to the (label) count. 需要记住的一件重要的事情是,标签将与图像一起转换。因此,如果你在预处理过程中缩小图像,ROI的绝对尺寸也会缩小。 如果您觉得问题可能是对象大小的问题,并且不想太大放大图像(例如保持所需的性能或内存占用),则可以尝试使用Crop-> Resize方法解决它。 请记住,这可能非常棘手(您需要处理穿过边界框或分割蒙版时发生的情况)。 另一方面,从建模的角度来看,大对象通常没有问题(尽管您仍然必须确保将其与锚点匹配)。 它们的问题更加间接,本质上说,一类具有的大对象越多,它在数据集中的代表性就越低。 在大多数情况下,给定类中对象的平均面积将与(标签)计数成反比。 Partially labeled data When creating and labeling an image detection dataset missing annotations are potentially a huge issue. The worst scenario is when you have false negatives already in your ground truth. So essentially you did not annotate objects even though they are present in the dataset. In most of the modeling approaches, everything that was not labeled or did not match with an anchor is considered background. This means that it will generate conflicting signals that will hurt the learning process a LOT. This is also a reason why you can’t really mix datasets with non-overlapping classes and train one model (there are some way to mix datasets though – for instance by soft labeling one dataset with a model trained on another one) 部分标记的数据 在创建和标记图像检测数据集时,缺少注释是一个潜在的大问题。最糟糕的情况是,在真实标记框中标记成负类样本。所以即使它们在数据集中出现,本质上你并没有注释对象。 在大多数建模方法中,没有标记或与锚不匹配的所有东西都被认为是背景。这意味着它会产生相互矛盾的信号,这会对学习过程造成很大的伤害。 这也是为什么您不能真正地将数据集与不重叠的类混合并训练一个模型的原因(尽管有一些方法可以混合数据集,例如,用一个训练有素的模型对一个数据集进行软标记) Fig 8. Shows the problem of mixing datasets – notice for example that on the right image a person is not labeled. One way to solve this problem is to soft label the dataset with a model trained on the other one. Source 图8所示。显示混合数据集的问题,注意例如,在右边的图像上一个人没有标签。解决这个问题的一种方法是用对另一个数据集进行训练的模型对数据集进行软标记。源图片 Imbalances Class imbalances can be a bit of a problem when it comes to object detection. 
Normally in image classification for example, one can easily oversample or downsample the dataset and control each class contribution to the loss. 不平衡问题 当涉及到目标检测时,类不平衡可能是一个问题。通常在图像分类中,例如,人们可以很容易地对数据集进行过采样或向下采样,并控制每个类对损失的贡献。 Fig 9. Object counts per class You can imagine this is more challenging when you have co-occurring classes object detection dataset since you can’t really drop some of the labels (because you would send mixed signals as to what the background is). In that case you end up having the same problem as shown in the partially labeled data paragraph. Once you start resampling on an image level you have to be aware of the fact that multiple classes will be upsampled at the same time. Note: You may want to try other solutions like: Adding weights to the loss (making the contributions of some boxes or pixels higher) Preprocessing your data differently: for example you could do some custom cropping that rebalances the dataset on the object level 图9所示 每个类的对象数量 您可以想象,在同时存在类对象检测数据集的情况下,这将更具挑战性,因为您无法真正删除某些标签(因为您会发送关于背景是什么的混合信号)。 在这种情况下,您最终会遇到与部分标记的数据段落中所示相同的问题。 一旦开始在图像级别上重新采样,就必须意识到多个类将同时被上采样的事实。 注意: 你可能试过这样的方法: 增加损失的权重(使某些方框或像素的贡献更高)。以不同的方式预处理数据:例如,您可以进行一些自定义裁剪,以在对象级别重新平衡数据集。 Understanding Augmentation and Preprocessing Sequences Preprocessing and data augmentation is an integral part of any computer vision system. If you do it well you can gain a lot but if you screw up it can really cost you. Data augmentation is by far the most important and widely used regularization technique (in image segmentation / object detection ). Applying it to object detection and segmentation problems is more challenging than in simple image classification because some transformations (like rotation, or crop) need to be applied not only to the source image but also to the target (masks or bounding boxes). Common transformations that require a target transform include: Affine transformations, Cropping, Distortions, Scaling, Rotations and many more. It is crucial to do data exploration on batches of augmented images and targets to avoid costly mistakes (dropping bounding boxes, etc). Note: Basic augmentations are a part of deep learning frameworks like PyTorch or Tensorflow but if you need more advanced functionalities you need to use one of the augmentation libraries available in the python ecosystem. My recommendations are: Albumentations (I’ll use it in this post) Imgaug Augmentor 理解增强和预处理序列 预处理和数据增强是任何计算机视觉系统的组成部分。 如果做得好,你会收获很多,但是如果搞砸了,那确实会花钱。 迄今为止,数据增强是最重要且使用最广泛的正则化技术(在图像分割/目标检测中)。 将其应用于目标检测和分割问题比简单的图像分类更具挑战性,因为一些转换(如旋转或裁剪)不仅需要应用于源图像,还需要应用于目标图像(遮罩或边框)。需要目标转换的常见转换包括: 仿射变换裁剪扭曲尺度变化旋转等等对成批的增强图像和目标进行数据探索,以避免代价高昂的错误(丢弃边界框等)至关重要。 注意: 基本增强是PyTorch或Tensorflow等深度学习框架的一部分,但如果您需要更高级的功能,则需要使用python生态系统中可用的扩充库之一。 我的建议是: Albumentations (本文中我使用的是这个) Imgaug Augmentor The Minimal Preprocessing Setup Whenever I’m building a new system I want to keep it very basic on the preprocessing and augmentation level to minimize the risk of introducing bugs early on. Basic principles I would recommend you to follow is: Disable augmentation Avoid destructive resizing Always inspect the outputs visually Let’s continue our COOC example. From the previous steps we know that:the majority of our images have: aspect ratios = width / height = 1.5 the average avg_width is = 600 and avg_height = 500. 
Setting the averages as our basic preprocessing resize values seems to be a reasonable thing to do ( unless there is a strong requirement on the model side to have bigger pictures ) for instance a resnet50 backbone model has a minimum size requirement of 32×32 (this is related to the number of downsampling layers) In Albumentations the basic setup implementation will look something like this: LongestMaxSize(avg_height) – this will rescale the image based on the longest side preserving the aspect ratio PadIfNeeded(avg_height, avg_width, border_mode=’FILL’, value=0) 最小的预处理设置 每当我构建新系统时,我都希望在预处理和扩充级别上保持非常基础,以最大程度地降低早期引入错误的风险。 我建议您遵循的基本原则是: 禁用增强避免破坏性的调整总是可视化地检查输出让我们继续COOC的例子。从前面的步骤我们知道:我们的大多数图像: 宽高比=宽度/高度= 1.5平均宽度= 600,avg高度= 500将平均值设置为我们的基本预处理调整大小值似乎是一件合理的事情(除非模型方面有很强的要求才能拥有更大的图片),例如resnet50主干模型的最小大小要求为32×32(这是 与下采样层数有关) 在Albumentations中,基本设置实现将如下所示: 这将根据保持长宽比的最长边来缩放图像PadIfNeeded(avg height, avg width, border mode= FILL , value=0) Fig 10 and 11. MaxSize->Pad output for two pictures with drastically different aspect ratios As you can see on figure 10 and 11 the preprocessing results in an image of 500×600 with reasonable 0-padding for both pictures. When you use padding there are many options in which you can fill the empty space. In the basic setup I suggest that you go with default constant 0 value, When you experiment with more advanced methods like reflection padding always explore your augmentations visually. Remember that you are running the risk of introducing false negatives especially in object detection problems (reflecting an object without having a label for it) 图10和11 两张图片的纵横比截然不同的MaxSize-> Pad输出 正如您在图10和图11中看到的,预处理结果是500600的图像,这两幅图像都有合理的0填充。 当您使用填充时,有许多选项可以填充空白区域。在基本设置中,我建议您使用默认常量0值。 当您尝试使用诸如反射填充之类的更高级的方法时,请始终以可视方式探索您的增强。 请记住,您冒着引入假阴性的风险,尤其是在物体检测问题中(反射物体而没有标签)。 Fig 12. Notice how reflection-padding creates false negative errors in our annotations. The cat’s reflection (top of the picture) has no label! 图12.注意反射填充如何在我们的注释中产生假的负错误。 猫的倒影(图片顶部)没有标签! Augmentation – Rotations Rotations are powerful and useful augmentations but they should be used with caution. Have a look at fig 13. below which was generated using a Rotate(45)->Resize->Pad pipeline. 数据增强 —- 旋转 旋转是强大和有用的增强,但他们应该谨慎使用。请看图13。下面是使用一个Rotate(45)->调整大小->填充生成的。 Fig 13. Rotations can be harmful to your bounding box labels The problem is that if we use standard bounding boxes (without an angle parameter), covering a rotated object can be less efficient (box-area to object-area will increase). This happens during rotation augmentations and it can harm the data. Notice that we have also introduced false positive labels in the top left corner. This is because we crop-rotated the image. My recommendation is: You might want to give up on those if you have a lot of objects with aspect ratios far from one. Another thing you can consider is using 90,180, 270 degree non-cropping rotations (if they make sense) for your problem (they will not destroy any bounding boxes) 图13所示。旋转可能对你的边框标签有害 问题是,如果我们使用标准的边界框(不带角度参数),则覆盖旋转的对象的效率可能会降低(框区域到对象区域的面积会增加)。 这种情况在旋转增强期间发生,并且可能会损坏数据。 请注意,我们还在左上角引入了误报标签。 这是因为我们裁剪了图像。 我的建议是: 如果您有许多对象的宽高比远非一个,则可能要放弃这些对象。你可以考虑使用90,180,270度的非裁剪旋转(如果它们有意义的话)来解决你的问题(它们不会破坏任何边框)。 Augmentations – Key takeaways As you see, spatial transforms can be quite tricky and a lot of unexpected things can happen (especially for object detection problems). So if you decide to use those spatial augmentations make sure to do some data exploration and visually inspect your data. Note: Do you really need spatial augmentations? 
I believe that in many scenarios you will not need them and as usual keep things simpler and gradually add complexity. From my experience a good starting point (without spatial transforms) and for natural looking datasets (similar to coco) is the following pipeline: 数据增强 —- 关键点 正如您所看到的,空间转换非常棘手,可能会发生许多意想不到的事情(特别是对于目标检测问题)。 因此,如果您决定使用这些空间扩展,请确保进行一些数据探索并可视化地检查您的数据。 注意: 你真的需要增加空间吗?我相信在许多情况下,您将不需要它们,并像通常那样使事情更简单并逐渐增加复杂性。 根据我的经验,对于看起来自然的数据集(类似coco)来说,下面的通道是一个很好的起点(没有空间转换): transforms = [ LongestMaxSize(max_size=500), HorizontalFlip(p=0.5), PadIfNeeded(500, 600, border_mode=0, value=0), JpegCompression(quality_lower=70, quality_upper=100, p=1), RandomBrightnessContrast(0.3, 0.3), Cutout(max_h_size=32, max_w_size=32, p=1) ] Of course things like max_size or cutout sizes are arbitrary and have to be adjusted. 当然,像最大尺寸或切断尺寸是任意的,必须调整。 Fig 14. Augmentation results with cutout, jpeg compression and contrast/brightness adjustments Best Practice:One thing I did not mention yet that I feel is pretty important: Always load the whole dataset (together with your preprocessing and augmentation pipeline) . 图14.带有剪切,jpeg压缩和对比度/亮度调整的增强结果 最佳做法:我没有提到的一件事我觉得非常重要:始终加载整个数据集(连同您的预处理和扩充通道)。 %%timeit -n 1 -r 1for b in data_loader: pass Two lines of code that will save you a lot of time. First of all, you will understand what the overhead of the data loading is and if you see a clear performance bottleneck you might consider fixing it right away. More importantly, you will catch potential issues with: corrupted files, labels that can’t be transformed etc anything fishy that can interrupt training down the line. 两行代码将节省您很多时间。首先,您将了解数据加载的开销是什么,如果您看到一个明显的性能瓶颈,您可能会考虑立即修复它。更重要的是,您将捕获潜在的问题: 损坏的文件,无法转换的标签等任何会干扰训练的可疑内容。 Understanding Results Inspecting model results and performing error analysis can be a tricky process for those types of problems. Having one metric rarely tells you the whole story and if you do have one interpreting it can be a relatively hard task. Let’s have a look at the official coco challenge and how the evaluation process looks there (all the results i will be showing are for a MASK R-CNN model with a resnet50 backbone). 看懂结果 对于这些类型的问题,检查模型结果和执行错误分析可能是一个棘手的过程。只有一个度量标准很少能告诉你整个故事,如果你有一个度量标准来解释它,那将是一项相对困难的任务。 让我们看看官方的coco挑战和评估过程是如何进行的(我将展示的所有结果都是一个带有resnet50主干的MASK R-CNN模型)。 Fig 15. Coco evaluation output It returns the AP and AR for various groups of observations partitioned by IOU (Intersection over Union of predictions and ground truth) and Area. So even the official COCO evaluation is not just one metric and there is a good reason for it. Lets focus on the IoU=0.50:0.95 notation. What this means is the following: AP and AR is calculated as the average of precisions and recalls calculated for different IoU settings (from 0.5 to 0.95 with a 0.05 step). What we gain here is a more robust evaluation process, in such a case a model will score high if its pretty good at both (localizing and classifying) Of course, your problem and dataset might be different. Maybe you need an extremely accurate detector, in that case, choosing AP@0.90IoU might be a good idea. The downside (of the coco eval tool) is that by default all the values are averaged for all the classes and all images. This might be fine in a competition-like setup where we want to evaluate the models on all the classes but in real-life situations where you train models on custom datasets (often with fewer classes) you really want to know how your model performs on a per-class basis. 
Looking at per-class metrics is extremely valuable, as it might give you important insights: help you compose a new dataset better make better decisions when it comes to data augmentation, data sampling etc. 图15所示。COCO评价输出 它返回由IOU和区域划分的各种观测组的AP和AR。因此,即使是官方的COCO评估也不仅仅是一种衡量标准,这样做也是有充分理由的。 让我们关注IoU=0.50 - 0.95的符号。 这意味着以下内容:AP和AR是根据不同IoU设置(从0.5到0.95,步骤为0.05)计算出的精度和召回率的平均值。我们从中得到的是一个更稳健的评估过程,在这种情况下,如果一个模型在这两方面都做得很好(定位和分类),它就会得分很高。 当然,您的问题和数据集可能有所不同。 也许您需要一个非常精确的检测器,在这种情况下,选择AP@0.90IoU可能是一个好主意。 查看每个类的度量是非常有价值的,因为它可能会给您重要的见解。 帮助您更好地构建新的数据集在数据扩充、数据采样等方面做出更好的决策 Fig 16. Per class AP Figure 16. gives you a lot of useful informations there are few things you might consider: Add more data to low performing classes For classes that score well, maybe you can consider downsampling them to speed up the training and maybe help with the performance of other less frequent classes. Spot any obvious correlations for instance classes with small objects performing poorly. 图16所示每个类的AP 图16.为您提供了许多有用的信息,您可能需要考虑以下几点: 向性能较差的类添加更多数据对于那些得分较高的类别,或许你可以考虑对它们进行抽样,以加快训练速度,或许还可以帮助提高其他不那么频繁的类别的表现找出带有小对象性能差的实例类任何明显的关联 Visualizing results Ok, so if looking at single metrics is not enough what should you do? I would definitely suggest spending some time on manual results exploration, with the combination of hard metrics from the previous analysis – visualizations will help you get the big picture. Since exploring predictions of image detection and image segmentation models can get quite messy I would suggest you do it step by step. On the gif below I show how this can be done using the coco inspector tool. 可视化结果 那么,如果仅看一个指标还不够,该怎么办? 我绝对建议您花一些时间来进行手动结果探索,并结合之前的分析中的严格指标–可视化将帮助您全面了解。 由于探索图像检测和图像分割模型的预测会变得非常混乱,因此建议您逐步进行操作。 在下面的gif上,我显示了如何使用coco检查器工具完成此操作。 Fig 17. All the predictions and ground-truths visualized On the gif we can see how all the important information is visualized: Red masks – predictions Orange masks – overlap of predictions and ground truth masks Green masks – ground truth Dashed bounding boxes – false positives (predictions without a match) Orange boxes true positiveGreen boxes – ground truth 图17.可视化的所有预测和真实值 在gif中,我们可以看到所有重要的信息是如何可视化的: 红色遮罩—预测值橙色遮罩—预测和真实值的重叠绿色遮罩—真实值虚线边框假阳性(没有匹配的预测)橙色盒子正类绿色盒子-真实值 Understanding Results – per image scores By looking at the hard metrics and inspecting images visually we most likely have a pretty good idea of what’s going on. But looking at results of random images (or grouped by class) is likely not an optimal way of doing this. If you want to really dive in and spot edge cases of your model, I suggest calculating per image metrics (for instance AP or Recall). Below and example of an image I found by doing exactly that. 看懂结果—每个图像的值 通过观察硬指标和检查图像,我们很可能有一个很好的想法,发生了什么。但是观察随机图像(或按类分组)的结果可能不是这样做的最佳方式。如果您真的想深入了解模型的边缘情况,我建议计算每个图像的度量(例如AP或Recall)。 下面是我找到的一个图片的例子。 Fig 18. Image with a very low AP score In the example above (Fig 18.) we can see two false positive stop sign predictions – from that we can deduce that our model understands what a stop sign is but not what other traffic signs are. Perhaps we can add new classes to our dataset or use our “stop sign detector” to label other traffic signs and then create a new “traffic sign” label to overcome this problem. 图18所示。图像与非常低的AP分数 在上面的例子中(图18),我们可以看到两个假阳性停车标志的预测,由此我们可以推断,我们的模型了解什么是停车标志,但不了解其他交通标志。 也许我们可以在数据集中添加新的类,或者使用我们的停止标志检测器来标记其他交通标志,然后创建一个新的交通标志标签来克服这个问题。 Fig 19. Example of an image with a good score > 0.5 AP Sometimes we will also learn that our model is doing better that it would seem from the scores alone. 
That’s also useful information, for instance in the example above our model detected a keyboard on the laptop but this is actually not labeled in the original dataset. 图19.得分大于0.5 AP的图像示例 有时,我们还会发现我们的模型表现得更好,仅从分数来看就可以了。 这也是有用的信息,例如在我们的模型的上例中,检测到了笔记本电脑上的键盘,但实际上未在原始数据集中标记。 COCO format The way a coco dataset is organized can be a bit intimidating at first. It consists of a set of dictionaries mapping from one to another. It’s also intended to be used together with the pycocotools / cocotools library that builds a rather confusing API on top of the dataset metadata file. Nonetheless, the coco dataset (and the coco format) became a standard way of organizing object detection and image segmentation datasets. In COCO we follow the xywh convention for bounding box encodings or as I like to call it tlwh: (top-left-width-height) that way you can not confuse it with for instance cwh: (center-point, w, h). Mask labels (segmentations) are run-length encoded (RLE explanation). COCO格式 coco数据集的组织方式一开始可能有点吓人。 它由一组相互映射的字典组成。 它也打算与pycocotools / cocotools库一起使用,该库在数据集元数据文件的顶部构建了一个令人困惑的API。 然而,coco数据集(以及coco格式)成为组织目标检测和图像分割数据集的标准方法。 在COCO中,我们遵循xywh规则来进行边框编码,或者我喜欢称之为tlwh:(左上宽高),这样你就不会与cwh:(中心点,w, h)相混淆。掩码标签(分段)是行距编码(RLE解释)。 Fig 20. The coco dataset annotations format There are still very important advantages of having a widely adopted standard: Labeling tools and services export and import COCO-like datasets Evaluation and scoring code (used for the coco competition) is pretty well optimized and battle tested. Multiple open source datasets follow it. In the previous paragraph, I used the COCO eval functionality which is another benefit of following the COCO standard. To take advantage of that you need to format your predictions in the same way as your coco dataset is constructed- then calculating metrics is as simple as calling: COCOeval(gt_dataset, pred_dataset) 图20所示。coco数据集注释格式 拥有一个被广泛采用的标准仍然有非常重要的优势: 标签工具和服务导出和导入类似coco的数据集。评估和评分代码(用于可可比赛)经过了很好的优化和测试。随之而来的是多个开源数据集。 在前一段中,我使用了COCO eval功能,这是遵循COCO标准的另一个好处。要利用这一点,您需要按照构建您的coco数据集的相同方式格式化您的预测——那么计算指标就像调用:COCOeval(gt dataset, pred dataset)一样简单。 COCO dataset explorer In order to streamline the process of data and results exploration (especially for object detection) I wrote a tool that operates on COCO datasets. Essentially you provide it with the ground truth dataset and the predictions dataset (optionally) and it will do the rest for you: Calculate most of the metrics I presented in this post Easily visualize the datasets ground truths and predictions Inspect coco metrics, per class AP metrics Inspect per-image scores 浏览COCO数据集 为了简化数据和结果探索的过程(特别是对象检测),我编写了一个在COCO数据集上操作的工具。 实际上,你向它提供真实标记的数据集和预测的数据集(可选),它将为您完成其余的工作: 计算我在这篇文章中提出的大部分指标轻松可视化数据集的真实值和预测值检查coco度量,每个类AP度量检查每张图象分数 To use COCO dataset explorer tool you need to: Clone the project repository 要使用COCO数据集资源管理器工具,您需要:克隆项目存储库 git clone https://github.com/i008/COCO-dataset-explorer.git Download example data I used for the examples or use your own data in the COCO format: Example COCO format dataset with predictions. If you downloaded the example data you will need to extract it. 下载我用于示例的示例数据,或者使用您自己的COCO格式的数据示例COCO格式化数据集与预测。 如果下载了示例数据,则需要提取它。 tar -xvf coco_data.tar Download example data I used for the examples or use your own data in the COCO format: Example COCO format dataset with predictions. If you downloaded the example data you will need to extract it. 
tar -xvf coco_data.tar You should have the following directory structure: COCO-dataset-explorer |coco_data |images |000000000139.jpg |000000000285.jpg |000000000632.jpg |... |ground_truth_annotations.json |predictions.json |coco_explorer.py |Dockerfile |environment.yml |... 下载我用于示例的示例数据,或者使用您自己的COCO格式的数据您应该具有以下目录结构 Set up the environment with all the dependencies 设置带有所有依赖项的环境 conda env update; conda activate cocoexplorer Run streamlit app specifying a file with ground truth and predictions in the COCO format and the image directory: streamlit run coco_explorer.py -- \ --coco_train coco_data/ground_truth_annotations.json \ --coco_predictions coco_data/predictions.json \ --images_path coco_data/images/ 运行streamlit应用程序,指定一个文件的真实值和预测在COCO格式和图像目录 Note: You can also run this with docker: 注意:你也可以用docker来运行它 sudo docker run -p 8501:8501 -it -v "$(pwd)"/coco_data:/coco_data i008/coco_explorer \ streamlit run coco_explorer.py -- \ --coco_train /coco_data/ground_truth_annotations.json \ --coco_predictions /coco_data/predictions.json \ --images_path /coco_data/images/ explore the dataset in the browser. By default, it will run on http://localhost:8501/ 在浏览器中浏览数据集。 默认情况下,它将在http:// localhost:8501 /上运行 Final words I hope that with this post I convinced you that data exploration in object detection and image segmentation is as important as in any other branch of machine learning. I’m confident that the effort we make at this stage of the project pays off in the long run. The knowledge we gather allows us to make better-informed modeling decisions, avoid multiple training pitfalls and gives you more confidence in the training process, and the predictions your model produces. This article was originally written by Jakub Cieślik and posted on the Neptune blog. You can find more in-depth articles for machine learning practitioners there. 结语 我希望通过这篇文章让你相信,在目标检测和图像分割中的数据探索与机器学习的其他分支一样重要。 我相信我们在这个项目现阶段所做的努力最终会得到回报的。 我们收集的知识使我们能够做出更明智的建模决策,避免多个训练陷阱,并使您对训练过程和模型产生的预测更有信心。 本文最初由JakubCieślik撰写并发布在Neptune博客上。 您可以在此处找到针对机器学习从业人员的更多深入文章。

文章Pytorch实现基于LSTM的单词检测器
FlyAI文献翻译英文原文:LSTM Based Word Detectors 标签:语义识别 This article aims to provide the basics of LSTMs (Long Short Term Memory) and implements a word detector using the architecture. The detector implemented in this article is a cuss word detector that detects a custom set of cuss words. What are LSTMs ??? LSTMs or Long Short term memory cells are a long term memory units that were designed to solve the vanishing gradient problem with the RNNs. Normally the memory in the RNNs is short lived. We cannot store data 8 - 9 time steps behind using an RNN. To store data for longer periods like 1000 time steps we use a LSTM. 本文旨在提供LSTM(Long Short Term Memory)的基础知识,并利用该架构实现了一个单词检测器。 本文实现的检测器是一个脏话检测器,可以检测一组自定义的脏话。 什么是LSTMs ? LSTMs或长短期记忆单元是一种长期记忆单元,是为了解决RNNs的消失梯度问题而设计的。通常RNNs中的内存是短时的。我们不能使用RNN存储后面8 - 9个时间步的数据。为了存储较长时期的数据,如1000个时间步,我们使用LSTM。 LSTM History 1997: LSTM was proposed by Sepp Hochreiter and Jürgen Schmidhuber.[1] By introducing Constant Error Carousel (CEC) units, LSTM deals with the vanishing gradient problem. The initial version of LSTM block included cells, input and output gates.[5] 1999: Felix Gers and his advisor Jürgen Schmidhuber and Fred Cummins introduced the forget gate (also called “keep gate”) into LSTM architecture,[6] enabling the LSTM to reset its own state.[5] 2000: Gers & Schmidhuber & Cummins added peephole connections (connections from the cell to the gates) into the architecture.[7] Additionally, the output activation function was omitted.[5] 2009: An LSTM based model won the ICDAR connected handwriting recognition competition. Three such models were submitted by a team lead by Alex Graves.[8] One was the most accurate model in the competition and another was the fastest.[9] 2013: LSTM networks were a major component of a network that achieved a record 17.7% phoneme error rate on the classic TIMIT natural speech dataset.[10] 2014: Kyunghyun Cho et al. put forward a simplified variant called Gated recurrent unit (GRU).[11] 2015: Google started using an LSTM for speech recognition on Google Voice.[12][13] According to the official blog post, the new model cut transcription errors by 49%. [14] LSTM历史 1997: LSTM是由Sepp Hochreiter和Jürgen Schmidhuber提出的。 [1] 通过引入恒定误差转盘(CEC)单元,LSTM处理了消失梯度问题。LSTM块的初始版本包括单元、输入和输出门。 [5] 1999: Felix Gers及其顾问Jürgen Schmidhuber和Fred Cummins在LSTM架构中引入了遗忘门(也称为 “保持门”),[6] 使LSTM能够重置自己的状态。 [5] 2000: Gers & Schmidhuber & Cummins在结构中增加了窥视孔连接(从电池到门的连接)[7],此外,还省略了输出激活功能[5]。 2009: 一个基于LSTM的模型在ICDAR连接手写识别竞赛中获胜。由Alex Graves 领导的团队提交了三个这样的模型[8],其中一个是比赛中最准确的模型,另一个是最快的模型。[9] 2013: LSTM网络是一个网络的主要组成部分,该网络在经典的TIMIT自然语音数据集上实现了17.7%的音素错误率的记录。[10] 2014: Kyunghyun Cho等人提出了一种简化的变体,称为Gated recurrent unit(GRU)。[11] 2015: Google开始在Google Voice上使用LSTM进行语音识别[12][13],根据官方博客文章,新模型将转录错误减少了49%。[14] LSTM Architecture But all of the above diagram is complex math. To simplify all of it we can view their functions i.e. what all that math represents. So, simplifying it we can represent it as 但是上面的图都是复杂的数学。为了简化所有的内容,我们可以查看它们的函数,即所有的数学都代表什么。所以,简化后我们可以将其表示为: In the article we are now going to use some abbreviations. LTM : Long term memory STM : Short term memory NLTM : New long term memory NSTM : New short term memory 在文章中,我们现在要使用一些缩写。 LTM:长期记忆 STM:短期记忆 NLTM:新长期记忆 NSTM:新的短期记忆 Working The data from the LTM is pushed into the forget gate which remembers only certain features. Then this data is pushed into the use and remember gate. Now data from the STM and the event is pushed into the learn gate This data is again pushed into remember and use gates. 
The combined data in the remember gate from the learn gate and forget gate is the NLTM The data in the use gate which is a combination of data from forget and learn gate is the NSTM. In case you wish to get into the core mathematics behind the LSTM make sure you check out this beautiful article. Link : https://colah.github.io/posts/2015-08-Understanding-LSTMs/ 工作方法 LTM的数据被推送到遗忘门中,遗忘门只记住某些功能。 然后将这些数据推送到使用和记忆门中。 现在,来自STM的数据和事件被推送到学习门中。 这些数据又被推送到记住和使用门中。 学习门和遗忘门在记忆门中的综合数据就是NLTM 使用门中的数据是由遗忘门和学习门的数据组合而成的,这就是NSTM。 如果你想了解LSTM背后的核心数学,一定要看看这篇美文。 链接:https://colah.github.io/posts/2015-08-Understanding-LSTMs/ Our Model Architecture LSTM Requirements In the case of an LSTM, for each piece of data in a sequence (say, for a word in a given sentence), there is a corresponding hidden state htht.This hidden state is a function of the pieces of data that an LSTM has seen over time; it contains some weights and, represents both the short term and long term memory components for the data that the LSTM has already seen. So, for an LSTM that is looking at words in a sentence, the hidden state of the LSTM will change based on each new word it sees. And, we can use the hidden state to predict the next word in a sequence or help identify the type of word in a language model, and lots of other things! To create an LSTM in PyTorch we use nn.LSTM(input_size=input_dim, hidden_size=hidden_dim, num_layers=n_layers) input_dim = the number of inputs (a dimension of 20 could represent 20 inputs) hidden_dim = the size of the hidden state; this will be the number of outputs that each LSTM cell produces at each time step. n_layers = the number of hidden LSTM layers to use; this is typically a value between 1 and 3; a value of 1 means that each LSTM cell has one hidden state. This has a default value of 1. 我们的模型结构 LSTM要求 在LSTM的情况下,对于序列中的每一个数据(比如说,对于给定句子中的一个词),都有一个相应的隐藏状态htht。这个隐藏状态是LSTM在一段时间内所见过的数据片段的函数,它包含一些权重,并且,代表了LSTM已经见过的数据的短期和长期记忆成分。 所以,对于一个正在观察句子中的单词的LSTM来说,LSTM的隐藏状态会根据它看到的每一个新单词而改变。而且,我们可以使用隐藏状态来预测序列中的下一个词,或者帮助识别语言模型中的词的类型,以及其他很多事情 要在 PyTorch 中创建一个 LSTM,我们使用了 nn.LSTM(input_size=input_dim, hidden_size=hidden_dim, num_layers=n_layers) input_dim = 输入的数量(20个维度可以代表20个输入)。 hidden_dim = 隐藏状态的大小;这将是每个LSTM单元在每个时间步产生的输出数量。 n_layers = 要使用的隐藏LSTM层数;这通常是一个介于1和3之间的值;值为1意味着每个LSTM单元有一个隐藏状态。该值的默认值为1。n_layers = 要使用的隐藏LSTM层数;这通常是一个介于1和3之间的值;值为1意味着每个LSTM单元有一个隐藏状态。该值的默认值为1。 Hidden State Once an LSTM has been defined with input and hidden dimensions, we can call it and retrieve the output and hidden state at every time step. out, hidden = lstm(input.view(1, 1, -1), (h0, c0)) The inputs to an LSTM are (input, (h0, c0)). input = a Tensor containing the values in an input sequence; this has values: (seq_len, batch, input_size) h0 = a Tensor containing the initial hidden state for each element in a batch c0 = a Tensor containing the initial cell memory for each element in the batch h0 nd c0 will default to 0, if they are not specified. Their dimensions are: (n_layers, batch, hidden_dim). We know that an LSTM takes in an expected input size and hidden_dim, but sentences are rarely of a consistent size, so how can we define the input of our LSTM? Well, at the very start of this net, we’ll create an Embedding layer that takes in the size of our vocabulary and returns a vector of a specified size, embedding_dim, for each word in an input sequence of words. It’s important that this be the first layer in this net. You can read more about this embedding layer in the PyTorch documentation. 
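To make these shapes concrete, here is a minimal sketch (not from the original article; the vocabulary size and toy word indices are arbitrary) that pushes a four-word index sequence through an Embedding layer followed by an LSTM and prints what comes out:

import torch
import torch.nn as nn

vocab_size, embedding_dim, hidden_dim, n_layers = 10, 6, 6, 1

embedding = nn.Embedding(vocab_size, embedding_dim)
lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim, num_layers=n_layers)

# A toy "sentence" of 4 word indices (each value must be < vocab_size)
sentence = torch.tensor([1, 3, 5, 7])

# Initial hidden state and cell memory: (n_layers, batch, hidden_dim)
h0 = torch.zeros(n_layers, 1, hidden_dim)
c0 = torch.zeros(n_layers, 1, hidden_dim)

embeds = embedding(sentence)                # shape: (seq_len, embedding_dim)
inputs = embeds.view(len(sentence), 1, -1)  # reshape to (seq_len, batch, input_size)

out, (hn, cn) = lstm(inputs, (h0, c0))
print(out.shape)  # torch.Size([4, 1, 6]) -> one hidden_dim-sized output per time step
print(hn.shape)   # torch.Size([1, 1, 6]) -> final hidden state
print(cn.shape)   # torch.Size([1, 1, 6]) -> final cell memory

Note that the batch dimension is 1 here, which matches how the tagger below reshapes its embeddings with view(len(sentence), 1, -1).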
Pictured below is the expected architecture for this tagger model.

隐藏状态

一旦定义了输入和隐藏维度的LSTM,我们就可以在每一个时间步调用它,并检索输出和隐藏状态。

LSTM的输入是 (input, (h0, c0))。

input = 一个包含输入序列中的值的Tensor,其形状为 (seq_len, batch, input_size)
h0 = 一个张量,包含批中每个元素的初始隐藏状态。
c0 = 一个张量,包含该批次中每个元素的初始单元记忆。

如果没有指定,h0和c0将默认为0。它们的维度是:(n_layers, batch, hidden_dim)。

我们知道,一个LSTM会接收一个预期的输入大小和hidden_dim,但是句子的大小很少是一致的,那么我们如何定义LSTM的输入呢?

好吧,在这个网络的最开始,我们将创建一个Embedding层,它接收我们词汇表的大小,并为输入词序列中的每个词返回一个指定大小(embedding_dim)的向量。重要的是,它是这个网络中的第一层。您可以在 PyTorch 文档中阅读更多关于这个嵌入层的内容。

下图是这个标签器模型的预期架构。

代码

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import matplotlib.pyplot as plt
import numpy as np

# Training data: sentences paired with per-word tags ("O" = ordinary word, "CS" = cuss word)
data = [("What the fuck".lower().split(), ["O", "O", "CS"]),
        ("The boy asked him to fuckoff".lower().split(), ["O", "O", "O", "O", "O", "CS"]),
        ("I hate that bastard".lower().split(), ["O", "O", "O", "CS"]),
        ("He is a dicked".lower().split(), ["O", "O", "O", "CS"]),
        ("Hey prick".lower().split(), ["O", "CS"]),
        ("What a pussy you are".lower().split(), ["O", "O", "CS", "O", "O"]),
        ("Dont be a cock".lower().split(), ["O", "O", "O", "CS"])]

# Build the vocabulary: map every word seen in the data to a unique index
word2idx = {}
for sent, tags in data:
    for word in sent:
        if word not in word2idx:
            word2idx[word] = len(word2idx)

tag2idx = {"O": 0, "CS": 1}
tag2rev = {0: "O", 1: "CS"}

def prepare_sequence(seq, to_idx):
    # Convert a list of words (or tags) into a tensor of indices
    idxs = [to_idx[word] for word in seq]
    idxs = np.array(idxs)
    return torch.tensor(idxs)

testsent = "fuckoff boy".lower().split()
inp = prepare_sequence(testsent, word2idx)
print("The test sentence {} is translated to {}\r\n".format(testsent, inp))

class LSTMTagger(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):
        super(LSTMTagger, self).__init__()
        self.hidden_dim = hidden_dim
        # Embedding layer turns word indices into dense vectors
        self.word_embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim)
        # Linear layer maps each LSTM output to a score per tag
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)
        self.hidden = self.init_hidden()

    def init_hidden(self):
        # (h0, c0): initial hidden state and cell memory, shape (n_layers, batch, hidden_dim)
        return (torch.randn(1, 1, self.hidden_dim),
                torch.randn(1, 1, self.hidden_dim))

    def forward(self, sentence):
        embeds = self.word_embedding(sentence)
        lstm_out, hidden_out = self.lstm(embeds.view(len(sentence), 1, -1), self.hidden)
        tag_outputs = self.hidden2tag(lstm_out.view(len(sentence), -1))
        tag_scores = F.log_softmax(tag_outputs, dim=1)
        return tag_scores

EMBEDDING_DIM = 6
HIDDEN_DIM = 6

model = LSTMTagger(EMBEDDING_DIM, HIDDEN_DIM, len(word2idx), len(tag2idx))
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

n_epochs = 300
for epoch in range(n_epochs):
    epoch_loss = 0.0
    for sent, tags in data:
        model.zero_grad()
        input_sent = prepare_sequence(sent, word2idx)
        tag = prepare_sequence(tags, tag2idx)
        model.hidden = model.init_hidden()  # reset the hidden state for every sentence
        output = model(input_sent)
        loss = loss_function(output, tag)
        epoch_loss += loss.item()
        loss.backward()
        optimizer.step()
    if epoch % 20 == 19:
        print("Epoch : {} , loss : {}".format(epoch, epoch_loss / len(data)))

testsent = "You".lower().split()
inp = prepare_sequence(testsent, word2idx)
print("Input sent : {}".format(testsent))
tags = model(inp)
_, pred_tags = torch.max(tags, 1)
print("Pred tag : {}".format(pred_tags))
pred = np.array(pred_tags)
for i in range(len(testsent)):
    print("Word : {} , Predicted tag : {}".format(testsent[i], tag2rev[pred[i]]))

For more well-documented code, kindly check this GitHub repository, which contains detailed instructions.
Link: https://github.com/srimanthtenneti/Cuss-Word-Detector---LSTM

Conclusion

This is how we use LSTMs to make a word detector.

如果想了解更多文档完善的代码,请查看这个GitHub仓库,其中包含了详细的说明。

链接:https://github.com/srimanthtenneti/Cuss-Word-Detector---LSTM

结论

这就是我们使用LSTM来制作单词检测器的方法。
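As a short, hedged usage sketch on top of the training script above (the helper name predict_cuss_words is mine, and skipping unknown words is an assumption — the original code would simply raise a KeyError for words outside its tiny vocabulary), prediction could be wrapped like this:

def predict_cuss_words(model, sentence, word2idx, tag2rev):
    # Keep only words the training vocabulary knows about; unseen words
    # would otherwise raise a KeyError inside prepare_sequence.
    words = [w for w in sentence.lower().split() if w in word2idx]
    if not words:
        return []
    with torch.no_grad():  # inference only, no gradients needed
        inp = prepare_sequence(words, word2idx)
        scores = model(inp)
        _, pred_tags = torch.max(scores, 1)
    return [(w, tag2rev[int(t)]) for w, t in zip(words, pred_tags)]

# Example, assuming model, prepare_sequence, word2idx and tag2rev from the script above:
print(predict_cuss_words(model, "Dont be a prick", word2idx, tag2rev))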
