Admin Tools
Edit file: _tokenizer.cpython-310.pyc
[Binary content: compiled CPython 3.10 bytecode. Only the strings embedded in the file are readable; they are listed below. The view is cut off partway through the method that follows consumeNumberEntity.]

Source path recorded in the bytecode:
    /builddir/build/BUILDROOT/alt-python310-pip-21.3.1-3.el8.x86_64/opt/alt/python310/lib/python3.10/site-packages/pip/_vendor/html5lib/_tokenizer.py

Imported names:
    absolute_import, division, unicode_literals
    unichr
    deque, OrderedDict
    version_info
    spaceCharacters
    entities
    asciiLetters, asciiUpper2Lower
    digits, hexDigits, EOF
    tokenTypes, tagTokenTypes
    replacementCharacters
    HTMLInputStream
    Trie

class HTMLTokenizer
    Docstring:
        This class takes care of tokenizing HTML.
        * self.currentToken
          Holds the token that is currently being processed.
        * self.state
          Holds a reference to the method to be invoked... XXX
        * self.stream
          Points to HTMLInputStream object.

    __init__
        Names referenced: stream, parser, escapeFlag, lastFourChars, dataState, state, escape, currentToken

    __iter__
        Docstring:
            This is where the magic happens. We do our usually processing through the states and when we have a token to return we yield the token which pauses processing until the next token is requested.
        Names referenced: tokenQueue, errors, pop, popleft
        String constants: "ParseError", "type", "data"

    consumeNumberEntity
        Docstring:
            This function returns either U+FFFD or the character based on the decimal or hexadecimal representation. It also discards ";" if present. If not present self.tokenQueue.append({"type": tokenTypes["ParseError"]}) is invoked.
        Local names: self, isHex, allowed, radix, charStack, c, charAsInt, char, v
        String constants: "ParseError", "illegal-codepoint-for-numeric-entity", "charAsInt", "numeric-entity-without-semicolon", ";"
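The consumeNumberEntity docstring in the dump says the method returns either U+FFFD or the character for a decimal or hexadecimal representation. The sketch below illustrates that idea only. `decode_numeric_entity` and `REPLACEMENT` are hypothetical names, not part of html5lib, and the sketch assumes the digits have already been collected. It does not reproduce the library's replacementCharacters table, its parse-error tokens, or its handling of the trailing ";".

```python
# Hypothetical helper, not html5lib's API: turns the digits of a numeric
# character reference (the "65" in "&#65;" or the "41" in "&#x41;") into text.
REPLACEMENT = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER


def decode_numeric_entity(digits: str, is_hex: bool = False) -> str:
    """Return the character for `digits`, or U+FFFD if it is not usable."""
    code_point = int(digits, 16 if is_hex else 10)

    # NUL, surrogates, and values past U+10FFFF cannot be emitted as-is,
    # so this sketch substitutes the replacement character for them.
    if code_point == 0:
        return REPLACEMENT
    if 0xD800 <= code_point <= 0xDFFF or code_point > 0x10FFFF:
        return REPLACEMENT

    return chr(code_point)


if __name__ == "__main__":
    print(decode_numeric_entity("65"))                 # A
    print(decode_numeric_entity("41", is_hex=True))    # A
    print(decode_numeric_entity("D800", is_hex=True) == REPLACEMENT)  # True
```

The radix is passed as a flag here, mirroring the isHex name visible in the dump; a real tokenizer would also have to decide what to do when the ";" terminator is missing.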