Hail Risk Estimation

JAX
Numpyro
Bayesian Statistics
Quantile Regression
Zero-inflated Regression
Author

Valerio Bonometti

Published

December 29, 2025

Abstract
This analysis illustrates how to estimate hourly hail risk in the US at county-level (and sub-county-level) granularity. It pays particular attention to mitigating the issues that arise from small samples.

Observed Hail Events

0.1 Hypotheses to be investigated

The analysis does not investigate specific hypotheses; rather, it addresses a set of challenges that need to be overcome when performing the estimation tasks. Here is an overview of these challenges:

  1. In order to estimate the level of risk for a given county in the U.S., we first need a reliable estimate of the probability that an extreme hail event occurs in that county.

  2. Because extreme hail events are rare by nature, their probability cannot be calculated by simply evaluating their observed frequency. It requires specific statistical frameworks grounded in Extreme Value Theory (i.e., Extreme Value Analysis, EVA) that can provide the likelihood of such events over long horizons (e.g., hundreds of years).

  3. In order to obtain reliable estimates, EVA usually requires a sufficiently large number of observed events. Unfortunately, increasing the level of granularity (i.e., looking at the county level) forces us to work with small samples. This challenge is even more pronounced in states where extreme hail events are rarer, which are also the states where a correct risk assessment is more critical.
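
To make the notion of a long-term likelihood in point 2 more concrete, the sketch below computes return levels from a Generalized Extreme Value (GEV) distribution, the family commonly fitted to block maxima in EVA. It is a minimal illustration and not the estimation procedure used in this analysis: the function names and the parameter values (`MU`, `SIGMA`, `XI`) are hypothetical and are not fitted to the hail data.

```python
import math


def gev_cdf(z: float, mu: float, sigma: float, xi: float) -> float:
    """CDF of the Generalized Extreme Value distribution."""
    if abs(xi) < 1e-12:
        # Gumbel limit (xi -> 0)
        return math.exp(-math.exp(-(z - mu) / sigma))
    t = 1.0 + xi * (z - mu) / sigma
    if t <= 0.0:
        # Outside the support: below the lower bound (xi > 0)
        # or above the upper bound (xi < 0)
        return 0.0 if xi > 0 else 1.0
    return math.exp(-(t ** (-1.0 / xi)))


def gev_return_level(return_period: float, mu: float, sigma: float, xi: float) -> float:
    """Level exceeded on average once every `return_period` blocks."""
    exceedance_probability = 1.0 / return_period
    y = -math.log(1.0 - exceedance_probability)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


# Illustrative parameters only (NOT fitted to the hail data).
MU, SIGMA, XI = 1.0, 0.3, 0.1

level_10 = gev_return_level(10, MU, SIGMA, XI)
level_100 = gev_return_level(100, MU, SIGMA, XI)
print(level_10, level_100)
```

With yearly blocks, `level_100` would be read as the magnitude exceeded on average once every 100 years. Plugging it back into `gev_cdf` recovers a non-exceedance probability of 0.99.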

1 Methodology

In this section we outline our methodology, with a particular focus on the data used and the analyses conducted to overcome the challenges outlined in Section 0.1.

%load_ext watermark

from pathlib import Path

import os
import numpyro

numpyro.set_host_device_count(os.cpu_count())

from IPython.display import Image

from itertools import product
from functools import partial

from tqdm import tqdm
from joblib import Parallel, delayed

from jax import vmap

from typing import List, Tuple, Any, Callable, Dict
from numpy.typing import ArrayLike

from matplotlib.axes import Axes
import matplotlib as mpl
import matplotlib.pyplot as plt

SMALL_FONT_SIZE = 12
MEDIUM_FONT_SIZE = 15
BIGGER_FONT_SIZE = 18

SINGLE_STATE = ["florida"]

# hail alley
MULTIPLE_STATES = [
    "texas",
    "oklahoma",
    "kansas",
    "nebraska",
    "colorado",
    "missouri",
    "iowa",
    # medium states
    "louisiana",
    "mississippi",
    "alabama",
    "florida",
    # dry states
    "california",
    "new_mexico",
    "arizona",
]

LOWER_CUT_OFF_YEAR = 1990
UPPER_CUT_OFF_YEAR = 2024

SELECTED_ANALYSIS_STATES = SINGLE_STATE

FLOAT_PRECISION = 2
DEGREES_PRECISION = 1e-2

TIME_COLUMN = "begin_date_time"

CONTINUOUS_MODELLING_COLUMNS = [
    "year",
    "month",
    "hour",
    "begin_lat",
    "begin_lon",
    "state",
    "countyfp",
    "magnitude",
]

COUNT_MODELLING_COLUMNS = [
    "year",
    "month",
    "hour",
    "state",
    "countyfp",
]

NUMBER_ITERATIONS = 30_000
NUMBER_PARTICLES = 1

CONTINUOUS_TARGET = "magnitude"
COUNT_TARGET = "number_events"

LAT_COVARIATES = "begin_lat"
LON_COVARIATES = "begin_lon"

YEAR_COVARIATES = "year"
MONTH_COVARIATE = "month"
HOUR_COVARIATE = "hour"

COUNTIES_INDEX = "countyfp"
STATE_INDEX = "state"

CRITICAL_VALUE_CONTINUOUS_TARGET = 1.77
CRITICAL_VALUE_COUNT_TARGET = 5
CRITICAL_QUANTILE = 0.95

COLORMAP_NAME = "Blues"
COLORMAP = mpl.colormaps[COLORMAP_NAME].resampled(5)

DATA_PATH = Path("local_data")
RESULTS_PATH = Path("results")
GIFS_PATH = Path(RESULTS_PATH, "gif_images")
IMAGES_PATH = Path(RESULTS_PATH, "images")

plt.rc('font', size=SMALL_FONT_SIZE) 
plt.rc('axes', titlesize=MEDIUM_FONT_SIZE)
plt.rc('axes', labelsize=SMALL_FONT_SIZE)
plt.rc('xtick', labelsize=SMALL_FONT_SIZE)
plt.rc('ytick', labelsize=SMALL_FONT_SIZE)
plt.rc('legend', fontsize=SMALL_FONT_SIZE)    
plt.rc('figure', titlesize=BIGGER_FONT_SIZE)
plt.rc('figure', dpi=100)

1.1 Data Gathering and Data Description

In this paragraph we will provide an overview of the data employed in this analysis:

  1. The dataset from NOAA containing records of hail events.
  2. The geo-dataframe used for mapping hail events onto U.S. counties.

1.1.1 NOAA Dataset

The dataset we used for this analysis is the Storm Events Database. This dataset contains records that document:

  1. The occurrence of storms and other significant weather phenomena.
  2. Rare or unusual weather phenomena.
  3. Other significant meteorological events.

A common characteristic of these events is that they are intense or exceptional enough to potentially harm people, damage structures and disrupt commerce. The database contains data from 1950 until 2024. However, due to changes in the recording, measurement and collection strategy, the portion containing detailed and reliable information is limited to the interval 1996 - 2024. This is also the portion of the dataset that we used for our analyses.

1.1.2 Counties Geometry Dataset

In order to map and visualize our analysis onto the U.S. territory (and counties in particular), we obtained a geo-dataframe containing the geometries of the counties of each state in the U.S. This dataset was not used just for visualization purposes: it also played an important role in the estimation of extreme hail events at county level (more details are given in Section 1.5.1).

1.2 Data Exploration

In this section we outline a few key characteristics of hail events as a phenomenon, paying particular attention to the characteristics that make a dedicated statistical framework necessary for addressing the challenges outlined in Section 0.1. Due to space constraints, we report here only the results for the state of Florida.

The state of Florida illustrates well the challenges of estimating extreme hail events in liminal situations: in states like California and Texas, risk assessment is easier due to either the complete absence or the abundance of hail events. Florida, on the other hand, seems to have a much more nuanced risk profile.

1.2.1 Data Preparation

In this section we will prepare the data for further analysis, mostly focusing on creating datasets that are suitable for the different types of statistical approaches we want to try.

import numpy as np

import pandas as pd
import geopandas as gpd

from pyextremes import get_extremes


def pad_dataset(
    df: pd.DataFrame,
    categorical_columns: List[str],
    fill_value: float = 0.0,
) -> pd.DataFrame:
    """Pad a dataset over all the dimensions defined by indexing columns filling the missing
    entries with fill_value.
    """
    new_index = list([df[column].unique() for column in categorical_columns])
    new_index = product(*new_index)
    df = df.set_index(keys=categorical_columns)
    df = df.reindex(new_index).fillna(fill_value).reset_index()
    return df


def make_grid(polygon: Any, edge_size: float) -> gpd.GeoSeries:
    """Create a grid covering polygon made of squares with side edge_size"""
    bounds = polygon.bounds
    # Square centers, spaced edge_size apart
    x_coords = np.arange(bounds[0] + edge_size / 2, bounds[2], edge_size)
    y_coords = np.arange(bounds[1] + edge_size / 2, bounds[3], edge_size)
    combinations = np.array(list(product(x_coords, y_coords)))
    # A square-capped buffer of edge_size / 2 around a center gives a square of side edge_size
    squares = gpd.points_from_xy(combinations[:, 0], combinations[:, 1]).buffer(
        edge_size / 2, cap_style=3
    )
    return gpd.GeoSeries(squares[squares.intersects(polygon)])


def generate_grid_df(
    geometries_df: gpd.GeoDataFrame,
    edge_size: float,
    state: List[str] = SELECTED_ANALYSIS_STATES,
) -> gpd.GeoDataFrame:
    """Generate a geo-dataframe made of gridded polygons"""
    state_geometries_df = geometries_df[
        geometries_df[STATE_INDEX].str.lower().isin([s for s in state])
    ].copy()
    grid_df = []
    list_counties_fp = state_geometries_df[COUNTIES_INDEX].values
    list_states_fp = state_geometries_df[STATE_INDEX].values
    list_geometries = state_geometries_df["geometry"].values
    for countyfp, state, geometry in zip(
        list_counties_fp, list_states_fp, list_geometries
    ):

        generated_grid = make_grid(geometry, edge_size=edge_size)
        grid_df.append(
            pd.DataFrame(
                {
                    COUNTIES_INDEX: countyfp,
                    STATE_INDEX: state,
                    "geometry": generated_grid,
                }
            )
        )
    grid_df = gpd.GeoDataFrame(pd.concat(grid_df))
    grid_df["begin_lat"] = grid_df["geometry"].centroid.y.round(FLOAT_PRECISION)
    grid_df["begin_lon"] = grid_df["geometry"].centroid.x.round(FLOAT_PRECISION)
    return grid_df


def split_dataset(
    df: pd.DataFrame,
    ordering_columns: List[str],
    split_fraction: float = 0.33,
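) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Split dataset in two portions: after sorting by ordering_columns, the last
    split_fraction of the rows goes to the second portion.
    """
    df = df.sort_values(ordering_columns)
    # Index from the front so that a cut-off of 0 rows yields an empty second portion
    split_point = len(df) - int(len(df) * split_fraction)
    return df[:split_point].copy(), df[split_point:].copy()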


def perform_dithering(
    magnitudes: ArrayLike,
    cap: float = 0.5,
    dithering_coefficient: float = 0.147,
    dithering_bias: float = 0.0279,
) -> ArrayLike:
    """Perform dithering following the procedure in
    https://www.columbia.edu/~mkt14/publications/MWR-final-hailsize.pdf
    """
    dithering_amounts = dithering_bias + dithering_coefficient * magnitudes
    dithering_amounts = np.clip(dithering_amounts, a_min=0.0, a_max=cap)
    dithering_noise = np.random.uniform(-dithering_amounts, dithering_amounts)
    dithered_magnitudes = np.clip(magnitudes + dithering_noise, 0, None)
    return dithered_magnitudes


def make_time_series_continuous(
    series: pd.Series, start: str, end: str, frequency: str
) -> pd.Series:
    """Make a time series continuous and pad the missing values with zeros"""
    new_index = pd.date_range(start=start, end=end, freq=frequency)
    return series.reindex(new_index, fill_value=0)


def downscale_coordinates(
    values: ArrayLike, precision: int, degree_resolution: float
) -> ArrayLike:
    """Downscale latitude and longitude coordinate to a certain degree resolution."""
    return np.round(degree_resolution * np.round(values / degree_resolution), precision)


def create_covariates_df(
    geometries_df: gpd.GeoDataFrame,
    edge_size: float,
    state: List[str],
    months: List[int],
    hours: List[int],
    years: List[int],
) -> gpd.GeoDataFrame:
    """Create a dataframe of covariates for inference"""
    covariates_df = []
    grid_df = generate_grid_df(
        geometries_df=geometries_df,
        edge_size=edge_size,
        state=state,
    )
    for year, month, hour in list(product(years, months, hours)):

        timed_grid_df = grid_df.copy()
        timed_grid_df["year"] = year
        timed_grid_df["month"] = month
        timed_grid_df["hour"] = hour

        covariates_df.append(timed_grid_df)

    covariates_df = pd.concat(covariates_df)
    covariates_df = gpd.GeoDataFrame(covariates_df)
    return covariates_df


def create_maxima_dataset(
    df: pd.DataFrame, grouping_columns: List[str], value_column: str
) -> pd.DataFrame:
    """Create the dataset for fitting gen-extreme data"""
    maxima_df = df[grouping_columns + [value_column]].copy()
    maxima_df = maxima_df.groupby(grouping_columns)[value_column].max().reset_index()
    return maxima_df


def get_extremes_in_blocks(
    maxima_df: pd.Series,
    county_fp: str,
    state_fp,
    start: str,
    end: str,
) -> pd.DataFrame:
    """Extract the extreme values of a time series using Block Maxima"""
    maxima_df = make_time_series_continuous(
        series=maxima_df,
        start=start,
        end=end,
        frequency="1h",
    )
    extremes = get_extremes(
        ts=maxima_df,
        method="BM",
    )
    extremes = extremes.reset_index()
    extremes[COUNTIES_INDEX] = county_fp
    extremes[STATE_INDEX] = state_fp
    return extremes


def create_eva_dataset(
    modelling_df: pd.DataFrame, geometries_df: gpd.GeoDataFrame
) -> pd.DataFrame:
    """Create a dataset used for Extreme Value Analysis"""
    maxima_data = create_maxima_dataset(
        df=modelling_df,
        grouping_columns=[TIME_COLUMN, STATE_INDEX, COUNTIES_INDEX],
        value_column=CONTINUOUS_TARGET,
    )
    grouped = maxima_data.groupby([COUNTIES_INDEX, STATE_INDEX])

    list_df = [group.set_index(TIME_COLUMN)[CONTINUOUS_TARGET] for _, group in grouped]
    list_category_names = grouped.groups.keys()

    partialized_get_extremes_df = partial(
        get_extremes_in_blocks,
        start=maxima_data[TIME_COLUMN].min(),
        end=maxima_data[TIME_COLUMN].max(),
    )
    eva_df = Parallel(n_jobs=-1)(
        delayed(partialized_get_extremes_df)(data, county_fp, state_fp)
        for data, (county_fp, state_fp) in zip(list_df, list_category_names)
    )
    eva_df: pd.DataFrame = pd.concat(eva_df)
    eva_df[CONTINUOUS_TARGET] = eva_df[CONTINUOUS_TARGET].replace(
        to_replace=0,
        value=eva_df[CONTINUOUS_TARGET].mean(),
    )
    eva_df = eva_df.rename({"date-time": TIME_COLUMN}, axis=1)
    eva_df = pd.merge(
        eva_df,
        geometries_df[[COUNTIES_INDEX, STATE_INDEX, "geometry"]].drop_duplicates(),
        on=[COUNTIES_INDEX, STATE_INDEX],
        how="inner",
    )
    eva_df = pd.merge(
        eva_df,
        modelling_df[["county", COUNTIES_INDEX, STATE_INDEX]].drop_duplicates(),
        on=[COUNTIES_INDEX, STATE_INDEX],
        how="inner",
    )
    return eva_df


def create_continuous_dataset(
    modelling_df: pd.DataFrame, geometries_df: gpd.GeoDataFrame
) -> pd.DataFrame:
    """Create a dataset for continuous modelling"""
    continuous_modelling_df = modelling_df[CONTINUOUS_MODELLING_COLUMNS + ["begin_date_time"]].copy()
    continuous_modelling_df = (
        continuous_modelling_df.groupby(
            list(continuous_modelling_df.drop(CONTINUOUS_TARGET, axis=1))
        )[CONTINUOUS_TARGET]
        .max()
        .reset_index()
        .drop("begin_date_time", axis=1)
    )
    continuous_modelling_df = pd.merge(
        continuous_modelling_df,
        geometries_df[[COUNTIES_INDEX, STATE_INDEX, "geometry"]],
        on=[COUNTIES_INDEX, STATE_INDEX],
        how="inner",
    )
    return continuous_modelling_df


def create_count_dataset(
    modelling_df: pd.DataFrame, geometries_df: gpd.GeoDataFrame
) -> gpd.GeoDataFrame:
    """Create a dataset for count modelling"""
    count_modelling_df = modelling_df[
        [
            "county",
            "begin_date_time",
            "year",
            "month",
            "hour",
            "state",
            COUNTIES_INDEX,
        ]
    ].copy()

    count_modelling_df = (
        count_modelling_df.groupby(
            [
                "county",
                "begin_date_time",
                "year",
                "month",
                "hour",
                "state",
                COUNTIES_INDEX,
            ]
        )
        .size()
        .reset_index()
        .rename({0: COUNT_TARGET}, axis=1)
        .groupby(COUNT_MODELLING_COLUMNS)[COUNT_TARGET]
        .max()
        .reset_index()
    )
    count_modelling_df = pad_dataset(
        df=count_modelling_df,
        categorical_columns=COUNT_MODELLING_COLUMNS,
    )
    count_modelling_df = pd.merge(
        count_modelling_df,
        geometries_df[[COUNTIES_INDEX, STATE_INDEX, "geometry"]],
        on=[COUNTIES_INDEX, STATE_INDEX],
        how="inner",
    )
    return count_modelling_df


def create_all_datasets(
    modelling_df: pd.DataFrame,
    geometries_df: gpd.GeoDataFrame,
    state: List[str] = SELECTED_ANALYSIS_STATES,
) -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:
    """Create all dataset required for the notebook"""
    state_modelling_df = modelling_df[modelling_df["state"].isin(state)].copy()
    continuous_state_modelling_df = create_continuous_dataset(
        modelling_df=state_modelling_df,
        geometries_df=geometries_df,
    )
    count_state_modelling_df = create_count_dataset(
        modelling_df=state_modelling_df,
        geometries_df=geometries_df,
    )
    eva_state_modelling_df = create_eva_dataset(
        modelling_df=state_modelling_df,
        geometries_df=geometries_df,
    )

    return (
        continuous_state_modelling_df,
        count_state_modelling_df,
        eva_state_modelling_df,
    )
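
The padding step used for the count dataset can be sketched on a toy example. `pad_dataset` reindexes the aggregated counts over every combination of the indexing columns and fills the combinations without records with a constant. The dataframe `toy_counts` and its values below are hypothetical, and the sketch uses `pd.MultiIndex.from_product` rather than `itertools.product` so that the index level names are explicit.

```python
import pandas as pd

# Hypothetical toy counts: only two (county, hour) combinations were observed.
toy_counts = pd.DataFrame(
    {
        "countyfp": ["001", "003"],
        "hour": [14, 19],
        "number_events": [2, 1],
    }
)

# Build every combination of the observed levels of the indexing columns.
index_columns = ["countyfp", "hour"]
full_index = pd.MultiIndex.from_product(
    [toy_counts[column].unique() for column in index_columns],
    names=index_columns,
)

# Reindex onto the full grid; combinations without records become zero counts.
padded_counts = (
    toy_counts.set_index(index_columns)
    .reindex(full_index, fill_value=0)
    .reset_index()
)
print(padded_counts)
```

The two observed combinations keep their counts, while the two unobserved ones appear as explicit zeros. This is what allows a count model to learn from the absence of events.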

As a first step, we read and pre-process the data used for modelling:

modelling_df = pd.read_parquet(Path(DATA_PATH, "modelling_df.parquet"))
modelling_df["state"] = modelling_df["state"].apply(
    lambda x: [i.lower() for i in x.split(" ")]
)
modelling_df["state"] = modelling_df["state"].apply(lambda x: "_".join(x))
modelling_df = modelling_df.rename({"yearly": "year"}, axis=1)

modelling_df["begin_lat"] = modelling_df["begin_lat"].values.round(FLOAT_PRECISION)
modelling_df["begin_lon"] = modelling_df["begin_lon"].values.round(FLOAT_PRECISION)
modelling_df["magnitude"] = perform_dithering(
    magnitudes=modelling_df["magnitude"].values
)
modelling_df
           county  begin_lat  begin_lon      begin_date_time  countyfp  statefp  countyfp_nozero  magnitude           state     monthly       daily  year  month  hour  time_index
     0  abbeville      34.09     -82.60  2009-04-10 19:00:00       001       45                1   1.506724  south_carolina  2009-04-06  2009-04-10  2009      4    19      116353
     1  abbeville      34.09     -82.59  2010-03-28 19:00:00       001       45                1   1.066819  south_carolina  2010-03-02  2010-03-28  2010      3    19      124801
     2  abbeville      34.09     -82.59  2010-04-25 01:00:00       001       45                1   1.477696  south_carolina  2010-04-01  2010-04-25  2010      4     1      125455
     3  abbeville      34.10     -82.62  1998-06-09 16:00:00       001       45                1   0.841454  south_carolina  1998-06-03  1998-06-09  1998      6    16       21358
     4  abbeville      34.10     -82.60  1997-04-22 14:00:00       001       45                1   0.868167  south_carolina  1997-04-09  1997-04-22  1997      4    14       11444
   ...        ...        ...        ...                  ...       ...      ...              ...        ...             ...         ...         ...   ...    ...   ...         ...
310393    ziebach      45.40    -101.86  2013-07-23 18:00:00       137       46              137   1.697297    south_dakota  2013-07-14  2013-07-23  2013      7    18      153912
310394    ziebach      45.42    -101.81  2019-09-29 22:00:00       137       46              137   0.926773    south_dakota  2019-09-11  2019-09-29  2019      9    22      208132
310395    ziebach      45.47    -101.65  2003-07-03 22:00:00       137       46              137   0.752974    south_dakota  2003-06-07  2003-07-03  2003      7    22       65764
310396    ziebach      45.47    -101.65  2008-07-18 18:00:00       137       46              137   1.477907    south_dakota  2008-07-10  2008-07-18  2008      7    18      109968
310397    ziebach      45.47    -101.65  2008-07-23 15:00:00       137       46              137   1.474617    south_dakota  2008-07-10  2008-07-23  2008      7    15      110085

310398 rows × 15 columns
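
The `magnitude` values shown above are the output of `perform_dithering`, which perturbs each hail size with uniform noise whose half-width grows linearly with the size and is capped. The sketch below reproduces that computation on hypothetical sizes (`reported_sizes` is made up) with the same default constants. It uses a seeded `numpy` generator, which is an assumption made here only for reproducibility; the notebook calls `np.random.uniform` without a seed.

```python
import numpy as np

# Seeded only to make this sketch reproducible.
rng = np.random.default_rng(seed=0)

# Hypothetical hail sizes, with repeated values to mimic clustering.
reported_sizes = np.array([0.75, 1.0, 1.0, 1.75, 1.75, 2.75, 4.0])

# Same constants as the defaults of perform_dithering:
# half-width = bias + coefficient * size, capped at 0.5.
half_widths = np.clip(0.0279 + 0.147 * reported_sizes, 0.0, 0.5)

# Add symmetric uniform noise and keep the result non-negative.
dithered_sizes = np.clip(
    reported_sizes + rng.uniform(-half_widths, half_widths), 0.0, None
)

print(np.round(half_widths, 3))
print(np.round(dithered_sizes, 3))
```

Identical input sizes end up with distinct dithered values, and the largest size hits the cap on the noise half-width.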

modelling_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 310398 entries, 0 to 310397
Data columns (total 15 columns):
 #   Column           Non-Null Count   Dtype         
---  ------           --------------   -----         
 0   county           310398 non-null  object        
 1   begin_lat        310398 non-null  float64       
 2   begin_lon        310398 non-null  float64       
 3   begin_date_time  310398 non-null  datetime64[ns]
 4   countyfp         310398 non-null  object        
 5   statefp          310398 non-null  object        
 6   countyfp_nozero  310398 non-null  object        
 7   magnitude        310398 non-null  float64       
 8   state            310398 non-null  object        
 9   monthly          310398 non-null  datetime64[ns]
 10  daily            310398 non-null  datetime64[ns]
 11  year             310398 non-null  int32         
 12  month            310398 non-null  int32         
 13  hour             310398 non-null  int32         
 14  time_index       310398 non-null  int64         
dtypes: datetime64[ns](3), float64(3), int32(3), int64(1), object(5)
memory usage: 32.0+ MB
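
The coordinates above are rounded to `FLOAT_PRECISION` decimals, which collapses nearby reports onto the same latitude/longitude pair. The helper `downscale_coordinates` generalises this to an arbitrary degree resolution. A minimal sketch of both operations follows; the coordinates and the 0.25 degree resolution are hypothetical values chosen only for illustration.

```python
import numpy as np

# Hypothetical coordinates, chosen only for illustration.
latitudes = np.array([34.0912, 34.1049, 45.4701])

# Rounding to a fixed number of decimals, as done for begin_lat / begin_lon.
rounded = np.round(latitudes, 2)

# Snapping to an arbitrary resolution (here 0.25 degrees), which is the
# operation implemented by downscale_coordinates.
resolution = 0.25
snapped = np.round(resolution * np.round(latitudes / resolution), 2)

print(rounded)
print(snapped)
```

Dividing by the resolution, rounding to the nearest integer and multiplying back maps every value to the nearest multiple of the resolution.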

We also load the geo-data used for visualizing our results spatially:

geometries_df = gpd.read_file(filename=Path(DATA_PATH, "us_counties_df.geojson"))
geometries_df = geometries_df.rename({"state_name": "state"}, axis=1)
geometries_df["state"] = geometries_df["state"].apply(
    lambda x: [i.lower() for i in x.split(" ")]
)
geometries_df["state"] = geometries_df["state"].apply(lambda x: "_".join(x))
geometries_df
geo_point_2d statefp countyfp countyns geoid name namelsad stusab lsad classfp ... cbsafp metdivfp funcstat aland awater intptlat intptlon state countyfp_nozero geometry
0 { "lon": -96.615073673300003, "lat": 28.439109... 48 057 01383814 48057 Calhoun Calhoun County TX 06 H1 ... 38920 None A 1312707005 1361884774 +28.4417191 -096.5795739 texas 57 POLYGON ((-96.87329 28.62291, -96.87148 28.624...
1 { "lon": -86.190418669300001, "lat": 36.751316... 21 003 00516848 21003 Allen Allen County KY 06 H1 ... 14540 None A 891838779 19482100 +36.7507703 -086.1924580 kentucky 3 POLYGON ((-86.2958 36.85107, -86.29347 36.8526...
2 { "lon": -97.721324086699994, "lat": 48.369456... 38 099 01034214 38099 Walsh Walsh County ND 06 H1 ... None None A 3319346396 32181391 +48.3769789 -097.7222304 north_dakota 99 POLYGON ((-98.29185 48.36969, -98.29211 48.369...
3 { "lon": -84.649203742099999, "lat": 36.135024... 47 129 01639778 47129 Morgan Morgan County TN 06 H1 ... 28940 None A 1352439675 823018 +36.1386970 -084.6392616 tennessee 129 POLYGON ((-84.79101 36.05853, -84.79184 36.059...
4 { "lon": -105.367471778, "lat": 38.10867790150... 08 027 00198129 08027 Custer Custer County CO 06 H1 ... None None A 1913031921 3364150 +38.1019955 -105.3735123 colorado 27 POLYGON ((-105.7969 38.26505, -105.78341 38.26...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3228 { "lon": -89.823002441400007, "lat": 31.569641... 28 065 00695756 28065 Jefferson Davis Jefferson Davis County MS 06 H1 ... None None A 1057871313 1791496 +31.5648075 -089.8270863 mississippi 65 POLYGON ((-89.97537 31.59155, -89.97535 31.592...
3229 { "lon": -89.414372033899994, "lat": 35.197080... 47 047 01639742 47047 Fayette Fayette County TN 06 H1 ... 32820 None A 1825359642 3774635 +35.1969933 -089.4138027 tennessee 47 POLYGON ((-89.63773 35.17934, -89.63768 35.181...
3230 { "lon": -97.891896839799998, "lat": 42.636781... 31 107 00835875 31107 Knox Knox County NE 06 H1 ... None None A 2870854403 81011709 +42.6344045 -097.8913492 nebraska 107 POLYGON ((-97.60303 42.85796, -97.60294 42.857...
3231 { "lon": -123.098321728, "lat": 45.56009084080... 41 067 01155137 41067 Washington Washington County OR 06 H1 ... 38900 None A 1875859540 6114246 +45.5535419 -123.0976146 oregon 67 POLYGON ((-123.20926 45.43371, -123.20976 45.4...
3232 { "lon": -72.713798537399995, "lat": 42.990603... 50 025 01461769 50025 Windham Windham County VT 06 H1 ... None None A 2034457838 32920750 +42.9953348 -072.7219550 vermont 25 POLYGON ((-72.86874 43.11317, -72.86803 43.125...

3233 rows × 22 columns

geometries_df.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 3233 entries, 0 to 3232
Data columns (total 22 columns):
 #   Column           Non-Null Count  Dtype   
---  ------           --------------  -----   
 0   geo_point_2d     3233 non-null   object  
 1   statefp          3233 non-null   object  
 2   countyfp         3233 non-null   object  
 3   countyns         3233 non-null   object  
 4   geoid            3233 non-null   object  
 5   name             3233 non-null   object  
 6   namelsad         3233 non-null   object  
 7   stusab           3233 non-null   object  
 8   lsad             3233 non-null   object  
 9   classfp          3233 non-null   object  
 10  mtfcc            3233 non-null   object  
 11  csafp            1255 non-null   object  
 12  cbsafp           1915 non-null   object  
 13  metdivfp         110 non-null    object  
 14  funcstat         3233 non-null   object  
 15  aland            3233 non-null   int64   
 16  awater           3233 non-null   int64   
 17  intptlat         3233 non-null   object  
 18  intptlon         3233 non-null   object  
 19  state            3233 non-null   object  
 20  countyfp_nozero  3233 non-null   object  
 21  geometry         3233 non-null   geometry
dtypes: geometry(1), int64(2), object(19)
memory usage: 555.8+ KB
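
Both dataframes normalise the `state` column in the same way (lower case, spaces replaced by underscores), which matters because they are later merged on the county and state keys and filtered by state name. The sketch below shows the two-step normalisation next to an equivalent vectorised form based on the pandas string accessor; `raw_states` is a hypothetical input.

```python
import pandas as pd

# Hypothetical state names, used only for illustration.
raw_states = pd.Series(["South Carolina", "TEXAS", "New Mexico"])

# Two-step normalisation, as applied in the notebook.
two_step = raw_states.apply(lambda x: [i.lower() for i in x.split(" ")]).apply(
    lambda x: "_".join(x)
)

# Equivalent vectorised form using the pandas string accessor.
vectorised = raw_states.str.lower().str.replace(" ", "_", regex=False)

print(vectorised.tolist())
```

Any state name used for filtering has to follow the same convention (e.g. `new_mexico`, not `new mexico`), otherwise it will not match any row.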

We then create three datasets, one for each of the three statistical approaches we are going to adopt:

continuous_state_modelling_df, count_state_modelling_df, eva_state_modelling_df = (
    create_all_datasets(
        modelling_df=modelling_df[
            (modelling_df["year"] >= LOWER_CUT_OFF_YEAR)
            & (modelling_df["year"] <= UPPER_CUT_OFF_YEAR)
        ],
        geometries_df=geometries_df,
        state=SELECTED_ANALYSIS_STATES,
    )
)
  1. One for extreme value analysis.
eva_state_modelling_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1972 entries, 0 to 1971
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   begin_date_time  1972 non-null   datetime64[ns]
 1   magnitude        1972 non-null   float64       
 2   countyfp         1972 non-null   object        
 3   state            1972 non-null   object        
 4   geometry         1972 non-null   geometry      
 5   county           1972 non-null   object        
dtypes: datetime64[ns](1), float64(1), geometry(1), object(3)
memory usage: 92.6+ KB
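
The idea behind the block-maxima extraction used to build this dataset can be sketched with plain pandas. The example below is an illustration under stated assumptions and not the `pyextremes` implementation: the series `events` is hypothetical, and blocks are taken to be calendar years, whereas `get_extremes` may define block boundaries differently.

```python
import pandas as pd

# Hypothetical hourly magnitudes for one county; hours without hail are absent.
events = pd.Series(
    [1.0, 1.8, 0.9, 2.4, 1.2],
    index=pd.to_datetime(
        [
            "2001-04-10 19:00",
            "2001-06-02 15:00",
            "2002-05-21 17:00",
            "2003-03-30 20:00",
            "2003-07-04 16:00",
        ]
    ),
)

# Make the series continuous at hourly frequency, padding silent hours with 0.
hourly_index = pd.date_range("2001-01-01 00:00", "2004-12-31 23:00", freq="1h")
hourly = events.reindex(hourly_index, fill_value=0.0)

# Block maxima: keep the largest value in each block (calendar years here).
annual_maxima = hourly.groupby(hourly.index.year).max()
print(annual_maxima)
```

A block without any event has a maximum of 0. In `create_eva_dataset` such zeros are replaced by the mean of the extracted maxima.
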
  2. One with a continuous target (i.e., hail magnitude) for the quantile regression.
continuous_state_modelling_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4386 entries, 0 to 4385
Data columns (total 9 columns):
 #   Column     Non-Null Count  Dtype   
---  ------     --------------  -----   
 0   year       4386 non-null   int32   
 1   month      4386 non-null   int32   
 2   hour       4386 non-null   int32   
 3   begin_lat  4386 non-null   float64 
 4   begin_lon  4386 non-null   float64 
 5   state      4386 non-null   object  
 6   countyfp   4386 non-null   object  
 7   magnitude  4386 non-null   float64 
 8   geometry   4386 non-null   geometry
dtypes: float64(3), geometry(1), int32(3), object(2)
memory usage: 257.1+ KB
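
As background for the quantile regression on this dataset, the sketch below shows the generic pinball (quantile) loss and checks that the constant minimising it approximates the empirical quantile. It is not the model used in this analysis: the function name, the synthetic `magnitudes` and the grid search are all assumptions made for illustration.

```python
import numpy as np


def pinball_loss(y_true: np.ndarray, y_pred: float, quantile: float) -> float:
    """Average pinball (quantile) loss of a constant prediction."""
    residuals = y_true - y_pred
    # Under-predictions are weighted by quantile, over-predictions by (1 - quantile).
    return float(
        np.mean(np.maximum(quantile * residuals, (quantile - 1) * residuals))
    )


rng = np.random.default_rng(seed=0)
# Hypothetical right-skewed magnitudes, used only for illustration.
magnitudes = rng.gamma(shape=2.0, scale=0.5, size=5_000)

# Grid search over constant predictions: the minimiser approximates the
# empirical 0.95 quantile.
candidates = np.linspace(magnitudes.min(), magnitudes.max(), 2_000)
losses = [pinball_loss(magnitudes, c, quantile=0.95) for c in candidates]
best_candidate = candidates[int(np.argmin(losses))]

print(best_candidate, np.quantile(magnitudes, 0.95))
```

A quantile regression replaces the constant prediction with a function of the covariates, so that the estimated quantile can vary with location and time.
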
  3. One with a count target (i.e., number of hail events) for the zero-inflated negative binomial regression.
count_state_modelling_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 559584 entries, 0 to 559583
Data columns (total 7 columns):
 #   Column         Non-Null Count   Dtype   
---  ------         --------------   -----   
 0   year           559584 non-null  int32   
 1   month          559584 non-null  int32   
 2   hour           559584 non-null  int32   
 3   state          559584 non-null  object  
 4   countyfp       559584 non-null  object  
 5   number_events  559584 non-null  float64 
 6   geometry       559584 non-null  geometry
dtypes: float64(1), geometry(1), int32(3), object(2)
memory usage: 23.5+ MB
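
The row count of the padded dataset, 559,584, is consistent with 67 counties × 29 years × 12 months × 24 hours, so most rows presumably carry a count of zero. A zero-inflated negative binomial handles this by mixing a point mass at zero with a negative binomial. The sketch below writes out its probability mass function; the function name, the mean/concentration parameterisation and the parameter values are assumptions made for illustration and are not taken from the model fitted in this analysis.

```python
import math


def zinb_pmf(k: int, gate: float, mean: float, concentration: float) -> float:
    """PMF of a zero-inflated negative binomial.

    gate: probability of a structural zero.
    mean, concentration: negative binomial parameters, with
    variance = mean + mean**2 / concentration.
    """
    log_nb = (
        math.lgamma(k + concentration)
        - math.lgamma(concentration)
        - math.lgamma(k + 1)
        + concentration * math.log(concentration / (concentration + mean))
        + k * math.log(mean / (concentration + mean))
    )
    # Zeros can come from the gate or from the negative binomial itself.
    return gate * (k == 0) + (1.0 - gate) * math.exp(log_nb)


# Illustrative parameters only (NOT estimated from the hail data).
GATE, MEAN, CONCENTRATION = 0.9, 1.5, 0.8

probability_zero = zinb_pmf(0, GATE, MEAN, CONCENTRATION)
total_mass = sum(zinb_pmf(k, GATE, MEAN, CONCENTRATION) for k in range(500))
print(probability_zero, total_mass)
```

With a large `gate`, almost all of the probability mass sits at zero, while the negative binomial component describes the hours in which events can occur.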

1.2.2 Visualizing Hail Events

import gif

import seaborn as sns
import matplotlib.pyplot as plt

from mpl_toolkits.axes_grid1.inset_locator import inset_axes


def plot_distribution_hail(
    continuous_state_modelling_df: pd.DataFrame,
    x: str,
    ax: plt.Axes,
    states: List[str] = SELECTED_ANALYSIS_STATES,
    **kwargs: Any,
) -> plt.Axes:
    """Plot the distribution of hail magnitude"""
    title_states = "\n".join([state.capitalize() for state in states])
    sns.histplot(data=continuous_state_modelling_df, x=x, ax=ax, **kwargs)
    ax.grid(alpha=0.5)
    ax.set_title(f"{title_states}\nHourly Hail Events")
    return ax


def plot_hail_distribution_with_tails(
    magnitude_events: ArrayLike, state: str, threshold=CRITICAL_VALUE_CONTINUOUS_TARGET
) -> Tuple[plt.Figure, plt.Axes, plt.Axes]:
    """Plot a distribution of hail events magnitude with a focus on the tail."""
    fig, ax = plt.subplots(1, 1, figsize=[8, 5])
    sub_ax = inset_axes(
        ax,
        width=2,
        height=2,
        loc="center right",
    )
    sns.histplot(
        data=magnitude_events,
        bins=8,
        ax=ax,
        alpha=1,
    )
    sns.histplot(
        data=magnitude_events[magnitude_events > threshold],
        bins=8,
        ax=sub_ax,
        alpha=1,
    )
    ax.set_title(f"Distribution Hail Events\n{state.capitalize()}")
    ax.set_xlabel("Hail Size")

    sub_ax.set_title(f"Extreme Hail Events\n{state.capitalize()}")
    sub_ax.set_xlabel("Hail Size")

    ax.axvline(
        threshold,
        linestyle="--",
        c="r",
        linewidth=5,
        label="Critical Value",
    )
    ax.grid(alpha=0.5, zorder=0)
    ax.legend()
    return fig, ax, sub_ax


def plot_hail_events_on_map(
    geometries_df: gpd.GeoDataFrame,
    state_modelling_df: pd.DataFrame,
    selected_state: str = SELECTED_ANALYSIS_STATES[0],
    threshold: float = CRITICAL_VALUE_CONTINUOUS_TARGET,
    extreme_threshold: float = 2.5,
    extreme_events_size: float = 25,
    **scatter_kwargs: Any,
):
    """Visualize the hail events on a map with highlight of above-threshold event"""
    fig, axs = plt.subplots(1, 2, figsize=(10, 10), sharey=True, sharex=True)

    (
        geometries_df[geometries_df[STATE_INDEX] == selected_state].boundary.plot(
            ax=axs[0], color="k", linewidth=0.5
        )
    )
    (
        geometries_df[geometries_df[STATE_INDEX] == selected_state].boundary.plot(
            ax=axs[1], color="k", linewidth=0.5
        )
    )
    axs[0].scatter(
        state_modelling_df["begin_lon"].values,
        state_modelling_df["begin_lat"].values,
        c=state_modelling_df[CONTINUOUS_TARGET].values,
        vmin=0,
        vmax=4.5,
        cmap=COLORMAP_NAME,
        **scatter_kwargs,
    )
    axs[1].scatter(
        state_modelling_df[
            (state_modelling_df[CONTINUOUS_TARGET] > threshold)
            & (state_modelling_df[CONTINUOUS_TARGET] < extreme_threshold)
        ]["begin_lon"].values,
        state_modelling_df[
            (state_modelling_df[CONTINUOUS_TARGET] > threshold)
            & (state_modelling_df[CONTINUOUS_TARGET] < extreme_threshold)
        ]["begin_lat"].values,
        c=state_modelling_df[
            (state_modelling_df[CONTINUOUS_TARGET] > threshold)
            & (state_modelling_df[CONTINUOUS_TARGET] < extreme_threshold)
        ][CONTINUOUS_TARGET].values,
        vmin=0,
        vmax=5,
        cmap=COLORMAP_NAME,
        **scatter_kwargs,
    )
    axs[1].scatter(
        state_modelling_df[
            (state_modelling_df[CONTINUOUS_TARGET] >= extreme_threshold)
        ]["begin_lon"].values,
        state_modelling_df[
            (state_modelling_df[CONTINUOUS_TARGET] >= extreme_threshold)
        ]["begin_lat"].values,
        c="r",
        s=extreme_events_size,
    )

    for ax in axs:

        ax.grid(alpha=0.5)

    axs[0].set_title(f"Hail Events\n{selected_state.capitalize()} Counties")
    axs[1].set_title(f"Extreme Hail Events\n{selected_state.capitalize()} Counties")
    axs[0].set_xlabel("Longitude")
    axs[1].set_xlabel("Longitude")
    axs[0].set_ylabel("Latitude")
    return fig, axs


def plot_quantile_hail_time(
    time_column: str,
    y: str,
    modelling_df: pd.DataFrame,
    ax: plt.Axes,
    quantile: float,
    states: List[str] = SELECTED_ANALYSIS_STATES,
    **kwargs: Any,
) -> plt.Axes:
    """Plot a quantile of a hail attribute over a time column"""
    title_states = "\n".join([state.capitalize() for state in states])
    sns.lineplot(
        data=modelling_df,
        x=time_column,
        y=y,
        ax=ax,
        # np.percentile expects q in [0, 100], so quantile must be passed as a percentage
        estimator=partial(np.percentile, q=quantile),
        n_boot=100,
        **kwargs,
    )
    ax.grid(alpha=0.5)
    ax.set_ylim(0, modelling_df[y].max())
    ax.set_ylabel(y.capitalize())
    ax.set_xlabel(time_column.capitalize())
    ax.set_title(f"{title_states}\nQuantile {quantile} {y.capitalize()}")
    return ax


def plot_quantile_hail_geometry(
    modelling_df: pd.DataFrame,
    color_column: str,
    max_value: float,
    ax: Axes,
    quantile: float = 0.5,
    states: List[str] = SELECTED_ANALYSIS_STATES,
) -> Axes:
    """Plot quantile of hail attribute over geometries"""
    title_states = "\n".join([state.capitalize() for state in states])
    (
        gpd.GeoDataFrame(
            modelling_df.groupby("geometry")[color_column]
            .quantile(quantile)
            .reset_index()
        ).plot(
            color_column,
            cmap=COLORMAP_NAME,
            ax=ax,
            legend=True,
            vmin=0.0,
            vmax=max_value,
            legend_kwds={"shrink": 0.7},
        )
    )
    ax.grid(alpha=0.5)
    ax.set_title(
        f"{title_states}\nQuantile {quantile * 100} {color_column.capitalize()}"
    )
    ax.set_ylabel("Latitude")
    ax.set_xlabel("Longitude")
    return ax


@gif.frame
def plot_comparison_quantile_hail_geometry(
    quantile: float,
    continuous_state_modelling_df: gpd.GeoDataFrame,
    count_state_modelling_df: gpd.GeoDataFrame,
    states: List[str] = SELECTED_ANALYSIS_STATES,
) -> None:
    """Compare hail magnitude and hail quantity over geometries"""
    fig, axs = plt.subplots(1, 2, figsize=(15, 5))
    for ax, df, color_column in zip(
        axs,
        [continuous_state_modelling_df, count_state_modelling_df],
        [CONTINUOUS_TARGET, COUNT_TARGET],
    ):

        ax = plot_quantile_hail_geometry(
            modelling_df=df,
            color_column=color_column,
            ax=ax,
            quantile=quantile,
            max_value=df[color_column].max(),
            states=states,
        )

    plt.tight_layout()


@gif.frame
def plot_comparison_quantile_hail_time(
    y: str,
    modelling_df: gpd.GeoDataFrame,
    quantile: float,
    color: Any,
    states: List[str] = SELECTED_ANALYSIS_STATES,
    **kwargs: Any,
) -> None:
    """Compare a quantile of a hail attribute over hour, month and year"""
    fig, axs = plt.subplots(nrows=1, ncols=3, figsize=(20, 5))
    for ax, time_indicator in zip(axs, ["hour", "month", "year"]):

        ax = plot_quantile_hail_time(
            modelling_df=modelling_df,
            time_column=time_indicator,
            y=y,
            ax=ax,
            quantile=quantile,
            states=states,
            color=color,
            **kwargs,
        )

    plt.tight_layout()

First, we want to get a sense of how hail events are distributed spatially, so we visualize the entire history of hail events in Florida.

fig, axs = plot_hail_events_on_map(
    geometries_df=geometries_df,
    state_modelling_df=continuous_state_modelling_df,
    selected_state=SELECTED_ANALYSIS_STATES[0],
    threshold=1.7,
    s=1.5,
    extreme_events_size=10,
)
fig.subplots_adjust(top=1.33)
plt.tight_layout()
plt.savefig(Path(IMAGES_PATH, "hail_events_comparison_total.png"))
plt.close("all")
Figure 1: Comparing Normal and Extreme Hail Events

Figure 1 shows on the left side all the hail events, coloured according to the estimated hail size of the event. On the right we can see a filtered version of the left panel where only events with estimated hail size greater than 1.7” are shown. In particular, we have highlighted in red those events with hail size greater than 2.5” (i.e., exceptionally large hail events).

partialized_plot_comparison_quantile_hail_geometry = partial(
    plot_comparison_quantile_hail_geometry,
    continuous_state_modelling_df=continuous_state_modelling_df,
    count_state_modelling_df=count_state_modelling_df[
        count_state_modelling_df["number_events"] > 0
    ],
    states=SELECTED_ANALYSIS_STATES,
)
quantiles = [.25, .75, .95, .99]
frames = [partialized_plot_comparison_quantile_hail_geometry(quantile=quantile) for index, quantile in enumerate(quantiles)]
gif.save(frames, Path(GIFS_PATH, 'spatial_map_quantiles.gif').as_posix(), duration=1000)
Figure 2: Observed Spatial Quantiles

From Figure 1 and Figure 2 we can already get an intuition of various characteristics of hail events:

  1. They are not evenly distributed in space, there are areas which don’t see events at all. Figure 2 shows the percentiles for both hail size and hail events across counties. Darker colors indicate higher hail size and number of hail events.

  2. Very large hail events are very rare and exceptionally large hail events are even more rare.

  3. Very large hail events can occur even in areas where hail per se is a very rare event. Figure 2 shows several percentiles of hail size for the different counties (25%, 75%, 95% and 99%): as we can see, some of the most extreme values occur in areas where there have been very few hail events.

  4. We can notice that there is spatial coherence in how the hail events are distributed over the territory of Florida (possibly reflecting geographical characteristics of the various areas).

partialized_plot_comparison_quantile_hail_time = partial(
    plot_comparison_quantile_hail_time,
    y=COUNT_TARGET,
    modelling_df=count_state_modelling_df[
        count_state_modelling_df["number_events"] > 0
    ],
    states=SELECTED_ANALYSIS_STATES,
    linewidth=3,
    solid_capstyle='round'
)
quantiles = [25, 75, 95, 99]
frames = [partialized_plot_comparison_quantile_hail_time(quantile=quantile, color=COLORMAP(index + 1)) for index, quantile in enumerate(quantiles)]
gif.save(frames, Path(GIFS_PATH, 'temporal_quantiles.gif').as_posix(), duration=1000)
Figure 3: Observed Temporal Quantiles
  1. From Figure 3 we can also observe the presence of seasonal patterns and trending behaviour that changes when considering different quantiles for both hail sizes and number of hail events.
Figure 4: Comparing Daily Empirical Probability of Hail Events by Magnitude

Figure 4 shows in more detail how the more severe (i.e., disruptive) an event is in terms of hail size, the lower its daily probability of occurring. In particular, we can see the sharp drop from moderate (1.25” to 1.75”) to severe (greater than 1.77”) hail storms.

What this tells us is that hail storms, in particular those with the potential to be disruptive or dangerous, are better framed as extreme events. In the next section we will see how these events need to be treated with particular care.

1.2.3 Visualizing the Characteristics of Extreme Hail Events

We now want to have a better understanding of the statistical characteristics of hail events which are extreme in size, meaning they are above 1.77”.

fig, ax, sub_ax = plot_hail_distribution_with_tails(
    magnitude_events=continuous_state_modelling_df[CONTINUOUS_TARGET].values,
    state=SELECTED_ANALYSIS_STATES[0],
)
plt.savefig(Path(IMAGES_PATH, "events_distribution.png"))
plt.close("all")
Figure 5: Visualizing the Distribution of Hail Events

Figure 5 shows the distribution of the hail events with a red line indicating the critical value of 1.77”. We can observe a series of characteristics typical of hail events:

  1. Most of the mass of the distribution lies around the 1” mark, indicating that the most common hail events are usually innocuous. Note that in Figure 5 we removed all the days in which there were no hail events. Otherwise we would have observed the typical pattern of a zero-inflated distribution.

  2. The distribution is right skewed, with a relatively long and heavy right tail. This indicates that hail events of disruptive force can appear with a non-negligible probability but are in general very rare.

  3. Focusing on the extreme hail events (i.e., above the 1.77” mark) we can see that the distribution is again heavily right skewed, with most of its mass close to the threshold, suggesting that truly disastrous hail events (i.e., size above 2.5”) can be very hard to predict.

1.3 Analyses Conducted

In this section we will outline the methodology we adopted for estimating hail size and hail risk at the county level.

1.4 Estimating Return Periods using Extreme Value Analysis

Show supplementary code
from scipy.stats import rankdata
from scipy.stats import genextreme

from pyextremes import get_extremes
from pyextremes import EVA


def compute_empirical_return_periods(hail_events_extremes: ArrayLike) -> ArrayLike:
    """Compute the return period empirically from the data"""
    ranks = rankdata(a=-hail_events_extremes)
    exceedance = ranks / (len(hail_events_extremes) + 1)
    periods = 1 / exceedance
    return periods


def compute_mle_return_periods_scipy(
    hail_events_extremes: pd.Series, max_years: int = 150, min_years: float = 2
) -> List[float]:
    """Estimate the parameters of a genextreme distribution using scipy and compute the
    return levels (hail sizes) for return periods from min_years to max_years - 1.
    """
    c, loc, scale = genextreme.fit(hail_events_extremes)
    periods = []
    for year in np.arange(min_years, max_years):

        periods.append(genextreme.ppf(1 - 1 / year, c, loc=loc, scale=scale))

    return periods


def compute_mle_return_periods_pyextremes(
    hail_events_extremes: pd.Series, max_years: int = 150
) -> Tuple[ArrayLike, Any]:
    """Estimate the parameters of a genextreme distribution and compute the return periods using
    the genextreme parameters estimated using pyextremes.
    """
    model = EVA.from_extremes(extremes=hail_events_extremes, method="BM")
    model.fit_model(distribution_kwargs={"floc": 0})
    periods, _, _ = model.get_return_value(np.arange(2, max_years))
    return periods, model


def plot_fit_extreme_values(
    ax: Axes,
    category: str,
    model_names: List[str],
    models: Dict[str, Any],
    critical_magnitude: float,
    critical_period: int,
) -> Axes:
    """Visualize how the estimated return function fit the empirical return values."""
    ax.scatter(
        models["empirical"][category]["period"],
        models["empirical"][category]["magnitude"],
        s=80,
        facecolors="none",
        edgecolors="k",
        label="Empirical Values",
    )
    ax.axhline(critical_magnitude, c="r", linestyle=":", label="Critical Limit")
    ax.axvline(critical_period, c="r", linestyle="-.", label=f"{critical_period} Years")

    for model_name in model_names:

        ax.plot(
            models[model_name][category]["period"],
            models[model_name][category]["magnitude"],
            label=f"{model_name}",
        )

    ax.set_title("\n".join(category.split("_")))
    ax.grid(alpha=0.5)
    ax.set_ylabel("Hail Size")
    ax.set_xlabel("Return Period")
    return ax

The first estimation approach we decided to use was based on Extreme Value Theory. In particular, we followed this tutorial on extreme events provided by a course on computational tools in climate science.

Extreme Value Analysis (EVA) usually aims at estimating, given a sample of data, the probability (i.e., risk) associated with an event that is extreme in nature. Doing so usually requires three steps:

  1. Identifying the extremes in a given period of time with a given time resolution (this usually boils down to the annual extremes).

  2. Fitting an appropriate statistical distribution to the extreme data.

  3. Estimating the probability of potentially unseen extreme events using the parameters of the distribution.

1.4.1 Identifying the “Extremes”

In order to identify the extremes in our data we have two options:

  1. Block Maxima (i.e., BM): selecting the maximum value over a “block of time” (e.g., a year).
  2. Peak Over Threshold (i.e., POT): selecting all the values higher than a given threshold.

Both options have their advantages and disadvantages. We won’t go into details and just say that we selected BM mostly for convenience as it is the methodology that retains most of the data (which is already scarce in our situation). More information can be found on this page.

Figure 6: Example of Block Maxima

In Figure 6 we can see how the maximum monthly hail sizes get converted into maximum annual hail sizes by applying a block maxima of size 12 months. One of the major advantages of block maxima is that it partially removes the issue of autocorrelation and seasonality, as we take exactly one value for each year.
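The two ways of selecting extremes can be sketched in a few lines. The snippet below is only an illustration on synthetic data: the gamma-distributed monthly maxima, the 30 year span and the variable names are assumptions and do not come from the hail dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 30 years of monthly maximum hail sizes (inches).
n_years = 30
monthly_maxima = rng.gamma(shape=2.0, scale=0.5, size=n_years * 12)

# Block Maxima: one value per 12-month block, i.e. the annual maximum.
annual_maxima = monthly_maxima.reshape(n_years, 12).max(axis=1)

# Peak Over Threshold: every value above a fixed threshold is kept,
# so a single year can contribute zero, one or many extremes.
threshold = 1.77  # the critical hail size used in this analysis
peaks_over_threshold = monthly_maxima[monthly_maxima > threshold]

print(annual_maxima[:5])
print(peaks_over_threshold[:5])
```

How many extremes each method keeps depends on the block size and on the threshold, so the comparison between the two should be repeated on the actual data.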

1.4.2 Fitting the Appropriate Distribution

Once we have obtained our samples of extreme values and done all that is in our power to ensure that they are independent and identically distributed (i.i.d.), we can proceed to fit a distribution to the samples. Which distribution is more suitable depends on the characteristics of the events we are trying to model.

In the case of extreme events, what we are dealing with are tail events that are better described by skewed distributions with a fat tail. There are 3 types of distribution that are well suited for this: Gumbel, Weibull and Fréchet, which can be flexibly represented by a single 3-parameter distribution, the Generalized Extreme Value distribution (GEV).

The Cumulative Distribution Function (CDF) for the GEV is defined as:

\[ H(x) = \begin{cases} \exp\left(-\left[1 + \xi\left(\frac{x - \mu}{\sigma}\right)\right]^{-1/ \xi}\right) ,& \text{if } \xi\neq 0 \\ \exp\left(-e^{-\left(\frac{x - \mu}{\sigma}\right)}\right) ,& \text{if } \xi = 0 \\ \end{cases} \]

with \(\mu\) being the location parameter (which controls the center of the distribution), \(\sigma\) being the scale parameter (controlling the spread of the distribution) and \(\xi\) being the shape parameter (which controls the behavior in the tails of the distribution). The CDF will become important later on when we will try to convert the probability associated with a given extreme value into a return period.
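As a sanity check, the GEV CDF in its standard form can be evaluated directly and compared with scipy. This is a minimal sketch: the function `gev_cdf` and the parameter values are hypothetical, and the only library fact relied upon is that scipy's `genextreme` uses the shape parameter `c = -\xi`.

```python
import numpy as np
from scipy.stats import genextreme


def gev_cdf(x, mu, sigma, xi):
    """CDF of the GEV distribution with location mu, scale sigma and shape xi."""
    z = (np.asarray(x, dtype=float) - mu) / sigma
    if xi == 0.0:
        # Gumbel limit of the GEV
        return np.exp(-np.exp(-z))
    # Outside the support (1 + xi * z <= 0) the CDF is 0 for xi > 0 and 1 for xi < 0;
    # clipping the base at 0 reproduces both cases.
    base = np.maximum(1.0 + xi * z, 0.0)
    with np.errstate(divide="ignore"):
        return np.exp(-base ** (-1.0 / xi))


# Hypothetical parameters, chosen only for illustration.
mu, sigma, xi = 1.5, 0.4, 0.2
sizes = np.linspace(0.5, 4.0, 8)

ours = gev_cdf(sizes, mu, sigma, xi)
scipy_values = genextreme.cdf(sizes, -xi, loc=mu, scale=sigma)  # note the sign flip on the shape
print(np.allclose(ours, scipy_values))
```

The sign flip matters in practice: a shape estimated with `genextreme.fit` has to be negated before it is read as \(\xi\).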

There are some important statistical reasons for choosing the GEV distribution for modelling annual maxima of hail events, but in our case we can see this more clearly by looking at Figure 7.

Figure 7: Comparing Gaussian and GenExtreme Fit

The figure shows both a Gaussian and a GEV fit to the annual maxima of hail sizes. Besides the sub-optimal overall fit of the Gaussian distribution, we can see how it heavily underestimates the tails in particular (where the most problematic and truly extreme events concentrate).
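The difference in the tails can be made concrete without fitting anything. The sketch below assumes hypothetical GEV parameters and compares the exceedance probabilities of that GEV with those of a Gaussian having the same mean and standard deviation; none of the numbers come from the hail data.

```python
from scipy.stats import genextreme, norm

# Hypothetical GEV parameters for annual maximum hail size (inches).
mu, sigma, xi = 1.5, 0.4, 0.2
gev = genextreme(-xi, loc=mu, scale=sigma)  # scipy uses c = -xi

# Gaussian matched to the GEV on mean and standard deviation.
gaussian = norm(loc=gev.mean(), scale=gev.std())

# Probability of exceeding increasingly large hail sizes under both distributions.
for size in (2.5, 3.5, 4.5):
    print(f"size > {size}: GEV={gev.sf(size):.2e}  Gaussian={gaussian.sf(size):.2e}")
```

With these parameters the Gaussian assigns a far smaller probability to the largest sizes, which is the pattern visible in the tails of Figure 7.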

1.4.3 Estimating the probability and return period for any extreme value

Once we have obtained the parameters of the GEV distribution it is possible to estimate the probability that a new, potentially unseen, hail event is no larger than a critical value c simply as \(p = P(event \leq c | \mu, \sigma, \xi)\), with \(\mu\), \(\sigma\) and \(\xi\) being the estimated parameters. The probability of observing an event as large or larger than the critical value c is then simply given by \(1 - p\).

It is also possible to convert this probability value into what is called a “return period”, which indicates how much time passes on average between events of a certain magnitude or greater. This value is obtained by

\[ T = \frac{1}{1 - p} \]

Given that in our case the time resolution of our extreme events is yearly, we can define each critical value as a “T years event”, which indicates that every year there is a probability \(1-p\) of observing an event as large or larger than the critical value. For example, a “100 years event” indicates a probability of 0.01 of observing said event in a given year.
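The conversion between probabilities, return periods and hail sizes can be sketched as follows. The GEV parameters are hypothetical placeholders for fitted values, and the variable names are ours.

```python
import numpy as np
from scipy.stats import genextreme

# Hypothetical fitted parameters (location, scale, shape) for annual maxima.
mu, sigma, xi = 1.2, 0.3, 0.1
gev = genextreme(-xi, loc=mu, scale=sigma)  # scipy uses c = -xi

# From a critical hail size to its return period: T = 1 / (1 - p).
critical_value = 1.77
p = gev.cdf(critical_value)
return_period = 1.0 / (1.0 - p)

# From a return period back to a hail size (the "T years event").
T = 100
return_level = gev.ppf(1.0 - 1.0 / T)

print(f"Return period of a {critical_value} inch event: {return_period:.1f} years")
print(f"Size of the {T} years event: {return_level:.2f} inches")
```

The second direction is what `compute_mle_return_periods_scipy` does for every year in its range.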

Let’s see how our EVA approach performs at the state level:

aggregated_eva_state_modelling = (
    eva_state_modelling_df.groupby(eva_state_modelling_df["begin_date_time"].dt.year)[
        CONTINUOUS_TARGET
    ]
    .max()
    .reset_index()
)
returns = compute_empirical_return_periods(
    aggregated_eva_state_modelling[CONTINUOUS_TARGET].values
)

plt.scatter(
    returns,
    aggregated_eva_state_modelling[CONTINUOUS_TARGET].values,
    s=80,
    facecolors="none",
    edgecolors="k",
    label="Empirical Values",
)
plt.plot(
    np.arange(1, 150),
    compute_mle_return_periods_scipy(
        aggregated_eva_state_modelling[CONTINUOUS_TARGET],
        max_years=150,
        min_years=1,
    ),
    c=COLORMAP(5),
    linewidth=2,
)
plt.grid(alpha=0.5)
plt.title(f"Return Periods\n{SINGLE_STATE[0].capitalize()}")
plt.ylabel("Hail Size")
plt.xlabel("Years")
plt.axvline(100, linestyle="-.", c="r", label="100 Years Event")
plt.axhline(
    CRITICAL_VALUE_CONTINUOUS_TARGET, linestyle=":", c="r", label='1.7" Hail Event'
)
plt.legend()
plt.show()

In this case we used scipy’s genextreme distribution instead of the pyextremes library. We can see how the estimated return periods fit nicely with the empirical values.
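To see where the empirical points come from, here is a small worked example of the rank-based computation used by `compute_empirical_return_periods`, applied to four made-up annual maxima.

```python
import numpy as np
from scipy.stats import rankdata

# Four made-up annual maxima (inches).
annual_maxima = np.array([1.0, 2.5, 1.75, 3.0])

# Rank 1 is assigned to the largest value.
ranks = rankdata(-annual_maxima)

# Empirical exceedance probability: rank / (n + 1).
exceedance_probability = ranks / (annual_maxima.size + 1)
return_periods = 1.0 / exceedance_probability

print(ranks)           # [4. 2. 3. 1.]
print(return_periods)  # 1.25, 2.5, 1.67 and 5 years respectively
```

The largest observation gets a return period of n + 1 years, so with short records the empirical values cannot say anything about events rarer than the length of the record.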

Estimating return periods at the state level is not very useful as:

  1. US states are rather big.
  2. Over an entire state it is very likely that we will have at least one very large hail event per year. Therefore we might be overestimating the actual county-level risk profile.
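A back-of-the-envelope calculation illustrates the second point. The numbers are hypothetical and the calculation assumes that counties are independent, which is a strong simplification given the spatial coherence observed earlier.

```python
# Hypothetical numbers, chosen only for illustration.
n_counties = 50
county_exceedance = 0.01  # a "100 years event" at the county level

# Probability that at least one county exceeds the critical value in a given year,
# under the (strong) assumption that counties are independent.
state_exceedance = 1.0 - (1.0 - county_exceedance) ** n_counties

county_return_period = 1.0 / county_exceedance
state_return_period = 1.0 / state_exceedance

print(f"County return period: {county_return_period:.0f} years")
print(f"State return period: {state_return_period:.1f} years")
```

Under these assumptions an event that is a 100 years event for a single county shows up somewhere in the state every few years, which is why state-level return periods overstate county-level risk.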

We can perform the same type of analysis using county-level data:

Show supplementary code
models = {
    "empirical": {},
    "MLE_scipy": {},
    "MLE_pyextreme": {},
}
county_names = eva_state_modelling_df["county"].unique()
for county in tqdm(county_names):

    hail_events_extremes = eva_state_modelling_df[
        eva_state_modelling_df["county"] == county
    ][CONTINUOUS_TARGET].values
    time_index = eva_state_modelling_df[eva_state_modelling_df["county"] == county][
        TIME_COLUMN
    ].values

    return_value, model = compute_mle_return_periods_pyextremes(
        hail_events_extremes=pd.Series(data=hail_events_extremes, index=time_index),
    )

    models["empirical"][county] = {
        "period": compute_empirical_return_periods(
            hail_events_extremes=hail_events_extremes,
        ),
        "magnitude": hail_events_extremes,
    }

    models["MLE_scipy"][county] = {
        "period": np.arange(2, 150),
        "magnitude": compute_mle_return_periods_scipy(
            hail_events_extremes=hail_events_extremes,
        ),
    }
    models["MLE_pyextreme"][county] = {
        "period": np.arange(2, 150),
        "magnitude": return_value,
        "model": model,
    }
# Sample without replacement so that the same county is not plotted twice
selected_counties = np.random.choice(county_names, 16, replace=False)

fig, axs = plt.subplots(4, 4, figsize=(8, 8), sharex=True, sharey=True)
for ax, county in zip(axs.flatten(), selected_counties):

    ax = plot_fit_extreme_values(
        ax=ax,
        category=county,
        model_names=["MLE_pyextreme"],
        models=models,
        critical_magnitude=1.77,
        critical_period=100,
    )

plt.tight_layout()

As we can see, the return periods can provide a substantially different profile depending on the county taken into consideration.

1.5 Estimating Expected Extreme Values and Hail Events using Bayesian Regression

Looking at Figure 1 we can hypothesize that the probability of observing a hail event of a certain magnitude can be a function of latitude and longitude. Indeed, the extreme hail events, although scattered, seem to show a certain degree of spatial consistency.

So why not simply derive empirically the likelihood of observing a given hail size in a given county, a bit like we did in Figure 2? As we can see there, the empirical percentiles derived from the observed data are very inhomogeneous. This is because they are strongly influenced by the data we have available for a given county. According to Figure 2, sample sizes can vary wildly from county to county, which makes the empirical estimates sensitive to outliers.
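The instability of empirical percentiles in small samples is easy to reproduce with a simulation. The sketch below assumes log-normally distributed hail sizes purely for illustration; the function name and all parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)


def spread_of_empirical_quantile(sample_size, q=0.95, n_repeats=2000):
    """Standard deviation of the empirical q-quantile across repeated samples."""
    samples = rng.lognormal(mean=0.0, sigma=0.4, size=(n_repeats, sample_size))
    estimates = np.quantile(samples, q, axis=1)
    return estimates.std()


# A county with 10 recorded events versus one with 1000.
spread_small = spread_of_empirical_quantile(sample_size=10)
spread_large = spread_of_empirical_quantile(sample_size=1000)

print(f"Spread of the 95th percentile with 10 events:   {spread_small:.3f}")
print(f"Spread of the 95th percentile with 1000 events: {spread_large:.3f}")
```

Even though both hypothetical counties share exactly the same underlying distribution, the estimate for the data-poor one fluctuates much more from sample to sample.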

In this case what we would want to do is to pool information from the entire state to help estimate a given percentile of hail size in each county, even in those where we observe a limited number of events. On top of that, we would also want to smooth out the contribution of outliers. One option could be to fit a regression model on latitude and longitude data so as to be able to estimate the hail size for a given event. However, relying on a conventional linear regression approach would not be appropriate in our case as:

  1. Estimating the mean, as a simple least squares regression would do, is not appropriate as in our case we are interested in the tail of the distribution.

  2. Assuming the residuals are normally distributed, as it is the case in standard linear regression, does not hold in our case.

  3. Looking at Figure 1 we can see that if a relationship between latitude, longitude and hail size exists, it is certainly not linear.

  4. In order to make proper risk statements on estimates of hail size we need to be able to quantify the uncertainty around our estimates.

  5. Estimating the size of hailstone for extreme events might only solve half of the problem. Since hail events per se are very rare it is also important to estimate their overall likelihood.

Show supplementary code
from scipy.stats import iqr

from sklearn.preprocessing import SplineTransformer, OrdinalEncoder, MinMaxScaler
from sklearn.pipeline import Pipeline

from jax import numpy as jnp
from jax import random, vmap  # vmap is used in generate_temporal_components

from numpyro.infer.reparam import LocScaleReparam

from numpyro.distributions import (
    Normal,
    AsymmetricLaplaceQuantile,
    HalfNormal,
    HalfCauchy,
    Distribution,
    Laplace,
)
from numpyro.infer import MCMC, NUTS, Predictive, SVI, Trace_ELBO, RenyiELBO
from numpyro.infer.svi import SVIRunResult
from numpyro.infer.autoguide import (
    AutoDAIS,
    AutoNormal,
    AutoLowRankMultivariateNormal,
    AutoMultivariateNormal,
)

RNG_KEY = random.key(seed=666)

def sample_using_mcmc(
    rng_key: ArrayLike,
    model: Callable,
    model_kwargs: Dict[str, Any],
    MCMC_kwargs: Dict[str, Any],
) -> MCMC:
    """Sample from the model using MCMC"""
    rng_key, sub_rng_key = random.split(rng_key)
    kernel = NUTS(model=model)
    mcmc = MCMC(kernel, progress_bar=True, **MCMC_kwargs)
    mcmc.run(sub_rng_key, **model_kwargs)
    mcmc.print_summary()
    return mcmc


def sample_using_svi(
    rng_key: ArrayLike,
    model: Callable,
    autoguide: Any,
    model_kwargs: Dict[str, Any],
    guide_kwargs: Dict[str, Any],
    optimizer_kwargs: Dict[str, Any],
    num_steps: int,
    elbo_tracer: Any = Trace_ELBO,
    num_particles: int = 2
) -> Tuple[SVIRunResult, AutoDAIS]:
    """Sample from the model using variational inference"""
    rng_key, sub_rng_key = random.split(rng_key)
    guide = autoguide(model=model, **guide_kwargs)
    optimizer = numpyro.optim.ClippedAdam(**optimizer_kwargs)
    svi = SVI(model, guide, optimizer, loss=elbo_tracer(num_particles=num_particles))
    svi_result = svi.run(sub_rng_key, num_steps, **model_kwargs)

    fig, ax = plt.subplots()
    ax.plot(svi_result.losses)
    ax.set_title("ELBO loss")
    ax.grid(alpha=0.5)
    plt.show()

    return svi_result, guide


def sample_posterior_predictive_mcmc(
    rng_key: ArrayLike,
    model: Callable,
    posterior_samples: Dict[str, Any],
    model_kwargs: Dict[str, Any],
) -> ArrayLike:
    """Sample from the posterior predictive using MCMC posterior samples"""
    rng_key, sub_rng_key = random.split(rng_key)
    predictive = Predictive(model, posterior_samples=posterior_samples)
    posterior_predictive = predictive(rng_key=sub_rng_key, **model_kwargs)
    return posterior_predictive


def sample_posterior_predictive_svi(
    rng_key: ArrayLike,
    model: Callable,
    guide: AutoDAIS,
    covariates_hat: Dict[str, ArrayLike],
    svi_result: SVIRunResult,
    num_samples: int,
    model_kwargs: Dict[str, Any],
    return_sites: List[str] = None,
    target: ArrayLike = None,
) -> ArrayLike:
    """Sample from the posterior predictive using SVI inferred parameters"""
    model_kwargs = {key: value for key, value in model_kwargs.items()}
    model_kwargs["target"] = target
    model_kwargs["covariates"] = covariates_hat
    predictive = Predictive(
        model=model,
        guide=guide,
        params=svi_result.params,
        num_samples=num_samples,
        exclude_deterministic=False,
        return_sites=return_sites,
    )
    rng_key, sub_rng_key = random.split(rng_key)
    posterior_predictive = predictive(rng_key=sub_rng_key, **model_kwargs)
    return posterior_predictive


def transform_fitting_covariates(
    covariates: Dict[str, ArrayLike],
    transformers: Dict[str, Any],
) -> Tuple[Dict[str, ArrayLike], Dict[str, Any]]:
    """Fit transformers and transform covariates"""
    transformed_covariates = {}
    for covariate_name, covariate_array in covariates.items():

        transformers[covariate_name].fit(covariate_array)
        transformed_covariates[covariate_name] = transformers[covariate_name].transform(
            covariate_array
        )

    return transformed_covariates, transformers


def transform_estimation_covariates(
    covariates: Dict[str, ArrayLike],
    transformers: Dict[str, Any],
) -> Dict[str, ArrayLike]:
    """Transform covariates using fitted transformers"""
    transformed_covariates = {}
    for covariate_name, covariate_array in covariates.items():

        transformed_covariates[covariate_name] = transformers[covariate_name].transform(
            covariate_array
        )

    return transformed_covariates


def jaxify_array_dictionary(
    array_dictionary: Dict[str, ArrayLike],
) -> Dict[str, ArrayLike]:
    """Turn arrays in a dictionary into JAX arrays"""
    jaxified_array_dictionary = {}
    for key, value in array_dictionary.items():

        jaxified_array_dictionary[key] = jnp.array(value)

    return jaxified_array_dictionary


def prepare_modelling_data(
    covariates: Dict[str, ArrayLike],
    covariates_hat: Dict[str, ArrayLike],
    transformers: Dict[str, Pipeline],
    target: ArrayLike,
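) -> Tuple[Dict[str, Pipeline], Dict[str, ArrayLike], Dict[str, ArrayLike], ArrayLike]:
    """Fit the transformers on the fitting covariates, apply them to both sets of
    covariates and convert covariates and target to JAX arrays"""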
    transformed_covariates, transformers = transform_fitting_covariates(
        covariates=covariates,
        transformers=transformers,
    )
    transformed_covariates_hat = transform_estimation_covariates(
        covariates=covariates_hat,
        transformers=transformers,
    )
    transformed_covariates = jaxify_array_dictionary(
        array_dictionary=transformed_covariates,
    )
    transformed_covariates_hat = jaxify_array_dictionary(
        array_dictionary=transformed_covariates_hat,
    )
    target = jnp.array(target)

    return transformers, transformed_covariates, transformed_covariates_hat, target

def generate_temporal_components(
    posterior: Dict[str, ArrayLike],
    transformers: Dict[str, Pipeline],
    years: ArrayLike,
    suffix: str = "",
    parameter_transformer: Callable = None
) -> Dict[str, ArrayLike]:
    """Generate the temporal components"""
    mapped_dot = vmap(jnp.dot, in_axes=(None, 0))
    covariates = {
        "year_covariates": (years.reshape(-1, 1)),
        "month_covariates": (np.arange(1, 13).reshape(-1, 1)),
        "hour_covariates": (np.arange(24).reshape(-1, 1)),
    }

    covariates = transform_estimation_covariates(
        covariates=covariates, transformers=transformers
    )

    year_component = mapped_dot(
        covariates["year_covariates"], posterior[f"beta_year{suffix}"]
    )
    month_component = mapped_dot(
        covariates["month_covariates"], posterior[f"beta_month{suffix}"]
    )
    hour_component = mapped_dot(
        covariates["hour_covariates"], posterior[f"beta_hour{suffix}"]
    )

    posterior_components = {
        "year_component": year_component,
        "month_component": month_component,
        "hour_component": hour_component,
    }

    if parameter_transformer is not None:

        posterior_components = {
            key: parameter_transformer(value) for key, value in posterior_components.items()
        }

    return posterior_components

def visualize_geo_regression(
    covariates_hat_df: gpd.GeoDataFrame, posterior: Dict[str, ArrayLike], parameter: str, parameter_transformer: Callable = None
) -> Tuple[plt.Figure, Axes]:
    """Visualize the result from the regression on a geodataframe."""
    parameter_value = posterior[parameter]
    if parameter_transformer is not None:
        parameter_value = parameter_transformer(parameter_value)

    covariates_hat_df["2.5%"] = np.percentile(
        parameter_value,
        q=2.5,
        axis=0,
    )
    covariates_hat_df["Median"] = np.percentile(
        parameter_value,
        q=50,
        axis=0,
    )
    covariates_hat_df["97.5%"] = np.percentile(
        parameter_value,
        q=97.5,
        axis=0,
    )
    fig, axs = plt.subplots(1, 3, figsize=(15, 10))
    for ax, column in zip(axs, ["2.5%", "Median", "97.5%"]):

        covariates_hat_df.plot(
            column,
            ax=ax,
            cmap=COLORMAP_NAME,
            legend=True,
            legend_kwds={"shrink": 0.3},
            vmin=np.percentile(parameter_value, q=1),
            vmax=np.percentile(parameter_value, q=99),
        )
        ax.grid(alpha=0.5)
        ax.set_title(column)
        ax.set_ylabel("Latitude")
        ax.set_xlabel("Longitude")

    return fig, axs


def visualize_county_regression(
    modelling_df: pd.DataFrame,
    covariates_hat_df: gpd.GeoDataFrame,
    posterior: Dict[str, ArrayLike],
    parameter: str,
    target: str,
) -> Tuple[plt.Figure, Axes]:
    """Visualize the regression at county level"""
    samples_df = pd.DataFrame(posterior[parameter].T)
    samples_df[COUNTIES_INDEX] = covariates_hat_df[COUNTIES_INDEX].values
    rows = 4
    random_counties = np.random.choice(
        samples_df[COUNTIES_INDEX].unique(), rows**2, replace=False
    )
    fig, axs = plt.subplots(rows, rows, figsize=(10, 10))
    for county, ax in zip(random_counties, axs.flatten()):

        county_target = modelling_df[modelling_df[COUNTIES_INDEX] == county][
            target
        ].values
        county_samples = (
            samples_df[samples_df[COUNTIES_INDEX] == county]
            .drop([COUNTIES_INDEX], axis=1)
            .values.flatten()
        )
        sns.histplot(
            data=county_target,
            element="step",
            fill=False,
            stat="density",
            ax=ax,
            color=COLORMAP(5),
        )

        ax.axvline(np.quantile(county_target, CRITICAL_QUANTILE), c="r", linestyle="--")
        ax.axvspan(
            np.quantile(county_samples, 0.025),
            np.quantile(county_samples, 0.975),
            alpha=0.5,
            color="red",
        )
        ax.grid(alpha=0.5)
        ax.set_ylim(0, None)
    plt.tight_layout()
    return fig, axs


def visualize_temporal_components(temporal_components: Dict[str, ArrayLike]) -> Tuple[plt.Figure, Axes]:
    """Visualize the various temporal components"""
    fig, axs = plt.subplots(1, 3, figsize=(12, 4))
    for ax, component in zip(axs.flatten(), ["year_component", "month_component", "hour_component"]):
        ax.plot(
            np.percentile(
                a=temporal_components[component],
                q=50,
                axis=0
            ),
            c=COLORMAP(5)
        )
        ax.fill_between(
            np.arange(temporal_components[component].shape[1]),
            np.percentile(
                a=temporal_components[component],
                q=2.5,
                axis=0
            ),
            np.percentile(
                a=temporal_components[component],
                q=97.5,
                axis=0
            ),
            color=COLORMAP(1),
            alpha=0.5
        )
        ax.set_title(" ".join(component.split("_")))
        ax.grid(alpha=0.5)
        ax.set_xlabel(component.split("_")[0])
        ax.set_ylabel("Component Contribution")

    plt.tight_layout()
    return fig, axs


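def create_tensor_product_spline(first_spline: ArrayLike, second_spline: ArrayLike) -> np.ndarray:
    """Tensor (outer) product of two spline bases; with axes=0 np.tensordot is an outer product"""
    return np.tensordot(first_spline, second_spline, axes=0)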

1.5.1 Estimating Expected Extreme Values Using Quantile Regression

A potential solution for estimating the hail size of rare (i.e., extreme) events is to use quantile regression. Unlike conventional regression, quantile regression can estimate any quantile of a distribution and not just the mean.
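
To make this concrete, here is a minimal sketch in plain NumPy. It assumes synthetic exponential data as a stand-in for hail sizes, and the helper name `pinball_loss` is ours, not part of the analysis code. It shows that minimising the check (pinball) loss at level \(\tau\) recovers the empirical \(\tau\)-quantile. The same loss is, up to scale and additive constants, the negative log-density of an asymmetric Laplace distribution.

```python
import numpy as np


def pinball_loss(y: np.ndarray, q: float, tau: float) -> float:
    """Average check (pinball) loss of a candidate quantile q at level tau."""
    residual = y - q
    return float(np.mean(residual * (tau - (residual < 0))))


rng = np.random.default_rng(0)
y = rng.exponential(scale=1.0, size=5_000)  # synthetic stand-in for hail sizes
tau = 0.95

# Evaluate the loss at every observed value and keep the minimiser
candidates = np.sort(y)
losses = np.array([pinball_loss(y, q, tau) for q in candidates])
q_hat = candidates[np.argmin(losses)]

# Reference value: the usual empirical quantile
empirical_quantile = np.quantile(y, tau)
```

In this sketch `q_hat` and `empirical_quantile` should agree up to the spacing between neighbouring observations, which is why a likelihood built on this loss targets a quantile rather than the mean.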

Example of Quantile Regression

Using a regression approach would allow us to include the type of spatial and temporal effects we outlined during our exploratory data analysis. Generically, the regression would have the following formulation:

\(\mathcal{Q}_{y_i|X_i}(\tau) = \alpha + \beta X_i\)

with \(\tau\) being the desired quantile, \(X_i\) all the spatio-temporal covariates associated with hail events \(y_i\), and \(\alpha\) and \(\beta\) parameters to be estimated. A convenient choice of likelihood function for estimating \(\mathcal{Q}_{y_i|X_i}(\tau)\) is the asymmetric Laplace distribution. Expanding the formulation for quantile regression further, we would have

\(y_i \sim AsymmetricLaplace(\mu_i, \tau), \quad \mu_i = \alpha + \phi(LatLon_i) \beta_{LatLon} + \phi(hour) \beta_{hour} + \phi(month) \beta_{month} + \phi(year) \beta_{year}\)

Here \(\phi\) indicates a cubic spline function that we apply in order to model non-linearity in the contribution of the various covariates. More precisely, \(\phi(LatLon_i)\) is a tensor-product spline generated from the latitude and longitude splines. \(\phi(hour)\) and \(\phi(month)\) can be thought of as seasonal components, while \(\phi(year)\) can be identified as a trend component. In our case the only parameter of the likelihood to be estimated is \(\mu\), which indicates the value of \(y\) at quantile \(\tau\).
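
As an illustration of how such a tensor-product basis can be assembled, the following sketch uses random matrices as stand-ins for the latitude and longitude spline bases (the shapes and variable names are assumptions made for the example). It builds the row-wise outer product in one vectorised call and checks it against the per-observation `np.tensordot` construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_lat_basis, n_lon_basis = 4, 3, 5
lat_basis = rng.random((n_obs, n_lat_basis))  # stand-in for the latitude spline basis
lon_basis = rng.random((n_obs, n_lon_basis))  # stand-in for the longitude spline basis

# Row-wise outer product: one vector of n_lat_basis * n_lon_basis features per observation
tensor_basis = np.einsum("ni,nj->nij", lat_basis, lon_basis).reshape(n_obs, -1)

# Same construction, one observation at a time
looped = np.vstack(
    [np.tensordot(lat_basis[i], lon_basis[i], axes=0).flatten() for i in range(n_obs)]
)
```

Each column of `tensor_basis` is the product of one latitude basis function and one longitude basis function, so the number of spatial coefficients grows with the product of the two basis sizes.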

In order to perform estimation on the entire surface of Florida we opted for an approximation of the spatial covariates by doing the following:

  1. Dividing the entire surface of Florida into a grid with 1 mile resolution (see Figure 8).
Figure 8: The Estimation Grid Used for the Regression
  2. For each square in the grid, we obtain its centroid and extract the latitude, longitude and maximum observed hail size for that centroid, obtaining something like the following table:
latitude  longitude  hail_size
XXX       YYY        Z
XXX       YYY        Z

We retained all measurements associated with a given square, so a single square might have more than one hail event associated with it. It goes without saying that the spatial resolution of our regression is entirely determined by the size of the squares.
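
The gridding step above can be sketched as follows. This is only an illustration under stated assumptions: the cell edge, the grid origin and the four hail reports are hypothetical values, and the actual analysis builds its grid from the state geometries rather than from a fixed origin.

```python
import numpy as np

edge_size = 0.5                 # hypothetical cell edge, in degrees
lat_min, lon_min = 24.0, -88.0  # hypothetical lower-left corner of the grid

# Hypothetical hail reports: latitude, longitude, hail size
latitude = np.array([25.10, 25.20, 27.90, 30.40])
longitude = np.array([-80.30, -80.40, -82.60, -87.10])
hail_size = np.array([1.00, 1.75, 0.75, 2.00])

# Index of the grid square containing each report
row = np.floor((latitude - lat_min) / edge_size).astype(int)
col = np.floor((longitude - lon_min) / edge_size).astype(int)

# The centroid of that square replaces the exact coordinates of the report
centroid_lat = lat_min + (row + 0.5) * edge_size
centroid_lon = lon_min + (col + 0.5) * edge_size

# One row per report, so a square can appear more than once
grid_table = np.column_stack([centroid_lat, centroid_lon, hail_size])
```

Here the first two reports fall in the same square and therefore share a centroid while keeping their own hail sizes.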

Show supplementary code
def create_estimation_covariates_quantile_regression(
    months: List[int],
    hours: List[int],
    years: List[int],
    geometries_df: gpd.GeoDataFrame,
    edge_size: float,
    states: List[str],
) -> Tuple[gpd.GeoDataFrame, Dict[str, ArrayLike]]:
    """Create the covariates for estimation"""
    quantile_regression_covariates_hat_df = create_covariates_df(
        geometries_df=geometries_df,
        edge_size=edge_size,
        state=states,
        months=months,
        hours=hours,
        years=years,
    )
    quantile_regression_covariates_hat = {
        "latitude_covariates": (
            quantile_regression_covariates_hat_df[LAT_COVARIATES].values.reshape(-1, 1)
        ),
        "longitude_covariates": (
            quantile_regression_covariates_hat_df[LON_COVARIATES].values.reshape(-1, 1)
        ),
        "year_covariates": (
            quantile_regression_covariates_hat_df[YEAR_COVARIATES].values.reshape(-1, 1)
        ),
        "month_covariates": (
            quantile_regression_covariates_hat_df[MONTH_COVARIATE].values.reshape(-1, 1)
        ),
        "hour_covariates": (
            quantile_regression_covariates_hat_df[HOUR_COVARIATE].values.reshape(-1, 1)
        ),
        "counties_index": (
            quantile_regression_covariates_hat_df[COUNTIES_INDEX].values.reshape(-1, 1)
        ),
    }
    return quantile_regression_covariates_hat_df, quantile_regression_covariates_hat
quantile_regression_transformers = {
    "latitude_covariates": Pipeline(
        steps=[
            (
                "spline_transformer",
                SplineTransformer(
                    include_bias=False,
                    extrapolation="periodic",
                    n_knots=8,
                ),
            )
        ]
    ),
    "longitude_covariates": Pipeline(
        steps=[
            (
                "spline_transformer",
                SplineTransformer(
                    include_bias=False,
                    extrapolation="periodic",
                    n_knots=8,
                ),
            )
        ]
    ),
    "year_covariates": Pipeline(
        steps=[
            (
                "ordinal_encoder",
                OrdinalEncoder(
                    dtype="int",
                ),
            ),
            (
                "spline_transformer",
                SplineTransformer(
                    include_bias=False,
                ),
            ),
        ]
    ),
    "month_covariates": Pipeline(
        steps=[
            (
                "spline_transformer",
                SplineTransformer(
                    include_bias=False,
                ),
            )
        ]
    ),
    "hour_covariates": Pipeline(
        steps=[
            (
                "spline_transformer",
                SplineTransformer(
                    include_bias=False,
                ),
            )
        ]
    ),
    "counties_index": OrdinalEncoder(
        dtype="int",
    ),
}
quantile_regression_covariates = {
    "latitude_covariates": (
        continuous_state_modelling_df[LAT_COVARIATES].values.reshape(-1, 1)
    ),
    "longitude_covariates": (
        continuous_state_modelling_df[LON_COVARIATES].values.reshape(-1, 1)
    ),
    "year_covariates": (
        continuous_state_modelling_df[YEAR_COVARIATES].values.reshape(-1, 1)
    ),
    "month_covariates": (
        continuous_state_modelling_df[MONTH_COVARIATE].values.reshape(-1, 1)
    ),
    "hour_covariates": (
        continuous_state_modelling_df[HOUR_COVARIATE].values.reshape(-1, 1)
    ),
    "counties_index": (
        continuous_state_modelling_df[COUNTIES_INDEX].values.reshape(-1, 1)
    ),
}
quantile_regression_target = continuous_state_modelling_df[CONTINUOUS_TARGET].values

quantile_regression_covariates_hat_df, quantile_regression_covariates_hat = (
    create_estimation_covariates_quantile_regression(
        months=[5],
        hours=[17],
        years=[2024],
        geometries_df=geometries_df,
        edge_size=DEGREES_PRECISION,
        states=SELECTED_ANALYSIS_STATES,
    )
)

(
    quantile_regression_transformers,
    quantile_regression_covariates,
    quantile_regression_covariates_hat,
    target,
) = prepare_modelling_data(
    covariates=quantile_regression_covariates,
    covariates_hat=quantile_regression_covariates_hat,
    target=quantile_regression_target,
    transformers=quantile_regression_transformers,
)

# Row-wise tensor product of the latitude and longitude spline bases (training data)
quantile_regression_covariates["latitude_longitude_tensor_covariates"] = np.vstack(
    [
        create_tensor_product_spline(
            quantile_regression_covariates["latitude_covariates"][i, :],
            quantile_regression_covariates["longitude_covariates"][i, :],
        ).flatten()
        for i in range(quantile_regression_covariates["latitude_covariates"].shape[0])
    ]
)
# Same construction for the estimation grid
quantile_regression_covariates_hat["latitude_longitude_tensor_covariates"] = np.vstack(
    [
        create_tensor_product_spline(
            quantile_regression_covariates_hat["latitude_covariates"][i, :],
            quantile_regression_covariates_hat["longitude_covariates"][i, :],
        ).flatten()
        for i in range(quantile_regression_covariates_hat["latitude_covariates"].shape[0])
    ]
)

Partially Pooled Quantile Regression

Before modelling the spatial components directly (which is a rather expensive process), we attempted an approximate solution by letting the intercept of the model vary by county. To pool information across the entire state and regularize outliers, we opted for a partially pooled model (an amazing resource for learning more about this topic is this Michael Betancourt blog post).

\[ \begin{gather} \color{RedOrange}\sigma_{county} \sim HalfCauchy(\sigma=2) \\ \color{RedOrange}\mu_{county} \sim \mathcal{N}(\mu=0, \sigma=5) \\ \color{RedOrange}\alpha_{county} \sim \mathcal{N}(\mu_{county}, \sigma_{county}) \\ \beta_{hour} \sim \mathcal{N}(\mu=0, \sigma=1) \\ \beta_{month} \sim \mathcal{N}(\mu=0, \sigma=1) \\ \beta_{year} \sim \mathcal{N}(\mu=0, \sigma=1) \\ \sigma \sim HalfNormal(1)\\ \mu = \exp(\alpha_{county} + \beta_{hour}f(hour) + \beta_{month}f(month) + \beta_{year}f(year))\\ \color{RedOrange} y \sim AsymmetricLaplace(\mu, \sigma, \tau=.95) \end{gather} \]

Given the formulation above, it is important to remember that, because of the exponential link applied to \(\mu\), the covariates contribute multiplicatively rather than additively.
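
A small numerical check makes the multiplicative reading explicit. The three log-scale contributions below are hypothetical numbers chosen for illustration, not estimates from the model.

```python
import numpy as np

# Hypothetical contributions on the log scale
alpha_county = 0.30   # county intercept
hour_effect = 0.10    # contribution of f(hour)
month_effect = -0.20  # contribution of f(month)

# Exponential link: a sum on the log scale is a product on the original scale
mu = np.exp(alpha_county + hour_effect + month_effect)
mu_as_product = np.exp(alpha_county) * np.exp(hour_effect) * np.exp(month_effect)

# Raising the hour contribution by 0.05 scales mu by exp(0.05), whatever the other terms are
ratio = np.exp(alpha_county + (hour_effect + 0.05) + month_effect) / mu
```

In other words, a change in one component rescales the estimated quantile by a fixed factor instead of shifting it by a fixed amount.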

We define the model in numpyro and visualize its graphical representation.

reparam_config = {
    "alpha": LocScaleReparam(0),
}


@numpyro.handlers.reparam(config=reparam_config)
def hierarchical_non_spatial_quantile_regression(
    target: ArrayLike,
    covariates: Dict[str, ArrayLike],
    quantile: float,
    prior_mu_alpha: Distribution,
    prior_sigma_alpha: Distribution,
    prior_scale: Distribution,
    prior_beta_year: Distribution,
    prior_beta_month: Distribution,
    prior_beta_hour: Distribution,
) -> None:
    """Quantile regression model with partially pooled intercept"""
    n_groups = len(np.unique(covariates["counties_index"]))
    counties_index = covariates["counties_index"].flatten()

    mu_alpha = numpyro.sample(
        "mu_alpha",
        prior_mu_alpha,
    )
    sigma_alpha = numpyro.sample(
        "sigma_alpha",
        prior_sigma_alpha,
    )

    with numpyro.plate("counties", n_groups):

        alpha = numpyro.sample(
            "alpha",
            Normal(mu_alpha, sigma_alpha),
        )

    beta_hour = numpyro.sample(
        "beta_hour",
        prior_beta_hour.expand([covariates["hour_covariates"].shape[1]]),
    )
    beta_month = numpyro.sample(
        "beta_month",
        prior_beta_month.expand([covariates["month_covariates"].shape[1]]),
    )
    beta_year = numpyro.sample(
        "beta_year",
        prior_beta_year.expand([covariates["year_covariates"].shape[1]]),
    )
    hour_component = numpyro.deterministic(
        name="hour_component",
        value=jnp.dot(covariates["hour_covariates"], beta_hour),
    )
    month_component = numpyro.deterministic(
        name="month_component",
        value=jnp.dot(covariates["month_covariates"], beta_month),
    )
    year_component = numpyro.deterministic(
        name="year_component",
        value=jnp.dot(covariates["year_covariates"], beta_year),
    )
    temporal_component = numpyro.deterministic(
        name="temporal_component",
        value=year_component + month_component + hour_component,
    )
    spatial_component = numpyro.deterministic(
        name="spatial_component",
        value=alpha[counties_index],
    )
    loc = numpyro.deterministic(
        name="loc", 
        value=jnp.exp(spatial_component + temporal_component)
    )
    scale = numpyro.sample(
        "scale",
        prior_scale,
    )
    obs = numpyro.sample(
        "obs",
        AsymmetricLaplaceQuantile(loc=loc, scale=scale, quantile=quantile),
        obs=target,
    )
    if target is not None:
        numpyro.deterministic(
            "log_likelihood",
            AsymmetricLaplaceQuantile(
                loc=loc, 
                scale=scale, 
                quantile=quantile,
            )
            .log_prob(target)
        )

hierarchical_non_spatial_model_parameters = [
    "mu_alpha",
    "sigma_alpha",
    "beta_hour",
    "beta_month",
    "beta_year",
    "loc",
    "alpha",
    "hour_component",
    "month_component",
    "year_component",
    "temporal_component",
    "spatial_component",
    "scale",
    "obs",
]
hierarchical_non_spatial_model_kwargs = {
    "covariates": quantile_regression_covariates,
    "quantile": CRITICAL_QUANTILE,
    "target": quantile_regression_target,
    "prior_mu_alpha": Normal(loc=0.0, scale=5.0),
    "prior_sigma_alpha": HalfCauchy(scale=2.0),
    "prior_beta_year": Normal(loc=0.0, scale=1),
    "prior_beta_month": Normal(loc=0.0, scale=1),
    "prior_beta_hour": Normal(loc=0.0, scale=1),
    "prior_scale": HalfNormal(scale=1),
}
numpyro.render_model(
    hierarchical_non_spatial_quantile_regression,
    model_kwargs=hierarchical_non_spatial_model_kwargs,
    render_distributions=False,
)

Given the size of our data, we then proceeded to estimate the parameters using variational inference (we recommend this amazing primer on variational Bayes by Tamara Broderick).

svi_pooled_quantile_regression_parameters, svi_pooled_quantile_regression_guide = (
    sample_using_svi(
        rng_key=RNG_KEY,
        model=hierarchical_non_spatial_quantile_regression,
        model_kwargs=hierarchical_non_spatial_model_kwargs,
        autoguide=AutoMultivariateNormal,
        guide_kwargs={},
        optimizer_kwargs={"step_size": 1e-4, "clip_norm": 5},
        num_steps=NUMBER_ITERATIONS,
        num_particles=NUMBER_PARTICLES,
    )
)
100%|██████████| 30000/30000 [00:08<00:00, 3617.57it/s, init loss: 17996.7832, avg. loss [28501-30000]: 5274.9683]
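
For readers new to variational inference, the toy sketch below shows the idea on a problem where the exact posterior is known. It is not the numpyro routine used in this analysis: the model (a normal mean with unit noise), the data, the learning rate and all variable names are assumptions made for illustration, and the ELBO gradients are written in closed form, whereas stochastic variational inference estimates them by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=50)  # synthetic data, unit observation noise
n, prior_sd = y.size, 10.0                   # prior: theta ~ Normal(0, prior_sd)

# Variational family q(theta) = Normal(m, exp(rho)); rho is the log standard deviation
m, rho = 0.0, 0.0
learning_rate = 0.01
precision = n + 1.0 / prior_sd**2  # precision of the exact posterior

for _ in range(2_000):
    s2 = np.exp(2.0 * rho)
    grad_m = y.sum() - precision * m  # derivative of the ELBO with respect to m
    grad_rho = 1.0 - precision * s2   # derivative of the ELBO with respect to rho
    m += learning_rate * grad_m       # gradient ascent on the ELBO
    rho += learning_rate * grad_rho

# Exact conjugate posterior, for comparison
exact_mean = y.sum() / precision
exact_sd = precision**-0.5
```

Because the exact posterior is itself Gaussian in this toy problem, maximising the ELBO recovers it; in the hail model the multivariate normal guide is only an approximation.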

We then sampled from the posterior distribution using previously unseen data (i.e., the covariates we generated).

posterior_hierarchical_non_spatial_regression_svi = sample_posterior_predictive_svi(
    rng_key=RNG_KEY,
    covariates_hat=quantile_regression_covariates_hat,
    model_kwargs=hierarchical_non_spatial_model_kwargs,
    model=hierarchical_non_spatial_quantile_regression,
    guide=svi_pooled_quantile_regression_guide,
    svi_result=svi_pooled_quantile_regression_parameters,
    num_samples=2000,
    return_sites=hierarchical_non_spatial_model_parameters,
)

This plot shows the estimated intercept values for each county as a crude approximation of a spatial component.

fig, axs = visualize_geo_regression(
    covariates_hat_df=quantile_regression_covariates_hat_df,
    posterior=posterior_hierarchical_non_spatial_regression_svi,
    parameter="spatial_component",
    parameter_transformer=lambda x: jnp.exp(x),
)
plt.show()

These other plots show both the seasonal and trend components coming from the hour, month and year covariates.

fig, axs = visualize_temporal_components(
    temporal_components=generate_temporal_components(
        posterior=posterior_hierarchical_non_spatial_regression_svi,
        transformers=quantile_regression_transformers,
        years=continuous_state_modelling_df[YEAR_COVARIATES].unique(),
        parameter_transformer=lambda x: jnp.exp(x)
    )
)
plt.show()

Fully Pooled Spatial Quantile Regression

To obtain a finer-grained spatial representation, we then replaced the intercept \(\alpha\) from the previous model with a tensor-product spline obtained from latitude and longitude. In this case, a \(Laplace\) prior is applied in order to put a strong regularization component on the tensor-product spline.

\[ \begin{gather} \color{RedOrange}\beta_{spatial} \sim Laplace(\mu=0, \sigma=5) \\ \beta_{hour} \sim \mathcal{N}(\mu=0, \sigma=1) \\ \beta_{month} \sim \mathcal{N}(\mu=0, \sigma=1) \\ \beta_{year} \sim \mathcal{N}(\mu=0, \sigma=1) \\ \sigma \sim HalfNormal(1)\\ \color{RedOrange}\mu = \exp(\beta_{spatial}f(spatial) + \beta_{hour}f(hour) + \beta_{month}f(month) + \beta_{year}f(year))\\ y \sim AsymmetricLaplace(\mu, \sigma, \tau=.95) \end{gather} \]
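
To see why a Laplace prior acts as a regularizer, the sketch below compares the penalty that a Laplace and a Normal prior of the same scale place on a coefficient as it moves away from zero. The helper names and the unit scale are assumptions made for the example and are not tied to the scale used in the model.

```python
import numpy as np


def laplace_neg_log_density(beta: np.ndarray, scale: float) -> np.ndarray:
    """Negative log-density of Laplace(0, scale): an absolute-value (L1) penalty."""
    return np.abs(beta) / scale + np.log(2.0 * scale)


def normal_neg_log_density(beta: np.ndarray, scale: float) -> np.ndarray:
    """Negative log-density of Normal(0, scale): a squared (L2) penalty."""
    return 0.5 * (beta / scale) ** 2 + 0.5 * np.log(2.0 * np.pi * scale**2)


beta = np.array([0.01, 0.1, 1.0])

# Penalty paid for moving a coefficient from 0 to beta under each prior
laplace_penalty = laplace_neg_log_density(beta, 1.0) - laplace_neg_log_density(0.0, 1.0)
normal_penalty = normal_neg_log_density(beta, 1.0) - normal_neg_log_density(0.0, 1.0)
```

The Laplace penalty grows linearly, so small coefficients are penalised much more heavily than under the Normal prior, which is the behaviour we want for the many coefficients of the tensor-product spline.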

def pooled_quantile_regression(
    target: ArrayLike,
    covariates: Dict[str, ArrayLike],
    quantile: float,
    prior_beta_spatial: Distribution,
    prior_beta_month: Distribution,
    prior_beta_year: Distribution,
    prior_beta_hour: Distribution,
    prior_scale: Distribution,
) -> None:
    """Simple quantile regression model"""
    beta_spatial = numpyro.sample(
        "beta_spatial",
        prior_beta_spatial.expand([covariates["latitude_longitude_tensor_covariates"].shape[1]]),
    )
    beta_hour = numpyro.sample(
        "beta_hour",
        prior_beta_hour.expand([covariates["hour_covariates"].shape[1]]),
    )
    beta_month = numpyro.sample(
        "beta_month",
        prior_beta_month.expand([covariates["month_covariates"].shape[1]]),
    )
    beta_year = numpyro.sample(
        "beta_year",
        prior_beta_year.expand([covariates["year_covariates"].shape[1]]),
    )
    scale = numpyro.sample(
        "scale",
        prior_scale,
    )
    hour_component = numpyro.deterministic(
        name="hour_component",
        value=jnp.dot(covariates["hour_covariates"], beta_hour),
    )
    month_component = numpyro.deterministic(
        name="month_component",
        value=jnp.dot(covariates["month_covariates"], beta_month),
    )
    year_component = numpyro.deterministic(
        name="year_component",
        value=jnp.dot(covariates["year_covariates"], beta_year),
    )
    spatial_component = numpyro.deterministic(
        name="spatial_component",
        value=jnp.dot(covariates["latitude_longitude_tensor_covariates"], beta_spatial),
    )
    temporal_component = numpyro.deterministic(
        name="temporal_component",
        value=year_component + month_component + hour_component,
    )
    loc = numpyro.deterministic(
        name="loc",
        value=jnp.exp(spatial_component + temporal_component),
    )
    obs = numpyro.sample(
        "obs",
        AsymmetricLaplaceQuantile(loc=loc, scale=scale, quantile=quantile),
        obs=target,
    )
    if target is not None:
        numpyro.deterministic(
            "log_likelihood",
            AsymmetricLaplaceQuantile(
                loc=loc, 
                scale=scale, 
                quantile=quantile,
            )
            .log_prob(target)
        )

pooled_quantile_model_parameters = [
    "beta_spatial",
    "beta_hour",
    "beta_month",
    "beta_year",
    "loc",
    "spatial_component",
    "hour_component",
    "month_component",
    "year_component",
    "scale",
    "obs",
]
pooled_quantile_model_kwargs = {
    "covariates": quantile_regression_covariates,
    "quantile": CRITICAL_QUANTILE,
    "target": quantile_regression_target,
    "prior_beta_spatial": Laplace(loc=0.0, scale=5),
    "prior_beta_year": Normal(loc=0.0, scale=1),
    "prior_beta_month": Normal(loc=0.0, scale=1),
    "prior_beta_hour": Normal(loc=0.0, scale=1),
    "prior_scale": HalfNormal(scale=1),
}
numpyro.render_model(
    pooled_quantile_regression,
    model_kwargs=pooled_quantile_model_kwargs,
    render_distributions=False,
    render_params=True,
)

Again, we proceeded to estimate the parameters using variational inference.

svi_pooled_quantile_regression_parameters, svi_pooled_quantile_regression_guide = (
    sample_using_svi(
        rng_key=RNG_KEY,
        model=pooled_quantile_regression,
        model_kwargs=pooled_quantile_model_kwargs,
        autoguide=AutoMultivariateNormal,
        guide_kwargs={},
        optimizer_kwargs={"step_size": 1e-4, "clip_norm": 5},
        num_steps=NUMBER_ITERATIONS,
        num_particles=NUMBER_PARTICLES,
    )
)
100%|██████████| 30000/30000 [00:07<00:00, 4277.09it/s, init loss: 15578.3525, avg. loss [28501-30000]: 5392.4595]

We then obtained the posterior samples

posterior_pooled_quantile_regression_svi = sample_posterior_predictive_svi(
    rng_key=RNG_KEY,
    covariates_hat=quantile_regression_covariates_hat,
    model_kwargs=pooled_quantile_model_kwargs,
    model=pooled_quantile_regression,
    guide=svi_pooled_quantile_regression_guide,
    svi_result=svi_pooled_quantile_regression_parameters,
    num_samples=2000,
    return_sites=pooled_quantile_model_parameters,
)

We can see how the spatial component now varies smoothly across the entire state

fig, axs = visualize_geo_regression(
    covariates_hat_df=quantile_regression_covariates_hat_df,
    posterior=posterior_pooled_quantile_regression_svi,
    parameter="spatial_component",
    parameter_transformer=lambda x: jnp.exp(x)
)
plt.show()

We also obtained the same temporal components as before

fig, axs = visualize_temporal_components(
    temporal_components=generate_temporal_components(
        posterior=posterior_pooled_quantile_regression_svi,
        transformers=quantile_regression_transformers,
        years=continuous_state_modelling_df[YEAR_COVARIATES].unique(),
        parameter_transformer=lambda x: jnp.exp(x)
    )
)
plt.show()
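Both visualizations pass `jnp.exp` as the `parameter_transformer`. The models in this analysis place the additive predictor inside an exponential, so each exponentiated component can be read as a multiplicative factor on the estimated quantile. The following is a minimal sketch of that identity; it uses NumPy and made-up component values, not output from the fitted model.

```python
import numpy as np

# Hypothetical log-scale components for three observations (illustrative values only).
spatial_component = np.array([0.2, -0.1, 0.5])
temporal_component = np.array([1.0, 0.3, -0.4])

# Log link: the location is the exponential of the summed components.
loc = np.exp(spatial_component + temporal_component)

# Equivalent multiplicative form: each exponentiated component scales the location.
spatial_factor = np.exp(spatial_component)
temporal_factor = np.exp(temporal_component)

print(np.allclose(loc, spatial_factor * temporal_factor))
```

A spatial factor above 1 therefore inflates the estimated quantile at that location relative to the temporal baseline, while a factor below 1 shrinks it.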

Partially Pooled Spatial Quantile Regression

As a final iteration of the quantile regression model, we combined the partially pooled model with the fully pooled spatial model by allowing the spatial component to vary within each county. Although rather expensive (the model needs to estimate number_of_counties * number_spline_features spatial coefficients), this approach yields spatial effects that are localized to each county. It is an attempt to model the geographical peculiarities that each county might have.
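To make the cost and the mechanics concrete, the sketch below builds one row of spline coefficients per county and evaluates the spatial component by looking up each observation's county. The sizes, the random values and the name `spatial_basis` are hypothetical; only the indexing pattern mirrors the model.

```python
import numpy as np

rng = np.random.default_rng(0)

n_counties, n_spline_features, n_obs = 4, 6, 10  # hypothetical sizes

# One row of spline coefficients per county.
beta_spatial = rng.normal(size=(n_counties, n_spline_features))
n_parameters_hierarchical = beta_spatial.size  # n_counties * n_spline_features
n_parameters_pooled = n_spline_features        # fully pooled spatial model

# Spline basis evaluated at each observation, and the county of each observation.
spatial_basis = rng.normal(size=(n_obs, n_spline_features))
counties_index = rng.integers(0, n_counties, size=n_obs)

# Gather each observation's county-specific coefficients, then take a row-wise dot product.
spatial_component = np.sum(beta_spatial[counties_index, :] * spatial_basis, axis=1)

# Same quantity computed one observation at a time.
expected = np.array(
    [spatial_basis[i] @ beta_spatial[counties_index[i]] for i in range(n_obs)]
)
print(np.allclose(spatial_component, expected))
```

The number of spatial coefficients grows linearly with the number of counties, which is why this variant is noticeably slower to fit than the fully pooled one.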

\[ \begin{gather} \color{RedOrange}\sigma_{county} \sim HalfCauchy(\sigma=2) \\ \color{RedOrange}\mu_{county} \sim \mathcal{N}(\mu=0, \sigma=5) \\ \color{RedOrange}\beta_{spatial,county} \sim Laplace(\mu_{county}, \sigma_{county}) \\ \beta_{hour} \sim \mathcal{N}(\mu=0, \sigma=1) \\ \beta_{month} \sim \mathcal{N}(\mu=0, \sigma=1) \\ \beta_{year} \sim \mathcal{N}(\mu=0, \sigma=1) \\ \sigma \sim HalfNormal(1)\\ \color{RedOrange}\mu = \exp(\beta_{spatial,county}f(spatial) + \beta_{hour}f(hour) + \beta_{month}f(month) + \beta_{year}f(year))\\ y \sim AsymmetricLaplace(\mu, \sigma, \tau=.95) \end{gather} \]
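The asymmetric Laplace likelihood is what turns this into a quantile regression: for a fixed scale, maximizing its log density is equivalent to minimizing the check (pinball) loss at level τ. The sketch below illustrates this on simulated data. It assumes the standard quantile parameterization of the density, which we take to match `AsymmetricLaplaceQuantile`; the helper functions and the gamma-distributed sample are illustrative and not part of the analysis.

```python
import numpy as np


def pinball_loss(residual: np.ndarray, quantile: float) -> np.ndarray:
    """Check loss: quantile * r for r >= 0 and (quantile - 1) * r for r < 0."""
    return residual * (quantile - (residual < 0).astype(float))


def asymmetric_laplace_log_prob(
    y: np.ndarray, loc: float, scale: float, quantile: float
) -> np.ndarray:
    """Log density of the asymmetric Laplace in its quantile parameterization (assumed)."""
    return np.log(quantile * (1 - quantile) / scale) - pinball_loss(y - loc, quantile) / scale


rng = np.random.default_rng(0)
y = rng.gamma(shape=2.0, scale=1.0, size=5_000)  # simulated positive, skewed target
quantile = 0.95

# Grid search for the location that maximizes the total log-likelihood.
candidates = np.linspace(y.min(), y.max(), 2_001)
total_log_prob = np.array(
    [asymmetric_laplace_log_prob(y, c, 1.0, quantile).sum() for c in candidates]
)
best_loc = candidates[np.argmax(total_log_prob)]
empirical_quantile = np.quantile(y, quantile)

print(best_loc, empirical_quantile)
```

Up to the grid resolution, the maximum-likelihood location coincides with the empirical 95th percentile, which is why μ in the model can be interpreted as the conditional quantile at τ.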

The NumPyro model here makes use of a double plate notation in order to specify a different vector of beta_spatial coefficients for each county
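In array terms, the two nested plates give the coefficients one row per county and one column per spline feature, and `LocScaleReparam(0)` requests a fully non-centered parameterization in which standard Laplace noise is shifted and scaled by the shared hyperparameters. The sketch below reproduces that construction with NumPy rather than NumPyro; the sizes and hyperparameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_counties, n_spline_features = 4, 6  # hypothetical sizes

# Hypothetical values standing in for mu_beta_spatial and sigma_beta_spatial.
mu_beta_spatial, sigma_beta_spatial = 0.3, 0.5

# Non-centered draw: standard Laplace noise with one row per county (outer plate)
# and one column per spline coefficient (inner plate).
beta_spatial_decentered = rng.laplace(
    loc=0.0, scale=1.0, size=(n_counties, n_spline_features)
)

# Shift and scale so the coefficients follow Laplace(mu_beta_spatial, sigma_beta_spatial).
beta_spatial = mu_beta_spatial + sigma_beta_spatial * beta_spatial_decentered

print(beta_spatial.shape)
```

Sampling the standardized noise instead of the coefficients themselves decouples them from the hyperparameters, which tends to make hierarchical models easier to fit.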

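# Sample beta_spatial in non-centered form (centered=0), which often eases
# inference in hierarchical models.
reparam_config = {
    "beta_spatial": LocScaleReparam(0),
}


@numpyro.handlers.reparam(config=reparam_config)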
def hierarchical_quantile_regression(
    target: ArrayLike,
    covariates: Dict[str, ArrayLike],
    quantile: float,
    prior_mu_beta_spatial: Distribution,
    prior_sigma_beta_spatial: Distribution,
    prior_scale: Distribution,
    prior_beta_year: Distribution,
    prior_beta_month: Distribution,
    prior_beta_hour: Distribution,
) -> None:
    """Quantile regression model with partially pooled, county-level spatial coefficients."""
    n_groups = len(np.unique(covariates["counties_index"]))
    n_spatial_covariates = covariates["latitude_longitude_tensor_covariates"].shape[1]
    counties_index = covariates["counties_index"].flatten()

    mu_beta_spatial = numpyro.sample(
        "mu_beta_spatial",
        prior_mu_beta_spatial,
    )
    sigma_beta_spatial = numpyro.sample(
        "sigma_beta_spatial",
        prior_sigma_beta_spatial,
    )
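    # Nested plates: one row per county (dim=-2) and one column per spline
    # coefficient (dim=-1), so beta_spatial has shape
    # (n_groups, n_spatial_covariates).
    with numpyro.plate("counties", n_groups, dim=-2):
        with numpyro.plate("spline_coefficients", n_spatial_covariates, dim=-1):
            beta_spatial = numpyro.sample(
                "beta_spatial",
                Laplace(mu_beta_spatial, sigma_beta_spatial),
            )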

    beta_hour = numpyro.sample(
        "beta_hour",
        prior_beta_hour.expand([covariates["hour_covariates"].shape[1]]),
    )
    beta_month = numpyro.sample(
        "beta_month",
        prior_beta_month.expand([covariates["month_covariates"].shape[1]]),
    )
    beta_year = numpyro.sample(
        "beta_year",
        prior_beta_year.expand([covariates["year_covariates"].shape[1]]),
    )
    spatial_component = numpyro.deterministic(
        name="spatial_component",
        value=jnp.sum(
            beta_spatial[counties_index, :] * covariates["latitude_longitude_tensor_covariates"],
            axis=1,
        ),
    )
    hour_component = numpyro.deterministic(
        name="hour_component",
        value=jnp.dot(covariates["hour_covariates"], beta_hour),
    )
    month_component = numpyro.deterministic(
        name="month_component",
        value=jnp.dot(covariates["month_covariates"], beta_month),
    )
    year_component = numpyro.deterministic(
        name="year_component",
        value=jnp.dot(covariates["year_covariates"], beta_year),
    )
    temporal_component = numpyro.deterministic(
        name="temporal_component",
        value=year_component + month_component + hour_component,
    )

    loc = numpyro.deterministic(
        name="loc",
        value=jnp.exp(spatial_component + temporal_component),
    )
    scale = numpyro.sample(
        "scale",
        prior_scale,
    )
    obs = numpyro.sample(
        "obs",
        AsymmetricLaplaceQuantile(loc=loc, scale=scale, quantile=quantile),
        obs=target,
    )
    # Record the pointwise log-likelihood whenever observations are provided.
    if target is not None:
        numpyro.deterministic(
            "log_likelihood",
            AsymmetricLaplaceQuantile(
                loc=loc,
                scale=scale,
                quantile=quantile,
            ).log_prob(target),
        )

hierarchical_quantile_model_kwargs = {
    "covariates": quantile_regression_covariates,
    "quantile": CRITICAL_QUANTILE,
    "target": quantile_regression_target,
    "prior_mu_beta_spatial": Normal(loc=0.0, scale=5.0),
    "prior_sigma_beta_spatial": HalfCauchy(scale=2.0),
    "prior_beta_year": Normal(loc=0.0, scale=1),
    "prior_beta_month": Normal(loc=0.0, scale=1),
    "prior_beta_hour": Normal(loc=0.0, scale=1),
    "prior_scale": HalfNormal(scale=1),
}
# Site names must match those defined in hierarchical_quantile_regression.
hierarchical_quantile_model_parameters = [
    "mu_beta_spatial",
    "sigma_beta_spatial",
    "beta_hour",
    "beta_month",
    "beta_year",
    "loc",
    "temporal_component",
    "spatial_component",
    "hour_component",
    "month_component",
    "year_component",
    "scale",
    "obs",
]
numpyro.render_model(
    hierarchical_quantile_regression,
    model_kwargs=hierarchical_quantile_model_kwargs,
    render_distributions=False,
)

(
    svi_hierarchical_quantile_regression_parameters,
    svi_hierarchical_quantile_regression_guide,
) = sample_using_svi(
    rng_key=RNG_KEY,
    model=hierarchical_quantile_regression,
    model_kwargs=hierarchical_quantile_model_kwargs,
    autoguide=AutoLowRankMultivariateNormal,
    guide_kwargs={},
    optimizer_kwargs={"step_size": 1e-4, "clip_norm": 5},
    num_steps=NUMBER_ITERATIONS,
    num_particles=NUMBER_PARTICLES,
)
 52%|█████▏    | 15630/30000 [00:24<00:21, 664.17it/s, init loss: 23887.5312, avg. 
loss [13501-15000]: 10039.1367] 52%|█████▏    | 15698/30000 [00:25<00:21, 665.60it/s, init loss: 23887.5312, avg. loss [13501-15000]: 10039.1367] 53%|█████▎    | 15766/30000 [00:25<00:21, 668.47it/s, init loss: 23887.5312, avg. loss [13501-15000]: 10039.1367] 53%|█████▎    | 15833/30000 [00:25<00:21, 654.39it/s, init loss: 23887.5312, avg. loss [13501-15000]: 10039.1367] 53%|█████▎    | 15899/30000 [00:25<00:21, 645.05it/s, init loss: 23887.5312, avg. loss [13501-15000]: 10039.1367] 53%|█████▎    | 15964/30000 [00:25<00:21, 645.49it/s, init loss: 23887.5312, avg. loss [13501-15000]: 10039.1367] 53%|█████▎    | 16029/30000 [00:25<00:21, 645.82it/s, init loss: 23887.5312, avg. loss [13501-15000]: 10039.1367] 54%|█████▎    | 16096/30000 [00:25<00:21, 649.50it/s, init loss: 23887.5312, avg. loss [13501-15000]: 10039.1367] 54%|█████▍    | 16161/30000 [00:25<00:21, 645.74it/s, init loss: 23887.5312, avg. loss [13501-15000]: 10039.1367] 54%|█████▍    | 16226/30000 [00:25<00:21, 646.92it/s, init loss: 23887.5312, avg. loss [13501-15000]: 10039.1367] 54%|█████▍    | 16294/30000 [00:25<00:20, 654.75it/s, init loss: 23887.5312, avg. loss [13501-15000]: 10039.1367] 55%|█████▍    | 16360/30000 [00:26<00:22, 618.49it/s, init loss: 23887.5312, avg. loss [13501-15000]: 10039.1367] 55%|█████▍    | 16423/30000 [00:26<00:22, 610.11it/s, init loss: 23887.5312, avg. loss [13501-15000]: 10039.1367] 55%|█████▍    | 16485/30000 [00:26<00:22, 601.82it/s, init loss: 23887.5312, avg. loss [13501-15000]: 10039.1367] 55%|█████▌    | 16546/30000 [00:26<00:22, 602.92it/s, init loss: 23887.5312, avg. loss [15001-16500]: 9279.5195]  55%|█████▌    | 16607/30000 [00:26<00:22, 600.13it/s, init loss: 23887.5312, avg. loss [15001-16500]: 9279.5195] 56%|█████▌    | 16668/30000 [00:26<00:22, 595.44it/s, init loss: 23887.5312, avg. loss [15001-16500]: 9279.5195] 56%|█████▌    | 16728/30000 [00:26<00:22, 596.35it/s, init loss: 23887.5312, avg. 
loss [15001-16500]: 9279.5195] 56%|█████▌    | 16788/30000 [00:26<00:22, 594.15it/s, init loss: 23887.5312, avg. loss [15001-16500]: 9279.5195] 56%|█████▌    | 16848/30000 [00:26<00:22, 589.82it/s, init loss: 23887.5312, avg. loss [15001-16500]: 9279.5195] 56%|█████▋    | 16908/30000 [00:27<00:22, 573.60it/s, init loss: 23887.5312, avg. loss [15001-16500]: 9279.5195] 57%|█████▋    | 16967/30000 [00:27<00:22, 576.45it/s, init loss: 23887.5312, avg. loss [15001-16500]: 9279.5195] 57%|█████▋    | 17026/30000 [00:27<00:22, 577.29it/s, init loss: 23887.5312, avg. loss [15001-16500]: 9279.5195] 57%|█████▋    | 17088/30000 [00:27<00:21, 587.50it/s, init loss: 23887.5312, avg. loss [15001-16500]: 9279.5195] 57%|█████▋    | 17148/30000 [00:27<00:21, 589.94it/s, init loss: 23887.5312, avg. loss [15001-16500]: 9279.5195] 57%|█████▋    | 17208/30000 [00:27<00:21, 590.53it/s, init loss: 23887.5312, avg. loss [15001-16500]: 9279.5195] 58%|█████▊    | 17269/30000 [00:27<00:21, 593.78it/s, init loss: 23887.5312, avg. loss [15001-16500]: 9279.5195] 58%|█████▊    | 17331/30000 [00:27<00:21, 601.35it/s, init loss: 23887.5312, avg. loss [15001-16500]: 9279.5195] 58%|█████▊    | 17399/30000 [00:27<00:20, 624.39it/s, init loss: 23887.5312, avg. loss [15001-16500]: 9279.5195] 58%|█████▊    | 17464/30000 [00:27<00:19, 631.92it/s, init loss: 23887.5312, avg. loss [15001-16500]: 9279.5195] 58%|█████▊    | 17531/30000 [00:28<00:19, 642.61it/s, init loss: 23887.5312, avg. loss [15001-16500]: 9279.5195] 59%|█████▊    | 17599/30000 [00:28<00:18, 653.24it/s, init loss: 23887.5312, avg. loss [15001-16500]: 9279.5195] 59%|█████▉    | 17669/30000 [00:28<00:18, 665.70it/s, init loss: 23887.5312, avg. loss [15001-16500]: 9279.5195] 59%|█████▉    | 17736/30000 [00:28<00:18, 663.66it/s, init loss: 23887.5312, avg. loss [15001-16500]: 9279.5195] 59%|█████▉    | 17803/30000 [00:28<00:18, 658.46it/s, init loss: 23887.5312, avg. 
loss [15001-16500]: 9279.5195] 60%|█████▉    | 17870/30000 [00:28<00:18, 659.61it/s, init loss: 23887.5312, avg. loss [15001-16500]: 9279.5195] 60%|█████▉    | 17936/30000 [00:28<00:18, 658.90it/s, init loss: 23887.5312, avg. loss [15001-16500]: 9279.5195] 60%|██████    | 18003/30000 [00:28<00:18, 661.55it/s, init loss: 23887.5312, avg. loss [16501-18000]: 8601.1016] 60%|██████    | 18073/30000 [00:28<00:17, 671.80it/s, init loss: 23887.5312, avg. loss [16501-18000]: 8601.1016] 60%|██████    | 18141/30000 [00:28<00:17, 674.12it/s, init loss: 23887.5312, avg. loss [16501-18000]: 8601.1016] 61%|██████    | 18209/30000 [00:29<00:17, 657.93it/s, init loss: 23887.5312, avg. loss [16501-18000]: 8601.1016] 61%|██████    | 18278/30000 [00:29<00:17, 665.90it/s, init loss: 23887.5312, avg. loss [16501-18000]: 8601.1016] 61%|██████    | 18345/30000 [00:29<00:17, 654.56it/s, init loss: 23887.5312, avg. loss [16501-18000]: 8601.1016] 61%|██████▏   | 18411/30000 [00:29<00:17, 650.55it/s, init loss: 23887.5312, avg. loss [16501-18000]: 8601.1016] 62%|██████▏   | 18477/30000 [00:29<00:17, 650.53it/s, init loss: 23887.5312, avg. loss [16501-18000]: 8601.1016] 62%|██████▏   | 18548/30000 [00:29<00:17, 665.87it/s, init loss: 23887.5312, avg. loss [16501-18000]: 8601.1016] 62%|██████▏   | 18615/30000 [00:29<00:17, 659.80it/s, init loss: 23887.5312, avg. loss [16501-18000]: 8601.1016] 62%|██████▏   | 18682/30000 [00:29<00:17, 644.50it/s, init loss: 23887.5312, avg. loss [16501-18000]: 8601.1016] 62%|██████▏   | 18749/30000 [00:29<00:17, 650.11it/s, init loss: 23887.5312, avg. loss [16501-18000]: 8601.1016] 63%|██████▎   | 18815/30000 [00:30<00:17, 650.79it/s, init loss: 23887.5312, avg. loss [16501-18000]: 8601.1016] 63%|██████▎   | 18881/30000 [00:30<00:17, 634.15it/s, init loss: 23887.5312, avg. loss [16501-18000]: 8601.1016] 63%|██████▎   | 18945/30000 [00:30<00:17, 620.81it/s, init loss: 23887.5312, avg. 
loss [16501-18000]: 8601.1016] 63%|██████▎   | 19008/30000 [00:30<00:18, 604.92it/s, init loss: 23887.5312, avg. loss [16501-18000]: 8601.1016] 64%|██████▎   | 19069/30000 [00:30<00:18, 600.99it/s, init loss: 23887.5312, avg. loss [16501-18000]: 8601.1016] 64%|██████▍   | 19130/30000 [00:30<00:18, 591.07it/s, init loss: 23887.5312, avg. loss [16501-18000]: 8601.1016] 64%|██████▍   | 19190/30000 [00:30<00:18, 593.30it/s, init loss: 23887.5312, avg. loss [16501-18000]: 8601.1016] 64%|██████▍   | 19250/30000 [00:30<00:18, 588.80it/s, init loss: 23887.5312, avg. loss [16501-18000]: 8601.1016] 64%|██████▍   | 19309/30000 [00:30<00:18, 583.93it/s, init loss: 23887.5312, avg. loss [16501-18000]: 8601.1016] 65%|██████▍   | 19368/30000 [00:30<00:18, 578.21it/s, init loss: 23887.5312, avg. loss [16501-18000]: 8601.1016] 65%|██████▍   | 19426/30000 [00:31<00:18, 578.61it/s, init loss: 23887.5312, avg. loss [16501-18000]: 8601.1016] 65%|██████▍   | 19485/30000 [00:31<00:18, 578.63it/s, init loss: 23887.5312, avg. loss [16501-18000]: 8601.1016] 65%|██████▌   | 19546/30000 [00:31<00:17, 584.53it/s, init loss: 23887.5312, avg. loss [18001-19500]: 7981.3887] 65%|██████▌   | 19607/30000 [00:31<00:17, 590.65it/s, init loss: 23887.5312, avg. loss [18001-19500]: 7981.3887] 66%|██████▌   | 19667/30000 [00:31<00:17, 589.10it/s, init loss: 23887.5312, avg. loss [18001-19500]: 7981.3887] 66%|██████▌   | 19726/30000 [00:31<00:17, 588.05it/s, init loss: 23887.5312, avg. loss [18001-19500]: 7981.3887] 66%|██████▌   | 19786/30000 [00:31<00:17, 589.88it/s, init loss: 23887.5312, avg. loss [18001-19500]: 7981.3887] 66%|██████▌   | 19846/30000 [00:31<00:17, 591.42it/s, init loss: 23887.5312, avg. loss [18001-19500]: 7981.3887] 66%|██████▋   | 19906/30000 [00:31<00:17, 581.54it/s, init loss: 23887.5312, avg. loss [18001-19500]: 7981.3887] 67%|██████▋   | 19965/30000 [00:31<00:17, 579.51it/s, init loss: 23887.5312, avg. 
loss [18001-19500]: 7981.3887] 67%|██████▋   | 20025/30000 [00:32<00:17, 585.28it/s, init loss: 23887.5312, avg. loss [18001-19500]: 7981.3887] 67%|██████▋   | 20084/30000 [00:32<00:17, 580.80it/s, init loss: 23887.5312, avg. loss [18001-19500]: 7981.3887] 67%|██████▋   | 20143/30000 [00:32<00:16, 582.54it/s, init loss: 23887.5312, avg. loss [18001-19500]: 7981.3887] 67%|██████▋   | 20202/30000 [00:32<00:16, 581.61it/s, init loss: 23887.5312, avg. loss [18001-19500]: 7981.3887] 68%|██████▊   | 20267/30000 [00:32<00:16, 599.37it/s, init loss: 23887.5312, avg. loss [18001-19500]: 7981.3887] 68%|██████▊   | 20333/30000 [00:32<00:15, 616.52it/s, init loss: 23887.5312, avg. loss [18001-19500]: 7981.3887] 68%|██████▊   | 20395/30000 [00:32<00:15, 612.47it/s, init loss: 23887.5312, avg. loss [18001-19500]: 7981.3887] 68%|██████▊   | 20463/30000 [00:32<00:15, 630.58it/s, init loss: 23887.5312, avg. loss [18001-19500]: 7981.3887] 68%|██████▊   | 20529/30000 [00:32<00:14, 637.52it/s, init loss: 23887.5312, avg. loss [18001-19500]: 7981.3887] 69%|██████▊   | 20596/30000 [00:32<00:14, 645.67it/s, init loss: 23887.5312, avg. loss [18001-19500]: 7981.3887] 69%|██████▉   | 20661/30000 [00:33<00:14, 640.59it/s, init loss: 23887.5312, avg. loss [18001-19500]: 7981.3887] 69%|██████▉   | 20727/30000 [00:33<00:14, 644.91it/s, init loss: 23887.5312, avg. loss [18001-19500]: 7981.3887] 69%|██████▉   | 20795/30000 [00:33<00:14, 653.50it/s, init loss: 23887.5312, avg. loss [18001-19500]: 7981.3887] 70%|██████▉   | 20865/30000 [00:33<00:13, 665.35it/s, init loss: 23887.5312, avg. loss [18001-19500]: 7981.3887] 70%|██████▉   | 20932/30000 [00:33<00:13, 663.84it/s, init loss: 23887.5312, avg. loss [18001-19500]: 7981.3887] 70%|██████▉   | 20999/30000 [00:33<00:13, 663.34it/s, init loss: 23887.5312, avg. loss [18001-19500]: 7981.3887] 70%|███████   | 21066/30000 [00:33<00:13, 664.87it/s, init loss: 23887.5312, avg. 
loss [19501-21000]: 7453.0801] 70%|███████   | 21133/30000 [00:33<00:13, 651.65it/s, init loss: 23887.5312, avg. loss [19501-21000]: 7453.0801] 71%|███████   | 21199/30000 [00:33<00:13, 647.19it/s, init loss: 23887.5312, avg. loss [19501-21000]: 7453.0801] 71%|███████   | 21266/30000 [00:34<00:13, 653.28it/s, init loss: 23887.5312, avg. loss [19501-21000]: 7453.0801] 71%|███████   | 21334/30000 [00:34<00:13, 659.03it/s, init loss: 23887.5312, avg. loss [19501-21000]: 7453.0801] 71%|███████▏  | 21403/30000 [00:34<00:12, 667.32it/s, init loss: 23887.5312, avg. loss [19501-21000]: 7453.0801] 72%|███████▏  | 21470/30000 [00:34<00:13, 649.82it/s, init loss: 23887.5312, avg. loss [19501-21000]: 7453.0801] 72%|███████▏  | 21536/30000 [00:34<00:13, 650.70it/s, init loss: 23887.5312, avg. loss [19501-21000]: 7453.0801] 72%|███████▏  | 21602/30000 [00:34<00:12, 648.78it/s, init loss: 23887.5312, avg. loss [19501-21000]: 7453.0801] 72%|███████▏  | 21667/30000 [00:34<00:12, 642.02it/s, init loss: 23887.5312, avg. loss [19501-21000]: 7453.0801] 72%|███████▏  | 21732/30000 [00:34<00:13, 629.54it/s, init loss: 23887.5312, avg. loss [19501-21000]: 7453.0801] 73%|███████▎  | 21796/30000 [00:34<00:13, 613.87it/s, init loss: 23887.5312, avg. loss [19501-21000]: 7453.0801] 73%|███████▎  | 21858/30000 [00:34<00:13, 613.01it/s, init loss: 23887.5312, avg. loss [19501-21000]: 7453.0801] 73%|███████▎  | 21920/30000 [00:35<00:13, 597.01it/s, init loss: 23887.5312, avg. loss [19501-21000]: 7453.0801] 73%|███████▎  | 21980/30000 [00:35<00:13, 594.73it/s, init loss: 23887.5312, avg. loss [19501-21000]: 7453.0801] 73%|███████▎  | 22040/30000 [00:35<00:13, 593.80it/s, init loss: 23887.5312, avg. loss [19501-21000]: 7453.0801] 74%|███████▎  | 22103/30000 [00:35<00:13, 601.08it/s, init loss: 23887.5312, avg. loss [19501-21000]: 7453.0801] 74%|███████▍  | 22164/30000 [00:35<00:12, 603.49it/s, init loss: 23887.5312, avg. 
loss [19501-21000]: 7453.0801] 74%|███████▍  | 22225/30000 [00:35<00:12, 604.30it/s, init loss: 23887.5312, avg. loss [19501-21000]: 7453.0801] 74%|███████▍  | 22287/30000 [00:35<00:12, 604.51it/s, init loss: 23887.5312, avg. loss [19501-21000]: 7453.0801] 74%|███████▍  | 22348/30000 [00:35<00:12, 596.22it/s, init loss: 23887.5312, avg. loss [19501-21000]: 7453.0801] 75%|███████▍  | 22408/30000 [00:35<00:12, 590.74it/s, init loss: 23887.5312, avg. loss [19501-21000]: 7453.0801] 75%|███████▍  | 22468/30000 [00:35<00:12, 589.90it/s, init loss: 23887.5312, avg. loss [19501-21000]: 7453.0801] 75%|███████▌  | 22530/30000 [00:36<00:12, 595.03it/s, init loss: 23887.5312, avg. loss [21001-22500]: 6997.2788] 75%|███████▌  | 22593/30000 [00:36<00:12, 604.79it/s, init loss: 23887.5312, avg. loss [21001-22500]: 6997.2788] 76%|███████▌  | 22656/30000 [00:36<00:12, 610.12it/s, init loss: 23887.5312, avg. loss [21001-22500]: 6997.2788] 76%|███████▌  | 22718/30000 [00:36<00:12, 600.04it/s, init loss: 23887.5312, avg. loss [21001-22500]: 6997.2788] 76%|███████▌  | 22779/30000 [00:36<00:12, 582.95it/s, init loss: 23887.5312, avg. loss [21001-22500]: 6997.2788] 76%|███████▌  | 22841/30000 [00:36<00:12, 593.49it/s, init loss: 23887.5312, avg. loss [21001-22500]: 6997.2788] 76%|███████▋  | 22901/30000 [00:36<00:12, 577.92it/s, init loss: 23887.5312, avg. loss [21001-22500]: 6997.2788] 77%|███████▋  | 22959/30000 [00:36<00:12, 564.23it/s, init loss: 23887.5312, avg. loss [21001-22500]: 6997.2788] 77%|███████▋  | 23017/30000 [00:36<00:12, 567.93it/s, init loss: 23887.5312, avg. loss [21001-22500]: 6997.2788] 77%|███████▋  | 23074/30000 [00:37<00:12, 563.36it/s, init loss: 23887.5312, avg. loss [21001-22500]: 6997.2788] 77%|███████▋  | 23138/30000 [00:37<00:11, 584.41it/s, init loss: 23887.5312, avg. loss [21001-22500]: 6997.2788] 77%|███████▋  | 23204/30000 [00:37<00:11, 604.91it/s, init loss: 23887.5312, avg. 
loss [21001-22500]: 6997.2788] 78%|███████▊  | 23269/30000 [00:37<00:10, 617.86it/s, init loss: 23887.5312, avg. loss [21001-22500]: 6997.2788] 78%|███████▊  | 23333/30000 [00:37<00:10, 623.03it/s, init loss: 23887.5312, avg. loss [21001-22500]: 6997.2788] 78%|███████▊  | 23399/30000 [00:37<00:10, 632.19it/s, init loss: 23887.5312, avg. loss [21001-22500]: 6997.2788] 78%|███████▊  | 23464/30000 [00:37<00:10, 635.98it/s, init loss: 23887.5312, avg. loss [21001-22500]: 6997.2788] 78%|███████▊  | 23531/30000 [00:37<00:10, 645.72it/s, init loss: 23887.5312, avg. loss [21001-22500]: 6997.2788] 79%|███████▊  | 23596/30000 [00:37<00:10, 639.82it/s, init loss: 23887.5312, avg. loss [21001-22500]: 6997.2788] 79%|███████▉  | 23664/30000 [00:37<00:09, 649.87it/s, init loss: 23887.5312, avg. loss [21001-22500]: 6997.2788] 79%|███████▉  | 23730/30000 [00:38<00:09, 644.40it/s, init loss: 23887.5312, avg. loss [21001-22500]: 6997.2788] 79%|███████▉  | 23796/30000 [00:38<00:09, 647.21it/s, init loss: 23887.5312, avg. loss [21001-22500]: 6997.2788] 80%|███████▉  | 23861/30000 [00:38<00:09, 646.57it/s, init loss: 23887.5312, avg. loss [21001-22500]: 6997.2788] 80%|███████▉  | 23926/30000 [00:38<00:09, 644.77it/s, init loss: 23887.5312, avg. loss [21001-22500]: 6997.2788] 80%|███████▉  | 23991/30000 [00:38<00:09, 638.81it/s, init loss: 23887.5312, avg. loss [21001-22500]: 6997.2788] 80%|████████  | 24055/30000 [00:38<00:09, 631.72it/s, init loss: 23887.5312, avg. loss [22501-24000]: 6627.1636] 80%|████████  | 24126/30000 [00:38<00:08, 653.11it/s, init loss: 23887.5312, avg. loss [22501-24000]: 6627.1636] 81%|████████  | 24192/30000 [00:38<00:08, 645.84it/s, init loss: 23887.5312, avg. loss [22501-24000]: 6627.1636] 81%|████████  | 24258/30000 [00:38<00:08, 648.50it/s, init loss: 23887.5312, avg. loss [22501-24000]: 6627.1636] 81%|████████  | 24326/30000 [00:38<00:08, 657.60it/s, init loss: 23887.5312, avg. 
loss [22501-24000]: 6627.1636] 81%|████████▏ | 24396/30000 [00:39<00:08, 667.45it/s, init loss: 23887.5312, avg. loss [22501-24000]: 6627.1636] 82%|████████▏ | 24463/30000 [00:39<00:08, 659.08it/s, init loss: 23887.5312, avg. loss [22501-24000]: 6627.1636] 82%|████████▏ | 24531/30000 [00:39<00:08, 662.97it/s, init loss: 23887.5312, avg. loss [22501-24000]: 6627.1636] 82%|████████▏ | 24598/30000 [00:39<00:08, 623.13it/s, init loss: 23887.5312, avg. loss [22501-24000]: 6627.1636] 82%|████████▏ | 24661/30000 [00:39<00:08, 609.91it/s, init loss: 23887.5312, avg. loss [22501-24000]: 6627.1636] 82%|████████▏ | 24726/30000 [00:39<00:08, 619.86it/s, init loss: 23887.5312, avg. loss [22501-24000]: 6627.1636] 83%|████████▎ | 24789/30000 [00:39<00:08, 593.15it/s, init loss: 23887.5312, avg. loss [22501-24000]: 6627.1636] 83%|████████▎ | 24849/30000 [00:39<00:08, 587.63it/s, init loss: 23887.5312, avg. loss [22501-24000]: 6627.1636] 83%|████████▎ | 24909/30000 [00:39<00:08, 578.89it/s, init loss: 23887.5312, avg. loss [22501-24000]: 6627.1636] 83%|████████▎ | 24968/30000 [00:40<00:08, 574.45it/s, init loss: 23887.5312, avg. loss [22501-24000]: 6627.1636] 83%|████████▎ | 25026/30000 [00:40<00:08, 574.46it/s, init loss: 23887.5312, avg. loss [22501-24000]: 6627.1636] 84%|████████▎ | 25087/30000 [00:40<00:08, 581.55it/s, init loss: 23887.5312, avg. loss [22501-24000]: 6627.1636] 84%|████████▍ | 25147/30000 [00:40<00:08, 585.55it/s, init loss: 23887.5312, avg. loss [22501-24000]: 6627.1636] 84%|████████▍ | 25207/30000 [00:40<00:08, 588.30it/s, init loss: 23887.5312, avg. loss [22501-24000]: 6627.1636] 84%|████████▍ | 25266/30000 [00:40<00:08, 579.26it/s, init loss: 23887.5312, avg. loss [22501-24000]: 6627.1636] 84%|████████▍ | 25328/30000 [00:40<00:07, 589.71it/s, init loss: 23887.5312, avg. loss [22501-24000]: 6627.1636] 85%|████████▍ | 25390/30000 [00:40<00:07, 598.02it/s, init loss: 23887.5312, avg. 
loss [22501-24000]: 6627.1636] 85%|████████▍ | 25450/30000 [00:40<00:07, 584.24it/s, init loss: 23887.5312, avg. loss [22501-24000]: 6627.1636] 85%|████████▌ | 25511/30000 [00:40<00:07, 588.71it/s, init loss: 23887.5312, avg. loss [24001-25500]: 6345.0874] 85%|████████▌ | 25570/30000 [00:41<00:07, 586.64it/s, init loss: 23887.5312, avg. loss [24001-25500]: 6345.0874] 85%|████████▌ | 25634/30000 [00:41<00:07, 601.12it/s, init loss: 23887.5312, avg. loss [24001-25500]: 6345.0874] 86%|████████▌ | 25703/30000 [00:41<00:06, 625.42it/s, init loss: 23887.5312, avg. loss [24001-25500]: 6345.0874] 86%|████████▌ | 25766/30000 [00:41<00:06, 625.29it/s, init loss: 23887.5312, avg. loss [24001-25500]: 6345.0874] 86%|████████▌ | 25831/30000 [00:41<00:06, 629.43it/s, init loss: 23887.5312, avg. loss [24001-25500]: 6345.0874] 86%|████████▋ | 25903/30000 [00:41<00:06, 652.73it/s, init loss: 23887.5312, avg. loss [24001-25500]: 6345.0874] 87%|████████▋ | 25969/30000 [00:41<00:06, 649.80it/s, init loss: 23887.5312, avg. loss [24001-25500]: 6345.0874] 87%|████████▋ | 26034/30000 [00:41<00:06, 648.78it/s, init loss: 23887.5312, avg. loss [24001-25500]: 6345.0874] 87%|████████▋ | 26099/30000 [00:41<00:06, 644.16it/s, init loss: 23887.5312, avg. loss [24001-25500]: 6345.0874] 87%|████████▋ | 26165/30000 [00:41<00:05, 645.75it/s, init loss: 23887.5312, avg. loss [24001-25500]: 6345.0874] 87%|████████▋ | 26230/30000 [00:42<00:05, 637.86it/s, init loss: 23887.5312, avg. loss [24001-25500]: 6345.0874] 88%|████████▊ | 26294/30000 [00:42<00:05, 636.10it/s, init loss: 23887.5312, avg. loss [24001-25500]: 6345.0874] 88%|████████▊ | 26358/30000 [00:42<00:05, 629.44it/s, init loss: 23887.5312, avg. loss [24001-25500]: 6345.0874] 88%|████████▊ | 26421/30000 [00:42<00:05, 625.21it/s, init loss: 23887.5312, avg. loss [24001-25500]: 6345.0874] 88%|████████▊ | 26488/30000 [00:42<00:05, 637.64it/s, init loss: 23887.5312, avg. 
loss [24001-25500]: 6345.0874] 89%|████████▊ | 26560/30000 [00:42<00:05, 658.90it/s, init loss: 23887.5312, avg. loss [24001-25500]: 6345.0874] 89%|████████▉ | 26626/30000 [00:42<00:05, 649.11it/s, init loss: 23887.5312, avg. loss [24001-25500]: 6345.0874] 89%|████████▉ | 26691/30000 [00:42<00:05, 642.81it/s, init loss: 23887.5312, avg. loss [24001-25500]: 6345.0874] 89%|████████▉ | 26756/30000 [00:42<00:05, 634.10it/s, init loss: 23887.5312, avg. loss [24001-25500]: 6345.0874] 89%|████████▉ | 26820/30000 [00:43<00:05, 628.37it/s, init loss: 23887.5312, avg. loss [24001-25500]: 6345.0874] 90%|████████▉ | 26885/30000 [00:43<00:04, 633.72it/s, init loss: 23887.5312, avg. loss [24001-25500]: 6345.0874] 90%|████████▉ | 26949/30000 [00:43<00:04, 631.65it/s, init loss: 23887.5312, avg. loss [24001-25500]: 6345.0874] 90%|█████████ | 27013/30000 [00:43<00:04, 629.50it/s, init loss: 23887.5312, avg. loss [25501-27000]: 6141.3779] 90%|█████████ | 27076/30000 [00:43<00:04, 608.07it/s, init loss: 23887.5312, avg. loss [25501-27000]: 6141.3779] 90%|█████████ | 27137/30000 [00:43<00:04, 603.04it/s, init loss: 23887.5312, avg. loss [25501-27000]: 6141.3779] 91%|█████████ | 27198/30000 [00:43<00:04, 585.00it/s, init loss: 23887.5312, avg. loss [25501-27000]: 6141.3779] 91%|█████████ | 27257/30000 [00:43<00:04, 567.24it/s, init loss: 23887.5312, avg. loss [25501-27000]: 6141.3779] 91%|█████████ | 27319/30000 [00:43<00:04, 578.46it/s, init loss: 23887.5312, avg. loss [25501-27000]: 6141.3779] 91%|█████████▏| 27378/30000 [00:43<00:04, 573.43it/s, init loss: 23887.5312, avg. loss [25501-27000]: 6141.3779] 91%|█████████▏| 27436/30000 [00:44<00:04, 568.23it/s, init loss: 23887.5312, avg. loss [25501-27000]: 6141.3779] 92%|█████████▏| 27493/30000 [00:44<00:04, 556.74it/s, init loss: 23887.5312, avg. loss [25501-27000]: 6141.3779] 92%|█████████▏| 27556/30000 [00:44<00:04, 576.19it/s, init loss: 23887.5312, avg. 
loss [25501-27000]: 6141.3779] 92%|█████████▏| 27616/30000 [00:44<00:04, 581.53it/s, init loss: 23887.5312, avg. loss [25501-27000]: 6141.3779] 92%|█████████▏| 27676/30000 [00:44<00:03, 585.56it/s, init loss: 23887.5312, avg. loss [25501-27000]: 6141.3779] 92%|█████████▏| 27735/30000 [00:44<00:03, 581.48it/s, init loss: 23887.5312, avg. loss [25501-27000]: 6141.3779] 93%|█████████▎| 27794/30000 [00:44<00:03, 581.32it/s, init loss: 23887.5312, avg. loss [25501-27000]: 6141.3779] 93%|█████████▎| 27858/30000 [00:44<00:03, 596.92it/s, init loss: 23887.5312, avg. loss [25501-27000]: 6141.3779] 93%|█████████▎| 27918/30000 [00:44<00:03, 593.06it/s, init loss: 23887.5312, avg. loss [25501-27000]: 6141.3779] 93%|█████████▎| 27979/30000 [00:44<00:03, 596.67it/s, init loss: 23887.5312, avg. loss [25501-27000]: 6141.3779] 93%|█████████▎| 28039/30000 [00:45<00:03, 594.28it/s, init loss: 23887.5312, avg. loss [25501-27000]: 6141.3779] 94%|█████████▎| 28099/30000 [00:45<00:03, 586.67it/s, init loss: 23887.5312, avg. loss [25501-27000]: 6141.3779] 94%|█████████▍| 28160/30000 [00:45<00:03, 592.51it/s, init loss: 23887.5312, avg. loss [25501-27000]: 6141.3779] 94%|█████████▍| 28220/30000 [00:45<00:03, 583.34it/s, init loss: 23887.5312, avg. loss [25501-27000]: 6141.3779] 94%|█████████▍| 28279/30000 [00:45<00:02, 580.70it/s, init loss: 23887.5312, avg. loss [25501-27000]: 6141.3779] 94%|█████████▍| 28340/30000 [00:45<00:02, 586.37it/s, init loss: 23887.5312, avg. loss [25501-27000]: 6141.3779] 95%|█████████▍| 28404/30000 [00:45<00:02, 601.38it/s, init loss: 23887.5312, avg. loss [25501-27000]: 6141.3779] 95%|█████████▍| 28470/30000 [00:45<00:02, 616.70it/s, init loss: 23887.5312, avg. loss [25501-27000]: 6141.3779] 95%|█████████▌| 28540/30000 [00:45<00:02, 639.89it/s, init loss: 23887.5312, avg. loss [27001-28500]: 6019.6133] 95%|█████████▌| 28610/30000 [00:45<00:02, 656.68it/s, init loss: 23887.5312, avg. 
loss [27001-28500]: 6019.6133] 96%|█████████▌| 28676/30000 [00:46<00:02, 656.63it/s, init loss: 23887.5312, avg. loss [27001-28500]: 6019.6133] 96%|█████████▌| 28742/30000 [00:46<00:01, 656.81it/s, init loss: 23887.5312, avg. loss [27001-28500]: 6019.6133] 96%|█████████▌| 28809/30000 [00:46<00:01, 658.68it/s, init loss: 23887.5312, avg. loss [27001-28500]: 6019.6133] 96%|█████████▋| 28875/30000 [00:46<00:01, 649.51it/s, init loss: 23887.5312, avg. loss [27001-28500]: 6019.6133] 96%|█████████▋| 28940/30000 [00:46<00:01, 649.29it/s, init loss: 23887.5312, avg. loss [27001-28500]: 6019.6133] 97%|█████████▋| 29006/30000 [00:46<00:01, 650.81it/s, init loss: 23887.5312, avg. loss [27001-28500]: 6019.6133] 97%|█████████▋| 29073/30000 [00:46<00:01, 655.05it/s, init loss: 23887.5312, avg. loss [27001-28500]: 6019.6133] 97%|█████████▋| 29139/30000 [00:46<00:01, 649.35it/s, init loss: 23887.5312, avg. loss [27001-28500]: 6019.6133] 97%|█████████▋| 29206/30000 [00:46<00:01, 655.38it/s, init loss: 23887.5312, avg. loss [27001-28500]: 6019.6133] 98%|█████████▊| 29276/30000 [00:47<00:01, 668.29it/s, init loss: 23887.5312, avg. loss [27001-28500]: 6019.6133] 98%|█████████▊| 29344/30000 [00:47<00:00, 669.35it/s, init loss: 23887.5312, avg. loss [27001-28500]: 6019.6133] 98%|█████████▊| 29411/30000 [00:47<00:00, 653.13it/s, init loss: 23887.5312, avg. loss [27001-28500]: 6019.6133] 98%|█████████▊| 29480/30000 [00:47<00:00, 663.62it/s, init loss: 23887.5312, avg. loss [27001-28500]: 6019.6133] 98%|█████████▊| 29547/30000 [00:47<00:00, 646.29it/s, init loss: 23887.5312, avg. loss [27001-28500]: 6019.6133] 99%|█████████▊| 29617/30000 [00:47<00:00, 658.29it/s, init loss: 23887.5312, avg. loss [27001-28500]: 6019.6133] 99%|█████████▉| 29687/30000 [00:47<00:00, 668.74it/s, init loss: 23887.5312, avg. loss [27001-28500]: 6019.6133] 99%|█████████▉| 29755/30000 [00:47<00:00, 671.87it/s, init loss: 23887.5312, avg. 
loss [27001-28500]: 6019.6133] 99%|█████████▉| 29823/30000 [00:47<00:00, 667.55it/s, init loss: 23887.5312, avg. loss [27001-28500]: 6019.6133]100%|█████████▉| 29894/30000 [00:47<00:00, 679.86it/s, init loss: 23887.5312, avg. loss [27001-28500]: 6019.6133]100%|█████████▉| 29963/30000 [00:48<00:00, 651.55it/s, init loss: 23887.5312, avg. loss [27001-28500]: 6019.6133]100%|██████████| 30000/30000 [00:48<00:00, 623.51it/s, init loss: 23887.5312, avg. loss [28501-30000]: 5914.2822]

posterior_hierarchical_quantile_regression_svi = sample_posterior_predictive_svi(
    rng_key=RNG_KEY,
    covariates_hat=quantile_regression_covariates_hat,
    model_kwargs=hierarchical_quantile_model_kwargs,
    model=hierarchical_quantile_regression,
    guide=svi_hierarchical_quantile_regression_guide,
    svi_result=svi_hierarchical_quantile_regression_parameters,
    num_samples=2000,
    return_sites=hierarchical_quantile_model_parameters,
)

As we can see, the spatial component is now much more heterogeneous across counties. Although this is the effect we wanted to achieve, some of the sharp variations at county edges do not look very natural. Such discontinuities are desirable when environmental conditions justify them (e.g., the presence of mountains or terrain depressions), but they might also arise because we did not impose any constraint enforcing continuity across contiguous county boundaries.

visualize_geo_regression(
    covariates_hat_df=quantile_regression_covariates_hat_df,
    posterior=posterior_hierarchical_quantile_regression_svi,
    parameter="spatial_component",
    parameter_transformer=lambda x: jnp.exp(x),
)
plt.show()

fig, axs = visualize_temporal_components(
    temporal_components=generate_temporal_components(
        posterior=posterior_hierarchical_quantile_regression_svi,
        transformers=quantile_regression_transformers,
        years=continuous_state_modelling_df[YEAR_COVARIATES].unique(),
        parameter_transformer=lambda x: jnp.exp(x),
    )
)
plt.show()

1.5.2 Estimating the Likelihood of Hail Events Using Zero-Inflated Negative Binomial Regression

A potential solution for estimating the overall likelihood and quantity of hail events is to use a zero-inflated negative binomial regression.

Unlike in the quantile regression, in this case we did not model the spatial component, as it would have been too cumbersome to enumerate all the possible occurrences of non-hail over space and time. We limited ourselves to modelling the temporal component and used a hierarchical intercept to capture variations across counties.

The zero-inflated negative binomial likelihood used in this regression is a mixture of a discrete (count) component provided by a negative binomial regression and a gate component provided by a logistic regression. The first models the observed number of hail events, while the second defines the overall likelihood of a hail event happening (we suggest consulting the lecture on mixtures from Richard McElreath's Statistical Rethinking course for a more precise discussion of the topic).
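To make the mixture concrete, the following is a minimal, standard-library-only sketch of a zero-inflated negative binomial log-probability. It is an illustration rather than the code used in this analysis: the function names, the mean/concentration parameterisation, and the convention that `gate` is the probability of a structural zero (i.e., no hail event can occur) are all assumptions made for this example.

```python
import math


def negative_binomial_log_pmf(k: int, mean: float, concentration: float) -> float:
    """Log-probability of count k under a negative binomial distribution.

    Assumed parameterisation: variance = mean + mean**2 / concentration.
    """
    log_coefficient = (
        math.lgamma(k + concentration)
        - math.lgamma(concentration)
        - math.lgamma(k + 1)
    )
    log_zero_part = concentration * math.log(concentration / (concentration + mean))
    log_count_part = k * math.log(mean / (concentration + mean))
    return log_coefficient + log_zero_part + log_count_part


def zero_inflated_nb_log_pmf(
    k: int, gate: float, mean: float, concentration: float
) -> float:
    """Log-probability of count k under the zero-inflated mixture.

    Assumption: gate is the probability of a structural zero, so a zero can
    come either from the gate or from the negative binomial itself.
    """
    nb_log_prob = negative_binomial_log_pmf(k, mean, concentration)
    if k == 0:
        return math.log(gate + (1.0 - gate) * math.exp(nb_log_prob))
    # Positive counts can only come from the negative binomial component.
    return math.log1p(-gate) + nb_log_prob


# Hypothetical parameter values, chosen only for illustration.
probability_of_zero = math.exp(zero_inflated_nb_log_pmf(0, 0.3, 2.0, 1.5))
```

Under this convention the probability of observing zero events is always at least `gate`, which is what allows the model to accommodate the many county-hours in which no hail is reported. In a regression setting, `gate` would be the output of the logistic component and `mean` the output of the negative binomial component.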

Show supplementary code
def create_estimation_covariates_zero_inflated_regression(
    quantile_regression_covariates_hat_df: pd.DataFrame,
    geometries_df: gpd.GeoDataFrame,
) -> Tuple[gpd.GeoDataFrame, Dict[str, ArrayLike]]:
    """Create the covariates for estimation"""
    zero_inflated_regression_covariates_df_hat = quantile_regression_covariates_hat_df[
        [YEAR_COVARIATES, MONTH_COVARIATE, HOUR_COVARIATE, COUNTIES_INDEX, STATE_INDEX]
    ].drop_duplicates()

    zero_inflated_regression_covariates_df_hat = gpd.GeoDataFrame(
        pd.merge(
            zero_inflated_regression_covariates_df_hat,
            geometries_df[[COUNTIES_INDEX, STATE_INDEX, "geometry"]],
            how="inner",
            on=[COUNTIES_INDEX, STATE_INDEX],
        )
    )
    zero_inflated_regression_covariates_hat = {
        "year_covariates": (
            zero_inflated_regression_covariates_df_hat[YEAR_COVARIATES].values.reshape(
                -1, 1
            )
        ),
        "month_covariates": (
            zero_inflated_regression_covariates_df_hat[MONTH_COVARIATE].values.reshape(
                -1, 1
            )
        ),
        "hour_covariates": (
            zero_inflated_regression_covariates_df_hat[HOUR_COVARIATE].values.reshape(
                -1, 1
            )
        ),
        "counties_index": (
            zero_inflated_regression_covariates_df_hat[COUNTIES_INDEX].values.reshape(
                -1, 1
            )
        ),
    }
    return (
        zero_inflated_regression_covariates_df_hat,
        zero_inflated_regression_covariates_hat,
    )
zero_inflated_regression_transformers = {
    "year_covariates": Pipeline(
        steps=[
            (
                "ordinal_encoder",
                OrdinalEncoder(
                    dtype="int",
                ),
            ),
            (
                "spline_transformer",
                SplineTransformer(
                    include_bias=False,
                ),
            ),
        ]
    ),
    "month_covariates": Pipeline(
        steps=[
            (
                "spline_transformer",
                SplineTransformer(
                    include_bias=False,
                ),
            )
        ]
    ),
    "hour_covariates": Pipeline(
        steps=[
            (
                "spline_transformer",
                SplineTransformer(
                    include_bias=False,
                ),
            )
        ]
    ),
    "counties_index": OrdinalEncoder(
        dtype="int",
    ),
}

zero_inflated_regression_covariates = {
    "year_covariates": (
        count_state_modelling_df[YEAR_COVARIATES].values.reshape(-1, 1)
    ),
    "month_covariates": (
        count_state_modelling_df[MONTH_COVARIATE].values.reshape(-1, 1)
    ),
    "hour_covariates": (count_state_modelling_df[HOUR_COVARIATE].values.reshape(-1, 1)),
    "counties_index": (count_state_modelling_df[COUNTIES_INDEX].values.reshape(-1, 1)),
}
zero_inflated_regression_covariates_hat_df, zero_inflated_regression_covariates_hat = (
    create_estimation_covariates_zero_inflated_regression(
        quantile_regression_covariates_hat_df=quantile_regression_covariates_hat_df,
        geometries_df=geometries_df,
    )
)

(
    zero_inflated_regression_covariates,
    zero_inflated_regression_transformers,
) = transform_fitting_covariates(
    covariates=zero_inflated_regression_covariates,
    transformers=zero_inflated_regression_transformers,
)
zero_inflated_regression_covariates_hat = transform_estimation_covariates(
    covariates=zero_inflated_regression_covariates_hat,
    transformers=zero_inflated_regression_transformers,
)
zero_inflated_regression_target = count_state_modelling_df[COUNT_TARGET].values

Fully Pooled

We first tried a version of the model where the intercept is kept fixed for all the counties, basically without modelling any sort of spatial component.

$$ \begin{gather} \color{RedOrange}\alpha_{Gate} \sim \mathcal{N}(\mu=0, \sigma=1) \\ \color{NavyBlue}\alpha_{Mean} \sim \mathcal{N}(\mu=0, \sigma=1) \\ \\ \color{RedOrange}\beta_{GateHour} \sim \mathcal{N}(\mu=0, \sigma=1) \\ \color{RedOrange}\beta_{GateMonth} \sim \mathcal{N}(\mu=0, \sigma=1) \\ \color{RedOrange}\beta_{GateYear} \sim \mathcal{N}(\mu=0, \sigma=1) \\ \color{NavyBlue}\beta_{MeanHour} \sim \mathcal{N}(\mu=0, \sigma=1) \\ \color{NavyBlue}\beta_{MeanMonth} \sim \mathcal{N}(\mu=0, \sigma=1) \\ \color{NavyBlue}\beta_{MeanYear} \sim \mathcal{N}(\mu=0, \sigma=1) \\ \\ \color{RedOrange}p = expit(\alpha_{Gate} + \beta_{GateHour}f(hour) + \beta_{GateMonth}f(month) + \beta_{GateYear}f(year))\\ \color{NavyBlue}\mu = exp(\alpha_{Mean} + \beta_{MeanHour}f(hour) + \beta_{MeanMonth}f(month) + \beta_{MeanYear}f(year))\\ \\ \lambda \sim InverseGamma(0.4, 0.3)\\ y \sim ZeroInflatedNegativeBinomial(p, \mu, \lambda) \end{gather} $$

Here too, as in the quantile regression, the effect of the various components is multiplicative rather than additive, due to the exponential link used for modelling the mean component of the negative binomial regression.
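A short numerical check can make the multiplicative reading concrete. The values below are hypothetical contributions on the log scale, not estimates from the fitted model.

```python
import math

# Hypothetical linear-predictor contributions on the log scale.
alpha_mean = -1.0
hour_component = 0.5
month_component = 0.8

baseline = math.exp(alpha_mean)
mean_without_month = math.exp(alpha_mean + hour_component)
mean_with_month = math.exp(alpha_mean + hour_component + month_component)

# Adding a term to the linear predictor multiplies the mean by exp(term).
month_multiplier = mean_with_month / mean_without_month
print(f"month multiplier = {month_multiplier:.3f}")
print(f"exp(month component) = {math.exp(month_component):.3f}")
```

Under this reading, a component equal to zero leaves the expected count unchanged, while positive and negative values scale it up and down respectively.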

from jax.scipy.special import expit
from jax import numpy as jnp

from numpyro.distributions import (
    NegativeBinomial2,
    InverseGamma,
    ZeroInflatedDistribution,
)


def pooled_zero_inflated_negative_binomial_regression(
    target: ArrayLike,
    covariates: Dict[str, ArrayLike],
    prior_rate: Distribution,
    prior_alpha_gate: Distribution,
    prior_beta_year_gate: Distribution,
    prior_beta_month_gate: Distribution,
    prior_beta_hour_gate: Distribution,
    prior_alpha_mean: Distribution,
    prior_beta_year_mean: Distribution,
    prior_beta_month_mean: Distribution,
    prior_beta_hour_mean: Distribution,
) -> None:
    """Simple zero-inflated negative binomial regression model"""
    alpha_gate = numpyro.sample(
        "alpha_gate",
        prior_alpha_gate,
    )
    alpha_mean = numpyro.sample(
        "alpha_mean",
        prior_alpha_mean,
    )

    with numpyro.plate(
        "year_spline_coefficients", size=covariates["year_covariates"].shape[1]
    ):
        beta_year_gate = numpyro.sample(
            "beta_year_gate",
            prior_beta_year_gate,
        )
        beta_year_mean = numpyro.sample(
            "beta_year_mean",
            prior_beta_year_mean,
        )

    with numpyro.plate(
        "hour_spline_coefficients", size=covariates["hour_covariates"].shape[1]
    ):
        beta_hour_gate = numpyro.sample(
            "beta_hour_gate",
            prior_beta_hour_gate,
        )
        beta_hour_mean = numpyro.sample(
            "beta_hour_mean",
            prior_beta_hour_mean,
        )

    with numpyro.plate(
        "month_spline_coefficients", size=covariates["month_covariates"].shape[1]
    ):
        beta_month_gate = numpyro.sample(
            "beta_month_gate",
            prior_beta_month_gate,
        )
        beta_month_mean = numpyro.sample(
            "beta_month_mean",
            prior_beta_month_mean,
        )

    # Year component
    year_component_gate = numpyro.deterministic(
        name="year_component_gate",
        value=jnp.dot(covariates["year_covariates"], beta_year_gate),
    )
    year_component_mean = numpyro.deterministic(
        name="year_component_mean",
        value=jnp.dot(covariates["year_covariates"], beta_year_mean),
    )
    # Month component
    month_component_gate = numpyro.deterministic(
        name="month_component_gate",
        value=jnp.dot(covariates["month_covariates"], beta_month_gate),
    )
    month_component_mean = numpyro.deterministic(
        name="month_component_mean",
        value=jnp.dot(covariates["month_covariates"], beta_month_mean),
    )
    # Hour component
    hour_component_gate = numpyro.deterministic(
        name="hour_component_gate",
        value=jnp.dot(covariates["hour_covariates"], beta_hour_gate),
    )
    hour_component_mean = numpyro.deterministic(
        name="hour_component_mean",
        value=jnp.dot(covariates["hour_covariates"], beta_hour_mean),
    )

    # Temporal components
    temporal_component_gate = numpyro.deterministic(
        name="temporal_component_gate",
        value=year_component_gate + month_component_gate + hour_component_gate,
    )
    temporal_component_mean = numpyro.deterministic(
        name="temporal_component_mean",
        value=year_component_mean + month_component_mean + hour_component_mean,
    )

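    # expit(...) is the probability of a hail event, while ZeroInflatedDistribution
    # expects the probability of a structural zero, hence the complement.
    gate = numpyro.deterministic(
        name="gate",
        value=1 - expit(alpha_gate + temporal_component_gate),
    )
    # Exponential link: the components act multiplicatively on the expected count.
    mean = numpyro.deterministic(
        name="mean",
        value=jnp.exp(alpha_mean + temporal_component_mean),
    )
    # NegativeBinomial2 takes (mean, concentration): the "rate" site sampled here
    # is passed as the concentration (dispersion) parameter.
    rate = numpyro.sample(
        "rate",
        prior_rate,
    )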
    # Build the likelihood once and reuse it for the observation site and the
    # pointwise log-likelihood.
    likelihood = ZeroInflatedDistribution(
        base_dist=NegativeBinomial2(mean, rate),
        gate=gate,
    )
    numpyro.sample(
        "obs",
        likelihood,
        obs=target,
    )
    if target is not None:
        numpyro.deterministic(
            "log_likelihood",
            likelihood.log_prob(target),
        )

pooled_zero_inflated_regression_parameters = [
    "rate",
    "alpha_gate",
    "beta_year_gate",
    "beta_month_gate",
    "beta_hour_gate",
    "alpha_mean",
    "beta_year_mean",
    "beta_month_mean",
    "beta_hour_mean",
    "gate",
    "mean",
    "obs",
]
pooled_zero_inflated_regression_kwargs = {
    "covariates": zero_inflated_regression_covariates,
    "target": zero_inflated_regression_target,
    "prior_rate": InverseGamma(0.4, 0.3),
    "prior_alpha_gate": Normal(loc=0.0, scale=1.),
    "prior_beta_year_gate": Normal(loc=0.0, scale=1.),
    "prior_beta_month_gate": Normal(loc=0.0, scale=1.),
    "prior_beta_hour_gate": Normal(loc=0.0, scale=1.),
    "prior_alpha_mean": Normal(loc=0.0, scale=1.),
    "prior_beta_year_mean": Normal(loc=0.0, scale=1.),
    "prior_beta_month_mean": Normal(loc=0.0, scale=1.),
    "prior_beta_hour_mean": Normal(loc=0.0, scale=1.),
}
numpyro.render_model(
    pooled_zero_inflated_negative_binomial_regression,
    model_kwargs=pooled_zero_inflated_regression_kwargs,
    render_distributions=False,
)

(
    svi_pooled_zero_inflated_regression_parameters,
    svi_pooled_zero_inflated_regression_guide,
) = sample_using_svi(
    rng_key=RNG_KEY,
    model=pooled_zero_inflated_negative_binomial_regression,
    model_kwargs=pooled_zero_inflated_regression_kwargs,
    autoguide=AutoLowRankMultivariateNormal,
    guide_kwargs={},
    optimizer_kwargs={"step_size": 1e-4, "clip_norm": 5},
    num_steps=NUMBER_ITERATIONS,
    num_particles=NUMBER_PARTICLES,
)
  0%|          | 0/30000 [00:00<?, ?it/s]  0%|          | 1/30000 [00:00<5:47:44,  1.44it/s]  0%|          | 9/30000 [00:00<34:03, 14.68it/s]    0%|          | 18/30000 [00:00<17:24, 28.70it/s]  0%|          | 27/30000 [00:01<12:11, 40.98it/s]  0%|          | 36/30000 [00:01<09:44, 51.22it/s]  0%|          | 45/30000 [00:01<08:27, 58.98it/s]  0%|          | 54/30000 [00:01<07:38, 65.33it/s]  0%|          | 63/30000 [00:01<07:07, 69.95it/s]  0%|          | 72/30000 [00:01<06:46, 73.59it/s]  0%|          | 81/30000 [00:01<06:32, 76.22it/s]  0%|          | 90/30000 [00:01<06:23, 78.03it/s]  0%|          | 99/30000 [00:01<06:18, 78.97it/s]  0%|          | 108/30000 [00:02<06:15, 79.60it/s]  0%|          | 117/30000 [00:02<06:11, 80.41it/s]  0%|          | 126/30000 [00:02<06:10, 80.74it/s]  0%|          | 135/30000 [00:02<06:07, 81.23it/s]  0%|          | 144/30000 [00:02<06:07, 81.20it/s]  1%|          | 153/30000 [00:02<06:05, 81.60it/s]  1%|          | 162/30000 [00:02<06:03, 82.17it/s]  1%|          | 171/30000 [00:02<06:03, 82.05it/s]  1%|          | 180/30000 [00:02<06:05, 81.66it/s]  1%|          | 189/30000 [00:02<06:04, 81.78it/s]  1%|          | 198/30000 [00:03<06:06, 81.42it/s]  1%|          | 207/30000 [00:03<06:04, 81.71it/s]  1%|          | 216/30000 [00:03<06:03, 81.98it/s]  1%|          | 225/30000 [00:03<06:03, 81.83it/s]  1%|          | 234/30000 [00:03<06:02, 82.11it/s]  1%|          | 243/30000 [00:03<06:00, 82.45it/s]  1%|          | 252/30000 [00:03<06:01, 82.21it/s]  1%|          | 261/30000 [00:03<06:05, 81.46it/s]  1%|          | 270/30000 [00:03<06:05, 81.24it/s]  1%|          | 279/30000 [00:04<06:04, 81.62it/s]  1%|          | 288/30000 [00:04<06:04, 81.49it/s]  1%|          | 297/30000 [00:04<06:04, 81.39it/s]  1%|          | 306/30000 [00:04<06:05, 81.16it/s]  1%|          | 315/30000 [00:04<06:05, 81.29it/s]  1%|          | 324/30000 [00:04<06:04, 81.46it/s]  1%|          | 333/30000 [00:04<06:04, 81.39it/s]  1%|          | 342/30000 
[00:04<06:06, 80.96it/s]  1%|          | 351/30000 [00:04<06:05, 81.21it/s]  1%|          | 360/30000 [00:05<06:05, 81.17it/s]  1%|          | 369/30000 [00:05<06:05, 80.97it/s]  1%|▏         | 378/30000 [00:05<06:04, 81.21it/s]  1%|▏         | 387/30000 [00:05<06:04, 81.30it/s]  1%|▏         | 396/30000 [00:05<06:04, 81.26it/s]  1%|▏         | 405/30000 [00:05<06:03, 81.35it/s]  1%|▏         | 414/30000 [00:05<06:02, 81.59it/s]  1%|▏         | 423/30000 [00:05<06:05, 80.97it/s]  1%|▏         | 432/30000 [00:05<06:05, 80.93it/s]  1%|▏         | 441/30000 [00:06<06:04, 81.19it/s]  2%|▏         | 450/30000 [00:06<06:05, 80.88it/s]  2%|▏         | 459/30000 [00:06<06:05, 80.90it/s]  2%|▏         | 468/30000 [00:06<06:06, 80.57it/s]  2%|▏         | 477/30000 [00:06<06:05, 80.73it/s]  2%|▏         | 486/30000 [00:06<06:05, 80.68it/s]  2%|▏         | 495/30000 [00:06<06:06, 80.52it/s]  2%|▏         | 504/30000 [00:06<06:06, 80.52it/s]  2%|▏         | 513/30000 [00:06<06:04, 80.86it/s]  2%|▏         | 522/30000 [00:07<06:03, 81.16it/s]  2%|▏         | 531/30000 [00:07<06:03, 81.05it/s]  2%|▏         | 540/30000 [00:07<06:04, 80.87it/s]  2%|▏         | 549/30000 [00:07<06:04, 80.76it/s]  2%|▏         | 558/30000 [00:07<06:05, 80.46it/s]  2%|▏         | 567/30000 [00:07<06:04, 80.70it/s]  2%|▏         | 576/30000 [00:07<06:03, 81.01it/s]  2%|▏         | 585/30000 [00:07<06:05, 80.48it/s]  2%|▏         | 594/30000 [00:07<06:05, 80.56it/s]  2%|▏         | 603/30000 [00:08<06:04, 80.60it/s]  2%|▏         | 612/30000 [00:08<06:04, 80.55it/s]  2%|▏         | 621/30000 [00:08<06:04, 80.56it/s]  2%|▏         | 630/30000 [00:08<06:06, 80.19it/s]  2%|▏         | 639/30000 [00:08<06:05, 80.32it/s]  2%|▏         | 648/30000 [00:08<06:04, 80.54it/s]  2%|▏         | 657/30000 [00:08<06:03, 80.70it/s]  2%|▏         | 666/30000 [00:08<06:03, 80.62it/s]  2%|▏         | 675/30000 [00:08<06:02, 81.00it/s]  2%|▏         | 684/30000 [00:09<06:03, 80.76it/s]  2%|▏         | 693/30000 
[00:09<06:02, 80.83it/s]  2%|▏         | 702/30000 [00:09<06:00, 81.31it/s]  2%|▏         | 711/30000 [00:09<06:01, 80.96it/s]  2%|▏         | 720/30000 [00:09<06:00, 81.18it/s]  2%|▏         | 729/30000 [00:09<06:01, 80.98it/s]  2%|▏         | 738/30000 [00:09<06:00, 81.10it/s]  2%|▏         | 747/30000 [00:09<06:02, 80.76it/s]  3%|▎         | 756/30000 [00:09<06:02, 80.60it/s]  3%|▎         | 765/30000 [00:10<06:02, 80.64it/s]  3%|▎         | 774/30000 [00:10<06:03, 80.51it/s]  3%|▎         | 783/30000 [00:10<06:03, 80.44it/s]  3%|▎         | 792/30000 [00:10<06:03, 80.30it/s]  3%|▎         | 801/30000 [00:10<06:03, 80.25it/s]  3%|▎         | 810/30000 [00:10<06:03, 80.21it/s]  3%|▎         | 819/30000 [00:10<06:03, 80.36it/s]  3%|▎         | 828/30000 [00:10<06:03, 80.23it/s]  3%|▎         | 837/30000 [00:11<06:03, 80.26it/s]  3%|▎         | 846/30000 [00:11<06:04, 79.91it/s]  3%|▎         | 855/30000 [00:11<06:04, 80.06it/s]  3%|▎         | 864/30000 [00:11<06:03, 80.19it/s]  3%|▎         | 873/30000 [00:11<06:04, 80.00it/s]  3%|▎         | 881/30000 [00:11<06:04, 79.95it/s]  3%|▎         | 890/30000 [00:11<06:03, 80.07it/s]  3%|▎         | 899/30000 [00:11<06:04, 79.89it/s]  3%|▎         | 907/30000 [00:11<06:04, 79.86it/s]  3%|▎         | 916/30000 [00:11<06:01, 80.41it/s]  3%|▎         | 925/30000 [00:12<06:00, 80.58it/s]  3%|▎         | 934/30000 [00:12<06:01, 80.35it/s]  3%|▎         | 943/30000 [00:12<06:00, 80.62it/s]  3%|▎         | 952/30000 [00:12<06:00, 80.47it/s]  3%|▎         | 961/30000 [00:12<06:00, 80.53it/s]  3%|▎         | 970/30000 [00:12<06:00, 80.55it/s]  3%|▎         | 979/30000 [00:12<06:01, 80.29it/s]  3%|▎         | 988/30000 [00:12<06:03, 79.87it/s]  3%|▎         | 997/30000 [00:13<06:02, 80.08it/s]  3%|▎         | 1006/30000 [00:13<06:02, 80.05it/s]  3%|▎         | 1015/30000 [00:13<06:01, 80.18it/s]  3%|▎         | 1024/30000 [00:13<05:59, 80.60it/s]  3%|▎         | 1033/30000 [00:13<06:00, 80.35it/s]  3%|▎         | 1042/30000 
[00:13<05:59, 80.48it/s]  4%|▎         | 1051/30000 [00:13<05:59, 80.57it/s]  4%|▎         | 1060/30000 [00:13<06:00, 80.27it/s]  4%|▎         | 1069/30000 [00:13<06:01, 79.97it/s]  4%|▎         | 1078/30000 [00:14<06:01, 80.04it/s]  4%|▎         | 1087/30000 [00:14<06:00, 80.24it/s]  4%|▎         | 1096/30000 [00:14<06:00, 80.11it/s]  4%|▎         | 1105/30000 [00:14<06:01, 79.96it/s]  4%|▎         | 1114/30000 [00:14<05:59, 80.26it/s]  4%|▎         | 1123/30000 [00:14<06:00, 80.13it/s]  4%|▍         | 1132/30000 [00:14<06:00, 80.16it/s]  4%|▍         | 1141/30000 [00:14<06:00, 80.07it/s]  4%|▍         | 1150/30000 [00:14<05:59, 80.29it/s]  4%|▍         | 1159/30000 [00:15<05:59, 80.33it/s]  4%|▍         | 1168/30000 [00:15<06:00, 80.08it/s]  4%|▍         | 1177/30000 [00:15<06:00, 80.01it/s]  4%|▍         | 1186/30000 [00:15<05:59, 80.23it/s]  4%|▍         | 1195/30000 [00:15<05:58, 80.24it/s]  4%|▍         | 1204/30000 [00:15<05:58, 80.25it/s]  4%|▍         | 1213/30000 [00:15<05:57, 80.52it/s]  4%|▍         | 1222/30000 [00:15<05:58, 80.31it/s]  4%|▍         | 1231/30000 [00:15<05:59, 79.95it/s]  4%|▍         | 1240/30000 [00:16<05:59, 79.98it/s]  4%|▍         | 1249/30000 [00:16<05:58, 80.19it/s]  4%|▍         | 1258/30000 [00:16<05:58, 80.27it/s]  4%|▍         | 1267/30000 [00:16<05:57, 80.43it/s]  4%|▍         | 1276/30000 [00:16<05:57, 80.42it/s]  4%|▍         | 1285/30000 [00:16<05:56, 80.53it/s]  4%|▍         | 1294/30000 [00:16<05:57, 80.40it/s]  4%|▍         | 1303/30000 [00:16<05:56, 80.44it/s]  4%|▍         | 1312/30000 [00:16<05:59, 79.84it/s]  4%|▍         | 1321/30000 [00:17<05:57, 80.26it/s]  4%|▍         | 1330/30000 [00:17<05:57, 80.30it/s]  4%|▍         | 1339/30000 [00:17<05:55, 80.52it/s]  4%|▍         | 1348/30000 [00:17<05:56, 80.46it/s]  5%|▍         | 1357/30000 [00:17<05:57, 80.07it/s]  5%|▍         | 1366/30000 [00:17<05:57, 80.14it/s]  5%|▍         | 1375/30000 [00:17<05:56, 80.20it/s]  5%|▍         | 1384/30000 [00:17<05:57, 
80.10it/s]  5%|▍         | 1393/30000 [00:17<05:57, 79.94it/s]  5%|▍         | 1401/30000 [00:18<05:59, 79.53it/s]  5%|▍         | 1409/30000 [00:18<05:59, 79.56it/s]  5%|▍         | 1418/30000 [00:18<05:58, 79.64it/s]  5%|▍         | 1427/30000 [00:18<05:57, 79.83it/s]  5%|▍         | 1435/30000 [00:18<05:58, 79.79it/s]  5%|▍         | 1443/30000 [00:18<05:57, 79.78it/s]  5%|▍         | 1451/30000 [00:18<05:57, 79.76it/s]  5%|▍         | 1460/30000 [00:18<05:56, 80.07it/s]  5%|▍         | 1469/30000 [00:18<05:57, 79.90it/s]  5%|▍         | 1477/30000 [00:18<05:56, 79.92it/s]  5%|▍         | 1485/30000 [00:19<05:58, 79.65it/s]  5%|▍         | 1494/30000 [00:19<05:56, 79.94it/s]  5%|▌         | 1502/30000 [00:19<05:56, 79.86it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  5%|▌         | 1511/30000 [00:19<05:54, 80.30it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  5%|▌         | 1520/30000 [00:19<05:56, 79.91it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  5%|▌         | 1529/30000 [00:19<05:55, 79.98it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  5%|▌         | 1538/30000 [00:19<05:54, 80.30it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  5%|▌         | 1547/30000 [00:19<05:59, 79.14it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  5%|▌         | 1555/30000 [00:19<06:00, 78.99it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  5%|▌         | 1564/30000 [00:20<05:58, 79.31it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  5%|▌         | 1573/30000 [00:20<05:57, 79.46it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  5%|▌         | 1581/30000 [00:20<05:57, 79.51it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  5%|▌         | 1590/30000 [00:20<05:54, 80.09it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  5%|▌         | 1599/30000 [00:20<05:53, 80.32it/s, init loss: 29902.2090, avg. 
loss [1-1500]: 27954.4473]  5%|▌         | 1608/30000 [00:20<05:54, 80.13it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  5%|▌         | 1617/30000 [00:20<05:55, 79.91it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  5%|▌         | 1625/30000 [00:20<05:55, 79.81it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  5%|▌         | 1634/30000 [00:20<05:54, 79.93it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  5%|▌         | 1643/30000 [00:21<05:53, 80.11it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▌         | 1652/30000 [00:21<05:54, 79.94it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▌         | 1660/30000 [00:21<05:55, 79.74it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▌         | 1668/30000 [00:21<05:56, 79.51it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▌         | 1677/30000 [00:21<05:55, 79.75it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▌         | 1686/30000 [00:21<05:53, 80.09it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▌         | 1695/30000 [00:21<05:53, 80.01it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▌         | 1704/30000 [00:21<05:53, 80.13it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▌         | 1713/30000 [00:21<05:53, 80.07it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▌         | 1722/30000 [00:22<05:53, 79.99it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▌         | 1731/30000 [00:22<05:52, 80.13it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▌         | 1740/30000 [00:22<05:53, 80.00it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▌         | 1749/30000 [00:22<05:52, 80.11it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▌         | 1758/30000 [00:22<05:52, 80.10it/s, init loss: 29902.2090, avg. 
loss [1-1500]: 27954.4473]  6%|▌         | 1767/30000 [00:22<05:53, 79.94it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▌         | 1776/30000 [00:22<05:51, 80.20it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▌         | 1785/30000 [00:22<05:53, 79.89it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▌         | 1793/30000 [00:22<05:53, 79.83it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▌         | 1802/30000 [00:23<05:52, 79.93it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▌         | 1811/30000 [00:23<05:51, 80.28it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▌         | 1820/30000 [00:23<05:52, 79.91it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▌         | 1829/30000 [00:23<05:51, 80.10it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▌         | 1838/30000 [00:23<05:52, 79.92it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▌         | 1847/30000 [00:23<05:51, 80.16it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▌         | 1856/30000 [00:23<05:50, 80.23it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▌         | 1865/30000 [00:23<05:52, 79.82it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▌         | 1873/30000 [00:23<05:52, 79.75it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▋         | 1881/30000 [00:24<05:52, 79.69it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▋         | 1890/30000 [00:24<05:51, 79.98it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▋         | 1899/30000 [00:24<05:50, 80.20it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▋         | 1908/30000 [00:24<05:51, 79.96it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▋         | 1916/30000 [00:24<05:52, 79.77it/s, init loss: 29902.2090, avg. 
loss [1-1500]: 27954.4473]  6%|▋         | 1924/30000 [00:24<05:52, 79.57it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▋         | 1932/30000 [00:24<05:52, 79.60it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▋         | 1940/30000 [00:24<05:52, 79.63it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  6%|▋         | 1948/30000 [00:24<05:55, 78.96it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 1956/30000 [00:24<05:54, 79.11it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 1964/30000 [00:25<05:53, 79.26it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 1973/30000 [00:25<05:51, 79.70it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 1982/30000 [00:25<05:50, 79.91it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 1991/30000 [00:25<05:49, 80.08it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 2000/30000 [00:25<05:50, 79.81it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 2008/30000 [00:25<05:50, 79.82it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 2016/30000 [00:25<05:50, 79.82it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 2024/30000 [00:25<05:51, 79.58it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 2032/30000 [00:25<05:51, 79.60it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 2040/30000 [00:26<05:51, 79.52it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 2048/30000 [00:26<05:51, 79.50it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 2056/30000 [00:26<05:52, 79.36it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 2064/30000 [00:26<05:51, 79.39it/s, init loss: 29902.2090, avg. 
loss [1-1500]: 27954.4473]  7%|▋         | 2073/30000 [00:26<05:50, 79.71it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 2081/30000 [00:26<05:50, 79.56it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 2090/30000 [00:26<05:49, 79.87it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 2098/30000 [00:26<05:50, 79.61it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 2106/30000 [00:26<05:51, 79.42it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 2114/30000 [00:26<05:50, 79.47it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 2123/30000 [00:27<05:49, 79.75it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 2131/30000 [00:27<05:49, 79.81it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 2140/30000 [00:27<05:48, 79.91it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 2149/30000 [00:27<05:48, 79.89it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 2157/30000 [00:27<05:48, 79.80it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 2165/30000 [00:27<05:48, 79.77it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 2174/30000 [00:27<05:48, 79.94it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 2182/30000 [00:27<05:48, 79.74it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 2190/30000 [00:27<05:49, 79.66it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 2199/30000 [00:28<05:48, 79.70it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 2207/30000 [00:28<05:49, 79.58it/s, init loss: 29902.2090, avg. loss [1-1500]: 27954.4473]  7%|▋         | 2215/30000 [00:28<05:48, 79.68it/s, init loss: 29902.2090, avg. 
loss [1-1500]: 27954.4473]
 10%|█         | 3000/30000 [00:38<05:39, 79.49it/s, init loss: 29902.2090, avg. loss [1501-3000]: 24615.1016]
 15%|█▌        | 4502/30000 [00:56<05:20, 79.65it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219]
 18%|█▊        | 5362/30000 [01:07<05:11, 79.04it/s, init loss: 29902.2090, avg. 
loss [3001-4500]: 22913.9219] 18%|█▊        | 5370/30000 [01:07<05:13, 78.54it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 18%|█▊        | 5379/30000 [01:08<05:11, 79.16it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 18%|█▊        | 5388/30000 [01:08<05:10, 79.30it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 18%|█▊        | 5396/30000 [01:08<05:10, 79.34it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 18%|█▊        | 5404/30000 [01:08<05:09, 79.46it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 18%|█▊        | 5412/30000 [01:08<05:11, 78.99it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 18%|█▊        | 5420/30000 [01:08<05:14, 78.06it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 18%|█▊        | 5428/30000 [01:08<05:16, 77.72it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 18%|█▊        | 5436/30000 [01:08<05:16, 77.60it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 18%|█▊        | 5444/30000 [01:08<05:17, 77.32it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 18%|█▊        | 5452/30000 [01:08<05:17, 77.28it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 18%|█▊        | 5461/30000 [01:09<05:14, 78.13it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 18%|█▊        | 5469/30000 [01:09<05:14, 77.91it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 18%|█▊        | 5477/30000 [01:09<05:12, 78.39it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 18%|█▊        | 5486/30000 [01:09<05:16, 77.49it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 18%|█▊        | 5494/30000 [01:09<05:14, 77.90it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 18%|█▊        | 5503/30000 [01:09<05:11, 78.67it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 18%|█▊        | 5511/30000 [01:09<05:12, 78.40it/s, init loss: 29902.2090, avg. 
loss [3001-4500]: 22913.9219] 18%|█▊        | 5519/30000 [01:09<05:10, 78.81it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 18%|█▊        | 5527/30000 [01:09<05:12, 78.29it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 18%|█▊        | 5535/30000 [01:10<05:12, 78.18it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 18%|█▊        | 5543/30000 [01:10<05:12, 78.16it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▊        | 5551/30000 [01:10<05:13, 78.00it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▊        | 5559/30000 [01:10<05:11, 78.49it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▊        | 5567/30000 [01:10<05:10, 78.69it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▊        | 5575/30000 [01:10<05:11, 78.38it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▊        | 5583/30000 [01:10<05:11, 78.49it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▊        | 5591/30000 [01:10<05:10, 78.65it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▊        | 5599/30000 [01:10<05:10, 78.70it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▊        | 5607/30000 [01:10<05:08, 78.97it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▊        | 5615/30000 [01:11<05:09, 78.90it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▊        | 5623/30000 [01:11<05:07, 79.19it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▉        | 5632/30000 [01:11<05:06, 79.48it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▉        | 5641/30000 [01:11<05:05, 79.70it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▉        | 5650/30000 [01:11<05:03, 80.22it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▉        | 5659/30000 [01:11<05:02, 80.40it/s, init loss: 29902.2090, avg. 
loss [3001-4500]: 22913.9219] 19%|█▉        | 5668/30000 [01:11<05:02, 80.47it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▉        | 5677/30000 [01:11<05:03, 80.09it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▉        | 5686/30000 [01:11<05:06, 79.30it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▉        | 5695/30000 [01:12<05:05, 79.66it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▉        | 5703/30000 [01:12<05:05, 79.40it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▉        | 5711/30000 [01:12<05:05, 79.54it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▉        | 5719/30000 [01:12<05:04, 79.61it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▉        | 5727/30000 [01:12<05:04, 79.64it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▉        | 5736/30000 [01:12<05:04, 79.80it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▉        | 5744/30000 [01:12<05:04, 79.60it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▉        | 5752/30000 [01:12<05:06, 79.02it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▉        | 5760/30000 [01:12<05:07, 78.91it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▉        | 5768/30000 [01:12<05:06, 78.95it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▉        | 5776/30000 [01:13<05:06, 79.04it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▉        | 5784/30000 [01:13<05:05, 79.32it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▉        | 5792/30000 [01:13<05:05, 79.29it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▉        | 5800/30000 [01:13<05:05, 79.19it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▉        | 5808/30000 [01:13<05:05, 79.19it/s, init loss: 29902.2090, avg. 
loss [3001-4500]: 22913.9219] 19%|█▉        | 5816/30000 [01:13<05:04, 79.39it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▉        | 5824/30000 [01:13<05:03, 79.56it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▉        | 5833/30000 [01:13<05:02, 79.90it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 19%|█▉        | 5841/30000 [01:13<05:02, 79.83it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 20%|█▉        | 5850/30000 [01:13<05:02, 79.94it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 20%|█▉        | 5858/30000 [01:14<05:03, 79.65it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 20%|█▉        | 5867/30000 [01:14<05:02, 79.80it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 20%|█▉        | 5875/30000 [01:14<05:02, 79.84it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 20%|█▉        | 5883/30000 [01:14<05:03, 79.53it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 20%|█▉        | 5891/30000 [01:14<05:02, 79.60it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 20%|█▉        | 5899/30000 [01:14<05:02, 79.55it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 20%|█▉        | 5908/30000 [01:14<05:01, 79.82it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 20%|█▉        | 5917/30000 [01:14<05:01, 79.96it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 20%|█▉        | 5925/30000 [01:14<05:01, 79.91it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 20%|█▉        | 5933/30000 [01:15<05:01, 79.80it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 20%|█▉        | 5942/30000 [01:15<05:00, 79.96it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 20%|█▉        | 5950/30000 [01:15<05:01, 79.89it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 20%|█▉        | 5959/30000 [01:15<04:59, 80.17it/s, init loss: 29902.2090, avg. 
loss [3001-4500]: 22913.9219] 20%|█▉        | 5968/30000 [01:15<04:59, 80.16it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 20%|█▉        | 5977/30000 [01:15<04:59, 80.10it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 20%|█▉        | 5986/30000 [01:15<04:58, 80.46it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 20%|█▉        | 5995/30000 [01:15<04:58, 80.45it/s, init loss: 29902.2090, avg. loss [3001-4500]: 22913.9219] 20%|██        | 6004/30000 [01:15<05:00, 79.93it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 20%|██        | 6012/30000 [01:16<05:00, 79.89it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 20%|██        | 6020/30000 [01:16<05:00, 79.83it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 20%|██        | 6028/30000 [01:16<05:00, 79.83it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 20%|██        | 6036/30000 [01:16<05:00, 79.65it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 20%|██        | 6045/30000 [01:16<05:00, 79.83it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 20%|██        | 6053/30000 [01:16<04:59, 79.85it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 20%|██        | 6062/30000 [01:16<04:59, 79.97it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 20%|██        | 6070/30000 [01:16<04:59, 79.86it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 20%|██        | 6078/30000 [01:16<05:01, 79.27it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 20%|██        | 6086/30000 [01:16<05:02, 79.11it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 20%|██        | 6095/30000 [01:17<05:01, 79.41it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 20%|██        | 6103/30000 [01:17<05:00, 79.48it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 20%|██        | 6111/30000 [01:17<05:00, 79.46it/s, init loss: 29902.2090, avg. 
loss [4501-6000]: 21700.6953] 20%|██        | 6120/30000 [01:17<04:59, 79.73it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 20%|██        | 6129/30000 [01:17<04:58, 80.00it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 20%|██        | 6138/30000 [01:17<04:57, 80.12it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 20%|██        | 6147/30000 [01:17<04:59, 79.55it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██        | 6155/30000 [01:17<05:00, 79.30it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██        | 6163/30000 [01:17<05:00, 79.28it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██        | 6172/30000 [01:18<04:59, 79.57it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██        | 6180/30000 [01:18<04:59, 79.61it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██        | 6188/30000 [01:18<04:59, 79.62it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██        | 6196/30000 [01:18<04:59, 79.59it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██        | 6205/30000 [01:18<04:58, 79.81it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██        | 6213/30000 [01:18<04:58, 79.58it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██        | 6222/30000 [01:18<04:57, 79.87it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██        | 6231/30000 [01:18<04:57, 79.84it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██        | 6239/30000 [01:18<04:58, 79.64it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██        | 6248/30000 [01:18<04:56, 80.02it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██        | 6257/30000 [01:19<04:56, 80.07it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██        | 6266/30000 [01:19<04:56, 80.17it/s, init loss: 29902.2090, avg. 
loss [4501-6000]: 21700.6953] 21%|██        | 6275/30000 [01:19<04:55, 80.38it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██        | 6284/30000 [01:19<04:56, 80.00it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██        | 6293/30000 [01:19<04:56, 80.04it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██        | 6302/30000 [01:19<04:55, 80.10it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██        | 6311/30000 [01:19<04:55, 80.10it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██        | 6320/30000 [01:19<04:58, 79.31it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██        | 6328/30000 [01:19<04:58, 79.27it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██        | 6337/30000 [01:20<04:57, 79.60it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██        | 6346/30000 [01:20<04:56, 79.78it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██        | 6354/30000 [01:20<04:56, 79.71it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██        | 6362/30000 [01:20<04:56, 79.75it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██        | 6370/30000 [01:20<04:57, 79.31it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██▏       | 6378/30000 [01:20<04:58, 79.21it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██▏       | 6387/30000 [01:20<04:56, 79.51it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██▏       | 6395/30000 [01:20<04:57, 79.30it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██▏       | 6403/30000 [01:20<04:59, 78.79it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██▏       | 6412/30000 [01:21<04:57, 79.18it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██▏       | 6421/30000 [01:21<04:56, 79.53it/s, init loss: 29902.2090, avg. 
loss [4501-6000]: 21700.6953] 21%|██▏       | 6430/30000 [01:21<04:55, 79.82it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██▏       | 6438/30000 [01:21<04:55, 79.78it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 21%|██▏       | 6447/30000 [01:21<04:54, 79.96it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6455/30000 [01:21<04:56, 79.32it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6463/30000 [01:21<04:57, 79.18it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6471/30000 [01:21<04:56, 79.23it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6479/30000 [01:21<04:57, 79.16it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6487/30000 [01:21<04:56, 79.39it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6495/30000 [01:22<04:55, 79.51it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6503/30000 [01:22<04:56, 79.34it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6511/30000 [01:22<04:55, 79.53it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6519/30000 [01:22<04:54, 79.61it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6527/30000 [01:22<04:55, 79.54it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6536/30000 [01:22<04:54, 79.71it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6544/30000 [01:22<04:55, 79.44it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6553/30000 [01:22<04:53, 79.81it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6561/30000 [01:22<04:55, 79.35it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6569/30000 [01:23<04:55, 79.24it/s, init loss: 29902.2090, avg. 
loss [4501-6000]: 21700.6953] 22%|██▏       | 6578/30000 [01:23<04:53, 79.70it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6587/30000 [01:23<04:53, 79.90it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6595/30000 [01:23<04:52, 79.93it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6604/30000 [01:23<04:51, 80.13it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6613/30000 [01:23<04:54, 79.51it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6621/30000 [01:23<04:53, 79.56it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6629/30000 [01:23<04:54, 79.40it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6637/30000 [01:23<04:54, 79.28it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6645/30000 [01:23<04:54, 79.38it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6653/30000 [01:24<04:54, 79.30it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6661/30000 [01:24<04:54, 79.32it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6670/30000 [01:24<04:52, 79.64it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6679/30000 [01:24<04:51, 79.92it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6687/30000 [01:24<04:51, 79.93it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6696/30000 [01:24<04:50, 80.27it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6705/30000 [01:24<04:49, 80.33it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6714/30000 [01:24<04:51, 79.92it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6722/30000 [01:24<04:51, 79.77it/s, init loss: 29902.2090, avg. 
loss [4501-6000]: 21700.6953] 22%|██▏       | 6731/30000 [01:25<04:50, 80.14it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6740/30000 [01:25<04:50, 80.05it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 22%|██▏       | 6749/30000 [01:25<04:50, 80.07it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 6758/30000 [01:25<04:49, 80.25it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 6767/30000 [01:25<04:50, 79.93it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 6775/30000 [01:25<04:50, 79.86it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 6784/30000 [01:25<04:50, 80.00it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 6792/30000 [01:25<04:50, 79.82it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 6800/30000 [01:25<04:51, 79.71it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 6808/30000 [01:26<04:52, 79.42it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 6816/30000 [01:26<04:51, 79.55it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 6825/30000 [01:26<04:50, 79.88it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 6833/30000 [01:26<04:50, 79.77it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 6842/30000 [01:26<04:49, 80.03it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 6851/30000 [01:26<04:49, 79.94it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 6859/30000 [01:26<04:49, 79.91it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 6868/30000 [01:26<04:48, 80.10it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 6877/30000 [01:26<04:51, 79.41it/s, init loss: 29902.2090, avg. 
loss [4501-6000]: 21700.6953] 23%|██▎       | 6886/30000 [01:26<04:49, 79.71it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 6894/30000 [01:27<04:50, 79.65it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 6903/30000 [01:27<04:49, 79.80it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 6912/30000 [01:27<04:48, 79.89it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 6921/30000 [01:27<04:48, 79.96it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 6930/30000 [01:27<04:48, 80.00it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 6938/30000 [01:27<04:48, 79.99it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 6947/30000 [01:27<04:48, 80.00it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 6955/30000 [01:27<04:49, 79.50it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 6964/30000 [01:27<04:48, 79.81it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 6972/30000 [01:28<04:48, 79.86it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 6980/30000 [01:28<04:49, 79.55it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 6989/30000 [01:28<04:48, 79.71it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 6997/30000 [01:28<04:48, 79.61it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 7005/30000 [01:28<04:48, 79.61it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 7013/30000 [01:28<04:49, 79.36it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 7021/30000 [01:28<04:49, 79.49it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 7029/30000 [01:28<04:50, 79.09it/s, init loss: 29902.2090, avg. 
loss [4501-6000]: 21700.6953] 23%|██▎       | 7037/30000 [01:28<04:49, 79.32it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 23%|██▎       | 7046/30000 [01:28<04:48, 79.51it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▎       | 7055/30000 [01:29<04:47, 79.72it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▎       | 7063/30000 [01:29<04:47, 79.76it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▎       | 7072/30000 [01:29<04:47, 79.89it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▎       | 7080/30000 [01:29<04:47, 79.65it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▎       | 7089/30000 [01:29<04:47, 79.77it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▎       | 7098/30000 [01:29<04:46, 79.83it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▎       | 7106/30000 [01:29<04:46, 79.81it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▎       | 7114/30000 [01:29<04:47, 79.65it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▎       | 7122/30000 [01:29<04:47, 79.44it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▍       | 7130/30000 [01:30<04:48, 79.41it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▍       | 7138/30000 [01:30<04:47, 79.44it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▍       | 7147/30000 [01:30<04:47, 79.60it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▍       | 7155/30000 [01:30<04:47, 79.45it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▍       | 7164/30000 [01:30<04:46, 79.65it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▍       | 7172/30000 [01:30<04:47, 79.32it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▍       | 7180/30000 [01:30<04:47, 79.45it/s, init loss: 29902.2090, avg. 
loss [4501-6000]: 21700.6953] 24%|██▍       | 7188/30000 [01:30<04:47, 79.24it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▍       | 7196/30000 [01:30<04:47, 79.34it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▍       | 7204/30000 [01:30<04:47, 79.40it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▍       | 7212/30000 [01:31<04:46, 79.55it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▍       | 7220/30000 [01:31<04:46, 79.40it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▍       | 7228/30000 [01:31<04:47, 79.30it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▍       | 7236/30000 [01:31<04:46, 79.43it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▍       | 7244/30000 [01:31<04:46, 79.46it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▍       | 7252/30000 [01:31<04:49, 78.46it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▍       | 7260/30000 [01:31<04:51, 77.95it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▍       | 7268/30000 [01:31<04:54, 77.10it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▍       | 7276/30000 [01:31<04:56, 76.52it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▍       | 7284/30000 [01:32<04:57, 76.46it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▍       | 7292/30000 [01:32<04:55, 76.84it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▍       | 7300/30000 [01:32<04:57, 76.41it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▍       | 7308/30000 [01:32<04:56, 76.64it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▍       | 7316/30000 [01:32<04:57, 76.37it/s, init loss: 29902.2090, avg. loss [4501-6000]: 21700.6953] 24%|██▍       | 7324/30000 [01:32<04:55, 76.65it/s, init loss: 29902.2090, avg. 
loss [4501-6000]: 21700.6953]
25%|██▌       | 7500/30000 [01:34<04:56, 75.90it/s, init loss: 29902.2090, avg. loss [6001-7500]: 20817.2285]
30%|███       | 9001/30000 [01:53<04:25, 79.19it/s, init loss: 29902.2090, avg. loss [7501-9000]: 20181.0684]
35%|███▍      | 10407/30000 [02:11<04:06, 79.61it/s, init loss: 29902.2090, avg. 
loss [7501-9000]: 20181.0684] 35%|███▍      | 10415/30000 [02:11<04:05, 79.70it/s, init loss: 29902.2090, avg. loss [7501-9000]: 20181.0684] 35%|███▍      | 10423/30000 [02:11<04:05, 79.65it/s, init loss: 29902.2090, avg. loss [7501-9000]: 20181.0684] 35%|███▍      | 10431/30000 [02:11<04:05, 79.60it/s, init loss: 29902.2090, avg. loss [7501-9000]: 20181.0684] 35%|███▍      | 10439/30000 [02:11<04:07, 78.98it/s, init loss: 29902.2090, avg. loss [7501-9000]: 20181.0684] 35%|███▍      | 10447/30000 [02:11<04:06, 79.27it/s, init loss: 29902.2090, avg. loss [7501-9000]: 20181.0684] 35%|███▍      | 10455/30000 [02:12<04:06, 79.16it/s, init loss: 29902.2090, avg. loss [7501-9000]: 20181.0684] 35%|███▍      | 10464/30000 [02:12<04:05, 79.50it/s, init loss: 29902.2090, avg. loss [7501-9000]: 20181.0684] 35%|███▍      | 10473/30000 [02:12<04:04, 79.76it/s, init loss: 29902.2090, avg. loss [7501-9000]: 20181.0684] 35%|███▍      | 10481/30000 [02:12<04:05, 79.53it/s, init loss: 29902.2090, avg. loss [7501-9000]: 20181.0684] 35%|███▍      | 10489/30000 [02:12<04:05, 79.42it/s, init loss: 29902.2090, avg. loss [7501-9000]: 20181.0684] 35%|███▍      | 10497/30000 [02:12<04:05, 79.38it/s, init loss: 29902.2090, avg. loss [7501-9000]: 20181.0684] 35%|███▌      | 10505/30000 [02:12<04:06, 79.10it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 35%|███▌      | 10513/30000 [02:12<04:06, 78.91it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 35%|███▌      | 10521/30000 [02:12<04:07, 78.71it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 35%|███▌      | 10530/30000 [02:13<04:05, 79.27it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 35%|███▌      | 10538/30000 [02:13<04:05, 79.39it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 35%|███▌      | 10546/30000 [02:13<04:05, 79.27it/s, init loss: 29902.2090, avg. 
loss [9001-10500]: 19732.2715] 35%|███▌      | 10555/30000 [02:13<04:04, 79.63it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 35%|███▌      | 10563/30000 [02:13<04:04, 79.65it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 35%|███▌      | 10571/30000 [02:13<04:04, 79.38it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 35%|███▌      | 10579/30000 [02:13<04:04, 79.38it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 35%|███▌      | 10587/30000 [02:13<04:04, 79.41it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 35%|███▌      | 10595/30000 [02:13<04:05, 78.94it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 35%|███▌      | 10603/30000 [02:13<04:06, 78.79it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 35%|███▌      | 10611/30000 [02:14<04:05, 79.12it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 35%|███▌      | 10619/30000 [02:14<04:04, 79.27it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 35%|███▌      | 10628/30000 [02:14<04:03, 79.56it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 35%|███▌      | 10636/30000 [02:14<04:03, 79.44it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 35%|███▌      | 10644/30000 [02:14<04:03, 79.47it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▌      | 10652/30000 [02:14<04:03, 79.55it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▌      | 10661/30000 [02:14<04:02, 79.73it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▌      | 10670/30000 [02:14<04:01, 79.88it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▌      | 10678/30000 [02:14<04:04, 78.99it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▌      | 10686/30000 [02:14<04:04, 79.08it/s, init loss: 29902.2090, avg. 
loss [9001-10500]: 19732.2715] 36%|███▌      | 10694/30000 [02:15<04:04, 79.11it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▌      | 10702/30000 [02:15<04:03, 79.13it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▌      | 10710/30000 [02:15<04:03, 79.09it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▌      | 10718/30000 [02:15<04:03, 79.13it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▌      | 10726/30000 [02:15<04:03, 79.12it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▌      | 10734/30000 [02:15<04:03, 79.18it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▌      | 10743/30000 [02:15<04:02, 79.44it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▌      | 10751/30000 [02:15<04:02, 79.53it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▌      | 10759/30000 [02:15<04:02, 79.24it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▌      | 10767/30000 [02:16<04:03, 79.10it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▌      | 10775/30000 [02:16<04:04, 78.72it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▌      | 10783/30000 [02:16<04:04, 78.73it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▌      | 10792/30000 [02:16<04:02, 79.28it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▌      | 10801/30000 [02:16<04:01, 79.53it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▌      | 10809/30000 [02:16<04:01, 79.51it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▌      | 10817/30000 [02:16<04:01, 79.41it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▌      | 10825/30000 [02:16<04:02, 79.05it/s, init loss: 29902.2090, avg. 
loss [9001-10500]: 19732.2715] 36%|███▌      | 10833/30000 [02:16<04:04, 78.36it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▌      | 10841/30000 [02:16<04:03, 78.56it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▌      | 10849/30000 [02:17<04:03, 78.73it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▌      | 10857/30000 [02:17<04:02, 79.03it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▌      | 10865/30000 [02:17<04:01, 79.16it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▌      | 10873/30000 [02:17<04:01, 79.35it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▋      | 10881/30000 [02:17<04:00, 79.36it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▋      | 10890/30000 [02:17<03:59, 79.75it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▋      | 10899/30000 [02:17<03:58, 79.94it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▋      | 10907/30000 [02:17<03:59, 79.66it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▋      | 10915/30000 [02:17<04:00, 79.50it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▋      | 10923/30000 [02:17<03:59, 79.49it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▋      | 10932/30000 [02:18<03:59, 79.78it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▋      | 10940/30000 [02:18<03:58, 79.76it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 36%|███▋      | 10948/30000 [02:18<03:58, 79.72it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 10956/30000 [02:18<03:58, 79.73it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 10964/30000 [02:18<03:58, 79.74it/s, init loss: 29902.2090, avg. 
loss [9001-10500]: 19732.2715] 37%|███▋      | 10972/30000 [02:18<03:59, 79.43it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 10980/30000 [02:18<03:59, 79.48it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 10988/30000 [02:18<03:59, 79.32it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 10996/30000 [02:18<04:01, 78.79it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 11005/30000 [02:19<03:59, 79.37it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 11013/30000 [02:19<03:58, 79.46it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 11021/30000 [02:19<03:58, 79.57it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 11029/30000 [02:19<03:58, 79.48it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 11037/30000 [02:19<03:58, 79.54it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 11045/30000 [02:19<03:58, 79.44it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 11054/30000 [02:19<03:58, 79.57it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 11062/30000 [02:19<03:58, 79.46it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 11070/30000 [02:19<03:59, 79.16it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 11078/30000 [02:19<03:58, 79.28it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 11087/30000 [02:20<03:57, 79.67it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 11095/30000 [02:20<03:57, 79.75it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 11104/30000 [02:20<03:56, 79.95it/s, init loss: 29902.2090, avg. 
loss [9001-10500]: 19732.2715] 37%|███▋      | 11112/30000 [02:20<03:58, 79.31it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 11120/30000 [02:20<03:57, 79.33it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 11128/30000 [02:20<03:59, 78.76it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 11136/30000 [02:20<03:58, 79.06it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 11144/30000 [02:20<03:58, 79.15it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 11152/30000 [02:20<03:58, 79.16it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 11160/30000 [02:20<03:57, 79.16it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 11168/30000 [02:21<03:57, 79.38it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 11176/30000 [02:21<03:57, 79.24it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 11185/30000 [02:21<03:56, 79.53it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 11194/30000 [02:21<03:55, 79.72it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 11202/30000 [02:21<03:56, 79.60it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 11210/30000 [02:21<03:55, 79.67it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 11218/30000 [02:21<03:55, 79.63it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 11227/30000 [02:21<03:55, 79.82it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 11235/30000 [02:21<03:56, 79.32it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 37%|███▋      | 11243/30000 [02:21<03:55, 79.50it/s, init loss: 29902.2090, avg. 
loss [9001-10500]: 19732.2715] 38%|███▊      | 11252/30000 [02:22<03:55, 79.78it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11260/30000 [02:22<03:55, 79.72it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11269/30000 [02:22<03:54, 79.85it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11278/30000 [02:22<03:54, 79.90it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11287/30000 [02:22<03:54, 79.96it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11295/30000 [02:22<03:54, 79.89it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11303/30000 [02:22<03:54, 79.77it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11311/30000 [02:22<03:55, 79.52it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11319/30000 [02:22<03:54, 79.55it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11327/30000 [02:23<03:54, 79.60it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11335/30000 [02:23<03:55, 79.34it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11343/30000 [02:23<03:55, 79.23it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11351/30000 [02:23<03:55, 79.21it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11359/30000 [02:23<03:55, 79.23it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11367/30000 [02:23<03:54, 79.34it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11375/30000 [02:23<03:54, 79.37it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11384/30000 [02:23<03:53, 79.76it/s, init loss: 29902.2090, avg. 
loss [9001-10500]: 19732.2715] 38%|███▊      | 11392/30000 [02:23<03:53, 79.80it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11400/30000 [02:23<03:53, 79.79it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11408/30000 [02:24<03:53, 79.70it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11417/30000 [02:24<03:52, 80.05it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11426/30000 [02:24<03:52, 79.87it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11434/30000 [02:24<03:52, 79.74it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11443/30000 [02:24<03:52, 79.83it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11451/30000 [02:24<03:53, 79.49it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11460/30000 [02:24<03:52, 79.81it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11468/30000 [02:24<03:52, 79.81it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11476/30000 [02:24<03:52, 79.58it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11485/30000 [02:25<03:52, 79.55it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11494/30000 [02:25<03:51, 79.78it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11502/30000 [02:25<03:51, 79.82it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11510/30000 [02:25<03:51, 79.77it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11518/30000 [02:25<03:52, 79.64it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11526/30000 [02:25<03:52, 79.54it/s, init loss: 29902.2090, avg. 
loss [9001-10500]: 19732.2715] 38%|███▊      | 11534/30000 [02:25<03:52, 79.59it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11542/30000 [02:25<03:53, 79.16it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 38%|███▊      | 11550/30000 [02:25<03:53, 78.98it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▊      | 11558/30000 [02:25<03:53, 79.14it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▊      | 11566/30000 [02:26<03:52, 79.30it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▊      | 11574/30000 [02:26<03:52, 79.38it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▊      | 11582/30000 [02:26<03:51, 79.51it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▊      | 11590/30000 [02:26<03:51, 79.40it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▊      | 11598/30000 [02:26<03:51, 79.57it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▊      | 11607/30000 [02:26<03:50, 79.88it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▊      | 11616/30000 [02:26<03:49, 80.06it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▉      | 11625/30000 [02:26<03:49, 80.01it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▉      | 11633/30000 [02:26<03:50, 79.63it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▉      | 11642/30000 [02:27<03:50, 79.77it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▉      | 11651/30000 [02:27<03:49, 79.87it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▉      | 11660/30000 [02:27<03:49, 80.08it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▉      | 11669/30000 [02:27<03:49, 79.82it/s, init loss: 29902.2090, avg. 
loss [9001-10500]: 19732.2715] 39%|███▉      | 11677/30000 [02:27<03:49, 79.86it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▉      | 11686/30000 [02:27<03:49, 79.94it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▉      | 11694/30000 [02:27<03:49, 79.86it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▉      | 11702/30000 [02:27<03:49, 79.58it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▉      | 11710/30000 [02:27<03:51, 79.17it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▉      | 11718/30000 [02:27<03:51, 79.07it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▉      | 11727/30000 [02:28<03:50, 79.41it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▉      | 11735/30000 [02:28<03:49, 79.46it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▉      | 11744/30000 [02:28<03:49, 79.57it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▉      | 11753/30000 [02:28<03:48, 79.79it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▉      | 11761/30000 [02:28<03:48, 79.83it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▉      | 11770/30000 [02:28<03:47, 80.01it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▉      | 11778/30000 [02:28<03:48, 79.71it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▉      | 11786/30000 [02:28<03:49, 79.49it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▉      | 11794/30000 [02:28<03:49, 79.43it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▉      | 11803/30000 [02:29<03:48, 79.68it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▉      | 11811/30000 [02:29<03:48, 79.58it/s, init loss: 29902.2090, avg. 
loss [9001-10500]: 19732.2715] 39%|███▉      | 11819/30000 [02:29<03:48, 79.52it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▉      | 11828/30000 [02:29<03:47, 79.72it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▉      | 11836/30000 [02:29<03:48, 79.59it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 39%|███▉      | 11844/30000 [02:29<03:48, 79.60it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 40%|███▉      | 11853/30000 [02:29<03:47, 79.90it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 40%|███▉      | 11861/30000 [02:29<03:47, 79.74it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 40%|███▉      | 11869/30000 [02:29<03:47, 79.72it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 40%|███▉      | 11877/30000 [02:29<03:47, 79.74it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 40%|███▉      | 11886/30000 [02:30<03:46, 79.88it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 40%|███▉      | 11894/30000 [02:30<03:46, 79.85it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 40%|███▉      | 11903/30000 [02:30<03:45, 80.09it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 40%|███▉      | 11912/30000 [02:30<03:46, 79.86it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 40%|███▉      | 11920/30000 [02:30<03:46, 79.71it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 40%|███▉      | 11928/30000 [02:30<03:47, 79.35it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 40%|███▉      | 11936/30000 [02:30<03:47, 79.33it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 40%|███▉      | 11944/30000 [02:30<03:47, 79.35it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 40%|███▉      | 11952/30000 [02:30<03:47, 79.30it/s, init loss: 29902.2090, avg. 
loss [9001-10500]: 19732.2715] 40%|███▉      | 11960/30000 [02:30<03:47, 79.28it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 40%|███▉      | 11968/30000 [02:31<03:46, 79.44it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 40%|███▉      | 11976/30000 [02:31<03:47, 79.18it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 40%|███▉      | 11984/30000 [02:31<03:47, 79.23it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 40%|███▉      | 11992/30000 [02:31<03:46, 79.41it/s, init loss: 29902.2090, avg. loss [9001-10500]: 19732.2715] 40%|████      | 12000/30000 [02:31<03:47, 79.00it/s, init loss: 29902.2090, avg. loss [10501-12000]: 19422.2539] 40%|████      | 12009/30000 [02:31<03:46, 79.33it/s, init loss: 29902.2090, avg. loss [10501-12000]: 19422.2539] 40%|████      | 12018/30000 [02:31<03:46, 79.57it/s, init loss: 29902.2090, avg. loss [10501-12000]: 19422.2539] 40%|████      | 12026/30000 [02:31<03:45, 79.58it/s, init loss: 29902.2090, avg. loss [10501-12000]: 19422.2539] 40%|████      | 12035/30000 [02:31<03:45, 79.62it/s, init loss: 29902.2090, avg. loss [10501-12000]: 19422.2539] 40%|████      | 12043/30000 [02:32<03:46, 79.33it/s, init loss: 29902.2090, avg. loss [10501-12000]: 19422.2539] 40%|████      | 12051/30000 [02:32<03:47, 78.98it/s, init loss: 29902.2090, avg. loss [10501-12000]: 19422.2539] 40%|████      | 12059/30000 [02:32<03:47, 78.83it/s, init loss: 29902.2090, avg. loss [10501-12000]: 19422.2539] 40%|████      | 12067/30000 [02:32<03:46, 79.17it/s, init loss: 29902.2090, avg. loss [10501-12000]: 19422.2539] 40%|████      | 12076/30000 [02:32<03:45, 79.54it/s, init loss: 29902.2090, avg. loss [10501-12000]: 19422.2539] 40%|████      | 12084/30000 [02:32<03:45, 79.58it/s, init loss: 29902.2090, avg. loss [10501-12000]: 19422.2539] 40%|████      | 12092/30000 [02:32<03:46, 79.22it/s, init loss: 29902.2090, avg. 
loss [10501-12000]: 19422.2539] 40%|████      | 12100/30000 [02:32<03:46, 79.18it/s, init loss: 29902.2090, avg. loss [10501-12000]: 19422.2539] 40%|████      | 12108/30000 [02:32<03:46, 78.98it/s, init loss: 29902.2090, avg. loss [10501-12000]: 19422.2539] 40%|████      | 12116/30000 [02:32<03:45, 79.14it/s, init loss: 29902.2090, avg. loss [10501-12000]: 19422.2539] 40%|████      | 12125/30000 [02:33<03:45, 79.40it/s, init loss: 29902.2090, avg. loss [10501-12000]: 19422.2539] 40%|████      | 12133/30000 [02:33<03:45, 79.38it/s, init loss: 29902.2090, avg. loss [10501-12000]: 19422.2539] 40%|████      | 12141/30000 [02:33<03:45, 79.33it/s, init loss: 29902.2090, avg. loss [10501-12000]: 19422.2539] 40%|████      | 12150/30000 [02:33<03:44, 79.58it/s, init loss: 29902.2090, avg. loss [10501-12000]: 19422.2539] 41%|████      | 12158/30000 [02:33<03:44, 79.59it/s, init loss: 29902.2090, avg. loss [10501-12000]: 19422.2539] 41%|████      | 12166/30000 [02:33<03:43, 79.67it/s, init loss: 29902.2090, avg. loss [10501-12000]: 19422.2539] 41%|████      | 12174/30000 [02:33<03:43, 79.65it/s, init loss: 29902.2090, avg. loss [10501-12000]: 19422.2539] 41%|████      | 12183/30000 [02:33<03:43, 79.86it/s, init loss: 29902.2090, avg. loss [10501-12000]: 19422.2539] 41%|████      | 12191/30000 [02:33<03:44, 79.47it/s, init loss: 29902.2090, avg. loss [10501-12000]: 19422.2539] 41%|████      | 12200/30000 [02:34<03:43, 79.65it/s, init loss: 29902.2090, avg. loss [10501-12000]: 19422.2539] 41%|████      | 12208/30000 [02:34<03:43, 79.64it/s, init loss: 29902.2090, avg. loss [10501-12000]: 19422.2539] 41%|████      | 12216/30000 [02:34<03:43, 79.54it/s, init loss: 29902.2090, avg. loss [10501-12000]: 19422.2539] 41%|████      | 12224/30000 [02:34<03:43, 79.65it/s, init loss: 29902.2090, avg. loss [10501-12000]: 19422.2539] 41%|████      | 12232/30000 [02:34<03:43, 79.64it/s, init loss: 29902.2090, avg. 
loss [13501-15000]: 19039.7754] 51%|█████     | 15160/30000 [03:11<03:08, 78.75it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████     | 15169/30000 [03:11<03:07, 79.28it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████     | 15177/30000 [03:11<03:06, 79.40it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████     | 15185/30000 [03:11<03:06, 79.42it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████     | 15193/30000 [03:11<03:06, 79.32it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████     | 15201/30000 [03:11<03:06, 79.39it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████     | 15209/30000 [03:11<03:06, 79.41it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████     | 15217/30000 [03:12<03:05, 79.55it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████     | 15225/30000 [03:12<03:05, 79.56it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████     | 15234/30000 [03:12<03:04, 79.82it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████     | 15243/30000 [03:12<03:04, 80.06it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████     | 15252/30000 [03:12<03:04, 79.76it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████     | 15260/30000 [03:12<03:05, 79.65it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████     | 15268/30000 [03:12<03:05, 79.51it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████     | 15277/30000 [03:12<03:04, 79.76it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████     | 15285/30000 [03:12<03:04, 79.63it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████     | 15294/30000 [03:12<03:04, 79.84it/s, init loss: 29902.2090, avg. 
loss [13501-15000]: 19039.7754] 51%|█████     | 15302/30000 [03:13<03:04, 79.48it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████     | 15310/30000 [03:13<03:04, 79.41it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████     | 15318/30000 [03:13<03:04, 79.37it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████     | 15326/30000 [03:13<03:04, 79.33it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████     | 15334/30000 [03:13<03:04, 79.43it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████     | 15342/30000 [03:13<03:04, 79.28it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████     | 15350/30000 [03:13<03:04, 79.49it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████     | 15358/30000 [03:13<03:04, 79.23it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████     | 15366/30000 [03:13<03:04, 79.27it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████     | 15374/30000 [03:13<03:04, 79.42it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████▏    | 15382/30000 [03:14<03:03, 79.48it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████▏    | 15390/30000 [03:14<03:03, 79.57it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████▏    | 15398/30000 [03:14<03:04, 78.98it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████▏    | 15406/30000 [03:14<03:05, 78.85it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████▏    | 15414/30000 [03:14<03:04, 78.99it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████▏    | 15423/30000 [03:14<03:04, 79.22it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████▏    | 15432/30000 [03:14<03:03, 79.44it/s, init loss: 29902.2090, avg. 
loss [13501-15000]: 19039.7754] 51%|█████▏    | 15440/30000 [03:14<03:03, 79.34it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 51%|█████▏    | 15448/30000 [03:14<03:04, 79.03it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15457/30000 [03:15<03:02, 79.53it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15466/30000 [03:15<03:02, 79.74it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15474/30000 [03:15<03:02, 79.65it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15482/30000 [03:15<03:02, 79.57it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15490/30000 [03:15<03:02, 79.52it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15498/30000 [03:15<03:02, 79.64it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15506/30000 [03:15<03:01, 79.66it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15514/30000 [03:15<03:02, 79.52it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15522/30000 [03:15<03:02, 79.48it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15530/30000 [03:15<03:01, 79.55it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15538/30000 [03:16<03:02, 79.46it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15546/30000 [03:16<03:02, 79.34it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15554/30000 [03:16<03:02, 79.16it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15562/30000 [03:16<03:02, 79.14it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15570/30000 [03:16<03:01, 79.39it/s, init loss: 29902.2090, avg. 
loss [13501-15000]: 19039.7754] 52%|█████▏    | 15578/30000 [03:16<03:01, 79.33it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15587/30000 [03:16<03:01, 79.47it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15595/30000 [03:16<03:01, 79.33it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15603/30000 [03:16<03:01, 79.16it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15611/30000 [03:16<03:01, 79.39it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15619/30000 [03:17<03:01, 79.38it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15627/30000 [03:17<03:00, 79.48it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15635/30000 [03:17<03:00, 79.63it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15643/30000 [03:17<03:00, 79.58it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15652/30000 [03:17<02:59, 79.85it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15661/30000 [03:17<02:59, 79.85it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15669/30000 [03:17<02:59, 79.71it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15677/30000 [03:17<03:00, 79.52it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15685/30000 [03:17<03:00, 79.42it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15693/30000 [03:18<03:00, 79.42it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15701/30000 [03:18<03:00, 79.27it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15709/30000 [03:18<03:00, 79.23it/s, init loss: 29902.2090, avg. 
loss [13501-15000]: 19039.7754] 52%|█████▏    | 15717/30000 [03:18<02:59, 79.42it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15726/30000 [03:18<02:59, 79.68it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15734/30000 [03:18<02:58, 79.73it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▏    | 15742/30000 [03:18<02:58, 79.77it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 52%|█████▎    | 15750/30000 [03:18<02:59, 79.49it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 15758/30000 [03:18<03:00, 78.94it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 15766/30000 [03:18<03:00, 79.03it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 15774/30000 [03:19<02:59, 79.10it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 15782/30000 [03:19<02:59, 79.34it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 15791/30000 [03:19<02:58, 79.64it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 15799/30000 [03:19<02:58, 79.54it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 15808/30000 [03:19<02:57, 79.91it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 15816/30000 [03:19<02:57, 79.92it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 15824/30000 [03:19<02:57, 79.89it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 15832/30000 [03:19<02:57, 79.73it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 15840/30000 [03:19<02:57, 79.77it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 15848/30000 [03:19<02:57, 79.82it/s, init loss: 29902.2090, avg. 
loss [13501-15000]: 19039.7754] 53%|█████▎    | 15856/30000 [03:20<02:57, 79.72it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 15865/30000 [03:20<02:56, 80.13it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 15874/30000 [03:20<02:56, 80.15it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 15883/30000 [03:20<02:55, 80.23it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 15892/30000 [03:20<02:56, 79.92it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 15901/30000 [03:20<02:56, 80.00it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 15909/30000 [03:20<02:56, 79.99it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 15917/30000 [03:20<02:56, 79.98it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 15925/30000 [03:20<02:57, 79.44it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 15933/30000 [03:21<02:57, 79.46it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 15941/30000 [03:21<02:56, 79.59it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 15949/30000 [03:21<02:56, 79.42it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 15957/30000 [03:21<02:56, 79.53it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 15966/30000 [03:21<02:56, 79.64it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 15974/30000 [03:21<02:55, 79.70it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 15982/30000 [03:21<02:56, 79.45it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 15990/30000 [03:21<02:56, 79.39it/s, init loss: 29902.2090, avg. 
loss [13501-15000]: 19039.7754] 53%|█████▎    | 15998/30000 [03:21<02:57, 78.78it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 16007/30000 [03:21<02:56, 79.21it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 16015/30000 [03:22<02:56, 79.39it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 16023/30000 [03:22<02:56, 79.11it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 16031/30000 [03:22<02:57, 78.81it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 16040/30000 [03:22<02:55, 79.37it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 53%|█████▎    | 16048/30000 [03:22<02:55, 79.34it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▎    | 16056/30000 [03:22<02:55, 79.42it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▎    | 16064/30000 [03:22<02:55, 79.51it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▎    | 16072/30000 [03:22<02:54, 79.61it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▎    | 16080/30000 [03:22<02:55, 79.25it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▎    | 16088/30000 [03:22<02:55, 79.44it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▎    | 16096/30000 [03:23<02:54, 79.49it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▎    | 16104/30000 [03:23<02:54, 79.58it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▎    | 16112/30000 [03:23<02:54, 79.55it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▎    | 16120/30000 [03:23<02:55, 79.31it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▍    | 16129/30000 [03:23<02:54, 79.47it/s, init loss: 29902.2090, avg. 
loss [13501-15000]: 19039.7754] 54%|█████▍    | 16138/30000 [03:23<02:53, 79.77it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▍    | 16146/30000 [03:23<02:54, 79.48it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▍    | 16155/30000 [03:23<02:54, 79.47it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▍    | 16163/30000 [03:23<02:55, 79.04it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▍    | 16171/30000 [03:24<02:54, 79.31it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▍    | 16179/30000 [03:24<02:54, 79.37it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▍    | 16187/30000 [03:24<02:53, 79.51it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▍    | 16196/30000 [03:24<02:53, 79.59it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▍    | 16204/30000 [03:24<02:53, 79.52it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▍    | 16212/30000 [03:24<02:53, 79.49it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▍    | 16220/30000 [03:24<02:53, 79.54it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▍    | 16228/30000 [03:24<02:53, 79.47it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▍    | 16236/30000 [03:24<02:53, 79.33it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▍    | 16244/30000 [03:24<02:53, 79.32it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▍    | 16252/30000 [03:25<02:53, 79.31it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▍    | 16260/30000 [03:25<02:53, 79.34it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▍    | 16268/30000 [03:25<02:53, 79.27it/s, init loss: 29902.2090, avg. 
loss [13501-15000]: 19039.7754] 54%|█████▍    | 16276/30000 [03:25<02:52, 79.46it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▍    | 16284/30000 [03:25<02:52, 79.59it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▍    | 16292/30000 [03:25<02:52, 79.44it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▍    | 16300/30000 [03:25<02:52, 79.34it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▍    | 16308/30000 [03:25<02:52, 79.47it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▍    | 16316/30000 [03:25<02:52, 79.54it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▍    | 16324/30000 [03:25<02:52, 79.50it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▍    | 16332/30000 [03:26<02:52, 79.25it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▍    | 16340/30000 [03:26<02:51, 79.44it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 54%|█████▍    | 16348/30000 [03:26<02:51, 79.46it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 55%|█████▍    | 16357/30000 [03:26<02:51, 79.62it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 55%|█████▍    | 16365/30000 [03:26<02:51, 79.64it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 55%|█████▍    | 16374/30000 [03:26<02:50, 79.83it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 55%|█████▍    | 16382/30000 [03:26<02:50, 79.64it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 55%|█████▍    | 16390/30000 [03:26<02:50, 79.64it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 55%|█████▍    | 16398/30000 [03:26<02:52, 78.66it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 55%|█████▍    | 16406/30000 [03:26<02:52, 78.95it/s, init loss: 29902.2090, avg. 
loss [13501-15000]: 19039.7754] 55%|█████▍    | 16414/30000 [03:27<02:52, 78.96it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 55%|█████▍    | 16423/30000 [03:27<02:51, 79.28it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 55%|█████▍    | 16431/30000 [03:27<02:50, 79.39it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 55%|█████▍    | 16439/30000 [03:27<02:50, 79.31it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 55%|█████▍    | 16447/30000 [03:27<02:51, 79.04it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 55%|█████▍    | 16455/30000 [03:27<02:50, 79.25it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 55%|█████▍    | 16463/30000 [03:27<02:50, 79.33it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 55%|█████▍    | 16472/30000 [03:27<02:50, 79.57it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 55%|█████▍    | 16480/30000 [03:27<02:49, 79.53it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 55%|█████▍    | 16489/30000 [03:28<02:49, 79.79it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 55%|█████▍    | 16497/30000 [03:28<02:49, 79.65it/s, init loss: 29902.2090, avg. loss [13501-15000]: 19039.7754] 55%|█████▌    | 16505/30000 [03:28<02:49, 79.42it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 55%|█████▌    | 16513/30000 [03:28<02:49, 79.46it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 55%|█████▌    | 16522/30000 [03:28<02:48, 79.76it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 55%|█████▌    | 16530/30000 [03:28<02:48, 79.72it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 55%|█████▌    | 16538/30000 [03:28<02:49, 79.46it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 55%|█████▌    | 16546/30000 [03:28<02:49, 79.33it/s, init loss: 29902.2090, avg. 
loss [15001-16500]: 18916.3613] 55%|█████▌    | 16554/30000 [03:28<02:49, 79.28it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 55%|█████▌    | 16562/30000 [03:28<02:49, 79.40it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 55%|█████▌    | 16570/30000 [03:29<02:49, 79.31it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 55%|█████▌    | 16578/30000 [03:29<02:49, 79.41it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 55%|█████▌    | 16586/30000 [03:29<02:48, 79.48it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 55%|█████▌    | 16594/30000 [03:29<02:48, 79.58it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 55%|█████▌    | 16602/30000 [03:29<02:48, 79.44it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 55%|█████▌    | 16610/30000 [03:29<02:48, 79.35it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 55%|█████▌    | 16618/30000 [03:29<02:48, 79.35it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 55%|█████▌    | 16627/30000 [03:29<02:47, 79.61it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 55%|█████▌    | 16635/30000 [03:29<02:47, 79.62it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 55%|█████▌    | 16644/30000 [03:29<02:47, 79.74it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▌    | 16652/30000 [03:30<02:47, 79.71it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▌    | 16660/30000 [03:30<02:50, 78.31it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▌    | 16668/30000 [03:30<02:52, 77.43it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▌    | 16676/30000 [03:30<02:52, 77.30it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▌    | 16684/30000 [03:30<02:51, 77.56it/s, init loss: 29902.2090, avg. 
loss [15001-16500]: 18916.3613] 56%|█████▌    | 16692/30000 [03:30<02:51, 77.75it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▌    | 16700/30000 [03:30<02:51, 77.60it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▌    | 16708/30000 [03:30<02:51, 77.71it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▌    | 16716/30000 [03:30<02:50, 78.04it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▌    | 16724/30000 [03:31<02:49, 78.44it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▌    | 16732/30000 [03:31<02:48, 78.67it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▌    | 16740/30000 [03:31<02:47, 78.98it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▌    | 16748/30000 [03:31<02:47, 79.18it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▌    | 16757/30000 [03:31<02:46, 79.48it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▌    | 16765/30000 [03:31<02:47, 79.24it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▌    | 16773/30000 [03:31<02:47, 79.19it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▌    | 16781/30000 [03:31<02:46, 79.39it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▌    | 16789/30000 [03:31<02:46, 79.45it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▌    | 16798/30000 [03:31<02:45, 79.69it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▌    | 16806/30000 [03:32<02:45, 79.54it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▌    | 16814/30000 [03:32<02:46, 79.21it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▌    | 16822/30000 [03:32<02:45, 79.40it/s, init loss: 29902.2090, avg. 
loss [15001-16500]: 18916.3613] 56%|█████▌    | 16830/30000 [03:32<02:46, 79.11it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▌    | 16838/30000 [03:32<02:46, 79.28it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▌    | 16847/30000 [03:32<02:45, 79.62it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▌    | 16855/30000 [03:32<02:45, 79.63it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▌    | 16863/30000 [03:32<02:45, 79.23it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▌    | 16871/30000 [03:32<02:45, 79.15it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▋    | 16879/30000 [03:32<02:45, 79.25it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▋    | 16887/30000 [03:33<02:45, 79.21it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▋    | 16895/30000 [03:33<02:45, 79.28it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▋    | 16903/30000 [03:33<02:45, 79.27it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▋    | 16911/30000 [03:33<02:45, 79.33it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▋    | 16919/30000 [03:33<02:44, 79.32it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▋    | 16928/30000 [03:33<02:44, 79.58it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▋    | 16936/30000 [03:33<02:44, 79.54it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 56%|█████▋    | 16944/30000 [03:33<02:44, 79.51it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 57%|█████▋    | 16952/30000 [03:33<02:45, 79.03it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613] 57%|█████▋    | 16960/30000 [03:33<02:44, 79.23it/s, init loss: 29902.2090, avg. 
loss [15001-16500]: 18916.3613] 60%|█████▉    | 17999/30000 [03:47<02:31, 79.41it/s, init loss: 29902.2090, avg. loss [15001-16500]: 18916.3613]
 60%|██████    | 18007/30000 [03:47<02:31, 79.13it/s, init loss: 29902.2090, avg. loss [16501-18000]: 18822.2168]
 65%|██████▌   | 19501/30000 [04:05<02:12, 79.40it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520]
 66%|██████▋   | 19877/30000 [04:10<02:07, 79.38it/s, init loss: 29902.2090, avg. 
loss [18001-19500]: 18749.2520] 66%|██████▋   | 19885/30000 [04:10<02:07, 79.23it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 66%|██████▋   | 19893/30000 [04:10<02:07, 79.09it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 66%|██████▋   | 19901/30000 [04:11<02:07, 79.17it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 66%|██████▋   | 19910/30000 [04:11<02:06, 79.50it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 66%|██████▋   | 19918/30000 [04:11<02:07, 79.36it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 66%|██████▋   | 19927/30000 [04:11<02:06, 79.59it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 66%|██████▋   | 19935/30000 [04:11<02:06, 79.66it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 66%|██████▋   | 19943/30000 [04:11<02:06, 79.69it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 19951/30000 [04:11<02:06, 79.62it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 19959/30000 [04:11<02:06, 79.27it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 19967/30000 [04:11<02:06, 79.19it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 19975/30000 [04:11<02:06, 79.08it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 19984/30000 [04:12<02:06, 79.31it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 19992/30000 [04:12<02:06, 79.31it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 20000/30000 [04:12<02:05, 79.45it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 20008/30000 [04:12<02:05, 79.47it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 20016/30000 [04:12<02:05, 79.32it/s, init loss: 29902.2090, avg. 
loss [18001-19500]: 18749.2520] 67%|██████▋   | 20024/30000 [04:12<02:05, 79.36it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 20033/30000 [04:12<02:05, 79.55it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 20041/30000 [04:12<02:05, 79.45it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 20049/30000 [04:12<02:06, 78.73it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 20057/30000 [04:12<02:05, 78.98it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 20065/30000 [04:13<02:05, 79.28it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 20073/30000 [04:13<02:05, 79.37it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 20081/30000 [04:13<02:05, 79.35it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 20089/30000 [04:13<02:05, 79.10it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 20097/30000 [04:13<02:04, 79.29it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 20106/30000 [04:13<02:04, 79.49it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 20114/30000 [04:13<02:04, 79.46it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 20122/30000 [04:13<02:04, 79.37it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 20130/30000 [04:13<02:04, 79.15it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 20138/30000 [04:14<02:04, 79.06it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 20146/30000 [04:14<02:04, 79.33it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 20154/30000 [04:14<02:03, 79.50it/s, init loss: 29902.2090, avg. 
loss [18001-19500]: 18749.2520] 67%|██████▋   | 20162/30000 [04:14<02:03, 79.48it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 20170/30000 [04:14<02:03, 79.61it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 20178/30000 [04:14<02:03, 79.51it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 20186/30000 [04:14<02:03, 79.21it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 20194/30000 [04:14<02:03, 79.11it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 20202/30000 [04:14<02:03, 79.34it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 20210/30000 [04:14<02:03, 79.44it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 20218/30000 [04:15<02:03, 79.47it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 20226/30000 [04:15<02:02, 79.59it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 20234/30000 [04:15<02:02, 79.68it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 67%|██████▋   | 20242/30000 [04:15<02:02, 79.62it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20250/30000 [04:15<02:02, 79.40it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20258/30000 [04:15<02:02, 79.26it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20267/30000 [04:15<02:02, 79.61it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20275/30000 [04:15<02:02, 79.45it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20283/30000 [04:15<02:03, 78.94it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20292/30000 [04:15<02:02, 79.31it/s, init loss: 29902.2090, avg. 
loss [18001-19500]: 18749.2520] 68%|██████▊   | 20300/30000 [04:16<02:02, 79.12it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20309/30000 [04:16<02:01, 79.65it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20317/30000 [04:16<02:01, 79.58it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20325/30000 [04:16<02:01, 79.55it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20333/30000 [04:16<02:01, 79.44it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20342/30000 [04:16<02:01, 79.66it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20350/30000 [04:16<02:02, 79.10it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20358/30000 [04:16<02:01, 79.09it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20366/30000 [04:16<02:02, 78.96it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20374/30000 [04:16<02:01, 79.10it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20382/30000 [04:17<02:01, 79.28it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20390/30000 [04:17<02:01, 79.35it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20398/30000 [04:17<02:01, 79.27it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20407/30000 [04:17<02:00, 79.53it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20415/30000 [04:17<02:01, 79.21it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20424/30000 [04:17<02:00, 79.51it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20432/30000 [04:17<02:00, 79.64it/s, init loss: 29902.2090, avg. 
loss [18001-19500]: 18749.2520] 68%|██████▊   | 20441/30000 [04:17<02:00, 79.62it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20449/30000 [04:17<02:00, 79.37it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20457/30000 [04:18<02:00, 79.26it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20465/30000 [04:18<02:00, 79.08it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20474/30000 [04:18<01:59, 79.50it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20482/30000 [04:18<01:59, 79.55it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20490/30000 [04:18<01:59, 79.56it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20498/30000 [04:18<01:59, 79.40it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20506/30000 [04:18<01:59, 79.17it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20514/30000 [04:18<01:59, 79.13it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20522/30000 [04:18<01:59, 79.19it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20531/30000 [04:18<01:59, 79.44it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20540/30000 [04:19<01:58, 79.64it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 68%|██████▊   | 20548/30000 [04:19<01:59, 79.39it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▊   | 20556/30000 [04:19<01:58, 79.46it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▊   | 20564/30000 [04:19<01:58, 79.56it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▊   | 20572/30000 [04:19<01:58, 79.40it/s, init loss: 29902.2090, avg. 
loss [18001-19500]: 18749.2520] 69%|██████▊   | 20580/30000 [04:19<01:58, 79.18it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▊   | 20588/30000 [04:19<01:58, 79.37it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▊   | 20596/30000 [04:19<01:58, 79.47it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▊   | 20604/30000 [04:19<01:58, 79.38it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▊   | 20612/30000 [04:19<01:58, 79.31it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▊   | 20620/30000 [04:20<01:58, 79.38it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▉   | 20628/30000 [04:20<01:58, 79.29it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▉   | 20636/30000 [04:20<01:57, 79.43it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▉   | 20644/30000 [04:20<01:57, 79.38it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▉   | 20652/30000 [04:20<01:57, 79.48it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▉   | 20661/30000 [04:20<01:57, 79.60it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▉   | 20669/30000 [04:20<01:57, 79.72it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▉   | 20677/30000 [04:20<01:57, 79.55it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▉   | 20685/30000 [04:20<01:58, 78.68it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▉   | 20693/30000 [04:21<01:57, 78.99it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▉   | 20701/30000 [04:21<01:57, 79.27it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▉   | 20709/30000 [04:21<01:56, 79.44it/s, init loss: 29902.2090, avg. 
loss [18001-19500]: 18749.2520] 69%|██████▉   | 20717/30000 [04:21<01:56, 79.50it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▉   | 20725/30000 [04:21<01:57, 79.16it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▉   | 20733/30000 [04:21<01:57, 79.10it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▉   | 20741/30000 [04:21<01:56, 79.16it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▉   | 20749/30000 [04:21<01:56, 79.37it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▉   | 20757/30000 [04:21<01:56, 79.39it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▉   | 20765/30000 [04:21<01:56, 79.32it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▉   | 20773/30000 [04:22<01:56, 79.31it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▉   | 20781/30000 [04:22<01:56, 79.14it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▉   | 20789/30000 [04:22<01:56, 79.14it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▉   | 20797/30000 [04:22<01:55, 79.38it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▉   | 20806/30000 [04:22<01:55, 79.70it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▉   | 20814/30000 [04:22<01:55, 79.50it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▉   | 20822/30000 [04:22<01:55, 79.52it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▉   | 20830/30000 [04:22<01:55, 79.40it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▉   | 20838/30000 [04:22<01:55, 79.57it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 69%|██████▉   | 20847/30000 [04:22<01:54, 79.74it/s, init loss: 29902.2090, avg. 
loss [18001-19500]: 18749.2520] 70%|██████▉   | 20855/30000 [04:23<01:54, 79.69it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 70%|██████▉   | 20863/30000 [04:23<01:54, 79.77it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 70%|██████▉   | 20871/30000 [04:23<01:54, 79.65it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 70%|██████▉   | 20880/30000 [04:23<01:54, 79.64it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 70%|██████▉   | 20888/30000 [04:23<01:54, 79.53it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 70%|██████▉   | 20896/30000 [04:23<01:54, 79.58it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 70%|██████▉   | 20904/30000 [04:23<01:54, 79.57it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 70%|██████▉   | 20912/30000 [04:23<01:54, 79.50it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 70%|██████▉   | 20920/30000 [04:23<01:54, 79.46it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 70%|██████▉   | 20928/30000 [04:23<01:53, 79.60it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 70%|██████▉   | 20937/30000 [04:24<01:53, 79.77it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 70%|██████▉   | 20945/30000 [04:24<01:53, 79.46it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 70%|██████▉   | 20953/30000 [04:24<01:53, 79.46it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 70%|██████▉   | 20961/30000 [04:24<01:53, 79.37it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 70%|██████▉   | 20969/30000 [04:24<01:53, 79.53it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 70%|██████▉   | 20978/30000 [04:24<01:53, 79.73it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 70%|██████▉   | 20986/30000 [04:24<01:53, 79.44it/s, init loss: 29902.2090, avg. 
loss [18001-19500]: 18749.2520] 70%|██████▉   | 20995/30000 [04:24<01:52, 79.75it/s, init loss: 29902.2090, avg. loss [18001-19500]: 18749.2520] 70%|███████   | 21003/30000 [04:24<01:53, 79.07it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 70%|███████   | 21012/30000 [04:25<01:53, 79.38it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 70%|███████   | 21020/30000 [04:25<01:53, 79.42it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 70%|███████   | 21029/30000 [04:25<01:52, 79.65it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 70%|███████   | 21037/30000 [04:25<01:52, 79.43it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 70%|███████   | 21045/30000 [04:25<01:52, 79.54it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 70%|███████   | 21053/30000 [04:25<01:52, 79.46it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 70%|███████   | 21061/30000 [04:25<01:52, 79.53it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 70%|███████   | 21069/30000 [04:25<01:52, 79.41it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 70%|███████   | 21077/30000 [04:25<01:52, 79.40it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 70%|███████   | 21085/30000 [04:25<01:52, 79.27it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 70%|███████   | 21093/30000 [04:26<01:52, 79.35it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 70%|███████   | 21101/30000 [04:26<01:51, 79.47it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 70%|███████   | 21109/30000 [04:26<01:52, 79.31it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 70%|███████   | 21117/30000 [04:26<01:51, 79.41it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 70%|███████   | 21125/30000 [04:26<01:51, 79.53it/s, init loss: 29902.2090, avg. 
loss [19501-21000]: 18696.0527] 70%|███████   | 21133/30000 [04:26<01:51, 79.66it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 70%|███████   | 21141/30000 [04:26<01:51, 79.66it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 70%|███████   | 21149/30000 [04:26<01:51, 79.54it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████   | 21157/30000 [04:26<01:51, 79.28it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████   | 21165/30000 [04:26<01:51, 79.22it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████   | 21174/30000 [04:27<01:50, 79.52it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████   | 21182/30000 [04:27<01:50, 79.58it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████   | 21191/30000 [04:27<01:50, 79.73it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████   | 21199/30000 [04:27<01:50, 79.64it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████   | 21208/30000 [04:27<01:50, 79.83it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████   | 21216/30000 [04:27<01:50, 79.72it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████   | 21224/30000 [04:27<01:50, 79.64it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████   | 21233/30000 [04:27<01:49, 79.85it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████   | 21241/30000 [04:27<01:49, 79.71it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████   | 21249/30000 [04:27<01:49, 79.61it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████   | 21257/30000 [04:28<01:49, 79.58it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████   | 21266/30000 [04:28<01:49, 79.93it/s, init loss: 29902.2090, avg. 
loss [19501-21000]: 18696.0527] 71%|███████   | 21274/30000 [04:28<01:49, 79.85it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████   | 21282/30000 [04:28<01:49, 79.31it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████   | 21290/30000 [04:28<01:50, 79.18it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████   | 21298/30000 [04:28<01:49, 79.22it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████   | 21306/30000 [04:28<01:49, 79.38it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████   | 21314/30000 [04:28<01:49, 79.49it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████   | 21322/30000 [04:28<01:49, 78.98it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████   | 21330/30000 [04:29<01:49, 79.21it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████   | 21338/30000 [04:29<01:49, 79.20it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████   | 21346/30000 [04:29<01:49, 79.00it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████   | 21354/30000 [04:29<01:49, 79.05it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████   | 21362/30000 [04:29<01:49, 78.90it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████   | 21370/30000 [04:29<01:49, 79.06it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████▏  | 21378/30000 [04:29<01:48, 79.26it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████▏  | 21386/30000 [04:29<01:48, 79.11it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████▏  | 21394/30000 [04:29<01:48, 79.09it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████▏  | 21402/30000 [04:29<01:48, 79.26it/s, init loss: 29902.2090, avg. 
loss [19501-21000]: 18696.0527] 71%|███████▏  | 21410/30000 [04:30<01:48, 79.17it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████▏  | 21418/30000 [04:30<01:48, 79.28it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████▏  | 21426/30000 [04:30<01:47, 79.41it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████▏  | 21434/30000 [04:30<01:48, 79.10it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 71%|███████▏  | 21442/30000 [04:30<01:48, 79.16it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 72%|███████▏  | 21450/30000 [04:30<01:48, 79.06it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 72%|███████▏  | 21459/30000 [04:30<01:47, 79.13it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 72%|███████▏  | 21468/30000 [04:30<01:47, 79.44it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 72%|███████▏  | 21476/30000 [04:30<01:47, 79.44it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 72%|███████▏  | 21484/30000 [04:30<01:46, 79.59it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 72%|███████▏  | 21492/30000 [04:31<01:46, 79.63it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 72%|███████▏  | 21500/30000 [04:31<01:46, 79.72it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 72%|███████▏  | 21509/30000 [04:31<01:46, 79.87it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 72%|███████▏  | 21517/30000 [04:31<01:46, 79.89it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 72%|███████▏  | 21525/30000 [04:31<01:46, 79.83it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 72%|███████▏  | 21533/30000 [04:31<01:46, 79.86it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 72%|███████▏  | 21541/30000 [04:31<01:45, 79.84it/s, init loss: 29902.2090, avg. 
loss [19501-21000]: 18696.0527] 72%|███████▏  | 21549/30000 [04:31<01:46, 79.58it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 72%|███████▏  | 21557/30000 [04:31<01:47, 78.82it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 72%|███████▏  | 21565/30000 [04:31<01:46, 78.93it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 72%|███████▏  | 21573/30000 [04:32<01:46, 79.18it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 72%|███████▏  | 21581/30000 [04:32<01:46, 79.42it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 72%|███████▏  | 21589/30000 [04:32<01:46, 79.21it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 72%|███████▏  | 21597/30000 [04:32<01:46, 79.26it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 72%|███████▏  | 21605/30000 [04:32<01:46, 78.97it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 72%|███████▏  | 21613/30000 [04:32<01:45, 79.14it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 72%|███████▏  | 21621/30000 [04:32<01:45, 79.25it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 72%|███████▏  | 21629/30000 [04:32<01:45, 79.35it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 72%|███████▏  | 21637/30000 [04:32<01:46, 78.85it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 72%|███████▏  | 21645/30000 [04:32<01:45, 79.03it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 72%|███████▏  | 21653/30000 [04:33<01:45, 79.12it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 72%|███████▏  | 21661/30000 [04:33<01:45, 79.20it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 72%|███████▏  | 21670/30000 [04:33<01:44, 79.55it/s, init loss: 29902.2090, avg. loss [19501-21000]: 18696.0527] 72%|███████▏  | 21678/30000 [04:33<01:44, 79.45it/s, init loss: 29902.2090, avg. 
loss [19501-21000]: 18696.0527]
 75%|███████▌  | 22506/30000 [04:43<01:34, 78.90it/s, init loss: 29902.2090, avg. loss [21001-22500]: 18649.7207]
 80%|████████  | 24003/30000 [05:02<01:15, 79.62it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430]
 82%|████████▏ | 24593/30000 [05:10<01:08, 79.45it/s, init loss: 29902.2090, avg. 
loss [22501-24000]: 18616.0430] 82%|████████▏ | 24601/30000 [05:10<01:07, 79.49it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 82%|████████▏ | 24609/30000 [05:10<01:08, 79.20it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 82%|████████▏ | 24617/30000 [05:10<01:07, 79.38it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 82%|████████▏ | 24625/30000 [05:10<01:07, 79.42it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 82%|████████▏ | 24633/30000 [05:10<01:07, 79.35it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 82%|████████▏ | 24642/30000 [05:10<01:07, 79.56it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 82%|████████▏ | 24650/30000 [05:10<01:07, 79.40it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 82%|████████▏ | 24659/30000 [05:10<01:06, 79.85it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 82%|████████▏ | 24667/30000 [05:11<01:06, 79.82it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 82%|████████▏ | 24675/30000 [05:11<01:06, 79.84it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 82%|████████▏ | 24683/30000 [05:11<01:06, 79.51it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 82%|████████▏ | 24691/30000 [05:11<01:06, 79.47it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 82%|████████▏ | 24699/30000 [05:11<01:06, 79.58it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 82%|████████▏ | 24707/30000 [05:11<01:06, 79.40it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 82%|████████▏ | 24715/30000 [05:11<01:06, 79.35it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 82%|████████▏ | 24724/30000 [05:11<01:06, 79.68it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 82%|████████▏ | 24732/30000 [05:11<01:06, 79.28it/s, init loss: 29902.2090, avg. 
loss [22501-24000]: 18616.0430] 82%|████████▏ | 24741/30000 [05:11<01:06, 79.51it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 82%|████████▏ | 24749/30000 [05:12<01:06, 79.33it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 24757/30000 [05:12<01:06, 79.24it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 24766/30000 [05:12<01:05, 79.72it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 24774/30000 [05:12<01:05, 79.73it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 24782/30000 [05:12<01:05, 79.79it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 24790/30000 [05:12<01:05, 79.73it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 24798/30000 [05:12<01:05, 79.44it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 24806/30000 [05:12<01:05, 79.57it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 24814/30000 [05:12<01:05, 79.49it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 24822/30000 [05:13<01:05, 79.50it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 24830/30000 [05:13<01:04, 79.63it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 24838/30000 [05:13<01:04, 79.44it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 24846/30000 [05:13<01:04, 79.52it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 24854/30000 [05:13<01:04, 79.53it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 24863/30000 [05:13<01:04, 79.86it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 24871/30000 [05:13<01:04, 79.49it/s, init loss: 29902.2090, avg. 
loss [22501-24000]: 18616.0430] 83%|████████▎ | 24879/30000 [05:13<01:04, 79.59it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 24887/30000 [05:13<01:04, 79.49it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 24895/30000 [05:13<01:04, 79.51it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 24904/30000 [05:14<01:03, 79.77it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 24912/30000 [05:14<01:03, 79.52it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 24920/30000 [05:14<01:03, 79.46it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 24928/30000 [05:14<01:03, 79.60it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 24936/30000 [05:14<01:03, 79.65it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 24944/30000 [05:14<01:03, 79.66it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 24953/30000 [05:14<01:03, 79.78it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 24961/30000 [05:14<01:03, 79.69it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 24969/30000 [05:14<01:03, 79.40it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 24977/30000 [05:14<01:03, 79.41it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 24986/30000 [05:15<01:02, 79.61it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 24994/30000 [05:15<01:02, 79.55it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 25002/30000 [05:15<01:02, 79.46it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 25010/30000 [05:15<01:02, 79.24it/s, init loss: 29902.2090, avg. 
loss [22501-24000]: 18616.0430] 83%|████████▎ | 25018/30000 [05:15<01:02, 79.29it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 25026/30000 [05:15<01:02, 79.31it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 25034/30000 [05:15<01:02, 79.42it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 83%|████████▎ | 25043/30000 [05:15<01:02, 79.72it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▎ | 25051/30000 [05:15<01:02, 79.46it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▎ | 25060/30000 [05:15<01:01, 79.84it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▎ | 25069/30000 [05:16<01:01, 79.87it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▎ | 25078/30000 [05:16<01:01, 79.78it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▎ | 25086/30000 [05:16<01:01, 79.83it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▎ | 25094/30000 [05:16<01:01, 79.66it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▎ | 25103/30000 [05:16<01:01, 79.79it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▎ | 25111/30000 [05:16<01:01, 79.74it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▎ | 25120/30000 [05:16<01:01, 79.87it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▍ | 25128/30000 [05:16<01:01, 79.75it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▍ | 25136/30000 [05:16<01:01, 79.52it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▍ | 25144/30000 [05:17<01:00, 79.63it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▍ | 25152/30000 [05:17<01:01, 79.37it/s, init loss: 29902.2090, avg. 
loss [22501-24000]: 18616.0430] 84%|████████▍ | 25160/30000 [05:17<01:01, 79.07it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▍ | 25168/30000 [05:17<01:00, 79.32it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▍ | 25176/30000 [05:17<01:01, 79.07it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▍ | 25184/30000 [05:17<01:00, 79.13it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▍ | 25193/30000 [05:17<01:00, 79.55it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▍ | 25201/30000 [05:17<01:00, 79.54it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▍ | 25209/30000 [05:17<01:00, 79.56it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▍ | 25218/30000 [05:17<00:59, 79.71it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▍ | 25226/30000 [05:18<00:59, 79.60it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▍ | 25234/30000 [05:18<01:00, 79.43it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▍ | 25242/30000 [05:18<00:59, 79.36it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▍ | 25251/30000 [05:18<00:59, 79.68it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▍ | 25259/30000 [05:18<00:59, 79.69it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▍ | 25267/30000 [05:18<00:59, 79.44it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▍ | 25276/30000 [05:18<00:59, 79.83it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▍ | 25284/30000 [05:18<00:59, 79.66it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▍ | 25292/30000 [05:18<00:59, 79.15it/s, init loss: 29902.2090, avg. 
loss [22501-24000]: 18616.0430] 84%|████████▍ | 25300/30000 [05:19<00:59, 79.18it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▍ | 25309/30000 [05:19<00:59, 79.45it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▍ | 25317/30000 [05:19<00:59, 79.28it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▍ | 25325/30000 [05:19<00:58, 79.40it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▍ | 25333/30000 [05:19<00:58, 79.44it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▍ | 25341/30000 [05:19<00:58, 79.55it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 84%|████████▍ | 25349/30000 [05:19<00:58, 79.35it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 85%|████████▍ | 25357/30000 [05:19<00:58, 79.29it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 85%|████████▍ | 25365/30000 [05:19<00:58, 79.28it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 85%|████████▍ | 25373/30000 [05:19<00:58, 79.46it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 85%|████████▍ | 25381/30000 [05:20<00:58, 79.49it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 85%|████████▍ | 25389/30000 [05:20<00:58, 79.48it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 85%|████████▍ | 25397/30000 [05:20<00:57, 79.56it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 85%|████████▍ | 25405/30000 [05:20<00:57, 79.66it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 85%|████████▍ | 25413/30000 [05:20<00:57, 79.71it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 85%|████████▍ | 25421/30000 [05:20<00:57, 79.63it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 85%|████████▍ | 25429/30000 [05:20<00:57, 79.33it/s, init loss: 29902.2090, avg. 
loss [22501-24000]: 18616.0430] 85%|████████▍ | 25437/30000 [05:20<00:57, 79.44it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 85%|████████▍ | 25445/30000 [05:20<00:57, 79.22it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 85%|████████▍ | 25453/30000 [05:20<00:57, 79.26it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 85%|████████▍ | 25462/30000 [05:21<00:56, 79.67it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 85%|████████▍ | 25470/30000 [05:21<00:56, 79.66it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 85%|████████▍ | 25478/30000 [05:21<00:56, 79.74it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 85%|████████▍ | 25486/30000 [05:21<00:56, 79.79it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 85%|████████▍ | 25494/30000 [05:21<00:56, 79.79it/s, init loss: 29902.2090, avg. loss [22501-24000]: 18616.0430] 85%|████████▌ | 25502/30000 [05:21<00:56, 79.47it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 85%|████████▌ | 25510/30000 [05:21<00:56, 79.56it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 85%|████████▌ | 25518/30000 [05:21<00:56, 79.53it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 85%|████████▌ | 25526/30000 [05:21<00:56, 79.36it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 85%|████████▌ | 25534/30000 [05:21<00:56, 79.50it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 85%|████████▌ | 25542/30000 [05:22<00:56, 79.41it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 85%|████████▌ | 25550/30000 [05:22<00:56, 79.44it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 85%|████████▌ | 25558/30000 [05:22<00:56, 79.22it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 85%|████████▌ | 25566/30000 [05:22<00:56, 78.94it/s, init loss: 29902.2090, avg. 
loss [24001-25500]: 18587.4355] 85%|████████▌ | 25574/30000 [05:22<00:55, 79.19it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 85%|████████▌ | 25582/30000 [05:22<00:55, 79.36it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 85%|████████▌ | 25590/30000 [05:22<00:55, 78.81it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 85%|████████▌ | 25598/30000 [05:22<00:55, 79.07it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 85%|████████▌ | 25606/30000 [05:22<00:55, 79.02it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 85%|████████▌ | 25614/30000 [05:22<00:56, 77.89it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 85%|████████▌ | 25622/30000 [05:23<00:56, 77.99it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 85%|████████▌ | 25630/30000 [05:23<00:55, 78.33it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 85%|████████▌ | 25638/30000 [05:23<00:55, 78.79it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 85%|████████▌ | 25646/30000 [05:23<00:55, 78.94it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▌ | 25655/30000 [05:23<00:54, 79.32it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▌ | 25663/30000 [05:23<00:54, 79.12it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▌ | 25671/30000 [05:23<00:54, 79.37it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▌ | 25679/30000 [05:23<00:54, 79.35it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▌ | 25687/30000 [05:23<00:54, 79.12it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▌ | 25696/30000 [05:24<00:54, 79.51it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▌ | 25704/30000 [05:24<00:54, 79.55it/s, init loss: 29902.2090, avg. 
loss [24001-25500]: 18587.4355] 86%|████████▌ | 25712/30000 [05:24<00:54, 79.40it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▌ | 25720/30000 [05:24<00:53, 79.57it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▌ | 25729/30000 [05:24<00:53, 79.70it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▌ | 25738/30000 [05:24<00:53, 79.67it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▌ | 25746/30000 [05:24<00:53, 79.68it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▌ | 25754/30000 [05:24<00:53, 79.70it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▌ | 25762/30000 [05:24<00:53, 79.32it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▌ | 25770/30000 [05:24<00:53, 79.40it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▌ | 25778/30000 [05:25<00:53, 79.34it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▌ | 25786/30000 [05:25<00:53, 79.42it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▌ | 25795/30000 [05:25<00:52, 79.85it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▌ | 25803/30000 [05:25<00:52, 79.77it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▌ | 25811/30000 [05:25<00:52, 79.67it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▌ | 25820/30000 [05:25<00:52, 79.77it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▌ | 25829/30000 [05:25<00:52, 79.89it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▌ | 25837/30000 [05:25<00:52, 79.73it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▌ | 25845/30000 [05:25<00:52, 79.48it/s, init loss: 29902.2090, avg. 
loss [24001-25500]: 18587.4355] 86%|████████▌ | 25853/30000 [05:25<00:52, 79.18it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▌ | 25861/30000 [05:26<00:52, 79.23it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▌ | 25869/30000 [05:26<00:52, 79.30it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▋ | 25877/30000 [05:26<00:52, 79.20it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▋ | 25885/30000 [05:26<00:51, 79.35it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▋ | 25893/30000 [05:26<00:51, 79.45it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▋ | 25901/30000 [05:26<00:51, 79.54it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▋ | 25909/30000 [05:26<00:51, 79.53it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▋ | 25917/30000 [05:26<00:51, 79.35it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▋ | 25925/30000 [05:26<00:51, 79.04it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▋ | 25933/30000 [05:26<00:51, 79.25it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▋ | 25941/30000 [05:27<00:51, 79.46it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 86%|████████▋ | 25950/30000 [05:27<00:50, 79.73it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 25958/30000 [05:27<00:50, 79.65it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 25966/30000 [05:27<00:50, 79.66it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 25974/30000 [05:27<00:50, 79.45it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 25982/30000 [05:27<00:50, 79.45it/s, init loss: 29902.2090, avg. 
loss [24001-25500]: 18587.4355] 87%|████████▋ | 25990/30000 [05:27<00:50, 79.48it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 25999/30000 [05:27<00:50, 79.56it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 26007/30000 [05:27<00:50, 79.43it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 26016/30000 [05:28<00:49, 79.70it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 26024/30000 [05:28<00:50, 79.46it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 26032/30000 [05:28<00:49, 79.45it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 26040/30000 [05:28<00:49, 79.22it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 26049/30000 [05:28<00:49, 79.60it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 26057/30000 [05:28<00:49, 79.55it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 26065/30000 [05:28<00:49, 79.53it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 26073/30000 [05:28<00:49, 79.56it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 26081/30000 [05:28<00:49, 79.54it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 26089/30000 [05:28<00:49, 79.48it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 26097/30000 [05:29<00:49, 79.56it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 26105/30000 [05:29<00:49, 79.27it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 26113/30000 [05:29<00:48, 79.42it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 26121/30000 [05:29<00:48, 79.36it/s, init loss: 29902.2090, avg. 
loss [24001-25500]: 18587.4355] 87%|████████▋ | 26129/30000 [05:29<00:48, 79.44it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 26138/30000 [05:29<00:48, 79.69it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 26146/30000 [05:29<00:48, 79.55it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 26154/30000 [05:29<00:48, 79.42it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 26162/30000 [05:29<00:48, 79.05it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 26171/30000 [05:29<00:48, 79.43it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 26180/30000 [05:30<00:47, 79.65it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 26188/30000 [05:30<00:47, 79.72it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 26196/30000 [05:30<00:47, 79.63it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 26204/30000 [05:30<00:47, 79.41it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 26212/30000 [05:30<00:47, 79.52it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 26220/30000 [05:30<00:47, 79.66it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 26228/30000 [05:30<00:47, 79.64it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 26236/30000 [05:30<00:47, 79.60it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 87%|████████▋ | 26244/30000 [05:30<00:47, 78.64it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 88%|████████▊ | 26252/30000 [05:31<00:47, 78.97it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 88%|████████▊ | 26260/30000 [05:31<00:47, 79.10it/s, init loss: 29902.2090, avg. 
loss [24001-25500]: 18587.4355] 88%|████████▊ | 26268/30000 [05:31<00:47, 79.32it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 88%|████████▊ | 26276/30000 [05:31<00:47, 79.10it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 88%|████████▊ | 26285/30000 [05:31<00:46, 79.33it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 88%|████████▊ | 26294/30000 [05:31<00:46, 79.58it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 88%|████████▊ | 26302/30000 [05:31<00:46, 79.64it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 88%|████████▊ | 26310/30000 [05:31<00:46, 79.18it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 88%|████████▊ | 26318/30000 [05:31<00:46, 79.36it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 88%|████████▊ | 26326/30000 [05:31<00:46, 79.50it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 88%|████████▊ | 26334/30000 [05:32<00:46, 79.57it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 88%|████████▊ | 26342/30000 [05:32<00:46, 79.39it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 88%|████████▊ | 26350/30000 [05:32<00:46, 79.26it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 88%|████████▊ | 26358/30000 [05:32<00:46, 79.14it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 88%|████████▊ | 26367/30000 [05:32<00:45, 79.60it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 88%|████████▊ | 26375/30000 [05:32<00:45, 79.47it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 88%|████████▊ | 26383/30000 [05:32<00:45, 79.51it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 88%|████████▊ | 26391/30000 [05:32<00:45, 79.64it/s, init loss: 29902.2090, avg. loss [24001-25500]: 18587.4355] 88%|████████▊ | 26399/30000 [05:32<00:45, 79.54it/s, init loss: 29902.2090, avg. 
loss [24001-25500]: 18587.4355]
 90%|█████████ | 27007/30000 [05:40<00:37, 79.21it/s, init loss: 29902.2090, avg. loss [25501-27000]: 18567.6426]
 95%|█████████▌| 28501/30000 [05:59<00:18, 79.27it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758]
 98%|█████████▊| 29334/30000 [06:09<00:08, 79.57it/s, init loss: 29902.2090, avg. 
loss [27001-28500]: 18551.6758] 98%|█████████▊| 29342/30000 [06:09<00:08, 79.06it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 98%|█████████▊| 29351/30000 [06:10<00:08, 79.46it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 98%|█████████▊| 29359/30000 [06:10<00:08, 79.23it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 98%|█████████▊| 29367/30000 [06:10<00:07, 79.24it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 98%|█████████▊| 29375/30000 [06:10<00:07, 78.95it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 98%|█████████▊| 29384/30000 [06:10<00:07, 79.35it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 98%|█████████▊| 29392/30000 [06:10<00:07, 79.41it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 98%|█████████▊| 29400/30000 [06:10<00:07, 79.16it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 98%|█████████▊| 29408/30000 [06:10<00:07, 79.25it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 98%|█████████▊| 29416/30000 [06:10<00:07, 79.05it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 98%|█████████▊| 29424/30000 [06:10<00:07, 78.92it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 98%|█████████▊| 29433/30000 [06:11<00:07, 79.27it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 98%|█████████▊| 29441/30000 [06:11<00:07, 79.23it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 98%|█████████▊| 29449/30000 [06:11<00:06, 79.25it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 98%|█████████▊| 29457/30000 [06:11<00:06, 79.26it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 98%|█████████▊| 29465/30000 [06:11<00:06, 79.42it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 98%|█████████▊| 29473/30000 [06:11<00:06, 79.56it/s, init loss: 29902.2090, avg. 
loss [27001-28500]: 18551.6758] 98%|█████████▊| 29482/30000 [06:11<00:06, 79.68it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 98%|█████████▊| 29490/30000 [06:11<00:06, 79.55it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 98%|█████████▊| 29498/30000 [06:11<00:06, 79.37it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 98%|█████████▊| 29506/30000 [06:11<00:06, 79.31it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 98%|█████████▊| 29514/30000 [06:12<00:06, 79.17it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 98%|█████████▊| 29523/30000 [06:12<00:05, 79.65it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 98%|█████████▊| 29531/30000 [06:12<00:05, 79.29it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 98%|█████████▊| 29539/30000 [06:12<00:05, 79.13it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 98%|█████████▊| 29547/30000 [06:12<00:05, 79.27it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▊| 29556/30000 [06:12<00:05, 79.55it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▊| 29565/30000 [06:12<00:05, 79.89it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▊| 29573/30000 [06:12<00:05, 79.79it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▊| 29581/30000 [06:12<00:05, 79.55it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▊| 29589/30000 [06:13<00:05, 79.64it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▊| 29598/30000 [06:13<00:05, 79.74it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▊| 29606/30000 [06:13<00:04, 79.65it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▊| 29614/30000 [06:13<00:04, 79.67it/s, init loss: 29902.2090, avg. 
loss [27001-28500]: 18551.6758] 99%|█████████▊| 29623/30000 [06:13<00:04, 79.84it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▉| 29631/30000 [06:13<00:04, 79.68it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▉| 29639/30000 [06:13<00:04, 79.70it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▉| 29647/30000 [06:13<00:04, 79.50it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▉| 29655/30000 [06:13<00:04, 78.89it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▉| 29663/30000 [06:13<00:04, 78.97it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▉| 29672/30000 [06:14<00:04, 79.33it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▉| 29680/30000 [06:14<00:04, 79.38it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▉| 29689/30000 [06:14<00:03, 79.66it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▉| 29698/30000 [06:14<00:03, 79.73it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▉| 29706/30000 [06:14<00:03, 79.50it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▉| 29714/30000 [06:14<00:03, 79.51it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▉| 29723/30000 [06:14<00:03, 79.76it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▉| 29731/30000 [06:14<00:03, 79.55it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▉| 29739/30000 [06:14<00:03, 79.49it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▉| 29748/30000 [06:15<00:03, 79.62it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▉| 29756/30000 [06:15<00:03, 79.61it/s, init loss: 29902.2090, avg. 
loss [27001-28500]: 18551.6758] 99%|█████████▉| 29764/30000 [06:15<00:02, 79.59it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▉| 29772/30000 [06:15<00:02, 79.42it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▉| 29780/30000 [06:15<00:02, 79.55it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▉| 29788/30000 [06:15<00:02, 79.39it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▉| 29796/30000 [06:15<00:02, 79.39it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▉| 29804/30000 [06:15<00:02, 79.54it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▉| 29812/30000 [06:15<00:02, 79.51it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▉| 29820/30000 [06:15<00:02, 78.71it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▉| 29828/30000 [06:16<00:02, 79.08it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▉| 29836/30000 [06:16<00:02, 79.16it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758] 99%|█████████▉| 29844/30000 [06:16<00:01, 79.17it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758]100%|█████████▉| 29852/30000 [06:16<00:01, 79.05it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758]100%|█████████▉| 29861/30000 [06:16<00:01, 79.26it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758]100%|█████████▉| 29869/30000 [06:16<00:01, 79.15it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758]100%|█████████▉| 29877/30000 [06:16<00:01, 79.17it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758]100%|█████████▉| 29885/30000 [06:16<00:01, 79.21it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758]100%|█████████▉| 29893/30000 [06:16<00:01, 79.05it/s, init loss: 29902.2090, avg. 
loss [27001-28500]: 18551.6758]100%|█████████▉| 29901/30000 [06:16<00:01, 78.96it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758]100%|█████████▉| 29909/30000 [06:17<00:01, 78.93it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758]100%|█████████▉| 29917/30000 [06:17<00:01, 79.21it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758]100%|█████████▉| 29925/30000 [06:17<00:00, 79.18it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758]100%|█████████▉| 29933/30000 [06:17<00:00, 79.15it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758]100%|█████████▉| 29941/30000 [06:17<00:00, 78.79it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758]100%|█████████▉| 29950/30000 [06:17<00:00, 79.25it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758]100%|█████████▉| 29958/30000 [06:17<00:00, 79.18it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758]100%|█████████▉| 29967/30000 [06:17<00:00, 79.56it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758]100%|█████████▉| 29975/30000 [06:17<00:00, 79.14it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758]100%|█████████▉| 29983/30000 [06:17<00:00, 79.25it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758]100%|█████████▉| 29991/30000 [06:18<00:00, 79.10it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758]100%|█████████▉| 29999/30000 [06:18<00:00, 79.14it/s, init loss: 29902.2090, avg. loss [27001-28500]: 18551.6758]100%|██████████| 30000/30000 [06:18<00:00, 79.32it/s, init loss: 29902.2090, avg. loss [28501-30000]: 18537.6777]

posterior_pooled_zero_inflated_regression_svi = sample_posterior_predictive_svi(
    rng_key=RNG_KEY,
    model=pooled_zero_inflated_negative_binomial_regression,
    guide=svi_pooled_zero_inflated_regression_guide,
    covariates_hat=zero_inflated_regression_covariates_hat,
    svi_result=svi_pooled_zero_inflated_regression_parameters,
    num_samples=1500,
    model_kwargs=pooled_zero_inflated_regression_kwargs,
    return_sites=pooled_zero_inflated_regression_parameters,
)

Unlike the quantile regression, the zero-inflated model has two parameters to visualize for each component: the gate \(p\) and the mean \(\mu\). The first can be understood as the probability of any hail occurring in a given county at a given point in time. The second is the expected number of hail events given that hail occurs.
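To make the interplay between the two parameters concrete, here is a minimal sketch in plain Python (it is not part of the original analysis) of how \(p\) and \(\mu\) combine in a zero-inflated Negative Binomial. It assumes the mean/concentration parameterization of the Negative Binomial, under which the probability of observing zero events is \((\lambda / (\lambda + \mu))^{\lambda}\). The function names and the numbers are hypothetical.

```python
def zinb_zero_probability(p_hail: float, mean: float, concentration: float) -> float:
    """Probability of observing zero hail events.

    A zero can come from two sources: no hail at all (probability 1 - p_hail),
    or hail conditions in which the Negative Binomial count happens to be zero.
    """
    nb_zero = (concentration / (concentration + mean)) ** concentration
    return (1.0 - p_hail) + p_hail * nb_zero


def zinb_expected_count(p_hail: float, mean: float) -> float:
    """Unconditional expected number of events: the mean scaled by the gate."""
    return p_hail * mean


# Hypothetical county-hour: hail is rare, but produces about 2 events when it occurs.
zero_probability = zinb_zero_probability(p_hail=0.1, mean=2.0, concentration=1.0)
expected_count = zinb_expected_count(p_hail=0.1, mean=2.0)
print(f"P(y = 0) = {zero_probability:.4f}, E[y] = {expected_count:.4f}")
```

Read this way, a county can show a low expected count either because hail is unlikely (small \(p\)) or because hail, when present, tends to produce few events (small \(\mu\)), which is why the two maps are inspected separately.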

visualize_geo_regression(
    covariates_hat_df=zero_inflated_regression_covariates_hat_df,
    posterior=posterior_pooled_zero_inflated_regression_svi,
    parameter="alpha_gate",
)
plt.show()

visualize_geo_regression(
    covariates_hat_df=zero_inflated_regression_covariates_hat_df,
    posterior=posterior_pooled_zero_inflated_regression_svi,
    parameter="alpha_mean",
)
plt.show()

fig, axs = visualize_temporal_components(
    temporal_components=generate_temporal_components(
        posterior=posterior_pooled_zero_inflated_regression_svi,
        transformers=zero_inflated_regression_transformers,
        years=count_state_modelling_df[YEAR_COVARIATES].unique(),
        suffix="_gate",
    )
)
plt.show()

fig, axs = visualize_temporal_components(
    temporal_components=generate_temporal_components(
        posterior=posterior_pooled_zero_inflated_regression_svi,
        transformers=zero_inflated_regression_transformers,
        years=count_state_modelling_df[YEAR_COVARIATES].unique(),
        suffix="_mean",
    )
)
plt.show()

Partially Pooled

As we did for the quantile regression, we also let the intercepts of both the gate and the mean models vary across counties using a partial pooling strategy:

$$
\begin{gather}
\color{RedOrange}\sigma_{Gate, County} \sim HalfCauchy(\sigma=5) \\
\color{RedOrange}\mu_{Gate, County} \sim \mathcal{N}(\mu=0, \sigma=1) \\
\color{RedOrange}\alpha_{Gate, County} \sim \mathcal{N}(\mu_{Gate, County}, \sigma_{Gate, County}) \\
\color{NavyBlue}\sigma_{Mean, County} \sim HalfCauchy(\sigma=5) \\
\color{NavyBlue}\mu_{Mean, County} \sim \mathcal{N}(\mu=0, \sigma=1) \\
\color{NavyBlue}\alpha_{Mean, County} \sim \mathcal{N}(\mu_{Mean, County}, \sigma_{Mean, County}) \\
\\
\color{RedOrange}\beta_{GateHour} \sim \mathcal{N}(\mu=0, \sigma=1) \\
\color{RedOrange}\beta_{GateMonth} \sim \mathcal{N}(\mu=0, \sigma=1) \\
\color{RedOrange}\beta_{GateYear} \sim \mathcal{N}(\mu=0, \sigma=1) \\
\color{NavyBlue}\beta_{MeanHour} \sim \mathcal{N}(\mu=0, \sigma=1) \\
\color{NavyBlue}\beta_{MeanMonth} \sim \mathcal{N}(\mu=0, \sigma=1) \\
\color{NavyBlue}\beta_{MeanYear} \sim \mathcal{N}(\mu=0, \sigma=1) \\
\\
\color{RedOrange}p = expit(\alpha_{Gate, County} + \beta_{GateHour}f(hour) + \beta_{GateMonth}f(month) + \beta_{GateYear}f(year)) \\
\color{NavyBlue}\mu = exp(\alpha_{Mean, County} + \beta_{MeanHour}f(hour) + \beta_{MeanMonth}f(month) + \beta_{MeanYear}f(year)) \\
\\
\lambda \sim InverseGamma(0.4, 0.3) \\
y \sim ZeroInflatedNegativeBinomial(p, \mu, \lambda)
\end{gather}
$$
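The county-level intercept priors above can be read generatively: each county intercept is a shared location plus a county-specific deviation whose size is governed by a shared scale. The sketch below, in plain Python with hypothetical names and values, writes this in the equivalent non-centered form \(\alpha = \mu + \sigma z\) with \(z \sim \mathcal{N}(0, 1)\). This is the form that `LocScaleReparam(0)` in the model code requests from NumPyro, but the snippet itself is only an illustration, not the author's implementation.

```python
import random
from typing import List


def non_centered_intercepts(mu: float, sigma: float, z: List[float]) -> List[float]:
    """Map standard-normal draws z to county intercepts: alpha = mu + sigma * z."""
    return [mu + sigma * z_i for z_i in z]


rng = random.Random(0)
# One standard-normal draw per (hypothetical) county.
z = [rng.gauss(0.0, 1.0) for _ in range(5)]

# A small shared scale pulls every county towards the shared location ...
alphas_tight = non_centered_intercepts(mu=-2.0, sigma=0.1, z=z)
# ... while a large shared scale lets the counties differ substantially.
alphas_loose = non_centered_intercepts(mu=-2.0, sigma=2.0, z=z)

print("sigma = 0.1:", [round(a, 3) for a in alphas_tight])
print("sigma = 2.0:", [round(a, 3) for a in alphas_loose])
```

Because the scale is itself estimated from all counties, counties with few observations are shrunk towards the shared location, which is the behaviour that motivates partial pooling for small samples.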

# Sample the county-level intercepts in non-centered form (centered=0).
reparam_config = {"alpha_gate": LocScaleReparam(0), "alpha_mean": LocScaleReparam(0)}


@numpyro.handlers.reparam(config=reparam_config)
def hierarchical_zero_inflated_negative_binomial_regression(
    target: ArrayLike,
    covariates: Dict[str, ArrayLike],
    prior_rate: Distribution,
    prior_beta_year_gate: Distribution,
    prior_beta_month_gate: Distribution,
    prior_beta_hour_gate: Distribution,
    prior_beta_year_mean: Distribution,
    prior_beta_month_mean: Distribution,
    prior_beta_hour_mean: Distribution,
    prior_mu_alpha_gate: Distribution,
    prior_sigma_alpha_gate: Distribution,
    prior_mu_alpha_mean: Distribution,
    prior_sigma_alpha_mean: Distribution,
) -> None:
    """Hierarchical zero-inflated Negative Binomial regression model"""
    number_groups = len(np.unique(covariates["counties_index"]))
    counties_index = covariates["counties_index"].flatten()

    mu_alpha_gate = numpyro.sample(
        "mu_alpha_gate",
        prior_mu_alpha_gate,
    )
    sigma_alpha_gate = numpyro.sample(
        "sigma_alpha_gate",
        prior_sigma_alpha_gate,
    )
    mu_alpha_mean = numpyro.sample(
        "mu_alpha_mean",
        prior_mu_alpha_mean,
    )
    sigma_alpha_mean = numpyro.sample(
        "sigma_alpha_mean",
        prior_sigma_alpha_mean,
    )

    with numpyro.plate("counties", size=number_groups):

        alpha_gate = numpyro.sample(
            "alpha_gate",
            Normal(mu_alpha_gate, sigma_alpha_gate),
        )
        alpha_mean = numpyro.sample(
            "alpha_mean",
            Normal(mu_alpha_mean, sigma_alpha_mean),
        )

    with numpyro.plate(
        "year_spline_coefficients", size=covariates["year_covariates"].shape[1]
    ):
        beta_year_gate = numpyro.sample(
            "beta_year_gate",
            prior_beta_year_gate,
        )
        beta_year_mean = numpyro.sample(
            "beta_year_mean",
            prior_beta_year_mean,
        )

    with numpyro.plate(
        "hour_spline_coefficients", size=covariates["hour_covariates"].shape[1]
    ):
        beta_hour_gate = numpyro.sample(
            "beta_hour_gate",
            prior_beta_hour_gate,
        )
        beta_hour_mean = numpyro.sample(
            "beta_hour_mean",
            prior_beta_hour_mean,
        )

    with numpyro.plate(
        "month_spline_coefficients", size=covariates["month_covariates"].shape[1]
    ):
        beta_month_gate = numpyro.sample(
            "beta_month_gate",
            prior_beta_month_gate,
        )
        beta_month_mean = numpyro.sample(
            "beta_month_mean",
            prior_beta_month_mean,
        )

    # Year component
    year_component_gate = numpyro.deterministic(
        name="year_component_gate",
        value=jnp.dot(covariates["year_covariates"], beta_year_gate),
    )
    year_component_mean = numpyro.deterministic(
        name="year_component_mean",
        value=jnp.dot(covariates["year_covariates"], beta_year_mean),
    )
    # Month component
    month_component_gate = numpyro.deterministic(
        name="month_component_gate",
        value=jnp.dot(covariates["month_covariates"], beta_month_gate),
    )
    month_component_mean = numpyro.deterministic(
        name="month_component_mean",
        value=jnp.dot(covariates["month_covariates"], beta_month_mean),
    )
    # Hour component
    hour_component_gate = numpyro.deterministic(
        name="hour_component_gate",
        value=jnp.dot(covariates["hour_covariates"], beta_hour_gate),
    )
    hour_component_mean = numpyro.deterministic(
        name="hour_component_mean",
        value=jnp.dot(covariates["hour_covariates"], beta_hour_mean),
    )

    # Temporal components
    temporal_component_gate = numpyro.deterministic(
        name="temporal_component_gate",
        value=year_component_gate + month_component_gate + hour_component_gate,
    )
    temporal_component_mean = numpyro.deterministic(
        name="temporal_component_mean",
        value=year_component_mean + month_component_mean + hour_component_mean,
    )

    spatial_component_mean = numpyro.deterministic(
        name="spatial_component_mean",
        value=alpha_mean[counties_index],
    )

    spatial_component_gate = numpyro.deterministic(
        name="spatial_component_gate",
        value=alpha_gate[counties_index],
    )

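    # ZeroInflatedDistribution expects the probability of an extra (structural)
    # zero, so the gate is 1 - p, with p = expit(...) the probability of hail.
    gate = numpyro.deterministic(
        name="gate",
        value=1 - expit(spatial_component_gate + temporal_component_gate),
    )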
    mean = numpyro.deterministic(
        name="mean",
        value=jnp.exp(spatial_component_mean + temporal_component_mean),
    )
    rate = numpyro.sample(
        "rate",
        prior_rate,
    )

    obs = numpyro.sample(
        "obs",
        ZeroInflatedDistribution(
            base_dist=NegativeBinomial2(mean, rate),
            gate=gate,
        ),
        obs=target,
    )

    if target is not None:
        numpyro.deterministic(
            "log_likelihood",
            ZeroInflatedDistribution(
                base_dist=NegativeBinomial2(mean, rate),
                gate=gate,
            )
            .log_prob(target)
        )
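As a reading aid for the model above, the following plain-Python sketch traces a single observation through the gate branch: the county intercept is looked up by index, each temporal component is a dot product between a row of spline basis values and its coefficients, and the resulting linear predictor is turned into the hail probability \(p\) and into the `gate` value, which is its complement. All names and numbers are hypothetical, and the two-element basis rows merely stand in for the real spline bases.

```python
import math
from typing import List, Sequence, Tuple


def expit(x: float) -> float:
    """Logistic function mapping a real number to the (0, 1) interval."""
    return 1.0 / (1.0 + math.exp(-x))


def linear_predictor(
    alpha: Sequence[float],
    county_index: int,
    components: List[Tuple[Sequence[float], Sequence[float]]],
) -> float:
    """County intercept plus the sum of the temporal components.

    Each component is a pair (basis_row, beta); its contribution is their dot product.
    """
    temporal = sum(
        sum(b * w for b, w in zip(basis_row, beta)) for basis_row, beta in components
    )
    return alpha[county_index] + temporal


# Hypothetical values: two counties and two temporal components.
alpha_gate = [-1.0, 0.5]
components = [
    ([1.0, 0.0], [0.2, 0.3]),   # e.g. an hour basis row and its coefficients
    ([0.5, 0.5], [0.4, -0.4]),  # e.g. a month basis row and its coefficients
]

eta = linear_predictor(alpha_gate, county_index=1, components=components)
p_hail = expit(eta)   # probability of hail, p in the model equations
gate = 1.0 - p_hail   # probability of a structural zero, as passed to the likelihood
print(f"eta = {eta:.3f}, p = {p_hail:.3f}, gate = {gate:.3f}")
```

The mean branch follows the same pattern, with `exp` in place of `expit` and without the complement.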

hierarchical_zero_inflated_regression_parameters = [
    "rate",
    "mean",
    "gate",
    "mu_alpha_gate",
    "sigma_alpha_gate",
    "mu_alpha_mean",
    "sigma_alpha_mean",
    "alpha_gate",
    "spatial_component_mean",
    "spatial_component_gate",
    "alpha_mean",
    "beta_year_mean",
    "beta_month_mean",
    "beta_hour_mean",
    "beta_year_gate",
    "beta_month_gate",
    "beta_hour_gate",
    "obs",
]
hierarchical_zero_inflated_regression_kwargs = {
    "covariates": zero_inflated_regression_covariates,
    "target": zero_inflated_regression_target,
    "prior_rate": InverseGamma(0.4, 0.3),
    "prior_mu_alpha_gate": Normal(loc=0.0, scale=1),
    "prior_sigma_alpha_gate": HalfCauchy(5),
    "prior_mu_alpha_mean": Normal(loc=0.0, scale=1),
    "prior_sigma_alpha_mean": HalfCauchy(5),
    "prior_beta_year_gate": Normal(loc=0.0, scale=1),
    "prior_beta_month_gate": Normal(loc=0.0, scale=1),
    "prior_beta_hour_gate": Normal(loc=0.0, scale=1),
    "prior_beta_year_mean": Normal(loc=0.0, scale=1),
    "prior_beta_month_mean": Normal(loc=0.0, scale=1),
    "prior_beta_hour_mean": Normal(loc=0.0, scale=1),
}
numpyro.render_model(
    hierarchical_zero_inflated_negative_binomial_regression,
    model_kwargs=hierarchical_zero_inflated_regression_kwargs,
    render_distributions=False,
)

(
    svi_hierarchical_zero_inflated_regression_parameters,
    svi_hierarchical_zero_inflated_regression_guide,
) = sample_using_svi(
    rng_key=RNG_KEY,
    model=hierarchical_zero_inflated_negative_binomial_regression,
    model_kwargs=hierarchical_zero_inflated_regression_kwargs,
    autoguide=AutoLowRankMultivariateNormal,
    guide_kwargs={},
    optimizer_kwargs={"step_size": 1e-4, "clip_norm": 5},
    num_steps=NUMBER_ITERATIONS,
    num_particles=NUMBER_PARTICLES,
)
loss [1-1500]: 35580.6250]  6%|▌         | 1808/30000 [00:24<06:09, 76.26it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  6%|▌         | 1816/30000 [00:24<06:13, 75.41it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  6%|▌         | 1824/30000 [00:24<06:12, 75.57it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  6%|▌         | 1832/30000 [00:24<06:11, 75.83it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  6%|▌         | 1840/30000 [00:25<06:10, 75.91it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  6%|▌         | 1848/30000 [00:25<06:10, 76.03it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  6%|▌         | 1856/30000 [00:25<06:08, 76.35it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  6%|▌         | 1864/30000 [00:25<06:08, 76.40it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  6%|▌         | 1872/30000 [00:25<06:09, 76.18it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  6%|▋         | 1880/30000 [00:25<06:08, 76.22it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  6%|▋         | 1888/30000 [00:25<06:07, 76.46it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  6%|▋         | 1896/30000 [00:25<06:09, 76.05it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  6%|▋         | 1904/30000 [00:25<06:08, 76.24it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  6%|▋         | 1912/30000 [00:25<06:08, 76.30it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  6%|▋         | 1920/30000 [00:26<06:06, 76.54it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  6%|▋         | 1928/30000 [00:26<06:10, 75.86it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  6%|▋         | 1936/30000 [00:26<06:10, 75.73it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  6%|▋         | 1944/30000 [00:26<06:10, 75.82it/s, init loss: 40160.1836, avg. 
loss [1-1500]: 35580.6250]  7%|▋         | 1952/30000 [00:26<06:09, 75.95it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 1960/30000 [00:26<06:09, 75.98it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 1968/30000 [00:26<06:09, 75.92it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 1976/30000 [00:26<06:08, 76.11it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 1984/30000 [00:26<06:06, 76.44it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 1992/30000 [00:27<06:04, 76.84it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2000/30000 [00:27<06:05, 76.68it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2008/30000 [00:27<06:04, 76.71it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2016/30000 [00:27<06:05, 76.51it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2024/30000 [00:27<06:04, 76.84it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2032/30000 [00:27<06:07, 76.09it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2040/30000 [00:27<06:07, 76.08it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2048/30000 [00:27<06:06, 76.31it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2056/30000 [00:27<06:05, 76.50it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2064/30000 [00:27<06:05, 76.42it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2072/30000 [00:28<06:08, 75.76it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2080/30000 [00:28<06:09, 75.47it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2088/30000 [00:28<06:08, 75.66it/s, init loss: 40160.1836, avg. 
loss [1-1500]: 35580.6250]  7%|▋         | 2096/30000 [00:28<06:07, 76.01it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2104/30000 [00:28<06:07, 75.96it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2112/30000 [00:28<06:07, 75.84it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2120/30000 [00:28<06:07, 75.93it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2128/30000 [00:28<06:06, 76.00it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2136/30000 [00:28<06:07, 75.90it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2144/30000 [00:29<06:06, 76.08it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2152/30000 [00:29<06:05, 76.17it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2160/30000 [00:29<06:06, 75.87it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2168/30000 [00:29<06:06, 75.99it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2176/30000 [00:29<06:05, 76.18it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2184/30000 [00:29<06:04, 76.27it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2192/30000 [00:29<06:03, 76.44it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2200/30000 [00:29<06:03, 76.51it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2208/30000 [00:29<06:02, 76.68it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2216/30000 [00:29<06:00, 76.97it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2224/30000 [00:30<06:00, 77.13it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2232/30000 [00:30<06:02, 76.54it/s, init loss: 40160.1836, avg. 
loss [1-1500]: 35580.6250]  7%|▋         | 2240/30000 [00:30<06:03, 76.45it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  7%|▋         | 2248/30000 [00:30<06:03, 76.39it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2256/30000 [00:30<06:03, 76.27it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2264/30000 [00:30<06:02, 76.55it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2272/30000 [00:30<06:01, 76.66it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2280/30000 [00:30<06:02, 76.44it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2288/30000 [00:30<06:02, 76.47it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2296/30000 [00:30<06:03, 76.26it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2304/30000 [00:31<06:05, 75.80it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2312/30000 [00:31<06:04, 75.97it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2320/30000 [00:31<06:02, 76.41it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2328/30000 [00:31<06:01, 76.49it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2336/30000 [00:31<06:00, 76.83it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2344/30000 [00:31<05:58, 77.15it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2352/30000 [00:31<05:59, 76.82it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2360/30000 [00:31<06:01, 76.53it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2368/30000 [00:31<06:03, 76.09it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2376/30000 [00:32<06:03, 75.90it/s, init loss: 40160.1836, avg. 
loss [1-1500]: 35580.6250]  8%|▊         | 2384/30000 [00:32<06:04, 75.73it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2392/30000 [00:32<06:04, 75.69it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2400/30000 [00:32<06:02, 76.03it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2408/30000 [00:32<06:03, 75.97it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2416/30000 [00:32<06:01, 76.24it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2424/30000 [00:32<06:02, 76.12it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2432/30000 [00:32<06:01, 76.21it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2440/30000 [00:32<06:01, 76.15it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2448/30000 [00:32<05:59, 76.61it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2456/30000 [00:33<06:01, 76.23it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2464/30000 [00:33<06:01, 76.21it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2472/30000 [00:33<06:02, 76.00it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2480/30000 [00:33<06:01, 76.03it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2488/30000 [00:33<06:01, 76.13it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2496/30000 [00:33<06:01, 76.17it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2504/30000 [00:33<05:59, 76.52it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2512/30000 [00:33<06:01, 76.00it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2520/30000 [00:33<06:00, 76.13it/s, init loss: 40160.1836, avg. 
loss [1-1500]: 35580.6250]  8%|▊         | 2528/30000 [00:34<06:01, 76.03it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2536/30000 [00:34<06:03, 75.45it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  8%|▊         | 2544/30000 [00:34<06:03, 75.49it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▊         | 2552/30000 [00:34<06:02, 75.66it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▊         | 2560/30000 [00:34<06:03, 75.58it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▊         | 2568/30000 [00:34<06:01, 75.86it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▊         | 2576/30000 [00:34<06:01, 75.89it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▊         | 2584/30000 [00:34<06:01, 75.80it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▊         | 2592/30000 [00:34<06:00, 75.97it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▊         | 2600/30000 [00:34<06:00, 76.10it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▊         | 2608/30000 [00:35<05:59, 76.24it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▊         | 2616/30000 [00:35<05:59, 76.17it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▊         | 2624/30000 [00:35<05:59, 76.10it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▉         | 2632/30000 [00:35<05:58, 76.27it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▉         | 2640/30000 [00:35<06:00, 75.91it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▉         | 2648/30000 [00:35<05:59, 76.01it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▉         | 2656/30000 [00:35<06:01, 75.55it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▉         | 2664/30000 [00:35<06:01, 75.57it/s, init loss: 40160.1836, avg. 
loss [1-1500]: 35580.6250]  9%|▉         | 2672/30000 [00:35<06:03, 75.16it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▉         | 2680/30000 [00:36<06:00, 75.71it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▉         | 2688/30000 [00:36<06:00, 75.66it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▉         | 2696/30000 [00:36<06:01, 75.51it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▉         | 2704/30000 [00:36<05:59, 75.86it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▉         | 2712/30000 [00:36<05:59, 75.97it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▉         | 2720/30000 [00:36<06:01, 75.47it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▉         | 2728/30000 [00:36<05:59, 75.86it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▉         | 2736/30000 [00:36<05:59, 75.79it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▉         | 2744/30000 [00:36<05:59, 75.90it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▉         | 2752/30000 [00:36<05:58, 76.03it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▉         | 2760/30000 [00:37<06:00, 75.54it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▉         | 2768/30000 [00:37<06:01, 75.35it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▉         | 2776/30000 [00:37<06:01, 75.34it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▉         | 2784/30000 [00:37<06:00, 75.52it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▉         | 2792/30000 [00:37<06:00, 75.48it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▉         | 2800/30000 [00:37<05:59, 75.75it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▉         | 2808/30000 [00:37<05:59, 75.70it/s, init loss: 40160.1836, avg. 
loss [1-1500]: 35580.6250]  9%|▉         | 2816/30000 [00:37<05:57, 75.94it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▉         | 2824/30000 [00:37<05:58, 75.82it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▉         | 2832/30000 [00:38<05:56, 76.11it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▉         | 2840/30000 [00:38<05:56, 76.09it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250]  9%|▉         | 2848/30000 [00:38<05:57, 75.95it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250] 10%|▉         | 2856/30000 [00:38<05:57, 75.98it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250] 10%|▉         | 2864/30000 [00:38<05:56, 76.08it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250] 10%|▉         | 2872/30000 [00:38<05:56, 76.00it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250] 10%|▉         | 2880/30000 [00:38<05:55, 76.27it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250] 10%|▉         | 2888/30000 [00:38<05:54, 76.39it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250] 10%|▉         | 2896/30000 [00:38<05:55, 76.23it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250] 10%|▉         | 2904/30000 [00:38<05:57, 75.88it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250] 10%|▉         | 2912/30000 [00:39<05:57, 75.70it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250] 10%|▉         | 2920/30000 [00:39<05:57, 75.71it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250] 10%|▉         | 2928/30000 [00:39<05:57, 75.75it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250] 10%|▉         | 2936/30000 [00:39<05:57, 75.68it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250] 10%|▉         | 2944/30000 [00:39<05:56, 75.93it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250] 10%|▉         | 2952/30000 [00:39<05:55, 76.13it/s, init loss: 40160.1836, avg. 
loss [1-1500]: 35580.6250] 10%|▉         | 2960/30000 [00:39<05:55, 75.99it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250] 10%|▉         | 2968/30000 [00:39<05:55, 76.01it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250] 10%|▉         | 2976/30000 [00:39<05:54, 76.21it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250] 10%|▉         | 2984/30000 [00:40<05:54, 76.22it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250] 10%|▉         | 2992/30000 [00:40<05:55, 76.02it/s, init loss: 40160.1836, avg. loss [1-1500]: 35580.6250] 10%|█         | 3000/30000 [00:40<05:55, 75.98it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 10%|█         | 3008/30000 [00:40<05:56, 75.79it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 10%|█         | 3016/30000 [00:40<05:56, 75.64it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 10%|█         | 3024/30000 [00:40<05:54, 76.07it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 10%|█         | 3032/30000 [00:40<05:53, 76.35it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 10%|█         | 3040/30000 [00:40<05:55, 75.88it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 10%|█         | 3048/30000 [00:40<05:54, 76.05it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 10%|█         | 3056/30000 [00:40<05:53, 76.17it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 10%|█         | 3064/30000 [00:41<05:54, 75.90it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 10%|█         | 3072/30000 [00:41<05:54, 75.88it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 10%|█         | 3080/30000 [00:41<05:55, 75.76it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 10%|█         | 3088/30000 [00:41<05:54, 75.86it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 10%|█         | 3096/30000 [00:41<05:55, 75.69it/s, init loss: 40160.1836, avg. 
loss [1501-3000]: 27438.7754] 10%|█         | 3104/30000 [00:41<05:55, 75.61it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 10%|█         | 3112/30000 [00:41<05:55, 75.68it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 10%|█         | 3120/30000 [00:41<05:53, 75.97it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 10%|█         | 3128/30000 [00:41<05:53, 76.03it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 10%|█         | 3136/30000 [00:42<05:52, 76.13it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 10%|█         | 3144/30000 [00:42<05:52, 76.27it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█         | 3152/30000 [00:42<05:53, 76.00it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█         | 3160/30000 [00:42<05:50, 76.60it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█         | 3168/30000 [00:42<05:50, 76.49it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█         | 3176/30000 [00:42<05:50, 76.53it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█         | 3184/30000 [00:42<05:51, 76.32it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█         | 3192/30000 [00:42<05:50, 76.44it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█         | 3200/30000 [00:42<05:49, 76.62it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█         | 3208/30000 [00:42<05:49, 76.68it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█         | 3216/30000 [00:43<05:51, 76.17it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█         | 3224/30000 [00:43<05:53, 75.79it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█         | 3232/30000 [00:43<05:53, 75.63it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█         | 3240/30000 [00:43<05:51, 76.14it/s, init loss: 40160.1836, avg. 
loss [1501-3000]: 27438.7754] 11%|█         | 3248/30000 [00:43<05:52, 75.88it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█         | 3256/30000 [00:43<05:53, 75.76it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█         | 3264/30000 [00:43<05:52, 75.86it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█         | 3272/30000 [00:43<05:52, 75.76it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█         | 3280/30000 [00:43<05:51, 76.11it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█         | 3288/30000 [00:44<05:49, 76.50it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█         | 3296/30000 [00:44<05:50, 76.16it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█         | 3304/30000 [00:44<05:50, 76.12it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█         | 3312/30000 [00:44<05:51, 75.89it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█         | 3320/30000 [00:44<05:49, 76.31it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█         | 3328/30000 [00:44<05:48, 76.58it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█         | 3336/30000 [00:44<05:49, 76.38it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█         | 3344/30000 [00:44<05:49, 76.36it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█         | 3352/30000 [00:44<05:49, 76.31it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█         | 3360/30000 [00:44<05:48, 76.43it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█         | 3368/30000 [00:45<05:48, 76.45it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█▏        | 3376/30000 [00:45<05:48, 76.34it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█▏        | 3384/30000 [00:45<05:49, 76.24it/s, init loss: 40160.1836, avg. 
loss [1501-3000]: 27438.7754] 11%|█▏        | 3392/30000 [00:45<05:48, 76.44it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█▏        | 3400/30000 [00:45<05:47, 76.62it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█▏        | 3408/30000 [00:45<05:48, 76.33it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█▏        | 3416/30000 [00:45<05:46, 76.72it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█▏        | 3424/30000 [00:45<05:46, 76.70it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█▏        | 3432/30000 [00:45<05:46, 76.76it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█▏        | 3440/30000 [00:46<05:45, 76.85it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 11%|█▏        | 3448/30000 [00:46<05:50, 75.79it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 12%|█▏        | 3456/30000 [00:46<05:50, 75.66it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 12%|█▏        | 3464/30000 [00:46<05:48, 76.10it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 12%|█▏        | 3472/30000 [00:46<05:49, 75.99it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 12%|█▏        | 3480/30000 [00:46<05:49, 75.96it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 12%|█▏        | 3488/30000 [00:46<05:47, 76.24it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 12%|█▏        | 3496/30000 [00:46<05:47, 76.36it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 12%|█▏        | 3504/30000 [00:46<05:47, 76.19it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 12%|█▏        | 3512/30000 [00:46<05:47, 76.27it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 12%|█▏        | 3520/30000 [00:47<05:45, 76.57it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 12%|█▏        | 3528/30000 [00:47<05:48, 75.95it/s, init loss: 40160.1836, avg. 
loss [1501-3000]: 27438.7754] 12%|█▏        | 3536/30000 [00:47<05:49, 75.77it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 12%|█▏        | 3544/30000 [00:47<05:48, 75.83it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 12%|█▏        | 3552/30000 [00:47<05:47, 76.17it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 12%|█▏        | 3560/30000 [00:47<05:46, 76.36it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 12%|█▏        | 3568/30000 [00:47<05:46, 76.29it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 12%|█▏        | 3576/30000 [00:47<05:46, 76.27it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 12%|█▏        | 3584/30000 [00:47<05:46, 76.14it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 12%|█▏        | 3592/30000 [00:48<05:46, 76.29it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 12%|█▏        | 3600/30000 [00:48<05:49, 75.62it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 12%|█▏        | 3608/30000 [00:48<05:47, 75.92it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 12%|█▏        | 3616/30000 [00:48<05:45, 76.26it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 12%|█▏        | 3624/30000 [00:48<05:46, 76.18it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 12%|█▏        | 3632/30000 [00:48<05:43, 76.77it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 12%|█▏        | 3640/30000 [00:48<05:45, 76.28it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 12%|█▏        | 3648/30000 [00:48<05:45, 76.20it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 12%|█▏        | 3656/30000 [00:48<05:44, 76.46it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 12%|█▏        | 3664/30000 [00:48<05:44, 76.44it/s, init loss: 40160.1836, avg. loss [1501-3000]: 27438.7754] 12%|█▏        | 3672/30000 [00:49<05:44, 76.32it/s, init loss: 40160.1836, avg. 
loss [1501-3000]: 27438.7754] 15%|█▌        | 4504/30000 [01:00<05:36, 75.84it/s, init loss: 40160.1836, avg. loss [3001-4500]: 24562.1152] 20%|██        | 6000/30000 [01:19<05:14, 76.25it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 22%|██▏       | 6696/30000 [01:28<05:05, 76.20it/s, init loss: 40160.1836, avg. 
loss [4501-6000]: 23147.3555] 22%|██▏       | 6704/30000 [01:28<05:04, 76.40it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 22%|██▏       | 6712/30000 [01:29<05:03, 76.77it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 22%|██▏       | 6720/30000 [01:29<05:04, 76.44it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 22%|██▏       | 6728/30000 [01:29<05:03, 76.69it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 22%|██▏       | 6736/30000 [01:29<05:03, 76.74it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 22%|██▏       | 6744/30000 [01:29<05:05, 76.21it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 6752/30000 [01:29<05:05, 76.09it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 6760/30000 [01:29<05:05, 76.02it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 6768/30000 [01:29<05:04, 76.25it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 6776/30000 [01:29<05:03, 76.57it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 6784/30000 [01:29<05:03, 76.40it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 6792/30000 [01:30<05:04, 76.23it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 6800/30000 [01:30<05:05, 76.03it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 6808/30000 [01:30<05:03, 76.36it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 6816/30000 [01:30<05:04, 76.17it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 6824/30000 [01:30<05:04, 76.14it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 6832/30000 [01:30<05:05, 75.76it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 6840/30000 [01:30<05:06, 75.54it/s, init loss: 40160.1836, avg. 
loss [4501-6000]: 23147.3555] 23%|██▎       | 6848/30000 [01:30<05:05, 75.80it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 6856/30000 [01:30<05:06, 75.57it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 6864/30000 [01:31<05:04, 75.90it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 6872/30000 [01:31<05:06, 75.54it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 6880/30000 [01:31<05:05, 75.67it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 6888/30000 [01:31<05:03, 76.09it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 6896/30000 [01:31<05:04, 75.99it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 6904/30000 [01:31<05:02, 76.39it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 6912/30000 [01:31<05:02, 76.23it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 6920/30000 [01:31<05:02, 76.33it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 6928/30000 [01:31<05:03, 76.11it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 6936/30000 [01:31<05:04, 75.74it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 6944/30000 [01:32<05:04, 75.80it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 6952/30000 [01:32<05:04, 75.64it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 6960/30000 [01:32<05:02, 76.12it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 6968/30000 [01:32<05:01, 76.38it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 6976/30000 [01:32<05:01, 76.39it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 6984/30000 [01:32<05:01, 76.32it/s, init loss: 40160.1836, avg. 
loss [4501-6000]: 23147.3555] 23%|██▎       | 6992/30000 [01:32<05:00, 76.51it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 7000/30000 [01:32<05:02, 76.09it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 7008/30000 [01:32<05:00, 76.57it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 7016/30000 [01:33<04:59, 76.71it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 7024/30000 [01:33<04:59, 76.65it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 7032/30000 [01:33<04:59, 76.71it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 7040/30000 [01:33<04:59, 76.68it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 23%|██▎       | 7048/30000 [01:33<04:59, 76.65it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▎       | 7056/30000 [01:33<04:59, 76.60it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▎       | 7064/30000 [01:33<05:00, 76.41it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▎       | 7072/30000 [01:33<05:00, 76.39it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▎       | 7080/30000 [01:33<04:58, 76.75it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▎       | 7088/30000 [01:33<05:00, 76.25it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▎       | 7096/30000 [01:34<04:59, 76.36it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▎       | 7104/30000 [01:34<05:00, 76.23it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▎       | 7112/30000 [01:34<04:59, 76.52it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▎       | 7120/30000 [01:34<04:59, 76.41it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▍       | 7128/30000 [01:34<04:59, 76.31it/s, init loss: 40160.1836, avg. 
loss [4501-6000]: 23147.3555] 24%|██▍       | 7136/30000 [01:34<05:00, 76.15it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▍       | 7144/30000 [01:34<04:59, 76.30it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▍       | 7152/30000 [01:34<04:58, 76.49it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▍       | 7160/30000 [01:34<04:58, 76.44it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▍       | 7168/30000 [01:35<04:58, 76.44it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▍       | 7176/30000 [01:35<04:57, 76.73it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▍       | 7184/30000 [01:35<04:57, 76.64it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▍       | 7192/30000 [01:35<04:59, 76.08it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▍       | 7200/30000 [01:35<04:59, 76.16it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▍       | 7208/30000 [01:35<04:58, 76.33it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▍       | 7216/30000 [01:35<04:58, 76.32it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▍       | 7224/30000 [01:35<04:59, 76.02it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▍       | 7232/30000 [01:35<04:58, 76.23it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▍       | 7240/30000 [01:35<04:58, 76.22it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▍       | 7248/30000 [01:36<04:57, 76.46it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▍       | 7256/30000 [01:36<04:59, 76.05it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▍       | 7264/30000 [01:36<04:58, 76.16it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▍       | 7272/30000 [01:36<04:57, 76.37it/s, init loss: 40160.1836, avg. 
loss [4501-6000]: 23147.3555] 24%|██▍       | 7280/30000 [01:36<04:57, 76.31it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▍       | 7288/30000 [01:36<04:57, 76.25it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▍       | 7296/30000 [01:36<04:58, 76.15it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▍       | 7304/30000 [01:36<04:58, 75.99it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▍       | 7312/30000 [01:36<04:57, 76.33it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▍       | 7320/30000 [01:37<04:56, 76.43it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▍       | 7328/30000 [01:37<04:57, 76.24it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▍       | 7336/30000 [01:37<04:56, 76.37it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 24%|██▍       | 7344/30000 [01:37<04:55, 76.62it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 25%|██▍       | 7352/30000 [01:37<04:56, 76.50it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 25%|██▍       | 7360/30000 [01:37<04:56, 76.36it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 25%|██▍       | 7368/30000 [01:37<04:57, 76.04it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 25%|██▍       | 7376/30000 [01:37<04:57, 76.06it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 25%|██▍       | 7384/30000 [01:37<04:57, 76.08it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 25%|██▍       | 7392/30000 [01:37<04:57, 75.95it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 25%|██▍       | 7400/30000 [01:38<04:57, 75.94it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 25%|██▍       | 7408/30000 [01:38<04:57, 75.96it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 25%|██▍       | 7416/30000 [01:38<04:57, 75.98it/s, init loss: 40160.1836, avg. 
loss [4501-6000]: 23147.3555] 25%|██▍       | 7424/30000 [01:38<04:56, 76.19it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 25%|██▍       | 7432/30000 [01:38<04:57, 75.80it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 25%|██▍       | 7440/30000 [01:38<04:55, 76.25it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 25%|██▍       | 7448/30000 [01:38<04:57, 75.72it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 25%|██▍       | 7456/30000 [01:38<04:56, 75.97it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 25%|██▍       | 7464/30000 [01:38<04:57, 75.79it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 25%|██▍       | 7472/30000 [01:39<04:57, 75.71it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 25%|██▍       | 7480/30000 [01:39<04:57, 75.82it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 25%|██▍       | 7488/30000 [01:39<04:57, 75.70it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 25%|██▍       | 7496/30000 [01:39<04:57, 75.72it/s, init loss: 40160.1836, avg. loss [4501-6000]: 23147.3555] 25%|██▌       | 7504/30000 [01:39<04:58, 75.45it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 25%|██▌       | 7512/30000 [01:39<04:55, 76.06it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 25%|██▌       | 7520/30000 [01:39<04:55, 76.16it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 25%|██▌       | 7528/30000 [01:39<04:54, 76.43it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 25%|██▌       | 7536/30000 [01:39<04:55, 76.02it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 25%|██▌       | 7544/30000 [01:39<04:55, 76.10it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 25%|██▌       | 7552/30000 [01:40<04:54, 76.29it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 25%|██▌       | 7560/30000 [01:40<04:54, 76.23it/s, init loss: 40160.1836, avg. 
loss [6001-7500]: 22087.3301] 25%|██▌       | 7568/30000 [01:40<04:55, 76.03it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 25%|██▌       | 7576/30000 [01:40<04:54, 76.19it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 25%|██▌       | 7584/30000 [01:40<04:54, 76.20it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 25%|██▌       | 7592/30000 [01:40<04:54, 76.10it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 25%|██▌       | 7600/30000 [01:40<04:52, 76.55it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 25%|██▌       | 7608/30000 [01:40<04:52, 76.57it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 25%|██▌       | 7616/30000 [01:40<04:52, 76.50it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 25%|██▌       | 7624/30000 [01:40<04:53, 76.26it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 25%|██▌       | 7632/30000 [01:41<04:52, 76.35it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 25%|██▌       | 7640/30000 [01:41<04:52, 76.48it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 25%|██▌       | 7648/30000 [01:41<04:51, 76.56it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▌       | 7656/30000 [01:41<04:52, 76.39it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▌       | 7664/30000 [01:41<04:52, 76.26it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▌       | 7672/30000 [01:41<04:53, 75.99it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▌       | 7680/30000 [01:41<04:53, 76.03it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▌       | 7688/30000 [01:41<04:54, 75.70it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▌       | 7696/30000 [01:41<04:53, 76.01it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▌       | 7704/30000 [01:42<04:52, 76.16it/s, init loss: 40160.1836, avg. 
loss [6001-7500]: 22087.3301] 26%|██▌       | 7712/30000 [01:42<04:55, 75.30it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▌       | 7720/30000 [01:42<04:54, 75.53it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▌       | 7728/30000 [01:42<04:54, 75.72it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▌       | 7736/30000 [01:42<04:54, 75.53it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▌       | 7744/30000 [01:42<04:52, 76.11it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▌       | 7752/30000 [01:42<04:51, 76.36it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▌       | 7760/30000 [01:42<04:50, 76.51it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▌       | 7768/30000 [01:42<04:48, 76.98it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▌       | 7776/30000 [01:42<04:49, 76.66it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▌       | 7784/30000 [01:43<04:49, 76.78it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▌       | 7792/30000 [01:43<04:50, 76.49it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▌       | 7800/30000 [01:43<04:50, 76.53it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▌       | 7808/30000 [01:43<04:50, 76.30it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▌       | 7816/30000 [01:43<04:50, 76.27it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▌       | 7824/30000 [01:43<04:49, 76.49it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▌       | 7832/30000 [01:43<04:49, 76.57it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▌       | 7840/30000 [01:43<04:51, 76.03it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▌       | 7848/30000 [01:43<04:50, 76.21it/s, init loss: 40160.1836, avg. 
loss [6001-7500]: 22087.3301] 26%|██▌       | 7856/30000 [01:44<04:51, 76.10it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▌       | 7864/30000 [01:44<04:53, 75.34it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▌       | 7872/30000 [01:44<04:51, 75.87it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▋       | 7880/30000 [01:44<04:51, 75.79it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▋       | 7888/30000 [01:44<04:50, 76.17it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▋       | 7896/30000 [01:44<04:49, 76.34it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▋       | 7904/30000 [01:44<04:48, 76.55it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▋       | 7912/30000 [01:44<04:48, 76.47it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▋       | 7920/30000 [01:44<04:47, 76.71it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▋       | 7928/30000 [01:44<04:47, 76.84it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▋       | 7936/30000 [01:45<04:47, 76.77it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 26%|██▋       | 7944/30000 [01:45<04:49, 76.21it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 7952/30000 [01:45<04:48, 76.50it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 7960/30000 [01:45<04:48, 76.36it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 7968/30000 [01:45<04:50, 75.72it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 7976/30000 [01:45<04:52, 75.37it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 7984/30000 [01:45<04:51, 75.64it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 7992/30000 [01:45<04:51, 75.60it/s, init loss: 40160.1836, avg. 
loss [6001-7500]: 22087.3301] 27%|██▋       | 8000/30000 [01:45<04:48, 76.15it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8008/30000 [01:46<04:50, 75.83it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8016/30000 [01:46<04:49, 75.84it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8024/30000 [01:46<04:48, 76.30it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8032/30000 [01:46<04:46, 76.75it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8040/30000 [01:46<04:46, 76.74it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8048/30000 [01:46<04:45, 76.87it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8056/30000 [01:46<04:46, 76.55it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8064/30000 [01:46<04:46, 76.46it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8072/30000 [01:46<04:45, 76.80it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8080/30000 [01:46<04:46, 76.60it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8088/30000 [01:47<04:46, 76.59it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8096/30000 [01:47<04:47, 76.12it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8104/30000 [01:47<04:47, 76.13it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8112/30000 [01:47<04:47, 76.06it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8120/30000 [01:47<04:48, 75.79it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8128/30000 [01:47<04:47, 76.08it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8136/30000 [01:47<04:47, 76.05it/s, init loss: 40160.1836, avg. 
loss [6001-7500]: 22087.3301] 27%|██▋       | 8144/30000 [01:47<04:47, 75.99it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8152/30000 [01:47<04:48, 75.71it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8160/30000 [01:48<04:46, 76.22it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8168/30000 [01:48<04:46, 76.21it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8176/30000 [01:48<04:46, 76.09it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8184/30000 [01:48<04:48, 75.60it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8192/30000 [01:48<04:48, 75.55it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8200/30000 [01:48<04:48, 75.68it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8208/30000 [01:48<04:47, 75.72it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8216/30000 [01:48<04:48, 75.46it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8224/30000 [01:48<04:48, 75.42it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8232/30000 [01:48<04:47, 75.66it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8240/30000 [01:49<04:48, 75.45it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 27%|██▋       | 8248/30000 [01:49<04:49, 75.22it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8256/30000 [01:49<04:47, 75.50it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8264/30000 [01:49<04:47, 75.55it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8272/30000 [01:49<04:47, 75.60it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8280/30000 [01:49<04:46, 75.84it/s, init loss: 40160.1836, avg. 
loss [6001-7500]: 22087.3301] 28%|██▊       | 8288/30000 [01:49<04:44, 76.36it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8296/30000 [01:49<04:44, 76.16it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8304/30000 [01:49<04:44, 76.28it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8312/30000 [01:50<04:44, 76.33it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8320/30000 [01:50<04:47, 75.47it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8328/30000 [01:50<04:45, 75.90it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8336/30000 [01:50<04:45, 76.00it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8344/30000 [01:50<04:44, 76.08it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8352/30000 [01:50<04:44, 76.01it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8360/30000 [01:50<04:45, 75.91it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8368/30000 [01:50<04:43, 76.20it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8376/30000 [01:50<04:43, 76.24it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8384/30000 [01:50<04:43, 76.34it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8392/30000 [01:51<04:43, 76.29it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8400/30000 [01:51<04:46, 75.37it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8408/30000 [01:51<04:46, 75.35it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8416/30000 [01:51<04:46, 75.45it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8424/30000 [01:51<04:45, 75.66it/s, init loss: 40160.1836, avg. 
loss [6001-7500]: 22087.3301] 28%|██▊       | 8432/30000 [01:51<04:44, 75.81it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8440/30000 [01:51<04:44, 75.82it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8448/30000 [01:51<04:44, 75.66it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8456/30000 [01:51<04:44, 75.69it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8464/30000 [01:52<04:43, 75.99it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8472/30000 [01:52<04:45, 75.47it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8480/30000 [01:52<04:43, 75.82it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8488/30000 [01:52<04:43, 75.95it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8496/30000 [01:52<04:43, 75.88it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8504/30000 [01:52<04:41, 76.23it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8512/30000 [01:52<04:41, 76.24it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8520/30000 [01:52<04:41, 76.35it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8528/30000 [01:52<04:42, 76.02it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8536/30000 [01:52<04:42, 76.04it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 28%|██▊       | 8544/30000 [01:53<04:41, 76.25it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 29%|██▊       | 8552/30000 [01:53<04:43, 75.78it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 29%|██▊       | 8560/30000 [01:53<04:43, 75.60it/s, init loss: 40160.1836, avg. loss [6001-7500]: 22087.3301] 29%|██▊       | 8568/30000 [01:53<04:42, 75.95it/s, init loss: 40160.1836, avg. 
loss [6001-7500]: 22087.3301]
30%|███       | 9000/30000 [01:59<04:37, 75.66it/s, init loss: 40160.1836, avg. loss [7501-9000]: 21233.8457]
35%|███▌      | 10504/30000 [02:18<04:15, 76.25it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074]
38%|███▊      | 11528/30000 [02:32<04:03, 76.01it/s, init loss: 40160.1836, avg. 
loss [9001-10500]: 20536.6074] 38%|███▊      | 11536/30000 [02:32<04:02, 76.00it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 38%|███▊      | 11544/30000 [02:32<04:02, 75.98it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▊      | 11552/30000 [02:32<04:03, 75.84it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▊      | 11560/30000 [02:32<04:03, 75.84it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▊      | 11568/30000 [02:32<04:01, 76.38it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▊      | 11576/30000 [02:32<04:01, 76.21it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▊      | 11584/30000 [02:33<04:02, 75.83it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▊      | 11592/30000 [02:33<04:02, 75.88it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▊      | 11600/30000 [02:33<04:02, 75.77it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▊      | 11608/30000 [02:33<04:01, 76.15it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▊      | 11616/30000 [02:33<04:00, 76.51it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▊      | 11624/30000 [02:33<04:00, 76.29it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▉      | 11632/30000 [02:33<04:01, 75.99it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▉      | 11640/30000 [02:33<04:00, 76.27it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▉      | 11648/30000 [02:33<04:01, 76.15it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▉      | 11656/30000 [02:33<04:01, 75.90it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▉      | 11664/30000 [02:34<04:01, 76.04it/s, init loss: 40160.1836, avg. 
loss [9001-10500]: 20536.6074] 39%|███▉      | 11672/30000 [02:34<04:00, 76.21it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▉      | 11680/30000 [02:34<04:00, 76.32it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▉      | 11688/30000 [02:34<03:59, 76.39it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▉      | 11696/30000 [02:34<03:59, 76.46it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▉      | 11704/30000 [02:34<03:58, 76.57it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▉      | 11712/30000 [02:34<03:58, 76.64it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▉      | 11720/30000 [02:34<03:58, 76.73it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▉      | 11728/30000 [02:34<03:58, 76.51it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▉      | 11736/30000 [02:35<03:58, 76.59it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▉      | 11744/30000 [02:35<03:59, 76.31it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▉      | 11752/30000 [02:35<03:59, 76.26it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▉      | 11760/30000 [02:35<03:58, 76.58it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▉      | 11768/30000 [02:35<03:58, 76.55it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▉      | 11776/30000 [02:35<03:57, 76.63it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▉      | 11784/30000 [02:35<03:58, 76.42it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▉      | 11792/30000 [02:35<03:58, 76.19it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▉      | 11800/30000 [02:35<03:58, 76.27it/s, init loss: 40160.1836, avg. 
loss [9001-10500]: 20536.6074] 39%|███▉      | 11808/30000 [02:35<03:58, 76.30it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▉      | 11816/30000 [02:36<03:58, 76.40it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▉      | 11824/30000 [02:36<03:58, 76.18it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▉      | 11832/30000 [02:36<03:59, 75.82it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▉      | 11840/30000 [02:36<03:58, 76.07it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 39%|███▉      | 11848/30000 [02:36<03:58, 76.12it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 40%|███▉      | 11856/30000 [02:36<03:59, 75.88it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 40%|███▉      | 11864/30000 [02:36<03:59, 75.84it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 40%|███▉      | 11872/30000 [02:36<03:59, 75.79it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 40%|███▉      | 11880/30000 [02:36<03:57, 76.16it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 40%|███▉      | 11888/30000 [02:37<03:57, 76.22it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 40%|███▉      | 11896/30000 [02:37<03:58, 75.86it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 40%|███▉      | 11904/30000 [02:37<03:58, 75.84it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 40%|███▉      | 11912/30000 [02:37<03:58, 75.81it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 40%|███▉      | 11920/30000 [02:37<03:58, 75.77it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 40%|███▉      | 11928/30000 [02:37<03:57, 75.95it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 40%|███▉      | 11936/30000 [02:37<03:58, 75.68it/s, init loss: 40160.1836, avg. 
loss [9001-10500]: 20536.6074] 40%|███▉      | 11944/30000 [02:37<03:58, 75.66it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 40%|███▉      | 11952/30000 [02:37<03:58, 75.77it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 40%|███▉      | 11960/30000 [02:37<03:57, 76.02it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 40%|███▉      | 11968/30000 [02:38<03:56, 76.12it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 40%|███▉      | 11976/30000 [02:38<03:56, 76.21it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 40%|███▉      | 11984/30000 [02:38<03:56, 76.30it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 40%|███▉      | 11992/30000 [02:38<03:56, 76.28it/s, init loss: 40160.1836, avg. loss [9001-10500]: 20536.6074] 40%|████      | 12000/30000 [02:38<03:55, 76.32it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 40%|████      | 12008/30000 [02:38<03:55, 76.40it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 40%|████      | 12016/30000 [02:38<03:55, 76.37it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 40%|████      | 12024/30000 [02:38<03:56, 76.12it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 40%|████      | 12032/30000 [02:38<03:55, 76.43it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 40%|████      | 12040/30000 [02:39<03:55, 76.19it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 40%|████      | 12048/30000 [02:39<03:58, 75.26it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 40%|████      | 12056/30000 [02:39<03:57, 75.54it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 40%|████      | 12064/30000 [02:39<03:57, 75.48it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 40%|████      | 12072/30000 [02:39<03:56, 75.75it/s, init loss: 40160.1836, avg. 
loss [10501-12000]: 19957.6445] 40%|████      | 12080/30000 [02:39<03:57, 75.40it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 40%|████      | 12088/30000 [02:39<03:57, 75.55it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 40%|████      | 12096/30000 [02:39<03:57, 75.49it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 40%|████      | 12104/30000 [02:39<03:55, 75.98it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 40%|████      | 12112/30000 [02:39<03:56, 75.51it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 40%|████      | 12120/30000 [02:40<03:55, 75.96it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 40%|████      | 12128/30000 [02:40<03:55, 76.03it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 40%|████      | 12136/30000 [02:40<03:55, 75.98it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 40%|████      | 12144/30000 [02:40<03:54, 76.06it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████      | 12152/30000 [02:40<03:54, 76.04it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████      | 12160/30000 [02:40<03:53, 76.26it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████      | 12168/30000 [02:40<03:55, 75.88it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████      | 12176/30000 [02:40<03:54, 75.96it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████      | 12184/30000 [02:40<03:54, 76.10it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████      | 12192/30000 [02:41<03:55, 75.76it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████      | 12200/30000 [02:41<03:54, 76.06it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████      | 12208/30000 [02:41<03:54, 75.87it/s, init loss: 40160.1836, avg. 
loss [10501-12000]: 19957.6445] 41%|████      | 12216/30000 [02:41<03:54, 75.74it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████      | 12224/30000 [02:41<03:55, 75.59it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████      | 12232/30000 [02:41<03:55, 75.43it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████      | 12240/30000 [02:41<03:56, 74.97it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████      | 12248/30000 [02:41<03:55, 75.50it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████      | 12256/30000 [02:41<03:53, 75.97it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████      | 12264/30000 [02:41<03:53, 76.10it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████      | 12272/30000 [02:42<03:53, 76.04it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████      | 12280/30000 [02:42<03:52, 76.30it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████      | 12288/30000 [02:42<03:51, 76.40it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████      | 12296/30000 [02:42<03:52, 76.21it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████      | 12304/30000 [02:42<03:52, 76.07it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████      | 12312/30000 [02:42<03:51, 76.39it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████      | 12320/30000 [02:42<03:51, 76.28it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████      | 12328/30000 [02:42<03:51, 76.19it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████      | 12336/30000 [02:42<03:50, 76.48it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████      | 12344/30000 [02:43<03:50, 76.62it/s, init loss: 40160.1836, avg. 
loss [10501-12000]: 19957.6445] 41%|████      | 12352/30000 [02:43<03:50, 76.42it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████      | 12360/30000 [02:43<03:50, 76.65it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████      | 12368/30000 [02:43<03:50, 76.58it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████▏     | 12376/30000 [02:43<03:50, 76.48it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████▏     | 12384/30000 [02:43<03:49, 76.84it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████▏     | 12392/30000 [02:43<03:50, 76.51it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████▏     | 12400/30000 [02:43<03:50, 76.34it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████▏     | 12408/30000 [02:43<03:51, 75.84it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████▏     | 12416/30000 [02:43<03:51, 75.95it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████▏     | 12424/30000 [02:44<03:51, 75.91it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████▏     | 12432/30000 [02:44<03:51, 75.79it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████▏     | 12440/30000 [02:44<03:52, 75.63it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 41%|████▏     | 12448/30000 [02:44<03:51, 75.70it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12456/30000 [02:44<03:52, 75.36it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12464/30000 [02:44<03:51, 75.76it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12472/30000 [02:44<03:51, 75.83it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12480/30000 [02:44<03:50, 76.07it/s, init loss: 40160.1836, avg. 
loss [10501-12000]: 19957.6445] 42%|████▏     | 12488/30000 [02:44<03:48, 76.56it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12496/30000 [02:45<03:47, 76.77it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12504/30000 [02:45<03:50, 75.95it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12512/30000 [02:45<03:49, 76.28it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12520/30000 [02:45<03:48, 76.41it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12528/30000 [02:45<03:48, 76.38it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12536/30000 [02:45<03:48, 76.59it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12544/30000 [02:45<03:49, 76.01it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12552/30000 [02:45<03:48, 76.30it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12560/30000 [02:45<03:48, 76.22it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12568/30000 [02:45<03:47, 76.49it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12576/30000 [02:46<03:47, 76.48it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12584/30000 [02:46<03:49, 76.01it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12592/30000 [02:46<03:49, 75.97it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12600/30000 [02:46<03:48, 76.10it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12608/30000 [02:46<03:48, 76.12it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12616/30000 [02:46<03:47, 76.55it/s, init loss: 40160.1836, avg. 
loss [10501-12000]: 19957.6445] 42%|████▏     | 12624/30000 [02:46<03:47, 76.48it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12632/30000 [02:46<03:46, 76.59it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12640/30000 [02:46<03:47, 76.42it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12648/30000 [02:47<03:48, 75.99it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12656/30000 [02:47<03:49, 75.44it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12664/30000 [02:47<03:48, 75.85it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12672/30000 [02:47<03:47, 76.21it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12680/30000 [02:47<03:47, 76.19it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12688/30000 [02:47<03:47, 76.05it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12696/30000 [02:47<03:47, 75.98it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12704/30000 [02:47<03:48, 75.70it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12712/30000 [02:47<03:48, 75.79it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12720/30000 [02:47<03:47, 75.88it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12728/30000 [02:48<03:47, 75.84it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12736/30000 [02:48<03:47, 75.80it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 42%|████▏     | 12744/30000 [02:48<03:47, 75.99it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 12752/30000 [02:48<03:46, 76.03it/s, init loss: 40160.1836, avg. 
loss [10501-12000]: 19957.6445] 43%|████▎     | 12760/30000 [02:48<03:47, 75.74it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 12768/30000 [02:48<03:46, 76.17it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 12776/30000 [02:48<03:45, 76.30it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 12784/30000 [02:48<03:47, 75.73it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 12792/30000 [02:48<03:47, 75.70it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 12800/30000 [02:49<03:47, 75.50it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 12808/30000 [02:49<03:47, 75.49it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 12816/30000 [02:49<03:46, 75.95it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 12824/30000 [02:49<03:45, 76.05it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 12832/30000 [02:49<03:46, 75.89it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 12840/30000 [02:49<03:46, 75.69it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 12848/30000 [02:49<03:46, 75.84it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 12856/30000 [02:49<03:45, 76.08it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 12864/30000 [02:49<03:46, 75.65it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 12872/30000 [02:49<03:45, 75.95it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 12880/30000 [02:50<03:44, 76.32it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 12888/30000 [02:50<03:45, 76.01it/s, init loss: 40160.1836, avg. 
loss [10501-12000]: 19957.6445] 43%|████▎     | 12896/30000 [02:50<03:43, 76.52it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 12904/30000 [02:50<03:44, 76.08it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 12912/30000 [02:50<03:45, 75.72it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 12920/30000 [02:50<03:45, 75.85it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 12928/30000 [02:50<03:44, 76.14it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 12936/30000 [02:50<03:43, 76.34it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 12944/30000 [02:50<03:45, 75.57it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 12952/30000 [02:51<03:45, 75.62it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 12960/30000 [02:51<03:46, 75.26it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 12968/30000 [02:51<03:45, 75.53it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 12976/30000 [02:51<03:46, 75.30it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 12984/30000 [02:51<03:45, 75.38it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 12992/30000 [02:51<03:45, 75.45it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 13000/30000 [02:51<03:43, 75.95it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 13008/30000 [02:51<03:44, 75.82it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 13016/30000 [02:51<03:44, 75.74it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 13024/30000 [02:51<03:44, 75.73it/s, init loss: 40160.1836, avg. 
loss [10501-12000]: 19957.6445] 43%|████▎     | 13032/30000 [02:52<03:43, 75.88it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 13040/30000 [02:52<03:43, 75.87it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 43%|████▎     | 13048/30000 [02:52<03:43, 75.80it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▎     | 13056/30000 [02:52<03:43, 75.98it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▎     | 13064/30000 [02:52<03:42, 76.03it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▎     | 13072/30000 [02:52<03:42, 76.01it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▎     | 13080/30000 [02:52<03:42, 76.11it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▎     | 13088/30000 [02:52<03:42, 76.16it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▎     | 13096/30000 [02:52<03:42, 76.04it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▎     | 13104/30000 [02:53<03:42, 76.02it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▎     | 13112/30000 [02:53<03:43, 75.54it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▎     | 13120/30000 [02:53<03:42, 75.72it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▍     | 13128/30000 [02:53<03:41, 76.06it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▍     | 13136/30000 [02:53<03:40, 76.32it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▍     | 13144/30000 [02:53<03:39, 76.69it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▍     | 13152/30000 [02:53<03:39, 76.69it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▍     | 13160/30000 [02:53<03:40, 76.43it/s, init loss: 40160.1836, avg. 
loss [10501-12000]: 19957.6445] 44%|████▍     | 13168/30000 [02:53<03:39, 76.63it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▍     | 13176/30000 [02:53<03:40, 76.43it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▍     | 13184/30000 [02:54<03:39, 76.48it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▍     | 13192/30000 [02:54<03:41, 75.94it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▍     | 13200/30000 [02:54<03:42, 75.59it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▍     | 13208/30000 [02:54<03:41, 75.73it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▍     | 13216/30000 [02:54<03:40, 76.08it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▍     | 13224/30000 [02:54<03:40, 76.09it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▍     | 13232/30000 [02:54<03:39, 76.29it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▍     | 13240/30000 [02:54<03:39, 76.45it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▍     | 13248/30000 [02:54<03:40, 76.06it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▍     | 13256/30000 [02:55<03:40, 75.81it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▍     | 13264/30000 [02:55<03:40, 75.88it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▍     | 13272/30000 [02:55<03:39, 76.06it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▍     | 13280/30000 [02:55<03:39, 76.23it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▍     | 13288/30000 [02:55<03:40, 75.86it/s, init loss: 40160.1836, avg. loss [10501-12000]: 19957.6445] 44%|████▍     | 13296/30000 [02:55<03:40, 75.76it/s, init loss: 40160.1836, avg. 
loss [10501-12000]: 19957.6445]
45%|████▌     | 13504/30000 [02:58<03:38, 75.60it/s, init loss: 40160.1836, avg. loss [12001-13500]: 19498.0234]
50%|█████     | 15000/30000 [03:17<03:17, 75.95it/s, init loss: 40160.1836, avg. 
loss [13501-15000]: 19131.3457] 54%|█████▍    | 16160/30000 [03:33<03:00, 76.48it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 54%|█████▍    | 16168/30000 [03:33<03:00, 76.46it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 54%|█████▍    | 16176/30000 [03:33<03:00, 76.58it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 54%|█████▍    | 16184/30000 [03:33<03:01, 76.21it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 54%|█████▍    | 16192/30000 [03:33<03:01, 76.28it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 54%|█████▍    | 16200/30000 [03:33<03:01, 76.16it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 54%|█████▍    | 16208/30000 [03:33<03:00, 76.23it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 54%|█████▍    | 16216/30000 [03:33<03:01, 76.15it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 54%|█████▍    | 16224/30000 [03:33<03:00, 76.13it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 54%|█████▍    | 16232/30000 [03:34<03:01, 75.99it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 54%|█████▍    | 16240/30000 [03:34<03:02, 75.44it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 54%|█████▍    | 16248/30000 [03:34<03:02, 75.46it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 54%|█████▍    | 16256/30000 [03:34<03:02, 75.23it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 54%|█████▍    | 16264/30000 [03:34<03:01, 75.55it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 54%|█████▍    | 16272/30000 [03:34<03:01, 75.64it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 54%|█████▍    | 16280/30000 [03:34<03:00, 75.86it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 54%|█████▍    | 16288/30000 [03:34<02:59, 76.44it/s, init loss: 40160.1836, avg. 
loss [13501-15000]: 19131.3457] 54%|█████▍    | 16296/30000 [03:34<02:59, 76.50it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 54%|█████▍    | 16304/30000 [03:34<02:59, 76.19it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 54%|█████▍    | 16312/30000 [03:35<03:00, 75.83it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 54%|█████▍    | 16320/30000 [03:35<03:01, 75.47it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 54%|█████▍    | 16328/30000 [03:35<03:00, 75.75it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 54%|█████▍    | 16336/30000 [03:35<02:59, 75.96it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 54%|█████▍    | 16344/30000 [03:35<02:59, 76.27it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 55%|█████▍    | 16352/30000 [03:35<02:58, 76.30it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 55%|█████▍    | 16360/30000 [03:35<02:59, 75.95it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 55%|█████▍    | 16368/30000 [03:35<02:59, 75.79it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 55%|█████▍    | 16376/30000 [03:35<02:59, 75.86it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 55%|█████▍    | 16384/30000 [03:36<02:58, 76.23it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 55%|█████▍    | 16392/30000 [03:36<02:59, 75.96it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 55%|█████▍    | 16400/30000 [03:36<02:58, 76.19it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 55%|█████▍    | 16408/30000 [03:36<02:58, 76.15it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 55%|█████▍    | 16416/30000 [03:36<02:57, 76.48it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 55%|█████▍    | 16424/30000 [03:36<02:58, 76.24it/s, init loss: 40160.1836, avg. 
loss [13501-15000]: 19131.3457] 55%|█████▍    | 16432/30000 [03:36<02:57, 76.24it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 55%|█████▍    | 16440/30000 [03:36<02:57, 76.42it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 55%|█████▍    | 16448/30000 [03:36<02:58, 76.09it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 55%|█████▍    | 16456/30000 [03:36<02:57, 76.52it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 55%|█████▍    | 16464/30000 [03:37<02:56, 76.66it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 55%|█████▍    | 16472/30000 [03:37<02:57, 76.26it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 55%|█████▍    | 16480/30000 [03:37<02:56, 76.47it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 55%|█████▍    | 16488/30000 [03:37<02:56, 76.45it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 55%|█████▍    | 16496/30000 [03:37<02:56, 76.40it/s, init loss: 40160.1836, avg. loss [13501-15000]: 19131.3457] 55%|█████▌    | 16504/30000 [03:37<02:56, 76.25it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 55%|█████▌    | 16512/30000 [03:37<02:56, 76.61it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 55%|█████▌    | 16520/30000 [03:37<02:56, 76.51it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 55%|█████▌    | 16528/30000 [03:37<02:55, 76.75it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 55%|█████▌    | 16536/30000 [03:38<02:55, 76.88it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 55%|█████▌    | 16544/30000 [03:38<02:55, 76.83it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 55%|█████▌    | 16552/30000 [03:38<02:55, 76.45it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 55%|█████▌    | 16560/30000 [03:38<02:55, 76.55it/s, init loss: 40160.1836, avg. 
loss [15001-16500]: 18846.8691] 55%|█████▌    | 16568/30000 [03:38<02:55, 76.72it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 55%|█████▌    | 16576/30000 [03:38<02:55, 76.62it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 55%|█████▌    | 16584/30000 [03:38<02:54, 76.92it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 55%|█████▌    | 16592/30000 [03:38<02:54, 76.69it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 55%|█████▌    | 16600/30000 [03:38<02:54, 76.67it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 55%|█████▌    | 16608/30000 [03:38<02:54, 76.78it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 55%|█████▌    | 16616/30000 [03:39<02:53, 76.97it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 55%|█████▌    | 16624/30000 [03:39<02:53, 77.01it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 55%|█████▌    | 16632/30000 [03:39<02:53, 77.13it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 55%|█████▌    | 16640/30000 [03:39<02:53, 77.15it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 55%|█████▌    | 16648/30000 [03:39<02:53, 77.10it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▌    | 16656/30000 [03:39<02:53, 76.74it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▌    | 16664/30000 [03:39<02:53, 76.96it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▌    | 16672/30000 [03:39<02:53, 76.78it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▌    | 16680/30000 [03:39<02:53, 76.66it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▌    | 16688/30000 [03:40<02:53, 76.65it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▌    | 16696/30000 [03:40<02:54, 76.17it/s, init loss: 40160.1836, avg. 
loss [15001-16500]: 18846.8691] 56%|█████▌    | 16704/30000 [03:40<02:54, 76.11it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▌    | 16712/30000 [03:40<02:53, 76.64it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▌    | 16720/30000 [03:40<02:53, 76.56it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▌    | 16728/30000 [03:40<02:53, 76.32it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▌    | 16736/30000 [03:40<02:52, 76.79it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▌    | 16744/30000 [03:40<02:52, 76.80it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▌    | 16752/30000 [03:40<02:53, 76.21it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▌    | 16760/30000 [03:40<02:53, 76.33it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▌    | 16768/30000 [03:41<02:53, 76.10it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▌    | 16776/30000 [03:41<02:52, 76.54it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▌    | 16784/30000 [03:41<02:52, 76.67it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▌    | 16792/30000 [03:41<02:52, 76.61it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▌    | 16800/30000 [03:41<02:52, 76.40it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▌    | 16808/30000 [03:41<02:51, 76.72it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▌    | 16816/30000 [03:41<02:50, 77.15it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▌    | 16824/30000 [03:41<02:51, 76.92it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▌    | 16832/30000 [03:41<02:50, 77.10it/s, init loss: 40160.1836, avg. 
loss [15001-16500]: 18846.8691] 56%|█████▌    | 16840/30000 [03:42<02:51, 76.96it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▌    | 16848/30000 [03:42<02:50, 77.06it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▌    | 16856/30000 [03:42<02:50, 76.89it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▌    | 16864/30000 [03:42<02:51, 76.82it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▌    | 16872/30000 [03:42<02:50, 77.13it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▋    | 16880/30000 [03:42<02:50, 77.10it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▋    | 16888/30000 [03:42<02:49, 77.16it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▋    | 16896/30000 [03:42<02:49, 77.19it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▋    | 16904/30000 [03:42<02:50, 76.80it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▋    | 16912/30000 [03:42<02:49, 77.14it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▋    | 16920/30000 [03:43<02:49, 77.07it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▋    | 16928/30000 [03:43<02:49, 76.97it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▋    | 16936/30000 [03:43<02:49, 77.05it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 56%|█████▋    | 16944/30000 [03:43<02:49, 77.09it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 16952/30000 [03:43<02:48, 77.23it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 16960/30000 [03:43<02:49, 76.87it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 16968/30000 [03:43<02:49, 76.71it/s, init loss: 40160.1836, avg. 
loss [15001-16500]: 18846.8691] 57%|█████▋    | 16976/30000 [03:43<02:50, 76.20it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 16984/30000 [03:43<02:50, 76.56it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 16992/30000 [03:43<02:49, 76.70it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17000/30000 [03:44<02:49, 76.67it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17008/30000 [03:44<02:50, 76.21it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17016/30000 [03:44<02:49, 76.57it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17024/30000 [03:44<02:48, 76.86it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17032/30000 [03:44<02:49, 76.45it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17040/30000 [03:44<02:48, 76.95it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17048/30000 [03:44<02:47, 77.12it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17056/30000 [03:44<02:48, 77.01it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17064/30000 [03:44<02:47, 77.27it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17072/30000 [03:45<02:47, 77.11it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17080/30000 [03:45<02:47, 76.97it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17088/30000 [03:45<02:47, 77.07it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17096/30000 [03:45<02:47, 77.25it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17104/30000 [03:45<02:47, 76.78it/s, init loss: 40160.1836, avg. 
loss [15001-16500]: 18846.8691] 57%|█████▋    | 17112/30000 [03:45<02:47, 77.01it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17120/30000 [03:45<02:46, 77.31it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17128/30000 [03:45<02:47, 76.92it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17136/30000 [03:45<02:46, 77.21it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17144/30000 [03:45<02:47, 76.87it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17152/30000 [03:46<02:47, 76.92it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17160/30000 [03:46<02:46, 76.97it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17168/30000 [03:46<02:48, 76.37it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17176/30000 [03:46<02:46, 76.79it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17184/30000 [03:46<02:46, 76.83it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17192/30000 [03:46<02:46, 76.83it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17200/30000 [03:46<02:46, 76.74it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17208/30000 [03:46<02:47, 76.57it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17216/30000 [03:46<02:47, 76.19it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17224/30000 [03:46<02:46, 76.61it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17232/30000 [03:47<02:47, 76.37it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 57%|█████▋    | 17240/30000 [03:47<02:46, 76.44it/s, init loss: 40160.1836, avg. 
loss [15001-16500]: 18846.8691] 57%|█████▋    | 17248/30000 [03:47<02:46, 76.54it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17256/30000 [03:47<02:45, 77.03it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17264/30000 [03:47<02:45, 77.11it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17272/30000 [03:47<02:44, 77.19it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17280/30000 [03:47<02:45, 76.69it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17288/30000 [03:47<02:45, 76.59it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17296/30000 [03:47<02:45, 76.67it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17304/30000 [03:48<02:45, 76.61it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17312/30000 [03:48<02:45, 76.71it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17320/30000 [03:48<02:44, 76.93it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17328/30000 [03:48<02:44, 77.12it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17336/30000 [03:48<02:44, 77.21it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17344/30000 [03:48<02:43, 77.26it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17352/30000 [03:48<02:45, 76.56it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17360/30000 [03:48<02:44, 77.01it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17368/30000 [03:48<02:44, 76.93it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17376/30000 [03:48<02:44, 76.95it/s, init loss: 40160.1836, avg. 
loss [15001-16500]: 18846.8691] 58%|█████▊    | 17384/30000 [03:49<02:44, 76.50it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17392/30000 [03:49<02:45, 76.21it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17400/30000 [03:49<02:44, 76.39it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17408/30000 [03:49<02:43, 76.89it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17416/30000 [03:49<02:43, 76.84it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17424/30000 [03:49<02:44, 76.62it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17432/30000 [03:49<02:43, 76.90it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17440/30000 [03:49<02:43, 76.71it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17448/30000 [03:49<02:43, 76.95it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17456/30000 [03:50<02:43, 76.82it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17464/30000 [03:50<02:43, 76.74it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17472/30000 [03:50<02:42, 76.93it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17480/30000 [03:50<02:42, 77.01it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17488/30000 [03:50<02:42, 77.00it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17496/30000 [03:50<02:42, 76.90it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17504/30000 [03:50<02:42, 76.93it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17512/30000 [03:50<02:42, 76.79it/s, init loss: 40160.1836, avg. 
loss [15001-16500]: 18846.8691] 58%|█████▊    | 17520/30000 [03:50<02:41, 77.25it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17528/30000 [03:50<02:41, 77.12it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17536/30000 [03:51<02:41, 77.27it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 58%|█████▊    | 17544/30000 [03:51<02:41, 77.15it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▊    | 17552/30000 [03:51<02:41, 77.10it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▊    | 17560/30000 [03:51<02:41, 76.94it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▊    | 17568/30000 [03:51<02:41, 77.14it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▊    | 17576/30000 [03:51<02:40, 77.59it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▊    | 17584/30000 [03:51<02:39, 77.74it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▊    | 17592/30000 [03:51<02:39, 77.83it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▊    | 17600/30000 [03:51<02:40, 77.18it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▊    | 17608/30000 [03:51<02:40, 77.19it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▊    | 17616/30000 [03:52<02:40, 77.33it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▊    | 17624/30000 [03:52<02:40, 76.95it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▉    | 17632/30000 [03:52<02:40, 77.07it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▉    | 17640/30000 [03:52<02:40, 76.83it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▉    | 17648/30000 [03:52<02:40, 76.85it/s, init loss: 40160.1836, avg. 
loss [15001-16500]: 18846.8691] 59%|█████▉    | 17656/30000 [03:52<02:40, 76.71it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▉    | 17664/30000 [03:52<02:41, 76.47it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▉    | 17672/30000 [03:52<02:40, 76.80it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▉    | 17680/30000 [03:52<02:40, 76.67it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▉    | 17688/30000 [03:53<02:41, 76.32it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▉    | 17696/30000 [03:53<02:42, 75.92it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▉    | 17704/30000 [03:53<02:40, 76.49it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▉    | 17712/30000 [03:53<02:40, 76.61it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▉    | 17720/30000 [03:53<02:39, 76.84it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▉    | 17728/30000 [03:53<02:39, 76.80it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▉    | 17736/30000 [03:53<02:39, 77.12it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▉    | 17744/30000 [03:53<02:38, 77.40it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▉    | 17752/30000 [03:53<02:38, 77.06it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▉    | 17760/30000 [03:53<02:39, 76.93it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▉    | 17768/30000 [03:54<02:39, 76.69it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▉    | 17776/30000 [03:54<02:40, 76.29it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▉    | 17784/30000 [03:54<02:40, 76.32it/s, init loss: 40160.1836, avg. 
loss [15001-16500]: 18846.8691] 59%|█████▉    | 17792/30000 [03:54<02:39, 76.49it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▉    | 17800/30000 [03:54<02:39, 76.54it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▉    | 17808/30000 [03:54<02:39, 76.58it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▉    | 17816/30000 [03:54<02:38, 76.81it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▉    | 17824/30000 [03:54<02:38, 76.66it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▉    | 17832/30000 [03:54<02:38, 76.97it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▉    | 17840/30000 [03:55<02:37, 77.00it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 59%|█████▉    | 17848/30000 [03:55<02:38, 76.70it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 60%|█████▉    | 17856/30000 [03:55<02:38, 76.83it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 60%|█████▉    | 17864/30000 [03:55<02:37, 76.84it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 60%|█████▉    | 17872/30000 [03:55<02:38, 76.74it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 60%|█████▉    | 17880/30000 [03:55<02:38, 76.45it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 60%|█████▉    | 17888/30000 [03:55<02:37, 76.72it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 60%|█████▉    | 17896/30000 [03:55<02:37, 76.85it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 60%|█████▉    | 17904/30000 [03:55<02:38, 76.54it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 60%|█████▉    | 17912/30000 [03:55<02:38, 76.47it/s, init loss: 40160.1836, avg. loss [15001-16500]: 18846.8691] 60%|█████▉    | 17920/30000 [03:56<02:37, 76.51it/s, init loss: 40160.1836, avg. 
loss [15001-16500]: 18846.8691] 60%|██████    | 18000/30000 [03:57<02:37, 76.20it/s, init loss: 40160.1836, avg. loss [16501-18000]: 18625.4531] 65%|██████▌   | 19504/30000 [04:16<02:16, 76.65it/s, init loss: 40160.1836, avg. loss [18001-19500]: 18451.4219] 69%|██████▉   | 20776/30000 [04:33<01:58, 77.55it/s, init loss: 40160.1836, avg. 
loss [18001-19500]: 18451.4219] 69%|██████▉   | 20784/30000 [04:33<01:58, 77.47it/s, init loss: 40160.1836, avg. loss [18001-19500]: 18451.4219] 69%|██████▉   | 20792/30000 [04:33<01:58, 77.77it/s, init loss: 40160.1836, avg. loss [18001-19500]: 18451.4219] 69%|██████▉   | 20800/30000 [04:33<01:58, 77.70it/s, init loss: 40160.1836, avg. loss [18001-19500]: 18451.4219] 69%|██████▉   | 20808/30000 [04:33<01:58, 77.26it/s, init loss: 40160.1836, avg. loss [18001-19500]: 18451.4219] 69%|██████▉   | 20816/30000 [04:33<01:59, 77.05it/s, init loss: 40160.1836, avg. loss [18001-19500]: 18451.4219] 69%|██████▉   | 20824/30000 [04:33<01:59, 77.08it/s, init loss: 40160.1836, avg. loss [18001-19500]: 18451.4219] 69%|██████▉   | 20832/30000 [04:33<01:58, 77.18it/s, init loss: 40160.1836, avg. loss [18001-19500]: 18451.4219] 69%|██████▉   | 20840/30000 [04:34<01:59, 76.91it/s, init loss: 40160.1836, avg. loss [18001-19500]: 18451.4219] 69%|██████▉   | 20848/30000 [04:34<01:58, 77.08it/s, init loss: 40160.1836, avg. loss [18001-19500]: 18451.4219] 70%|██████▉   | 20856/30000 [04:34<01:58, 77.09it/s, init loss: 40160.1836, avg. loss [18001-19500]: 18451.4219] 70%|██████▉   | 20864/30000 [04:34<01:58, 77.24it/s, init loss: 40160.1836, avg. loss [18001-19500]: 18451.4219] 70%|██████▉   | 20872/30000 [04:34<01:57, 77.37it/s, init loss: 40160.1836, avg. loss [18001-19500]: 18451.4219] 70%|██████▉   | 20880/30000 [04:34<01:57, 77.52it/s, init loss: 40160.1836, avg. loss [18001-19500]: 18451.4219] 70%|██████▉   | 20888/30000 [04:34<01:58, 77.19it/s, init loss: 40160.1836, avg. loss [18001-19500]: 18451.4219] 70%|██████▉   | 20896/30000 [04:34<01:57, 77.19it/s, init loss: 40160.1836, avg. loss [18001-19500]: 18451.4219] 70%|██████▉   | 20904/30000 [04:34<01:57, 77.21it/s, init loss: 40160.1836, avg. loss [18001-19500]: 18451.4219] 70%|██████▉   | 20912/30000 [04:34<01:57, 77.12it/s, init loss: 40160.1836, avg. 
loss [18001-19500]: 18451.4219] 70%|██████▉   | 20920/30000 [04:35<01:57, 77.23it/s, init loss: 40160.1836, avg. loss [18001-19500]: 18451.4219] 70%|██████▉   | 20928/30000 [04:35<01:57, 76.95it/s, init loss: 40160.1836, avg. loss [18001-19500]: 18451.4219] 70%|██████▉   | 20936/30000 [04:35<01:57, 77.14it/s, init loss: 40160.1836, avg. loss [18001-19500]: 18451.4219] 70%|██████▉   | 20944/30000 [04:35<01:57, 77.12it/s, init loss: 40160.1836, avg. loss [18001-19500]: 18451.4219] 70%|██████▉   | 20952/30000 [04:35<01:56, 77.39it/s, init loss: 40160.1836, avg. loss [18001-19500]: 18451.4219] 70%|██████▉   | 20960/30000 [04:35<01:57, 77.21it/s, init loss: 40160.1836, avg. loss [18001-19500]: 18451.4219] 70%|██████▉   | 20968/30000 [04:35<01:57, 76.80it/s, init loss: 40160.1836, avg. loss [18001-19500]: 18451.4219] 70%|██████▉   | 20976/30000 [04:35<01:57, 76.81it/s, init loss: 40160.1836, avg. loss [18001-19500]: 18451.4219] 70%|██████▉   | 20984/30000 [04:35<01:57, 77.04it/s, init loss: 40160.1836, avg. loss [18001-19500]: 18451.4219] 70%|██████▉   | 20992/30000 [04:35<01:57, 76.88it/s, init loss: 40160.1836, avg. loss [18001-19500]: 18451.4219] 70%|███████   | 21000/30000 [04:36<01:57, 76.50it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 70%|███████   | 21008/30000 [04:36<01:57, 76.56it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 70%|███████   | 21016/30000 [04:36<01:57, 76.63it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 70%|███████   | 21024/30000 [04:36<01:56, 77.07it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 70%|███████   | 21032/30000 [04:36<01:56, 76.92it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 70%|███████   | 21040/30000 [04:36<01:56, 76.95it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 70%|███████   | 21048/30000 [04:36<01:56, 77.02it/s, init loss: 40160.1836, avg. 
loss [19501-21000]: 18321.1660] 70%|███████   | 21056/30000 [04:36<01:56, 76.84it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 70%|███████   | 21064/30000 [04:36<01:56, 76.51it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 70%|███████   | 21072/30000 [04:37<01:56, 76.42it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 70%|███████   | 21080/30000 [04:37<01:57, 76.14it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 70%|███████   | 21088/30000 [04:37<01:56, 76.30it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 70%|███████   | 21096/30000 [04:37<01:56, 76.62it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 70%|███████   | 21104/30000 [04:37<01:55, 76.83it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 70%|███████   | 21112/30000 [04:37<01:55, 77.12it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 70%|███████   | 21120/30000 [04:37<01:55, 77.11it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 70%|███████   | 21128/30000 [04:37<01:55, 76.97it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 70%|███████   | 21136/30000 [04:37<01:55, 77.03it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 70%|███████   | 21144/30000 [04:37<01:55, 76.71it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████   | 21152/30000 [04:38<01:54, 77.27it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████   | 21160/30000 [04:38<01:54, 77.15it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████   | 21168/30000 [04:38<01:54, 77.33it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████   | 21176/30000 [04:38<01:53, 77.41it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████   | 21184/30000 [04:38<01:54, 77.32it/s, init loss: 40160.1836, avg. 
loss [19501-21000]: 18321.1660] 71%|███████   | 21192/30000 [04:38<01:54, 76.83it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████   | 21200/30000 [04:38<01:54, 76.91it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████   | 21208/30000 [04:38<01:54, 76.99it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████   | 21216/30000 [04:38<01:53, 77.12it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████   | 21224/30000 [04:39<01:54, 76.48it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████   | 21232/30000 [04:39<01:55, 76.10it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████   | 21240/30000 [04:39<01:54, 76.32it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████   | 21248/30000 [04:39<01:54, 76.49it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████   | 21256/30000 [04:39<01:53, 77.07it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████   | 21264/30000 [04:39<01:54, 76.50it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████   | 21272/30000 [04:39<01:53, 76.89it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████   | 21280/30000 [04:39<01:52, 77.35it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████   | 21288/30000 [04:39<01:52, 77.43it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████   | 21296/30000 [04:39<01:52, 77.50it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████   | 21304/30000 [04:40<01:52, 77.22it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████   | 21312/30000 [04:40<01:52, 77.11it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████   | 21320/30000 [04:40<01:52, 77.44it/s, init loss: 40160.1836, avg. 
loss [19501-21000]: 18321.1660] 71%|███████   | 21328/30000 [04:40<01:51, 77.52it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████   | 21336/30000 [04:40<01:51, 77.52it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████   | 21344/30000 [04:40<01:52, 77.05it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████   | 21352/30000 [04:40<01:52, 76.93it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████   | 21360/30000 [04:40<01:51, 77.15it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████   | 21368/30000 [04:40<01:52, 76.55it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████▏  | 21376/30000 [04:40<01:52, 76.53it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████▏  | 21384/30000 [04:41<01:52, 76.73it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████▏  | 21392/30000 [04:41<01:52, 76.48it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████▏  | 21400/30000 [04:41<01:52, 76.76it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████▏  | 21408/30000 [04:41<01:52, 76.68it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████▏  | 21416/30000 [04:41<01:51, 76.87it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████▏  | 21424/30000 [04:41<01:51, 76.77it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████▏  | 21432/30000 [04:41<01:52, 76.31it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████▏  | 21440/30000 [04:41<01:51, 76.54it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 71%|███████▏  | 21448/30000 [04:41<01:51, 76.68it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21456/30000 [04:42<01:51, 76.57it/s, init loss: 40160.1836, avg. 
loss [19501-21000]: 18321.1660] 72%|███████▏  | 21464/30000 [04:42<01:52, 76.17it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21472/30000 [04:42<01:51, 76.36it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21480/30000 [04:42<01:51, 76.67it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21488/30000 [04:42<01:50, 77.32it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21496/30000 [04:42<01:50, 76.85it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21504/30000 [04:42<01:50, 76.86it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21512/30000 [04:42<01:49, 77.49it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21520/30000 [04:42<01:49, 77.29it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21528/30000 [04:42<01:49, 77.59it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21536/30000 [04:43<01:49, 77.15it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21544/30000 [04:43<01:50, 76.84it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21552/30000 [04:43<01:49, 76.96it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21560/30000 [04:43<01:49, 76.96it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21568/30000 [04:43<01:49, 77.35it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21576/30000 [04:43<01:48, 77.57it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21584/30000 [04:43<01:48, 77.29it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21592/30000 [04:43<01:48, 77.31it/s, init loss: 40160.1836, avg. 
loss [19501-21000]: 18321.1660] 72%|███████▏  | 21600/30000 [04:43<01:48, 77.32it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21608/30000 [04:44<01:48, 77.14it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21616/30000 [04:44<01:48, 77.07it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21624/30000 [04:44<01:48, 77.23it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21632/30000 [04:44<01:48, 77.01it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21640/30000 [04:44<01:48, 76.78it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21648/30000 [04:44<01:47, 77.46it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21656/30000 [04:44<01:47, 77.49it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21664/30000 [04:44<01:47, 77.42it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21672/30000 [04:44<01:48, 77.09it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21680/30000 [04:44<01:47, 77.10it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21688/30000 [04:45<01:48, 76.66it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21696/30000 [04:45<01:48, 76.31it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21704/30000 [04:45<01:48, 76.49it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21712/30000 [04:45<01:48, 76.31it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21720/30000 [04:45<01:48, 76.41it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21728/30000 [04:45<01:47, 76.88it/s, init loss: 40160.1836, avg. 
loss [19501-21000]: 18321.1660] 72%|███████▏  | 21736/30000 [04:45<01:47, 77.23it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 72%|███████▏  | 21744/30000 [04:45<01:47, 77.01it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21752/30000 [04:45<01:46, 77.14it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21760/30000 [04:45<01:47, 76.98it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21768/30000 [04:46<01:46, 77.09it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21776/30000 [04:46<01:46, 77.02it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21784/30000 [04:46<01:46, 77.38it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21792/30000 [04:46<01:46, 77.30it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21800/30000 [04:46<01:46, 77.06it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21808/30000 [04:46<01:45, 77.36it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21816/30000 [04:46<01:45, 77.68it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21824/30000 [04:46<01:46, 76.94it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21832/30000 [04:46<01:45, 77.23it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21840/30000 [04:47<01:45, 77.16it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21848/30000 [04:47<01:46, 76.83it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21856/30000 [04:47<01:45, 77.06it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21864/30000 [04:47<01:45, 77.08it/s, init loss: 40160.1836, avg. 
loss [19501-21000]: 18321.1660] 73%|███████▎  | 21872/30000 [04:47<01:46, 76.65it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21880/30000 [04:47<01:45, 76.86it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21888/30000 [04:47<01:45, 77.19it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21896/30000 [04:47<01:45, 77.13it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21904/30000 [04:47<01:44, 77.16it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21912/30000 [04:47<01:44, 77.27it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21920/30000 [04:48<01:44, 77.13it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21928/30000 [04:48<01:44, 77.31it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21936/30000 [04:48<01:44, 77.26it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21944/30000 [04:48<01:44, 76.98it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21952/30000 [04:48<01:44, 76.94it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21960/30000 [04:48<01:44, 77.22it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21968/30000 [04:48<01:44, 76.95it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21976/30000 [04:48<01:43, 77.25it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21984/30000 [04:48<01:43, 77.44it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 21992/30000 [04:48<01:43, 77.23it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 22000/30000 [04:49<01:43, 77.00it/s, init loss: 40160.1836, avg. 
loss [19501-21000]: 18321.1660] 73%|███████▎  | 22008/30000 [04:49<01:44, 76.78it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 22016/30000 [04:49<01:43, 76.82it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 22024/30000 [04:49<01:43, 76.95it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 22032/30000 [04:49<01:43, 77.08it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 22040/30000 [04:49<01:43, 77.18it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 73%|███████▎  | 22048/30000 [04:49<01:43, 77.08it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▎  | 22056/30000 [04:49<01:43, 77.03it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▎  | 22064/30000 [04:49<01:42, 77.12it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▎  | 22072/30000 [04:50<01:42, 76.99it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▎  | 22080/30000 [04:50<01:43, 76.81it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▎  | 22088/30000 [04:50<01:42, 77.22it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▎  | 22096/30000 [04:50<01:42, 76.74it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▎  | 22104/30000 [04:50<01:42, 77.10it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▎  | 22112/30000 [04:50<01:42, 77.30it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▎  | 22120/30000 [04:50<01:42, 77.01it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▍  | 22128/30000 [04:50<01:41, 77.24it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▍  | 22136/30000 [04:50<01:41, 77.63it/s, init loss: 40160.1836, avg. 
loss [19501-21000]: 18321.1660] 74%|███████▍  | 22144/30000 [04:50<01:41, 77.69it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▍  | 22152/30000 [04:51<01:41, 77.34it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▍  | 22160/30000 [04:51<01:42, 76.72it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▍  | 22168/30000 [04:51<01:41, 77.06it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▍  | 22176/30000 [04:51<01:42, 76.65it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▍  | 22184/30000 [04:51<01:42, 76.54it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▍  | 22192/30000 [04:51<01:41, 76.58it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▍  | 22200/30000 [04:51<01:41, 77.02it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▍  | 22208/30000 [04:51<01:41, 76.50it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▍  | 22216/30000 [04:51<01:41, 76.52it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▍  | 22224/30000 [04:52<01:41, 76.86it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▍  | 22232/30000 [04:52<01:41, 76.90it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▍  | 22240/30000 [04:52<01:41, 76.73it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▍  | 22248/30000 [04:52<01:41, 76.75it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▍  | 22256/30000 [04:52<01:40, 76.81it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▍  | 22264/30000 [04:52<01:40, 77.09it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▍  | 22272/30000 [04:52<01:40, 76.85it/s, init loss: 40160.1836, avg. 
loss [19501-21000]: 18321.1660] 74%|███████▍  | 22280/30000 [04:52<01:40, 76.70it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▍  | 22288/30000 [04:52<01:41, 76.23it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▍  | 22296/30000 [04:52<01:40, 76.82it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▍  | 22304/30000 [04:53<01:39, 76.98it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▍  | 22312/30000 [04:53<01:40, 76.84it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▍  | 22320/30000 [04:53<01:39, 76.88it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▍  | 22328/30000 [04:53<01:39, 77.01it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▍  | 22336/30000 [04:53<01:39, 77.08it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 74%|███████▍  | 22344/30000 [04:53<01:39, 76.73it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 75%|███████▍  | 22352/30000 [04:53<01:39, 77.01it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 75%|███████▍  | 22360/30000 [04:53<01:39, 77.10it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 75%|███████▍  | 22368/30000 [04:53<01:39, 76.90it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 75%|███████▍  | 22376/30000 [04:53<01:38, 77.38it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 75%|███████▍  | 22384/30000 [04:54<01:38, 77.09it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 75%|███████▍  | 22392/30000 [04:54<01:38, 77.01it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 75%|███████▍  | 22400/30000 [04:54<01:38, 77.16it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 75%|███████▍  | 22408/30000 [04:54<01:38, 76.95it/s, init loss: 40160.1836, avg. 
loss [19501-21000]: 18321.1660] 75%|███████▍  | 22416/30000 [04:54<01:39, 76.59it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 75%|███████▍  | 22424/30000 [04:54<01:38, 76.97it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 75%|███████▍  | 22432/30000 [04:54<01:37, 77.27it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 75%|███████▍  | 22440/30000 [04:54<01:37, 77.37it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 75%|███████▍  | 22448/30000 [04:54<01:37, 77.39it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 75%|███████▍  | 22456/30000 [04:55<01:37, 77.30it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 75%|███████▍  | 22464/30000 [04:55<01:37, 77.23it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 75%|███████▍  | 22472/30000 [04:55<01:37, 77.23it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 75%|███████▍  | 22480/30000 [04:55<01:37, 76.93it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 75%|███████▍  | 22488/30000 [04:55<01:37, 77.25it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 75%|███████▍  | 22496/30000 [04:55<01:37, 77.09it/s, init loss: 40160.1836, avg. loss [19501-21000]: 18321.1660] 75%|███████▌  | 22504/30000 [04:55<01:38, 76.40it/s, init loss: 40160.1836, avg. loss [21001-22500]: 18216.8105] 75%|███████▌  | 22512/30000 [04:55<01:37, 76.63it/s, init loss: 40160.1836, avg. loss [21001-22500]: 18216.8105] 75%|███████▌  | 22520/30000 [04:55<01:36, 77.20it/s, init loss: 40160.1836, avg. loss [21001-22500]: 18216.8105] 75%|███████▌  | 22528/30000 [04:55<01:37, 76.61it/s, init loss: 40160.1836, avg. loss [21001-22500]: 18216.8105] 75%|███████▌  | 22536/30000 [04:56<01:37, 76.56it/s, init loss: 40160.1836, avg. loss [21001-22500]: 18216.8105] 75%|███████▌  | 22544/30000 [04:56<01:37, 76.64it/s, init loss: 40160.1836, avg. 
loss [22501-24000]: 18140.5430] 85%|████████▍ | 25408/30000 [05:33<00:59, 76.57it/s, init loss: 40160.1836, avg. loss [22501-24000]: 18140.5430] 85%|████████▍ | 25416/30000 [05:33<00:59, 76.42it/s, init loss: 40160.1836, avg. loss [22501-24000]: 18140.5430] 85%|████████▍ | 25424/30000 [05:33<00:59, 76.71it/s, init loss: 40160.1836, avg. loss [22501-24000]: 18140.5430] 85%|████████▍ | 25432/30000 [05:33<00:59, 76.96it/s, init loss: 40160.1836, avg. loss [22501-24000]: 18140.5430] 85%|████████▍ | 25440/30000 [05:33<00:59, 77.03it/s, init loss: 40160.1836, avg. loss [22501-24000]: 18140.5430] 85%|████████▍ | 25448/30000 [05:33<00:59, 76.78it/s, init loss: 40160.1836, avg. loss [22501-24000]: 18140.5430] 85%|████████▍ | 25456/30000 [05:33<00:59, 76.97it/s, init loss: 40160.1836, avg. loss [22501-24000]: 18140.5430] 85%|████████▍ | 25464/30000 [05:34<00:58, 77.37it/s, init loss: 40160.1836, avg. loss [22501-24000]: 18140.5430] 85%|████████▍ | 25472/30000 [05:34<00:58, 77.08it/s, init loss: 40160.1836, avg. loss [22501-24000]: 18140.5430] 85%|████████▍ | 25480/30000 [05:34<00:58, 77.02it/s, init loss: 40160.1836, avg. loss [22501-24000]: 18140.5430] 85%|████████▍ | 25488/30000 [05:34<00:58, 77.40it/s, init loss: 40160.1836, avg. loss [22501-24000]: 18140.5430] 85%|████████▍ | 25496/30000 [05:34<00:58, 77.04it/s, init loss: 40160.1836, avg. loss [22501-24000]: 18140.5430] 85%|████████▌ | 25504/30000 [05:34<00:58, 76.88it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 85%|████████▌ | 25512/30000 [05:34<00:58, 76.73it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 85%|████████▌ | 25520/30000 [05:34<00:58, 76.81it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 85%|████████▌ | 25528/30000 [05:34<00:57, 77.12it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 85%|████████▌ | 25536/30000 [05:35<00:58, 76.74it/s, init loss: 40160.1836, avg. 
loss [24001-25500]: 18080.8086] 85%|████████▌ | 25544/30000 [05:35<00:58, 76.08it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 85%|████████▌ | 25552/30000 [05:35<00:58, 76.44it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 85%|████████▌ | 25560/30000 [05:35<00:58, 76.53it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 85%|████████▌ | 25568/30000 [05:35<00:57, 76.63it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 85%|████████▌ | 25576/30000 [05:35<00:57, 76.48it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 85%|████████▌ | 25584/30000 [05:35<00:57, 76.84it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 85%|████████▌ | 25592/30000 [05:35<00:57, 77.00it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 85%|████████▌ | 25600/30000 [05:35<00:57, 77.14it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 85%|████████▌ | 25608/30000 [05:35<00:56, 77.25it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 85%|████████▌ | 25616/30000 [05:36<00:56, 77.39it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 85%|████████▌ | 25624/30000 [05:36<00:56, 76.89it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 85%|████████▌ | 25632/30000 [05:36<00:56, 77.26it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 85%|████████▌ | 25640/30000 [05:36<00:56, 77.45it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 85%|████████▌ | 25648/30000 [05:36<00:56, 76.72it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▌ | 25656/30000 [05:36<00:56, 76.65it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▌ | 25664/30000 [05:36<00:56, 76.89it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▌ | 25672/30000 [05:36<00:56, 76.37it/s, init loss: 40160.1836, avg. 
loss [24001-25500]: 18080.8086] 86%|████████▌ | 25680/30000 [05:36<00:56, 76.48it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▌ | 25688/30000 [05:36<00:56, 76.80it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▌ | 25696/30000 [05:37<00:56, 76.80it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▌ | 25704/30000 [05:37<00:55, 76.90it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▌ | 25712/30000 [05:37<00:55, 77.09it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▌ | 25720/30000 [05:37<00:55, 77.02it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▌ | 25728/30000 [05:37<00:55, 76.91it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▌ | 25736/30000 [05:37<00:55, 77.00it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▌ | 25744/30000 [05:37<00:55, 77.30it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▌ | 25752/30000 [05:37<00:54, 77.28it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▌ | 25760/30000 [05:37<00:55, 77.07it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▌ | 25768/30000 [05:38<00:54, 76.97it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▌ | 25776/30000 [05:38<00:55, 76.41it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▌ | 25784/30000 [05:38<00:54, 76.86it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▌ | 25792/30000 [05:38<00:54, 77.14it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▌ | 25800/30000 [05:38<00:54, 76.93it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▌ | 25808/30000 [05:38<00:54, 77.08it/s, init loss: 40160.1836, avg. 
loss [24001-25500]: 18080.8086] 86%|████████▌ | 25816/30000 [05:38<00:54, 77.30it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▌ | 25824/30000 [05:38<00:54, 76.47it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▌ | 25832/30000 [05:38<00:54, 76.54it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▌ | 25840/30000 [05:38<00:54, 77.04it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▌ | 25848/30000 [05:39<00:53, 77.23it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▌ | 25856/30000 [05:39<00:54, 76.61it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▌ | 25864/30000 [05:39<00:54, 76.48it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▌ | 25872/30000 [05:39<00:54, 76.41it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▋ | 25880/30000 [05:39<00:53, 76.42it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▋ | 25888/30000 [05:39<00:53, 76.47it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▋ | 25896/30000 [05:39<00:53, 76.45it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▋ | 25904/30000 [05:39<00:53, 76.21it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▋ | 25912/30000 [05:39<00:53, 76.09it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▋ | 25920/30000 [05:40<00:53, 76.04it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▋ | 25928/30000 [05:40<00:53, 75.76it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▋ | 25936/30000 [05:40<00:53, 76.28it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 86%|████████▋ | 25944/30000 [05:40<00:52, 76.74it/s, init loss: 40160.1836, avg. 
loss [24001-25500]: 18080.8086] 87%|████████▋ | 25952/30000 [05:40<00:52, 76.81it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 25960/30000 [05:40<00:52, 76.77it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 25968/30000 [05:40<00:52, 76.88it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 25976/30000 [05:40<00:52, 77.34it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 25984/30000 [05:40<00:51, 77.69it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 25992/30000 [05:40<00:51, 77.44it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26000/30000 [05:41<00:51, 77.27it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26008/30000 [05:41<00:52, 76.58it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26016/30000 [05:41<00:52, 76.61it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26024/30000 [05:41<00:51, 76.95it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26032/30000 [05:41<00:51, 77.35it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26040/30000 [05:41<00:50, 77.65it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26048/30000 [05:41<00:50, 77.70it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26056/30000 [05:41<00:50, 77.43it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26064/30000 [05:41<00:51, 76.88it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26072/30000 [05:41<00:51, 76.52it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26080/30000 [05:42<00:50, 77.22it/s, init loss: 40160.1836, avg. 
loss [24001-25500]: 18080.8086] 87%|████████▋ | 26088/30000 [05:42<00:50, 77.19it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26096/30000 [05:42<00:51, 76.38it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26104/30000 [05:42<00:50, 76.66it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26112/30000 [05:42<00:50, 77.21it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26120/30000 [05:42<00:50, 77.59it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26128/30000 [05:42<00:49, 77.60it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26136/30000 [05:42<00:49, 77.73it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26144/30000 [05:42<00:49, 77.74it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26152/30000 [05:43<00:49, 77.86it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26160/30000 [05:43<00:49, 77.48it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26168/30000 [05:43<00:49, 77.53it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26176/30000 [05:43<00:49, 77.22it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26184/30000 [05:43<00:49, 77.14it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26192/30000 [05:43<00:49, 76.94it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26200/30000 [05:43<00:49, 77.13it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26208/30000 [05:43<00:48, 77.60it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26216/30000 [05:43<00:48, 77.42it/s, init loss: 40160.1836, avg. 
loss [24001-25500]: 18080.8086] 87%|████████▋ | 26224/30000 [05:43<00:48, 77.32it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26232/30000 [05:44<00:48, 77.42it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26240/30000 [05:44<00:49, 76.69it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 87%|████████▋ | 26248/30000 [05:44<00:48, 76.61it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26256/30000 [05:44<00:48, 76.98it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26264/30000 [05:44<00:48, 77.06it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26272/30000 [05:44<00:48, 77.43it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26280/30000 [05:44<00:48, 77.14it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26288/30000 [05:44<00:47, 77.66it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26296/30000 [05:44<00:47, 77.19it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26304/30000 [05:44<00:47, 77.23it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26312/30000 [05:45<00:48, 76.54it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26320/30000 [05:45<00:48, 76.39it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26328/30000 [05:45<00:47, 76.53it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26336/30000 [05:45<00:47, 76.86it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26344/30000 [05:45<00:47, 76.81it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26352/30000 [05:45<00:47, 76.93it/s, init loss: 40160.1836, avg. 
loss [24001-25500]: 18080.8086] 88%|████████▊ | 26360/30000 [05:45<00:47, 77.31it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26368/30000 [05:45<00:46, 77.38it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26376/30000 [05:45<00:46, 77.21it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26384/30000 [05:46<00:47, 76.69it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26392/30000 [05:46<00:47, 76.58it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26400/30000 [05:46<00:46, 76.68it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26408/30000 [05:46<00:46, 76.97it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26416/30000 [05:46<00:46, 77.33it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26424/30000 [05:46<00:46, 77.22it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26432/30000 [05:46<00:46, 76.99it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26440/30000 [05:46<00:46, 76.96it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26448/30000 [05:46<00:46, 77.11it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26456/30000 [05:46<00:45, 77.57it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26464/30000 [05:47<00:45, 77.49it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26472/30000 [05:47<00:45, 77.22it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26480/30000 [05:47<00:45, 77.41it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26488/30000 [05:47<00:45, 77.63it/s, init loss: 40160.1836, avg. 
loss [24001-25500]: 18080.8086] 88%|████████▊ | 26496/30000 [05:47<00:45, 77.44it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26504/30000 [05:47<00:45, 76.90it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26512/30000 [05:47<00:44, 77.55it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26520/30000 [05:47<00:44, 77.69it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26528/30000 [05:47<00:44, 77.55it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26536/30000 [05:48<00:44, 78.03it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 88%|████████▊ | 26544/30000 [05:48<00:44, 77.55it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▊ | 26552/30000 [05:48<00:44, 77.29it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▊ | 26560/30000 [05:48<00:44, 77.34it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▊ | 26568/30000 [05:48<00:44, 77.18it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▊ | 26576/30000 [05:48<00:44, 77.41it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▊ | 26584/30000 [05:48<00:44, 77.13it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▊ | 26592/30000 [05:48<00:44, 77.43it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▊ | 26600/30000 [05:48<00:43, 77.40it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▊ | 26608/30000 [05:48<00:43, 77.71it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▊ | 26616/30000 [05:49<00:43, 77.87it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▊ | 26624/30000 [05:49<00:43, 77.15it/s, init loss: 40160.1836, avg. 
loss [24001-25500]: 18080.8086] 89%|████████▉ | 26632/30000 [05:49<00:43, 77.33it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▉ | 26640/30000 [05:49<00:43, 77.06it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▉ | 26648/30000 [05:49<00:43, 77.16it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▉ | 26656/30000 [05:49<00:43, 77.27it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▉ | 26664/30000 [05:49<00:43, 77.46it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▉ | 26672/30000 [05:49<00:43, 77.06it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▉ | 26680/30000 [05:49<00:43, 77.12it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▉ | 26688/30000 [05:49<00:42, 77.37it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▉ | 26696/30000 [05:50<00:42, 77.43it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▉ | 26704/30000 [05:50<00:42, 76.77it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▉ | 26712/30000 [05:50<00:42, 77.00it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▉ | 26720/30000 [05:50<00:42, 76.74it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▉ | 26728/30000 [05:50<00:42, 76.65it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▉ | 26736/30000 [05:50<00:42, 76.56it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▉ | 26744/30000 [05:50<00:42, 76.72it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▉ | 26752/30000 [05:50<00:42, 76.80it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▉ | 26760/30000 [05:50<00:42, 76.79it/s, init loss: 40160.1836, avg. 
loss [24001-25500]: 18080.8086] 89%|████████▉ | 26768/30000 [05:51<00:42, 76.60it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▉ | 26776/30000 [05:51<00:41, 76.79it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▉ | 26784/30000 [05:51<00:41, 77.09it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▉ | 26792/30000 [05:51<00:41, 76.60it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▉ | 26800/30000 [05:51<00:41, 76.76it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▉ | 26808/30000 [05:51<00:41, 76.79it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▉ | 26816/30000 [05:51<00:41, 77.24it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▉ | 26824/30000 [05:51<00:41, 77.30it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▉ | 26832/30000 [05:51<00:41, 77.25it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▉ | 26840/30000 [05:51<00:41, 76.80it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 89%|████████▉ | 26848/30000 [05:52<00:40, 77.35it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 90%|████████▉ | 26856/30000 [05:52<00:41, 76.34it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 90%|████████▉ | 26864/30000 [05:52<00:40, 76.50it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 90%|████████▉ | 26872/30000 [05:52<00:41, 76.10it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 90%|████████▉ | 26880/30000 [05:52<00:40, 76.41it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 90%|████████▉ | 26888/30000 [05:52<00:40, 76.64it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 90%|████████▉ | 26896/30000 [05:52<00:40, 76.40it/s, init loss: 40160.1836, avg. 
loss [24001-25500]: 18080.8086] 90%|████████▉ | 26904/30000 [05:52<00:40, 76.95it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 90%|████████▉ | 26912/30000 [05:52<00:40, 77.02it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 90%|████████▉ | 26920/30000 [05:52<00:39, 77.47it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 90%|████████▉ | 26928/30000 [05:53<00:39, 77.20it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 90%|████████▉ | 26936/30000 [05:53<00:39, 76.92it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 90%|████████▉ | 26944/30000 [05:53<00:39, 77.40it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 90%|████████▉ | 26952/30000 [05:53<00:39, 77.58it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 90%|████████▉ | 26960/30000 [05:53<00:39, 77.49it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 90%|████████▉ | 26968/30000 [05:53<00:39, 77.67it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 90%|████████▉ | 26976/30000 [05:53<00:38, 78.01it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 90%|████████▉ | 26984/30000 [05:53<00:38, 77.38it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 90%|████████▉ | 26992/30000 [05:53<00:38, 77.16it/s, init loss: 40160.1836, avg. loss [24001-25500]: 18080.8086] 90%|█████████ | 27000/30000 [05:54<00:38, 77.37it/s, init loss: 40160.1836, avg. loss [25501-27000]: 18034.4297] 90%|█████████ | 27008/30000 [05:54<00:38, 76.92it/s, init loss: 40160.1836, avg. loss [25501-27000]: 18034.4297] 90%|█████████ | 27016/30000 [05:54<00:38, 77.29it/s, init loss: 40160.1836, avg. loss [25501-27000]: 18034.4297] 90%|█████████ | 27024/30000 [05:54<00:38, 77.13it/s, init loss: 40160.1836, avg. loss [25501-27000]: 18034.4297] 90%|█████████ | 27032/30000 [05:54<00:38, 77.39it/s, init loss: 40160.1836, avg. 
loss [25501-27000]: 18034.4297] 90%|█████████ | 27040/30000 [05:54<00:38, 77.08it/s, init loss: 40160.1836, avg. loss [25501-27000]: 18034.4297] 90%|█████████ | 27048/30000 [05:54<00:38, 76.84it/s, init loss: 40160.1836, avg. loss [25501-27000]: 18034.4297] 90%|█████████ | 27056/30000 [05:54<00:38, 76.87it/s, init loss: 40160.1836, avg. loss [25501-27000]: 18034.4297] 90%|█████████ | 27064/30000 [05:54<00:38, 76.62it/s, init loss: 40160.1836, avg. loss [25501-27000]: 18034.4297] 90%|█████████ | 27072/30000 [05:54<00:38, 76.80it/s, init loss: 40160.1836, avg. loss [25501-27000]: 18034.4297] 90%|█████████ | 27080/30000 [05:55<00:37, 77.26it/s, init loss: 40160.1836, avg. loss [25501-27000]: 18034.4297] 90%|█████████ | 27088/30000 [05:55<00:37, 76.73it/s, init loss: 40160.1836, avg. loss [25501-27000]: 18034.4297] 90%|█████████ | 27096/30000 [05:55<00:37, 76.71it/s, init loss: 40160.1836, avg. loss [25501-27000]: 18034.4297] 90%|█████████ | 27104/30000 [05:55<00:37, 76.78it/s, init loss: 40160.1836, avg. loss [25501-27000]: 18034.4297] 90%|█████████ | 27112/30000 [05:55<00:37, 76.85it/s, init loss: 40160.1836, avg. loss [25501-27000]: 18034.4297] 90%|█████████ | 27120/30000 [05:55<00:37, 76.80it/s, init loss: 40160.1836, avg. loss [25501-27000]: 18034.4297] 90%|█████████ | 27128/30000 [05:55<00:37, 76.66it/s, init loss: 40160.1836, avg. loss [25501-27000]: 18034.4297] 90%|█████████ | 27136/30000 [05:55<00:37, 77.11it/s, init loss: 40160.1836, avg. loss [25501-27000]: 18034.4297] 90%|█████████ | 27144/30000 [05:55<00:37, 76.92it/s, init loss: 40160.1836, avg. loss [25501-27000]: 18034.4297] 91%|█████████ | 27152/30000 [05:56<00:37, 76.63it/s, init loss: 40160.1836, avg. loss [25501-27000]: 18034.4297] 91%|█████████ | 27160/30000 [05:56<00:36, 76.87it/s, init loss: 40160.1836, avg. loss [25501-27000]: 18034.4297] 91%|█████████ | 27168/30000 [05:56<00:36, 77.24it/s, init loss: 40160.1836, avg. 
loss [25501-27000]: 18034.4297]
 95%|█████████▌| 28504/30000 [06:13<00:19, 76.67it/s, init loss: 40160.1836, avg. loss [27001-28500]: 17995.6855]
100%|██████████| 30000/30000 [06:32<00:00, 76.35it/s, init loss: 40160.1836, avg. loss [28501-30000]: 17966.2871]

posterior_hierarchical_zero_inflated_regression_svi = sample_posterior_predictive_svi(
    rng_key=RNG_KEY,
    model=hierarchical_zero_inflated_negative_binomial_regression,
    guide=svi_hierarchical_zero_inflated_regression_guide,
    covariates_hat=zero_inflated_regression_covariates_hat,
    svi_result=svi_hierarchical_zero_inflated_regression_parameters,
    num_samples=2000,
    model_kwargs=hierarchical_zero_inflated_regression_kwargs,
    return_sites=hierarchical_zero_inflated_regression_parameters,
)

As we can see, the model yields different estimates for the various counties in Florida.

visualize_geo_regression(
    covariates_hat_df=zero_inflated_regression_covariates_hat_df,
    posterior=posterior_hierarchical_zero_inflated_regression_svi,
    parameter="spatial_component_gate",
)
plt.show()

visualize_geo_regression(
    covariates_hat_df=zero_inflated_regression_covariates_hat_df,
    posterior=posterior_hierarchical_zero_inflated_regression_svi,
    parameter="spatial_component_mean",
)
plt.show()

fig, axs = visualize_temporal_components(
    temporal_components=generate_temporal_components(
        posterior=posterior_hierarchical_zero_inflated_regression_svi,
        transformers=zero_inflated_regression_transformers,
        years=count_state_modelling_df[YEAR_COVARIATES].unique(),
        suffix="_gate",
    )
)
plt.show()

fig, axs = visualize_temporal_components(
    temporal_components=generate_temporal_components(
        posterior=posterior_hierarchical_zero_inflated_regression_svi,
        transformers=zero_inflated_regression_transformers,
        years=count_state_modelling_df[YEAR_COVARIATES].unique(),
        suffix="_mean",
    )
)
plt.show()

2 Closing remarks

We provided here a general overview of how to estimate hail risk, and extreme hail risk in particular, using spatio-temporal information. The two models presented here could be used within a simulation setting to assess the likelihood and magnitude of an extreme hail event happening in a particular location at a certain point in time.

The zero-inflated negative binomial model could be used to simulate the occurrence of all hail events, while the quantile regression would help describe the magnitude of such events if they were to be considered extreme.
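As a rough illustration of this simulation idea, the sketch below draws event counts from a zero-inflated negative binomial and, separately, draws magnitudes for extreme events by inverse-transform sampling from a handful of predicted quantiles. It is only a minimal sketch using plain NumPy: the function names, the parameter values, and the quantile levels are hypothetical and are not outputs of the models fitted above. The negative binomial is assumed to be parameterised by its mean and a concentration, and the quantile function is assumed to be piecewise linear between the predicted quantiles.

```python
import numpy as np


def simulate_zinb_counts(rng, gate, mean, concentration, size):
    """Draw counts from a zero-inflated negative binomial.

    The negative binomial is sampled as a gamma-Poisson mixture with the given
    mean and concentration; `gate` is the probability of a structural zero.
    """
    rate = rng.gamma(shape=concentration, scale=mean / concentration, size=size)
    counts = rng.poisson(rate)
    structural_zero = rng.random(size) < gate
    return np.where(structural_zero, 0, counts)


def simulate_magnitudes(rng, quantile_levels, quantile_values, size):
    """Inverse-transform sampling from a piecewise-linear quantile function.

    Levels are drawn uniformly between the lowest and highest predicted
    quantile levels, so the draws are conditional on falling in that range.
    """
    u = rng.uniform(quantile_levels[0], quantile_levels[-1], size=size)
    return np.interp(u, quantile_levels, quantile_values)


rng = np.random.default_rng(0)

# Hypothetical values for a single county-hour cell (NOT results of this analysis).
gate, mean, concentration = 0.7, 2.0, 1.5

# Hypothetical upper-tail quantiles of hail size (arbitrary units), of the kind
# a quantile regression could predict for that cell.
quantile_levels = np.array([0.90, 0.95, 0.99])
quantile_values = np.array([1.75, 2.50, 4.00])

counts = simulate_zinb_counts(rng, gate, mean, concentration, size=10_000)
magnitudes = simulate_magnitudes(rng, quantile_levels, quantile_values, size=10_000)

print("share of simulated hours without hail:", (counts == 0).mean())
print("mean simulated magnitude of an extreme event:", magnitudes.mean())
```

In practice one would probably replace the fixed values with posterior draws for each county and time step, so that parameter uncertainty carries through to the simulated events.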

3 Hardware and Requirements

Here you can find the hardware and Python requirements used for building this post.

%watermark
Last updated: 2025-12-29T16:00:18.781176+01:00

Python implementation: CPython
Python version       : 3.13.2
IPython version      : 9.0.2

Compiler    : Clang 18.1.8 
OS          : Darwin
Release     : 24.5.0
Machine     : arm64
Processor   : arm
CPU cores   : 14
Architecture: 64bit
%watermark --iversions
joblib    : 1.4.2
gif       : 23.3.0
scipy     : 1.16.3
tqdm      : 4.67.1
numpy     : 2.2.4
matplotlib: 3.10.1
sklearn   : 1.6.1
numpyro   : 0.19.0
pyextremes: 2.3.3
jax       : 0.7.2
IPython   : 9.0.2
pandas    : 2.2.3
seaborn   : 0.13.2
geopandas : 1.1.1