MiningLamp Technology's Blockformer speech recognition model achieves SOTA results on the AISHELL-1 test set
2022-09-13
MiningLamp Technology will soon open-source its Blockformer speech recognition model, improving conversation intelligence in the sales process and supporting the digital-intelligent transformation of various industries.
Deep learning has been successfully applied to speech recognition, and a variety of neural networks have been widely studied and explored, such as Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and end-to-end neural network models.
At present there are three main end-to-end model frameworks: the Neural Transducer (NT), the Attention-based Encoder-Decoder (AED), and Connectionist Temporal Classification (CTC).
NT is an enhanced version of CTC that introduces a prediction network module, analogous to the language model in a classical speech recognition framework; its decoder takes the history of previously emitted labels as context input. NT training is unstable and requires more memory, which can limit training speed.
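As a toy illustration (not MiningLamp's implementation, and greatly simplified from a real transducer), the prediction network's role of summarizing the label history into a context vector can be sketched as:

```python
import numpy as np

def prediction_network(history, embeddings, W):
    # Hypothetical minimal prediction network: embed the previously
    # emitted labels and summarize them into a context vector, playing
    # a role analogous to a language model in a classical ASR pipeline.
    if not history:                       # start of decoding: no history yet
        return np.zeros(W.shape[0])
    h = np.mean([embeddings[y] for y in history], axis=0)  # crude history summary
    return np.tanh(W @ h)                 # context vector for the joint network
```

In a real Neural Transducer this context vector is combined with each encoder frame by a joint network before emitting the next label.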
AED consists of an encoder, a decoder, and an attention module: the encoder encodes the acoustic features, the decoder generates the sentence, and the attention mechanism aligns the encoder's input features with the decoder states. Many ASR system architectures in industry are based on AED. However, an AED model emits output units one by one, each depending both on previously generated results and on subsequent context, which introduces recognition latency.
ÁíÍ⣬£¬£¬£¬£¬£¬£¬£¬ÔÚÏÖʵµÄÓïÒôʶ±ðʹÃüÖУ¬£¬£¬£¬£¬£¬£¬£¬AEDµÄ×¢ÖØÁ¦»úÖÆµÄ¶ÔÆëЧ¹û£¬£¬£¬£¬£¬£¬£¬£¬ÓÐʱҲ»á±»ÔëÉùÆÆË𡣡£¡£¡£¡£¡£¡£¡£
CTC decodes faster than AED, but because of the conditional-independence assumption between output units and the lack of language-model constraints, its recognition accuracy still has room for improvement.
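CTC's conditional-independence assumption can be made concrete with a small sketch (a simplified illustration, not the full CTC loss): the score of an alignment path factorizes over frames, and paths map to label sequences by merging repeats and dropping blanks.

```python
def ctc_collapse(path, blank=0):
    # CTC's many-to-one mapping: merge adjacent repeated labels,
    # then drop blank symbols.
    out, prev = [], None
    for p in path:
        if p != blank and p != prev:
            out.append(p)
        prev = p
    return out

def path_log_prob(frame_log_probs, path):
    # Conditional independence: a path's log-probability is just the sum
    # of per-frame label log-probabilities -- no label-to-label dependency,
    # which is why an external language model helps CTC.
    return sum(frame_log_probs[t][label] for t, label in enumerate(path))

# Example: the alignment "a a _ b" collapses to "a b" (0 is the blank).
print(ctc_collapse([1, 1, 0, 2]))  # -> [1, 2]
```

Note that a blank between two identical labels keeps them distinct: `ctc_collapse([1, 0, 1])` yields `[1, 1]`.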
There is now a body of research on fusing the AED and CTC frameworks: multi-task learning with a shared encoder, trained jointly on the CTC and AED objectives. On the model-structure side, the Transformer has shown great advantages in machine translation, speech recognition, and computer vision.
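The shared-encoder multi-task objective is typically a weighted interpolation of the two losses. A minimal sketch (the weight 0.3 is a common choice in hybrid CTC/attention systems, not a value stated in this article):

```python
def joint_loss(ctc_loss, aed_loss, ctc_weight=0.3):
    # Hybrid CTC/attention training objective:
    #   L = w * L_ctc + (1 - w) * L_aed
    # Both branches share one encoder; the attention decoder adds the
    # AED term, while the CTC branch regularizes the alignment.
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * aed_loss

print(joint_loss(2.0, 1.0))  # -> 1.3
```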
Ã÷ÂԿƼ¼¼¯ÍŸ߼¶×ܼࡢÓïÒôÊÖÒÕÈÏÕæÈËÖì»á·åÏÈÈÝ£¬£¬£¬£¬£¬£¬£¬£¬Ã÷ÂÔÍŶÓÖØµãÑо¿ÁËÔÚCTCºÍAEDÈÚºÏѵÁ·¿ò¼ÜÏ£¬£¬£¬£¬£¬£¬£¬£¬ÔõÑùʹÓÃTransformerÄ£×ÓÀ´Ìá¸ßʶ±ðЧ¹û¡£¡£¡£¡£¡£¡£¡£¡£

By visualizing the attention information across different blocks and heads, the MiningLamp team found that the diversity of this information is very helpful: the output of each block in the encoder and decoder is not fully subsumed by the others and may be complementary. (https://doi.org/10.48550/arXiv.2207.11697)
Based on this insight, the team proposed a new model structure, the Block-augmented Transformer (Blockformer), studied how to fuse the information of each block in a complementary, parameterized way, and implemented two block-ensemble methods: Weighted Sum of the Blocks Output (Base-WSBO) and a Squeeze-and-Excitation module applied to WSBO (SE-WSBO).
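A minimal NumPy sketch of the two ensemble ideas, under our own simplifying assumptions (the paper's exact layer shapes may differ): Base-WSBO learns one scalar weight per block and sums the block outputs, while SE-WSBO derives the per-block weights from the block outputs themselves through a squeeze-and-excitation bottleneck.

```python
import numpy as np

def base_wsbo(block_outputs, weights):
    # block_outputs: N arrays of shape (T, d), one per Transformer block.
    # weights: N learnable scalars, softmax-normalized before summing.
    w = np.exp(weights - np.max(weights))
    w = w / w.sum()
    return sum(wi * h for wi, h in zip(w, block_outputs))

def se_wsbo(block_outputs, w1, b1, w2, b2):
    x = np.stack(block_outputs)                   # (N, T, d)
    s = x.mean(axis=(1, 2))                       # squeeze: one scalar per block
    h = np.maximum(0.0, w1 @ s + b1)              # excitation: bottleneck + ReLU
    g = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))      # sigmoid gates, one per block
    return np.tensordot(g, x, axes=1)             # gated sum over blocks: (T, d)
```

With equal weights, Base-WSBO reduces to a plain average of the block outputs; SE-WSBO makes the fusion input-dependent at the cost of a small bottleneck network.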
Experiments confirm that on the Mandarin Chinese test set AISHELL-1, the Blockformer model achieves a character error rate (CER) of 4.35% without a language model and 4.10% with a language model.
AISHELL-1 is a Mandarin Chinese speech corpus open-sourced in 2017 by AISHELL (Beijing Shell Shell Technology). It contains 178 hours of recordings by 400 speakers from different regions of China and covers 11 domains, including smart home, autonomous driving, and industrial production. Widely used in speech technology development and experiments, it is one of today's authoritative benchmarks for Mandarin speech recognition.
The AI benchmarking site Papers With Code shows that Blockformer achieved the SOTA recognition result on AISHELL-1, reducing the character error rate to 4.10% (with a language model).
£¨https://paperswithcode.com/sota/speech-recognition-on-aishell-1£©
Hao Jie, CTO of MiningLamp Technology Group, said that MiningLamp's conversation intelligence products target sales scenarios built on online WeCom conversations and offline in-store conversations. The speech recognition team focuses on scenario optimization and customized training for industries such as beauty, automotive, and education, while continuing to explore new frameworks and models for general speech recognition. Blockformer's SOTA result provides a high starting point for customized optimization of speech recognition, and MiningLamp will open-source Blockformer soon.