
% INTERSPEECH 2023 Paper Kit
% Author: Simon King
% License: Creative Commons CC BY 4.0
% Paper Kit for the INTERSPEECH 2023 conference
% https://www.interspeech2023.org

\documentclass{INTERSPEECH2023}
% 2023-01-06 modified by Simon King (Simon.King@ed.ac.uk)
% **************************************
% * DOUBLE-BLIND REVIEW SETTINGS *
% **************************************
% Comment out \interspeechcameraready when submitting the
% paper for review.
% If your paper is accepted, uncomment this to produce the
% 'camera ready' version to submit for publication.
% \interspeechcameraready
% **************************************
% * *
% * STOP ! DO NOT DELETE ! *
% * READ THIS FIRST *
% * *
% * This template also includes *
% * important INSTRUCTIONS that you *
% * must follow when preparing your *
% * paper. Read it BEFORE replacing *
% * the content with your own work. *
% **************************************
\title{Paper Instructions and Template for INTERSPEECH 2023}
\name{First Author Name$^1$, Second Author Name$^2$, Third Author Name$^3$}
%The maximum number of authors in the author list is 20. If the number of contributing authors is more than this, they should be listed in a footnote or the acknowledgement section.
\address{
$^1$First Author Affiliation, CountryX\\
$^2$Second Author Affiliation, CountryY \\
$^3$Third Author Affiliation, CountryZ}
\email{first@university.edu, second@companyA.com, third@companyB.ai}
\begin{document}
\maketitle
\begin{abstract}
% 1000 characters. ASCII characters only. No citations.
Manuscripts submitted to INTERSPEECH 2023 must use this document as both an instruction set and as a template. Do not use a past paper as a template. Always start from a fresh copy, and read it all before replacing the content with your own.
Before submitting, check that your manuscript conforms to this template. If it does not, it may be rejected. Do not be tempted to adjust the format! Instead, edit your content to fit the allowed space. The maximum number of manuscript pages is 5. The 5th page is reserved exclusively for references, which may begin on an earlier page if there is space.
The abstract is limited to 1000 characters. The one in your manuscript and the one entered in the submission form must be identical. Avoid non-ASCII characters, symbols, maths, italics, etc as they may not display correctly in the abstract book. Do not use citations in the abstract: the abstract booklet will not include a bibliography. Index terms appear immediately below the abstract.
\end{abstract}
\noindent\textbf{Index Terms}: speech recognition, human-computer interaction, computational paralinguistics
\section{Introduction}
Templates are provided on the conference website for Microsoft Word\textregistered\ and \LaTeX. We strongly recommend \LaTeX,
which can be used conveniently in a web browser on \url{overleaf.com}, where this template is available in the Template Gallery.
\subsection{General advice}
Authors are encouraged to describe how their work relates to prior work by themselves and by others, and to make clear statements about the novelty of their work. This may require careful wording in the version submitted for review (guidance in Section \ref{section:doubleblind}). All submissions must be compliant with the ISCA Code of Ethics for Authors, the Pre-prints Policy, and the Conference Policy. These can be found on the conference website.
\subsubsection{Conference theme}
The theme of INTERSPEECH~2023 is Inclusive Spoken Language Science and Technology – Breaking Down Barriers. Whilst it is not a requirement to address this theme, INTERSPEECH~2023 encourages submissions that: report performance metric distributions in addition to averages; break down results by demographic; employ diverse data; evaluate with diverse target users; report barriers that could prevent other researchers adopting a technique, or users from benefitting. This is not an exhaustive list, and authors are encouraged to discuss the implications of the conference theme for their own work.
\subsubsection{Reproducible research}
Authors may wish to describe whether their work could be reproduced by others. The following checklist will be part of the submission form, and is intended to encourage authors to think about reproducibility, noting that not every point will be applicable to all work.
% THIS LIST IS PROVISIONAL - pending final version from Kate.
% \setlist{noitemsep,topsep=0pt,parsep=2pt,partopsep=0pt,leftmargin=1em}
\begin{enumerate}
\item Reproducibility for all papers
\begin{itemize}
\item The paper clearly states what claims are being investigated.
\item The main claims made in the abstract and introduction accurately reflect the paper’s contributions and scope.
\item The limitations of your work are described.
\item All assumptions made in your work are stated in the paper.
\end{itemize}
\item Data sets - for all data sets used, the paper includes information about the following:
\begin{itemize}
\item Relevant details such as languages, number of examples and label distributions.
\item Details of train/validation (development)/test splits.
\item Explanation of all pre-processing steps, including any data that was excluded.
\item Reference(s) to all data set(s) drawn from the existing literature.
\item For new data collected, a complete description of the data collection process, such as subjects, instructions to annotators and methods for quality control.
\item Whether ethics approval was necessary for the data.
\end{itemize}
\item Non-public data sets
\begin{itemize}
\item We use non-public data sets (if no or n/a ignore remaining questions in this section).
\item We will release a copy of the data set in connection with the final paper.
\item We are unable to release a copy of the data set due to the licence restrictions but have included full details to enable comparison to similar data sets and tasks.
\item If only non-public data sets are used, we have discussed why in the paper.
\end{itemize}
\item For all experiments with hyperparameter search - you have included
\begin{itemize}
\item The exact number of training and evaluation runs, and how the models were initialised in each case.
\item Bounds for each hyperparameter.
\item Hyperparameter configurations for best-performing models.
\item The method of choosing hyperparameter values and the criterion used to select among them.
\item Summary statistics of the results (e.g. mean, variance, error bars etc.).
\end{itemize}
\item Reported experimental results - your paper includes:
\begin{itemize}
\item A clear description of the mathematical formula(e), algorithm and/or model.
\item Description of the computing infrastructure used.
\item The average runtime for each model or algorithm (e.g. training, inference etc) or estimated energy cost.
\item Number of parameters in each model.
\item Explanation of evaluation metrics used.
\item For publicly available software, the corresponding version numbers and links and/or references to the software.
\end{itemize}
\item Non-public source code
\begin{itemize}
\item We use non-public source code for experiments reported in this paper (if no or n/a ignore the remaining questions in this section).
\item All source code required for conducting experiments will be made publicly available upon publication of the paper with a license that allows free usage for research purposes.
\item We are unable to release a copy of the source code due to licence restrictions but have included sufficient detail for our work to be reproduced.
\end{itemize}
\end{enumerate}
\subsection{Double-blind review}
\label{section:doubleblind}
INTERSPEECH~2023 is the first conference in this series to use double-blind review, so please pay special attention to this requirement.
\subsubsection{Version submitted for review}
The manuscript submitted for review must not include any information that might reveal the authors' identities or affiliations. This also applies to the metadata in the submitted PDF file (guidance in Section \ref{section:pdf_sanitise}), uploaded multimedia, online material (guidance in Section \ref{section:multimedia}), and references to pre-prints (guidance in Section \ref{section:preprints}).
Take particular care to cite your own work in a way that does not reveal that you are also the author of that work. For example, do not use constructions like ``In previous work [23], we showed that \ldots'' but instead use something like ``Jones et al. [23] showed that \ldots''.
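A minimal sketch of the two wordings in \LaTeX\xspace source, reusing the placeholder key \texttt{Jones22-XXX} that is cited elsewhere in this template (the surrounding sentences are purely illustrative):
\begin{verbatim}
% Avoid: reveals the authors
In previous work \cite{Jones22-XXX},
we showed that \ldots

% Prefer: third-person citation
Jones et al.\ \cite{Jones22-XXX}
showed that \ldots
\end{verbatim}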
Authors who reveal their identity may be asked to provide a replacement manuscript. Papers for which a suitable replacement is not provided in a timely manner may be withdrawn.
Note that the full list of authors must still be provided in the online submission system, since this is necessary for detecting conflicts of interest.
\subsubsection{Camera-ready version}
Authors should include names and affiliations in the final version of the manuscript, for publication. \LaTeX\xspace users can do this simply by uncommenting \texttt{\textbackslash interspeechcameraready}. The maximum number of authors in the author list is 20. If the number of contributing authors is more than this, they should be listed in a footnote or the Acknowledgements section. Include the country as part of each affiliation. Do not use company logos anywhere in the manuscript, including in affiliations and Acknowledgements. After acceptance, authors may of course reveal their identity in other ways, including: adjusting the wording around self-citations; adding further acknowledgements; updating multimedia and online material.
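As a sketch, the two states of the switch near the top of this file look as follows; only one of them should be active in a given version:
\begin{verbatim}
% Version for review (anonymous):
% \interspeechcameraready

% Camera-ready version:
\interspeechcameraready
\end{verbatim}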
\subsubsection{Pre-prints}
\label{section:preprints}
Authors should comply with the policy on pre-prints, which can be found on the conference website. Note that this policy applies not only to pre-prints (e.g., on arXiv) but also to other material being placed in the public domain that overlaps with the content of a submitted manuscript, such as blog posts.
Do not make any reference to pre-print(s) -- including extended versions -- of your submitted manuscript. Note that ISCA has a general policy regarding referencing publications that have not been peer-reviewed (Section \ref{section:references}).
\section{Related work}
\subsection{Layout}
Authors should observe the following specification for page layout by using the provided template. Do not modify the template layout! Do not reduce the line spacing!
\subsubsection{Page layout}
\begin{itemize}
\item Paper size must be DIN A4.
\item Two columns are used except for the title section and for large figures that may need a full page width.
\item Left and right margin are \SI{20}{\milli\metre} each.
\item Column width is \SI{80}{\milli\metre}.
\item Spacing between columns is \SI{10}{\milli\metre}.
\item Top margin is \SI{25}{\milli\metre} (except for the first page which is \SI{30}{\milli\metre} to the title top).
\item Bottom margin is \SI{35}{\milli\metre}.
\item Text height (without headers and footers) is maximum \SI{235}{\milli\metre}.
\item Page headers and footers must be left empty.
\item No page numbers.
\item Check indentations and spacing by comparing to the example PDF file.
\end{itemize}
\subsubsection{Section headings}
Section headings are centred in boldface with the first word capitalised and the rest of the heading in lower case. Sub-headings appear like major headings, except they start at the left margin in the column. Sub-sub-headings appear like sub-headings, except they are in italics and not boldface. See the examples in this file. No more than 3 levels of headings should be used.
\subsubsection{Fonts}
Times or Times Roman font is used for the main text. Font size in the main text must be 9 points, and in the References section 8 points. Other font types may be used if needed for special purposes. \LaTeX\xspace users should use Adobe Type 1 fonts such as Times or Times Roman, which is done automatically by the provided \LaTeX\xspace class. Do not use Type 3 (bitmap) fonts. Phonemic transcriptions should be placed between forward slashes and phonetic transcriptions between square brackets, for example \textipa{/lO: \ae nd O:d3/} vs. \textipa{[lO:r@nO:d@]}, and authors are encouraged to use the terms `phoneme' and `phone' correctly \cite{moore19_interspeech}.
\subsubsection{Hyperlinks}
For technical reasons, the proceedings editor will strip all active links from the papers during processing. URLs can be included in your paper, if written in full, e.g., \url{https://www.interspeech2023.org/call-for-papers}. The text must be all black. Please make sure that they are legible when printed on paper.
\subsection{Figures}
Figures must be centred in the column or page. Figures which span 2 columns must be placed at the top or bottom of a page.
Captions should follow each figure and have the format used in Figure~\ref{fig:speech_production}. Diagrams should preferably be vector graphics. Figures must be legible when printed in monochrome on DIN A4 paper; a minimum font size of 8 points for all text within figures is recommended. Diagrams must not use stipple fill patterns because they will not reproduce properly in Adobe PDF. Please use only solid fill colours in diagrams and graphs. All content should be viewable by individuals with colour vision deficiency (e.g., red-green colour blind), which can be achieved by using a suitable palette such as one from \url{https://colorbrewer2.org} with the `colorblind safe' and `print friendly' options selected.
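A minimal sketch of a figure spanning both columns, assuming a hypothetical graphics file \texttt{wide.pdf} and a hypothetical label name. The starred \texttt{figure*} environment is standard \LaTeX\xspace, and the \texttt{[t]} option requests placement at the top of a page:
\begin{verbatim}
\begin{figure*}[t]
  \centering
  % wide.pdf is a placeholder file name
  \includegraphics[width=\textwidth]{wide.pdf}
  \caption{The caption follows the figure.}
  \label{fig:wide_example}
\end{figure*}
\end{verbatim}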
\subsection{Tables}
An example of a table is shown in Table~\ref{tab:example}. The caption text must be above the table. Tables must be legible when printed in monochrome on DIN A4 paper; a minimum font size of 8 points is recommended.
\begin{table}[th]
\caption{This is an example of a table}
\label{tab:example}
\centering
\begin{tabular}{ r@{}l r }
\toprule
\multicolumn{2}{c}{\textbf{Ratio}} &
\multicolumn{1}{c}{\textbf{Decibels}} \\
\midrule
$1$ & $/10$ & $-20$~~~ \\
$1$ & $/1$ & $0$~~~ \\
$2$ & $/1$ & $\approx 6$~~~ \\
$3.16$ & $/1$ & $10$~~~ \\
$10$ & $/1$ & $20$~~~ \\
$100$ & $/1$ & $40$~~~ \\
$1000$ & $/1$ & $60$~~~ \\
\bottomrule
\end{tabular}
\end{table}
\subsection{Equations}
Equations should be placed on separate lines and numbered. We define
%
\begin{align}
x(t) &= s(t') \nonumber \\
&= s(f_\omega(t))
\end{align}
%
where \(f_\omega(t)\) is a special warping function. Equation \ref{equation:eq2} is a little more complicated.
%
\begin{align}
f_\omega(t) &= \frac{1}{2 \pi j} \oint_C
\frac{\nu^{-1k} \mathrm{d} \nu}
{(1-\beta\nu^{-1})(\nu^{-1}-\beta)}
\label{equation:eq2}
\end{align}
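The pattern used above can be sketched as follows: \texttt{\textbackslash nonumber} suppresses the number on an intermediate line, so that a multi-line derivation receives a single number, and \texttt{\textbackslash label} names the numbered line for use with \texttt{\textbackslash ref}. The symbols and the label name here are illustrative only:
\begin{verbatim}
\begin{align}
  y &= a + b \nonumber \\
    &= c
  \label{equation:example}
\end{align}
Equation~\ref{equation:example} shows ...
\end{verbatim}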
%
\begin{figure}[t]
\centering
\includegraphics[width=\linewidth]{figure.pdf}
\caption{Schematic diagram of speech production.}
\label{fig:speech_production}
\end{figure}
\subsection{Style}
Manuscripts must be written in English. Either US or UK spelling is acceptable (but do not mix them).
\subsubsection{References}
\label{section:references}
It is ISCA policy that papers submitted to INTERSPEECH should refer to peer-reviewed publications. References to non-peer-reviewed publications (including public repositories such as arXiv, Preprints, and HAL; software; and personal communications) should only be made if there is no peer-reviewed publication available, should be kept to a minimum, and should appear as footnotes in the text (i.e., not listed in the References).
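A minimal sketch of citing a non-peer-reviewed source as a footnote rather than in the References; the author, title and identifier are placeholders, not a real reference:
\begin{verbatim}
\ldots as reported in a recent
pre-print\footnote{A.~Author,
``Placeholder title,''
arXiv:XXXX.XXXXX, 2023.}.
\end{verbatim}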
References should be in standard IEEE format, numbered in order of appearance, for example \cite{Davis80-COP} is cited before \cite{Rabiner89-ATO}. For longer works such as books, provide a single entry for the complete work in the References, then cite specific pages \cite[pp.\ 417--422]{Hastie09-TEO} or a chapter \cite[Chapter 2]{Hastie09-TEO}. Multiple references may be cited in a list \cite{Smith22-XXX, Jones22-XXX}.
\subsubsection{International System of Units (SI)}
Use SI units, correctly formatted with a non-breaking space between the quantity and the unit. In \LaTeX\xspace this is best achieved using the \texttt{siunitx} package (which is already included by the provided \LaTeX\xspace class). This will produce
\SI{25}{\milli\second}, \SI{44.1}{\kilo\hertz} and so on.
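A short sketch of further quantities written with \texttt{siunitx}, using the same \texttt{\textbackslash SI} command as above; the values are arbitrary examples:
\begin{verbatim}
a frame shift of \SI{10}{\milli\second}
sampled at \SI{16}{\kilo\hertz}
a gain of \SI{-20}{\decibel}
\end{verbatim}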
\begin{table}[b!]
\caption{Main predefined styles in Word}
\label{tab:word_styles}
\centering
\begin{tabular}{ll}
\toprule
\textbf{Style Name} & \textbf{Entities in a Paper} \\
\midrule
Title & Title \\
Author & Author name \\
Affiliation & Author affiliation \\
Email & Email address \\
AbstractHeading & Abstract section heading \\
Body Text & First paragraph in abstract \\
Body Text Next & Following paragraphs in abstract \\
Index & Index terms \\
1. Heading 1 & 1\textsuperscript{st} level section heading \\
1.1 Heading 2 & 2\textsuperscript{nd} level section heading \\
1.1.1 Heading 3 & 3\textsuperscript{rd} level section heading \\
Body Text & First paragraph in section \\
Body Text Next & Following paragraphs in section \\
Figure Caption & Figure caption \\
Table Caption & Table caption \\
Equation & Equations \\
\textbullet\ List Bullet & Bulleted lists \\\relax
[1] Reference & References \\
\bottomrule
\end{tabular}
\end{table}
\section{Specific information for Microsoft Word}
For ease of formatting, please use the styles listed in Table~\ref{tab:word_styles}. The styles are defined in the Word version of this template and are shown in the order in which they would be used when writing a paper. When the heading styles in Table~\ref{tab:word_styles} are used, section numbers need not be typed in because Word numbers the headings automatically. Similarly, reference items will be automatically numbered by Word when the ``Reference'' style is used.
If your Word document contains equations, do not convert it from ``.docx'' to ``.doc'' format, because this will convert all equations to images of unacceptably low resolution.
\section{Results}
Information on how and when to submit your paper is provided on the conference website.
\subsection{Manuscript}
Authors are required to submit a single PDF file of each manuscript. The PDF file should comply with the following requirements: (a) no password protection; (b) all fonts must be embedded; (c) text searchable (do ctrl-F and try to find a common word such as ``the''). The conference organisers may contact authors of non-complying files to obtain a replacement. Papers for which an acceptable replacement is not provided in a timely manner will be withdrawn.
\subsubsection{Embed all fonts}
It is \textit{very important} that the PDF file embeds all fonts! PDF files created using \LaTeX, including on \url{overleaf.com}, will generally embed all fonts from the body text. However, included figures (especially those in PDF or PS format) may use additional fonts that are not embedded, depending on how they were created.
On Windows, the Bullzip PDF printer can convert any PDF to have embedded and subsetted fonts. On Linux and macOS, converting to PostScript and back will embed all fonts:
\\
\noindent\textsf{pdf2ps file.pdf}\\
\noindent\textsf{ps2pdf -dPDFSETTINGS=/prepress file.ps file.pdf}
\subsubsection{Sanitise PDF metadata}
\label{section:pdf_sanitise}
Check that author identity is not revealed in the PDF metadata. The provided \LaTeX\xspace class ensures this. Metadata can be inspected using a PDF viewer.
\subsection{Optional multimedia files or links to online material}
\label{section:multimedia}
\subsubsection{Submitting material for inclusion in the proceedings}
INTERSPEECH offers the option of submitting multimedia files. These files are meant for audio-visual illustrations that cannot be conveyed in text, tables and graphs. Just as with figures used in your manuscript, make sure that you have sufficient author rights to all other materials that you submit for publication. The proceedings will NOT contain readers or players, so be sure to use widely accepted formats, such as MPEG, WAVE PCM (.wav), and standard codecs.
Your multimedia files must be submitted in a single ZIP file for each separate paper. Within the ZIP file you can use folders to organise the files. In the ZIP file you should include a \texttt{README.txt} or \texttt{index.html} file to describe the content. In the manuscript, refer to a multimedia illustration by filename. Use short file names with no spaces.
The ZIP file you submit will be included as-is in the proceedings media and will be linked to your paper in the navigation interface of the proceedings. The organisers will not check that the contents of your ZIP file work.
Users of the proceedings who wish to access your multimedia files will click the link to the ZIP file which will then be opened by the operating system of their computer. Access to the contents of the ZIP file will be governed entirely by the operating system of the user's computer.
\subsubsection{Online resources such as web sites, blog posts, code, and data}
It is common to provide links in manuscripts to web sites (e.g., as an alternative to including a multimedia ZIP file in the proceedings), code repositories, data sets, or other online resources. Provision of such materials is generally encouraged; however, they should not be used to circumvent the limit on manuscript length.
Authors must take particular care not to reveal their identity during the reviewing period. If a link to a particular resource is included in the version of the manuscript submitted for review, authors should attempt to provide an anonymised version of that resource for the purposes of review. If this is not possible, and a linked resource will inevitably reveal author identity, then authors should provide a warning to the reviewers, e.g., by linking to a special landing page that states `Clicking this link will reveal author identities'.
Online resources should comply with the policy on pre-prints, which can be found on the conference web site.
\section{Discussion}
Authors must proofread their PDF file prior to submission, to ensure it is correct. Do not rely on proofreading the \LaTeX\xspace source or Word document. \textbf{Please proofread the PDF file before it is submitted.}
\section{Conclusions}
\lipsum[66]
\section{Acknowledgements}
\ifinterspeechfinal
The INTERSPEECH 2023 organisers
\else
The authors
\fi
would like to thank ISCA and the organising committees of past INTERSPEECH conferences for their help and for kindly providing the previous version of this template.
As a final reminder, the 5th page is reserved exclusively for references. No other content may appear on the 5th page. Appendices, if any, must be within the first 4 pages. The references may start on an earlier page, if there is space.
\bibliographystyle{IEEEtran}
\bibliography{mybib}
\end{document}