Intelligent Signal Processing Coursework for midterm

Please Note:

You are permitted to upload your Coursework in the final submission area as many times as you like before the deadline. You will receive a similarity/originality score which represents what the Turnitin system identifies as work similar to another source. The originality score can take over 24 hours to generate, especially at busy times e.g. submission deadline.
If you upload the wrong version of your Coursework, you are able to upload the correct version of your Coursework via the same submission area. You simply need to click on the ‘submit paper’ button again and submit your new version before the deadline. In doing so, this will delete the previous version which you submitted and your new updated version will replace it. Therefore your Turnitin similarity score should not be affected. If there is a change in your Turnitin similarity score, it will be due to any changes you may have made to your Coursework.
Please note that when the due date is reached, the version you submitted last will be considered your final submission and will be the version that is marked.
Once the due date has passed, it will not be possible for you to upload a different version of your assessment. Therefore, you must ensure you have submitted the correct version of your assessment which you wish to be marked, by the due date.

Your overall total word count should not exceed 2500 words (Weighted at 50% of the final mark for the module)

Coursework Description

The midterm coursework for Intelligent Signal Processing consists of four individual exercises. These exercises cover the first five topics of the course:

Digitising, representing, and storing audio signals
Editing and processing digital audio
Frequency domain representations
Extracting features from audio signals
Speech recognition.

The exercises are strongly based on the subjects covered during the course but also invite the student to further investigation.

Students are advised to read all sections of this document carefully, both to understand the coursework exercises and to know what to submit.

Submission Requirements

There are four exercises, and you are required to submit the following files:

One PDF file containing a report for all exercises.
One PDF file containing code you developed with annotations for all exercises.
One ZIP file (containing sub-folders that include files for each exercise, i.e., ex1, ex2, ex3, ex4.)
- Please exclude the data files supplied to you in this Zip file to keep the size smaller.
One Video demonstration for all exercises in an MP4 format (maximum six minutes duration)

Please note that you MUST submit the supporting (B) Code PDF file and (D) Video demonstration of all exercises; otherwise NO MARKS will be awarded for the (C) code implementation tasks.

Exercise 1

The goal of this exercise is to create a web-based audio application using p5.js and its library p5.sound that processes a pre-recorded sound file, sending the processed audio signal to the computer’s speakers or audio output. Optionally, the user could also record the processed audio signal as a digital audio file on the computer’s drive.

The application should include the following effects: low-pass filter, waveshaper distortion, dynamic compressor, reverb and master volume.

Figure 1. Schema of the GUI of the application.

Figure 2. Internal signal flow of the application.

Task 1.1: Web Application with p5.js

The functionality of the web application should meet the following requirements:

The application should include the playback controls and effects controls shown in Figure 1.
Internally, the effects must be connected in a chain, as shown in Figure 2.
The application should include a Record button that allows the user to start/stop recording the processed signal as a WAV file.
The application must display both the spectrum of the original sound and the spectrum of the processed sound.
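
The idea of a chained signal flow can be illustrated outside p5.js. The sketch below is a conceptual Python model only, not the p5.sound API: each effect is a function from a list of samples to a list of samples, and the chain applies them one after another. The effect order, the tanh waveshaping curve, the parameter values and all function names are assumptions made for illustration; Figure 2 remains the reference for the required order, and the compressor and reverb are left out for brevity.

```python
import math
from functools import reduce

def low_pass(samples, alpha=0.2):
    """One-pole low-pass: each output moves a fraction alpha toward the input."""
    out, state = [], 0.0
    for x in samples:
        state += alpha * (x - state)
        out.append(state)
    return out

def waveshaper(samples, amount=4.0):
    """Soft-clipping distortion using a tanh transfer curve (assumed curve)."""
    return [math.tanh(amount * x) for x in samples]

def master_volume(samples, gain=0.5):
    """Scale every sample by a constant gain."""
    return [gain * x for x in samples]

def run_chain(samples, effects):
    """Feed the signal through each effect in order, like nodes in series."""
    return reduce(lambda signal, effect: effect(signal), effects, samples)

# Hypothetical order for illustration only.
CHAIN = [low_pass, waveshaper, master_volume]
processed = run_chain([0.0, 1.0, 1.0, 1.0], CHAIN)
```

Because each stage only sees the output of the previous one, reordering the list changes the sound, which is why the order of connection matters in the real application.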

Task 1.2: Enhanced filter effects

Enhance the filter effect by adding a type selector that allows the user to select between a low-pass, high-pass or band-pass filter.
Allow the user to select between the live microphone input and the pre-recorded audio file as the audio source for the application.
Configure a delay audio effect and add this to the audio chain before the dynamic compressor.

Task 1.3: Report

A written report of approximately 500 words that includes:

A brief description of the main characteristics of each effect and how they have been programmed in task 1.1.
A brief analysis of the application discussing how the low-pass filter and the master volume effects affect the sound’s spectrum in task 1.1. This should also be illustrated through screenshots.
A brief description and analysis of the enhanced filter effects in task 1.2.

Exercise 2

Task 2.1: Audio Captcha

The goal of this exercise is to create a new Audio captcha method on a web-based audio application using p5.js and its library p5.speech that processes a pre-recorded sound file, sending the processed audio signal to the computer’s speakers or audio output.

The new Audio captcha method should be difficult for computer applications to solve automatically, while the scrambled voice should remain clear enough for humans to understand.

Design and develop your own approach for randomly generated Audio Captcha with suitable justification on how filtering options available using p5.js and its library p5.speech are used for processing a pre-recorded sound file, sending the processed audio signal to the computer’s speakers or audio output. See Figure 3 as an example of a typical Audio Captcha on an interactive web page.
Justify and discuss your implementation with fragments of code and the final result in your report.
Highlight future enhancements to strengthen audio captcha capabilities.

Figure 3. Example of Audio Captcha.

[Input Textbox] Type Your Answer
[Information] Play the audio file and submit your answer.
[Button] Submit
[Result] Status message
[Visualise] Include audio visualisation based on the audio being played.

Task 2.2: Audio Visualisation with Meyda

Develop an interactive web-based application for visualising music files. The application must be based on p5.js, p5.speech and the JavaScript audio feature extraction library Meyda.

A local DJ has sent you three sounds (Ex2_sound1.wav, Ex2_sound2.wav and Ex2_sound3.wav) and you have to select Meyda audio features that could help represent these sounds visually in an appropriate manner. For example, if the ‘brightness’ of one of the sounds changes radically over time, selecting an audio feature that measures the brightness of this sound could be a good choice from the perspective of producing visual impact.

Complete the following table to list the three Meyda audio features selected for each sound and justify your selections.

Meyda audio features	Justification
Sound 1
Sound 2
Sound 3

Figure 4. Audio visualisation example.

You could use the image of Figure 4 as an inspiration. The visual variables could include:

Number of rectangles.
Rectangle size.
Rectangle fill colour.
Rectangle border size.
Rectangle border colour.
Rectangle fill colour opacity.
Rectangle border opacity.
Rectangle rotation.
Background colour.

You have the full freedom to choose which audio features to use and how to map them to the visual variables.
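
As a rough illustration of mapping a 'brightness' feature to a visual variable, the Python sketch below computes a spectral centroid from a list of magnitude bins and maps it linearly to a rectangle fill colour. It is not Meyda's implementation; the function names, the colour scheme and the choice of the centroid as the brightness measure are assumptions made for illustration.

```python
def spectral_centroid(magnitudes):
    """Magnitude-weighted mean bin index; higher values suggest a brighter sound."""
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    return sum(i * m for i, m in enumerate(magnitudes)) / total

def map_range(value, in_lo, in_hi, out_lo, out_hi):
    """Linear mapping from one range to another, clamped to the output range."""
    t = (value - in_lo) / (in_hi - in_lo)
    t = min(1.0, max(0.0, t))
    return out_lo + t * (out_hi - out_lo)

def brightness_to_fill(magnitudes):
    """Map the centroid to an RGB fill: dull sounds are blue, bright sounds red.

    Assumes at least two magnitude bins.
    """
    centroid = spectral_centroid(magnitudes)
    red = round(map_range(centroid, 0, len(magnitudes) - 1, 0, 255))
    return (red, 64, 255 - red)
```

The same mapping function could drive any of the visual variables listed above, for example rectangle size or rotation, by changing the output range.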

Task 2.3: Voice Controller

Incorporate a voice control system to visualise other music files (i.e., Kalte_Ohren_(_Remix_).mp3 *) using p5.speech, that could recognise voice commands such as:

a. Black, White, Red, Blue, Green: to change the background colour to one of these colours.

b. Square, Triangle, Circle, Pentagon: to change the shape of the generated figures to one of these shapes.

(*) Kalte Ohren (Remix) by Dysfunction_AL (c) copyright 2019 Licensed under a Creative Commons Attribution (3.0) license. http://dig.ccmixter.org/files/destinazione_altrove/59536 Ft: Starfrosch, Kara Square

Task 2.4: Report

A written report of approximately 500 words containing the following:

A description and justification of the audio features and mapping implemented in Tasks 2.1 & 2.2 from a perspective of visual impact.
A brief description of the voice controller in task 2.3 with implementation results.

Exercise 3

Audio steganography is the art of hiding data in an audio signal in an imperceptible manner.

Let us assume that you work at the UK Security Service (MI5) and, as an expert in audio steganography, you have to perform the following tasks:

Task 3.1: Audio Analysis

The first task consists of analysing a group of suspicious audio files (Ex3_sound1.wav, Ex3_sound2.wav, Ex3_sound3.wav and Ex3_sound4.wav) to determine which one contains a four-number secret code. In this case, the spy has used a more sophisticated method: he appears to have used amplitude modulation to ‘move’ the secret code to an ultrasonic range of frequencies and then mixed the modulated code into one of the suspicious audio files.

To solve this task, you must create an application in Python (exercise 3.1) that should detect which one of the audio files seems to contain suspicious data in an ultrasonic range of frequencies and should be able to play the secret code in an audible range of frequencies.
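
One possible way to approach both steps is sketched below in plain Python, under stated assumptions. The Goertzel algorithm estimates how much energy a signal has near a chosen frequency, which can hint at ultrasonic content. Multiplying the signal by a cosine at the carrier frequency and low-pass filtering the result shifts an amplitude-modulated message back to the audible range. The carrier frequency, the moving-average filter and its window length are hypothetical choices that would need tuning on the real files, and the function names are illustrative.

```python
import math

def goertzel_power(samples, sample_rate, freq_hz):
    """Estimate signal power near freq_hz with the Goertzel algorithm."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

def moving_average(samples, window):
    """Crude low-pass filter; the signal is treated as preceded by zeros."""
    out, running = [], 0.0
    for n, x in enumerate(samples):
        running += x
        if n >= window:
            running -= samples[n - window]
        out.append(running / window)
    return out

def demodulate(samples, sample_rate, carrier_hz, window):
    """Shift an amplitude-modulated band down to baseband.

    Mixing with the carrier produces the message plus a copy around twice
    the carrier frequency (which may alias); the low-pass removes that copy.
    """
    mixed = [2.0 * x * math.cos(2.0 * math.pi * carrier_hz * n / sample_rate)
             for n, x in enumerate(samples)]
    return moving_average(mixed, window)
```

Comparing `goertzel_power` at several ultrasonic frequencies across the four files is one simple way to flag the suspicious one; a full spectrum plot would give a more complete picture.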

Task 3.2: Embedding hidden messages

The second task consists of creating a more sophisticated algorithm for embedding hidden messages based on the LSB audio steganography method (exercise 3.2). You will create an application in Python and use the audio file Ex3_sound5.wav to embed the secret message ‘An eye for an eye makes the whole world blind’. The application must include an algorithm that performs the opposite operation (i.e. an algorithm able to extract the hidden message embedded in Ex3_sound5.wav).

You could, for example, distribute the hidden message across non-consecutive audio samples using a random pattern, use more than one least significant bit to hide the secret data, etc.
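
A minimal sketch of the random-pattern idea is shown below, assuming the audio has already been loaded as a list of integer samples (for example 16-bit PCM values). A seeded shuffle of the sample indices acts as a shared key, and a 4-byte length prefix tells the extractor how many bytes to read. The seed, the length-prefix scheme and the function names are assumptions made for illustration, not a prescribed design.

```python
import random

def _positions(sample_count, seed):
    """Pseudo-random visiting order of sample indices, reproducible from seed."""
    order = list(range(sample_count))
    random.Random(seed).shuffle(order)
    return order

def embed(samples, message, seed):
    """Hide message in the least significant bits of pseudo-randomly chosen samples."""
    data = message.encode("utf-8")
    payload = len(data).to_bytes(4, "big") + data  # length prefix, then message
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(samples):
        raise ValueError("message too long for this carrier")
    order = _positions(len(samples), seed)
    stego = list(samples)
    for pos, bit in zip(order, bits):
        stego[pos] = (stego[pos] & ~1) | bit  # overwrite only the lowest bit
    return stego

def extract(samples, seed):
    """Recover the message using the same seed that was used for embedding."""
    order = _positions(len(samples), seed)

    def read_bytes(start_bit, count):
        out = bytearray()
        for b in range(count):
            value = 0
            for pos in order[start_bit + 8 * b:start_bit + 8 * b + 8]:
                value = (value << 1) | (samples[pos] & 1)
            out.append(value)
        return bytes(out)

    length = int.from_bytes(read_bytes(0, 4), "big")
    return read_bytes(32, length).decode("utf-8")
```

Each sample changes by at most one quantisation step, so the modification should be hard to hear, while anyone without the seed does not know which samples carry the payload.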

Task 3.3: Further investigation of state-of-the-art audio steganographic approaches

The final task consists of writing a brief investigation report to identify a state-of-the-art audio steganographic approach not covered in the module. Provide a technical analysis of the characteristics, implementation, benefits and limitations of the approach. Compare this approach with a Least Significant Bit (LSB) algorithm.

Task 3.4: Report

A written report of approximately 1200 words. This report must include:

A brief description and the results of the audio file analysis and embedding in tasks 3.1 and 3.2.
A short review of the findings on state-of-the-art audio steganographic approaches for task 3.3.

Exercise 4

A software development company has contacted you to create a speech recognition system to integrate into a Python project they are developing. In particular, the project consists of an airport virtual assistant.

Task 4.1: Automatic Speech Recognition (ASR) System

You have to build a prototype of the application (exercise 4) that should meet the following requirements:

The application must be written in Python.
Your client prefers to host the automatic speech recognition (ASR) software package in the application to avoid slow-downs or interruptions to the system in the event of issues with the internet connection.
The client is unsure which speech recognition software package to employ. Use one of the speech recognition systems illustrated in this module and provide your justifications.
The application must be capable of language selection and at the very least compatible with the following languages: English, Italian and Spanish (see document Ex4_models.pdf).
The airport virtual assistant will be installed in an environment that can be extremely noisy. So, the speech recognition system should be configured to be able to handle this situation. Your client gives you the freedom to implement any solution (for example, to configure in Python a gain/amplification, low pass filter, or some other audio filter to improve the error rate).
The company has prepared a set of audio files with which you can evaluate the system. For this evaluation, you will test how well it recognises several phrases in each language. You also have to record and evaluate two short sentences (your_sentence1.wav and your_sentence2.wav). Feel free to prepare your own sentences.
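
As one hedged example of the kind of pre-processing mentioned above, the Python sketch below applies a first-order high-pass filter (to reduce low-frequency rumble) followed by a peak-normalising gain, before the audio would be passed to the recogniser. The filter type, the coefficient values and the function names are assumptions made for illustration; whether they actually improve the error rate would have to be checked on the supplied audio files.

```python
def high_pass(samples, alpha=0.95):
    """First-order high-pass: attenuates slow drifts and low-frequency rumble."""
    out, prev_x, prev_y = [], 0.0, 0.0
    for x in samples:
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out

def normalise_peak(samples, target_peak=0.9):
    """Apply a constant gain so the loudest sample reaches target_peak."""
    peak = max((abs(x) for x in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = target_peak / peak
    return [gain * x for x in samples]

def preprocess(samples, alpha=0.95, target_peak=0.9):
    """Filter first, then normalise, so the gain is based on the cleaned signal."""
    return normalise_peak(high_pass(samples, alpha), target_peak)
```

Comparing the WER of each file with and without `preprocess` is a straightforward way to justify, or reject, a given filter setting in the report.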

The output of your work should be a table with the following information, where the Word error rate (WER) (*) is the word error rate for the phrase (see Ex4_audio_files.zip):

Language	File	WER
English	suitcase.wav	0%
Spanish	maleta.wav	25%
English	your_sentence1.wav	0%
English	your_sentence2.wav	20%
…	…	…

Please note that in this step of the project, you have to build a prototype, so you have to focus on the functionality of the application rather than on its visual design.

(*) Word error rate (WER) can be computed as:

WER = (S + D + I) / N × 100

where

S is the number of substitutions,
D is the number of deletions,
I is the number of insertions,
N is the number of words in the reference sentence.
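
The formula can be sketched in Python as follows. The minimum total number of substitutions, deletions and insertions is the word-level edit (Levenshtein) distance between the reference sentence and the recognised sentence, computed here with dynamic programming. Lower-casing and whitespace splitting are simplifying assumptions, and the function name and example sentences are hypothetical rather than taken from the supplied audio files.

```python
def word_error_rate(reference, hypothesis):
    """Return the WER as a percentage, using word-level edit distance."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference sentence must contain at least one word")
    # prev[j] holds the edit distance between the processed reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref) * 100

# Hypothetical example: one substituted word out of four.
print(word_error_rate("where is my suitcase", "where is the suitcase"))
```

Note that the WER can exceed 100% when the recogniser inserts many extra words, because N counts only the words in the reference.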

Task 4.2: Investigation of state-of-the-art ASR systems

Identify and critically discuss at least two other recent ASR systems in comparison to the ASR system you used in task 4.1.

Recommend a suitable ASR system that the company should use in the future, or if you do not think any of them are suitable, you should say that. Justify your findings with supportive academic references.

Please note that you are not expected to program other offline ASR systems and compare their results for this task.

Task 4.3: Report

A written report of approximately 800 words. This report must include:

A brief description of the ASR software package used and how it has been configured.
A brief analysis of the solution applied to the issue of the noisy environment.
An analysis of the result of the test based on the set of audio files provided by the client and your own recordings.
A brief discussion and analysis of other ASR libraries.

Deliverables

Exercises Report
Submit a single report in a PDF format containing discussions on tasks carried out for all exercises. Exercise-specific tasks discussions are listed below. Generate separate links and include them in the report for the applications running on a web page using the Coursera static web page function for exercises 1 & 2 and the Jupyter Notebook links in the Coursera environment for exercises 3 & 4.
Code report
Copy/paste the textual code you have developed for all four exercises with appropriate annotations in a single PDF file.
Code files
The source code of the application for all four exercises is in separate folders and compressed in one ZIP file.
Please note, for exercise 4:
- You do not have to include the acoustic models used.
- Include the audio files as your_sentence1.wav and your_sentence2.wav file names.
- Include any source code used to perform the evaluation of other ASR systems (optional further investigation).
Video demonstration
A screencast recording demonstrating that the application meets all the exercise requirements in an MP4 format (maximum length of SIX minutes).
- Please ensure the screencast recording contains the following:
  - Your voice-over as a minimum (not computer-generated) and no background music that is irrelevant to the exercises.
  - Demonstrate the application being compiled and running for all exercises.
  - Brief discussions on the core logic for each coding exercise.
- For screen recording, you can use Google Chrome extensions such as Screen recorder and Screencastify – Screen Video Recorder. For video editing, you can use open source tools such as Shotcut or kdenlive, or other tools such as DaVinci, VideoPad, Movie Maker, VSDC, Openshot, and iMovie.

Marking Criteria

Descriptions	Marks
A. Report in PDF
No submission, unable to open or irrelevant	0
Includes information about some exercises in the report	0.5
Includes information about all exercises in the report	1
Exercise 1 (approximately 500 words)
The written report includes a brief description of the main characteristics of each effect and how they have been programmed for exercise 1.1.	3
The written report includes a brief analysis of the application discussing how the low-pass filter and the master volume effects affect the sound’s spectrum for exercise 1.1.	3
A brief description and analysis of the enhanced filter effects for exercise 1.2.	4
Exercise 2 (approximately 500 words)
The written report includes brief design & implementation discussions with justifications and future enhancements for exercise 2.1.	3
The written report includes a description, completed table and justification of the audio features and mapping implemented in exercise 2.2 from a perspective of visual impact.	3
The written report includes descriptions and logic of voice controller implementation exercise 2.3.	4
Exercise 3 (approximately 500 words)
The written report includes the results of the audio file analysis and how the secret message was embedded in exercises 3.1 and 3.2.	5
The written report includes a brief review of the audio steganographic approach in exercise 3.3	5
Exercise 4 (approximately 800 words)
The written report includes implementation descriptions and results for the automatic speech recognition (ASR) system in exercise 4.1.	4
The written report includes a brief review of the state-of-the-art ASR system in exercise 4.2.	8
B. Code files
No submission, unable to open, or irrelevant	0
Most if not all exercise files submitted	0.5
Includes code files for all exercises in an appropriate folder structure	1
Implementation
Exercise 1
The application includes the requested playback controls, and these have been satisfactorily implemented for exercise 1.1. In particular, the Record button allows the user to record the processed audio signal in WAV format.	5
The effects have been connected in a chain for exercise 1.2. The chain is functioning properly, and the user can listen to the processed audio signal.	2
The filters have been correctly configured and include the requested controls for exercise 1.2.	3
Exercise 2
The audio captcha in exercise 2.1 allows users to play filtered audio and user input is validated.	5
In the application exercise 2.2, the audio feature extraction and the mapping between visual variables and audio features have been correctly configured.	2
The application visualises the song in an appropriate manner for exercise 2.2.	4
The application voice controls the visualisation of the song for exercise 2.3.	4
Exercise 3
The application exercise 3.1 correctly detects which one of the audio files includes the secret code.	3
The application exercise 3.1 is able to play, in an intelligible way, the secret code hidden in one of the audio files.	3
The application exercise 3.2 embeds the required message in Ex3_sound5.wav using your own system based on the Least Significant Bit (LSB) audio steganography method.	3
The application exercise 3.2 includes an algorithm able to extract the hidden message embedded in Ex3_sound5.wav.	3
The Jupyter notebook for exercises 3.1 & 3.2 includes markdown cells describing the code in detail.	3
Exercise 4
The application satisfactorily passes the test in English (WER < 25%) based on the set of audio files for exercise 4.1.	2
The application satisfactorily passes the test in Italian (WER < 35%) based on the set of audio files for exercise 4.1.	2
The application satisfactorily passes the test in Spanish (WER < 35%) based on the set of audio files for exercise 4.1.	2
The application includes solutions that attenuate the issue of the noisy environment for exercise 4.1.	2
The application should generate and include the audio files for exercise 4.1 as “your_sentence1.wav” and “your_sentence2.wav” in the submission.	3
C. Code file PDF
No submission, unable to open, or irrelevant	0
Partial code files in the PDF provided with limited or no annotations	0.5
Includes all developed code for all exercises with appropriate annotations	1
D. Video demonstration
No submission, unable to open, irrelevant or unclear what is demonstrated.	0
Poor video demonstration showing limited or no completion of any of the exercises with some or no voice narration.	1
Adequate video demonstration file was submitted but shows some exercises partially completed with attempted voice narration.	2
Reasonable video demonstration file showing completion of most of the exercises with attempted voice narration.	3
Good video demonstration file was submitted which shows most completions of the exercises with clear voice narration and concise core code explanations.	4
Excellent video demonstration of all exercises with appropriate voice narration and core code walkthrough.	5
Total	100

Plagiarism, Penalties and other regulations

Please refer to the latest programme regulations documents made available to you within the virtual learning environment platform.

University	Singapore University of Social Science (SUSS)
Subject	CM3065 Intelligent Signal Processing