Introduction

The intention is to introduce PIC32 into ece4760 in the Fall 2015 semester. This is a start at understanding how the chip works. The general way to learn this processor is to look at examples which come with the MPLAB distribution, use the MPLAB help to understand the plib syntax, use the MCU datasheet to figure out the size and meaning of the data field in each control register, then use plib.h, or the header files it includes, to get the actual values of the constants used in the examples for each data field. Then iterate. See links at the end of this page.

Also refer to Tahmid's Blog for other experiments on the PIC32. Discussions with Syed Tahmid Mahbub have been essential for my learning to use the PIC32.

  1. NTSC video synthesis and output
    --NTSC video is an old standard, but is still used in North America for closed circuit TV. It is fairly simple to generate a black/white NTSC signal. Also, the frame buffer for a 1-bit, 256x200 pixel image is only 1600 words (6400 bytes) of RAM. Chapter 13 of Programming 32-bit Microcontrollers in C: Exploring the PIC32 by Lucio Di Jasio was very useful. I used Di Jasio's method of generating sync pulses using one output-compare unit. Video is sent to the SPI controller using DMA bursts from memory (also similar to Di Jasio), but DMA timing-start control was implemented using another output-compare unit rather than chaining two DMA channels. This allowed easy control of video content timing. Timer2 is ticking away with a match time equal to one video line time. Output-compare 2 is slaved to timer2 to generate a series of pulses at the line-rate. The duration of the OC2 pulses (for vertical sync) is controlled by the Timer2 match ISR in which a simple state machine is running, but the pulse durations are not dependent on ISR execution time. Output-compare 3 is also slaved to timer2 and set up to generate an interrupt at a time appropriate for the end of the NTSC back porch, at which time the DMA burst to the SPI port starts. I got best video stability when the core is running at 60 MHz and the peripheral bus running at 30 MHz. The first example is just a bouncing ball with some text. The example requires that the ascii character header file be in the project folder.
    --The second example is a particle system explosion. Without doing any space optimization 1500 particles (along with screen buffer) use up memory. All the positions can be updated in every frame. Giving each particle a high initial velocity, and high drag makes a nice cloud.
    -- The third example is a particle system fountain, which is a slight modification of the explosion. I optimized the point-draw and one ISR for more efficient execution. Frame update now takes 7.2 mSec. Video. The overhead for NTSC TV signal generation is about 5 microSec per 63.5 microSec line, or about 8%.
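    The frame buffer arithmetic in this section can be sketched as follows: a 256x200, 1-bit image stored in 1600 32-bit words, with a routine to set or clear one pixel. The names (frame_buffer, point_draw, point_read) and the bit order (most significant bit is the leftmost pixel, which would suit an SPI port shifting MSB first) are assumptions for illustration, not the code used in the examples.

```c
#include <stdint.h>

#define H_PIXELS       256
#define V_LINES        200
#define WORDS_PER_LINE (H_PIXELS / 32)              /* 8 words per video line */
#define FRAME_WORDS    (WORDS_PER_LINE * V_LINES)   /* 1600 words = 6400 bytes */

static uint32_t frame_buffer[FRAME_WORDS];

/* Set (color != 0) or clear (color == 0) one pixel; off-screen points are ignored. */
void point_draw(int x, int y, int color)
{
    if (x < 0 || x >= H_PIXELS || y < 0 || y >= V_LINES) return;
    uint32_t mask = 0x80000000u >> (x & 31);        /* MSB = leftmost pixel (assumed) */
    uint32_t *word = &frame_buffer[y * WORDS_PER_LINE + (x >> 5)];
    if (color) *word |= mask;
    else       *word &= ~mask;
}

/* Return 1 if the pixel is set, 0 if it is clear or off-screen. */
int point_read(int x, int y)
{
    if (x < 0 || x >= H_PIXELS || y < 0 || y >= V_LINES) return 0;
    return (int)((frame_buffer[y * WORDS_PER_LINE + (x >> 5)] >> (31 - (x & 31))) & 1u);
}
```

    Keeping the index arithmetic to shifts and masks is one plausible way to keep a point-draw routine cheap when many particles are redrawn every frame.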
  2. SPI control of an AD7303 DAC
    -- It is useful to get a serial channel running for fairly high speed peripherals. The first device I tried is an Analog Devices AD7303. It is a two channel, 8-bit DAC with buffered voltage output. The channels may be updated simultaneously or separately. Each channel write requires a two-byte transfer to the DAC. The first is a control byte, and the second is the channel data byte. The control byte specifies which channel will be updated as well as the update mode. Each two-byte transfer must be signaled by dropping the voltage on a SYNC pin before the beginning of the transfer, then raising it at the end. As on most microcontrollers, the PIC32 SPI interface is simple enough that direct register manipulation is probably the easiest approach, although the higher level SpiChnOpen function also worked well. The SPI standard supports four clock modes (the combinations of clock polarity and clock phase). The microcontroller master has to match the requirements of the slave. This is often the most annoying part of getting SPI running. Careful analysis of the slave datasheet is required. The AD7303 requires the master to generate a clock frequency less than 30 MHz, and expects the data to be stable on the positive clock edge. The required configuration is
    SpiChnOpen(spiChn, SPI_OPEN_ON | SPI_OPEN_MODE16 | SPI_OPEN_MSTEN | SPI_OPEN_CKE_REV | SPI_OPEN_CKP_HIGH , spiClkDiv);
    or equivalently:
    SPI1CON = 0x8560 ; // SPI on, 16-bit, master, CKE=1, CKP=1
    //The SPI baudrate BR is given by: BR=Fpb/(2*(SPI1BRG+1))
    SPI1BRG = 0; // Fperipheralbus/2
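    The baud rate formula in the comment above can be checked with a small host-side calculation. The function names are hypothetical, and the 1 MHz target in the usage note is only an example value.

```c
#include <stdint.h>

/* SCK frequency for a given peripheral bus clock and SPI1BRG value:
   BR = Fpb / (2 * (BRG + 1)) */
uint32_t spi_sck_hz(uint32_t fpb_hz, uint32_t brg)
{
    return fpb_hz / (2u * (brg + 1u));
}

/* Smallest BRG value whose SCK does not exceed max_sck_hz. */
uint32_t spi_brg_for(uint32_t fpb_hz, uint32_t max_sck_hz)
{
    uint32_t brg = 0;
    while (spi_sck_hz(fpb_hz, brg) > max_sck_hz) brg++;
    return brg;
}
```

    With SPI1BRG = 0 the clock is Fpb/2, so a 40 MHz peripheral bus gives a 20 MHz SCK and a 60 MHz bus gives 30 MHz. A slower device limited to 1 MHz on a 40 MHz bus would need spi_brg_for(40000000, 1000000), which is 19.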

    The basic SPI transaction is to start a simultaneous send/receive. On the DAC used here, no useful data is received, but you must do the receive operation to reset the SPI1STATbits.SPIRBF flag. For this application the SPI transaction is
    mPORTBClearBits(BIT_0); // start transaction
    SPI1BUF = DAC_cntl_1 | DAC_value ; // write to SPI
    while( !SPI1STATbits.SPIRBF); // check for complete transmit
    junk = SPI1BUF ; // read the received value (not used by DAC in this example)
    mPORTBSetBits(BIT_0); // end transaction

    You clear the SYNC bit, write to SPI1BUF to trigger the hardware transmit/receive, wait for it to finish, then do the mandatory read and set the SYNC bit. Connections between the two devices are shown below, assuming a certain PPS setup as shown in the code.
    AD7303      PIC32
    SCLK        SCK1 is pin 25
    DIN         SDO1 is PPS group 2, map to RPA1 (pin 3)
    ~SYNC       PortB.0 (pin 4)
    not used    SDI1 is PPS group 2, map to RPB8 (pin 17)


    In addition to the SPI protocol, each different device you attach to the SPI bus has a command syntax which is specific to the device. In this case, the first byte transmitted has the following bit definitions, while the second byte represents the voltage output in straight binary, where binary zero outputs zero volts and binary 0xff outputs Vref.
    bit 7 notINT/EXT set to notINT = 0. Use internal Vref
    bit 6 = 0 (not used)
    bit 5 LDAC load and update both channels when set
    bit 4 PDB = 0 power down channel B
    bit 3 PDA = 0 power down channel A
    bit 2 notA/B = 0 chooses A
    bit 1 CR1=0 control bits modify load mode
    bit 0 CR0=1 set to load A from SR

    The actual commands I used here:
    Command: Load A from shift register: DAC_cntl_1 = 0b00000001 ;
    Command: Load B from SR and update both outputs: DAC_cntl_2 = 0b00100100 ;

    The following image shows the SYNC on the top trace and the SCK1 on the bottom trace. The core frequency and peripheral bus frequency are set to 40 MHz. The SCK1 is running at Fpb/2=20 MHz. The total transaction time for the two channels is 2.6 microSec. The second image shows the DAC outputting a DDS sawtooth on one channel and the ADC input on the other at a sampling rate of 100 KHz. Setting the core and peripheral bus to 60 MHz runs the AD7303 at its maximum bus speed and drops the total time to transmit one 16-bit transaction to 850 nS and both channels to 1.75 microSec. The code in the ISR was arranged so that all ISR housekeeping is being done while the SPI hardware does the transmit.
    spi1 spi2
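    Since the SPI channel runs in 16-bit mode and the control byte goes out first, each write is a single 16-bit word with the control byte in the upper half. The sketch below builds that word from the bit definitions in this section. The macro and function names are hypothetical, and it assumes the control constant has not already been shifted into the upper byte (the original program may define its constants pre-shifted).

```c
#include <stdint.h>

/* Control-byte bits, following the bit definitions listed in this section */
#define AD7303_LDAC 0x20u   /* bit 5: load and update both channels */
#define AD7303_SELB 0x04u   /* bit 2: 1 selects channel B, 0 selects channel A */
#define AD7303_CR0  0x01u   /* bit 0: load mode control */

/* Combine a control byte and an 8-bit output value into one 16-bit SPI word.
   The control byte occupies the upper byte so that it is shifted out first. */
uint16_t ad7303_word(uint8_t control, uint8_t value)
{
    return (uint16_t)(((uint16_t)control << 8) | value);
}
```

    For example, ad7303_word(AD7303_CR0, 0xFF) is 0x01FF (load channel A with full scale), and ad7303_word(AD7303_LDAC | AD7303_SELB, 0x80) is 0x2480 (load B with mid scale and update both outputs).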

  3. IIR filters for DSP
    I decided to start by implementing Butterworth IIR filters using the fixed point formats below. The first step is to find out how much the limited precision arithmetic will affect the filters. I am not using the PIC32 DSP library because I could not figure out the format for the constants. Filters are implemented as unfactored first or second order Direct forms or as second order sections (SOS) which take a scale factor input as well as two a-vector values. For Butterworth, the b-vector is fixed (refer to Matlab or Octave butter function). This matlab program allows you to check SOS filter response accuracy for a given bandwidth. This program uses an unfactored IIR filter design for comparison. For low-order filters (one or two samples), unfactored IIR filters will be faster, while being accurate enough.
    -- A one-sample (RC type) lowpass filter executes in 89 cycles on the PIC32 as shown in this C program. The filter uses an array for the filter coefficients and another for the filter history. Inserting this filter into the ADC-to-DAC code in a section below results in a program in which the ISR samples the ADC, filters, scales and outputs to the DAC in 2.7 microseconds, including ISR overhead (60 MHz core clock). In this code, the sample rate Fs=100 kHz and the filter cutoff is 0.01*(Fs/2)=500 Hz. The actual measured cutoff is 490 Hz. This filter has less than 1% error above a cutoff=0.002.
    // coeff = {b2, -a2} noting that b1=b2 and a1=1 (for first order Butterworth)
    // history = { last_input, last_output}
    fix16 coeff[2], history[2], output, input ;
    fix16 IIR_butter_1_16(fix16 input, fix16 *coeff, fix16 *history )
    {
        fix16 output;
        output = multfix16(input+history[0], coeff[0]) + multfix16(history[1], coeff[1]) ;
        history[0] = input ;
        history[1] = output ;
        return output ;
    } 
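    To try the first order filter on a PC, the sketch below wraps the same routine with 2.14 fixed point definitions and feeds it a constant input. The fix16 typedef and macros are assumptions modeled on the 2.30 macros given in the fixed point section, the coefficient values are approximate first order Butterworth values for cutoff=0.01 worked out from the bilinear transform, and lowpass_step_response is a hypothetical helper.

```c
#include <stdint.h>

typedef int16_t fix16;   /* 2.14 fixed point (assumed definition) */
#define multfix16(a,b) ((fix16)((((int32_t)(a)) * ((int32_t)(b))) >> 14))
#define float2fix16(a) ((fix16)((a) * 16384.0))    /* scale by 2^14 */
#define fix2float16(a) ((double)(a) / 16384.0)

/* coeff = {b2, -a2}, history = {last_input, last_output} */
fix16 IIR_butter_1_16(fix16 input, fix16 *coeff, fix16 *history)
{
    fix16 output;
    output = multfix16(input + history[0], coeff[0]) + multfix16(history[1], coeff[1]);
    history[0] = input;
    history[1] = output;
    return output;
}

/* Apply a constant input for n_samples and return the last output as a double. */
double lowpass_step_response(double level, int n_samples)
{
    /* approximate values for cutoff=0.01: b = [0.015466 0.015466], a = [1 -0.969068] */
    fix16 coeff[2]   = { float2fix16(0.015466), float2fix16(0.969068) };
    fix16 history[2] = { 0, 0 };
    fix16 in  = float2fix16(level);
    fix16 out = 0;
    for (int i = 0; i < n_samples; i++)
        out = IIR_butter_1_16(in, coeff, history);
    return fix2float16(out);
}
```

    A lowpass filter should settle at the input level for a constant input. With truncating 2.14 arithmetic the output stalls slightly below it, which gives a quick feel for the size of the coefficient and roundoff error.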

    -- A second order Butterworth lowpass has less than 1% response error down to cutoff=0.04. At cutoff=0.1 (and Fs=100 kHz) the measured cutoff frequency is very close to 5 kHz, as predicted. The ISR takes 3.5 microSec to execute (60 MHz core clock).
    -- A second order Butterworth bandpass can be set down to a bandwidth of about 0.003 with less than 1% error. At cutoff=[0.1, 0.11] (and Fs=100 kHz) the measured peak response is 5.25 kHz and the cutoffs are at 4.98 and 5.5 kHz, as predicted. The DC component of the input is removed by the highpass characteristic of the filter, so a DC correction has to be made because the DAC can only produce a positive voltage. In the ISR, 128 is added to the 8 bit DAC value DAC_value = (output>>6) + 128 ;.
    -- A generalized SOS Butterworth lowpass executes in 7.5 microSec (ISR, 60 MHz core clock, 4-pole). The 4-pole version with a cutoff=0.1 has the predicted responses near the cutoff frequency and where the response drops to 0.1 of peak. Six-pole takes 10 microSec, 8-pole takes 12.5 microSec. The sample frequency is set to 20 kHz. This matlab program computes the filter parameters for SOS and prints C code to the matlab console window to paste into the program for both lowpass and bandpass filters.
    -- A generalized SOS Butterworth bandpass filter uses the matlab program mentioned just above to construct parameters. The example is set to a bandwidth of 0.002, about as narrow as you can get with 2.14 format fixed coefficients.
    -- The SOS filters above are easy to read, but slow to execute because of the 2D arrays. Unrolling the loops, and making all the indices constant, speeds up the filters by almost a factor of two. This program uses the UART interface to profile number of cycles to execute. Putting the revised SOS filters back into the realtime program with ADC and DAC gives an ISR time of 7.7 microSec for the 4-section, eight-pole, bandpass filter. Almost all of the time in the ISR is the filter. A reasonable rule would be 2 microSec/pole of filtering. At 8 kHz sample rate, could do about 60 poles of filtering, or about fifteen 4-pole filters.


  4. Fixed point arithmetic performance
    -- Fixed point arithmetic is the first step to building DSP functions. I decided to implement 2.30 and 2.14 formats. This means two bits to the left of the binary-point, one of which is the sign bit. The dynamic range of the systems is either -2 to (2 - 2^-14) or -2 to (2 - 2^-30). The resolution is either 2^-14 = 6*10^-5 or 2^-30 = 9*10^-10. The resolution is necessary to make stable, accurate filters. The dynamic range is sufficient for Butterworth IIR filters made with second order sections (SOS). SOS help to minimize filter roundoff errors. This program defines the data types and macros for converting float-to-fix, fix-to-float and fixed point multiply. Add and subtract just work. The program uses timer2 to count cycles to profile the time for the add and multiply operations, then uses the UART (see section below) to print the results. The 2.30 format takes 40 cycles to do a multiply-and-accumulate (MAC) operation. The 2.14 format takes 17 cycles for a MAC operation. The 2.14 result (1.5*0.05-0.25) is in error by 4*10^-5, the 2.30 result is correct to 8 places. The macros for the 2.30 follow:
    typedef signed int fix32 ;
    #define multfix32(a,b) ((fix32)(((( signed long long)(a))*(( signed long long)(b)))>>30)) //multiply two fixed 2:30
    #define float2fix32(a) ((fix32)((a)*1073741824.0)) // 2^30
    #define fix2float32(a) ((float)(a)/1073741824.0) 
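    The accuracy comparison above (1.5*0.05-0.25 in both formats) can be reproduced on a PC with the sketch below. The 2.30 macros are the ones given above. The 2.14 macros are written by analogy and are an assumption, as are the helper names mac_2_30 and mac_2_14.

```c
#include <stdint.h>

/* 2.30 format, as given above */
typedef signed int fix32;
#define multfix32(a,b) ((fix32)(((( signed long long)(a))*(( signed long long)(b)))>>30))
#define float2fix32(a) ((fix32)((a)*1073741824.0))   /* 2^30 */
#define fix2float32(a) ((float)(a)/1073741824.0)

/* 2.14 format, assumed by analogy with the 2.30 macros */
typedef int16_t fix16;
#define multfix16(a,b) ((fix16)((((int32_t)(a))*((int32_t)(b)))>>14))
#define float2fix16(a) ((fix16)((a)*16384.0))        /* 2^14 */
#define fix2float16(a) ((float)(a)/16384.0)

/* Multiply-accumulate a*b + c in 2.30, returned as floating point */
double mac_2_30(double a, double b, double c)
{
    fix32 r = multfix32(float2fix32(a), float2fix32(b)) + float2fix32(c);
    return fix2float32(r);
}

/* The same operation in 2.14 */
double mac_2_14(double a, double b, double c)
{
    fix16 r = (fix16)(multfix16(float2fix16(a), float2fix16(b)) + float2fix16(c));
    return fix2float16(r);
}
```

    The exact answer is -0.175. The 2.14 version lands a few parts in 10^5 away from it, in line with the error quoted above, while the 2.30 version agrees to roughly 8 places. Note that the macros rely on an arithmetic right shift for negative products, which is what common compilers provide but is implementation-defined in C.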
  5. ADC performance
    -- The PIC32 has a 10-bit ADC which runs up to 1 MHz sample rate, although the first example is limited to a little more than 500kHz because sampling is controlled by an ISR (see below for interrupt performance). The first example samples one channel, with most ADC features disabled. The ADC_CLK_AUTO is turned on so that conversion immediately follows sample-acquisition, but acquisition is started manually in the ISR and goes into one slot of the buffer array where it is immediately copied into the Vref DAC for output to the scope. The first image shows the signal generator input to AN4 on channel one and the limited resolution Vref DAC on channel two. The triangle wave is set to 50 kHz. Since the sampling rate is 500 kHz there are 10 samples on each cycle, probably about the limit for sampling. There appears to be about 1.5 samples of phase delay at the positive peak, but less at the negative peak because of nonlinear loading effects noted in the Vref DAC section.
    -- The ADC specification (search on TAD in the datasheet) says that for a low Z source (<500 ohms) the ADC bit-clock period must be >65 nSec and the ADC sample period must be >132 nSec. At peripheral bus clock of 40 MHz (period 25 nSec), ADC_SAMPLE_TIME_6 should work for the sample period (150 nSec) and ADC_CONV_CLK_Tcy2, while a little fast (50 nSec), seems to work for the ADC clock. A safer code turns on ADC_AUTO_SAMPLING_ON so that the ADC runs as fast as possible. This makes it possible to sample at 500 KHz using ADC_CONV_CLK_Tcy, which exceeds the minimum bit-clock period time by setting the bit sample time to 100 nSec. Hooking up a higher quality DAC ( 4116R-R2R-253LF resistor array) gives 8-bit resolution and allows accuracy testing of the ADC. The DAC resistor array is mapped to port B bits (lsb) 0-5 and bits 14 and 15 (msb). These port signals are pins (from lsb to msb) 4,5,6,7,11,14,25,26. The mapping does not use port B, bit-6 because the 28 pin PDIP package does not support it (see datasheet Table 1.1). Some care is needed to prevent the digital signals from coupling to the analog output. The second image shows the output from the 8-bit DAC on the bottom trace with small coupling artifacts near the 50% point.
    -- Changing the core clock to 60 MHz and the peripheral bus clock to 30 MHz allows an ADC bit-clock period of 66 nSec, just above the minimum for the bit-clock. With ADC_AUTO_SAMPLING_ON and with ADC_SAMPLE_TIME_5 and ADC_CONV_CLK_Tcy2 the system can sample at 750 KHz with a few cycles left over for main.
    Oscillator configuration:
    #pragma config FNOSC = FRCPLL, POSCMOD = HS, FPLLIDIV = DIV_2, FPLLMUL = MUL_15, FPBDIV = DIV_2, FPLLODIV = DIV_1
    #pragma config FWDTEN = OFF
    #pragma config FSOSCEN = OFF, JTAGEN = OFF
    // core frequency we're running at // peripherals at 30 MHz
    #define	SYS_FREQ 60000000
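    The bit-clock periods quoted in this section follow from the peripheral bus period. A minimal sketch, assuming the usual PIC32MX relation TAD = 2*(ADCS+1)*Tpb and assuming that ADC_CONV_CLK_Tcy2 corresponds to ADCS=0 and ADC_CONV_CLK_Tcy to ADCS=1 (which is consistent with the 50 nSec and 100 nSec figures above). The function name is hypothetical.

```c
/* ADC bit-clock period in nanoseconds for a peripheral bus clock fpb_hz
   and an ADCS divider setting: TAD = 2 * (ADCS + 1) * Tpb */
double adc_tad_ns(double fpb_hz, unsigned adcs)
{
    return 2.0 * (adcs + 1u) * 1.0e9 / fpb_hz;
}
```

    With a 40 MHz peripheral bus this gives 50 nSec for ADCS=0 (below the 65 nSec minimum) and 100 nSec for ADCS=1. With a 30 MHz peripheral bus, ADCS=0 gives about 66.7 nSec, which just clears the minimum.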


    one channel sampling 8 bit DAC

  6. Using Vref output as a 4-bit DAC to play a WAV file
    -- As shown in the section below, the Vref generator can be used as a DAC (pin 25 on PIC32MX250F128B). While 4-bits of dynamic range is not going to hack it for playing back Grateful Dead albums, it is good enough for a quick sound effect or medium quality voice production. There are several steps. First get a low dynamic range WAV file. I use the AT&T text-to-voice site to produce a WAV file with a male voice saying the digits zero to nine. Then the WAV file is processed with a Matlab program to adjust the sample rate, truncate the PCM values to 4-bits, then pack two four bit samples into each byte for storage efficiency. Next the Matlab program produces a header file of the packed samples formatted so that it is loaded into flash memory. Then the playback program running on the PIC32 traverses the packed array at 16 KHz and drops the unpacked samples onto the Vref DAC. The low sample rate means that the male voices at the AT&T site sound better because we can more heavily filter and not lose too much voice content. The following image used matlab's spectrogram utility to compare the original and 4-bit quantized sounds. The top image is the 8 KHz sampled voice. The bottom image uses the signal quantized to 16 levels, then lowpass filtered with a RC filter with a cutoff of 1700 Hz. Each digit (0 to 9) is visible and the overall structure is the same, but not as crisp. The actual playback circuit used an RC filter consisting of a 100k resistor and 1 nF capacitor to get the lowpass. The quantized, filtered matlab output sounds very much like the PIC output.
    spectrogram
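    The packing step described above (two 4-bit samples per byte) is done in Matlab in the original. As an illustration of the same idea in C, here is a minimal sketch. The function names and the nibble order (first sample in the high nibble) are assumptions.

```c
#include <stdint.h>
#include <stddef.h>

/* Truncate an 8-bit unsigned PCM sample to its top 4 bits. */
uint8_t quantize4(uint8_t pcm8)
{
    return (uint8_t)(pcm8 >> 4);
}

/* Pack n 4-bit samples (0..15) two per byte, first sample in the high nibble.
   Returns the number of bytes written. */
size_t pack4(const uint8_t *samples, size_t n, uint8_t *packed)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t s = samples[i] & 0x0F;
        if (i & 1) packed[i >> 1] |= s;                  /* odd index: low nibble */
        else       packed[i >> 1]  = (uint8_t)(s << 4);  /* even index: high nibble */
    }
    return (n + 1) / 2;
}

/* Fetch sample i from the packed array, as a playback loop would. */
uint8_t unpack4(const uint8_t *packed, size_t i)
{
    uint8_t b = packed[i >> 1];
    return (i & 1) ? (uint8_t)(b & 0x0F) : (uint8_t)(b >> 4);
}
```

    On the PIC32 the packed array would be declared const so that it is placed in flash, and the playback ISR would call something like unpack4 once per sample and write the result to the Vref level field.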
    -- The 4-bit data is highly redundant. Looking at the difference between sequential samples shows that over 98% of the transitions between sequential samples are plus/minus one or zero. This means that if we encode the difference as a two bit number, we can make a smaller header file without losing too much information. The matlab encoder takes the differences, truncates them, resynthesises the wave from the truncated derivative and plays the digits. Still to be done: Pack the four 2-bit difference samples into one byte and write the header file and decoder in C. The following images show the spectrogram of the raw speech after sampling to 8 kHz and the spectrogram of the waveform reconstructed from 2-bit differences.
    dcpm spectrogram
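    The packing and decoding step listed above as still to be done could look something like the following sketch. It is one possible scheme, not the author's: differences are clamped to -1, 0, +1 and stored as the 2-bit codes 0, 1, 2 (code 3 is unused), four codes per byte with the first code in the top two bits. All names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Encode n 4-bit samples as clamped 2-bit differences, four per byte.
   The encoder tracks the decoder's reconstruction so that clamping
   errors do not accumulate. Returns the number of bytes written. */
size_t dpcm2_encode(const uint8_t *samples, size_t n, uint8_t start, uint8_t *out)
{
    int recon = start;
    for (size_t i = 0; i < n; i++) {
        int d = (int)samples[i] - recon;
        if (d > 1)  d = 1;
        if (d < -1) d = -1;
        recon += d;
        unsigned shift = 6u - 2u * (unsigned)(i & 3);    /* first code in the top two bits */
        if ((i & 3) == 0) out[i >> 2] = 0;               /* start a fresh byte */
        out[i >> 2] |= (uint8_t)((d + 1) << shift);
    }
    return (n + 3) / 4;
}

/* Rebuild n samples from the packed differences, starting from the same value. */
void dpcm2_decode(const uint8_t *in, size_t n, uint8_t start, uint8_t *samples)
{
    int recon = start;
    for (size_t i = 0; i < n; i++) {
        unsigned shift = 6u - 2u * (unsigned)(i & 3);
        recon += (int)((in[i >> 2] >> shift) & 3u) - 1;
        samples[i] = (uint8_t)recon;
    }
}
```

    A jump of more than one level is followed over several samples rather than lost, so the few large transitions are smeared out instead of causing a permanent offset. This halves the storage again compared with two 4-bit samples per byte.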
  7. Using Vref output as a 4-bit DAC, following the lead of Tahmid's Blog
    -- The Vref generator can be connected to an external pin (pin 25 on PIC32MX250F128B) and can be set to 16 values between zero and two volts. The first example generates a 16-sample square wave to investigate the settling time of the DAC. According to the Reference Manual, the output impedance at output level 0 (about zero volts) is about 500 ohms, while the output impedance at output level 15 (about 2 volts) is around 10k ohms. The first screen dump shows the Vref voltage output on the bottom trace and the same signal passed through an LM358 opamp, set up as a unity gain impedance buffer, on the top trace. Rise time (level 15) is about 0.5 microSec (to 63%) and fall time (level 0) is about 0.05 microSec. The rise/fall times are dominated by the RC circuit formed by the output impedance of Vref and the capacitance of the white board (10-20 pf) and the scope (20 pf). The LM358 is slew-rate limited and thus produces a triangle wave.
    -- The next example generates a sawtooth with a period of 128 phase increments (17.4 kHz). The bottom trace is taken directly from the Vref pin, while the top trace is from the output of the unity gain LM358 follower. Notice the slew-rate limiting on the falling edge of the sawtooth. To unload the Vref pin, the output was connected to the opamp follower through a 100k resistor. A lowpass filter using the 100k resistor and a 10 pf capacitor with a time constant of around 1 microSec smooths and denoises the opamp trace (third image).
    raw rise time  sawtooth  smoothed sawtooth
    -- Any real application is going to use ISR-driven timing to output samples to the DAC. The next example uses Direct Digital Synthesis (DDS) running in an ISR at 100 kHz to generate sine waves. Timing the ISR using a toggled bit in MAIN suggests that the 47 assembler instruction ISR executes (with overhead) in 1.5 microSeconds. The first image shows the DDS sine wave (but at very high frequency) on the top trace and the bit being toggled in MAIN on the bottom trace. You can clearly see the 1.5 microsecond pause in MAIN every time a new sine wave value is produced. The second image is a sine generated at Middle C (261.6 Hz). The top trace is the lowpassed opamp output. The bottom is the raw Vref pin. The code is structured as a timer ISR running the DDS. The output frequency is settable within a milliHertz, but accuracy is determined by the cpu clock. Sixteen voltage levels introduce some harmonic distortion. The first error harmonic is about a factor of 30 in amplitude below the fundamental and at 3 times the frequency. This is in line with Bennett for 4-bit signals. The matlab image shows the full and 16-level sampled sine waves on the left and their spectra on the right (code). Listening to the signal gives a sense of very high frequency spikes. Lowpass filtering with a time constant equal to about 1/(sample-rate) gets rid of most of the sampling noise.
    DDS DDS middle C
    distortion
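    The DDS scheme described above can be sketched with a 32-bit phase accumulator. This is an illustration under assumptions, not the original ISR: the names are hypothetical, and the coarse 16-entry, 4-bit sine table was rounded by hand for this example (the real program may well use a longer table).

```c
#include <stdint.h>

/* One cycle of a sine wave scaled to the 16 Vref levels (hand-rounded, illustrative). */
static const uint8_t sine16[16] = {8, 10, 13, 14, 15, 14, 13, 10, 8, 5, 2, 1, 0, 1, 2, 5};

typedef struct {
    uint32_t accumulator;   /* phase, one full cycle = 2^32 */
    uint32_t increment;     /* phase step per sample */
} dds_t;

/* Phase increment for output frequency f_out_hz at sample rate f_s_hz. */
uint32_t dds_increment(double f_out_hz, double f_s_hz)
{
    return (uint32_t)(f_out_hz * 4294967296.0 / f_s_hz);
}

/* One sample tick (the body of the timer ISR): advance the phase and
   return the 4-bit output level. */
uint8_t dds_tick(dds_t *d)
{
    d->accumulator += d->increment;        /* wraps modulo 2^32 */
    return sine16[d->accumulator >> 28];   /* top 4 bits index the table */
}
```

    At a 100 kHz sample rate the frequency step is 100000/2^32, about 0.00002 Hz, which is why the output frequency can be set to well within a milliHertz. For Middle C the increment works out to 11235634.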
  8. UART and serial communication
    -- The XC32 compiler libraries treat UART2 as standard-in and standard-out. Using the examples from (ref 1) and from
    C:\Program Files (x86)\Microchip\xc32\v1.31\examples\plib_examples
    I wrote a minimal UART interface example which can get individual characters, get strings, and use printf. I could not make scanf work, but getting a string and using sscanf is a workaround. The UART input/output is not routed by default. You must specify a peripheral pin select (PPS) option as described in
    http://people.ece.cornell.edu/land/courses/ece4760/PIC32/Microchip_stuff/2xx_datasheet.pdf

    Table 11-1 which gives input pin mapping, and Table 11-2 which gives output pin mapping. The minimal setup seems to be:
        // specify PPS group, signal, logical pin name
        PPSInput (2, U2RX, RPB11); //Assign U2RX to pin RPB11 -- Physical pin 22 on 28 PDIP
        PPSOutput(4, RPB10, U2TX); //Assign U2TX to pin RPB10 -- Physical pin 21 on 28 PDIP
        // init the uart2
        UARTConfigure(UART2, UART_ENABLE_PINS_TX_RX_ONLY);
        UARTSetLineControl(UART2, UART_DATA_SIZE_8_BITS | UART_PARITY_NONE | UART_STOP_BITS_1);
        UARTSetDataRate(UART2, PB_FREQ, BAUDRATE);
        UARTEnable(UART2, UART_ENABLE_FLAGS(UART_PERIPHERAL | UART_RX | UART_TX));
    -- All the setup functions are documented in the MPLAB-X help files under XC32-peripheral libraries. There is one helper function in the example, GetDataBuffer(), which buffers the input from the UART and echoes the input, until an <enter> keystroke occurs, then zero-terminates the string and emits a CRLF to position the cursor on the next line. Notice that the function is blocking because it waits, possibly forever, in the while(!UARTReceivedDataIsAvailable(UART2)){}; for the user to type.
    -- The GetDataBuffer() function above is a little annoying because you cannot backspace over a mistake. Adding a backspace is easy (code) but you have to make sure that your terminal uses control-H (ascii code 0x08) as the backspace code. In PuTTY you have to right-click the title bar, choose Change Settings..., then choose the Terminal-Keyboard panel and choose the control-H backspace.
    -- The physical interface to the PC was a Sparkfun CP2102 USB-UART interface with the
    CP2102 TX pin hooked to MCU pin 22 (U2RX), the CP2102 RX pin hooked to MCU pin 21 (U2TX),
    and of course, the CP2102 ground pin hooked to MCU pin 27 (or pin 8, see ref 8-2).
    the connections
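    The line-buffering behavior described for GetDataBuffer(), including the backspace handling, can be sketched independently of the UART hardware. This is not the original function: the names line_t and line_put are hypothetical, and echoing and the UART calls are left out so that the logic can be tried on a PC.

```c
#include <stddef.h>

typedef struct {
    char  *buf;    /* destination buffer */
    size_t size;   /* total size of buf, including room for the terminator */
    size_t len;    /* characters collected so far */
} line_t;

/* Feed one received character into the line buffer.
   Returns 1 when <enter> completes the line (buf is then zero-terminated),
   otherwise 0. */
int line_put(line_t *ln, char c)
{
    if (c == '\r') {                  /* <enter>: terminate the string */
        ln->buf[ln->len] = '\0';
        ln->len = 0;                  /* ready for the next line */
        return 1;
    }
    if (c == 0x08) {                  /* control-H: erase the last character */
        if (ln->len > 0) ln->len--;
        return 0;
    }
    if (ln->len < ln->size - 1)       /* leave room for the terminator */
        ln->buf[ln->len++] = c;
    return 0;
}
```

    Because line_put handles one character at a time, it could be called from a polling loop or a receive interrupt instead of a blocking wait, which is one way around the blocking behavior noted above.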
  9. DMA performance.
    PIC32 supports direct memory access from/to peripherals, flash memory and RAM. Code is based on examples from
    C:\Program Files (x86)\Microchip\xc32\v1.31\examples\plib_examples\dma
    -- The first image below shows a DMA burst on the top trace and a separate port pin being toggled in main on the bottom trace. The DMA burst is triggered by a timer interrupt, but the interrupt does not trigger an ISR, just the DMA. Individual transfers within the burst are not uniform in time and range from 10 MHz to 5.5 MHz. The code sets up the DMA to burst 16 entries from a table (in flash or RAM) to an i/o port once every 2.5 microseconds. If the burst length is set to one (one byte at a time) triggered by a timer, the fastest I could get the system to go is 3.7 MHz (270 nSec per transfer).
    -- The second image shows two DMA channels (code) activated by the same timer IRQ every 5 microSec. Both DMA channels have the same DMA priority and both are sending 16 bytes to an i/o port. The DMA controller seems to interleave 4-byte bursts from each DMA channel. Each byte within each 4-byte burst takes 100 nSec. The latency between one channel and the other is about 72-120 nSec (~3-4 cycles).
    -- The third image shows two DMA channels (code) activated by the same timer IRQ every 5 microSec. The DMA channels have different DMA priorities and both are sending 16 bytes to an i/o port. The high priority channel sends, then the low priority channel. There is a 4 or 5 cycle latency between the bursts.
    DMA burst  dual burst  different priority burst
  10. Interrupt performance.
    It is useful to know the minimum number of cycles to service an interrupt. Overhead can include saving the state of the machine, resetting flags and restoring the state of the machine. Of course, you need to add in the actual processing you are doing in the ISR. A minimal timer ISR just toggles an i/o pin and returns. The compiler generates about 33 instructions to do this minimal ISR, but this number does not include hardware overhead. Actual execution suggests that the interrupt takes a little more than 50 cycles total for this minimal ISR (Three cycles are the actual pin toggle). This works out to be 156 kHz interrupt rate at the default 8 MHz system oscillator frequency and 780 kHz interrupt rate at 40 MHz system clock. This code has the clock set to 40 MHz, explicitly sets the peripheral bus divider to one, and documents the sections of the manual explaining the timer interface.

    Adding a bit-toggle in main results in the following image. The top trace is the ISR toggle, the bottom trace is the toggle in main. You can see that main stops executing about 700 nSec (about 28 cycles at 40 MHz) before the ISR toggle executes, then starts again about 450 nSec (about 18 cycles at 40 MHz) after the edge on the ISR trace. This gives some idea of how long it takes to get into and out of an ISR, but is only approximate (I would say +/- 6 cycles). The loop in main is running at 8.0 MHz per toggle (five instructions). Turning up the clock to 72 MHz (remember that the chip is rated at 40 MHz) gives a maximum interrupt frequency of 1414 KHz. Timer2 fails at a clock frequency of 76 MHz. Chip is warm to the touch at 72 MHz.
    isr and main
  11. Clock performance and setting the phase-locked loop for maximum clock speed.
    This code is derived from Chapter 8 of Kibalo's book (see below) and modified to run at 40 MHz. Main was modified to loop and toggle an i/o pin as fast as possible at 5.71 MHz using direct LATA access. This implies that the main loop is 7 cycles long. Using the menu Window>Output>Disassembly Listing shows the assembler code generated.
    45:            while (1) {
    46:                  LATA =0x0001;          // set latch levels for PORTA
    9D00021C  3C02BF88   LUI V0, -16504         // Load upper immediate
    9D000220  24030001   ADDIU V1, ZERO, 1      // Integer unsigned add immediate
    9D000224  AC436030   SW V1, 24624(V0)       //Store Word Mem[Rs+offset] <= Rt
    48:                  LATA =0x0000;		  // set latch levels for PORTA
    9D000228  3C02BF88   LUI V0, -16504        // Load upper immediate
    9D00022C  AC406030   SW ZERO, 24624(V0)    // Store Word Mem[Rs+offset] <= Rt
    50:             }
    9D000230  0B400087   J 0x9D00021C          // jump back to 9D00021C
    9D000234  00000000   NOP

    Three instructions load the i/o address of the port and a one, then output the one to the address, two instructions clear the port pin by loading a zero, and two cycles are taken to jump back. The waveform stays high for two cycles (50 nSec, the time to clear the pin) as shown below.
    5.71 MHz waveform
    Using the higher level commands
    mPORTASetBits(BIT_0);
    mPORTAClearBits(BIT_0);

    in the loop instead of setting LATA directly increases the cycle count by one and drops the frequency to 5 MHz, but is preferred style.
    The assembly code shows that set/clear are each three cycles. This implies that the time the pulse is high is 75 nSec.

    Changing two lines in the code
    #pragma config FPLLMUL = MUL_18 // PLL Multiplier (18x Multiplier)
    #pragma config FPLLODIV = DIV_1 // System PLL Output Clock Divider (PLL Divide by 1)

    will run the cpu at 72 MHz but that is out of specification. It may burn or peripherals may not work. Setting FPLLMUL = MUL_19 runs
    the cpu at its maximum frequency of 76 MHz. Setting the multiplier to 20 fails.
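    The clock figures in this section can be cross-checked with a little arithmetic. The sketch below assumes an 8 MHz PLL input and an input divider of 2 (FPLLIDIV = DIV_2, as in the oscillator configuration shown in the ADC section), which reproduces the 72 and 76 MHz values. The function names are hypothetical.

```c
#include <stdint.h>

/* System clock from the PLL settings: Fsys = Fin / FPLLIDIV * FPLLMUL / FPLLODIV */
uint32_t pll_sysclk_hz(uint32_t fin_hz, uint32_t idiv, uint32_t mul, uint32_t odiv)
{
    return fin_hz / idiv * mul / odiv;
}

/* Frequency of a toggle loop that takes cycles_per_loop CPU cycles per pass. */
double loop_freq_hz(uint32_t sysclk_hz, uint32_t cycles_per_loop)
{
    return (double)sysclk_hz / cycles_per_loop;
}
```

    With these assumptions MUL_18 gives 8/2*18 = 72 MHz and MUL_19 gives 76 MHz. At 40 MHz a 7 cycle loop toggles at about 5.71 MHz, and the 8 cycle version using the port macros runs at 5 MHz, matching the measurements above.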


References:

  1. Beginner's Guide to Programming the PIC32 Paperback by Thomas Kibalo
    and more info
  2. Programming 32-bit Microcontrollers in C: Exploring the PIC32 by Lucio Di Jasio
    and more info
  3. PIC32 Architecture and Programming by John Loomis Numb3rs
  4. UMass M5 PIC32 tutorials and specifically for the PIC32MX220
  5. Northwestern University mechatronics design wiki:
    1. code examples,
    2. benchmarking,
    3. Embedded programming on PIC32
  6. Tahmid's Blog
  7. chipKit
  8. DSP experiments and more and
  9. RTOS
    1. http://www.freertos.org/ and Microchip PIC32 FreeRTOS Reference Designs and MPLABX and ECE443 ref
    2. TNKernel
    3. ERIKA Enterprise
    4. Q-Kernel
    5. Proto-threads
  10. Microchip Docs
    1. MIPS-M4K Core
    2. 2xx_datasheet
    3. 32_bit peripherals library
    4. XC32 Compiler Users Guide
    5. microstickII pinout
    6. PIC32 reference manual
      and more from Northwestern University mechatronics design wiki, PIC32 page
    7. Microchip doc site on this page choose Documentation from the left column.
      The Reference Manual is particularly useful
    8. USB Embedded Host Stack
    9. chipKIT (PIC32 arduino library)
    10. code examples (choose PIC32 in product family dropdown)
    11. code libraries (choose PIC32 in product family dropdown)
    12. application notes (choose PIC32 in Select a Product Family panel)
    13. Harmony for PIC32 -- docs --
    14. Microchip TCP/IP Stack Application Note