-
File cache considerations
05/18/2023 at 19:36 • 0 comments
As I return to this (now quite old) project and remember how it is structured, the hack with file attributes and MIME types now strikes me as suited only to very low traffic. Any significant traffic would multiply the number of system calls (open, read, close...), which is not desirable. A small file cache, at least for the small files, would reduce the system load, at least during the construction of the response header.
My first idea was to keep the last X file descriptors open, to at least save the open() calls and related kernel operations. However this does not reduce the number of calls, because there will still be a seek() and a read(). And a cache lookup and update system is needed as well... So a small cache area, with a dozen 1KB bins, is required to save on the seek()s and read()s.
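Such a cache could look like the sketch below: a handful of fixed-size bins with least-recently-used eviction. All the names, the bin size and the eviction policy are illustrative assumptions, not the actual HTTaP code.

```c
#include <string.h>

/* Hypothetical sketch of a tiny file cache: "a dozen 1KB bins". */
#define CACHE_BINS 12
#define BIN_SIZE   1024

typedef struct {
  char path[64];       /* key: file path, "" while the bin is free */
  int  size;           /* payload size, <= BIN_SIZE                */
  unsigned last_use;   /* LRU stamp                                */
  char data[BIN_SIZE];
} cache_bin;

static cache_bin cache[CACHE_BINS];
static unsigned use_clock = 0;

/* Return a pointer to the cached contents, or NULL on a miss. */
const char *cache_lookup(const char *path, int *size) {
  for (int i = 0; i < CACHE_BINS; i++)
    if (strcmp(cache[i].path, path) == 0) {
      cache[i].last_use = ++use_clock;
      *size = cache[i].size;
      return cache[i].data;
    }
  return NULL;
}

/* Insert a small file, evicting the least recently used bin. */
void cache_insert(const char *path, const char *data, int size) {
  if (size > BIN_SIZE || strlen(path) >= sizeof cache[0].path)
    return;            /* too big: serve it directly from the OS   */
  int victim = 0;
  for (int i = 1; i < CACHE_BINS; i++)
    if (cache[i].last_use < cache[victim].last_use)
      victim = i;
  strcpy(cache[victim].path, path);
  memcpy(cache[victim].data, data, size);
  cache[victim].size = size;
  cache[victim].last_use = ++use_clock;
}
```

On a hit, the whole open()/seek()/read()/close() sequence disappears and the header construction gets the size and data for free.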
File access is far from being a bottleneck. Files get read when a new session starts and this is not timing-critical. However, slowly, this brings us closer to the original intended architecture where the server manages a small filesystem by itself, free from OS considerations and easily embedded in a tiny computer or large microcontroller.
-
Refactoring with aligned strings
05/18/2023 at 12:54 • 0 comments
The server relies a lot on strings, particularly on concatenations. To keep the code small and fast, it involves a lot of manual handling, which requires care to prevent bugs or worse. It is one of the weak/sore points of the code base, which inspired the development of the #Aligned Strings format. Now that this small library is functional, it is time to use it for real !
I'm still very annoyed by the limitation of the C preprocessor that prevents me from further streamlining and optimising the declarations of flexible strings inside functions, as I can't #define or alias words from inside a macro. But in this project, it's not a significant roadblock, so there is no need to resort to m4 or other dirty tricks : the system works, not as efficiently as I would love, but it's still good progress compared to the overly "micromanaged" strings of the existing version. Moreover, this refactoring is a great opportunity to put the aligned strings library to the test of real life, and even to enhance it.
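To show what such a format buys over plain C strings, here is a minimal sketch of a length-prefixed string: the size lives just before the characters, so a concatenation is a memcpy() with no strlen() scan. This only illustrates the principle; the names and layout are NOT the actual Aligned Strings API.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative length-prefixed string (not the real Aligned Strings
   layout): the size is stored next to the characters, so appending
   never has to scan for the terminating NUL. */
typedef struct {
  uint32_t len;
  char     txt[256];   /* fixed backing store, for the example only */
} lstr;

/* Initialise from a string literal: the length is known at compile time. */
#define LSTR_INIT(s, lit) do {          \
  (s)->len = sizeof(lit) - 1;           \
  memcpy((s)->txt, lit, sizeof(lit));   \
} while (0)

/* Append src to dst: both lengths are known, no scanning needed. */
static void lstr_cat(lstr *dst, const lstr *src) {
  memcpy(dst->txt + dst->len, src->txt, src->len + 1); /* +1: final NUL */
  dst->len += src->len;
}
```

With the lengths carried alongside the data, chains of concatenations (as in response-header construction) stay O(total size) instead of re-scanning prefixes at every step.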
-
A poll() problem...
01/06/2021 at 18:58 • 1 comment
I got something unexpected while looking for easy/simple/light ways to probe whether the server is up with a bash script.
> ./serv_simple
=== And now, browse to 127.0.0.1:60075 ===
Port: 60075, Keepalive: 10s
Path: files
Root page: index.html
Warning: chroot() failure (are you root ?) : No such file or directory
Server socket ready
H_BLOCKED
On the script side:
root@pi:/home/pi# echo "GET /" | telnet 127.0.0.1 60075
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Connection closed by foreign host.
and then:
* Connected to 127.0.0.1:39298
HTTaP-Session: 2l1cvf10
H_BLOCKING 0
received 7
---------------------------
GET /
* got root
trying to read file: index.html
Extra Header: 25 bytes
Content-Type: text/html
sending 95 bytes header + 12387 bytes payload : 12482
1 poll() problem on client socket: No such file or directory
Meanwhile I get a kernel message :
[ 2038.025046] TCP: request_sock_TCP: Possible SYN flooding on port 60075. Sending cookies. Check SNMP counters.
OTOH when I change the URL to an invalid one, the server closes the connection itself and it works.
root@pi:/home/pi# echo "GET /plop" | telnet 127.0.0.1 60075
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Connection closed by foreign host.
I can thus check the server with this command:
( wget 127.0.0.1:60075/plop -O - 2>&1 | grep 404 ) && echo OK
Or better :
wget -q '127.0.0.1:60075/?' -O - | grep HTTaP
However the poll() failure behaviour is inappropriate, and it is now fixed with HTTaP_src.20210106.tgz
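The kind of check that avoids this sort of spurious failure can be sketched as follows: inspect poll()'s return value and the revents flags, and treat a hung-up peer as a normal close rather than an error. This is an illustrative sketch, not the exact fix in the archive.

```c
#include <poll.h>
#include <unistd.h>
#include <stdio.h>

/* Probe a client socket: distinguish "nothing yet", "peer went away"
   and "request pending" instead of treating everything as fatal. */
int check_client(int fd, int timeout_ms) {
  struct pollfd p = { .fd = fd, .events = POLLIN };
  int r = poll(&p, 1, timeout_ms);
  if (r < 0) {               /* genuine poll() failure (EINTR, EBADF...) */
    perror("poll() on client socket");
    return -1;
  }
  if (r == 0)
    return 0;                /* timeout: keep the socket open            */
  if (p.revents & (POLLHUP | POLLERR | POLLNVAL)) {
    close(fd);               /* peer closed or socket invalid: clean up  */
    return 1;
  }
  return 2;                  /* POLLIN: a new request is waiting         */
}
```

The important detail is that POLLHUP/POLLERR arrive in revents even though only POLLIN was requested, so they must be checked explicitly.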
-
72 !
05/16/2020 at 00:15 • 0 comments
I created a little test in the sandbox page...
This result is encouraging !
It means that the rate of ping-pong between the HTTaP server and the browser can reach 72 per second.
Of course I cheated :
- There was no real workload, I just requested /?ping and didn't even bother checking the reply.
- The test is on the same computer, on a multi-processor system, so there is no network latency at all.
But it's always good to check the upper bound, right ? It confirms that it's possible to create useful interactive systems despite the serialised link.
Note however some of the untold features :
- The test can run without interference from other clients, even in another tab.
- The test can run along with other operations (such as the PING button) thanks to the queueing semaphore.
Not bad...
As usual : check the latest version in the files section.
The rate drops to 25/s on a DSL line with a ping time of 20ms, so this is coherent...
-
And now, JavaScript !
05/09/2020 at 00:34 • 0 comments
The latest revision is working, at least from the C side.
Now is the time to evolve the client side, with the creation of a JavaScript client side.
So far the keepalive mechanism is working well but there is just one issue : the page can be opened in two tabs of the same web browser...
I need to find a trick to prevent this !
20200511:
The root HTTaP object includes a new element : "HTTaP_open" is 0 when /? is first accessed, and 1 subsequently.
It is up to the client to detect this situation and avoid loading anything else to prevent disruption of the already established connection.
20200514:
I have also solved the problem of dangling/open connections when the page is closed or reloaded !
-
v20200504
05/04/2020 at 01:02 • 0 comments
The new version is here ! HTTaP_src.20200504.tgz
I have started removing the s(n)printf calls, which look like more trouble than they're worth. Convenience vs safety, right ? I still have to convert some places to the new system, which gives me more control and fewer sources of potential errors (or injections).
I also implement the new /?ping definition to help the client manage its own timers.
I still have the plans in sight to demonstrate the "flexible" program (which dynamically switches from polled to blocking mode) but this required some tweaking here and there... At least now I have a new API to expose to the user, if/when there is a suitable #define : the default is below.
// define our own HTTaP parser :
#define HTTAP_PARSE my_HTTaP_parser

int my_HTTaP_parser(
    char *request,  // pointer to the request (receive buffer)
    int recv_len,   // byte count of the request
    int ReqType,    // 1 for GET, 2 for POST
    char *b,        // pointer to the send buffer
    int *len) {     // length of the send buffer
  return 0;
}
You can now define your own keywords without touching the other files :-)
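As a hedged illustration, a custom parser hooked through HTTAP_PARSE could recognise a new keyword like this. Only the prototype is given above, so the return convention (non-zero = handled, 0 = fall back to the default parser) and the assumption that request points at the text after "/?" are guesses, and the "hello" keyword is invented for the example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical custom parser, to be hooked with:
   #define HTTAP_PARSE my_HTTaP_parser
   Assumed convention: return non-zero when the keyword was handled. */
int my_HTTaP_parser(char *request, int recv_len,
                    int ReqType, char *b, int *len) {
  (void)ReqType;
  (void)recv_len;
  /* invented keyword "hello", assuming request holds the query text */
  if (strncmp(request, "hello", 5) == 0) {
    *len = sprintf(b, "{\"hello\":\"world\"}");  /* fill the send buffer */
    return 1;          /* handled: the server sends b as the payload    */
  }
  return 0;            /* unknown keyword: let the default parser decide */
}
```

This keeps the user's keywords in one file, away from the server core, as described above.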
-
More bugs
05/03/2020 at 16:49 • 0 comments
The last iteration of the rewrite uncovered a dirty, ugly bug that should remain secret, unless you dare to diff the recent versions. But now it's fixed and the system is more stable. I never even encountered the problem in real life, but it would have been easy to diagnose because all system calls have explicit error messages.
TL;DR : don't use versions of the code before HTTaP_src.20200502.2.tgz !
Now I am starting to work on a second example program that switches dynamically between polling mode and blocking mode, under the control of both the workload and the user. This implies giving orders in HTTaP form from the user interface, hence more JS and C code.
The C code is not really developed on the HTTaP side ; I have something ambitious in mind but I prefer to develop it later. But I need HTTaP now. So I have found a solution : split the HTTaP request parsing code away from the server, into a separate file. And I can provide a preliminary/quick-and-dirty version for the iteration that I start now, through a HTTaP_parse() function.
-
version 2020/05/01
05/01/2020 at 20:59 • 0 comments
I uploaded a newer version that seems to solve some issues I discovered these last days. I found a condition that triggered a double close, for example, and a more worrying endless loop when the peer closes the socket before the server does... POSIX sockets are really as painful as they seem !
I reworked some functions and it changed the behaviour of the browsers, I got the following effects :
Local/127.0.0.1:
- Firefox 57 desktop: works well. No extra socket is opened apparently now. Load score : 24/24 elements, favicon loaded.
- Chrome still has problems : load score = 17/23, 6 extra sockets opened close to the end, no favicon.
Remote (over the internet) :
- Firefox (desktop Linux as well as Android) : 51 extra connections, 19 elements loaded, no favicon...
- Chrome : 8 elements loaded, 17 extra sockets...
The local scores are those that matter, because HTTaP is not intended to be used over the WWW (though of course it would be better if it could, but then multi-threading would become necessary, and that's out of scope). Normally, HTTaP is used over a simple LAN.
The change is that serving the pages is done in priority, while checking the incoming connections has lost its precedence.
Another problem is receiving zero-sized packets. TCP/IP seems to allow this, and it might help with sending keepalive packets. However the BSD convention signals the closure of a socket by a read of 0 bytes. It is not obvious/simple to verify this with a different method (I considered testing with poll() and checking for POLLOUT in the .revents...) so I have chosen to simply leave this case alone, and close the socket.
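The chosen policy boils down to the sketch below: a 0-byte read is treated as end-of-stream, so the socket is closed. This illustrates the convention, not the exact HTTaP code.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Read from a client socket and apply the BSD convention:
   recv() returning 0 means the peer closed its side, so close ours. */
int handle_readable(int fd, char *buf, int bufsize) {
  ssize_t n = recv(fd, buf, bufsize, 0);
  if (n > 0)
    return (int)n;     /* a real request: go parse it               */
  if (n == 0) {        /* zero bytes: end-of-stream, not a packet   */
    close(fd);
    return 0;
  }
  return -1;           /* n < 0: transient or fatal error (errno)   */
}
```

Since a genuine zero-sized payload is indistinguishable from a half-closed peer at this level, folding both into a close is the simplest safe behaviour.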
Due to interference with Facebook's URL hijacks, I added a special HTTaP 400 error message with a link to the main page.
For example http://httap.org:60075/?invalid will return an error 400 with the following clear text message:
HTTaP key not found. main site
The above links work well with Chrome (to the extent that it displays the page). However Firefox will fire 10 sockets and send no data on the first one... and then that connection times out after all the others have been rejected. WTF ???
On the server side, there are two big things to code now :
- the HTTaP custom code / tree build system
- creating the example with both blocking and polling
The second part is partially written so it will be done first, but having the HTTaP keys build API would help too...
Then I'll start to write more JS to be included in the HTML, to help with loading the extra external elements (a for-loop that reads the HTML source code and sequentially loads the images).
-
New version : Firefox vs Chromium-browser
04/26/2020 at 19:15 • 0 comments
The latest version is out and solves quite a few problems !
However things are a bit weird when it is used by "modern browsers". I have found no problem with links or lynx, but these are text-based browsers that display only one page and don't load external resources, so they get one element and the connection times out... Since they don't really support JavaScript, they are not considered anyway.
Firefox
This one is quite good but even if it is the recommended one, there are still wrinkles here and there.
- The good: it loads the test HTML page completely and completes more than a dozen GETs, up to loading the favicon.ico at the end, all with one socket. And it keeps the favicon in cache, so that part of the protocol is OK.
- The bad: it sends maybe 3 parallel requests after the first successful one. It also seems to limit the refresh rate for PING and the like (not more than 1 or 2 per second ?).
- The offense: I can't understand WHY it sends null-sized requests (which are so far welcomed with a close) some time (a minute or less ?) after the page has finished loading. Is it a method to "keep alive" ? Fortunately it doesn't interfere with an open/working socket.
Conclusion: Apparently, Firefox is smart enough to see that if one connection/socket is slammed on its face, the pending requests can/should go to the other working socket(s). Simple HTTaP/? requests seem to work rather smoothly.
So at least there is something that works, even though some behaviours need more investigations.
Chrome/Chromium-browser
It can do simple things right, but it tries so hard to optimise things that it sometimes feels like a fanatic or a lunatic. Maybe it was really too focused on working with the websites of the Alphabet Group.
Let's start with an easy case : GET /?PING is fine, with the little detail that apparently, Cache-Control: max-age=200 is not understood. So WHAT is required by Chrome to keep that data in cache ? Anyway at least the function is performed (though at least one parallel connection is opened and slammed) but wait for the rest.
Let's try to load a web page with about a dozen external links.
10 parallel sockets are opened and slammed, 5 resources (including index.html) are loaded, in a seemingly random order, probably because of the extra sockets that contain the requests. The main socket closes from timeout without getting the missing ten requests. What the... ???
Setting the HTTAP_TCP_BACKLOG #define to 0 or 1 makes the situation even worse, the page loads slooooowly and incompletely... Chrome won't get the clue !
It is still possible to limit the number of concurrent open sockets to a webserver, but that requires a user manipulation in the browser, and it's too specific and intrusive. Meanwhile, HTTP/1.1 doesn't seem to provide any header to explicitly limit this number.
And then, there is the question : WHY does Firefox get the clue but Chrome won't ? I suspect it's because they use different queuing algorithms but this does not help me so far.
Remediation : this problem seems to affect webpages, while the HTTaP part doesn't seem to be affected.
- Keep pages small with few external elements.
- Inline elements (using "data:" URIs with base64 encoding)
- Use JS to serialise and download the extra resources
I hope that I can solve this problem soon anyway...
Edit : I just found why FF sends empty packets !!!
It happens when you hover the cursor over a local link...
Firefox speculates that you will click on it and prepares the connection !
-
A much better timing system
04/24/2020 at 05:38 • 0 comments
The project hit a wall when I found that more than one socket needed to be read, because browsers tend to fetch webpages through multiple connections.
TL;DR : I have re-designed the system around poll() and the timeouts and other peripheral details are now handled in a wrapper function.
The latest archive doesn't contain a working system but the foundations seem to work well. I still need to modify the server but the surrounding code is pretty good now, and the conversion will be easy.
The new code exposes a number of global variables and functions :
- timeout_counter (flag that signals the expiration of a timeslice in polling mode)
- abort_program (flag that is set when the program must quit)
- poll_HTTaP() (the HTTaP code that polls and serves files)
- init_HTTaP() (run before the main loop)
- HTTaP_mode_block() (signals that the host program has finished heavy lifting)
- HTTaP_mode_poll() (signals that the host program has work to do and needs the CPU for itself)
Overall it looks like collaborative multitasking, with timeout management sprinkled all over the code.
The constraint is that each slice of work must be reasonably short (100ms ?) to keep the system responsive.
- When all the work is done, call HTTaP_mode_block() to go into blocking mode and save/yield CPU while waiting for the next command.
- When a new command is received that requests more work, call HTTaP_mode_poll()
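The two rules above translate into a host loop roughly like this self-contained sketch. The HTTaP functions are stubbed out and their prototypes are assumed from the list above (the real ones may differ); only the control flow, with its mode switching, is the point.

```c
/* Sketch of the cooperative host loop. Stubs stand in for the real
   HTTaP functions; the workload is simulated by a slice counter. */
static volatile int abort_program = 0;  /* set when the program must quit */
static int blocking_mode = 0;
static int work_left = 3;               /* pretend workload: 3 slices     */

static void init_HTTaP(void)       { /* open the server socket here   */ }
static void poll_HTTaP(void)       { /* poll sockets and serve files  */ }
static void HTTaP_mode_block(void) { blocking_mode = 1; }
static void HTTaP_mode_poll(void)  { blocking_mode = 0; }

static void do_one_slice(void) {        /* <= ~100 ms of real work       */
  work_left--;
  if (work_left == 0)
    abort_program = 1;                  /* demo only: quit when done     */
}

int run_host_loop(void) {
  init_HTTaP();                         /* run once, before the loop     */
  while (!abort_program) {
    poll_HTTaP();                       /* serve HTTaP requests first    */
    if (work_left > 0) {
      HTTaP_mode_poll();                /* work pending: short timeslices */
      do_one_slice();
    } else {
      HTTaP_mode_block();               /* idle: yield the CPU in poll() */
    }
  }
  return blocking_mode;
}
```

In a real host, do_one_slice() would be one chunk of the application's work and a received HTTaP command would set work_left; the loop itself stays this simple, which is what makes the scheme look like collaborative multitasking.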
Behind the scene, it's not trivial but it seems to work well. The functions are rather well layered and the timeouts seem to do their work.
I have so far provided a "dumb" application that reads orders from the command line and wastes time writing to the screen ; it will soon include the real socket server instead.
The cool part is that the host program has very few constraints and it should work nicely with GHDL !