Need help debugging a Lasso HTTP streaming proxy

Ari Najarian
Hi folks,

I've been using CouchDB as a CDN for a few projects, as it allows you to store binary attachments, and then provides an HTTP interface directly to these documents. In the past, I've used Apache as a proxy to avoid exposing the database directly to the web. So, the user requests files at:

  example.com/download/{docid}/{attachment}

which is then proxied to CouchDB's HTTP API:

  127.0.0.1/database/{docid}/{attachment}
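
The mapping above can be sketched outside Lasso as well. The following is a minimal Python illustration, not the proxy from this thread: the function name, the http scheme and port 5984 (CouchDB's usual default) are assumptions, since the post only gives 127.0.0.1/database.

```python
from urllib.parse import quote

# Assumed upstream base: the thread only shows 127.0.0.1/database, so the
# scheme and port here are illustrative.
COUCHDB_BASE = "http://127.0.0.1:5984/database"
PUBLIC_PREFIX = "/download/"


def upstream_url(public_path, base=COUCHDB_BASE):
    """Map /download/{docid}/{attachment} to the upstream attachment URL."""
    if not public_path.startswith(PUBLIC_PREFIX):
        raise ValueError("not a download path: %r" % public_path)
    rest = public_path[len(PUBLIC_PREFIX):]
    docid, sep, attachment = rest.partition("/")
    if not docid or not sep or not attachment:
        raise ValueError("expected /download/{docid}/{attachment}")
    # Percent-encode both parts. Slashes inside the attachment name are left
    # alone here, which is a choice made for this sketch, not a requirement.
    return "%s/%s/%s" % (base, quote(docid, safe=""), quote(attachment, safe="/"))
```

Authentication and logging would run before this function is called, which is the whole point of moving the proxy step out of Apache.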

The benefit to this approach is that CouchDB handles the load of serving large files very well. The drawback is that Lasso is completely unaware of this traffic.

A project I'm working on right now requires robust traffic logging and much more stringent authentication when downloading certain files. Since both my logging and authentication layers are handled in Lasso, I thought I'd try and write a proxy to do what Apache used to do, but which would allow me to run arbitrary code to log the request and determine whether to serve the file.

The following snippet shows what I was able to achieve, but it needs some work:

http://pastebin.com/0SMXsUHT

The way this code works: it performs a HEAD request against the CouchDB file endpoint to grab the headers that CouchDB would have sent with its response, then parses them and sets the Lasso response headers to mirror them exactly. Once the headers are set, I use curl inside a loop to write the body data to a bytes object, and send a chunk of data each time the bytes object exceeds fcgi_bodyChunkSize. (Hat-tip to Jolle for outlining this requirement with his export-csv chunking technique.)
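
The buffer-and-flush loop described above can be sketched in a few lines. This is a Python illustration of the idea only, not the pastebin code: `rechunk` and its parameters are hypothetical names, and `chunk_size` stands in for fcgi_bodyChunkSize.

```python
def rechunk(pieces, chunk_size):
    """Accumulate incoming byte pieces and yield the buffer each time it
    grows past chunk_size, then yield whatever is left at the end."""
    buffer = bytearray()
    for piece in pieces:
        buffer.extend(piece)
        if len(buffer) > chunk_size:
            yield bytes(buffer)
            buffer.clear()
    if buffer:
        # Final partial chunk: without this the tail of the file is lost.
        yield bytes(buffer)
```

Because it is a generator, nothing forces the whole file into memory: each yielded chunk can be written to the client before the next piece is read from upstream.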

This approach is working successfully, mostly. I haven't found any problems with small to medium-sized Word, PDF, PowerPoint, image or even WMV documents. They all load progressively, instead of waiting for Lasso to load the entire file into memory and then serve it. However, so far both FLV and MP4 files (approx 50MB) don't stream, and I can't figure out why.

My leading theory is that Apache's configuration is somehow interfering with the content Lasso is streaming, because it might treat FLV and MP4 files uniquely. In that case, I don't know how to change Apache's configuration. Alternatively, maybe the byte data in these files is breaking curl, because I haven't configured my curl request correctly. Beyond that, I'm stumped, and could use a few pairs of eyes on this code to see if I'm missing something obvious.

To try this proxying technique on any URL, copy the code provided in the pastebin link above, and set #COUCHDB_URL to any URL that serves a media file. You may need to change line 23 to cherry-pick the headers you want Lasso to mimic, based on what your given server sends back.
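
For readers who cannot open the pastebin, the header cherry-picking step might look roughly like this. It is a Python sketch under assumptions: the function names are made up for illustration, and the raw text is whatever a HEAD request (for example `curl -I`) prints.

```python
def parse_head_response(raw):
    """Split raw HEAD output into (status_line, [(name, value), ...])."""
    lines = raw.replace("\r\n", "\n").strip("\n").split("\n")
    status_line = lines[0]
    headers = []
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header block
        name, sep, value = line.partition(":")
        if sep:
            headers.append((name.strip(), value.strip()))
    return status_line, headers


def cherry_pick(headers, wanted):
    """Keep only the headers whose names are in wanted (case-insensitive)."""
    wanted_lower = {name.lower() for name in wanted}
    return [(n, v) for n, v in headers if n.lower() in wanted_lower]
```

Header names are compared case-insensitively because servers differ in how they capitalise them.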

Any and all comments / suggestions are welcome and sorely needed. Thanks!

Ari.



#############################################################

This message is sent to you because you are subscribed to
  the mailing list Lasso [hidden email]
Official list archives available at http://www.lassotalk.com
To unsubscribe, E-mail to: <[hidden email]>
Send administrative queries to  <[hidden email]>

Re: Need help debugging a Lasso HTTP streaming proxy

Ari Najarian
One more thing I should mention (this muddies the waters a little bit):

When I was testing with a 50MB MP4 file, I ran curl -I against both the raw CouchDB URL, and against the Lasso proxy script. Both returned the correct headers, but only the CouchDB direct URL began streaming the video in a web browser. The proxy script loaded an empty video player.

Thinking this might be because Apache was meddling, I renamed the MP4 file to .txt and re-uploaded it to CouchDB so it would be served with a text/plain Content-Type. When I run curl -I against both the CouchDB URL and the proxy script URL, I get Content-Type: text/plain in the returned header. Content-Length headers are identical, too.

When I open the text links in Safari, they both show an empty video player, despite both headers being set to text/plain. I guess the browser itself is confused when it receives MP4 data with an incorrect header?


Re: Need help debugging a Lasso HTTP streaming proxy

Jolle Carlestam-2
In reply to this post by Ari Najarian
A shot in the dark but what happens if you skip Apache and use Spitfire?

HDB
Jolle

Sent from a mobile device. Any anomalies is due to Autocorrect.

> On 28 May 2016 at 19:11, Ari Najarian <[hidden email]> wrote:
>
> My leading theory is that Apache's configuration is somehow interfering with the content Lasso is streaming, because it might treat FLV and MP4 files uniquely.



Re: Need help debugging a Lasso HTTP streaming proxy

Bil Corry-3
Makes me think that Lasso isn't reading it in progressively, but rather is
loading the entire file into memory at once.  For the smaller files that
work, is there a pause before you start seeing delivery of the file?  That
would indicate Lasso is busy loading the entire file.


- Bil


Re: Need help debugging a Lasso HTTP streaming proxy

Jolle Carlestam-2
Easy to test. Add a
  stdoutnl('looping')
after this:
  if( #data->last->isa(::bytes) )
    #bodyBytes->append(#data->last)
  /if

HDB
Jolle

Sent from a mobile device. Any anomalies is due to Autocorrect.


Re: Need help debugging a Lasso HTTP streaming proxy

Ari Najarian
In reply to this post by Ari Najarian
I can confirm that it is indeed loading and serving progressively.
I added a 'sleep(10)' before line 56, and then curled my script from the command line.
Then I increased the sleep interval to 100, then 1000.
In both cases, there was a longer and longer pause between chunks being served.
The streaming code isn't the problem, as far as I can tell.
I just don't know why certain file types seem to be problematic.
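
The same check can be made from the client side by timing when each chunk arrives. The helper below is a Python sketch with a hypothetical name; it accepts any iterable of byte chunks, and the clock is injectable so the behaviour can be checked without real delays.

```python
import time


def chunk_arrival_gaps(chunks, clock=time.monotonic):
    """Return (chunk_size, seconds_since_previous_chunk) for each chunk.

    If delivery is progressive and the server sleeps between chunks, the
    gaps grow with the sleep interval. If the whole file is loaded first,
    there is one long gap before the first chunk and almost none after.
    """
    gaps = []
    previous = clock()
    for chunk in chunks:
        now = clock()
        gaps.append((len(chunk), now - previous))
        previous = now
    return gaps
```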

Would it help if I set up a few public test CouchDB file URLs to test the script against?


Re: Need help debugging a Lasso HTTP streaming proxy

Ari Najarian
In reply to this post by Ari Najarian
@Jolle, regarding Spitfire, I must confess that I've never used it as a replacement for Apache. I'll look for some documentation on the Lasso website, but it seems to me that this would be a substantial rewrite based on how I currently structure my applications.

For example, I've never played with atBegin to do dynamic URLs. Instead, I rely on an .htaccess file with my rewrite rules in it to send most requests to index.lasso for further processing. Apache handles the serving of media files (images, js, css) automatically in my current configuration, too, and probably many other things I take for granted.

I'm not opposed to someday migrating away from Apache, but at the moment I'm fixated on trying to make this work with my current stack.





Re: Need help debugging a Lasso HTTP streaming proxy

Jolle Carlestam-2
I was thinking more along the lines of testing to see if it works using Spitfire at all. If it does, then go back to Apache and examine why it differs.
If it fails on Spitfire as well then maybe it’s Lasso you have to look deeper into.

HDB
Jolle


> On 29 May 2016 at 17:18, Ari Najarian <[hidden email]> wrote:
>
> @Jolle, regarding Spitfire, I must confess that I've never used it as a replacement for Apache. I'll look for some documentation on the Lasso website, but it seems to me that this would be a substantial rewrite based on how I currently structure my applications.



Re: Need help debugging a Lasso HTTP streaming proxy

Bil Corry-3
In reply to this post by Ari Najarian
Next up, I wonder if it starts serving the problematic file, then stops
after a certain amount of data has been sent, or if it doesn't serve
anything and fails right away?


- Bil


Re: Need help debugging a Lasso HTTP streaming proxy

Ari Najarian
In reply to this post by Ari Najarian
I'm making some progress in trying to get to the bottom of this.

I had a suspicion that my web browser was sending different request headers when it detected an MP4 mime type in its initial request. This question on Stack Overflow confirms it:

http://stackoverflow.com/questions/3128906/mp4-plays-when-accessed-directly-but-not-when-read-through-php-on-ios

I started logging the requests against my script, and noticed that each time I load it in the web browser, I get two requests logged: one from the browser itself, which I expected, and another that includes (among other things) the following unique headers:

["ACCEPT-ENCODING", "identity"]
["RANGE", "bytes=0-1"]
["X-PLAYBACK-SESSION-ID", "F8CBFFAD-2799-4CDF-9789-A02AEE0A92DD"]

This suggests that the media player is probing for byte-range support (the Range header asks for just the first two bytes) rather than requesting the whole file. But since these headers aren't being forwarded to CouchDB, I'm proxying the response to the wrong request.
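
To make the "bytes=0-1" request concrete: a server that honours it answers 206 Partial Content with a Content-Range header and just the two requested bytes. In this thread CouchDB does that work and the proxy only has to pass it through, so the Python below is purely an illustration of the semantics. The function names are hypothetical and only a single range is handled.

```python
def resolve_range(range_header, total_length):
    """Return inclusive (start, end) for one 'bytes=a-b' range, else None."""
    if total_length <= 0:
        return None
    unit, sep, spec = range_header.partition("=")
    if not sep or unit.strip().lower() != "bytes" or "," in spec:
        return None  # unknown unit or multiple ranges: out of scope here
    first, dash, last = spec.strip().partition("-")
    if not dash:
        return None
    if first == "":
        # Suffix form 'bytes=-N' means the final N bytes.
        if not last.isdigit() or int(last) == 0:
            return None
        count = min(int(last), total_length)
        return (total_length - count, total_length - 1)
    if not first.isdigit() or (last != "" and not last.isdigit()):
        return None
    start = int(first)
    end = int(last) if last else total_length - 1
    end = min(end, total_length - 1)
    if start > end:
        return None
    return (start, end)


def partial_content_headers(start, end, total_length):
    """Headers that accompany a 206 response for the given range."""
    return [
        ("Content-Range", "bytes %d-%d/%d" % (start, end, total_length)),
        ("Content-Length", str(end - start + 1)),
    ]
```

A player that gets a plain 200 with the full body in reply to this probe may conclude that seeking is unsupported, which would be consistent with the empty video player described earlier.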

It seems that the solution is to proxy the request headers from the client to the media server (CouchDB in this case), as well as the corresponding response from the media server.
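
One caveat when forwarding client headers wholesale: a proxy normally drops the hop-by-hop headers, which describe the client-to-proxy connection rather than the request itself. A minimal Python sketch of that filter follows. The function name is hypothetical, and dropping Host is an assumption that suits an upstream addressed by IP, as CouchDB is here.

```python
# Hop-by-hop headers that a proxy should not pass along to the upstream.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
}


def forwardable_headers(client_headers, drop=("host",)):
    """Filter (name, value) pairs down to those safe to send upstream."""
    skip = HOP_BY_HOP | {name.lower() for name in drop}
    return [
        (name, value)
        for name, value in client_headers
        if name.lower() not in skip
    ]
```

This sketch does not handle the case where the Connection header itself names extra headers to remove.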

This is what I'm trying to do now, but I'm struggling with setting the headers on my curl request.
I capture the client's headers like so:

local(send_headers) = array;
web_request->headers->foreachpair => {
  #send_headers->insert( #1->first->titlecase& = #1->second);
};

This gives me an array of pairs to pass to curl_easy_setopt, but since the function isn't documented thoroughly in the Lasso documentation, I'm fumbling around in the dark.

I currently have this, before entering the loop:

#err = curl_easy_setopt(#ctoken, curlopt_httpheader, #send_headers );

Which doesn't seem to be working. I also tried:

with h in #send_headers do => {
  #err = curl_easy_setopt(#ctoken, curlopt_httpheader, pair(#h->first, #h->second) );
};

I also tried a few other data types as the third parameter, to no avail. If anybody knows how to properly set several headers on a curl instance using curl_easy_setopt, please chime in.

I feel like I'm getting close to a solution here, but I still need help.
Thanks for all the feedback so far!




Re: Need help debugging a Lasso HTTP streaming proxy

stevepiercy
Side note: trailing ; on statements are not needed in Lasso 9.

See the documentation on curl, then map the Lasso 9 command to
the specific option.  Some of it is intuitive.
https://curl.haxx.se/docs/manpage.html

     -H, --header <header>

...looks like a good candidate.

I get the command correct on the command line first, referring
to the official curl docs, then fiddle with curl options in
Lasso 9.

--steve



-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
Steve Piercy              Website Builder              Soquel, CA
<[hidden email]>               <http://www.stevepiercy.com/>



Re: Need help debugging a Lasso HTTP streaming proxy

Ari Najarian
In reply to this post by Ari Najarian
I got this working, finally!

It turns out the problem was solved by passing the client headers to CouchDB and mirroring the response headers exactly. My code wasn't passing the Range header (or any header for that matter), nor was it recreating the correct status code for range queries (206 Partial Content).

What hamstrung me in the end was incorrect documentation on LassoGuide and LassoDocs regarding the curl methods. The documentation says that CURLOPT_HTTPHEADER needs to be set by passing in an array of pairs. This is WRONG: headers need to be set as strings of the form "Header: value". Also, all headers need to be set in a single call: each subsequent call to curl->set overwrites whichever headers you had set previously.
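
The conversion from name/value pairs to "Header: value" strings is simple enough to show in isolation. This Python sketch uses a hypothetical function name and does not call curl at all. It only builds the list of strings that would then be handed over in one call.

```python
def header_lines(pairs):
    """Turn (name, value) pairs into 'Name: value' strings.

    Names are re-capitalised per hyphen-separated word (RANGE -> Range).
    Embedded line breaks are rejected so that one pair can never smuggle
    in a second header.
    """
    lines = []
    for name, value in pairs:
        if any(ch in name or ch in value for ch in "\r\n"):
            raise ValueError("line break in header %r" % name)
        canonical = "-".join(part.capitalize() for part in name.split("-"))
        lines.append("%s: %s" % (canonical, value))
    return lines
```

Joining the result with "\n" gives one string carrying every header, so nothing depends on repeated calls accumulating.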

I couldn't for the life of me figure out why manually adding "Range: bytes=0-1000" resulted in the correct 206 response from CouchDB, while iterating and successively "adding" headers via curl->set kept returning a 200 response.

For my initial HEAD request, I now have:

  #req->set( CURLOPT_HTTPHEADER, #send_headers->join("\n"))

I also set the status code:

  web_response->setStatus( #req->statuscode , #result->get(1)->split(" ")->get(3)->asString );
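
A note on the reason phrase: if I read the line above correctly, splitting the status line on every space and taking the third element would give "Partial" rather than "Partial Content" for a 206. Limiting the number of splits keeps the phrase whole. The following is a Python sketch with a hypothetical function name, not a change to the Lasso code.

```python
def parse_status_line(status_line):
    """Split 'HTTP/1.1 206 Partial Content' into (version, code, reason)."""
    parts = status_line.strip().split(" ", 2)  # at most three fields
    if len(parts) < 2 or not parts[1].isdigit():
        raise ValueError("malformed status line: %r" % status_line)
    reason = parts[2] if len(parts) > 2 else ""
    return parts[0], int(parts[1]), reason
```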

And finally, while calling curl_easy_setopt:

  curl_easy_setopt(#ctoken, CURLOPT_HTTPHEADER, #send_headers->join("\n") );

The documentation for curl and its related methods is sparse and, more importantly, incorrect. This isn't cool. I had to search LassoTalk for curlopt_httpheader and, by luck, find a sample code snippet where someone had set a header using a string value. Then, it took another stroke of luck to discover that you can't set multiple headers individually. This area of the documentation needs some serious care and attention.

Long story short, once I proxied client headers correctly to CouchDB, and proxied the response headers back to the client, my script worked. Thanks for being such great sounding boards!
