Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

T864 - Separate log file for unparsed queries #295

Merged
merged 16 commits into from
Dec 27, 2018

Conversation

storojs72
Copy link
Contributor

No description provided.

}

// NewAcraCensor creates new censor object.
func NewAcraCensor() *AcraCensor {
acraCensor := &AcraCensor{}
acraCensor.logger = log.WithField("service", ServiceName)
acraCensor.ignoreParseError = false
acraCensor.parseErrorsLogPath = ""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need such initialization because it's already initialized with empty string. Above assigning is redundant too

if err != nil {
return err
}
defer openedFile.Close()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we doesn't use this file after creation we can close it here without defer. but better to store file descriptor as censor field to avoid opening every write operation and reduce count of syscalls (open/close)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When and where should I close it is such case when acraserver stops?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just here) openedFile.Close() instead defer openedFile.Close() because this file doesn't used anywhere after creation. As I understand here we just create file and check that we can do this and close it before next write.
But if we will store opened file in censor then we don't need .Close() here even with defer

if acraCensor.parseErrorsLogPath != "" {
err := acraCensor.saveQuery(rawQuery)
if err != nil {
acraCensor.logger.WithError(err).Errorf("An error occurred while saving unparsable query")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use Errorln to append '\n' at end of log msg. Errorf useful to create msg with some format of string

err := acraCensor.saveQuery(rawQuery)
if err != nil {
acraCensor.logger.WithError(err).Errorf("An error occurred while saving unparsable query")
return err
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if censor configured with acracensor.ignoreParseError then we should not stop processing on error related with unparsed query and enough log error and try to save without stopping processing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vixentael what you think about this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, if acracensor.ignoreParseError is true, then log error and move forward

defer openedFile.Close()

_, err = openedFile.WriteString(rawQuery + "\n")
if err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return err enough imho :)

plus better to use opened file on loading configuration and use it to avoid open/close on every save

@@ -1,4 +1,5 @@
ignore_parse_error: false
parse_errors_log: unparsed_queries_log
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about uparsed_queries.log name? to add some common extension to avoid meaning some binary file without extension on linux


defer openedFile.Close()

_, err = openedFile.WriteString(rawQuery + "\n")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here we will have same problem which we had with querycapture handler where we try to buffer data before dumping to file to avoid slow write on each query. here we will have same problem with situation when client will use not supported queries. plus we don't need repeated queries, only unique. maybe would be good to write some QueryWriter (and use in querycapture if we can to avoid duplicated code) which will do this logic with buffered write and timeouts and store unique queries in memory to avoid repeated queries

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will write additional handler for this (with functionality like QueryCapture)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, it makes sense to allocate same functionality into some object lke QueryWriter and use it for both QueryCapture and here

}

// NewAcraCensor creates new censor object.
func NewAcraCensor() *AcraCensor {
acraCensor := &AcraCensor{}
acraCensor.logger = log.WithField("service", ServiceName)
acraCensor.ignoreParseError = false
acraCensor.parsedQueriesWriter = nil
acraCensor.unparsedQueriesWriter = nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can skip it because its already nil

// read existing queries from file
err = handler.readAllQueriesFromFile()
if err != nil {
handler.logger.WithError(ErrCantReadQueriesFromFileError).WithField(logging.FieldKeyEventCode, logging.EventCodeErrorCensorIOError).Errorln("Can't create QueryWriter instance")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should log original error err using WithError

err = handler.readAllQueriesFromFile()
if err != nil {
handler.logger.WithError(ErrCantReadQueriesFromFileError).WithField(logging.FieldKeyEventCode, logging.EventCodeErrorCensorIOError).Errorln("Can't create QueryWriter instance")
openedFile.Close()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should process error to by logging it

case <-handler.serializationTicker.C:
err := handler.dumpBufferedQueriesToFile(openedFile)
if err != nil {
handler.logger.WithError(ErrComplexSerializationError).WithField(logging.FieldKeyEventCode, logging.EventCodeErrorCensorIOError).Errorln("Can't create QueryWriter instance")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log original error

case <-signalBackgroundExit:
err := handler.finishAndCloseFile(openedFile)
if err != nil {
handler.logger.WithError(ErrComplexSerializationError).WithField(logging.FieldKeyEventCode, logging.EventCodeErrorCensorIOError).Errorln("Can't create QueryWriter instance")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

// DumpAllQueriesToFile writes stored queries into file.
func (queryWriter *QueryWriter) DumpAllQueriesToFile() error {
// open or create file, NO APPEND
f, err := os.OpenFile(queryWriter.filePath, os.O_TRUNC|os.O_RDWR|os.O_CREATE, 0600)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to close file in function which open it with defer

return nil
}

func (queryWriter *QueryWriter) appendQueries(queries []*QueryInfo, openedFile *os.File) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because we don't use any os.File specific API then we can take io.Writer as param.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use openedFile.Write(lines) in this function

return linesToAppend
}

func (queryWriter *QueryWriter) finishAndCloseFile(openedFile *os.File) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://golang.org/pkg/io/#Closer enough here. interfaces better to use instead specific types

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please, explain in more details

signalBackgroundExit := make(chan bool)

// create handler
handler := &QueryWriter{}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should assign openedFile to the related writer which use it to not forget to close it on Release/Close. resource owner responsible for releasing resources. plus it allow to avoid extra OpenFile calls because file already opened

handler.serializationTicker = time.NewTicker(handler.serializationTimeout)

case <-signalBackgroundExit:
err := handler.finishAndCloseFile(openedFile)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imho better to call handler.Release here and rename finishAndCloseFile to Release, which should dump data and close file

@@ -110,3 +103,39 @@ func (acraCensor *AcraCensor) HandleQuery(rawQuery string) error {
acraCensor.logger.Infof("Allowed query: '%s'", queryWithHiddenValues)
return nil
}

// GetLoggingTimeout returns current timeout of censor's logging process
func (acraCensor *AcraCensor) GetLoggingTimeout() time.Duration {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we doesn't need such methods because they will not called anywhere and logging timeout we set once at first censor configurations with some constant and it will not changed during acra-server run. so I think we don't need extend AcraCensorInterface. anyway it's specific settings for handler, not censor at all.

}

// SetLoggingTimeout sets timeout of censor's logging process
func (acraCensor *AcraCensor) SetLoggingTimeout(duration time.Duration) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

save as above


func saveQuery(writer *common.QueryWriter, query string) {
//skip already captured queries
for _, capturedQuery := range writer.Queries {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is logic of QueryWriter, not censor. better to move this logic to query writer and add some method like handler.WriteQuery(query string) where it will check is this query exists or not and create new QueryInfo and append or not

after moving we will not need writer.Queries as public field and change to private

@@ -36,4 +39,6 @@ type AcraCensorInterface interface {
AddHandler(handler QueryHandlerInterface)
RemoveHandler(handler QueryHandlerInterface)
ReleaseAll()
GetLoggingTimeout() time.Duration
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imho we don't need it

// WriteAll writes raw queries that need to be stored to internal file
func (storage *FileLogStorage) WriteAll(p []byte) error {
if storage.openedToAppend {
openedFile, err := os.OpenFile(storage.filePath, os.O_TRUNC|os.O_RDWR|os.O_CREATE, 0600)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't use already opened file in our constructor. we didn't close it and didn't reuse. I prefer to reuse:

if err := storage.file.Truncate(0); err != nil {return err} // drop all data but leave file already opened and ready to write


// DumpQueries writes all queries into file.
func (queryWriter *QueryWriter) DumpQueries() error {
rawData := queryWriter.serializeQueries(queryWriter.Queries)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here we need lock to avoid background append to Queries

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after serialization, we can unlock and avoid lock on writing to file because we have serialized copy of data

// GetAllInputQueries returns a list of non-masked RawQueries.
func (queryWriter *QueryWriter) GetAllInputQueries() []string {
var queries []string
for _, queryInfo := range queryWriter.Queries {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here we need Rlock. we read shareable resource that may be changed during read. and go runtime will panic on loop through map that was changed at this time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain what is Rlock? I googled but only usual Mutex.lock found

// It will be written to file on Stop, Reset or Release.
// Expects redacted query
func (queryWriter *QueryWriter) MarkQueryAsForbidden(queryWithHiddenValues string) {
for index, queryInfo := range queryWriter.Queries {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here too RLock

queryInfo := &QueryInfo{}
queryInfo.RawQuery = queryWithHiddenValues
queryInfo.IsForbidden = false
queryWriter.Queries = append(queryWriter.Queries, queryInfo)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lock on write


func (acraCensor *AcraCensor) saveUnparsedQuery(query string) {
if acraCensor.unparsedQueriesWriter != nil {
saveQuery(acraCensor.unparsedQueriesWriter, query)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imho better to use unparsedQueriesWriter.WriteQuery(query) or move saveQuery to file with this handler

@@ -69,7 +69,7 @@ func NewFileQueryWriter(filePath string) (*QueryWriter, error) {
// read existing queries
err = writer.readStoredQueries()
if err != nil {
writer.logger.WithError(ErrCantReadQueriesFromFileError).WithField(logging.FieldKeyEventCode, logging.EventCodeErrorCensorIOError).Errorln("Can't create QueryWriter instance")
writer.logger.WithError(ErrCantReadQueriesFromStorageError).WithField(logging.FieldKeyEventCode, logging.EventCodeErrorCensorIOError).Errorln("Can't create QueryWriter instance")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should log origin error too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be logged from readStoredQueries function


// DumpQueries writes all queries into file.
func (queryWriter *QueryWriter) DumpQueries() error {
rawData := queryWriter.serializeQueries(queryWriter.Queries)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after serialization, we can unlock and avoid lock on writing to file because we have serialized copy of data

@@ -134,13 +127,13 @@ func (queryWriter *QueryWriter) Release() {
}

func (queryWriter *QueryWriter) reset() {
queryWriter.mutex.Lock()
defer queryWriter.mutex.Unlock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defer good to use if we have a lot of points of exit to avoid multiple calls of Unlock. For example

func t(){
   l.Lock()
   _, err := func1()
   if err != nil{
      l.Unlock()
      return
   }
   _, err := func2()
  if err != nil{
      l.Unlock()
      return
   }
   _, err := func3()
  if err != nil{
      l.Unlock()
      return
   }
}

In such case to not forget unlock and call it once we should use defer l.Unlock and it simple
But defer has a small overhead. It's really small but it have. And when we have simple cases such this we can avoid it without any care that we can forget

}
storage.file = openedFile
storage.openedToAppend = false
if err := storage.file.Truncate(0); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks better)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I just didn't know that Truncate method exists

@@ -7,7 +7,6 @@ import (

// FileLogStorage is a file-based implementation of LogStorage interface
type FileLogStorage struct {
filePath string
file *os.File
openedToAppend bool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need it more?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no it seems we didn't. I used it when reopening file with other modes set, but now Truncate allows to avoid it


// Release dumps all Captured queries to file and resets Captured queries list.
func (queryWriter *QueryWriter) Release() {
queryWriter.DumpQueries()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should handle error and log about it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DumpQueries logs errors inside

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then would be better to assign error to anonym variable to be compliant with linters (goland in last versions highlight such code with unhandled error message :) plus it's good to log twice. with one more log we add some context where this function was called (DumpQueries is public function that may be called in a lot of places in a future who will use this writer). Here we will have error on Release, not on serialization by timeout

t.Fatal(err)
}
}
time.Sleep(DefaultSerializationTimeout + 100*time.Millisecond)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in previous PR's we decreased time of tests twice just by decreasing granulation of timeouts. and there we can't it handle more deterministic because we can't control environment fully
but here in unit-tests we can. what about to move background goroutine in our constructor NewFileQueryWriter to some function like start() and rewrite like something:

func (writer *QueryWriter) start(){
  for {
			select {
			case <-writer.serializationTicker.C:
				err := writer.dumpBufferedQueries()
				if err != nil {
					writer.logger.WithError(err).WithField(logging.FieldKeyEventCode, logging.EventCodeErrorCensorIOError).Errorln("Can't dump buffered queries")
				}
				writer.serializationTicker.Stop()
				writer.serializationTicker = time.NewTicker(writer.serializationTimeout)
			case <-signalBackgroundExit:
				writer.serializationTicker.Stop()
				err := writer.logStorage.Close()
				if err != nil {
					writer.logger.WithError(err).WithField(logging.FieldKeyEventCode, logging.EventCodeErrorCensorIOError).Errorln("Error occurred on exit QueryWriter instance")
				}
			case <-signalShutdown:
				writer.serializationTicker.Stop()
				err := writer.logStorage.Close()
				if err != nil {
					writer.logger.WithError(err).WithField(logging.FieldKeyEventCode, logging.EventCodeErrorCensorIOError).Errorln("Error occurred on shutdown QueryWriter instance")
				}
				return
			default:
				//do nothing. This means that channel has no data to read yet
			}
		}
}

// NewFileQueryWriter creates QueryWriter instance
func NewFileQueryWriter(filePath string) (*QueryWriter, error) {
  writer := &QueryWriter{}
// initialization logic
  go writer.Start()
  return writer, nil
}

after that we can in our tests do:

writer, err := NewFileQueryWriter(tmpFile.Name())
for _, query := range testQueries {
		_, err = writer.RedactAndCheckQuery(query)
		if err != nil {
			t.Fatal(err)
		}
	}
writer.Release() // or writer.DumpQueries() or any other private method
go writer.start()  // here we restart our background goroutine 

plus we need this background goroutine only to test logic with background serialization and stop. in other tests (dumping/serialization/deserialization) we can test without it, just call direct methods with writer.dumpBlaBla, writer.deserializeBlaBla

Copy link
Collaborator

@Lagovas Lagovas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix last comments (tests should not pass gofmt check) and ok to merge

@@ -52,26 +52,24 @@ type QueryWriter struct {
// NewFileQueryWriter creates QueryWriter instance
func NewFileQueryWriter(filePath string) (*QueryWriter, error) {
// create writer
writer := &QueryWriter{}
writer := &QueryWriter{
queryIndex:0,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure, but 90% that this file not formatted with gofmt )

}
return nil
})
if err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return err : )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's output value of internal function that passed as argument to WalkQueries

@storojs72 storojs72 merged commit 334ee25 into cossacklabs:master Dec 27, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants